On Fri, 25 Sep 2020 12:10:22 +0200, Zbigniew Jędrzejewski-Szmek wrote:
> I'm missing some good statistics.

I have 1.6TB of statistics, ask me anything. They are calculated by my scripts:
        https://git.jankratochvil.net/?p=massrebuild.git;a=tree
        git clone git://git.jankratochvil.net/massrebuild


> > * DWZ advantage: On the whole Fedora distro it saves 3.3% (5GB of the
> > 157GB distribution size)
> 
> What is this comparing? Is this the size of binary rpm or the
> installation-on-disk footprint?

I am usually talking about *-debuginfo.rpm size.

Another possible number is the download size of the separate *.debug files
(DWZ is then 6% bigger than -fdebug-types-section due to the associated DWZ
common files).


> I would love to see a comparison of numbers for three things:
> - raw debuginfo without dwz or -fdebug-types-section

Oops, I do not have this number. I can run a new massrebuild; it takes about
4 days (depending on the availability of beefy machines).


> - debuginfo with dwz (current approach)

 rpm size:  35186079102
disk size: 177913332940

> - debuginfo with -fdebug-types-section

 rpm size:  37570327765
disk size: 214927514757

 = DWZ rpm size is smaller by 6.35% (-fdebug-types-section is bigger by 6.78%)
 = DWZ on-disk size is smaller by 17.2% (-fdebug-types-section is bigger by 20.8%)

It is based on 22080 Fedora Rawhide packages rebuilt on 2020-08-24.
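
The percentages depend on which size is taken as the baseline. A minimal
sketch of that arithmetic (Python; the function and constant names are
illustrative only and are not part of massrebuild):

```python
# Sizes in bytes, copied from the figures above.
DWZ_RPM = 35186079102
DWZ_DISK = 177913332940
TYPES_RPM = 37570327765    # -fdebug-types-section
TYPES_DISK = 214927514757  # -fdebug-types-section


def percent_smaller(small: int, big: int) -> float:
    """How much smaller `small` is, as a percentage of `big`."""
    return 100.0 * (big - small) / big


def percent_bigger(big: int, small: int) -> float:
    """How much bigger `big` is, as a percentage of `small`."""
    return 100.0 * (big - small) / small


if __name__ == "__main__":
    for label, dwz, types in (("rpm", DWZ_RPM, TYPES_RPM),
                              ("disk", DWZ_DISK, TYPES_DISK)):
        print(f"{label}: DWZ smaller by {percent_smaller(dwz, types):.2f}%, "
              f"-fdebug-types-section bigger by "
              f"{percent_bigger(types, dwz):.2f}%")
```

The "smaller by" figure uses the -fdebug-types-section size as the baseline,
the "bigger by" figure uses the DWZ size.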


> For each of those three categories both measures (rpm size and on-disk size)
> would be useful.

Another big variable is that F-34 will hopefully use DWARF-5 (F-33 uses
DWARF-4), which will change the numbers a bit (I do not know which way).
Currently DWZ is not yet ported to DWARF-5, so there is no way to compare it.
Also, DWZ does not plan to support LLVM DWARF-5, so that will skew such
a comparison even after the port.

The on-disk size will change again with F-33 btrfs compression, which should
reduce the size by about 50% (which makes any DWZ/-fdebug-types-section
differences pointless). It will obviously make the on-disk size difference
smaller (than the current 20.8%).
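
To illustrate the effect on the byte difference, here is a minimal sketch. It
assumes a uniform 50% compression ratio for both variants, which is an
assumption for illustration only: real ratios differ per file and quite
possibly between DWZ and -fdebug-types-section output.

```python
# On-disk sizes in bytes, copied from the figures above.
DWZ_DISK = 177913332940
TYPES_DISK = 214927514757  # -fdebug-types-section

# Assumption: btrfs compression shrinks both variants by the same ~50%.
ASSUMED_RATIO = 0.5


def compressed_difference(dwz: int, types: int, ratio: float) -> float:
    """Byte difference between the two variants after uniform compression."""
    return (types - dwz) * ratio


if __name__ == "__main__":
    before = TYPES_DISK - DWZ_DISK
    after = compressed_difference(DWZ_DISK, TYPES_DISK, ASSUMED_RATIO)
    print(f"difference before compression: {before / 1e9:.1f} GB")
    print(f"difference after compression:  {after / 1e9:.1f} GB")
```

Under this uniform assumption only the absolute difference shrinks. Any change
in the percentage would come from the two variants compressing differently.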

And finally, the on-disk size depends a lot on which *-debuginfo packages you
have installed. That varies a lot, given that the stddev is twice the average
DWZ saving.


> Could you provide numbers like this for some subset of packages
> (20-30 packages that produce debuginfo would be enough to get a good measure).

The problem with these numbers is that they depend too much on the chosen set
of rpms, so 20-30 packages do not say anything.
Compared with -fdebug-types-section, DWZ saves 6.35% of the total size for the
whole of Rawhide. When averaged per package it is 5.44% (that means DWZ saves
more on bigger-than-median packages), but the stddev of the saving is +/-11%.
Packages where -fdebug-types-section is smallest (by difference in bytes; the
leading number is the -fdebug-types-section size as a percentage of the DWZ
size):
        70.11: julia-1.5.0-1.fc33.src.rpm -fdebug-types-section size=866936043 DWZ size=1236511762
        74.43: nodejs-14.7.0-1.fc33.src.rpm -fdebug-types-section size=921485027 DWZ size=1238008099
        77.84: mozjs78-78.1.0-1.fc33.src.rpm -fdebug-types-section size=623280098 DWZ size=800743010
Packages where DWZ is smallest (by difference in bytes):
        508.93: kea-1.7.9-3.fc33.src.rpm -fdebug-types-section size=1379013840 DWZ size=270963319
        143.07: paraview-5.8.1-1.fc33.src.rpm -fdebug-types-section size=11462175974 DWZ size=8011695061
        196.49: hpx-1.4.1-4.fc33.src.rpm -fdebug-types-section size=10981369919 DWZ size=5588742102
All these sizes are for *-debuginfo.rpm.
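
To show why the total saving and the per-package average differ, here is a
minimal sketch using only the six packages listed above. These are the
extremes, not a representative sample, so the results do not reproduce the
Rawhide-wide 6.35%, 5.44% or +/-11%. The saving definition (relative to the
-fdebug-types-section size) and the use of the sample standard deviation are
my assumptions; the function names are illustrative.

```python
from statistics import mean, stdev

# (package, -fdebug-types-section bytes, DWZ bytes), copied from the list above.
PACKAGES = [
    ("julia", 866936043, 1236511762),
    ("nodejs", 921485027, 1238008099),
    ("mozjs78", 623280098, 800743010),
    ("kea", 1379013840, 270963319),
    ("paraview", 11462175974, 8011695061),
    ("hpx", 10981369919, 5588742102),
]


def ratio_percent(types_size: int, dwz_size: int) -> float:
    """-fdebug-types-section size as a percentage of the DWZ size."""
    return 100.0 * types_size / dwz_size


def saving_percent(types_size: int, dwz_size: int) -> float:
    """DWZ saving relative to the -fdebug-types-section size (assumed
    definition); negative when DWZ is the bigger one."""
    return 100.0 * (types_size - dwz_size) / types_size


def total_saving_percent(packages) -> float:
    """Saving computed on summed bytes, so big packages dominate."""
    total_types = sum(t for _, t, _ in packages)
    total_dwz = sum(d for _, _, d in packages)
    return saving_percent(total_types, total_dwz)


def per_package_savings(packages) -> list:
    """Saving of each package, every package weighted equally."""
    return [saving_percent(t, d) for _, t, d in packages]


if __name__ == "__main__":
    savings = per_package_savings(PACKAGES)
    for (name, t, d), s in zip(PACKAGES, savings):
        print(f"{ratio_percent(t, d):7.2f}: {name:10s} saving {s:8.2f}%")
    print(f"total saving:       {total_saving_percent(PACKAGES):.2f}%")
    print(f"per-package mean:   {mean(savings):.2f}%")
    print(f"per-package stddev: {stdev(savings):.2f}%")
```

The total is weighted by bytes, the per-package mean is not, which is why the
two numbers can differ a lot on the same set of packages.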

The sizes depend strongly on the chosen subset of packages:
For example, for an ELN-like (*) distro the saving is not 6.35% but only 0.28%.
For the Fedora 32 packages on my personal machines it is not 6.35% but 0.72%.

(*) I used the subset of Fedora Rawhide packages that are present in CentOS-8.2.

There is also an opportunity for a new non-DWZ optimization (orthogonal to
DWZ/-fdebug-types-section) which can save 5.96% of *-debuginfo.rpm size with
a clang-only draft implementation. It requires no modification of DWARF
consumers and is easier to implement than upstreaming and maintaining DWZ
support for LLDB.


> I find that 3.3% number strange — it would mean that dwz is
> essentially useless, but maybe I'm misunderstanding how it's defined.

F-32 x86_64 has 157GB total, debug/ is 82GB (6GB is *-debugsource):
        6.35% * (82-6) / 157 = 3.07%
approx., the 3.3% was calculated with more exact distro size numbers.
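
The same estimate as a minimal sketch (Python; the function and parameter
names are illustrative), using the rounded GB figures above:

```python
def distro_saving_percent(debuginfo_saving_percent: float,
                          debug_gb: float,
                          debugsource_gb: float,
                          distro_gb: float) -> float:
    """Scale the *-debuginfo.rpm saving down to a share of the whole distro.

    *-debugsource is subtracted because the saving applies only to the
    *-debuginfo part of debug/.
    """
    debuginfo_gb = debug_gb - debugsource_gb
    return debuginfo_saving_percent * debuginfo_gb / distro_gb


if __name__ == "__main__":
    # 6.35% saving, debug/ 82GB of which 6GB is *-debugsource, 157GB total.
    print(f"{distro_saving_percent(6.35, 82, 6, 157):.2f}%")
```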


> I think we need to get some better understanding what the effects of
> various approaches are before discussing which to pick.

Thanks for this discussion.


Jan
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org