Rich Freeman <ri...@gentoo.org> wrote:
>
> Clearly it doesn't increase by a factor of 1 every year

The yearly increase of the factor is almost exactly 1:
according to current data it is .95 (see below).
With xz compression for squashfs, it is even 1.4!

(Note: increase _of_ the factor, not _by_ the factor, of course;
we are speaking about a linear increase, not an exponential one.)

More precisely: If in both cases you optimize extremely for space
(see below for details), then a change from rsync to git (non-shallow)
costs you

a) now: a factor of 2.6 in needed disk space

b) in the future: every year this factor increases
by the summand 1.4. For example, in 2.5 years you will need roughly
2.6 + (1.4 * 2.5) = 6.1 times the disk space needed for rsync.
After 2.5 more years, the factor will be close to 10
(2.6 + (1.4 * 5) = 9.6).
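
The linear growth described in b) can be sketched in a few lines of
Python. This is only an illustration: the function name space_factor and
the constant names are hypothetical, and the two constants are the rounded
values from a) and b) for the xz case.

```python
# Rounded values from a) and b); assumes xz-compressed squashfs in both cases.
BASE_FACTOR = 2.6      # a) disk space of git relative to rsync today
YEARLY_SUMMAND = 1.4   # b) amount added to that factor every year


def space_factor(years: float) -> float:
    """Estimated git/rsync disk-space factor after `years` years (linear model)."""
    return BASE_FACTOR + YEARLY_SUMMAND * years


if __name__ == "__main__":
    for years in (0, 2.5):
        print(f"after {years} years: factor {space_factor(years):.1f}")
```

The growth is a sum, not a product: the factor gains a constant amount per
year, so the model is a straight line in the number of years.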

For a) I assumed that in both cases the current repository is kept
compressed with squashfs (xz). This first factor will be much
larger, of course, if you omit squashfs when you switch to git.
(You must take measures to keep the checked-out repository separate:
you cannot use the standard emerge --sync to get this optimization.)

For both numbers, I even optimized the .git compression by
repeatedly executing
        git prune; git repack -a -d; git gc --aggressive
which took several hours for the historical repository;
thus, unless you use a cron job, this is not realistic.
Without this optimization, both numbers would be even larger.

Here is the raw data I used for the calculation:

1. RSYNC =    84,062,208
   (rsync gentoo repository, compressed with squashfs (-comp xz).)

2. GIT   =   136,322,616
   (Current .git data, without checked-out tree;
   compression optimized by the time-costly commands above.)

3. FULL  = 1,923,685,435
   (.git data as in 2, but with history added)

4. YEARS = 15
   (length of the historical data: first checkin was June 2000;
   change to git was IIRC sometime in mid-2015).

So the number from a) is

 size with git      $GIT + $RSYNC
 --------------- =  ------------- ~ 2.6
 size with rsync       $RSYNC

The number from b) is

 size of history increase per year    ($FULL - $GIT) / $YEARS
 --------------------------------- = ------------------------ ~ 1.4
         size with rsync                      $RSYNC

In the previous postings, I was assuming the much faster squashfs
compression -comp lz4 -Xhc instead of -comp xz. In this case,
the number from 1 changes to

   RSYNC =   125,784,064

which leads to the factor .95 ~ 1 for b) which I mentioned in the
beginning.
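
For anyone who wants to re-check the arithmetic, here is a minimal Python
sketch that recomputes the factors from the byte counts listed above. The
function and variable names are made up for illustration; only the numbers
come from this posting.

```python
# Sizes in bytes, copied from the data in this posting.
RSYNC_XZ = 84_062_208     # rsync tree, squashfs -comp xz
RSYNC_LZ4 = 125_784_064   # rsync tree, squashfs -comp lz4 -Xhc
GIT = 136_322_616         # current .git data, repacked
FULL = 1_923_685_435      # .git data with history added
YEARS = 15                # length of the historical data


def factor_now(rsync: int) -> float:
    """a) size with git (.git plus compressed tree) relative to rsync."""
    return (GIT + rsync) / rsync


def factor_per_year(rsync: int) -> float:
    """b) yearly growth of the history relative to the rsync size."""
    return (FULL - GIT) / YEARS / rsync


if __name__ == "__main__":
    print(f"a) with xz:  {factor_now(RSYNC_XZ):.2f}")
    print(f"b) with xz:  {factor_per_year(RSYNC_XZ):.2f}")
    print(f"b) with lz4: {factor_per_year(RSYNC_LZ4):.2f}")
```

Note that only the denominator changes between the xz and lz4 cases: the
.git sizes stay the same, so a larger rsync baseline gives a smaller factor.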

