Re: [fossil-users] Repository size - Fossil v. Git
On Mon, Nov 27, 2017 at 08:53:06PM -0500, David Mason wrote:
> Does it stay that size with moderate activity, or does it start growing
> significantly?

Incremental fastimport isn't that bad, but occasional repacks would still
help. Of course, GitHub doesn't allow triggering those remotely, and the only
option is to delete the repo and repush from scratch, which in turn breaks
any forks. Basically, if you convert a larger existing repo with fastimport,
remember to repack aggressively before pushing to GitHub.

> Does the pack format slow it down, or speed it up?

Neither, compared to Fossil. I don't think either storage format is optimized
for reducing seeks (which matters for spinning disks), and I'm not sure how
much SQLite benefits from mmap. Everything else is mostly a function of the
delta chain length.

> Given that the Git version only has 93% of what the Fossil repo has, I'd
> say Fossil is doing quite well.

The sqlite tree is also both flat and shallow, which is one of the biggest
costs for the NetBSD repos.

Joerg
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Repository size - Fossil v. Git
Does it stay that size with moderate activity, or does it start growing
significantly?

Does the pack format slow it down, or speed it up?

Given that the Git version only has 93% of what the Fossil repo has, I'd say
Fossil is doing quite well.

../Dave

On 27 November 2017 at 16:16, Joerg Sonnenberger wrote:
> On Mon, Nov 27, 2017 at 03:28:37PM -0500, Richard Hipp wrote:
> > I didn't try any repacking. I merely ran "git clone" then looked at
> > the packfile in .git/objects/pack. You would think that the server
> > would want to do an aggressive repack before sending the packfile
> > across a clone, to save bandwidth. But maybe GitHub values CPU cycles
> > more than bandwidth...
>
> git import is writing pretty dumb packs. Lots of redundancy in it,
> so it's not really that surprising. It's kind of similar to the effect
> of avoiding "fossil rebuild --compress" or Mercurial's generic delta.
> Cloning IIRC will mostly use the deltas as recorded; it doesn't
> recompute them. GitHub in general naturally avoids CPU load as much as
> possible, since it is one of the more expensive parts of running in the
> cloud.
>
> > Your git-foo is much greater than mine, Joerg. Can you please clone
> > https://github.com/mackyle/sqlite.git and see if you can get the
> > packfile to come out smaller?
>
> git repack -A -d --depth=500 --window-memory 4g -f -F
> gives me around 43MB for .git.
>
> Joerg
Re: [fossil-users] Repository size - Fossil v. Git
On Mon, Nov 27, 2017 at 03:28:37PM -0500, Richard Hipp wrote:
> I didn't try any repacking. I merely ran "git clone" then looked at
> the packfile in .git/objects/pack. You would think that the server
> would want to do an aggressive repack before sending the packfile
> across a clone, to save bandwidth. But maybe GitHub values CPU cycles
> more than bandwidth...

git import is writing pretty dumb packs. Lots of redundancy in it, so it's
not really that surprising. It's kind of similar to the effect of avoiding
"fossil rebuild --compress" or Mercurial's generic delta. Cloning IIRC will
mostly use the deltas as recorded; it doesn't recompute them. GitHub in
general naturally avoids CPU load as much as possible, since it is one of the
more expensive parts of running in the cloud.

> Your git-foo is much greater than mine, Joerg. Can you please clone
> https://github.com/mackyle/sqlite.git and see if you can get the
> packfile to come out smaller?

git repack -A -d --depth=500 --window-memory 4g -f -F
gives me around 43MB for .git.

Joerg
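For anyone who wants to repeat the measurement, here is a minimal Python sketch that sums the packfile sizes in a local clone and runs the repack command quoted above. The names pack_size_bytes, repack and REPACK_ARGS are made up for illustration; the flags are copied verbatim from the message and are not tuned in any way.

```python
import subprocess
from pathlib import Path

# Flags copied verbatim from the message above; nothing here is tuned.
REPACK_ARGS = ["git", "repack", "-A", "-d", "--depth=500",
               "--window-memory", "4g", "-f", "-F"]


def pack_size_bytes(repo: Path) -> int:
    """Total size of all *.pack files under <repo>/.git/objects/pack."""
    pack_dir = repo / ".git" / "objects" / "pack"
    if not pack_dir.is_dir():
        return 0
    return sum(p.stat().st_size for p in pack_dir.glob("*.pack"))


def repack(repo: Path) -> None:
    """Run the repack command inside the clone; raises if git fails."""
    subprocess.run(REPACK_ARGS, cwd=repo, check=True)
```

Calling pack_size_bytes() before and after repack() on a clone would show the change. Note that the 43MB figure quoted above is for the whole .git directory, not only the packfile, so the two numbers are not directly comparable.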
Re: [fossil-users] Repository size - Fossil v. Git
On 11/27/17, Joerg Sonnenberger wrote:
> On Mon, Nov 27, 2017 at 02:28:37PM -0500, Richard Hipp wrote:
>> TL;DR: A Git packfile for SQLite is about 52% larger than the
>> equivalent content in a Fossil repository.
>
> Did you run repack with aggressive settings? I.e. with -A -d -f and large
> --depth and --window-size settings? Especially if the original migration
> wasn't done well, the pack files are often quite redundant.
>
> Your numbers really don't match my experience, i.e. what I see is about
> a factor of 2 to 2.5 larger Fossil repos.
>

2x larger for Fossil is about what I would expect too. The Git file formats
are crazy-aggressive at avoiding any wasted bytes (thus making them hard to
parse and use and especially hard to extend for things like SHA3).

I didn't try any repacking. I merely ran "git clone" then looked at the
packfile in .git/objects/pack. You would think that the server would want to
do an aggressive repack before sending the packfile across a clone, to save
bandwidth. But maybe GitHub values CPU cycles more than bandwidth...

Your git-foo is much greater than mine, Joerg. Can you please clone
https://github.com/mackyle/sqlite.git and see if you can get the packfile to
come out smaller?

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] Repository size - Fossil v. Git
On Mon, Nov 27, 2017 at 02:28:37PM -0500, Richard Hipp wrote:
> TL;DR: A Git packfile for SQLite is about 52% larger than the
> equivalent content in a Fossil repository.

Did you run repack with aggressive settings? I.e. with -A -d -f and large
--depth and --window-size settings? Especially if the original migration
wasn't done well, the pack files are often quite redundant.

Your numbers really don't match my experience, i.e. what I see is about a
factor of 2 to 2.5 larger Fossil repos.

Joerg
Re: [fossil-users] Repository size - Fossil v. Git
On 11/27/17, Richard Hipp wrote:
> TL;DR: A Git packfile for SQLite is about 52% larger than the
> equivalent content in a Fossil repository.

It gets worse (for Git): The Git repo I cloned only contains the master
branch - 18336 check-ins out of the 19715 check-ins found in the Fossil repo.

>
> I downloaded a copy of the Git packfile from mackyle's mirror of
> SQLite on GitHub (https://github.com/mackyle/sqlite). Git uses a
> tightly coded binary implementation for packfiles, so I was expecting
> that a Git packfile would be significantly smaller than the equivalent
> Fossil repo.
>
> I was wrong.
>
> The Git packfile comes in at 86.8MB and the entire Fossil repo is only
> 68.8MB. This is in spite of the fact that the Fossil repo contains a
> lot of supplemental information (ex: indexes) used to make it faster
> as well as additional content (wiki, tickets) that Git does not
> support.
>
> The equivalent of a Git packfile in Fossil would be the contents of
> the BLOB and DELTA tables without the UNIQUE index on the BLOB.UUID
> field. Comparing the packfile against just the unindexed BLOB table
> and the DELTA table, I find that the packfile is 52% larger.
>
> Git packfile: 86.8MB
> Fossil content tables: 57.1MB
>
> I do not know why this is. I have put almost no effort toward
> optimizing Fossil repositories for size, whereas metrics like
> performance and size seem to be driving forces behind Git.
> --
> D. Richard Hipp
> d...@sqlite.org
>

--
D. Richard Hipp
d...@sqlite.org
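The check-in counts above can be turned into a coverage fraction and a rough per-check-in average with a few lines of Python. This is only a sketch: average bytes per check-in is a crude indicator chosen for illustration, it assumes 1MB = 10^6 bytes, and the function names are hypothetical.

```python
# Figures as reported in the thread.
GIT_CHECKINS = 18336        # master branch only
FOSSIL_CHECKINS = 19715     # all check-ins in the Fossil repo
GIT_PACKFILE_MB = 86.8
FOSSIL_CONTENT_MB = 57.1    # unindexed BLOB table plus DELTA table


def coverage_percent(part: int, whole: int) -> float:
    """Share of `whole` covered by `part`, as a percentage."""
    return 100.0 * part / whole


def bytes_per_checkin(size_mb: float, checkins: int) -> float:
    """Average storage per check-in, assuming 1MB = 10**6 bytes."""
    return size_mb * 1_000_000 / checkins


if __name__ == "__main__":
    print(f"Git covers {coverage_percent(GIT_CHECKINS, FOSSIL_CHECKINS):.1f}% of the check-ins")
    print(f"Git:    {bytes_per_checkin(GIT_PACKFILE_MB, GIT_CHECKINS):.0f} bytes per check-in")
    print(f"Fossil: {bytes_per_checkin(FOSSIL_CONTENT_MB, FOSSIL_CHECKINS):.0f} bytes per check-in")
```

The per-check-in averages ignore how large individual check-ins are, so they only indicate the direction of the difference.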
[fossil-users] Repository size - Fossil v. Git
TL;DR: A Git packfile for SQLite is about 52% larger than the equivalent
content in a Fossil repository.

I downloaded a copy of the Git packfile from mackyle's mirror of SQLite on
GitHub (https://github.com/mackyle/sqlite). Git uses a tightly coded binary
implementation for packfiles, so I was expecting that a Git packfile would be
significantly smaller than the equivalent Fossil repo.

I was wrong.

The Git packfile comes in at 86.8MB and the entire Fossil repo is only
68.8MB. This is in spite of the fact that the Fossil repo contains a lot of
supplemental information (ex: indexes) used to make it faster as well as
additional content (wiki, tickets) that Git does not support.

The equivalent of a Git packfile in Fossil would be the contents of the BLOB
and DELTA tables without the UNIQUE index on the BLOB.UUID field. Comparing
the packfile against just the unindexed BLOB table and the DELTA table, I
find that the packfile is 52% larger.

    Git packfile:          86.8MB
    Fossil content tables: 57.1MB

I do not know why this is. I have put almost no effort toward optimizing
Fossil repositories for size, whereas metrics like performance and size seem
to be driving forces behind Git.

--
D. Richard Hipp
d...@sqlite.org
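The 52% figure follows directly from the two sizes reported above. A minimal Python sketch of that arithmetic, with the constants taken from the message and percent_larger being a hypothetical helper name:

```python
# Sizes in MB as reported in the message above.
GIT_PACKFILE_MB = 86.8
FOSSIL_CONTENT_MB = 57.1    # unindexed BLOB table plus DELTA table
FOSSIL_REPO_MB = 68.8       # entire repository file


def percent_larger(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    vs_content = percent_larger(GIT_PACKFILE_MB, FOSSIL_CONTENT_MB)
    vs_repo = percent_larger(GIT_PACKFILE_MB, FOSSIL_REPO_MB)
    print(f"packfile vs. content tables: {vs_content:.0f}% larger")  # 52% larger
    print(f"packfile vs. whole repo:     {vs_repo:.0f}% larger")     # 26% larger
```

The second line shows that the packfile is still larger than the whole Fossil repository file, indexes and wiki/ticket content included.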