Re: [fossil-users] Repository size - Fossil v. Git

2017-11-28 Thread Joerg Sonnenberger
On Mon, Nov 27, 2017 at 08:53:06PM -0500, David Mason wrote:
> Does it stay that size with moderate activity, or does it start growing
> significantly?

Incremental fastimport isn't that bad, but occasional repacks would
still help. Of course, GitHub doesn't allow triggering those remotely,
so the only option is to delete the repo and repush from scratch,
which in turn breaks any forks. Basically, if you convert a larger
existing repo with fastimport, remember to repack aggressively before
pushing to GitHub.

> Does the pack format slow it down, or speed it up?

Neither, compared to Fossil. I don't think either storage format is
optimized for reducing seeking (which matters for spinning disks) and
I'm not sure how much SQLite benefits from mmap. Everything else is
mostly a function of the delta chain length.

> Given that the Git version only has 93% of what the Fossil repo has, I'd
> say Fossil is doing quite well.

The sqlite tree is also both flat and shallow, whereas tree depth is
one of the biggest costs for the NetBSD repos.

Joerg
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Repository size - Fossil v. Git

2017-11-27 Thread David Mason
Does it stay that size with moderate activity, or does it start growing
significantly? Does the pack format slow it down, or speed it up?

Given that the Git version only has 93% of what the Fossil repo has, I'd
say Fossil is doing quite well.

../Dave

On 27 November 2017 at 16:16, Joerg Sonnenberger wrote:

> On Mon, Nov 27, 2017 at 03:28:37PM -0500, Richard Hipp wrote:
> > I didn't try any repacking.  I merely ran "git clone" then looked at
> > the packfile in .git/objects/pack.  You would think that the server
> > would want to do an aggressive repack before sending the packfile
> > across a clone, to save bandwidth.  But maybe GitHub values CPU cycles
> > more than bandwidth...
>
> git import is writing pretty dumb packs. Lots of redundancy in it,
> so it's not really that surprising. It's kind of similar to the effect
> of avoiding "fossil rebuild --compress" or Mercurial's generic delta.
> Cloning, IIRC, will mostly use the deltas as recorded; it doesn't
> recompute them. GitHub generally avoids CPU load as much as
> possible, since it is one of the more expensive parts of running in the
> cloud.
>
> > Your git-foo is much greater than mine, Joerg.  Can you please clone
> > https://github.com/mackyle/sqlite.git and see if you can get the
> > packfile to come out smaller?
>
> git repack -A -d --depth=500 --window-memory 4g -f -F
> gives me around 43MB for .git.
>
> Joerg


Re: [fossil-users] Repository size - Fossil v. Git

2017-11-27 Thread Joerg Sonnenberger
On Mon, Nov 27, 2017 at 03:28:37PM -0500, Richard Hipp wrote:
> I didn't try any repacking.  I merely ran "git clone" then looked at
> the packfile in .git/objects/pack.  You would think that the server
> would want to do an aggressive repack before sending the packfile
> across a clone, to save bandwidth.  But maybe GitHub values CPU cycles
> more than bandwidth...

git import is writing pretty dumb packs. Lots of redundancy in it,
so it's not really that surprising. It's kind of similar to the effect
of avoiding "fossil rebuild --compress" or Mercurial's generic delta.
Cloning, IIRC, will mostly use the deltas as recorded; it doesn't
recompute them. GitHub generally avoids CPU load as much as
possible, since it is one of the more expensive parts of running in the
cloud.

> Your git-foo is much greater than mine, Joerg.  Can you please clone
> https://github.com/mackyle/sqlite.git and see if you can get the
> packfile to come out smaller?

git repack -A -d --depth=500 --window-memory 4g -f -F
gives me around 43MB for .git.

Joerg
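
To compare numbers like these on a local clone, the packfiles can be
measured directly. A minimal Python sketch, under stated assumptions:
the function name `pack_size_mb` is hypothetical, "MB" is taken to mean
10**6 bytes (the thread does not say which unit is meant), and only
`*.pack` files under `.git/objects/pack` are counted. Note that the
43MB figure above is for all of .git, so it would come out slightly
larger than what this reports.

```python
from pathlib import Path


def pack_size_mb(repo_dir: str) -> float:
    """Total size of all *.pack files in a clone, in MB (10**6 bytes).

    Index files (*.idx) and loose objects are deliberately ignored.
    """
    pack_dir = Path(repo_dir) / ".git" / "objects" / "pack"
    total_bytes = sum(p.stat().st_size for p in pack_dir.glob("*.pack"))
    return total_bytes / 1_000_000


# Example call; "sqlite" is a hypothetical path to a local clone:
#   print(f"{pack_size_mb('sqlite'):.1f} MB")
```

Running it once right after "git clone" and once after the repack
command above would show how much of the size is due to redundant
deltas rather than to the pack format itself.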


Re: [fossil-users] Repository size - Fossil v. Git

2017-11-27 Thread Richard Hipp
On 11/27/17, Joerg Sonnenberger wrote:
> On Mon, Nov 27, 2017 at 02:28:37PM -0500, Richard Hipp wrote:
>> TL;DR:  A Git packfile for SQLite is about 52% larger than the
>> equivalent content in a Fossil repository.
>
> Did you run repack with aggressive settings? I.e. with -A -d -f and large
> --depth and --window settings? Especially if the original migration
> wasn't done well, the pack files are often quite redundant.
>
> Your numbers really don't match my experience, i.e. what I see is
> Fossil repos that are larger by about a factor of 2 to 2.5.
>

2x larger for Fossil is about what I would expect too.  The Git file
formats are crazy-aggressive at avoiding any wasted bytes (thus making
them hard to parse and use and especially hard to extend for things
like SHA3).

I didn't try any repacking.  I merely ran "git clone" then looked at
the packfile in .git/objects/pack.  You would think that the server
would want to do an aggressive repack before sending the packfile
across a clone, to save bandwidth.  But maybe GitHub values CPU cycles
more than bandwidth...

Your git-foo is much greater than mine, Joerg.  Can you please clone
https://github.com/mackyle/sqlite.git and see if you can get the
packfile to come out smaller?

-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Repository size - Fossil v. Git

2017-11-27 Thread Joerg Sonnenberger
On Mon, Nov 27, 2017 at 02:28:37PM -0500, Richard Hipp wrote:
> TL;DR:  A Git packfile for SQLite is about 52% larger than the
> equivalent content in a Fossil repository.

Did you run repack with aggressive settings? I.e. with -A -d -f and large
--depth and --window settings? Especially if the original migration
wasn't done well, the pack files are often quite redundant.

Your numbers really don't match my experience, i.e. what I see is
Fossil repos that are larger by about a factor of 2 to 2.5.

Joerg


Re: [fossil-users] Repository size - Fossil v. Git

2017-11-27 Thread Richard Hipp
On 11/27/17, Richard Hipp wrote:
> TL;DR:  A Git packfile for SQLite is about 52% larger than the
> equivalent content in a Fossil repository.

It gets worse (for Git):

The Git repo I cloned only contains the master branch - 18336
check-ins out of the 19715 check-ins found in the Fossil repo.
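
The percentages in this thread can be re-derived from the raw figures
in this message. A quick Python sketch; the variable names and the
helper `percent_larger` are made up for illustration, and the numbers
are copied from the text as-is:

```python
# Figures quoted in this message.
git_packfile_mb = 86.8      # Git packfile from the GitHub mirror
fossil_repo_mb = 68.8       # entire Fossil repo
fossil_content_mb = 57.1    # unindexed BLOB table plus DELTA table
git_checkins = 18336        # master branch only
fossil_checkins = 19715     # all check-ins in the Fossil repo


def percent_larger(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


vs_content = percent_larger(git_packfile_mb, fossil_content_mb)
vs_repo = percent_larger(git_packfile_mb, fossil_repo_mb)
coverage = git_checkins / fossil_checkins * 100.0

print(f"packfile vs. Fossil content tables: {vs_content:.0f}% larger")
print(f"packfile vs. entire Fossil repo:    {vs_repo:.0f}% larger")
print(f"check-ins present in the Git clone: {coverage:.0f}%")
```

This gives 52% for the content tables, 26% for the whole repo, and 93%
check-in coverage, so the Git packfile is larger while holding less
history.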


>
> I downloaded a copy of the Git packfile from mackyle's mirror of
> SQLite on GitHub (https://github.com/mackyle/sqlite).  Git uses a
> tightly coded binary implementation for packfiles, so I was expecting
> that a Git packfile would be significantly smaller than the equivalent
> Fossil repo.
>
> I was wrong.
>
> The Git packfile comes in at 86.8MB and the entire Fossil repo is only
> 68.8MB.  This is in spite of the fact that the Fossil repo contains a
> lot of supplemental information (ex: indexes) used to make it faster
> as well as additional content (wiki, tickets) that Git does not
> support.
>
> The equivalent of a Git packfile in Fossil would be the contents of
> the BLOB and DELTA tables without the UNIQUE index on the BLOB.UUID
> field.  Comparing the packfile against just the unindexed BLOB table
> and the DELTA table, I find that the packfile is 52% larger.
>
>   Git packfile:  86.8MB
>   Fossil content tables:  57.1MB
>
> I do not know why this is.  I have put almost no effort toward
> optimizing Fossil repositories for size, whereas metrics like
> performance and size seem to be driving forces behind Git.
> --
> D. Richard Hipp
> d...@sqlite.org
>


-- 
D. Richard Hipp
d...@sqlite.org