Re: [HACKERS] repository size differences

2010-09-22 Thread Aidan Van Dyk
On Tue, Sep 21, 2010 at 10:32 PM, Abhijit Menon-Sen wrote:

> That's not it. I ran the same git gc command on my old repository, and
> it didn't make any difference to the size. (I didn't try with a larger
> window size, though.)

A lot of it probably has to do with the delta chains themselves.  The
old repository was an "incremental" conversion, so each new delta (as
it's added) has only (and all) "repository wide" objects to look at
when choosing a base.  git has some limits and heuristics for deciding
"how far and wide" to look for the best delta base.

The cvs2* scripts are more direct: they first reference the files,
then the commit graph, etc., so all revisions of a particular file are
added before moving on to the next.  This means that all previous
versions of a file are likely "hot" in the path git searches for the
best-fit delta.  Changing the order in which the objects are added to
the git repository makes it easier for git to find better delta
bases.
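
One way to see how the deltas turned out in a given clone is the
chain-length summary that "git verify-pack -v" prints for each pack.
The following is only a sketch, not something taken from this thread:
the function name delta_chain_histogram is made up for illustration,
and it assumes a non-bare repository that already has at least one
pack file.

```shell
# Hypothetical helper: print the delta summary lines for every pack in
# a non-bare repository. "git verify-pack -v" ends its output with a
# "non delta: N objects" line and one "chain length = K: N objects"
# line per delta depth; only those lines are kept here.
delta_chain_histogram() {
    repo=$1
    (
        cd "$repo" || exit 1
        for idx in .git/objects/pack/pack-*.idx; do
            # Skip the unexpanded glob when the repository has no packs.
            [ -e "$idx" ] || continue
            echo "== $idx"
            # LC_ALL=C keeps the summary lines in English for grep.
            LC_ALL=C git verify-pack -v "$idx" |
                grep -E '^(non delta|chain length)'
        done
    )
}

# Usage (the path is a placeholder):
#   delta_chain_histogram /path/to/postgresql
```

Running it on both the old and the new clone should show whether the
two packs differ mainly in how many objects ended up stored as deltas.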

You can adjust the "delta window" git-repack uses; see the man pages
for git-repack and git-gc.  If you're willing to do a monster repack
on the old repository (using a *huge* window), you can likely get it
close in size.
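
As a rough sketch of what such a repack could look like (the function
name repack_with_window and the default numbers are illustrative
assumptions, not values anyone in this thread used):

```shell
# Hypothetical helper: repack a repository from scratch with a larger
# delta window.
#   -a        put all reachable objects into a single pack
#   -d        delete packs (and loose objects) made redundant by it
#   -f        recompute deltas instead of reusing existing ones
#   --window  how many candidate objects to consider as a delta base
#   --depth   maximum length of a delta chain
# The defaults below are arbitrary example values.
repack_with_window() {
    repo=$1
    window=${2:-250}
    depth=${3:-50}
    git -C "$repo" repack -a -d -f -q \
        --window="$window" --depth="$depth"
}

# Usage (the path is a placeholder); expect this to take a long time
# and a lot of memory on a repository of this size:
#   repack_with_window /path/to/old-clone 1000 50
#   git -C /path/to/old-clone count-objects -v
```

The -f flag matters here: without it git reuses the deltas already in
the pack, so a bigger window would have little to work on.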

a.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] repository size differences

2010-09-21 Thread Abhijit Menon-Sen
At 2010-09-21 17:53:22 -0400, t...@sss.pgh.pa.us wrote:
>
> > Does anyone know offhand why the sizes are so different?
> 
> Magnus did
>   git gc --aggressive --prune
> during the conversion.  I imagine it's the --aggressive that does it.

That's not it. I ran the same git gc command on my old repository, and
it didn't make any difference to the size. (I didn't try with a larger
window size, though.)

Oh well, it's probably just some problem with the older conversion, and
doesn't matter now. The number of commits ("git rev-list --all|wc -l")
is broadly similar (36848 old, 35978 new), as is the number of packed
objects (~383k old, ~387k new).
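
For anyone who wants to compare two clones the same way, here is a
small sketch that gathers those numbers. The function name repo_stats
is made up for this example; it only wraps "git rev-list --all" and
"git count-objects -v".

```shell
# Hypothetical helper: print the commit count, the number of packed
# objects, and the pack size (in KiB) for one repository.
repo_stats() {
    repo=$1
    # tr strips the padding some wc implementations add.
    commits=$(git -C "$repo" rev-list --all | wc -l | tr -d ' ')
    printf 'commits: %s\n' "$commits"
    # in-pack is the packed object count, size-pack the size in KiB.
    git -C "$repo" count-objects -v | grep -E '^(in-pack|size-pack):'
}

# Usage (paths are placeholders):
#   repo_stats /path/to/old-clone
#   repo_stats /path/to/new-clone
```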

I'm certainly not complaining about git clone being twice as fast. :-)

-- ams



Re: [HACKERS] repository size differences

2010-09-21 Thread Tom Lane
Robert Haas writes:
> On Tue, Sep 21, 2010 at 5:53 PM, Tom Lane wrote:
>> Magnus did
>>        git gc --aggressive --prune
>> during the conversion.  I imagine it's the --aggressive that does it.

> It's also possible that some of the history cleanup we did might have
> helped, although that's pure speculation on my part.

The converted repositories I was getting during testing (without any use
of git gc) were circa 300MB.  So it wasn't the CVS-side cleanups that
did it.

regards, tom lane



Re: [HACKERS] repository size differences

2010-09-21 Thread Robert Haas
On Tue, Sep 21, 2010 at 5:53 PM, Tom Lane wrote:
> Abhijit Menon-Sen writes:
>> My new clone of git://git.postgresql.org/git/postgresql.git is 196MB,
>> whereas my old clone (last synced around the beginning of September)
>> was 285MB.
>
>> Does anyone know offhand why the sizes are so different?
>
> Magnus did
>        git gc --aggressive --prune
> during the conversion.  I imagine it's the --aggressive that does it.

It's also possible that some of the history cleanup we did might have
helped, although that's pure speculation on my part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] repository size differences

2010-09-21 Thread Tom Lane
Abhijit Menon-Sen writes:
> My new clone of git://git.postgresql.org/git/postgresql.git is 196MB,
> whereas my old clone (last synced around the beginning of September)
> was 285MB.

> Does anyone know offhand why the sizes are so different?

Magnus did
	git gc --aggressive --prune
during the conversion.  I imagine it's the --aggressive that does it.

regards, tom lane



[HACKERS] repository size differences

2010-09-21 Thread Abhijit Menon-Sen
Hi.

My new clone of git://git.postgresql.org/git/postgresql.git is 196MB,
whereas my old clone (last synced around the beginning of September)
was 285MB.

Does anyone know offhand why the sizes are so different?

-- ams
