Re: Why pack+unpack?
On Tue, 26 Jul 2005, Jeff Garzik wrote:
> > Put another way: do you argue that X network transparency is a total
> > waste of time? You could certainly optimize X if you always made it be
> > local-machine only. Or you could make tons of special cases, and have X
> > have separate code-paths for local clients and for remote clients,
> > rather than just always opening a socket connection.
>
> Poor example... sure it opens a socket, but X certainly does have a
> special-case local path (MIT-SHM), and they're adding more for 3D due to
> the massive amount of data involved in 3D.

.. and that's still a special case. Exactly like git does the "clone -l"
special case.

> Well, I'm not overly concerned, mostly curious. The pack+unpack step
> (a) appears completely redundant and (b) is the step that takes the most
> time here, for local pulls, after the diffstat.

It's not actually redundant. Some of the _compression_ may be, and you
could see if you prefer a smaller delta window (use "--window=0" with
git-pack-objects to totally disable delta compression), but in general
you can't just link the files over like "git clone" does, because that
would create total chaos and a real mess if the other end was packed.

So "git pull" actually needs to copy one object at a time in order to
have sensible semantics together with "git repack".

Now, you could make that "one object at a time" thing have its own
special cases ("if it's packed, extract it as an unpacked object in the
destination; if it's unpacked, just link it if you can"), but it would
just be pretty ugly.

If it ever gets to be a real performance problem, we can certainly fix
it, but in the meantime I _much_ prefer having one single path. I dislike
the rsync (and the http) paths immensely already, but at least I don't
have to use them..

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
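[Editor's note: the two local strategies contrasted above, the "clone -l" hardlink special case and packing with the delta window disabled, can be sketched with today's git in a throwaway repository. Paths and repository names below are invented for illustration, and modern git behaves differently from 2005-era git (local clones hardlink by default), so this only demonstrates the flags named in the mail.]

```shell
#!/bin/sh
# Sketch only: illustrates "--window=0" and "clone -l" from the mail
# above in a scratch repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A small upstream repository with one commit.
git init -q upstream
cd upstream
git config user.email you@example.com   # throwaway identity
git config user.name "Example"
echo hello >file
git add file
git commit -qm initial

# "--window=0" disables the delta search entirely when packing:
git rev-list --objects HEAD |
  git pack-objects --window=0 -q "$tmp/pack-nodelta" >/dev/null

cd "$tmp"
# "clone -l" is the special-cased local path: objects are hardlinked
# (or copied) rather than packed and then unpacked.
git clone -ql upstream downstream
```

The resulting `pack-nodelta-*.pack` contains whole (zlib-compressed but undeltified) objects, which is the cheaper packing Linus suggests trying for local transfers.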
Re: Why pack+unpack?
Linus Torvalds <[EMAIL PROTECTED]> writes:

> See? Trying to have one really solid code-path is not a waste of time.

An alternative code path specialized for the local case would not be too
bad.

First, finding the list of objects to copy. You can use the alternate
object pool to cover both the upstream repository to pull from and the
downstream repository to pull into (both local), and run "rev-list
--objects", giving it a '^' prefix for all refs in the downstream
repository, plus the upstream head SHA1 you are pulling. If the upstream
head you are pulling is a tag, then you may need to dereference it as
well.

Among those objects, the ones unpacked in the upstream can be
copied/linked to the downstream repository.

Handling packs involves a little bit of policy decision. The current
pack/unpack way always unpacks, and to emulate it, we can cat-file in the
upstream object database and pipe that to "hash-object -w" (after giving
hash-object an option to read from the standard input) to write it into
the downstream repository unpacked.

An easier alternative is to just hardlink all the packs from the upstream
object database into the downstream object database, and keep packed
things packed. Well, it starts to sound somewhat bad...
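[Editor's note: the scheme described above can be approximated with today's git, which does have "hash-object --stdin" (it did not exist at the time of this mail). The repository layout below is invented for illustration, and the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable stands in for the "alternate object pool" covering both repositories.]

```shell
#!/bin/sh
# Sketch only: copy objects reachable from the upstream head but not
# from any downstream ref, one object at a time, unpacked.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Upstream with two commits; downstream cloned at the first one.
git init -q upstream
git -C upstream -c user.email=you@example.com -c user.name=Example \
    commit -q --allow-empty -m one
git clone -ql upstream downstream
git -C upstream -c user.email=you@example.com -c user.name=Example \
    commit -q --allow-empty -m two

cd downstream
alt="$tmp/upstream/.git/objects"        # the "alternate object pool"
head=$(git -C ../upstream rev-parse HEAD)

# rev-list --objects with a '^' prefix for every downstream ref:
GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt git rev-list --objects "$head" \
    $(git for-each-ref --format='^%(objectname)') |
while read -r sha path; do
  type=$(GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt git cat-file -t "$sha")
  # cat-file from the upstream pool, hash-object -w into our own store:
  GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt git cat-file "$type" "$sha" |
    git hash-object -w --stdin -t "$type" >/dev/null
done

# The objects are now local; fast-forward our branch to the new head.
git update-ref HEAD "$head"
```

The loop is deliberately the naive "one object at a time" copy; the alternates trick only lets rev-list and cat-file see the upstream store, while hash-object -w writes into the downstream store alone.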
Re: Why pack+unpack?
Linus Torvalds wrote:
> First, make sure you have a recent git, it does better at optimizing the
> objects, so there are fewer of them.

I was using vanilla git, as of 10 minutes before I sent the email. Top of
tree is 154d3d2dd2656c23ea04e9d1c6dd4e576a7af6de.

> Secondly, what's the problem? Sure, I could special-case the local case,
> but do you really want to have two _totally_ different code-paths?
>
> In other words, it's absolutely NOT a complete waste of time: it's very
> much a case of trying to have a unified architecture, and the fact that
> it spends a few seconds doing things in a way that is network-transparent
> is time well spent.
>
> Put another way: do you argue that X network transparency is a total
> waste of time? You could certainly optimize X if you always made it be
> local-machine only. Or you could make tons of special cases, and have X
> have separate code-paths for local clients and for remote clients, rather
> than just always opening a socket connection.

Poor example... sure it opens a socket, but X certainly does have a
special-case local path (MIT-SHM), and they're adding more for 3D due to
the massive amount of data involved in 3D.

> We do end up having a special code-path for "clone" (the "-l" flag),
> which does need it, but I seriously doubt you need it for a local pull.
> The most expensive operation in a local pull tends to be (if the
> repositories are unpacked and cold-cache) just figuring out the objects
> to pull, not the packing/unpacking per se.

Well, I'm not overly concerned, mostly curious. The pack+unpack step
(a) appears completely redundant and (b) is the step that takes the most
time here, for local pulls, after the diffstat.

	Jeff
Re: Why pack+unpack?
On Tue, 26 Jul 2005, Jeff Garzik wrote:
>
> AFAICT this is just a complete waste of time. Why does this occur?
>
>   Packing 1394 objects
>   Unpacking 1394 objects
>    100% (1394/1394) done
>
> It doesn't seem to make any sense to perform work, then immediately undo
> that work, just for a local pull.

First, make sure you have a recent git, it does better at optimizing the
objects, so there are fewer of them. Of course, the above could be a real
pull of a fair amount of work, but check that your git has this commit:

    commit 4311d328fee11fbd80862e3c5de06a26a0e80046

        Be more aggressive about marking trees uninteresting

because otherwise you sometimes get a fair number of objects just because
git-rev-list wasn't always being very careful, and took more objects than
it strictly needed.

Secondly, what's the problem? Sure, I could special-case the local case,
but do you really want to have two _totally_ different code-paths?

In other words, it's absolutely NOT a complete waste of time: it's very
much a case of trying to have a unified architecture, and the fact that
it spends a few seconds doing things in a way that is network-transparent
is time well spent.

Put another way: do you argue that X network transparency is a total
waste of time? You could certainly optimize X if you always made it be
local-machine only. Or you could make tons of special cases, and have X
have separate code-paths for local clients and for remote clients, rather
than just always opening a socket connection.

See? Trying to have one really solid code-path is not a waste of time.

We do end up having a special code-path for "clone" (the "-l" flag),
which does need it, but I seriously doubt you need it for a local pull.
The most expensive operation in a local pull tends to be (if the
repositories are unpacked and cold-cache) just figuring out the objects
to pull, not the packing/unpacking per se.
		Linus
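[Editor's note: the "check that your git has this commit" step above amounts to an ancestry test. The sketch below demonstrates it on a throwaway repository, since a git.git clone may not be at hand; with a real clone you would test 4311d328fee11fbd80862e3c5de06a26a0e80046 itself. "git merge-base --is-ancestor" is a later addition to git, used here for convenience.]

```shell
#!/bin/sh
# Sketch only: test whether a given commit is in HEAD's history.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.email you@example.com   # throwaway identity
git config user.name "Example"

git commit -q --allow-empty -m "Be more aggressive about marking trees uninteresting"
fix=$(git rev-parse HEAD)               # stand-in for 4311d328fee1...
git commit -q --allow-empty -m "later work"

# Exits 0 exactly when $fix is an ancestor of HEAD:
if git merge-base --is-ancestor "$fix" HEAD; then
  echo "fix present"
else
  echo "fix missing"
fi
```

On a 2005-era git without --is-ancestor, `git rev-list HEAD | grep 4311d328fee1` does the same job, just more slowly.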