Re: Why pack+unpack?
On Tue, 26 Jul 2005, Jeff Garzik wrote:
> > Put another way: do you argue that X network transparency is a total
> > waste of time? You could certainly optimize X if you always made it be
> > local-machine only. Or you could make tons of special cases, and have X
> > have separate code-paths for local clients and for remote clients,
> > rather than just always opening a socket connection.
>
> Poor example... sure it opens a socket, but X certainly does have a
> special-case local path (MIT-SHM), and they're adding more for 3D due to
> the massive amount of data involved in 3D.

.. and that's still a special case. Exactly like git does the "clone -l"
special case.

> Well, I'm not overly concerned, mostly curious. The pack+unpack step
> (a) appears completely redundant and (b) is the step that takes the most
> time here, for local pulls, after the diffstat.

It's not actually redundant. Some of the _compression_ may be, and you
could see if you prefer a smaller delta window (use "--window=0" with
git-pack-objects to totally disable delta compression), but in general
you can't just link the files over like "git clone" does, because that
would create total chaos and a real mess if the other end was packed.

So "git pull" actually needs to copy one object at a time in order to
have sensible semantics together with "git repack".

Now, you could make that "one object at a time" thing have its own
special cases ("if it's packed, extract it as an unpacked object in the
destination; if it's unpacked, just link it if you can"), but it would
just be pretty ugly.

If it ever gets to be a real performance problem, we can certainly fix
it, but in the meantime I _much_ prefer having one single path. I dislike
the rsync (and the http) paths immensely already, but at least I don't
have to use them..

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
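[Editor's note: the two local strategies contrasted above, the "clone -l" hardlink special case and packing with the delta window disabled, can be sketched with today's git in a throwaway repository. Paths and repository names below are invented for illustration, and modern git behaves differently from 2005-era git (local clones hardlink by default), so this only demonstrates the flags named in the mail.]

```shell
#!/bin/sh
# Sketch only: illustrates "--window=0" and "clone -l" from the mail
# above in a scratch repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A small upstream repository with one commit.
git init -q upstream
cd upstream
git config user.email you@example.com   # throwaway identity
git config user.name "Example"
echo hello >file
git add file
git commit -qm initial

# "--window=0" disables the delta search entirely when packing:
git rev-list --objects HEAD |
  git pack-objects --window=0 -q "$tmp/pack-nodelta" >/dev/null

cd "$tmp"
# "clone -l" is the special-cased local path: objects are hardlinked
# (or copied) rather than packed and then unpacked.
git clone -ql upstream downstream
```

The resulting `pack-nodelta-*.pack` contains whole (zlib-compressed but undeltified) objects, which is the cheaper packing Linus suggests trying for local transfers.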
Re: Why pack+unpack?
Linus Torvalds <[EMAIL PROTECTED]> writes:

> See? Trying to have one really solid code-path is not a waste of time.

An alternative code path specialized for the local case would not be too
bad.

First, finding the list of objects to copy. You can use the alternate
object pool to cover both the upstream repository to pull from and the
downstream repository to pull into (both local), and run "rev-list
--objects", giving it a '^' prefix for all refs in the downstream
repository, plus the upstream head SHA1 you are pulling. If the upstream
head you are pulling is a tag, then you may need to dereference it as
well.

Among those objects, the ones unpacked in the upstream can be
copied/linked to the downstream repository.

Handling packs involves a little bit of policy decision. The current
pack/unpack way always unpacks, and to emulate it, we can cat-file in the
upstream object database and pipe that to "hash-object -w" (after giving
hash-object an option to read from the standard input) to write it into
the downstream repository unpacked.

An easier alternative is to just hardlink all the packs from the upstream
object database into the downstream object database, and keep packed
things packed. Well, it starts to sound somewhat bad...
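[Editor's note: the scheme described above can be approximated with today's git, which does have "hash-object --stdin" (it did not exist at the time of this mail). The repository layout below is invented for illustration, and the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable stands in for the "alternate object pool" covering both repositories.]

```shell
#!/bin/sh
# Sketch only: copy objects reachable from the upstream head but not
# from any downstream ref, one object at a time, unpacked.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Upstream with two commits; downstream cloned at the first one.
git init -q upstream
git -C upstream -c user.email=you@example.com -c user.name=Example \
    commit -q --allow-empty -m one
git clone -ql upstream downstream
git -C upstream -c user.email=you@example.com -c user.name=Example \
    commit -q --allow-empty -m two

cd downstream
alt="$tmp/upstream/.git/objects"        # the "alternate object pool"
head=$(git -C ../upstream rev-parse HEAD)

# rev-list --objects with a '^' prefix for every downstream ref:
GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt git rev-list --objects "$head" \
    $(git for-each-ref --format='^%(objectname)') |
while read -r sha path; do
  type=$(GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt git cat-file -t "$sha")
  # cat-file from the upstream pool, hash-object -w into our own store:
  GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt git cat-file "$type" "$sha" |
    git hash-object -w --stdin -t "$type" >/dev/null
done

# The objects are now local; fast-forward our branch to the new head.
git update-ref HEAD "$head"
```

The loop is deliberately the naive "one object at a time" copy; the alternates trick only lets rev-list and cat-file see the upstream store, while hash-object -w writes into the downstream store alone.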
Re: Why pack+unpack?
Linus Torvalds wrote:
> First, make sure you have a recent git, it does better at optimizing the
> objects, so there are fewer of them.

I was using vanilla git, as of 10 minutes before I sent the email. Top of
tree is 154d3d2dd2656c23ea04e9d1c6dd4e576a7af6de.

> Secondly, what's the problem? Sure, I could special-case the local case,
> but do you really want to have two _totally_ different code-paths?
>
> In other words, it's absolutely NOT a complete waste of time: it's very
> much a case of trying to have a unified architecture, and the fact that
> it spends a few seconds doing things in a way that is network-transparent
> is time well spent.
>
> Put another way: do you argue that X network transparency is a total
> waste of time? You could certainly optimize X if you always made it be
> local-machine only. Or you could make tons of special cases, and have X
> have separate code-paths for local clients and for remote clients, rather
> than just always opening a socket connection.

Poor example... sure it opens a socket, but X certainly does have a
special-case local path (MIT-SHM), and they're adding more for 3D due to
the massive amount of data involved in 3D.

> We do end up having a special code-path for "clone" (the "-l" flag),
> which does need it, but I seriously doubt you need it for a local pull.
> The most expensive operation in a local pull tends to be (if the
> repositories are unpacked and cold-cache) just figuring out the objects
> to pull, not the packing/unpacking per se.

Well, I'm not overly concerned, mostly curious. The pack+unpack step
(a) appears completely redundant and (b) is the step that takes the most
time here, for local pulls, after the diffstat.

	Jeff
Re: Why pack+unpack?
On Tue, 26 Jul 2005, Jeff Garzik wrote:
>
> AFAICT this is just a complete waste of time. Why does this occur?
>
>   Packing 1394 objects
>   Unpacking 1394 objects
>    100% (1394/1394) done
>
> It doesn't seem to make any sense to perform work, then immediately undo
> that work, just for a local pull.

First, make sure you have a recent git, it does better at optimizing the
objects, so there are fewer of them. Of course, the above could be a real
pull of a fair amount of work, but check that your git has this commit:

    commit 4311d328fee11fbd80862e3c5de06a26a0e80046

        Be more aggressive about marking trees uninteresting

because otherwise you sometimes get a fair number of objects just because
git-rev-list wasn't always being very careful, and took more objects than
it strictly needed.

Secondly, what's the problem? Sure, I could special-case the local case,
but do you really want to have two _totally_ different code-paths?

In other words, it's absolutely NOT a complete waste of time: it's very
much a case of trying to have a unified architecture, and the fact that
it spends a few seconds doing things in a way that is network-transparent
is time well spent.

Put another way: do you argue that X network transparency is a total
waste of time? You could certainly optimize X if you always made it be
local-machine only. Or you could make tons of special cases, and have X
have separate code-paths for local clients and for remote clients, rather
than just always opening a socket connection.

See? Trying to have one really solid code-path is not a waste of time.

We do end up having a special code-path for "clone" (the "-l" flag),
which does need it, but I seriously doubt you need it for a local pull.
The most expensive operation in a local pull tends to be (if the
repositories are unpacked and cold-cache) just figuring out the objects
to pull, not the packing/unpacking per se.
		Linus
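[Editor's note: the "check that your git has this commit" step above amounts to an ancestry test. The sketch below demonstrates it on a throwaway repository, since a git.git clone may not be at hand; with a real clone you would test 4311d328fee11fbd80862e3c5de06a26a0e80046 itself. "git merge-base --is-ancestor" is a later addition to git, used here for convenience.]

```shell
#!/bin/sh
# Sketch only: test whether a given commit is in HEAD's history.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.email you@example.com   # throwaway identity
git config user.name "Example"

git commit -q --allow-empty -m "Be more aggressive about marking trees uninteresting"
fix=$(git rev-parse HEAD)               # stand-in for 4311d328fee1...
git commit -q --allow-empty -m "later work"

# Exits 0 exactly when $fix is an ancestor of HEAD:
if git merge-base --is-ancestor "$fix" HEAD; then
  echo "fix present"
else
  echo "fix missing"
fi
```

On a 2005-era git without --is-ancestor, `git rev-list HEAD | grep 4311d328fee1` does the same job, just more slowly.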