On Thu, Sep 12, 2013 at 12:35:32PM +0200, Josef Wolf wrote:

> I'm not sure I understand correctly. I see that bitmaps can be used to
> implement set operations. But how comes that walking the graph requires a lot
> of CPU? Isn't it O(n)?

Yes and no. Your "n" there is the entirety of history. Whereas a simple
"git push" generally only has to look at the recent history. So even
though you are looking at each commit and tree only once, it's still a
large number of them (and each one needs to be pulled off of the disk,
decompressed, and reconstructed from deltas).

Secondly, the graph traversal ends up seeing the same sha1s over and
over again in tree entries (because most entries in the tree don't
change from commit to commit). We spend a non-trivial amount of time
looking those up in a hash table.

Just try "git rev-list --objects --all" in your favorite repository to
get a sense. It takes something like 30 seconds in the kernel repo. You
would probably not want to add 30 seconds of CPU time to a trivial push.

> Those bitmaps would be stored in the git metadata? Is it worth it? Storing a
> bitmap for every commit just to be used once-in-a-while seems to be a pretty
> big overhead to me. Not to mention the interoperability problems you mentioned
> below.

There are tricks to make them smaller (run-length compression,
bitmapping a subset of commits and traversing to the nearest one,
storing bitmaps as deltas against nearby bitmaps). And how often it is
used depends on your git workload. For a repository serving git clones
and fetches, it speeds up every operation.

Try starting a clone of:

  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

versus

  git://github.com/torvalds/linux.git

and see which one starts sending you data more quickly.

> Sounds like you're already almost done and don't really need help
> anymore. Just out of curiosity, I'd be interested in a pointer anyway
> ;-)

Shawn gave a talk on JGit here:

  
http://www.eclipsecon.org/2013/sites/eclipsecon.org.2013/files/Scaling%20Up%20JGit%20-%20EclipseCon%202013.pdf

and the scrapped patches for git are here:

  http://article.gmane.org/gmane.comp.version-control.git/228918

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to