On 10/5/2017 6:00 AM, Jeff King wrote:
On Thu, Oct 05, 2017 at 06:48:10PM +0900, Junio C Hamano wrote:

Jeff King <p...@peff.net> writes:

This is weirdly specific. Can we accomplish the same thing with existing
tools?

E.g., could:

   git cat-file --batch-all-objects --batch-check='%(objectname)' |
   shuffle |
   head -n 100

do the same thing?

I know that "shuffle" isn't available everywhere, but I'd much rather
see us fill in portability gaps in a general way, rather than
introducing one-shot C code that needs to be maintained (and you
wouldn't _think_ that t/helper programs need much maintenance, but try
perusing "git log t/helper" output; they have to adapt to the same
tree-wide changes as the rest of the code).
I was thinking about this a bit more, and came to the conclusion
that "sort -R" and "shuf" are the wrong tools to use.  We would want
to measure with something close to a real-world workload.  For
example, letting

        git rev-list --all --objects

produce the list of objects in traversal order (i.e. this is very
similar to the order in which "git log -p" needs to access the
objects) and chomping it at the number of sample objects you need in
your test would give you such a list.
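Junio's suggestion could be sketched as follows. The throwaway repository here only makes the sketch self-contained; in practice you would run the last pipeline inside the repository under test, with $n set to your sample size:

```shell
n=100
# Throwaway repo so the sketch runs anywhere; replace with the real repo.
tmp=$(mktemp -d)
git -C "$tmp" init -q
( cd "$tmp" && echo hello >file && git add file &&
  git -c user.email=t@example.com -c user.name=t commit -qm init )
# rev-list --objects prints "<sha1> <path>"; keep only the object names,
# chomped at the first $n objects in traversal order.
git -C "$tmp" rev-list --all --objects | head -n "$n" | awk '{print $1}'
```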
Actually, I'd just as soon see timings for "git log --format=%h" or "git
log --raw", as opposed to patches 1 and 2.

You won't see a 90% speedup there, but you will see the actual
improvement that real-world users are going to experience, which is way
more important, IMHO.

-Peff
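The end-to-end timing Peff suggests could look like the sketch below. The throwaway repo just keeps it self-contained; a real measurement would run the two "time" lines inside the repository under test (e.g. linux.git), before and after the patch:

```shell
# Throwaway repo so the sketch runs anywhere; substitute the real repo.
tmp=$(mktemp -d)
git -C "$tmp" init -q
( cd "$tmp" && echo hello >file && git add file &&
  git -c user.email=t@example.com -c user.name=t commit -qm init )
cd "$tmp"
time git log --format=%h >/dev/null   # abbreviates every commit shown
time git log --raw >/dev/null         # abbreviates blob ids in diffs too
```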
Thanks for thinking hard about this.

For some real-user context: Some engineers using Git for the Windows repo were seeing extremely slow commands, such as 'fetch' or 'commit', and when we took a trace we saw most of the time was spent spinning in this abbreviation code. Our workaround so far has been to set core.abbrev=40.

I'll run some perf numbers for the commands you recommend, and also see whether I can replicate, using the Linux repo, some of the pain points that triggered this change.

Thanks,
-Stolee