On Fri, Mar 23, 2018 at 01:28:12AM +0000, Ramsay Jones wrote:

> > Of the used heap after your patches:
> > 
> >  - ~40% of that is from packlist_alloc()
> >  - ~17% goes to "struct object"
> >  - ~10% for the object.c hash table to store all the "struct object"
> >  - ~7% goes to the delta cache
> >  - ~7% goes to the pack revindex (actually, there's a duplicate 7%
> >        there, too; I think our peak is when we're sorting the revindex
> >        and have to keep two copies in memory at once)
> 
> which begs the question, how much slower would it be if we
> replaced the radix-sort with an in-place sort (e.g. heapsort).
> 
> I hacked up the patch below, just for fun. I don't have any
> large repos (or enough disk space) to do any meaningful perf
> tests, but I did at least compile it and it passes the test-suite.
> (That is no guarantee that I haven't introduced bugs, of course!)

It might have been easier to just revert 8b8dfd5132 (pack-revindex:
radix-sort the revindex, 2013-07-11). It even includes some performance
numbers. :)

In short, no, I don't think we want to go back to a comparison-sort. The
radix sort back then was around 4 times faster for linux.git. And that
was when there were half as many objects in the repository, so the radix
sort should continue to improve as the repo size grows.

The absolute time savings aren't huge for something as bulky as a
repack, so it's less exciting in this context. But it's also not that
much memory (7% of the peak here, but as I showed elsewhere, if we can
stop holding all of the "struct object" memory once we're done with it,
then this revindex stuff doesn't even factor into the peak).

-Peff

Reply via email to