On 6/7/2018 10:45 AM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Jun 07 2018, Derrick Stolee wrote:

To test the performance in this situation, I created a
script that organizes the Linux repository in a similar
fashion. I split the commit history into 50 parts by
creating branches on every 1,000 commits of the first-
parent history. Then, `git rev-list --objects A ^B`
provides the list of objects reachable from A but not B,
so I could send that to `git pack-objects` to create
these "time-based" packfiles. With these 50 packfiles
(deleting the old one from my fresh clone, and deleting
all tags as they were no longer on-disk) I could then
test `git rev-list --objects HEAD^{tree}` and see:

         Before: 0.17s
         After:  0.13s
         % Diff: -23.5%

By adding logic to count hits and misses to bsearch_pack,
I was able to see that the command above calls that
method 266,930 times with a hit rate of 33%. The MIDX
has the same number of calls with a 100% hit rate.
Do you have the script you used for this? It would be very interesting
as something we could stick in t/perf/ to test this use-case in the
future.

How does this & the numbers below compare to just a naïve
--max-pack-size=<similar size> on linux.git?

Is it possible for you to tar this test repo up and share it as a
one-off? I've been polishing the core.validateAbbrev series I have, and
it would be interesting to compare some of the (abbrev) numbers.

Here is what I used. You will want to adjust the constants for whatever repo you are using; these are for the Linux kernel, which has a first-parent history of ~50,000 commits. The script also leaves a bunch of extra files around (the objects-*.txt lists), so it is nowhere near ready to incorporate into the codebase.

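#!/bin/bash

# Create 50 branches, batch/50 (HEAD) down to batch/1 (oldest),
# spaced 1,000 first-parent commits apart.
for i in $(seq 1 50)
do
        ORDER=$((51 - i))
        NUM_BACK=$((1000 * (i - 1)))
        echo "creating batch/$ORDER"
        git branch -f "batch/$ORDER" "HEAD~$NUM_BACK"
        echo "batch/$ORDER"
        git rev-parse "batch/$ORDER"
done

# Pack the objects reachable from each branch but not from the previous one.
lastbranch=""
for i in $(seq 1 50)
do
        branch=batch/$i
        # String emptiness test; the original '[$lastbranch -eq ""]' only
        # worked by accident.
        if [ -z "$lastbranch" ]
        then
                echo "$branch"
                git rev-list --objects "$branch" | sed 's/ .*//' >"objects-$i.txt"
        else
                echo "$lastbranch"
                echo "$branch"
                git rev-list --objects "$branch" "^$lastbranch" | sed 's/ .*//' >"objects-$i.txt"
        fi

        git pack-objects --no-reuse-delta .git/objects/pack/branch-split2 <"objects-$i.txt"
        lastbranch=$branch
done

# Delete all tags.
for tag in $(git tag --list)
do
        git tag -d "$tag"
done

# Remove the original pack(s) so only the split packs remain, then write the MIDX.
rm -rf .git/objects/pack/pack-*
git midx write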
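
For anyone adapting this to another repository, the two constants (50 batches, 1000 commits per batch) could be derived from the length of the first-parent history rather than hard-coded. The following is only a rough sketch and not part of the script above: `batch_step` is a made-up helper name, and it assumes the total comes from `git rev-list --count --first-parent HEAD`.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): commits per batch,
# given the first-parent commit count and the number of batches.
# Integer division; any remainder ends up in the oldest batch.
batch_step() {
        local total=$1 batches=$2
        echo $((total / batches))
}

# Example with the numbers from this thread: ~50,000 commits, 50 batches.
batch_step 50000 50

# In a real repository the total could come from:
#   total=$(git rev-list --count --first-parent HEAD)
#   step=$(batch_step "$total" 50)
```

The resulting step would replace the literal 1000 in the NUM_BACK computation, with the loop bounds adjusted to the chosen number of batches.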
