On Sat, Mar 26, 2016 at 3:47 AM, Zaro <[email protected]> wrote:
> So I've been researching this and I've found that there is a
> significant performance improvement after running git gc on this nova
> repo. Below are my results.
>
> File sizes of repo as-is:
> ~/nova.git.orig$ du -hsx * | sort -r | head -10
> 6.4G objects
> 6.1M info
> 4.0K config
> 4.0K HEAD
> 382M refs
> 2.1M logs
> 0B hooks
> 0B description
> 0B branches
>
> Note that the repo as-is has already been through a 'git repack -afd'.
>
> File sizes after running 'jgit gc':
> ~/nova.git.test$ du -hsx * | sort -r | head -10
> 6.1M packed-refs
> 6.1M info
> 420M objects
> 4.0K config
> 4.0K HEAD
> 2.1M logs
> 0B refs
> 0B hooks
> 0B description
> 0B branches
>
> The result is that the gc cleans up the objects (6.4G -> 420M) and
> moves the loose refs from the 'refs' dir to a 'packed-refs' file
> (382M -> 6.1M).
>
> Note that I'm using jgit because that's what Gerrit would use to do
> the 'gc'. The jgit version is 4.0.1.201506240215-r, which is the one
> that's packaged with our current version of Gerrit
> (2.11.4-11-ga14450f) on review.o.o
>
> Here I've tested the performance of git clone, fetch and push
> before and after running 'jgit gc':
>
> `git clone`
> ------------
> before:
> real 3m30.163s
> user 0m2.020s
> sys 3m15.087s
>
> after:
> real 0m0.925s
> user 0m0.406s
> sys 0m0.621s
>
> `git fetch origin stable/liberty`
> ---------------------------------
> before:
> real 0m4.271s
> user 0m0.701s
> sys 0m2.949s
>
> after:
> real 0m0.686s
> user 0m0.348s
> sys 0m0.307s
>
> `git push origin HEAD:refs/for/master`
> --------------------------------------
> before:
> real 0m36.454s
> user 0m5.346s
> sys 0m27.598s
>
> after:
> real 0m16.588s
> user 0m11.731s
> sys 0m3.218s
>
> Note: I pushed the exact same change for both scenarios.
>
> Conclusion:
> The results indicate that it would be very advantageous to run 'git gc'
> for both file size reduction and improved performance. Below are
> additional resources that I've found on the internet that seem to
> back up my results.
>
> references:
>
> This says that one-file-per-ref format both wastes storage and hurts
> performance: https://git-scm.com/docs/git-pack-refs
>
> This outlines some of the benefits and drawbacks of packed-refs file:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65722.html
>
> Info on speeding up clones/fetches with pack bitmaps:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65571.html
>
> On Fri, Jan 8, 2016 at 12:13 PM, James E. Blair <[email protected]> wrote:
>> Hi,
>>
>> With the new version of Gerrit offering built-in "git gc" capability, we
>> looked at the current state of our git repo maintenance. We run "git
>> repack -afd" weekly in an attempt to produce the smallest packfiles
>> possible, but it does not prune loose objects, which seems to be the
>> main thing "git gc" does that we are missing.
>>
>> Some (relatively) quick experimentation suggests that various
>> combinations of "git gc", "git repack", "git prune", "git prune-packed"
>> all have effects on the overall repo size, the number of pack files, and
>> the number of loose objects.
>>
>> However, we don't just want to find the thing that makes the smallest
>> repo size (that's easy: "git prune; git gc" -- 394M for nova; one
>> packfile with all objects and one packed-refs file with all refs)
>> because this repo is used as the basis of all of our mirrors and is
>> accessed over several protocols. It's not immediately clear what the
>> right optimization is for our situation -- we don't necessarily want to
>> trade on-disk size for reduced network performance. Even the packing of
>> refs isn't entirely straightforward -- while we haven't needed to for
>> some time, we have, in the past, removed refs.
>>
>> We're looking for a volunteer to really dig into this problem and
>> thoroughly evaluate the implications of different ways of optimizing the
>> repo. If you're interested, you can download a snapshot of the full
>> nova repository from Gerrit (it is a point-in-time snapshot and will not
>> be updated) at this URL:
>>
>> http://tarballs.openstack.org/ci/nova.git.tar.bz2
>>
>> Please follow up on this message if you are interested and with any
>> findings.
>>
>> Thanks,
>>
>> Jim
>>
>> _______________________________________________
>> OpenStack-Infra mailing list
>> [email protected]
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
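
To make the 'refs' dir versus 'packed-refs' comparison in the quoted results easier to reproduce, here is a minimal Python sketch that counts loose refs (one file per ref under refs/) and entries in the packed-refs file. The function names and the tiny fake repository built in demo() are hypothetical and only for illustration; to inspect a real repository you would pass the path of its git directory instead.

```python
import os
import tempfile
from pathlib import Path


def count_loose_refs(git_dir):
    """Count files under <git_dir>/refs; each file is one loose ref."""
    refs_dir = Path(git_dir) / "refs"
    if not refs_dir.is_dir():
        return 0
    return sum(len(files) for _root, _dirs, files in os.walk(refs_dir))


def count_packed_refs(git_dir):
    """Count ref entries in <git_dir>/packed-refs.

    Lines starting with '#' are headers and lines starting with '^'
    are peeled tag targets; neither is a ref of its own.
    """
    packed = Path(git_dir) / "packed-refs"
    if not packed.is_file():
        return 0
    count = 0
    with packed.open() as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith(("#", "^")):
                count += 1
    return count


def demo():
    """Build a tiny fake layout (hypothetical data) and count both kinds."""
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        heads = git_dir / "refs" / "heads"
        heads.mkdir(parents=True)
        # One loose ref: a file whose content is an object id.
        (heads / "master").write_text("a" * 40 + "\n")
        # One packed ref plus its peeled line, after a header line.
        (git_dir / "packed-refs").write_text(
            "# pack-refs with: peeled fully-peeled\n"
            + "b" * 40 + " refs/tags/v1\n"
            + "^" + "c" * 40 + "\n"
        )
        return count_loose_refs(git_dir), count_packed_refs(git_dir)


if __name__ == "__main__":
    print(demo())
```

Running the two counters before and after a gc on a copy of the repository should show the loose count dropping and the packed count rising, which is the same shift the du output above reports in bytes.
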
Zaro, thanks for sharing your test results. I have tested it on the neutron project.

Before 'git gc':

neutron total size: 65M

Files:
----------------------------------------
4.0K .git/branches
4.0K .git/config
4.0K .git/description
4.0K .git/HEAD
44K .git/hooks
152K .git/index
8.0K .git/info
32K .git/logs
50M .git/objects
12K .git/packed-refs
28K .git/refs
----------------------------------------

After running 'git gc --aggressive':

neutron total size: 47M

Files:
----------------------------------------
4.0K .git/branches
4.0K .git/config
4.0K .git/description
4.0K .git/HEAD
44K .git/hooks
152K .git/index
24K .git/info
32K .git/logs
32M .git/objects
12K .git/packed-refs
24K .git/refs
----------------------------------------

Each command was executed 3 times:

--- git clone before gc ---
10.59s user 0.74s system 195% cpu 5.785 total
12.80s user 0.63s system 205% cpu 6.554 total
12.27s user 0.61s system 202% cpu 5.849 total

--- git clone after gc ---
8.69s user 0.52s system 149% cpu 6.178 total
8.61s user 0.55s system 175% cpu 5.230 total
8.62s user 0.51s system 187% cpu 4.877 total

--- git fetch origin stable/liberty before gc ---
0.05s user 0.04s system 4% cpu 1.850 total
0.05s user 0.04s system 4% cpu 1.899 total
0.04s user 0.05s system 4% cpu 1.840 total

--- git fetch origin stable/liberty after gc ---
0.01s user 0.01s system 9% cpu 0.245 total
0.02s user 0.01s system 12% cpu 0.173 total
0.01s user 0.01s system 11% cpu 0.193 total

--- git push origin HEAD:refs/for/master before gc ---
0.05s user 0.04s system 4% cpu 1.850 total
0.03s user 0.04s system 4% cpu 1.899 total
0.05s user 0.05s system 3% cpu 1.573 total

--- git push origin HEAD:refs/for/master after gc ---
0.01s user 0.00s system 12% cpu 0.142 total
0.01s user 0.01s system 12% cpu 0.178 total
0.01s user 0.01s system 11% cpu 0.183 total

I also did a quick test on an OpenStack infra project (project-config):

Before gc: 97M project-config
After gc --aggressive: 19M project-config

--- git clone before gc ---
7.81s user 0.52s system 146% cpu 5.677 total
6.91s user 0.48s system 144% cpu 5.112 total
7.43s user 0.66s system 147% cpu 5.496 total

--- git clone after gc ---
6.39s user 0.56s system 139% cpu 4.965 total
6.32s user 0.51s system 130% cpu 5.218 total
6.39s user 0.55s system 127% cpu 5.431 total

Cheers,

Arie Bregman
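
For anyone who wants to compare the runs above numerically, here is a minimal Python sketch that averages the elapsed ("total") time of each set of three runs and reports the before/after ratio. The helper names are hypothetical, the timing lines are copied from the neutron results in this message, and the parser assumes the elapsed time is printed as plain seconds, as it is in every line here.

```python
import re
from statistics import mean

# Matches the elapsed time in lines such as
# "0.05s user 0.04s system 4% cpu 1.850 total".
TOTAL_RE = re.compile(r"cpu\s+([0-9.]+)\s+total")


def elapsed_seconds(line):
    """Return the elapsed wall-clock seconds from one timing line."""
    match = TOTAL_RE.search(line)
    if match is None:
        raise ValueError(f"no elapsed time found in: {line!r}")
    return float(match.group(1))


def mean_elapsed(lines):
    """Average elapsed time over a set of runs."""
    return mean(elapsed_seconds(line) for line in lines)


def speedup(before, after):
    """Ratio of mean elapsed time before gc to mean elapsed time after gc."""
    return mean_elapsed(before) / mean_elapsed(after)


# Timing lines copied from the neutron results in this message.
FETCH_BEFORE = [
    "0.05s user 0.04s system 4% cpu 1.850 total",
    "0.05s user 0.04s system 4% cpu 1.899 total",
    "0.04s user 0.05s system 4% cpu 1.840 total",
]
FETCH_AFTER = [
    "0.01s user 0.01s system 9% cpu 0.245 total",
    "0.02s user 0.01s system 12% cpu 0.173 total",
    "0.01s user 0.01s system 11% cpu 0.193 total",
]
CLONE_BEFORE = [
    "10.59s user 0.74s system 195% cpu 5.785 total",
    "12.80s user 0.63s system 205% cpu 6.554 total",
    "12.27s user 0.61s system 202% cpu 5.849 total",
]
CLONE_AFTER = [
    "8.69s user 0.52s system 149% cpu 6.178 total",
    "8.61s user 0.55s system 175% cpu 5.230 total",
    "8.62s user 0.51s system 187% cpu 4.877 total",
]

if __name__ == "__main__":
    print(f"fetch speedup: {speedup(FETCH_BEFORE, FETCH_AFTER):.2f}x")
    print(f"clone speedup: {speedup(CLONE_BEFORE, CLONE_AFTER):.2f}x")
```

On these numbers the fetch comes out roughly 9x faster after gc, while the clone improves only slightly; with just three runs each, the clone difference may well be within run-to-run noise.
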
