Zaro,

Thank you for the research. It seems like we should definitely run gc against the nova repo.
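To put rough numbers on the timings quoted below, here is a small shell sketch that converts the `time` output to seconds and prints the before/after ratio. The helper names (to_seconds, speedup) are made up for illustration and assume a POSIX shell with awk; the input values are the "real" times copied from your results.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): quantify the before/after
# "real" timings reported for 'jgit gc' on the nova repo.

# Convert a `time`-style value such as 3m30.163s to seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, part, "m")       # part[1] = minutes, part[2] = e.g. "30.163s"
        sub(/s$/, "", part[2])    # drop the trailing "s"
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# Print how many times faster the "after" run is than the "before" run.
speedup() {
    before=$(to_seconds "$1")
    after=$(to_seconds "$2")
    LC_ALL=C awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f\n", b / a }'
}

# "real" times copied from the quoted results.
echo "clone: $(speedup 3m30.163s 0m0.925s)x"
echo "fetch: $(speedup 0m4.271s 0m0.686s)x"
echo "push:  $(speedup 0m36.454s 0m16.588s)x"
```

By this measure the clone is roughly 227x faster, the fetch about 6.2x, and the push about 2.2x. These are single runs, so the ratios are only indicative.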

Best regards,
Boris Pavlovic

On Fri, Mar 25, 2016 at 5:47 PM, Zaro <[email protected]> wrote:
> So I've been researching this and I've found that there is a
> significant performance improvement after running git gc on this nova
> repo. Below are my results.
>
> File sizes of repo as-is:
> ~/nova.git.orig$ du -hsx * | sort -r | head -10
> 6.4G    objects
> 6.1M    info
> 4.0K    config
> 4.0K    HEAD
> 382M    refs
> 2.1M    logs
> 0B      hooks
> 0B      description
> 0B      branches
>
> Note that the repo as-is has already been through a 'git repack -afd'.
>
>
> File sizes after running 'jgit gc':
> ~/nova.git.test$ du -hsx * | sort -r | head -10
> 6.1M    packed-refs
> 6.1M    info
> 420M    objects
> 4.0K    config
> 4.0K    HEAD
> 2.1M    logs
> 0B      refs
> 0B      hooks
> 0B      description
> 0B      branches
>
> The result is that the gc cleans up the objects (6.4G -> 420M) and
> moves the loose ref objects from 'refs' dir to a 'packed-refs' file
> (382M -> 6.1M).
>
> Note that I'm using jgit because that's what Gerrit would use to do
> the 'gc'. The jgit version is 4.0.1.201506240215-r which is the one
> that's packaged with our current version of Gerrit
> (2.11.4-11-ga14450f) on review.o.o
>
>
> Here I've tested the performance of the git clone, fetch and push
> before and after running 'jgit gc':
>
> `git clone`
> ------------
> before:
> real    3m30.163s
> user    0m2.020s
> sys     3m15.087s
>
> after:
> real    0m0.925s
> user    0m0.406s
> sys     0m0.621s
>
>
> `git fetch origin stable/liberty`
> ---------------------------------
> before:
> real    0m4.271s
> user    0m0.701s
> sys     0m2.949s
>
> after:
> real    0m0.686s
> user    0m0.348s
> sys     0m0.307s
>
>
> `git push origin HEAD:refs/for/master`
> --------------------------------------
> before:
> real    0m36.454s
> user    0m5.346s
> sys     0m27.598s
>
> after:
> real    0m16.588s
> user    0m11.731s
> sys     0m3.218s
>
> Note: I pushed the exact same change for both scenarios.
>
>
> Conclusion:
> The results indicate that it would be very advantageous to run 'git gc'
> for both file size reduction and improved performance. Below are
> additional resources that I've found on the internet that seem to
> back up my results.
>
>
>
> references:
>
> This says that one-file-per-ref format both wastes storage and hurts
> performance: https://git-scm.com/docs/git-pack-refs
>
> This outlines some of the benefits and drawbacks of packed-refs file:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65722.html
>
> Info on speeding up clones/fetches with pack bitmaps:
> https://www.mail-archive.com/git%40vger.kernel.org/msg65571.html
>
> On Fri, Jan 8, 2016 at 12:13 PM, James E. Blair <[email protected]>
> wrote:
> > Hi,
> >
> > With the new version of Gerrit offering built-in "git gc" capability, we
> > looked at the current state of our git repo maintenance. We run "git
> > repack -afd" weekly in an attempt to produce the smallest packfiles
> > possible, but it does not prune loose objects, which seems to be the
> > main thing "git gc" does that we are missing.
> >
> > Some (relatively) quick experimentation suggests that various
> > combinations of "git gc", "git repack", "git prune", "git prune-packed"
> > all have effects on the overall repo size, the number of pack files, and
> > the number of loose objects.
> >
> > However, we don't just want to find the thing that makes the smallest
> > repo size (that's easy: "git prune; git gc" -- 394M for nova; one
> > packfile with all objects and one packed-refs file with all refs)
> > because this repo is used as the basis of all of our mirrors and is
> > accessed over several protocols. It's not immediately clear what the
> > right optimization is for our situation -- we don't necessarily want to
> > trade on-disk size for reduced network performance. Even the packing of
> > refs isn't entirely straightforward -- while we haven't needed to for
> > some time, we have, in the past, removed refs.
> >
> > We're looking for a volunteer to really dig into this problem and
> > thoroughly evaluate the implications of different ways of optimizing the
> > repo. If you're interested, you can download a snapshot of the full
> > nova repository from Gerrit (it is a point-in-time snapshot and will not
> > be updated) at this URL:
> >
> > http://tarballs.openstack.org/ci/nova.git.tar.bz2
> >
> > Please follow up this message if you are interested and with any
> > findings.
> >
> > Thanks,
> >
> > Jim
> >
> > _______________________________________________
> > OpenStack-Infra mailing list
> > [email protected]
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
