On Wed, Jan 26, 2011 at 8:38 AM, Darren Dale <dsdal...@gmail.com> wrote:
> On Tue, Jan 25, 2011 at 6:12 PM, Darren Dale <dsdal...@gmail.com> wrote:
>> There is still an outstanding issue that must be taken care of before
>> we migrate. The conversion routines create a basemap repository out of
>> trunk/toolkits/basemap, and matplotlib repository out of
>> trunk/matplotlib. Still, the matplotlib repo (at
>> github.com/darrendale/matplotlib) is over 200 MB. One can search the
>> objects in the large packfile, and find that there are still
>> references to basemap data in the matplotlib repo. I don't know how it
>> got in there, nor how to remove it.
>
> I went through the exercise of identifying the largest blob, as
> described near the end of http://progit.org/book/ch9-7.html :
>
> $ git verify-pack -v objects/pack/pack-fa44ca56d7ec3964e562494f2fe08203143074bd.idx | sort -k 3 -n | tail -3
> 3b8b6c010f8ce59afac1e811b1bbc3efc21b770a blob 9154481 9089827 62749144
> 6328b70e665b58ed7f5aa1e110418cbb3facc07a blob 9331200 94297 156884507
> f784efc1518b10dff33673ad9a7a1ac3a7d107d5 blob 51399604 14333430 162328624
>
> $ git rev-list --objects --all | grep f784efc1518b10dff
> f784efc1518b10dff33673ad9a7a1ac3a7d107d5 toolkits/basemap/data/gshhs_h.txt
>
> This shell script is supposed to identify which commits have that blob
> in their tree
> (http://stackoverflow.com/questions/223678/git-which-commit-has-this-blob):
>
> ---
> #!/bin/sh
> obj_name="$1"
> shift
> git log "$@" --pretty=format:'%T %h %s' \
> | while read tree commit subject ; do
>     if git ls-tree -r $tree | grep -q "$obj_name" ; then
>         echo $commit "$subject"
>     fi
> done
> ---
>
> but it comes up empty, so now I'm stuck. Any ideas would be greatly
> appreciated.
>
First of all, I should say that I'm not a git expert by any means.

I suspected these could be dangling objects in the repository, possibly
left behind as a side effect of svn2git. After some googling, I found
this command, which lists the objects that are not reachable from HEAD
or from any local branch:

$ git fsck --unreachable HEAD $(git for-each-ref --format="%(objectname)" refs/heads)

It reported 2774 objects, including the blob for
"toolkits/basemap/data/gshhs_h.txt". Since they are unreachable, I
suppose they can simply be removed. I spent an hour figuring out how to
delete these unreachable objects, but the answer turned out to be
simple:

$ git repack -ad

Now no unreachable objects are reported, and the total size appears to
be down to ~140 MB. The biggest blob is now
"release/osx/matplotlib-0.98.5.tar.gz", and

$ git log -r -- release/osx/matplotlib-0.98.5.tar.gz

works as expected. Several of the other large blobs are svg and pdf
files. It should be possible to remove these files from the repository
with "git filter-branch" to reduce the size further, but I'm not sure
we need that.

IHTH,

-JJ

> ------------------------------------------------------------------------------
> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
> Finally, a world-class log management solution at an even better price-free!
> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
> February 28th, so secure your free ArcSight Logger TODAY!
> http://p.sf.net/sfu/arcsight-sfd2d
> _______________________________________________
> Matplotlib-devel mailing list
> Matplotlib-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
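A minimal sketch of the fsck/repack check described in the reply above, assuming a POSIX shell and awk: the hypothetical function `summarize_unreachable` counts the objects that `git fsck --unreachable` reports, grouped by type, so the numbers can be compared before and after `git repack -ad`. It only parses fsck's "unreachable <type> <sha1>" lines and does not modify the repository itself.

```shell
#!/bin/sh
# Hypothetical helper: tally `git fsck --unreachable` output by object type.
# Each relevant fsck line looks like "unreachable <type> <sha1>"; any other
# lines (notices, "dangling ..." lines) are ignored.
summarize_unreachable() {
    awk '$1 == "unreachable" { count[$2]++ }
         END { for (t in count) print t, count[t] }' | sort
}

# Usage, from inside the repository. $roots is HEAD plus every local branch,
# left unquoted below so it splits into separate arguments:
#   roots="HEAD $(git for-each-ref --format='%(objectname)' refs/heads)"
#   git fsck --unreachable $roots | summarize_unreachable   # before
#   git repack -ad
#   git fsck --unreachable $roots | summarize_unreachable   # after
```

Printing one "<type> <count>" line per type, sorted, keeps the before/after output short enough to compare by eye.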
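The largest-blob exercise quoted at the top of the thread (the `git verify-pack -v ... | sort -k 3 -n | tail -3` pipeline, followed by `git rev-list --objects --all | grep` to find a path) can be sketched as two small filters, assuming a POSIX shell and awk. The names `largest_blobs` and `attach_paths` are hypothetical, and the temporary file path in the usage comment is arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers for the "find the largest blob" exercise.

# Read `git verify-pack -v` output on stdin and print "<sha1> <size>" for the
# N largest blobs (default 3), smallest first. Column 3 is the object size;
# summary lines such as "non delta: ..." are skipped by the type test.
largest_blobs() {
    n="${1:-3}"
    awk '$2 == "blob" { print $1, $3 }' | sort -k 2,2 -n | tail -n "$n"
}

# Read "<sha1> <size>" lines on stdin and append the path recorded for each
# sha1 in a saved `git rev-list --objects --all` listing (file given as $1).
# Blobs not reachable from any ref have no entry there.
attach_paths() {
    while read -r sha size; do
        # Everything after the first field and its separating space is the path.
        path=$(awk -v s="$sha" '$1 == s { print substr($0, length($1) + 2); exit }' "$1")
        [ -n "$path" ] || path="(path unknown)"
        printf '%s %s %s\n' "$size" "$sha" "$path"
    done
}

# Usage, from inside the repository:
#   git rev-list --objects --all > /tmp/objects.txt
#   git verify-pack -v .git/objects/pack/pack-*.idx | largest_blobs 3 \
#       | attach_paths /tmp/objects.txt
```

Keeping the git calls out of the functions means they can be checked against saved output, such as the three-line pack listing in the quoted message. Because `git rev-list --objects --all` only lists objects reachable from some ref, a blob that no ref reaches is reported as "(path unknown)".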