On Wed, Jan 26, 2011 at 8:38 AM, Darren Dale <dsdal...@gmail.com> wrote:
> On Tue, Jan 25, 2011 at 6:12 PM, Darren Dale <dsdal...@gmail.com> wrote:
>> There is still an outstanding issue that must be taken care of before
>> we migrate. The conversion routines create a basemap repository out of
>> trunk/toolkits/basemap, and matplotlib repository out of
>> trunk/matplotlib. Still, the matplotlib repo (at
>> github.com/darrendale/matplotlib) is over 200 MB. One can search the
>> objects in the large packfile, and find that there are still
>> references to basemap data in the matplotlib repo. I don't know how it
>> got in there, nor how to remove it.
>
> I went through the exercise of identifying the largest blob, as
> described near the end of http://progit.org/book/ch9-7.html :
>
> $ git verify-pack -v objects/pack/pack-fa44ca56d7ec3964e562494f2fe08203143074bd.idx \
>     | sort -k 3 -n | tail -3
> 3b8b6c010f8ce59afac1e811b1bbc3efc21b770a blob   9154481 9089827 62749144
> 6328b70e665b58ed7f5aa1e110418cbb3facc07a blob   9331200 94297 156884507
> f784efc1518b10dff33673ad9a7a1ac3a7d107d5 blob   51399604 14333430 162328624
>
> $ git rev-list --objects --all | grep f784efc1518b10dff
> f784efc1518b10dff33673ad9a7a1ac3a7d107d5 toolkits/basemap/data/gshhs_h.txt
>
> This shell script is supposed to identify which commits have that blob
> in their tree 
> (http://stackoverflow.com/questions/223678/git-which-commit-has-this-blob):
>
> ---
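> #!/bin/sh
> # Print every commit whose tree contains the given object id.
> obj_name="$1"
> shift
> # tformat: terminates the last line, so the final commit is not skipped
> git log "$@" --pretty=tformat:'%T %h %s' \
> | while read -r tree commit subject ; do
>    if git ls-tree -r "$tree" | grep -q "$obj_name" ; then
>        echo "$commit" "$subject"
>    fi
> done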
> ---
>
> but it comes up empty, so now I'm stuck. Any ideas would be greatly 
> appreciated.
>

First of all, I must clarify that I'm not a git expert by any means.

I suspected that the repository contains dangling objects, possibly
left behind by svn2git. After some googling, I found the following
command:

$ git fsck --unreachable HEAD \
      $(git for-each-ref --format="%(objectname)" refs/heads)

This reported 2774 objects, including the blob for
"toolkits/basemap/data/gshhs_h.txt".
Since they are unreachable, I suppose they can simply be removed.
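
To check a single object rather than read through the whole list, the
fsck output can be filtered by object id. This is only a sketch:
"is_unreachable" is a helper name I made up, and it adds --no-reflogs
(so reflog entries do not count as references), which the command above
does not use.

```shell
#!/bin/sh
# Sketch: succeed if the given object id (or id prefix) is listed as
# unreachable. is_unreachable is a hypothetical helper, not a git command.
is_unreachable() {
    # git fsck --unreachable prints lines like "unreachable blob <sha>"
    git fsck --unreachable --no-reflogs 2>/dev/null \
        | grep -q "^unreachable [a-z]* $1"
}
```

Run inside the repository, "is_unreachable f784efc1518b10dff && echo yes"
should print "yes" as long as that blob is still unreferenced.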

I spent an hour figuring out how to delete these unreachable
objects, but the answer turned out to be simple:

$ git repack -ad

After repacking, no unreachable objects are reported, and the total
size drops to ~140 MB.
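
For anyone who wants to reproduce the size comparison, here is a rough
sketch that prints the packed size before and after the repack. The
helper names "pack_size_kib" and "repack_report" are made up for
illustration; the size comes from the "size-pack" line of
"git count-objects -v", which is reported in KiB.

```shell
#!/bin/sh
# Sketch: report packed size (KiB) before and after "git repack -a -d".
# Both function names are hypothetical helpers, not git commands.
pack_size_kib() {
    git count-objects -v | awk '/^size-pack:/ { print $2 }'
}

repack_report() {
    before=$(pack_size_kib)
    # -a: put everything referenced into one pack; -d: delete the old packs
    git repack -a -d -q
    after=$(pack_size_kib)
    echo "size-pack: ${before} KiB -> ${after} KiB"
}
```

Note that this only measures the packs. Unreachable objects that are
still stored loose are not touched by the repack.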

Now the biggest blob is "release/osx/matplotlib-0.98.5.tar.gz", and

$ git log -r -- release/osx/matplotlib-0.98.5.tar.gz

works as expected.
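
The two-step lookup (find the largest blob, then map its id to a path)
can probably be done in one pass. The sketch below assumes a git new
enough to support "%(rest)" in "git cat-file --batch-check";
"largest_blobs" is a made-up helper name. It lists reachable blobs by
uncompressed size, so it will not show unreachable objects.

```shell
#!/bin/sh
# Sketch: list the N largest reachable blobs as "blob <size> <sha> <path>".
# largest_blobs is a hypothetical helper; N defaults to 10.
largest_blobs() {
    n=${1:-10}
    # rev-list prints "<sha> <path>"; %(rest) carries the path through
    git rev-list --objects --all \
        | git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' \
        | awk '$1 == "blob"' \
        | sort -k 2 -n -r \
        | head -n "$n"
}
```

For example, "largest_blobs 3" prints the three largest blobs, biggest
first.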

Some of the other large blobs are svg and pdf files. If needed, these
files could be removed from the repository with "git filter-branch" to
reduce the size further, but I'm not sure we need that.
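
If we do decide to drop a file from history, the usual filter-branch
pattern looks roughly like the sketch below. I have not run this on the
matplotlib repo, "purge_path" is a made-up helper name, and the rewrite
changes the id of every commit that follows the first affected one, so
it should only be done before the repository is published.

```shell
#!/bin/sh
# Sketch: remove one path from every commit on every ref.
# purge_path is a hypothetical helper. The path must not contain a
# single quote, because it is embedded in the filter command below.
purge_path() {
    git filter-branch --force --index-filter \
        "git rm --cached --ignore-unmatch -q -- '$1'" \
        --prune-empty -- --all
}
```

filter-branch keeps the old history under refs/original/, so the
repository does not shrink until those refs are deleted and the objects
are repacked.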

IHTH,

-JJ




> ------------------------------------------------------------------------------
> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
> Finally, a world-class log management solution at an even better price-free!
> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
> February 28th, so secure your free ArcSight Logger TODAY!
> http://p.sf.net/sfu/arcsight-sfd2d
> _______________________________________________
> Matplotlib-devel mailing list
> Matplotlib-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
>

