On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:

> filter-objects is a command to scan all objects in the object database
> for the repository and print the ids of those which match the given
> criteria.
> The current supported criteria are object type and the minimum size of
> the object.
> The guiding use case is to scan repositories quickly for large objects
> which may cause performance issues for users. The list of objects can
> then be used to guide some future remediating action.

I've had to perform this exact same task. You can already do the
"filtering" part pretty easily and efficiently with cat-file and a perl
script, like:

  magically_generate_all_objects |
  git cat-file --batch-check='%(objectsize) %(objectname)' |
  perl -alne 'print $F[1] if $F[0] > 1234'

That's not as friendly as your filter-objects, but it's a lot more
flexible (since you can ask cat-file for all sorts of information).
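
As a minimal sketch of that flexibility, the same pipeline can filter on object type as well as size. The helper name large_blobs is made up for illustration, and it assumes cat-file was asked for type, size and name in that order:

```shell
# Hypothetical helper: print the names of blobs larger than $1 bytes.
# Expects "<type> <size> <name>" lines on stdin, e.g. from:
#   git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname)'
large_blobs() {
  awk -v min="$1" '$1 == "blob" && $2 + 0 > min + 0 { print $3 }'
}

# Assumed usage (magically_generate_all_objects is still a placeholder):
#   magically_generate_all_objects |
#   git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname)' |
#   large_blobs 1234
```

Passing the threshold as an argument keeps the size limit out of the script text, so the same function works for any cutoff.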

Obviously I've glossed over the "how to get a list of objects" part.
If you truly want all objects (not just reachable ones), or if "rev-list
--objects" is too slow, the best way is:

  # run from inside the .git directory; the paths below are relative
  objects() {
    # loose objects
    for i in objects/??/*; do
      echo "$i"
    done |
    sed 's,objects/\(..\)/,\1,'

    # packed objects
    for i in objects/pack/*.idx; do
      git show-index <"$i"
    done |
    cut -d' ' -f2
  }

Certainly I'm not opposed to doing something less horrible there (and I
am happy to see my for_each_*_object interface getting more callers!).
I kind of wonder if we should make "all objects, reachable or not" an
option for rev-list. I'm not sure if it would choke on adding them all
to the "pending" list, though; it's not really made for that. But it
would enable neat things like:

  git rev-list --all-the-objects --not --all

to show you what's unreachable.
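
Until something like that exists, a rough approximation is a set difference between a list of all object ids and the output of "rev-list --objects --all". The sketch below is only that: the helper name unreachable_from_lists and the file names are invented for illustration, and it treats an object as reachable only if it is reachable from a ref (so it ignores reflogs and the index):

```shell
# Hypothetical helper: print ids listed in file $1 but not in file $2.
unreachable_from_lists() {
  all=$(mktemp) && reach=$(mktemp) || return 1
  sort -u "$1" >"$all"
  sort -u "$2" >"$reach"
  # comm -23 keeps lines that appear only in the first sorted file
  comm -23 "$all" "$reach"
  rm -f "$all" "$reach"
}

# Assumed usage, from inside the .git directory:
#   objects >all.txt    # any list of all object ids, one per line
#   git rev-list --objects --all | cut -d' ' -f1 >reachable.txt
#   unreachable_from_lists all.txt reachable.txt
```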
