Jeff King <p...@peff.net> writes:

> Perhaps we need
>
>   git cat-file --batch-format="%(disk-size) %(object)"
>
> or similar.

I agree with your reasoning.  It may be simpler to give an interface
that asks for specific pieces of info, e.g. --batch-cols=size,disksize,
without giving the users a fully flexible "format".

> +NOTE: The on-disk size reported is accurate, but care should be taken in
> +drawing conclusions about which refs or objects are responsible for disk
> +usage. The size of a packed non-delta object may be much larger than the
> +size of objects which delta against it, but the choice of which object
> +is the base and which is the delta is arbitrary and is subject to change
> +during a repack. Note also that multiple copies of an object may be
> +present in the object database; in this case, it is undefined which
> +copy's size will be reported.

This is a good note to leave for the readers. I was wondering how
valid it is to claim that B takes a lot of space compared to C when
you have three objects A, B and C (in decreasing order of on-disk
footprint), where A is huge, C is a small delta against A, and B is
independent.  The roles of A and C in their delta chain could easily
be swapped during the next full repack, and then C would appear a lot
larger than B.
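
To make the swap concrete, here is a small sketch with made-up numbers.
The sizes and the helper name rank_by_disk_size are hypothetical and
only illustrate how the per-object ranking can flip across a repack
even though the contents of A, B and C did not change.

```python
def rank_by_disk_size(sizes):
    """Return object names sorted from largest to smallest on-disk size."""
    return sorted(sizes, key=sizes.get, reverse=True)


# Hypothetical on-disk sizes in bytes before a repack:
# A is stored whole, C is a small delta against A, B is independent.
before = {"A": 100_000, "B": 5_000, "C": 200}

# Hypothetical sizes after a repack that happens to pick C as the base
# and store A as a delta against it. B is untouched.
after = {"A": 200, "B": 5_000, "C": 100_000}

print(rank_by_disk_size(before))  # ['A', 'B', 'C']
print(rank_by_disk_size(after))   # ['C', 'B', 'A']
```

In both listings the A/C pair costs about the same in total; only the
attribution between the two moved.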

It might be interesting to measure the total disk footprint of an
entire delta "family" (the objects that delta against the same
base).  You may find out that hello.c, with a manageable size, has
very many revisions and overall has a larger on-disk footprint than
a single copy of an unchanging help.mov clip used in the documentation
does, which may be an interesting observation to make.
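
A rough sketch of such a per-family total follows. It is not part of
the patch under discussion. It assumes input lines of the form
"<object name> <on-disk size> <delta base>", where a non-delta object
reports an all-zero delta base, and it treats the root of each delta
chain as the identity of the family. The function name
family_footprints and that definition of "family" are my assumptions.

```python
from collections import defaultdict

# Assumption: a non-delta object reports an all-zero base (40 hex digits
# for SHA-1 object names).
NULL_OID = "0" * 40


def family_footprints(lines):
    """Sum on-disk sizes per delta family.

    Each input line is "<oid> <disk-size> <deltabase>". A family is
    identified by the object at the root of the delta chain.
    """
    size = {}
    base = {}
    for line in lines:
        if not line.strip():
            continue
        oid, disk, delta_base = line.split()
        size[oid] = int(disk)
        base[oid] = delta_base

    def root(oid):
        # Walk up the chain until we reach a non-delta object or an
        # object whose base is not in the listing; guard against cycles.
        seen = set()
        while base.get(oid, NULL_OID) != NULL_OID and oid not in seen:
            seen.add(oid)
            oid = base[oid]
        return oid

    totals = defaultdict(int)
    for oid in size:
        totals[root(oid)] += size[oid]
    return dict(totals)
```

With a Git whose cat-file understands the %(objectname),
%(objectsize:disk) and %(deltabase) placeholders, lines in this shape
could come from something like
git cat-file --batch-all-objects --batch-check='%(objectname) %(objectsize:disk) %(deltabase)'.
The totals are still subject to the caveat in the NOTE above: a repack
can regroup objects into different families.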