On Mon, Mar 19, 2018 at 5:43 PM, Junio C Hamano <[email protected]> wrote:
> Nguyễn Thái Ngọc Duy <[email protected]> writes:
>
>> +static inline void oe_set_size(struct object_entry *e,
>> + unsigned long size)
>> +{
>> + e->size_ = size;
>> + e->size_valid = e->size_ == size;
>
> A quite similar comment as my earlier one applies here. I wonder if
> this is easier to read?
>
> e->size_valid = fits_in_32bits(size);
> if (e->size_valid)
> e->size_ = size;
>
> Stepping back a bit in a different tangent,
>
> - fits_in_32bits() is a good public name if the helper is about
> seeing if the given quantity fits in 32bit uint,
>
> - but that carves it in stone that our e->size_ *will* be 32bit
> forever, which is not good.
>
> So, it may be a good idea to call it size_cacheable_in_oe(size) or
> something to ask "I have this 'size'; is it small enough to fit in
> the field in the oe, i.e. allow us to cache it, as opposed to having
> to go back to the object every time?" Of course, this would declare
> that the helper can only be used for that particular field, but that
> is sort of the point of such a change, to allow us to later define
> the e->size_ field to different sizes without affecting other stuff.
This is why I do "size_valid = size_ == size". In my private build, I
reduced size_ to less than 32 bits and change the "fits_in_32bits"
function to do something like
int fits_in_32bits(unsigned long size)
{
struct object_entry e;
e.size_ = size;
return e.size_ == size.
}
which makes sure it always works. This spreads the use of "valid = xx
== yy" in more places though. I think if we just limit the use of
this expression in a couple access wrappers than it's not so bad.
>> + if (!e->size_valid) {
>> + unsigned long real_size;
>> +
>> + if (sha1_object_info(e->idx.oid.hash, &real_size) < 0 ||
>> + size != real_size)
>> + die("BUG: 'size' is supposed to be the object size!");
>> + }
>
> If an object that is smaller than 4GB is fed to this function with
> an incorrect size, we happily record it in e->size_ and declare it
> is valid. Wouldn't that be equally grave error as we are catching
> in this block?
That adds an extra sha1_object_info() to all objects and it's
expensive (I think it's one of the reasons we cache values in
object_entry in the first place). I think there are also a few
occasions we reuse even bad in-pack objects (there are even tests for
that) so it's not always safe to die() here.
--
Duy