> > This makes sense - offsets may be different when we omit objects from
> > the packfile. I think this can be computed by calculating the number of
> > zero bits between the current object's index and the nth object prior
> > (where n is the offset) in the bitmap resulting from
> > reuse_partial_packfile_from_bitmap() above, thus eliminating the need
> > for this array, but I haven't tested it.
>
> You need to know not just the number of zero bits, but the accumulated
> offset due to those missing objects. So you'd end up having to walk over
> the revindex for that set of objects. This array is basically caching
> those accumulated offsets (for the parts we _do_ include) so we don't
> have to compute them repeatedly.
Ah, yes. For some reason I thought the offset was a number of
objects, but it is actually a number of bytes. The patch makes sense
now.
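
To convince myself, I sketched the idea in plain C (a toy model, not
git's actual data structures; compute_shift(), offsets[], include[]
and shift[] are all invented for illustration). The offsets[] array
plays the role of the revindex, so the gap between consecutive
entries is an object's size in bytes, and those sizes accumulate into
exactly the per-object correction that the patch's array caches:

#include <stdint.h>
#include <stdio.h>

/*
 * Toy illustration, not git code. offsets[] stands in for the
 * revindex (the byte offset of each object in the original pack,
 * plus one extra entry marking the end of the data); include[]
 * stands in for the reuse bitmap. For each kept object we compute
 * the total size in bytes of all omitted objects before it.
 */
static void compute_shift(const uint64_t *offsets, const int *include,
			  size_t nr, uint64_t *shift)
{
	uint64_t omitted_bytes = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		shift[i] = omitted_bytes;
		if (!include[i])
			/* object i's size is the gap to the next offset */
			omitted_bytes += offsets[i + 1] - offsets[i];
	}
}

int main(void)
{
	uint64_t offsets[] = { 0, 100, 250, 400, 700 };
	int include[] = { 1, 0, 1, 0 };
	uint64_t shift[4];
	size_t i;

	compute_shift(offsets, include, 4, shift);
	for (i = 0; i < 4; i++)
		if (include[i])
			printf("object %u: old offset %u -> new offset %u\n",
			       (unsigned)i, (unsigned)offsets[i],
			       (unsigned)(offsets[i] - shift[i]));
	return 0;
}

Counting zero bits alone would only say how many objects were
omitted, not how many bytes, which is why the walk over the revindex
(or the cached array) is needed.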
> There's also a more subtle issue with entry sizes; see below.
Good point.
> > > @@ -1002,6 +1132,10 @@ static int have_duplicate_entry(const struct object_id *oid,
> > >  {
> > >  	struct object_entry *entry;
> > >
> > > +	if (reuse_packfile_bitmap &&
> > > +	    bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid))
> > > +		return 1;
> >
> > Hmm...why did we previously not need to check the reuse information, but
> > we do now? I gave the code a cursory glance but couldn't find the
> > answer.
>
> I think the original code may simply have been buggy and nobody noticed.
> Here's what I wrote when this line was added in our fork:
[snip explanation]
Thanks - I'll also take a look if I have time.
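
For anyone else following along, my mental model of the new check is
the following (a toy sketch with invented arrays, not git's hashmap
and EWAH bitmap machinery; reuse[] and listed[] are made up for
illustration): an object now counts as a duplicate if it is either
already queued as an individual entry or covered by the
verbatim-reuse bitmap.

#include <stdio.h>

/*
 * Toy model, not git code. reuse[] stands in for
 * reuse_packfile_bitmap (objects streamed verbatim from the old
 * pack); listed[] stands in for the to_pack entries. Without the
 * reuse[] check, an object could be sent twice: once verbatim and
 * once as a freshly added entry.
 */
static int have_duplicate_entry(const int *reuse, const int *listed,
				size_t obj)
{
	if (reuse[obj])
		return 1;
	return listed[obj];
}

int main(void)
{
	int reuse[]  = { 1, 0, 0 };
	int listed[] = { 0, 1, 0 };
	size_t i;

	for (i = 0; i < 3; i++)
		printf("object %u duplicate? %d\n", (unsigned)i,
		       have_duplicate_entry(reuse, listed, i));
	return 0;
}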
> Thanks for looking at it. I still have to take a careful pass over the
> whole split, but I've tried to at least answer your questions in the
> meantime.
Thanks for your responses. Also thanks to Christian for splitting it in
the first place, making it easier to review.