On Thu, Aug 16, 2018 at 11:08 PM Jeff King <p...@peff.net> wrote:
>
> On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote:
>
> > >  * We spend the majority of the ~30s on this:
> > >    
> > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> >
> > This is hashing the actual packfile. This is potentially quite long,
> > especially if you have a ton of big objects.
> >
> > I wonder if we need to do this as a separate step anyway, though. Our
> > verification is based on index-pack these days, which means it's going
> > to walk over the whole content as part of the "Indexing objects" step to
> > expand base objects and mark deltas for later. Could we feed this hash
> > as part of that walk over the data? It's not going to save us 30s, but
> > it's likely to be more efficient. And it would fold the effort naturally
> > into the existing progress meter.
>
> Actually, I take it back. That's the nice, modern way we do it in
> git-verify-pack. But git-fsck uses the ancient "just walk over all of
> the idx entries method". It at least sorts in pack order, which is good,
> but:
>
>   - it's not multi-threaded, like index-pack/verify-pack
>
>   - the index-pack way is actually more efficient than pack-ordering for
>     the delta-base cache, because it actually walks the delta-graph in
>     the optimal order
>

I actually tried to make git-fsck use index-pack --verify at one
point. The only thing that stopped it from working was index-pack
automatically wrote the newer index version if I remember correctly,
and that would fail the final hash check. fsck performance was not a
big deal so I dropped it. Just saying it should be possible, if
someone's interested in that direction.
-- 
Duy

Reply via email to