On Sat, Aug 11 2018, René Scharfe wrote:

> Object IDs to skip are stored in a shared static oid_array.  Lookups do
> a binary search on the sorted array.  The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case.
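To check that I'm reading that right, here's a standalone sketch of the load-then-lookup pattern described above. It is not git's oid_array code: `struct oid`, `struct skip_array`, `skip_append()`, `skip_finish()` and `skip_contains()` are hypothetical names, and SHA-1-sized (20-byte) IDs are assumed.

```c
#include <stdlib.h>
#include <string.h>

#define RAW_SZ 20 /* assumed SHA-1 object ID length in bytes */

/* Hypothetical stand-in for git's struct object_id. */
struct oid {
	unsigned char hash[RAW_SZ];
};

/* Hypothetical growable array; not git's actual oid_array. */
struct skip_array {
	struct oid *oids;
	size_t nr, alloc;
	int sorted; /* stays 1 while every entry is >= its predecessor */
};

#define SKIP_ARRAY_INIT { NULL, 0, 0, 1 }

static int oid_cmp(const void *a, const void *b)
{
	const struct oid *x = a, *y = b;
	return memcmp(x->hash, y->hash, RAW_SZ);
}

/* Append one ID while loading, noting whether the input is still in order. */
static void skip_append(struct skip_array *arr, const struct oid *oid)
{
	if (arr->nr == arr->alloc) {
		size_t alloc = arr->alloc ? 2 * arr->alloc : 16;
		struct oid *grown = realloc(arr->oids, alloc * sizeof(*grown));
		if (!grown)
			abort();
		arr->oids = grown;
		arr->alloc = alloc;
	}
	if (arr->sorted && arr->nr &&
	    oid_cmp(&arr->oids[arr->nr - 1], oid) > 0)
		arr->sorted = 0;
	arr->oids[arr->nr++] = *oid;
}

/* After loading, sort once, but only if the file was out of order. */
static void skip_finish(struct skip_array *arr)
{
	if (!arr->sorted) {
		qsort(arr->oids, arr->nr, sizeof(*arr->oids), oid_cmp);
		arr->sorted = 1;
	}
}

/* O(log n) membership test; assumes skip_finish() has been called. */
static int skip_contains(const struct skip_array *arr, const struct oid *oid)
{
	return arr->nr &&
	       bsearch(oid, arr->oids, arr->nr, sizeof(*arr->oids),
		       oid_cmp) != NULL;
}
```

In this sketch an already-sorted file costs one comparison per line while loading and no sort at all; an unsorted one pays a single O(n log n) qsort() up front, and every lookup after that is a binary search.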

I think this change makes sense, but it's missing an update to the
relevant documentation in Documentation/config.txt:

    fsck.skipList::
        The path to a sorted list of object names (i.e. one SHA-1 per
        line) that are known to be broken in a non-fatal way and should
        be ignored. This feature is useful when an established project
        should be accepted despite early commits containing errors that
        can be safely ignored such as invalid committer email addresses.
        Note: corrupt objects cannot be skipped with this setting.

Also, while I use the skipList feature, it's for something on the order
of 10-100 objects, so whatever algorithm the lookup uses isn't going to
matter, but I think it's interesting to describe the trade-off in the
commit message.

I.e., what if I have 100K objects listed in the skipList? Is it only
going to be read lazily during fsck if there's an issue, or for every
object, etc.? What's the difference in performance?

Before this change, I wanted to follow up on my ab/fsck-transfer-updates
with something where we'd die if we found the skipList wasn't ordered as
we read it, but from a UI POV this is even better.
