On Fri, May 04 2018, Jakub Narebski wrote:

(Just off-the cuff here and I'm surely about to be corrected by
Derrick...)

> * What to do about merge commits, and octopus merges in particular?
>   Should Bloom filter be stored for each of the parents?  How to ensure
>   fast access then (fixed-width records) - use large edge list?

You could still store it fixed with, you'd just say that if you
encounter a merge with N parents the filter wouldn't store files changed
in that commit, but rather whether any of the N (including the merge)
had changes to files as of the their common merge-base.

Then if they did you'd need to walk all sides of the merge where each
commit would also have the filter to figure out where the change(s)
was/were, but if they didn't you could skip straight to the merge base
and keep walking.

Which, on the topic of what else a commit graph could store: A mapping
from merge commits of N parents to the merge-base of those commits.

You could also store nothing for merges (or only files the merge itself
changed v.s. its parents). Derrick talked about how the bloom filter
implementation has a value that's "Didn't compute (for whatever reason),
look at it manually".

> * Then there is problem of rename and copying detection - I think we can
>   simply ignore it: unless someone has an idea about how to handle it?
>
>   Though this means that "git log --follow <file>" wouldn't get any
>   speedup, and neither the half of "git gui blame" that runs "git blame
>   --incremental -C -C -w" -- the one that allows code copying and
>   movement detection.

Couldn't the bloom filter also speed up --follow if you did two passes
through the history? The first to figure out all files that ever changed
names, and then say you did `--follow sha1-name.c` on git.git. The
filter would have had all the bits for both sha1_name.c and sha1-name.c
set on all commits that touched either for all of the history.

Of course this would only work with a given default value of -M<n>, but
on the assumption that most users left it at the default, and
furthermore that renames weren't so common as to make the filter useless
with too many false-positives as a result, it might be worth it. If you

Reply via email to