Hi Stefan,

On Mon, 9 Jul 2018, Stefan Beller wrote:

> On Tue, Jul 3, 2018 at 4:26 AM Johannes Schindelin via GitGitGadget
> <gitgitgad...@gmail.com> wrote:
> 
> > +'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
> > +       [--dual-color] [--creation-factor=<factor>]
> > +       ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
> > +
> > +DESCRIPTION
> > +-----------
> > +
> > +This command shows the differences between two versions of a patch
> > +series, or more generally, two commit ranges (ignoring merges).
> 
> Does it completely ignore merges or does it die("not supported"), how is
> the user expected to cope with the accidental merge in the given range?

It ignores merges. It does not reject them. It simply ignores them and
won't talk about them as a consequence.

Could you suggest an improved way to say that?

> > +To that end, it first finds pairs of commits from both commit ranges
> > +that correspond with each other. Two commits are said to correspond when
> > +the diff between their patches (i.e. the author information, the commit
> > +message and the commit diff) is reasonably small compared to the
> > +patches' size. See ``Algorithm`` below for details.
> > +
> > +Finally, the list of matching commits is shown in the order of the
> > +second commit range, with unmatched commits being inserted just after
> > +all of their ancestors have been shown.
> > +
> > +
> > +OPTIONS
> > +-------
> > +--dual-color::
> > +       When the commit diffs differ, recreate the original diffs'
> > +       coloring, and add outer -/+ diff markers with the *background*
> > +       being red/green to make it easier to see e.g. when there was a
> > +       change in what exact lines were added.
> 
> I presume this is a boolean option, and can be turned off with
> --no-dual-color, but not with --dual-color=no. Would it be worth to
> give the --no-option here as well.
> The more pressing question I had when reading this, is whether this
> is the default.

In the final patch (which I mulled over adding or not for a couple of
weeks), the `--dual-color` mode is the default, and the man page talks
about `--no-dual-color`.

Do you want me to change this intermediate commit, even if that change
will be reverted anyway?

> > +--creation-factor=<percent>::
> > +       Set the creation/deletion cost fudge factor to `<percent>`.
> > +       Defaults to 60. Try a larger value if `git range-diff` erroneously
> > +       considers a large change a total rewrite (deletion of one commit
> > +       and addition of another), and a smaller one in the reverse case.
> > +       See the ``Algorithm`` section below for an explanation why this is
> > +       needed.
> > +
> > +<range1> <range2>::
> > +       Compare the commits specified by the two ranges, where
> > +       `<range1>` is considered an older version of `<range2>`.
> 
> Is it really older? How does that help the user?

It is important to get your ducks in a row, so to speak, when looking at
range-diffs. They are even more unintuitive than diffs, so it makes sense
to have a very clear mental picture of what you are trying to compare
here.

The coloring gives a strong hint of "pre" vs "post", i.e. old vs new: the
changes that are only in the "old" patches are marked with a minus with a
red background color, which only really makes sense if you think about
these changes as "dropped" or "removed" from the "new" changes.

So yes, it is really considered an older version, in my mind.

Again, if you have suggestions for how to improve my patch (giving rise
to a "new" patch :-)), let's hear them.

> I think this comes from the notion of e.g. patch 4 ("range-diff: improve the
> order of the shown commits "), that assume the user wants the range-diff
> to be expressed with range2 as its "base range".

No, it is motivated by the fact that we use -/+ markers to indicate
differences between the "old" and the "new" patches.

> > +Algorithm
> > +---------
> > +
> > +The general idea is this: we generate a cost matrix between the commits
> > +in both commit ranges, then solve the least-cost assignment.
> 
> Can you say more about the generation of the cost matrix?
> I assume that it counts the number of lines added/deleted to make
> one patch into the other patch.

I think that is correct.

*reading the patch*

Actually, no, I was wrong. For the cost matrix, the *length* of the diff
*of the diffs* is computed. Think of it as

        git diff --no-index <(git diff A^!) <(git diff B^!) | wc -l
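
To make that a bit more concrete, here is a rough Python sketch of the idea.
It is only an illustration, not what the patch does literally: the helper
names (`diff_lines`, `cost`, `cost_matrix`, `best_assignment`) are made up,
patches are plain strings, difflib stands in for `git diff`, and a
brute-force search over permutations stands in for a real least-cost
assignment solver. It also assumes both ranges have the same number of
commits, i.e. it ignores unmatched commits and `--creation-factor` entirely.

```python
import difflib
from itertools import permutations


def diff_lines(old, new):
    """Return the unified diff between two texts as a list of lines."""
    return list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    )


def cost(patch_a, patch_b):
    """Cost of pairing two patches: the length of the diff of the diffs."""
    return len(diff_lines(patch_a, patch_b))


def cost_matrix(range_a, range_b):
    """Rows are patches of the old range, columns patches of the new one."""
    return [[cost(a, b) for b in range_b] for a in range_a]


def best_assignment(matrix):
    """Brute-force least-cost assignment (hypothetical stand-in solver).

    Returns a tuple p where old patch i is paired with new patch p[i].
    Only sensible for tiny, equally sized ranges.
    """
    n = len(matrix)
    return min(
        permutations(range(n)),
        key=lambda p: sum(matrix[i][p[i]] for i in range(n)),
    )
```

Feeding it two small "ranges" in which the patches were reordered and one of
them was tweaked should pair each old patch with its closest new counterpart,
because identical patches have an empty diff-of-diffs and therefore cost 0.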

> If that assumption was correct, an edit of a commit message adding one
> line is just as costly as adding one line in the diff.

Nope, editing a commit message does not have any influence on the
algorithm's idea of whether the commit matches or not. Only the content
changes associated with the commit have any say over this.

> Further I would assume that the context lines are ignored?

No.

> I think this is worth spelling out.

Sure.

> Another spot to look at is further metadata, such as author and
> author-date, which are kept the same in a rebase workflow.

I encourage you to offer that as an add-on patch series. What you suggest
is not necessary for my use cases, so I'd rather not spend time on it.

Ciao,
Dscho
