Re: [PATCH] diff: add --ignore-blank-lines option

Antoine Pelisse Mon, 17 Jun 2013 12:10:36 -0700

On Mon, Jun 17, 2013 at 6:18 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Antoine Pelisse <apeli...@gmail.com> writes:
>
>> So here is a more thorough description of the option:
>
>> - real changes are interesting
>
> OK, I think I can understand it.
>
>> - blank lines that are close enough (less than context size) to
>>   interesting changes are considered interesting (recursive definition)
>
> OK.
>
>> - "context" lines are used around each hunk of interesting changes
>
> OK.
>
>> - If two hunks are separated by less than "inter-hunk-context", they
>>   will be merged into one.
>
> Makes sense.
>
>> The current implementation does the "interesting changes selection" in a
>> single pass.
>
> "current" meaning "the code after this patch is applied"?  Is there
> a possible future enhancement hinted here?


No. There might be, but I'm not sure it should be discussed right now
(In case you're curious, I'm thinking about interaction with combined
diff). I will take the hint and rephrase.

>> +xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
>> +{
>> +     xdchange_t *xch, *xchp, *lxch;
>>       long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
>> +     long max_ignorable = xecfg->ctxlen;
>> +     unsigned long changes = ULONG_MAX;

Let me explain what "changes" means, as I know it will help the rest
of the message:
It counts the number of *added* blank lines we have ignored since
"lxch" (needed to calculate the distance between lxch and xch)
It also has the meaning of what was called "interesting" before.
If changes == ULONG_MAX, we are still in interesting zone, otherwise
it means we have ignored "changes" *added* blank lines (0 being a
valid value).
(Actually, After rereading this part, it looks like I could check that
lxch == xchp rather than setting changes to ULONG_MAX).

>> +
>> +     /* remove ignorable changes that are too far before other changes */
>> +     for (xchp = *xscr; xchp && xchp->ignore; xchp = xchp->next) {
>> +             xch = xchp->next;
>> +
>> +             if (xch == NULL ||
>> +                 xch->i1 - (xchp->i1 + xchp->chg1) >= max_ignorable)
>> +                     *xscr = xch;
>> +     }
>
> This strips leading ignorable ones away until we see an unignorable
> one.  Looks sane.
>
>> +     if (*xscr == NULL)
>> +             return NULL;
>> +
>> +     lxch = *xscr;
>
> "lxch" remembers the last one that is "interesting".
>
>> +     for (xchp = *xscr, xch = xchp->next; xch; xchp = xch, xch = xch->next) 
>> {
>> +             long distance = xch->i1 - (xchp->i1 + xchp->chg1);
>> +             if (distance > max_common)
>>                       break;
>
> If we see large-enough gap, the one we processed last (in xchp) is
> the end of the current hunk.  Looks sane.
>
>> +             if (distance < max_ignorable &&
>> +                 (!xch->ignore || changes == ULONG_MAX)) {
>> +                     lxch = xch;
>> +                     changes = ULONG_MAX;
>
> The current one is made into the "last interesting one we have seen"
> and the hunk continues, if either (1) the current one is interesting
> by itself, or (2) the last one we saw does not match some
> unexplainable criteria to cause changes set to not ULONG_MAX.
>
> Puzzling.

- If we are still in interesting zone, we take it, even if it's
ignorable change. Because it's close enough.
- Otherwise, only take real changes. We are close to another change,
and we are still in the loop, so it must be interesting.

>> +             } else if (changes != ULONG_MAX &&
>> +                        xch->i1 + changes - (lxch->i1 + lxch->chg1) > 
>> max_common) {
>> +                     break;
>
> If the last one we saw does not match some unexplainable criteria to
> cause changes set to not ULONG_MAX, and the distance between this
> one and the last "intersting" one is further than the context, this
> one will not be a part of the current hunk.
>
> Puzzling.

If we are no longer in "interesting zone" (changes != ULONG_MAX), it
means we will stop if the distance is too big.
"changes" is used in the calculation to consider the changes we have
already ignored (xch->i1 - (lxch->i1 + lxch->chg1) will only work if
xch and lxch are consecutive, we need to add the blank lines we
ignored).

> Could you add comment to the "changes" variable and explain what the
> variable means?
>
>> +             } else if (!xch->ignore) {
>> +                     lxch = xch;
>> +                     changes = ULONG_MAX;
>
> When this change by itself is interesting, it becomes the "last
> interesting one" and the hunk continues.

Exactly, and changes goes back to "interesting".

>> +             } else {
>> +                     if (changes == ULONG_MAX)
>> +                             changes = 0;
>> +                     changes += xch->chg2;
>
> Puzzled beyond guessing.  Also it is curious why here and only here
> we look at chg2 side of the things, not i1/chg1 in this whole thing.

chg2 being the number of blank line *additions*.
I don't want to coalesce two hunks because some blank lines have been
removed between the two, so we must not change the distance
calculation because of a blank line removal. That behavior can be seen
in "ignore-blank-lines: between changes" test.

Hope that makes things clearer,
Thanks again for the thorough reading,

Antoine
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] diff: add --ignore-blank-lines option

Reply via email to