Re: [PATCH 2/4] line-log: adjust start/end of ranges individually

2018-08-06 Thread Johannes Schindelin
Hi Eric,

On Sun, 5 Aug 2018, Eric Sunshine wrote:

> On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
>  wrote:
> > When traversing commits and adjusting the ranges, things can get really
> > tricky. For example, when the line range of interest encloses several
> > hunks of a commit, the line range can actually shrink.
> >
> > Currently, range_set_shift_diff() does not anticipate that scenario and
> > blindly adjusts start and end by the same offset ("shift" the range).
> > [...]
> > Let's fix this by adjusting the start and the end offsets individually.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> > diff --git a/line-log.c b/line-log.c
> > @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out,
> > - (target[j].end-target[j].start);
> > j++;
> > }
> > -   range_set_append(out, src[i].start+offset, 
> > src[i].end+offset);
> > +   start_offset = offset;
> > +   while (j < diff->target.nr && src[i].end > target[j].end) {
> > +   offset += (parent[j].end-parent[j].start)
> > +   - (target[j].end-target[j].start);
> > +   j++;
> > +   }
> > +   range_set_append(out, src[i].start+start_offset, 
> > src[i].end+offset);
> 
> I'm still trying to wrap my head around the original code, so I'm not
> even at the point of being able to say if this fix is correct. What
> happens if the "start_offset" loop consumes all of 'j' before it even
> gets to the new loop?

First of all, it is not really the `start_offset` loop, but more like the
"end_offset loop": we are still adjusting `offset`, and that is what we
use to adjust the end of the current line range, while we take pains to
keep the offset that needs to be used for the start of said line range.

> Why does the new loop use '>' whereas the existing uses '>='?

As a mathematician, I usually think of intervals as inclusive only on one
end. So I intuitively use the non-inclusive comparison on the other end.

In this instance, the first loop tries to make sure that the offset to be
used for the start offset has taken into account all of the relevant diff
hunks (meaning: the diff hunks whose target lines start before src[i],
i.e. the current line range to adjust).

The second loop, however, tries to make sure that the offset to adjust the
*end* of the current line range.

Hence what I wrote.

But now that I think about it again, I am no longer sure that the first
loop (the one I did not change) is sound, to begin with. Let's imagine a
very simple case, where we try to adjust a start line, say, 100, and the
diff has a hunk with header `@@ -90,20 +90,40 @@`. So the start line lies
somewhere around a quarter into the post-image.

The current line-log code adjusts this by a very crude method: since the
line number lies between the post-image's start, it is adjusted by adding
the length of the pre-image and subtracting the length of the post image,
i.e. 20 - 40. The result is that we pretend that it came from line number
80, which is squarely outside the pre-image range.

That method of adjustment is appropriate for lines *outside* the
post-image range, of course. If we had looked at line 140 (which is 10
lines after the post-image), then it would be totally correct to say that
it came from line 120.

But that method is pretty obviously broken for any line that lies *within*
a post-image.

So I think the entire loop is in dear need of adjustment.

The question is: how should it be adjusted? The safest way would be to
adjust start and end of the line range differently, where start is pulled
back to the pre-image's start (in the above example, 90), and end is
pushed up to the pre-image's end (in the above example, 110).

Of course, this has a slightly unintuitive consequence: imagine that some
commit made a block of, say, 20 lines a conditional one. Like,

/* Some comment */
do_something = 1;
...
print_it();
...

could have become

if (!dry_run) {
/* Some comment */
do_something = 1;
...
print_it();
...
}

If you use my proposed method to adjust the line ranges to figure out the
history of that `print_it()` line, this commit would blow up the line
range to the entire conditional block, which is probably not what you
wanted.

I don't really see a better way, though.

> Having said that, a much easier fix is to use range_set_append_unsafe()
> here, and then at the bottom of the loop, invoke
> 'sort_and_merge_range_set(out)' to restore range-set invariants and
> ensure that neighboring ranges are coalesced. Not only does that resolve
> the crash and other weird behavior, but it means you don't have to add a
> special-case to range_set_append(), thus the fix becomes simpler
> overall.

That would only p

Re: [PATCH 2/4] line-log: adjust start/end of ranges individually

2018-08-05 Thread Eric Sunshine
On Sun, Aug 5, 2018 at 6:14 AM Eric Sunshine  wrote:
> Having said that, a much easier fix is to use
> range_set_append_unsafe() here, and then at the bottom of the loop,
> invoke 'sort_and_merge_range_set(out)' to restore range-set invariants

By "bottom", I meant "outside" or "after":

...and then invoke sort_and_merge_range_set()
after the loop to restore range-set invariants.


Re: [PATCH 2/4] line-log: adjust start/end of ranges individually

2018-08-05 Thread Eric Sunshine
On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
 wrote:
> When traversing commits and adjusting the ranges, things can get really
> tricky. For example, when the line range of interest encloses several
> hunks of a commit, the line range can actually shrink.
>
> Currently, range_set_shift_diff() does not anticipate that scenario and
> blindly adjusts start and end by the same offset ("shift" the range).
> [...]
> Let's fix this by adjusting the start and the end offsets individually.
>
> Signed-off-by: Johannes Schindelin 
> ---
> diff --git a/line-log.c b/line-log.c
> @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out,
> - (target[j].end-target[j].start);
> j++;
> }
> -   range_set_append(out, src[i].start+offset, src[i].end+offset);
> +   start_offset = offset;
> +   while (j < diff->target.nr && src[i].end > target[j].end) {
> +   offset += (parent[j].end-parent[j].start)
> +   - (target[j].end-target[j].start);
> +   j++;
> +   }
> +   range_set_append(out, src[i].start+start_offset, 
> src[i].end+offset);

I'm still trying to wrap my head around the original code, so I'm not
even at the point of being able to say if this fix is correct. What
happens if the "start_offset" loop consumes all of 'j' before it even
gets to the new loop? Why does the new loop use '>' whereas the
existing uses '>='?

Having said that, a much easier fix is to use
range_set_append_unsafe() here, and then at the bottom of the loop,
invoke 'sort_and_merge_range_set(out)' to restore range-set invariants
and ensure that neighboring ranges are coalesced. Not only does that
resolve the crash and other weird behavior, but it means you don't
have to add a special-case to range_set_append(), thus the fix becomes
simpler overall.

Aside from simplicity, I think the suggested use of
range_set_append_unsafe() and sort_and_merge_range_set() _is_ the
correct fix anyhow because this code isn't taking care to ensure that
the range, after applying 'offset', doesn't abut or overlap with an
earlier range, and sort_and_merge_range_set() is meant to be used
exactly in cases like this when invariants may be broken.

So, while the suggested fix is simpler and "better" and fixes the
crash, that doesn't necessarily mean that the values computed here are
actually correct. As noted, I'm still trying to grok the computation
of these values, but that's a separate issue from the crash itself.


[PATCH 2/4] line-log: adjust start/end of ranges individually

2018-08-04 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When traversing commits and adjusting the ranges, things can get really
tricky. For example, when the line range of interest encloses several
hunks of a commit, the line range can actually shrink.

Currently, range_set_shift_diff() does not anticipate that scenario and
blindly adjusts start and end by the same offset ("shift" the range).

This can lead to a couple of surprising issues, such as assertions in
range_set_append() (when the end of a given range is not adjusted
properly, it can point after the start of the next range) or even
segmentation faults (when t_end in the loop of dump_diff_hacky_one()
points outside the valid line range).

Let's fix this by adjusting the start and the end offsets individually.

Signed-off-by: Johannes Schindelin 
---
 line-log.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/line-log.c b/line-log.c
index 72a5fed66..d8d09b5ee 100644
--- a/line-log.c
+++ b/line-log.c
@@ -427,7 +427,7 @@ static void range_set_shift_diff(struct range_set *out,
 struct diff_ranges *diff)
 {
unsigned int i, j = 0;
-   long offset = 0;
+   long offset = 0, start_offset;
struct range *src = rs->ranges;
struct range *target = diff->target.ranges;
struct range *parent = diff->parent.ranges;
@@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out,
- (target[j].end-target[j].start);
j++;
}
-   range_set_append(out, src[i].start+offset, src[i].end+offset);
+   start_offset = offset;
+   while (j < diff->target.nr && src[i].end > target[j].end) {
+   offset += (parent[j].end-parent[j].start)
+   - (target[j].end-target[j].start);
+   j++;
+   }
+   range_set_append(out, src[i].start+start_offset, 
src[i].end+offset);
}
 }
 
-- 
gitgitgadget