Re: [PATCH 2/4] line-log: adjust start/end of ranges individually
Hi Eric, On Sun, 5 Aug 2018, Eric Sunshine wrote: > On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget > wrote: > > When traversing commits and adjusting the ranges, things can get really > > tricky. For example, when the line range of interest encloses several > > hunks of a commit, the line range can actually shrink. > > > > Currently, range_set_shift_diff() does not anticipate that scenario and > > blindly adjusts start and end by the same offset ("shift" the range). > > [...] > > Let's fix this by adjusting the start and the end offsets individually. > > > > Signed-off-by: Johannes Schindelin > > --- > > diff --git a/line-log.c b/line-log.c > > @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out, > > - (target[j].end-target[j].start); > > j++; > > } > > - range_set_append(out, src[i].start+offset, > > src[i].end+offset); > > + start_offset = offset; > > + while (j < diff->target.nr && src[i].end > target[j].end) { > > + offset += (parent[j].end-parent[j].start) > > + - (target[j].end-target[j].start); > > + j++; > > + } > > + range_set_append(out, src[i].start+start_offset, > > src[i].end+offset); > > I'm still trying to wrap my head around the original code, so I'm not > even at the point of being able to say if this fix is correct. What > happens if the "start_offset" loop consumes all of 'j' before it even > gets to the new loop? First of all, it is not really the `start_offset` loop, but more like the "end_offset loop": we are still adjusting `offset`, and that is what we use to adjust the end of the current line range, while we take pains to keep the offset that needs to be used for the start of said line range. > Why does the new loop use '>' whereas the existing uses '>='? As a mathematician, I usually think of intervals as inclusive only on one end. So I intuitively use the non-inclusive comparison on the other end. In this instance, the first loop tries to make sure that the offset to be used for the start offset has taken into account all of the relevant diff hunks (meaning: the diff hunks whose target lines start before src[i], i.e. the current line range to adjust). The second loop, however, tries to make sure that the offset to adjust the *end* of the current line range. Hence what I wrote. But now that I think about it again, I am no longer sure that the first loop (the one I did not change) is sound, to begin with. Let's imagine a very simple case, where we try to adjust a start line, say, 100, and the diff has a hunk with header `@@ -90,20 +90,40 @@`. So the start line lies somewhere around a quarter into the post-image. The current line-log code adjusts this by a very crude method: since the line number lies between the post-image's start, it is adjusted by adding the length of the pre-image and subtracting the length of the post image, i.e. 20 - 40. The result is that we pretend that it came from line number 80, which is squarely outside the pre-image range. That method of adjustment is appropriate for lines *outside* the post-image range, of course. If we had looked at line 140 (which is 10 lines after the post-image), then it would be totally correct to say that it came from line 120. But that method is pretty obviously broken for any line that lies *within* a post-image. So I think the entire loop is in dear need of adjustment. The question is: how should it be adjusted? The safest way would be to adjust start and end of the line range differently, where start is pulled back to the pre-image's start (in the above example, 90), and end is pushed up to the pre-image's end (in the above example, 110). Of course, this has a slightly unintuitive consequence: imagine that some commit made a block of, say, 20 lines a conditional one. Like, /* Some comment */ do_something = 1; ... print_it(); ... could have become if (!dry_run) { /* Some comment */ do_something = 1; ... print_it(); ... } If you use my proposed method to adjust the line ranges to figure out the history of that `print_it()` line, this commit would blow up the line range to the entire conditional block, which is probably not what you wanted. I don't really see a better way, though. > Having said that, a much easier fix is to use range_set_append_unsafe() > here, and then at the bottom of the loop, invoke > 'sort_and_merge_range_set(out)' to restore range-set invariants and > ensure that neighboring ranges are coalesced. Not only does that resolve > the crash and other weird behavior, but it means you don't have to add a > special-case to range_set_append(), thus the fix becomes simpler > overall. That would only p
Re: [PATCH 2/4] line-log: adjust start/end of ranges individually
On Sun, Aug 5, 2018 at 6:14 AM Eric Sunshine wrote: > Having said that, a much easier fix is to use > range_set_append_unsafe() here, and then at the bottom of the loop, > invoke 'sort_and_merge_range_set(out)' to restore range-set invariants By "bottom", I meant "outside" or "after": ...and then invoke sort_and_merge_range_set() after the loop to restore range-set invariants.
Re: [PATCH 2/4] line-log: adjust start/end of ranges individually
On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget wrote: > When traversing commits and adjusting the ranges, things can get really > tricky. For example, when the line range of interest encloses several > hunks of a commit, the line range can actually shrink. > > Currently, range_set_shift_diff() does not anticipate that scenario and > blindly adjusts start and end by the same offset ("shift" the range). > [...] > Let's fix this by adjusting the start and the end offsets individually. > > Signed-off-by: Johannes Schindelin > --- > diff --git a/line-log.c b/line-log.c > @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out, > - (target[j].end-target[j].start); > j++; > } > - range_set_append(out, src[i].start+offset, src[i].end+offset); > + start_offset = offset; > + while (j < diff->target.nr && src[i].end > target[j].end) { > + offset += (parent[j].end-parent[j].start) > + - (target[j].end-target[j].start); > + j++; > + } > + range_set_append(out, src[i].start+start_offset, > src[i].end+offset); I'm still trying to wrap my head around the original code, so I'm not even at the point of being able to say if this fix is correct. What happens if the "start_offset" loop consumes all of 'j' before it even gets to the new loop? Why does the new loop use '>' whereas the existing uses '>='? Having said that, a much easier fix is to use range_set_append_unsafe() here, and then at the bottom of the loop, invoke 'sort_and_merge_range_set(out)' to restore range-set invariants and ensure that neighboring ranges are coalesced. Not only does that resolve the crash and other weird behavior, but it means you don't have to add a special-case to range_set_append(), thus the fix becomes simpler overall. Aside from simplicity, I think the suggested use of range_set_append_unsafe() and sort_and_merge_range_set() _is_ the correct fix anyhow because this code isn't taking care to ensure that the range, after applying 'offset', doesn't abut or overlap with an earlier range, and sort_and_merge_range_set() is meant to be used exactly in cases like this when invariants may be broken. So, while the suggested fix is simpler and "better" and fixes the crash, that doesn't necessarily mean that the values computed here are actually correct. As noted, I'm still trying to grok the computation of these values, but that's a separate issue from the crash itself.
[PATCH 2/4] line-log: adjust start/end of ranges individually
From: Johannes Schindelin When traversing commits and adjusting the ranges, things can get really tricky. For example, when the line range of interest encloses several hunks of a commit, the line range can actually shrink. Currently, range_set_shift_diff() does not anticipate that scenario and blindly adjusts start and end by the same offset ("shift" the range). This can lead to a couple of surprising issues, such as assertions in range_set_append() (when the end of a given range is not adjusted properly, it can point after the start of the next range) or even segmentation faults (when t_end in the loop of dump_diff_hacky_one() points outside the valid line range). Let's fix this by adjusting the start and the end offsets individually. Signed-off-by: Johannes Schindelin --- line-log.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/line-log.c b/line-log.c index 72a5fed66..d8d09b5ee 100644 --- a/line-log.c +++ b/line-log.c @@ -427,7 +427,7 @@ static void range_set_shift_diff(struct range_set *out, struct diff_ranges *diff) { unsigned int i, j = 0; - long offset = 0; + long offset = 0, start_offset; struct range *src = rs->ranges; struct range *target = diff->target.ranges; struct range *parent = diff->parent.ranges; @@ -438,7 +438,13 @@ static void range_set_shift_diff(struct range_set *out, - (target[j].end-target[j].start); j++; } - range_set_append(out, src[i].start+offset, src[i].end+offset); + start_offset = offset; + while (j < diff->target.nr && src[i].end > target[j].end) { + offset += (parent[j].end-parent[j].start) + - (target[j].end-target[j].start); + j++; + } + range_set_append(out, src[i].start+start_offset, src[i].end+offset); } } -- gitgitgadget