Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
On Wed, Oct 10, 2018 at 8:26 AM Phillip Wood wrote:
>
> On 09/10/2018 22:10, Stefan Beller wrote:
> >> As I said above I've more or less come to the view that the correctness
> >> of pythonic indentation is orthogonal to move detection as it affects
> >> all additions, not just those that correspond to moved lines.
> >
> > Makes sense.
>
> Right, so are you happy for us to re-roll with a single
> allow-indentation-change mode based on my RFC?

I am happy with that.
Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
On 09/10/2018 22:10, Stefan Beller wrote:
>> As I said above I've more or less come to the view that the correctness
>> of pythonic indentation is orthogonal to move detection as it affects
>> all additions, not just those that correspond to moved lines.
>
> Makes sense.

Right, so are you happy for us to re-roll with a single
allow-indentation-change mode based on my RFC?

>>> What is your use case, what kind of content do you process that
>>> this patch would help you?
>>
>> I wrote this because I was re-factoring some shell code that was using an
>> indentation step of four spaces but with tabs in the leading indentation,
>> which the current mode does not handle.
>
> Ah, that is good to know.
>
> I was thinking whether we want to generalize the move detection into
> a more generic "detect and fade out uninteresting things" and not just
> focus on white spaces (but these are most often the uninteresting
> things).
>
> Over the last year we had quite a few large refactorings that
> would have benefited from that:
> * For example, the hash transition plan had a lot of patches that
>   were basically s/char *sha1/struct object oid/ or some variation
>   thereof.
> * Introducing struct repository
>
> I used the word diff to look at those patches, which helped a lot,
> but maybe a mode that would allow me to mark this specific
> replacement uninteresting would be even better.
>
> Maybe this can be done as a piggyback on top of the move detection as
> a "move in place, but with uninteresting pattern". The problem of this
> is that the pattern needs to be accounted for when hashing the entries
> into the hashmaps, which is easy when doing white spaces only.

Yes, I like the idea. Yesterday I was looking at Alban's patches to
refactor the todo list handling for rebase -i and there are a lot of
'.' to '->' changes which weren't particularly interesting, though at
least diff-highlight made it clear if that was the only change on a
line. Incidentally, --color-moved was very useful for looking at that
series.
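The hashing constraint Stefan mentions can be illustrated with a minimal sketch: if leading spaces and tabs are skipped before hashing, a moved line and its re-indented copy produce the same hash and so meet in the same hashmap bucket. An arbitrary "uninteresting pattern" such as a type rename can occur anywhere in the line, which is why it is harder to fold into the hash. The function name `hash_after_indent` and the choice of 32-bit FNV-1a are assumptions for illustration only, not what diff.c uses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 32-bit FNV-1a hash of a line that skips the
 * leading spaces and tabs, so lines differing only in indentation
 * hash to the same value.
 */
static uint32_t hash_after_indent(const char *line, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */
	size_t i = 0;

	/* skip the indentation; it would be compared separately */
	while (i < len && (line[i] == ' ' || line[i] == '\t'))
		i++;
	for (; i < len; i++) {
		h ^= (unsigned char)line[i];
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return h;
}
```

With this, "\tfoo();", "        foo();" and "    \tfoo();" all hash identically, while the indentation itself is left for a separate comparison step.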
>>>> +	if (a->s == DIFF_SYMBOL_PLUS)
>>>> +		*delta = la - lb;
>>>> +	else
>>>> +		*delta = lb - la;
>>>
>>> When writing the original feature I had reasons
>>> not to rely on the symbol, as you could have
>>> moved things from + to - (or the other way round)
>>> and added or removed indentation. That is what the
>>> `current_longer` is used for. But given that you only
>>> count here, we can have negative numbers, so it
>>> would work either way for adding or removing indentation.
>>>
>>> But then, why do we need to have a different sign
>>> depending on the sign of the line?
>>
>> The check means that we get the same delta whichever way round the lines
>> are compared. I think I added this because without it the highlighting
>> gets broken if there is an increase in indentation followed by an identical
>> decrease on the next line.
>
> But wouldn't we want to get that highlighted?
> I do not quite understand the scenario, yet.
> Are both indented and dedented part of the same block?

With --color-moved=zebra the indented lines and the de-indented lines
should be different colors; without the test they both ended up in the
same block.

Best Wishes

Phillip

>>>> +	} else {
>>>> +		BUG("no color_moved_ws_allow_indentation_change set");
>>>
>>> Instead of the BUG here could we have a switch/case (or if/else)
>>> covering the complete space of delta->have_string instead?
>>> Then we would not leave a lingering bug in the code base.
>>
>> I'm not sure what you mean, we cover all the existing
>> color_moved_ws_handling values, I added the BUG() call to pick up future
>> omissions if another mode is added. (If we go for a single mode none of
>> this matters)
>
> Ah, makes sense!
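Phillip's point about the sign can be shown with a small sketch. Assuming `la` and `lb` are the visual indentation widths of the two lines being compared, picking the sign from the line's symbol makes the result always "added line minus removed line", so it no longer matters which line is passed first. The name `indent_delta` and its boolean parameter are hypothetical simplifications of the `a->s == DIFF_SYMBOL_PLUS` test in the RFC.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical simplification of the RFC's sign rule: exactly one of
 * the two lines is an added ('+') line.  The result is always the
 * indentation of the added line minus that of the removed line.
 */
static int indent_delta(bool a_is_plus, int la, int lb)
{
	assert(la >= 0 && lb >= 0);
	return a_is_plus ? la - lb : lb - la;
}
```

For example, a line re-indented from 8 to 16 columns gets +8 whichever way round it is compared, while an adjacent line dedented from 16 to 8 gets -8, so zebra coloring can keep the two in separate blocks.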
Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
> As I said above I've more or less come to the view that the correctness
> of pythonic indentation is orthogonal to move detection as it affects
> all additions, not just those that correspond to moved lines.

Makes sense.

> > What is your use case, what kind of content do you process that
> > this patch would help you?
>
> I wrote this because I was re-factoring some shell code that was using an
> indentation step of four spaces but with tabs in the leading indentation,
> which the current mode does not handle.

Ah, that is good to know.

I was thinking whether we want to generalize the move detection into
a more generic "detect and fade out uninteresting things" and not just
focus on white spaces (but these are most often the uninteresting
things).

Over the last year we had quite a few large refactorings that
would have benefited from that:
* For example, the hash transition plan had a lot of patches that
  were basically s/char *sha1/struct object oid/ or some variation
  thereof.
* Introducing struct repository

I used the word diff to look at those patches, which helped a lot,
but maybe a mode that would allow me to mark this specific
replacement uninteresting would be even better.

Maybe this can be done as a piggyback on top of the move detection as
a "move in place, but with uninteresting pattern". The problem of this
is that the pattern needs to be accounted for when hashing the entries
into the hashmaps, which is easy when doing white spaces only.

> >> +	if (a->s == DIFF_SYMBOL_PLUS)
> >> +		*delta = la - lb;
> >> +	else
> >> +		*delta = lb - la;
> >
> > When writing the original feature I had reasons
> > not to rely on the symbol, as you could have
> > moved things from + to - (or the other way round)
> > and added or removed indentation. That is what the
> > `current_longer` is used for. But given that you only
> > count here, we can have negative numbers, so it
> > would work either way for adding or removing indentation.
> >
> > But then, why do we need to have a different sign
> > depending on the sign of the line?
>
> The check means that we get the same delta whichever way round the lines
> are compared. I think I added this because without it the highlighting
> gets broken if there is an increase in indentation followed by an identical
> decrease on the next line.

But wouldn't we want to get that highlighted?
I do not quite understand the scenario, yet.
Are both indented and dedented part of the same block?

> >
> >> +	} else {
> >> +		BUG("no color_moved_ws_allow_indentation_change set");
> >
> > Instead of the BUG here could we have a switch/case (or if/else)
> > covering the complete space of delta->have_string instead?
> > Then we would not leave a lingering bug in the code base.
>
> I'm not sure what you mean, we cover all the existing
> color_moved_ws_handling values, I added the BUG() call to pick up future
> omissions if another mode is added. (If we go for a single mode none of
> this matters)

Ah, makes sense!
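The trade-off discussed above, a catch-all BUG() versus covering every case, can be illustrated with a small sketch. Switching over an enum without a `default:` label lets compilers that support -Wswitch (GCC and Clang enable it with -Wall) warn when a newly added enumerator has no case, while a BUG() placed after the switch still catches unexpected values at run time. The enum, its names, and the BUG macro below are hypothetical stand-ins, not git's definitions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's BUG(): report the location and abort. */
#define BUG(msg) (fprintf(stderr, "BUG: %s:%d: %s\n", __FILE__, __LINE__, (msg)), abort())

/* Hypothetical enumeration of move-detection indentation modes. */
enum ws_mode {
	WS_MODE_INDENT_STRING,	/* compare the indentation string */
	WS_MODE_INDENT_WIDTH,	/* compare the visual indentation width */
};

static const char *ws_mode_name(enum ws_mode mode)
{
	/* no default label, so -Wswitch can flag a forgotten enumerator */
	switch (mode) {
	case WS_MODE_INDENT_STRING:
		return "allow-indentation-change";
	case WS_MODE_INDENT_WIDTH:
		return "allow-mixed-indentation-change";
	}
	/* only reached with a value outside the enum */
	BUG("unhandled ws_mode");
	return NULL;
}
```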
[RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
Hi Stefan

Thanks for all your comments on this, they've been really helpful.

On 25/09/2018 02:07, Stefan Beller wrote:
> On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood
> wrote:
>>
>> From: Phillip Wood
>>
>> This adds another mode for highlighting lines that have moved with an
>> indentation change. Unlike the existing
>> --color-moved-ws=allow-indentation-change setting this mode uses the
>> visible change in the indentation to group lines, rather than the
>> indentation string.
>
> Wow! Thanks for putting this RFC out.
> My original vision was to be useful to python users as well,
> which counts 1 tab as 8 spaces IIUC.
>
> The "visual" indentation you mention here sounds like
> a tab is counted as "up to the next position of (n-1) % 8",
> i.e. stop at positions 8, 16, 24... which would not be pythonic,
> but useful in e.g. our code base.

The docs for python2 state[1]

  Leading whitespace (spaces and tabs) at the beginning of a logical
  line is used to compute the indentation level of the line, which in
  turn is used to determine the grouping of statements.

  First, tabs are replaced (from left to right) by one to eight spaces
  such that the total number of characters up to and including the
  replacement is a multiple of eight (this is intended to be the same
  rule as used by Unix). The total number of spaces preceding the
  first non-blank character then determines the line’s indentation.
  Indentation cannot be split over multiple physical lines using
  backslashes; the whitespace up to the first backslash determines the
  indentation.

As I understand it that fits with the "visual" indentation implemented
by this patch. For python3 the docs add a third paragraph[2]

  Indentation is rejected as inconsistent if a source file mixes tabs
  and spaces in a way that makes the meaning dependent on the worth of
  a tab in spaces; a TabError is raised in that case.
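The Python 2 rule quoted above can be written as a short C function. This is a sketch: the name `visual_indent` and its signature are hypothetical, and the RFC itself takes the tab width from the `WS_TAB_WIDTH_MASK` bits of each line's flags rather than from a parameter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Visual indentation of a line: each space advances one column, each
 * tab advances to the next multiple of tab_width (8 in the Python
 * rule).  Counting stops at the first other character.
 */
static int visual_indent(const char *line, size_t len, int tab_width)
{
	int col = 0;
	size_t i;

	assert(tab_width > 0);
	for (i = 0; i < len; i++) {
		if (line[i] == ' ')
			col++;
		else if (line[i] == '\t')
			col += tab_width - col % tab_width;
		else
			break;
	}
	return col;
}
```

With a tab width of 8, a leading tab, eight spaces, and four spaces followed by a tab (a space-before-tab whitespace error) all give an indentation of 8, which is why comparing visual widths lets move detection ignore that kind of error.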
My impression is that people generally avoid mixing tabs and spaces in
python3 code, in which case I wonder if the "visual" indentation
combined with a suitable setting for core.whitespace to highlight
erroneous tabs/spaces would be enough. (I'm not a python programmer so
I could be completely wrong on that.) In any case, the more I think
about it the more convinced I am that having a move detection mode for
"pythonic" indentation is a mistake. If a line is added with dodgy
indentation then it is a problem whether or not it has been moved, so I
think this should be handled by the whitespace error highlighting. This
would allow a single mode for move detection with an indentation
change.

[1] https://docs.python.org/2.7/reference/lexical_analysis.html#indentation
[2] https://docs.python.org/3.7/reference/lexical_analysis.html#indentation

>> This means it works with files that use a mix of
>> tabs and spaces for indentation and can cope with whitespace errors
>> where there is a space before a tab
>
> Cool!
>
>> (it's the job of
>> --ws-error-highlight to deal with those errors, it should affect the
>> move detection).
>
> Not sure I understand this side note. So --ws-error-highlight can
> highlight them, but the move detection should *not*(?) be affected
> by the highlighted parts, or it should do things differently on
> whether --ws-error-highlight is given?

I just meant that the move detection should pretend the whitespace
errors do not exist.

>> It will also group the lines either
>> side of a blank line if their indentation change matches so short
>> lines followed by a blank line followed by more lines with the same
>> indentation change will be correctly highlighted.
>
> That sounds very useful (at least for my editor, that strips
> blank lines to be empty lines), but I would think this feature is
> worth its own commit/patch.
>
> I wonder how much this feature is orthogonal to the existing
> problem of detecting the moved indented blocks (existing
> allow-indentation-change vs the new feature discussed first
> above)

It only works if the blank lines get moved with the non-blank lines
around them; then it matches the normal moved behavior, I think. I'd
like to have it include blank context lines where the lines either side
have the same indentation change, but that is trickier to implement.

>>
>> This is a RFC as there are a number of questions about how to proceed
>> from here:
>> 1) Do we need a second option or should this implementation replace
>> --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>> has any advantages for some people. There seems to have been an
>> intention [1] to get it working with mixes of tabs and spaces but
>> nothing ever came of it.
>
> Oh, yeah, I was working on that, but dropped the ball.
>
> I am not sure what the best end goal is, or if there are many different
> modes that are useful to different target audiences.
> My own itch at the time was (de-/)in-dented code from refactoring
> patches for git.git and JGit (so Java, C, shell); and I think not hurting
>
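The blank-line grouping described in the patch can be sketched as follows. The RFC gives a pair of blank lines a delta of INT_MIN; here that sentinel is treated as never starting a block of its own, so a blank line joins the block around it. The function `assign_blocks` and the plain array of deltas are hypothetical simplifications, whereas the RFC computes deltas from `struct emitted_diff_symbol` entries. As noted above, this assumes the blank lines were moved along with their neighbors.

```c
#include <limits.h>
#include <stddef.h>

/* Sentinel for blank lines, mirroring the INT_MIN used in the RFC. */
#define BLANK_DELTA INT_MIN

/*
 * Hypothetical sketch: give each moved line a block number.  A new
 * block starts when a non-blank line's delta differs from the delta
 * of the previous non-blank line; blank lines never start a block.
 */
static void assign_blocks(const int *delta, size_t n, int *block)
{
	int cur = -1;		/* current block number */
	int have_prev = 0;	/* seen a non-blank line yet? */
	int prev = 0;		/* delta of the last non-blank line */
	size_t i;

	for (i = 0; i < n; i++) {
		if (delta[i] != BLANK_DELTA) {
			if (!have_prev || delta[i] != prev)
				cur++;
			prev = delta[i];
			have_prev = 1;
		}
		/* leading blank lines join the first block */
		block[i] = cur < 0 ? 0 : cur;
	}
}
```

For deltas {4, 4, blank, 4, -4} this yields blocks {0, 0, 0, 0, 1}: the blank line stays inside the +4 block, and the -4 line starts a new one, which zebra coloring would show in a different color.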
Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood wrote:
>
> From: Phillip Wood
>
> This adds another mode for highlighting lines that have moved with an
> indentation change. Unlike the existing
> --color-moved-ws=allow-indentation-change setting this mode uses the
> visible change in the indentation to group lines, rather than the
> indentation string.

Wow! Thanks for putting this RFC out.
My original vision was to be useful to python users as well,
which counts 1 tab as 8 spaces IIUC.

The "visual" indentation you mention here sounds like
a tab is counted as "up to the next position of (n-1) % 8",
i.e. stop at positions 8, 16, 24... which would not be pythonic,
but useful in e.g. our code base.

> This means it works with files that use a mix of
> tabs and spaces for indentation and can cope with whitespace errors
> where there is a space before a tab

Cool!

> (it's the job of
> --ws-error-highlight to deal with those errors, it should affect the
> move detection).

Not sure I understand this side note. So --ws-error-highlight can
highlight them, but the move detection should *not*(?) be affected
by the highlighted parts, or it should do things differently on
whether --ws-error-highlight is given?

> It will also group the lines either
> side of a blank line if their indentation change matches so short
> lines followed by a blank line followed by more lines with the same
> indentation change will be correctly highlighted.

That sounds very useful (at least for my editor, that strips
blank lines to be empty lines), but I would think this feature is
worth its own commit/patch.

I wonder how much this feature is orthogonal to the existing
problem of detecting the moved indented blocks (existing
allow-indentation-change vs the new feature discussed first
above)

>
> This is a RFC as there are a number of questions about how to proceed
> from here:
> 1) Do we need a second option or should this implementation replace
> --color-moved-ws=allow-indentation-change.
> I'm unclear if that mode
> has any advantages for some people. There seems to have been an
> intention [1] to get it working with mixes of tabs and spaces but
> nothing ever came of it.

Oh, yeah, I was working on that, but dropped the ball.

I am not sure what the best end goal is, or if there are many different
modes that are useful to different target audiences.
My own itch at the time was (de-/)in-dented code from refactoring
patches for git.git and JGit (so Java, C, shell); and I think not hurting
python would also be good.

Ignoring the mixture of ws seems like it would also cater to free text
or other more exotic languages.

What is your use case, what kind of content do you process that
this patch would help you?

I am not overly attached to the current implementation of
--color-moved-ws=allow-indentation-change, and I think Junio has
expressed the fear of "too many options" already in this problem space,
so if possible I would extend/replace the current option.

> 2) If we keep two options what should this option be called, the name
> is long and ambiguous at the moment - mixed could refer to mixed
> indentation length rather than a mix of tabs and spaces.

Let's first read the code to have an opinion, or re-state the question
from above ("What is this used for?"), as I could imagine one of the
modes could be "ws-pythonic" and allow for whitespace indentation that
would have the whole block count as indented by the same amount
(e.g. if you wrap a couple of functions in python in a class).
> +++ b/diff.c
> @@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
>  		else if (!strcmp(sb.buf, "ignore-all-space"))
>  			ret |= XDF_IGNORE_WHITESPACE;
>  		else if (!strcmp(sb.buf, "allow-indentation-change"))
> -			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
> +			ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +			 (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);

So this RFC lets "allow-indentation-change" override
"allow-mixed-indentation-change" and vice versa. That also
solves the issue of configuring one and giving the other
as a command line option. Nice.

>  	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
>  	    (ret & XDF_WHITESPACE_FLAGS))
>  		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
> +	else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
> +		 (ret & XDF_WHITESPACE_FLAGS))
> +		die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));

Do we want to define a bit mask for all indentation change options? e.g.

#define COLOR_MOVED_WS_INDENTATION_MASK \
	(COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE | \
	 COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE)

> @@ -763,11 +770,65 @@ struct moved_entry
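Stefan's mask suggestion can be sketched as below: one mask both implements the "last indentation mode given wins" override and reduces the conflict check to a single condition that also covers any future indentation mode. The bit values and the short macro names are hypothetical stand-ins for the real constants in git's headers.

```c
#include <assert.h>

/* Hypothetical bit values; the real constants live in git's headers. */
#define WS_FLAGS_DEMO                  (1 << 0)	/* stands in for XDF_WHITESPACE_FLAGS */
#define ALLOW_INDENTATION_CHANGE       (1 << 5)
#define ALLOW_MIXED_INDENTATION_CHANGE (1 << 6)

#define INDENTATION_MASK \
	(ALLOW_INDENTATION_CHANGE | ALLOW_MIXED_INDENTATION_CHANGE)

/* Select one indentation mode, clearing any other: the last one wins. */
static unsigned set_indent_mode(unsigned ret, unsigned mode)
{
	assert((mode & ~INDENTATION_MASK) == 0);
	return (ret & ~INDENTATION_MASK) | mode;
}

/* A single check covers every indentation mode. */
static int conflicts_with_ws_flags(unsigned ret)
{
	return (ret & INDENTATION_MASK) && (ret & WS_FLAGS_DEMO);
}
```

Setting `allow-indentation-change` from config and then `allow-mixed-indentation-change` on the command line leaves only the latter set, matching the override behavior of the RFC's two separate assignments.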
[RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
From: Phillip Wood

This adds another mode for highlighting lines that have moved with an
indentation change. Unlike the existing
--color-moved-ws=allow-indentation-change setting this mode uses the
visible change in the indentation to group lines, rather than the
indentation string. This means it works with files that use a mix of
tabs and spaces for indentation and can cope with whitespace errors
where there is a space before a tab (it's the job of
--ws-error-highlight to deal with those errors, it should affect the
move detection). It will also group the lines either side of a blank
line if their indentation change matches so short lines followed by a
blank line followed by more lines with the same indentation change will
be correctly highlighted.

This is a RFC as there are a number of questions about how to proceed
from here:
1) Do we need a second option or should this implementation replace
   --color-moved-ws=allow-indentation-change. I'm unclear if that mode
   has any advantages for some people. There seems to have been an
   intention [1] to get it working with mixes of tabs and spaces but
   nothing ever came of it.
2) If we keep two options what should this option be called, the name
   is long and ambiguous at the moment - mixed could refer to mixed
   indentation length rather than a mix of tabs and spaces.
3) Should we support whitespace flags with this mode?
   --ignore-space-at-eol and --ignore-cr-at-eol would be fairly simple
   to support and I can see a use for them, --ignore-all-space and
   --ignore-space-change would need some changes to xdiff to allow them
   to apply only after the indentation. I think --ignore-blank-lines
   would need a bit of work to get it working as well.
   (Note the existing mode does not support any of these flags either)

[1] https://public-inbox.org/git/CAGZ79kasAqE+=7ciVrdjoRdu0UFjVBr8Ma502nw+3hZL=eb...@mail.gmail.com/

Signed-off-by: Phillip Wood
---
 diff.c                     | 122 +
 diff.h                     |   1 +
 t/t4015-diff-whitespace.sh |  89 +++
 3 files changed, 199 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 0a652e28d4..45f33daa60 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
 		else if (!strcmp(sb.buf, "ignore-all-space"))
 			ret |= XDF_IGNORE_WHITESPACE;
 		else if (!strcmp(sb.buf, "allow-indentation-change"))
-			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
+			ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+			 (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
+		else if (!strcmp(sb.buf, "allow-mixed-indentation-change"))
+			ret = COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE |
+			 (ret & ~COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE);
 		else
 			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);

@@ -314,6 +318,9 @@ static int parse_color_moved_ws(const char *arg)
 	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
 	    (ret & XDF_WHITESPACE_FLAGS))
 		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+	else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
+		 (ret & XDF_WHITESPACE_FLAGS))
+		die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));

 	string_list_clear(, 0);

@@ -763,11 +770,65 @@ struct moved_entry {
  * comparision is longer than the second.
  */
 struct ws_delta {
-	char *string;
+	union {
+		int delta;
+		char *string;
+	};
 	unsigned int current_longer : 1;
+	unsigned int have_string : 1;
 };
 #define WS_DELTA_INIT { NULL, 0 }

+static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
+				  const struct emitted_diff_symbol *b,
+				  int *delta)
+{
+	unsigned int i = 0, j = 0;
+	int la, lb;
+	int ta = a->flags & WS_TAB_WIDTH_MASK;
+	int tb = b->flags & WS_TAB_WIDTH_MASK;
+	const char *sa = a->line;
+	const char *sb = b->line;
+
+	if (xdiff_is_blankline(sa, a->len, 0) &&
+	    xdiff_is_blankline(sb, b->len, 0)) {
+		*delta = INT_MIN;
+		return 1;
+	}
+
+	/* skip any \v \f \r at start of indentation */
+	while (sa[i] == '\f' || sa[i] == '\v' ||
+	       (sa[i] == '\r' && i < a->len - 1))
+		i++;
+	while (sb[j] == '\f' || sb[j] == '\v' ||
+	       (sb[j] == '\r' && j < b->len - 1))
+		j++;
+
+	for (la = 0; ; i++) {
+		if (sa[i] == ' ')
+			la++;
+		else if
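The patch turns `struct ws_delta` into a tagged union: `have_string` records whether `string` (the existing mode) or `delta` (the new mode) is the valid member. The sketch below shows how such a tag might be checked before comparing two deltas. `ws_delta_equal` is hypothetical, the struct is a simplified copy that relies on C11 anonymous unions, and the comparison rules are assumptions rather than the patch's logic.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified copy of the RFC's struct ws_delta (C11 anonymous union). */
struct ws_delta {
	union {
		int delta;	/* visual width change (new mode) */
		char *string;	/* indentation string (existing mode) */
	};
	unsigned int current_longer : 1;
	unsigned int have_string : 1;	/* tag: which union member is valid */
};

/* Hypothetical helper: do two lines carry the same indentation change? */
static bool ws_delta_equal(const struct ws_delta *x, const struct ws_delta *y)
{
	if (x->have_string != y->have_string)
		return false;	/* deltas from different modes never match */
	if (x->have_string)
		return x->current_longer == y->current_longer &&
		       !strcmp(x->string, y->string);
	return x->delta == y->delta;
}
```

Reading the tag first means the code never interprets an `int` delta as a string pointer or vice versa.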