Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-10 Thread Stefan Beller
On Wed, Oct 10, 2018 at 8:26 AM Phillip Wood  wrote:
>
> On 09/10/2018 22:10, Stefan Beller wrote:
> >> As I said above I've more or less come to the view that the correctness
> >> of pythonic indentation is orthogonal to move detection as it affects
> >> all additions, not just those that correspond to moved lines.
> >
> > Makes sense.
>
> Right so are you happy for me to re-roll with a single
> allow-indentation-change mode based on my RFC?

I am happy with that.


Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-10 Thread Phillip Wood

On 09/10/2018 22:10, Stefan Beller wrote:
>> As I said above I've more or less come to the view that the correctness
>> of pythonic indentation is orthogonal to move detection as it affects
>> all additions, not just those that correspond to moved lines.
>
> Makes sense.

Right so are you happy for me to re-roll with a single
allow-indentation-change mode based on my RFC?





>>> What is your use case, what kind of content do you process that
>>> this patch would help you?
>>
>> I wrote this because I was re-factoring some shell code that was using an
>> indentation step of four spaces but with tabs in the leading indentation
>> which the current mode does not handle.
>
> Ah that is good to know.
>
> I was thinking whether we want to generalize the move detection into a more
> generic "detect and fade out uninteresting things" and not just focus on white
> spaces (but these are most often the uninteresting things).
>
> Over the last year we had quite a couple of large refactorings that
> would have been helped by that:
> * For example the hash transition plan had a lot of patches that
>    were basically s/char *sha1/struct object oid/ or some variation thereof.
> * Introducing struct repository
>
> I used the word diff to look at those patches, which helped a lot, but
> maybe a mode that would allow me to mark this specific replacement
> uninteresting would be even better.
> Maybe this can be done as a piggyback on top of the move detection as
> a "move in place, but with uninteresting pattern". The problem of this
> is that the pattern needs to be accounted for when hashing the entries
> into the hashmaps, which is easy when doing white spaces only.

Yes, I like the idea. Yesterday I was looking at Alban's patches to
refactor the todo list handling for rebase -i and there are a lot of '.'
to '->' changes which weren't particularly interesting though at least
diff-highlight made it clear if that was the only change on a line.
Incidentally --color-moved was very useful for looking at that series.



>>>> +   if (a->s == DIFF_SYMBOL_PLUS)
>>>> +   *delta = la - lb;
>>>> +   else
>>>> +   *delta = lb - la;
>>>
>>> When writing the original feature I had reasons
>>> not to rely on the symbol, as you could have
>>> moved things from + to - (or the other way round)
>>> and added or removed indentation. That is what the
>>> `current_longer` is used for. But given that you only
>>> count here, we can have negative numbers, so it
>>> would work either way for adding or removing indentation.
>>>
>>> But then, why do we need to have a different sign
>>> depending on the sign of the line?
>>
>> The check means that we get the same delta whichever way round the lines
>> are compared. I think I added this because without it the highlighting
>> gets broken if there is an increase in indentation followed by an identical
>> decrease on the next line.
>
> But wouldn't we want to get that highlighted?
> I do not quite understand the scenario, yet. Are both indented
> and dedented part of the same block?


With --color-moved=zebra the indented lines and the de-indented lines 
should be different colors, without the test they both ended up in the 
same block.
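
To make the symmetry concrete, here is a small sketch of the sign check
quoted above. It is not the code from the patch: `struct line`,
`enum line_sign` and `indent_delta()` are made-up names, and it assumes
the two lines being compared are one added and one removed line whose
visual indentation widths are already known.

```c
/* Hypothetical stand-ins for the diff symbols used in the patch. */
enum line_sign { LINE_PLUS, LINE_MINUS };

struct line {
	enum line_sign sign;
	int indent;	/* visual width of the leading whitespace */
};

/*
 * Indentation change between two lines, normalised so that for one
 * added and one removed line the result is always "added indent minus
 * removed indent", regardless of which line is passed first.
 */
int indent_delta(const struct line *a, const struct line *b)
{
	if (a->sign == LINE_PLUS)
		return a->indent - b->indent;
	return b->indent - a->indent;
}
```

With a removed line at width 4 and an added line at width 8 the delta
is 4 in either argument order, so an indent followed by an equal
de-indent gives deltas of opposite sign and the two do not end up in
the same block.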


Best Wishes

Phillip



>>>> +   } else {
>>>> +   BUG("no color_moved_ws_allow_indentation_change set");
>>>
>>> Instead of the BUG here could we have a switch/case (or if/else)
>>> covering the complete space of delta->have_string instead?
>>> Then we would not leave a lingering bug in the code base.
>>
>> I'm not sure what you mean, we cover all the existing
>> color_moved_ws_handling values, I added the BUG() call to pick up future
>> omissions if another mode is added. (If we go for a single mode none of
>> this matters)
>
> Ah, makes sense!





Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-09 Thread Stefan Beller
> As I said above I've more or less come to the view that the correctness
> of pythonic indentation is orthogonal to move detection as it affects
> all additions, not just those that correspond to moved lines.

Makes sense.

> > What is your use case, what kind of content do you process that
> > this patch would help you?
>
> I wrote this because I was re-factoring some shell code that was using an
> indentation step of four spaces but with tabs in the leading indentation
> which the current mode does not handle.

Ah that is good to know.

I was thinking whether we want to generalize the move detection into a more
generic "detect and fade out uninteresting things" and not just focus on white
spaces (but these are most often the uninteresting things).

Over the last year we had quite a couple of large refactorings that
would have been helped by that:
* For example the hash transition plan had a lot of patches that
  were basically s/char *sha1/struct object oid/ or some variation thereof.
* Introducing struct repository

I used the word diff to look at those patches, which helped a lot, but
maybe a mode that would allow me to mark this specific replacement
uninteresting would be even better.
Maybe this can be done as a piggyback on top of the move detection as
a "move in place, but with uninteresting pattern". The problem of this
is that the pattern needs to be accounted for when hashing the entries
into the hashmaps, which is easy when doing white spaces only.
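
A rough illustration of why the whitespace case is easy: the
normalisation can be folded straight into the hash function, so lines
that differ only in whitespace land in the same bucket. The helper
below is a hypothetical sketch using FNV-1a, not how git hashes lines;
an arbitrary "uninteresting pattern" would need an equivalent
normalisation step applied before hashing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * FNV-1a hash of a line that skips spaces and tabs, so two lines that
 * differ only in whitespace produce the same value. Hypothetical
 * helper for illustration only.
 */
uint32_t hash_ignoring_ws(const char *line, size_t len)
{
	uint32_t hash = 2166136261u;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		if (line[i] == ' ' || line[i] == '\t')
			continue;
		hash ^= (unsigned char)line[i];
		hash *= 16777619u;	/* FNV prime */
	}
	return hash;
}
```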


> >> +   if (a->s == DIFF_SYMBOL_PLUS)
> >> +   *delta = la - lb;
> >> +   else
> >> +   *delta = lb - la;
> >
> > When writing the original feature I had reasons
> > not to rely on the symbol, as you could have
> > moved things from + to - (or the other way round)
> > and added or removed indentation. That is what the
> > `current_longer` is used for. But given that you only
> > count here, we can have negative numbers, so it
> > would work either way for adding or removing indentation.
> >
> > But then, why do we need to have a different sign
> > depending on the sign of the line?
>
> The check means that we get the same delta whichever way round the lines
> are compared. I think I added this because without it the highlighting
> gets broken if there is an increase in indentation followed by an identical
> decrease on the next line.

But wouldn't we want to get that highlighted?
I do not quite understand the scenario, yet. Are both indented
and dedented part of the same block?


> >
> >> +   } else {
> >> +   BUG("no color_moved_ws_allow_indentation_change set");
> >
> > Instead of the BUG here could we have a switch/case (or if/else)
> > covering the complete space of delta->have_string instead?
> > Then we would not leave a lingering bug in the code base.
>
> I'm not sure what you mean, we cover all the existing
> color_moved_ws_handling values, I added the BUG() call to pick up future
> omissions if another mode is added. (If we go for a single mode none of
> this matters)

Ah, makes sense!
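
For readers following along, the pattern being discussed looks roughly
like the sketch below: every known mode is handled explicitly and the
fall-through only fires if a mode is added later without updating the
dispatch. The enum and function names are invented for this example,
and fprintf()/abort() stand in for git's BUG() macro.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mode values; the real flags live in git's diff.h. */
enum indent_mode {
	INDENT_MODE_STRING,
	INDENT_MODE_VISUAL
};

const char *describe_indent_mode(enum indent_mode mode)
{
	switch (mode) {
	case INDENT_MODE_STRING:
		return "compare indentation strings";
	case INDENT_MODE_VISUAL:
		return "compare visual indentation width";
	}
	/* Only reached if a new mode is added without updating the switch. */
	fprintf(stderr, "BUG: unhandled indentation mode %d\n", (int)mode);
	abort();
}
```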


[RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-09 Thread Phillip Wood
Hi Stefan

Thanks for all your comments on this, they've been really helpful.

On 25/09/2018 02:07, Stefan Beller wrote:
> On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood wrote:
>>
>> From: Phillip Wood 
>>
>> This adds another mode for highlighting lines that have moved with an
>> indentation change. Unlike the existing
>> --color-moved-ws=allow-indentation-change setting this mode uses the
>> visible change in the indentation to group lines, rather than the
>> indentation string.
> 
> Wow! Thanks for putting this RFC out.
> My original vision was to be useful to python users as well,
> which counts 1 tab as 8 spaces IIUC.
> 
> The "visual" indentation you mention here sounds like
> a tab is counted as "up to the next position of (n-1) % 8",
> i.e. stop at positions 8, 16, 24... which would not be pythonic,
> but useful in e.g. our code base.

The docs for python2 state[1]

  Leading whitespace (spaces and tabs) at the beginning of a logical
  line is used to compute the indentation level of the line, which in
  turn is used to determine the grouping of statements.

  First, tabs are replaced (from left to right) by one to eight spaces
  such that the total number of characters up to and including the
  replacement is a multiple of eight (this is intended to be the same
  rule as used by Unix). The total number of spaces preceding the
  first non-blank character then determines the line’s
  indentation. Indentation cannot be split over multiple physical
  lines using backslashes; the whitespace up to the first backslash
  determines the indentation.

As I understand it that fits with the "visual" indentation implemented
by this patch.
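
As a sketch of the rule quoted above (each tab advances to the next
multiple of the tab width), the visual indentation can be computed as
follows. This is an illustration only, not the function from the
patch: `visual_indent_width()` is a made-up name and the tab width is
passed in explicitly rather than read from the whitespace flags.

```c
#include <stddef.h>

/*
 * Width in columns of the leading whitespace of `line`, expanding each
 * tab to the next multiple of `tab_width`. Hypothetical helper, not
 * taken from the patch.
 */
int visual_indent_width(const char *line, size_t len, int tab_width)
{
	int width = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (line[i] == ' ')
			width++;
		else if (line[i] == '\t')
			width += tab_width - (width % tab_width);
		else
			break;
	}
	return width;
}
```

With a tab width of 8, a single tab, four spaces followed by a tab, and
eight spaces all give a width of 8, which is why comparing widths
rather than indentation strings copes with mixed tabs and spaces.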

The python3 docs add a third paragraph[2]

  Indentation is rejected as inconsistent if a source file mixes tabs
  and spaces in a way that makes the meaning dependent on the worth of
  a tab in spaces; a TabError is raised in that case.

My impression is that people generally avoid mixing tabs and spaces in
python3 code, in which case I wonder if the "visual" indentation
combined with a suitable setting for core.whitespace to highlight
erroneous tabs/spaces would be enough. (I'm not a python programmer so I
could be completely wrong on that)

In any case the more I think about it the more convinced I am that
having a move detection mode for "pythonic" indentation is a mistake. If
a line is added with dodgy indentation then it is a problem whether or
not it has been moved so I think this should be handled by the
whitespace error highlighting. This would allow a single mode for move
detection with an indentation change.

[1] https://docs.python.org/2.7/reference/lexical_analysis.html#indentation
[2] https://docs.python.org/3.7/reference/lexical_analysis.html#indentation

>> This means it works with files that use a mix of
>> tabs and spaces for indentation and can cope with whitespace errors
>> where there is a space before a tab
> 
> Cool!
> 
>> (it's the job of
>> --ws-error-highlight to deal with those errors, it should affect the
>> move detection).
> 
> Not sure I understand this side note. So --ws-error-highlight can
> highlight them, but the move detection should *not*(?) be affected
> by the highlighted parts, or it should do things differently on
> whether  --ws-error-highlight is given?

I just meant that the move detection should pretend the whitespace
errors do not exist.

>> It will also group the lines either
>> side of a blank line if their indentation change matches so short
>> lines followed by a blank line followed by more lines with the same
>> indentation change will be correctly highlighted.
> 
> That sounds very useful (at least for my editor, that strips
> blank lines to be empty lines), but I would think this feature is
> worth its own commit/patch.
> 
> I wonder how much this feature is orthogonal to the existing
> problem of detecting the moved indented blocks (existing
> allow-indentation-change vs the new feature discussed first
> above)

It only works if the blank lines get moved with the non-blank lines
around it, then it matches the normal moved behavior I think. I'd like
to have it include blank context lines where the lines either side have
the same indentation change but that is trickier to implement.

>>
>> This is a RFC as there are a number of questions about how to proceed
>> from here:
>>  1) Do we need a second option or should this implementation replace
>> --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>> has any advantages for some people. There seems to have been an
>> intention [1] to get it working with mixes of tabs and spaces but
>> nothing ever came of it.
> 
> Oh, yeah, I was working on that, but dropped the ball.
> 
> I am not sure what the best end goal is, or if there are many different
> modes that are useful to different target audiences.
> My own itch at the time was (de-/)in-dented code from refactoring
> patches for git.git and JGit (so Java, C, shell); and I think not hurting
> 

Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-09-24 Thread Stefan Beller
On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> This adds another mode for highlighting lines that have moved with an
> indentation change. Unlike the existing
> --color-moved-ws=allow-indentation-change setting this mode uses the
> visible change in the indentation to group lines, rather than the
> indentation string.

Wow! Thanks for putting this RFC out.
My original vision was to be useful to python users as well,
which counts 1 tab as 8 spaces IIUC.

The "visual" indentation you mention here sounds like
a tab is counted as "up to the next position of (n-1) % 8",
i.e. stop at positions 8, 16, 24... which would not be pythonic,
but useful in e.g. our code base.

> This means it works with files that use a mix of
> tabs and spaces for indentation and can cope with whitespace errors
> where there is a space before a tab

Cool!

> (it's the job of
> --ws-error-highlight to deal with those errors, it should affect the
> move detection).

Not sure I understand this side note. So --ws-error-highlight can
highlight them, but the move detection should *not*(?) be affected
by the highlighted parts, or it should do things differently on
whether  --ws-error-highlight is given?

> It will also group the lines either
> side of a blank line if their indentation change matches so short
> lines followed by a blank line followed by more lines with the same
> indentation change will be correctly highlighted.

That sounds very useful (at least for my editor, that strips
blank lines to be empty lines), but I would think this feature is
worth its own commit/patch.

I wonder how much this feature is orthogonal to the existing
problem of detecting the moved indented blocks (existing
allow-indentation-change vs the new feature discussed first
above)

>
> This is a RFC as there are a number of questions about how to proceed
> from here:
>  1) Do we need a second option or should this implementation replace
> --color-moved-ws=allow-indentation-change. I'm unclear if that mode
> has any advantages for some people. There seems to have been an
> intention [1] to get it working with mixes of tabs and spaces but
> nothing ever came of it.

Oh, yeah, I was working on that, but dropped the ball.

I am not sure what the best end goal is, or if there are many different
modes that are useful to different target audiences.
My own itch at the time was (de-/)in-dented code from refactoring
patches for git.git and JGit (so Java, C, shell); and I think not hurting
python would also be good.

ignoring the mixture of ws seems like it would also cater free text or
other more exotic languages.

What is your use case, what kind of content do you process that
this patch would help you?

I am not overly attached to the current implementation of
 --color-moved-ws=allow-indentation-change,
and I think Junio has expressed the fear of "too many options"
already in this problem space, so if possible I would extend/replace
the current option.

>  2) If we keep two options what should this option be called, the name
> is long and ambiguous at the moment - mixed could refer to mixed
> indentation length rather than a mix of tabs and spaces.

Let's first read the code to have an opinion, or re-state the question
from above ("What is this used for?") as I could imagine one of the
modes could be "ws-pythonic" and allow for whitespace indentation
that would have the whole block count as an indented by the same
amount, (e.g. if you wrap a couple functions in python by a class).

> +++ b/diff.c
> @@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
> else if (!strcmp(sb.buf, "ignore-all-space"))
> ret |= XDF_IGNORE_WHITESPACE;
> else if (!strcmp(sb.buf, "allow-indentation-change"))
> -   ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
> +   ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +(ret & 
> ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);

So this RFC lets "allow-indentation-change" override
"allow-mixed-indentation-change" and vice versa. That
also solves the issue of configuring one and giving the other
as a command line option. Nice.

> if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
> (ret & XDF_WHITESPACE_FLAGS))
> die(_("color-moved-ws: allow-indentation-change cannot be 
> combined with other white space modes"));
> +   else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
> +(ret & XDF_WHITESPACE_FLAGS))
> +   die(_("color-moved-ws: allow-mixed-indentation-change cannot 
> be combined with other white space modes"));

Do we want to open a bit mask for all indentation change options? e.g.
#define COLOR_MOVED_WS_INDENTATION_MASK \
(COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE | \
 COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE)
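
A minimal sketch of how such a mask could be used, both for the
"last option wins" behaviour and for the conflict check. The bit values
and helper names below are hypothetical and shortened for the example;
the real flags are defined in git's diff.h.

```c
/* Hypothetical bit values; the real ones are defined in git's diff.h. */
#define WS_IGNORE_ALL_SPACE		(1u << 0)
#define ALLOW_INDENTATION_CHANGE	(1u << 1)
#define ALLOW_MIXED_INDENTATION_CHANGE	(1u << 2)
#define INDENTATION_MASK \
	(ALLOW_INDENTATION_CHANGE | ALLOW_MIXED_INDENTATION_CHANGE)

/* Select one indentation mode, clearing whichever one was set before. */
unsigned int set_indentation_mode(unsigned int flags, unsigned int mode)
{
	return (flags & ~INDENTATION_MASK) | mode;
}

/* Non-zero if any indentation mode is combined with another ws flag. */
int indentation_mode_conflicts(unsigned int flags)
{
	return (flags & INDENTATION_MASK) && (flags & WS_IGNORE_ALL_SPACE);
}
```

With a mask, adding a third indentation mode would only mean extending
INDENTATION_MASK rather than touching every branch of the parser.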

> @@ -763,11 +770,65 @@ struct moved_entry 

[RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-09-24 Thread Phillip Wood
From: Phillip Wood 

This adds another mode for highlighting lines that have moved with an
indentation change. Unlike the existing
--color-moved-ws=allow-indentation-change setting this mode uses the
visible change in the indentation to group lines, rather than the
indentation string. This means it works with files that use a mix of
tabs and spaces for indentation and can cope with whitespace errors
where there is a space before a tab (it's the job of
--ws-error-highlight to deal with those errors, it should affect the
move detection). It will also group the lines either
side of a blank line if their indentation change matches so short
lines followed by a blank line followed by more lines with the same
indentation change will be correctly highlighted.

This is a RFC as there are a number of questions about how to proceed
from here:
 1) Do we need a second option or should this implementation replace
--color-moved-ws=allow-indentation-change. I'm unclear if that mode
has any advantages for some people. There seems to have been an
intention [1] to get it working with mixes of tabs and spaces but
nothing ever came of it.
 2) If we keep two options what should this option be called, the name
is long and ambiguous at the moment - mixed could refer to mixed
indentation length rather than a mix of tabs and spaces.
 3) Should we support whitespace flags with this mode?
--ignore-space-at-eol and --ignore-cr-at-eol would be fairly simple
to support and I can see a use for them, --ignore-all-space and
--ignore-space-change would need some changes to xdiff to allow them
to apply only after the indentation. I think --ignore-blank-lines
would need a bit of work to get it working as well. (Note the
existing mode does not support any of these flags either)

[1] 
https://public-inbox.org/git/CAGZ79kasAqE+=7ciVrdjoRdu0UFjVBr8Ma502nw+3hZL=eb...@mail.gmail.com/

Signed-off-by: Phillip Wood 
---
 diff.c | 122 +
 diff.h |   1 +
 t/t4015-diff-whitespace.sh |  89 +++
 3 files changed, 199 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 0a652e28d4..45f33daa60 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
else if (!strcmp(sb.buf, "ignore-all-space"))
ret |= XDF_IGNORE_WHITESPACE;
else if (!strcmp(sb.buf, "allow-indentation-change"))
-   ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
+   ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+(ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
+   else if (!strcmp(sb.buf, "allow-mixed-indentation-change"))
+   ret = COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE |
+(ret & ~COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE);
else
error(_("ignoring unknown color-moved-ws mode '%s'"), 
sb.buf);
 
@@ -314,6 +318,9 @@ static int parse_color_moved_ws(const char *arg)
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
(ret & XDF_WHITESPACE_FLAGS))
die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
+(ret & XDF_WHITESPACE_FLAGS))
+   die(_("color-moved-ws: allow-mixed-indentation-change cannot be 
combined with other white space modes"));
 
string_list_clear(&l, 0);
 
@@ -763,11 +770,65 @@ struct moved_entry {
  * comparision is longer than the second.
  */
 struct ws_delta {
-   char *string;
+   union {
+   int delta;
+   char *string;
+   };
unsigned int current_longer : 1;
+   unsigned int have_string : 1;
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
+ const struct emitted_diff_symbol *b,
+ int *delta)
+{
+   unsigned int i = 0, j = 0;
+   int la, lb;
+   int ta = a->flags & WS_TAB_WIDTH_MASK;
+   int tb = b->flags & WS_TAB_WIDTH_MASK;
+   const char *sa = a->line;
+   const char *sb = b->line;
+
+   if (xdiff_is_blankline(sa, a->len, 0) &&
+   xdiff_is_blankline(sb, b->len, 0)) {
+   *delta = INT_MIN;
+   return 1;
+   }
+
+   /* skip any \v \f \r at start of indentation */
+   while (sa[i] == '\f' || sa[i] == '\v' ||
+  (sa[i] == '\r' && i < a->len - 1))
+   i++;
+   while (sb[j] == '\f' || sb[j] == '\v' ||
+  (sb[j] == '\r' && j < b->len - 1))
+   j++;
+
+   for (la = 0; ; i++) {
+   if (sa[i] == ' ')
+   la++;
+   else if