Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-17 Thread Phillip Wood

Hi Stefan

On 16/11/2018 21:47, Stefan Beller wrote:

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:


From: Phillip Wood 

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has they benefit that any whitespace errors do not


s/they/the/


Thanks, well spotted




interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
 lines each line was colored as a different block, since that commit
 they are uncolored.

Signed-off-by: Phillip Wood 
---

Notes:
 Changes since rfc:
  - It now replaces the existing implementation rather than adding a new
mode.
  - The indentation deltas are now calculated once for each line and
cached.
  - Optimized the whitespace delta comparison to compare string lengths
before comparing the actual strings.
  - Modified the calculation of tabs as suggested by Stefan.
  - Split out the blank line handling into a separate commit as suggest
by Stefan.
  - Fixed some comments pointed out by Stefan.

  diff.c | 130 +
  t/t4015-diff-whitespace.sh |  56 
  2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..89559293e7 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
 const char *line;
 int len;
 int flags;
+   int indent_off;
+   int indent_width;


So this is the trick how we compute the ws related
data only once per line. :-)

On the other hand, we do not save memory by disabling
the ws detection, but I guess that is not a problem for now.


I did wonder about that, but decided the increase was small compared to 
all the strings that are copied when creating the emitted_diff_symbols. 
If we want to save memory then we should stop struct 
emitted_diff_symbol() from carrying a copy of all the strings.



Would it make sense to have the new variables be
unsigned? (Also a comment on what they are, I
needed to read the code to understand off to be
offset into the line, where the content starts, and
width to be the visual width, as I did not recall
the RFC.)


Yes a comment would make sense. I don't think I have a strong preference 
for signed/unsigned, I can change it if you want.


Thanks for looking at these so promptly

Best Wishes

Phillip

+static void fill_es_indent_data(struct emitted_diff_symbol *es)
[...]



+   if (o->color_moved_ws_handling &
+   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+   fill_es_indent_data(>emitted_symbols->buf[n]);


Nice.

By reducing the information kept around to ints, we also do not need
to alloc/free
memory for each line.


+++ b/t/t4015-diff-whitespace.sh
@@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta 
incompatible with other space opti
 test_i18ngrep allow-indentation-change err
  '

+test_expect_success 'compare mixed whitespace delta across moved blocks' '


Looks good,

Thanks!
Stefan





Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-16 Thread Stefan Beller
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Currently diff --color-moved-ws=allow-indentation-change does not
> support indentation that contains a mix of tabs and spaces. For
> example in commit 546f70f377 ("convert.h: drop 'extern' from function
> declaration", 2018-06-30) the function parameters in the following
> lines are not colored as moved [1].
>
> -extern int stream_filter(struct stream_filter *,
> -const char *input, size_t *isize_p,
> -char *output, size_t *osize_p);
> +int stream_filter(struct stream_filter *,
> + const char *input, size_t *isize_p,
> + char *output, size_t *osize_p);
>
> This commit changes the way the indentation is handled to track the
> visual size of the indentation rather than the characters in the
> indentation. This has they benefit that any whitespace errors do not

s/they/the/

> interfer with the move detection (the whitespace errors will still be
> highlighted according to --ws-error-highlight). During the discussion
> of this feature there were concerns about the correct detection of
> indentation for python. However those concerns apply whether or not
> we're detecting moved lines so no attempt is made to determine if the
> indentation is 'pythonic'.
>
> [1] Note that before the commit to fix the erroneous coloring of moved
> lines each line was colored as a different block, since that commit
> they are uncolored.
>
> Signed-off-by: Phillip Wood 
> ---
>
> Notes:
> Changes since rfc:
>  - It now replaces the existing implementation rather than adding a new
>mode.
>  - The indentation deltas are now calculated once for each line and
>cached.
>  - Optimized the whitespace delta comparison to compare string lengths
>before comparing the actual strings.
>  - Modified the calculation of tabs as suggested by Stefan.
>  - Split out the blank line handling into a separate commit as suggest
>by Stefan.
>  - Fixed some comments pointed out by Stefan.
>
>  diff.c | 130 +
>  t/t4015-diff-whitespace.sh |  56 
>  2 files changed, 129 insertions(+), 57 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c378ce3daf..89559293e7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -750,6 +750,8 @@ struct emitted_diff_symbol {
> const char *line;
> int len;
> int flags;
> +   int indent_off;
> +   int indent_width;

So this is the trick how we compute the ws related
data only once per line. :-)

On the other hand, we do not save memory by disabling
the ws detection, but I guess that is not a problem for now.

Would it make sense to have the new variables be
unsigned? (Also a comment on what they are, I
needed to read the code to understand off to be
offset into the line, where the content starts, and
width to be the visual width, as I did not recall
the RFC.)

> +static void fill_es_indent_data(struct emitted_diff_symbol *es)
> [...]

> +   if (o->color_moved_ws_handling &
> +   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +   fill_es_indent_data(>emitted_symbols->buf[n]);

Nice.

By reducing the information kept around to ints, we also do not need
to alloc/free
memory for each line.

> +++ b/t/t4015-diff-whitespace.sh
> @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta 
> incompatible with other space opti
> test_i18ngrep allow-indentation-change err
>  '
>
> +test_expect_success 'compare mixed whitespace delta across moved blocks' '

Looks good,

Thanks!
Stefan


[PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has they benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
lines each line was colored as a different block, since that commit
they are uncolored.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since rfc:
 - It now replaces the existing implementation rather than adding a new
   mode.
 - The indentation deltas are now calculated once for each line and
   cached.
 - Optimized the whitespace delta comparison to compare string lengths
   before comparing the actual strings.
 - Modified the calculation of tabs as suggested by Stefan.
 - Split out the blank line handling into a separate commit as suggest
   by Stefan.
 - Fixed some comments pointed out by Stefan.

 diff.c | 130 +
 t/t4015-diff-whitespace.sh |  56 
 2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..89559293e7 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
const char *line;
int len;
int flags;
+   int indent_off;
+   int indent_width;
enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -780,44 +782,68 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-/**
- * The struct ws_delta holds white space differences between moved lines, i.e.
- * between '+' and '-' lines that have been detected to be a move.
- * The string contains the difference in leading white spaces, before the
- * rest of the line is compared using the white space config for move
- * coloring. The current_longer indicates if the first string in the
- * comparision is longer than the second.
- */
-struct ws_delta {
-   char *string;
-   unsigned int current_longer : 1;
-};
-#define WS_DELTA_INIT { NULL, 0 }
-
 struct moved_block {
struct moved_entry *match;
-   struct ws_delta wsd;
+   int wsd; /* The whitespace delta of this block */
 };
 
 static void moved_block_clear(struct moved_block *b)
 {
-   FREE_AND_NULL(b->wsd.string);
-   b->match = NULL;
+   memset(b, 0, sizeof(*b));
+}
+
+static void fill_es_indent_data(struct emitted_diff_symbol *es)
+{
+   unsigned int off = 0;
+   int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
+   const char *s = es->line;
+   const int len = es->len;
+
+   /* skip any \v \f \r at start of indentation */
+   while (s[off] == '\f' || s[off] == '\v' ||
+  (s[off] == '\r' && off < len - 1))
+   off++;
+
+   /* calculate the visual width of indentation */
+   while(1) {
+   if (s[off] == ' ') {
+   width++;
+   off++;
+   } else if (s[off] == '\t') {
+   width += tab_width - (width % tab_width);
+   while (s[++off] == '\t')
+   width += tab_width;
+   } else {
+   break;
+   }
+   }
+
+   es->indent_off = off;
+   es->indent_width = width;
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-const struct emitted_diff_symbol *b,
-struct ws_delta *out)
+   const struct emitted_diff_symbol *b,
+   int *out)
 {
-   const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
-   const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
-   int d = longer->len - shorter->len;
+   int a_len = a->len,
+   b_len = b->len,
+   a_off = a->indent_off,
+   a_width = a->indent_width,
+   b_off =