Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

2017-06-29 Thread Junio C Hamano
On Thu, Jun 29, 2017 at 2:01 PM, Stefan Beller  wrote:
> On Tue, Jun 27, 2017 at 10:06 PM, Junio C Hamano  wrote:
>> Junio C Hamano  writes:
>> This probably deserves a bit more illustration of how I envision the
>> code should evolve.
>> ...
> It turns out that the code here is rather a very loose proposal,
> not to be copied literally,

Yeah, whenever I say "illustration", do not take it as anything more than
scribbling on the back of an envelope, whose primary purpose is to show
some idea (in this case, the central idea is that a helper function that lets
you ask "what's the next byte that matters? tell me also when you run out"
is all you need to do an in-place comparison with various "ignore" modes;
by the way, that applies also to ignore-case comparison if there is a need
to support it).

The implementation detail to support that central idea is left
to the readers. ;-)

Thanks.


Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

2017-06-29 Thread Stefan Beller
On Tue, Jun 27, 2017 at 10:06 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Looking at the implementation of get_ws_cleaned_string() that is the
>> workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong
>> things for various "ignore whitespace" options (i.e. there is only
>> one implementation, while "git diff" family takes things like
>> "ignore space change", "ignore all whitespace", etc.), though.
>
> This probably deserves a bit more illustration of how I envision the
> code should evolve.
>
> In the longer term, I would prefer to see emitted_symbol_cmp_no_ws()
> to go and instead emitted_symbol_cmp() to take the diff options so
> that it can change the behaviour of the comparison function based on
> the -w/-b/--ignore-space-at-eol/etc. settings.  And compare two strings
> in place.
>
> For that, you may need a helper function that takes a pointer to a
> character pointer, picks the next byte that matters while advancing
> the pointer, and returns that byte.  The emitted_symbol_cmp(a, b)
> which is not used for real comparison (i.e. ordering to see if a
> sorts earlier than b) but for equivalence (i.e. considering various
> whitespace-ignoring settings, does a and b matfch?) may become
> something like:
>
> int
> emitted_symbol_eqv(struct emitted_diff_symbol *a,
>struct emitted_diff_symbol *b,
>const void *keydata) {
> struct diff_options *diffopt = keydata;
> const char *ap = a->line;
> const char *bp = b->line;
>
> while (1) {
> int ca, cb;
> ca = next_byte(&ap, diffopt);
> cb = next_byte(&bp, diffopt);
> if (ca != cb)
> return 1; /* differs */
> if (!ca)
> return 0;
> };
> }
>
> where next_byte() may look like:
>
> static int
> next_byte(const char **cp, struct diff_options *diffopt)
> {
> int retval;
>
> again:
> retval = **cp;
> if (!retval)
> return retval; /* end of string */
> if (!isspace(retval)) {
> (*cp)++; /* advance */
> return retval;
> }
>
> switch (ignore_space_type(diffopt)) {
> case NOT_IGNORING:
> (*cp)++; /* advance */
> return retval;
> case IGNORE_SPACE_CHANGE:
> while (**cp && isspace(**cp))
> (*cp)++; /* squash consecutive spaces */
> return ' '; /* normalize spaces with a SP */
> case IGNORE_ALL_SPACE:
> (*cp)++; /* advance */
> goto again;
> ... other cases here ...
> }
> }
>
>

I just rebased the diff series on top of the hashmap series and now
I want to implement the compare function based on the new data
feature. So I thought I might start off this proposal, as you seem to
have good taste for how to approach problems.

It turns out that the code here is rather a very loose proposal,
not to be copied literally, because of these constraints:
* When dealing with user content, we do not have C-strings,
  but memory + length, such that next_byte also needs to be
  aware of the end pointer.
* The ignore_space_type() as well as these constants do not exist
  as-is, I think the cleanest for now is to parse diffopt->xdl_opts via
  DIFF_XDL_TST

Thanks,
Stefan


Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

2017-06-28 Thread Junio C Hamano
Stefan Beller  writes:

> Originally I wanted to do that, see prep work in [1], but Jeff explained that
> the additional pointer in the compare function is **not** supposed to be
> a additional payload (such as the diff options specifying the white space
> options.)
>
> [1] https://public-inbox.org/git/20170512200244.25245-1-sbel...@google.com/

Ah, yes, keydata is a wrong thing to use to give additional hint to
the eqv function.  We do need to fix the function signature of
hashmap_cmp_fn.  It is common for us to want to use a single
implementation of an eqv function that can be configured to behave
slightly differently but the customization will stay the same
throughout the lifetime of a hashmap.  IOW, hashmap_init() needs to
be able to take a pointer to such a configuration data
(e.g. diff_options), and then the comparison made between two
elements that hash to the same bucket needs to be given not just the
two elements but also that configuration data to tweak the function.



Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

2017-06-28 Thread Stefan Beller
On Tue, Jun 27, 2017 at 10:06 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Looking at the implementation of get_ws_cleaned_string() that is the
>> workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong
>> things for various "ignore whitespace" options (i.e. there is only
>> one implementation, while "git diff" family takes things like
>> "ignore space change", "ignore all whitespace", etc.), though.
>
> This probably deserves a bit more illustration of how I envision the
> code should evolve.
>
> In the longer term, I would prefer to see emitted_symbol_cmp_no_ws()
> to go and instead emitted_symbol_cmp() to take the diff options
> so
> that it can change the behaviour of the comparison function based on
> the -w/-b/--ignore-space-at-eol/etc. settings.  And compare two strings
> in place.

ok, in-place is no problem. But passing down the diff options into the
compare function is a bit hard.

Originally I wanted to do that, see prep work in [1], but Jeff explained that
the additional pointer in the compare function is **not** supposed to be
a additional payload (such as the diff options specifying the white space
options.)

[1] https://public-inbox.org/git/20170512200244.25245-1-sbel...@google.com/

However as we no settled on the struct emitted_diff_symbol,
that has a 'flags' field in there, which ought to contain everything we
know about whitespace settings, we should be able to do that from there.

> emitted_symbol_eqv(struct emitted_diff_symbol *a,
>struct emitted_diff_symbol *b,
>const void *keydata) {
> struct diff_options *diffopt = keydata;

The prep work mentioned, would allow for this, as keydata
would be passed through as-is in all calls of the hashmap API,
such that the user can decide if they use it as the actual 'keydata'
or rather as an additional payload.

Thanks for outlining the idea in code, but maybe we need to
reconsider the hashmap API before that.

By not considering the change in the hashmap API, the current
implementation tried to get away by having different compare functions.

Thanks,
Stefan


Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

2017-06-27 Thread Junio C Hamano
Junio C Hamano  writes:

> Looking at the implementation of get_ws_cleaned_string() that is the
> workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong
> things for various "ignore whitespace" options (i.e. there is only
> one implementation, while "git diff" family takes things like
> "ignore space change", "ignore all whitespace", etc.), though.

This probably deserves a bit more illustration of how I envision the
code should evolve.

In the longer term, I would prefer to see emitted_symbol_cmp_no_ws()
to go and instead emitted_symbol_cmp() to take the diff options so
that it can change the behaviour of the comparison function based on
the -w/-b/--ignore-space-at-eol/etc. settings.  And compare two strings
in place.

For that, you may need a helper function that takes a pointer to a
character pointer, picks the next byte that matters while advancing
the pointer, and returns that byte.  The emitted_symbol_cmp(a, b)
which is not used for real comparison (i.e. ordering to see if a
sorts earlier than b) but for equivalence (i.e. considering various
whitespace-ignoring settings, does a and b matfch?) may become
something like:

int
emitted_symbol_eqv(struct emitted_diff_symbol *a,
   struct emitted_diff_symbol *b,
   const void *keydata) {
struct diff_options *diffopt = keydata;
const char *ap = a->line;
const char *bp = b->line;

while (1) {
int ca, cb;
ca = next_byte(&ap, diffopt);
cb = next_byte(&bp, diffopt);
if (ca != cb)
return 1; /* differs */
if (!ca)
return 0;
};
}   

where next_byte() may look like:

static int
next_byte(const char **cp, struct diff_options *diffopt)
{  
int retval;

again:
retval = **cp;
if (!retval)
return retval; /* end of string */
if (!isspace(retval)) {
(*cp)++; /* advance */
return retval;
}

switch (ignore_space_type(diffopt)) {
case NOT_IGNORING:
(*cp)++; /* advance */
return retval;
case IGNORE_SPACE_CHANGE:
while (**cp && isspace(**cp))
(*cp)++; /* squash consecutive spaces */
return ' '; /* normalize spaces with a SP */
case IGNORE_ALL_SPACE:
(*cp)++; /* advance */
goto again;
... other cases here ...
}
}




Re: [PATCH 6/6] diff.c: detect blocks despite whitespace changes

2017-06-27 Thread Junio C Hamano
Stefan Beller  writes:

> Reuse the compare function from the hash map instead of calling the
> compare function directly. Then we pick the correct compare function
> when told to compare ignoring white space.
>
> Signed-off-by: Stefan Beller 
> ---
>  diff.c |  3 +--
>  t/t4015-diff-whitespace.sh | 65 
> ++
>  2 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 1d93e98e3a..4bcf938e3a 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -903,8 +903,7 @@ static void mark_color_as_moved(struct diff_options *o,
>   struct moved_entry *p = pmb[i];
>   struct moved_entry *pnext = (p && p->next_line) ?
>   p->next_line : NULL;
> - if (pnext &&
> - !emitted_symbol_cmp(pnext->es, l, o)) {
> + if (pnext && !hm->cmpfn(pnext, match, NULL)) {
>   pmb[i] = p->next_line;
>   } else {
>   pmb[i] = NULL;

I presume that this makes the use of emitted_symbol_cmp() vs
emitted_symbol_cmp_no_ws() consistent with the remainder of the
file.

Looking at the implementation of get_ws_cleaned_string() that is the
workhorse of emitted_symbol_cmp_no_ws(), it seems to be doing wrong
things for various "ignore whitespace" options (i.e. there is only
one implementation, while "git diff" family takes things like
"ignore space change", "ignore all whitespace", etc.), though.