Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-05-03 Thread Jacob Keller
On Mon, May 2, 2016 at 11:02 AM, Jeff King  wrote:
> On Mon, May 02, 2016 at 10:40:28AM -0700, Junio C Hamano wrote:
>
>> "Keller, Jacob E"  writes:
>>
>> > True. I think the chances that it needs such a thing are quite minor,
>> > and if an undocumented knob gets exposed it would have to become
>> > documented and maintained, so I'd prefer to avoid it. Given that the
>> > risk is pretty small I think that's ok.
>>
>> OK, then let's do only the "documentation" part.
>>
>> -- >8 --
>> Subject: [PATCH] diff: undocument the compaction heuristic knobs for 
>> experimentation
>>
>> It seems that people around here are all happy with the updated
>> heuristics used to decide where the hunks are separated.  Let's keep
>> that as the default.  Even though we do not expect too much trouble
>> from the difference between the old and the new algorithms, just in
>> case let's leave the implementation of the knobs to turn it off for
>> emergencies.  There is no longer need for documenting them, though.
>
> I agree with this reasoning. Thanks.
>
> -Peff

I think I agree too.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-05-02 Thread Jeff King
On Mon, May 02, 2016 at 10:40:28AM -0700, Junio C Hamano wrote:

> "Keller, Jacob E"  writes:
> 
> > True. I think the chances that it needs such a thing are quite minor,
> > and if an undocumented knob gets exposed it would have to become
> > documented and maintained, so I'd prefer to avoid it. Given that the
> > risk is pretty small I think that's ok.
> 
> OK, then let's do only the "documentation" part.
> 
> -- >8 --
> Subject: [PATCH] diff: undocument the compaction heuristic knobs for 
> experimentation
> 
> It seems that people around here are all happy with the updated
> heuristics used to decide where the hunks are separated.  Let's keep
> that as the default.  Even though we do not expect too much trouble
> from the difference between the old and the new algorithms, just in
> case let's leave the implementation of the knobs to turn it off for
> emergencies.  There is no longer need for documenting them, though.

I agree with this reasoning. Thanks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-05-02 Thread Stefan Beller
On Mon, May 2, 2016 at 10:40 AM, Junio C Hamano  wrote:
> "Keller, Jacob E"  writes:
>
>> True. I think the chances that it needs such a thing are quite minor,
>> and if an undocumented knob gets exposed it would have to become
>> documented and maintained, so I'd prefer to avoid it. Given that the
>> risk is pretty small I think that's ok.
>
> OK, then let's do only the "documentation" part.

The patch below looks good to me.

Thanks,
Stefan

>
> -- >8 --
> Subject: [PATCH] diff: undocument the compaction heuristic knobs for 
> experimentation
>
> It seems that people around here are all happy with the updated
> heuristics used to decide where the hunks are separated.  Let's keep
> that as the default.  Even though we do not expect too much trouble
> from the difference between the old and the new algorithms, just in
> case let's leave the implementation of the knobs to turn it off for
> emergencies.  There is no longer need for documenting them, though.
>
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/diff-config.txt  | 5 -
>  Documentation/diff-options.txt | 6 --
>  2 files changed, 11 deletions(-)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index 9bf3e92..6eaa452 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -166,11 +166,6 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> -diff.compactionHeuristic::
> -   Set this option to enable an experimental heuristic that
> -   shifts the hunk boundary in an attempt to make the resulting
> -   patch easier to read.
> -
>  diff.algorithm::
> Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index b513023..3ad6404 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,12 +63,6 @@ ifndef::git-format-patch[]
> Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> ---compaction-heuristic::
> ---no-compaction-heuristic::
> -   These are to help debugging and tuning an experimental
> -   heuristic that shifts the hunk boundary in an attempt to
> -   make the resulting patch easier to read.
> -
>  --minimal::
> Spend extra time to make sure the smallest possible
> diff is produced.
> --
> 2.8.2-458-gacc1066
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-05-02 Thread Junio C Hamano
"Keller, Jacob E"  writes:

> True. I think the chances that it needs such a thing are quite minor,
> and if an undocumented knob gets exposed it would have to become
> documented and maintained, so I'd prefer to avoid it. Given that the
> risk is pretty small I think that's ok.

OK, then let's do only the "documentation" part.

-- >8 --
Subject: [PATCH] diff: undocument the compaction heuristic knobs for 
experimentation

It seems that people around here are all happy with the updated
heuristics used to decide where the hunks are separated.  Let's keep
that as the default.  Even though we do not expect too much trouble
from the difference between the old and the new algorithms, just in
case let's leave the implementation of the knobs to turn it off for
emergencies.  There is no longer need for documenting them, though.

Signed-off-by: Junio C Hamano 
---
 Documentation/diff-config.txt  | 5 -
 Documentation/diff-options.txt | 6 --
 2 files changed, 11 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 9bf3e92..6eaa452 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,11 +166,6 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
-diff.compactionHeuristic::
-   Set this option to enable an experimental heuristic that
-   shifts the hunk boundary in an attempt to make the resulting
-   patch easier to read.
-
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b513023..3ad6404 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,12 +63,6 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
---compaction-heuristic::
---no-compaction-heuristic::
-   These are to help debugging and tuning an experimental
-   heuristic that shifts the hunk boundary in an attempt to
-   make the resulting patch easier to read.
-
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
-- 
2.8.2-458-gacc1066

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Jeff King
On Fri, Apr 29, 2016 at 03:35:54PM -0700, Stefan Beller wrote:

> > -- >8 --
> > diff: enable "compaction heuristics" and lose experimentation knob
> >
> > It seems that the new "find a good hunk boundary by locating a blank
> > line" heuristics gives much more pleasant result without much
> > noticeable downsides.  Let's make it the new algorithm for real,
> > without the opt-out knob we added while experimenting with it.
> 
> I would remove the opt-out knob much later in the game, i.e.
> 
> 1) make a patch that removes the documentation only
>before the next release (i.e. before 2.9)
> 2) make a patch to remove the actual (unlabeled) knobs,
> merge into master before 2.10 (i.e. just after the 2.9 release)

Yeah, I think it might be a good idea to keep some sort of undocumented
safety valve in the release, at least for a cycle or two. The heuristic
won't _really_ see wide use until it is in a released version of git, as
much as we would like it to be otherwise.

So I am anticipating a possible conversation where somebody reports that
the new output looks bad, and it would be nice to say "try it with this
flag (or environment variable, or whatever) and see if that looks
better".  And then based on that conversation we can decide what the
right next is (a real user-visible flag, or reversion, or deciding the
user's case isn't worth it). Or maybe if we're lucky that conversation
never happens.

The "whatever" in the instructions can obviously be "build with this
patch" or "try with an older version of git", but we're a lot more
likely to get a good response if it's easy for the user to do.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Keller, Jacob E
On Fri, 2016-04-29 at 15:44 -0700, Stefan Beller wrote:
> > 
> > Currently it's an "opt in" knob, so this doesn't make sense to me.
> +static int diff_compaction_heuristic = 1;
> 

Oops didn't know we'd made it default at some point. (all my versions
had it disabled by default)

> It's rather an opt-out knob going by the current
> origin/jk/diff-compact-heuristic
> 

Yea in that case, we could keep it.

> 
> > 
> > If
> > we remove the entire knob as is, we can always (fairly easily) add
> > it
> > back. I would keep the code inside xdiff as a knob, but set it to
> > enable default so that the user config has no knob at the top level
> > but
> > the xdiff machinery does (this making a "disable" be relatively
> > small
> > patch).
> When writing my reply, I thought about people using Git from a binary
> distribution with little to no admin rights. They want to have an
> emergency
> knob to disable this thing, but cannot patch/recompile Git.
> 
> If you can patch and compile your version of Git, then reverting is
> easy, so
> in that case Junios patch looks good to me.
> 
> Thanks,
> Stefan

True. I think the chances that it needs such a thing are quite minor,
and if an undocumented knob gets exposed it would have to become
documented and maintained, so I'd prefer to avoid it. Given that the
risk is pretty small I think that's ok.

Thanks,
Jake

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Stefan Beller
> Currently it's an "opt in" knob, so this doesn't make sense to me.

+static int diff_compaction_heuristic = 1;

It's rather an opt-out knob going by the current
origin/jk/diff-compact-heuristic


> If
> we remove the entire knob as is, we can always (fairly easily) add it
> back. I would keep the code inside xdiff as a knob, but set it to
> enable default so that the user config has no knob at the top level but
> the xdiff machinery does (this making a "disable" be relatively small
> patch).

When writing my reply, I thought about people using Git from a binary
distribution with little to no admin rights. They want to have an emergency
knob to disable this thing, but cannot patch/recompile Git.

If you can patch and compile your version of Git, then reverting is easy, so
in that case Junios patch looks good to me.

Thanks,
Stefan

>
> Thanks,
> Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Keller, Jacob E
On Fri, 2016-04-29 at 15:35 -0700, Stefan Beller wrote:
> On Fri, Apr 29, 2016 at 3:18 PM, Junio C Hamano 
> wrote:
> > 
> > Jacob Keller  writes:
> > 
> > > 
> > > On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano  > > m> wrote:
> > > > 
> > > > Jeff King  writes:
> > > > 
> > > > > 
> > > > > ... Having the two directly next to each other reads
> > > > > better to me. This is a pretty unusual diff, though, in that
> > > > > it did
> > > > > change the surrounding whitespace (and if you look further in
> > > > > the diff,
> > > > > the identical change is made elsewhere _without_ touching the
> > > > > whitespace). So this is kind of an anomaly. And IMHO the
> > > > > weirdness here
> > > > > is outweighed by the vast number of improvements elsewhere.
> > > > So... is everybody happy with the result and now we can drop
> > > > the
> > > > tweaking knob added to help experimentation before merging the
> > > > result to 'master'?
> > > > 
> > > > I am pretty happy with the end result myself.
> > > I am very happy with it. I haven't had any issues, and I think
> > > we'll
> > > find better traction by enabling it at this point and seeing
> > > when/if
> > > someone complains.
> > > 
> > > I think for most it won't be noticed and for those that do it
> > > will
> > > likely be positive.
> > I am doing this only to prepare in case we have a concensus,
> > i.e. this is not to declare that I do not care what other people
> > say.  Here is a patch to remove the experimentation knob.
> > 
> > Let's say we keep this patch out of tree for now and keep the topic
> > in 'next' so that people can further play with it for several more
> > weeks, and then apply this on top and merge the result to 'master'
> > early in the next cycle.
> > 
> > -- >8 --
> > diff: enable "compaction heuristics" and lose experimentation knob
> > 
> > It seems that the new "find a good hunk boundary by locating a
> > blank
> > line" heuristics gives much more pleasant result without much
> > noticeable downsides.  Let's make it the new algorithm for real,
> > without the opt-out knob we added while experimenting with it.
> I would remove the opt-out knob much later in the game, i.e.
> 
> 1) make a patch that removes the documentation only
>    before the next release (i.e. before 2.9)
> 2) make a patch to remove the actual (unlabeled) knobs,
> merge into master before 2.10 (i.e. just after the 2.9
> release)
> 
> Then we get the most of the community to test it with the 2.9 release
> and still have an emergency knob in case some major headaches
> show up. After one release cycle we'll be much more confident
> about its usage and its short comings and do not need the
> emergency turn off. If the community doesn't like it for some reason
> we can document it and debate the default setting?
> 
> I agree we want the knob gone eventually.
> Making it an undocumented feature is as good as that from
> a users point of view?
> 

Currently it's an "opt in" knob, so this doesn't make sense to me. If
we remove the entire knob as is, we can always (fairly easily) add it
back. I would keep the code inside xdiff as a knob, but set it to
enable default so that the user config has no knob at the top level but
the xdiff machinery does (this making a "disable" be relatively small
patch).

Thanks,
Jake

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Stefan Beller
On Fri, Apr 29, 2016 at 3:18 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano  wrote:
>>> Jeff King  writes:
>>>
 ... Having the two directly next to each other reads
 better to me. This is a pretty unusual diff, though, in that it did
 change the surrounding whitespace (and if you look further in the diff,
 the identical change is made elsewhere _without_ touching the
 whitespace). So this is kind of an anomaly. And IMHO the weirdness here
 is outweighed by the vast number of improvements elsewhere.
>>>
>>> So... is everybody happy with the result and now we can drop the
>>> tweaking knob added to help experimentation before merging the
>>> result to 'master'?
>>>
>>> I am pretty happy with the end result myself.
>>
>> I am very happy with it. I haven't had any issues, and I think we'll
>> find better traction by enabling it at this point and seeing when/if
>> someone complains.
>>
>> I think for most it won't be noticed and for those that do it will
>> likely be positive.
>
> I am doing this only to prepare in case we have a concensus,
> i.e. this is not to declare that I do not care what other people
> say.  Here is a patch to remove the experimentation knob.
>
> Let's say we keep this patch out of tree for now and keep the topic
> in 'next' so that people can further play with it for several more
> weeks, and then apply this on top and merge the result to 'master'
> early in the next cycle.
>
> -- >8 --
> diff: enable "compaction heuristics" and lose experimentation knob
>
> It seems that the new "find a good hunk boundary by locating a blank
> line" heuristics gives much more pleasant result without much
> noticeable downsides.  Let's make it the new algorithm for real,
> without the opt-out knob we added while experimenting with it.

I would remove the opt-out knob much later in the game, i.e.

1) make a patch that removes the documentation only
   before the next release (i.e. before 2.9)
2) make a patch to remove the actual (unlabeled) knobs,
merge into master before 2.10 (i.e. just after the 2.9 release)

Then we get the most of the community to test it with the 2.9 release
and still have an emergency knob in case some major headaches
show up. After one release cycle we'll be much more confident
about its usage and its short comings and do not need the
emergency turn off. If the community doesn't like it for some reason
we can document it and debate the default setting?

I agree we want the knob gone eventually.
Making it an undocumented feature is as good as that from
a users point of view?

>
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/diff-config.txt  |  5 -
>  Documentation/diff-options.txt |  6 --
>  diff.c | 11 ---
>  xdiff/xdiff.h  |  2 --
>  xdiff/xdiffi.c |  2 +-
>  5 files changed, 1 insertion(+), 25 deletions(-)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index 9bf3e92..6eaa452 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -166,11 +166,6 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> -diff.compactionHeuristic::
> -   Set this option to enable an experimental heuristic that
> -   shifts the hunk boundary in an attempt to make the resulting
> -   patch easier to read.
> -
>  diff.algorithm::
> Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index b513023..3ad6404 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,12 +63,6 @@ ifndef::git-format-patch[]
> Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> ---compaction-heuristic::
> ---no-compaction-heuristic::
> -   These are to help debugging and tuning an experimental
> -   heuristic that shifts the hunk boundary in an attempt to
> -   make the resulting patch easier to read.
> -
>  --minimal::
> Spend extra time to make sure the smallest possible
> diff is produced.
> diff --git a/diff.c b/diff.c
> index 05ca3ce..f62b7f7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,7 +25,6 @@
>  #endif
>
>  static int diff_detect_rename_default;
> -static int diff_compaction_heuristic = 1;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -184,10 +183,6 @@ int git_diff_ui_config(const char *var, const char 
> *value, void *cb)
> diff_detect_rename_default = git_config_rename(var, value);
> return 0;
> }
> -   if (!strcmp(var, "diff.compactionheuristic")) {
> -   diff_compaction_heuristic = git_config_bool(var, value);
> -   return 0;
> -   }
> 

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Junio C Hamano
Jacob Keller  writes:

> On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano  wrote:
>> Jeff King  writes:
>>
>>> ... Having the two directly next to each other reads
>>> better to me. This is a pretty unusual diff, though, in that it did
>>> change the surrounding whitespace (and if you look further in the diff,
>>> the identical change is made elsewhere _without_ touching the
>>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
>>> is outweighed by the vast number of improvements elsewhere.
>>
>> So... is everybody happy with the result and now we can drop the
>> tweaking knob added to help experimentation before merging the
>> result to 'master'?
>>
>> I am pretty happy with the end result myself.
>
> I am very happy with it. I haven't had any issues, and I think we'll
> find better traction by enabling it at this point and seeing when/if
> someone complains.
>
> I think for most it won't be noticed and for those that do it will
> likely be positive.

I am doing this only to prepare in case we have a concensus,
i.e. this is not to declare that I do not care what other people
say.  Here is a patch to remove the experimentation knob.

Let's say we keep this patch out of tree for now and keep the topic
in 'next' so that people can further play with it for several more
weeks, and then apply this on top and merge the result to 'master'
early in the next cycle.

-- >8 --
diff: enable "compaction heuristics" and lose experimentation knob

It seems that the new "find a good hunk boundary by locating a blank
line" heuristics gives much more pleasant result without much
noticeable downsides.  Let's make it the new algorithm for real,
without the opt-out knob we added while experimenting with it.

Signed-off-by: Junio C Hamano 
---
 Documentation/diff-config.txt  |  5 -
 Documentation/diff-options.txt |  6 --
 diff.c | 11 ---
 xdiff/xdiff.h  |  2 --
 xdiff/xdiffi.c |  2 +-
 5 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 9bf3e92..6eaa452 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,11 +166,6 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
-diff.compactionHeuristic::
-   Set this option to enable an experimental heuristic that
-   shifts the hunk boundary in an attempt to make the resulting
-   patch easier to read.
-
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b513023..3ad6404 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,12 +63,6 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
---compaction-heuristic::
---no-compaction-heuristic::
-   These are to help debugging and tuning an experimental
-   heuristic that shifts the hunk boundary in an attempt to
-   make the resulting patch easier to read.
-
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
diff --git a/diff.c b/diff.c
index 05ca3ce..f62b7f7 100644
--- a/diff.c
+++ b/diff.c
@@ -25,7 +25,6 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_compaction_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -184,10 +183,6 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
-   if (!strcmp(var, "diff.compactionheuristic")) {
-   diff_compaction_heuristic = git_config_bool(var, value);
-   return 0;
-   }
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
return 0;
@@ -3240,8 +3235,6 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
-   if (diff_compaction_heuristic)
-   DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3719,10 +3712,6 @@ int diff_opt_parse(struct diff_options *options, const 
char **av, int ac)
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
-   else if (!strcmp(arg, "--compaction-heuristic"))
-   DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
-   else if (!strcmp(arg, "--no-compaction-heuristic"))
-   DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
 

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Jacob Keller
On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> ... Having the two directly next to each other reads
>> better to me. This is a pretty unusual diff, though, in that it did
>> change the surrounding whitespace (and if you look further in the diff,
>> the identical change is made elsewhere _without_ touching the
>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
>> is outweighed by the vast number of improvements elsewhere.
>
> So... is everybody happy with the result and now we can drop the
> tweaking knob added to help experimentation before merging the
> result to 'master'?
>
> I am pretty happy with the end result myself.

I am very happy with it. I haven't had any issues, and I think we'll
find better traction by enabling it at this point and seeing when/if
someone complains.

I think for most it won't be noticed and for those that do it will
likely be positive.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-29 Thread Junio C Hamano
Jeff King  writes:

> ... Having the two directly next to each other reads
> better to me. This is a pretty unusual diff, though, in that it did
> change the surrounding whitespace (and if you look further in the diff,
> the identical change is made elsewhere _without_ touching the
> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
> is outweighed by the vast number of improvements elsewhere.

So... is everybody happy with the result and now we can drop the
tweaking knob added to help experimentation before merging the
result to 'master'?

I am pretty happy with the end result myself.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-20 Thread Jeff King
On Wed, Apr 20, 2016 at 09:09:53AM -0700, Junio C Hamano wrote:

> "Michael S. Tsirkin"  writes:
> 
> > FWIW IIRC what that commit is about is ability to reorder the chunks in
> > a patch without changing patch-id. Not about keeping id stable across
> > git revisions.
> 
> OK, but "reorder the chunks" is not meant to stay to be the _ONLY_
> purpose for an option whose name is a broad "--[un]stable", but
> merely one (and only) possible cause of patch-id instability that
> happened to be noticed as an issue back then and was dealt with that
> commit, no?  In other words, the intent of the "--stable" feature is
> to give a stable ID that is not affected by random end-user settings
> (e.g. diff.orderfile) and if somebody invents a new configurable knob
> in the future, they are supposed to pay attention to the "--stable"
> feature or existing users who do use "--stable" will be broken, no?

I forgot that we added "--stable". Evne if it is not meant to be about
stability across versions, is there any reason _not_ to turn off
this heuristic for --stable (or for patch-ids in general)?

I guess maybe that creates some inconsistency between generating a
patch-id directly, and making one from a diff given on stdin (though I
don't know that we can promise much about the latter in the general
case; we can fix file ordering, but we don't have enough information to
tweak other aspects).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-20 Thread Junio C Hamano
"Michael S. Tsirkin"  writes:

> FWIW IIRC what that commit is about is ability to reorder the chunks in
> a patch without changing patch-id. Not about keeping id stable across
> git revisions.

OK, but "reorder the chunks" is not meant to stay to be the _ONLY_
purpose for an option whose name is a broad "--[un]stable", but
merely one (and only) possible cause of patch-id instability that
happened to be noticed as an issue back then and was dealt with that
commit, no?  In other words, the intent of the "--stable" feature is
to give a stable ID that is not affected by random end-user settings
(e.g. diff.orderfile) and if somebody invents a new configurable knob
in the future, they are supposed to pay attention to the "--stable"
feature or existing users who do use "--stable" will be broken, no?

I can still buy "--stable is not about stability across versions of
Git"--it makes our job easier ;-)  I just want to make sure that
"--stable is about stability inside a single version of Git that
patch ID for the same commit will stay the same and unaffected by
random end-user configuration knobs".

Which in turn would mean that we won't have to worry about this
option in patch-id as long as we remove the diff.compactionheuristic
configuration and command line option once the developers are done
experimenting with their heuristics code.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-20 Thread Michael S. Tsirkin
On Tue, Apr 19, 2016 at 04:07:35PM -0700, Junio C Hamano wrote:
> Jacob Keller  writes:
> 
> > On Tue, Apr 19, 2016 at 10:06 AM, Jeff King  wrote:
> >> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:
> >>
> >>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King  wrote:
> >>>
> >>> > I guess this will invalidate old patch-ids, but there's not much to be
> >>> > done about that.
> >>>
> >>> What do you mean by that? (What consequences do you imagine?)
> >>> I think diffs with any kind of heuristic can still be applied, no?
> >>
> >> I mean that if you save any old patch-ids from "git patch-id", they
> >> won't match up when compared with new versions of git. We can probably
> >> ignore it, though. This isn't the first time that patch-ids might have
> >> changed, and I think the advice is already that one should not count on
> >> them to be stable in the long term.
> >>
> >> -Peff
> >
> > Plus they'll be stable within a version of Git, it's only recorded
> > patch ids that change, which hopefully isn't done very much if at all.
> >
> > Thanks,
> > Jake
> 
> Some people, like those who did things like 30e12b92 (patch-id: make
> it stable against hunk reordering, 2014-04-27), _may_ care.
> 

FWIW IIRC what that commit is about is ability to reorder the chunks in
a patch without changing patch-id. Not about keeping id stable across
git revisions.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-20 Thread Junio C Hamano
Jeff King  writes:

> I mean that if you save any old patch-ids from "git patch-id", they
> won't match up when compared with new versions of git. We can probably
> ignore it, though. This isn't the first time that patch-ids might have
> changed, and I think the advice is already that one should not count on
> them to be stable in the long term.

Another thing that this *will* break is the patch signature upload
protocol k.org uses to allow Linus, Greg, et al. on the road with
limited hotel wifi bandwidth to prepare patch-X-test1.gz and
patch-X-test1.sign file.  They can locally tag X-test1, prepare
"git diff X X-test1 | gzip -n >patch-X-test1.gz" and sign the
result, and upload _only_ the detached signature after pushing.

They can tell k.org, when uploading the detached signature, to
recreate the patchfile by running the same "git diff" to save the
bandwidth of sending the same thing twice (as they have to "push"
anyway, having to send the generated patch is a pure overhead).

Having said all that, kup(1) users are already warned that the
textual diff produced by "git diff-tree -p" (which is mentioned in
the documentation of the tool) varies across versions of Git and
the above "optimization" would not work unless both ends have the
same version of Git, so it may not be too big an issue for them.
They have already been burned once when we corrected "git archive"
output in the past (they obviously have the same optimization to
sign tarballs, and the kup(1) mechanism relies to have byte-for-byte
identical output).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Stefan Beller
On Tue, Apr 19, 2016 at 9:18 PM, Jeff King  wrote:
> [your original probably didn't make it to the list because of its 5MB
>  attachment; the list has a 100K limit; I'll try to quote liberally]
>
> On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote:
>
>> I ran this version of the patch against the entire Linux kernel
>> history, as I figured this has a large batch of C code to try and spot
>> any issues.
>>
>> I ran something like the following command in bash
>>
>> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git
>> show --format="commit %H" --no-compaction-heuristic $rev) <(git show
>> --format="commit %H" --compaction-heuristic $rev); done >
>> heuristic.patch
>
> My earlier tests with the perl script were all done with "git log -p",
> which will not show anything at all for merges (and my script wouldn't
> know how to deal with combined diffs anyway). But I think this new patch
> _will_ kick in for combined diffs (because it is built on individual
> diffs). It will be interesting to see if this has any effect there, and
> what it looks like.
>
> We should be able to see it (on a small enough repository) with:
>
>   git log --format='commit %H' --cc --merges
>
> and comparing the before/after.
>
>> I've attached the file that I generated for the Linux history, it's
>> rather large so hopefully I can get some help to spot any differences.
>> The above approach will work for pretty much any repository, and works
>> better than trying to generate the entire thing first and then diff
>> (since that runs out of memory pretty fast).
>
> I don't think there is much point in generating a complete diff between
> the patches for every commit, when nobody can look at the whole thing.
> Unless we have automated tooling to find "interesting" bits (and
> certainly a tool to remove the boring "a comment got shifted by one"
> lines would help; those are all known improvements, but it's the _other_
> stuff we want to look).
>
> But if we are not using automated tooling to find the needle in the
> haystack, we might as well using sampling to make the dataset more
> manageable. Adding "--since=1.year.ago" is one way, though we may want
> to sample more randomly across time.
>
>> So far, I haven't spotted anything that would want me to disable it,
>> while I've spotted several cases where I felt that readability was
>> improved. It's somewhat difficult to spot though.
>
> I did find one case that I think is worse. Look at 857942fd1a in the
> kernel. It has a pattern like this:
>
>   ... surrounding code ...
>
>   function_one();
>   ... more surrounding code ...
>
> which becomes:
>
>   ... surrounding code ...
>
>   function_two();
>
>   ... more surrounding code
>
> Without the new heuristic, that looks like:
>
>   -function_one();
>   +function_two();
>   +
>
> but with it, it becomes:
>
>   +
>   +function_two();
>
>   -function_one();
>
> which is kind of weird. Having the two directly next to each other reads
> better to me. This is a pretty unusual diff, though, in that it did
> change the surrounding whitespace (and if you look further in the diff,
> the identical change is made elsewhere _without_ touching the
> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
> is outweighed by the vast number of improvements elsewhere.

The new implementation supports the flags for ignoring white space, too.

>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Jeff King
On Wed, Apr 20, 2016 at 12:18:27AM -0400, Jeff King wrote:

> My earlier tests with the perl script were all done with "git log -p",
> which will not show anything at all for merges (and my script wouldn't
> know how to deal with combined diffs anyway). But I think this new patch
> _will_ kick in for combined diffs (because it is built on individual
> diffs). It will be interesting to see if this has any effect there, and
> what it looks like.
> 
> We should be able to see it (on a small enough repository) with:
> 
>   git log --format='commit %H' --cc --merges
> 
> and comparing the before/after.

Add in "-p" if you are testing the tip of jk/diff-compact-heuristic. It
is based on the older maintenance track in which "--cc" does not imply
"-p".

Looking over the results, it's about what you'd expect (comment blocks
shifted by one as we want, and then there happens to be a one-line
conflict resolved later in the hunk).

The most interesting thing I found was db65f0fc3b1e. There we have two
functions being added in the same spot, and the resolution obviously is
to put one after the other. So both sides do the usual comment-block
thing, and the resulting combined diff carries through that improvement
as you'd expect.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Jeff King
[your original probably didn't make it to the list because of its 5MB
 attachment; the list has a 100K limit; I'll try to quote liberally]

On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote:

> I ran this version of the patch against the entire Linux kernel
> history, as I figured this has a large batch of C code to try and spot
> any issues.
> 
> I ran something like the following command in bash
> 
> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git
> show --format="commit %H" --no-compaction-heuristic $rev) <(git show
> --format="commit %H" --compaction-heuristic $rev); done >
> heuristic.patch

My earlier tests with the perl script were all done with "git log -p",
which will not show anything at all for merges (and my script wouldn't
know how to deal with combined diffs anyway). But I think this new patch
_will_ kick in for combined diffs (because it is built on individual
diffs). It will be interesting to see if this has any effect there, and
what it looks like.

We should be able to see it (on a small enough repository) with:

  git log --format='commit %H' --cc --merges

and comparing the before/after.

> I've attached the file that I generated for the Linux history, it's
> rather large so hopefully I can get some help to spot any differences.
> The above approach will work for pretty much any repository, and works
> better than trying to generate the entire thing first and then diff
> (since that runs out of memory pretty fast).

I don't think there is much point in generating a complete diff between
the patches for every commit, when nobody can look at the whole thing.
Unless we have automated tooling to find "interesting" bits (and
certainly a tool to remove the boring "a comment got shifted by one"
lines would help; those are all known improvements, but it's the _other_
stuff we want to look).

But if we are not using automated tooling to find the needle in the
haystack, we might as well using sampling to make the dataset more
manageable. Adding "--since=1.year.ago" is one way, though we may want
to sample more randomly across time.

> So far, I haven't spotted anything that would want me to disable it,
> while I've spotted several cases where I felt that readability was
> improved. It's somewhat difficult to spot though.

I did find one case that I think is worse. Look at 857942fd1a in the
kernel. It has a pattern like this:

  ... surrounding code ...

  function_one();
  ... more surrounding code ...

which becomes:

  ... surrounding code ...

  function_two();

  ... more surrounding code

Without the new heuristic, that looks like:

  -function_one();
  +function_two();
  +

but with it, it becomes:

  +
  +function_two();

  -function_one();

which is kind of weird. Having the two directly next to each other reads
better to me. This is a pretty unusual diff, though, in that it did
change the surrounding whitespace (and if you look further in the diff,
the identical change is made elsewhere _without_ touching the
whitespace). So this is kind of an anomaly. And IMHO the weirdness here
is outweighed by the vast number of improvements elsewhere.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Junio C Hamano
Jacob Keller  writes:

> On Tue, Apr 19, 2016 at 10:06 AM, Jeff King  wrote:
>> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:
>>
>>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King  wrote:
>>>
>>> > I guess this will invalidate old patch-ids, but there's not much to be
>>> > done about that.
>>>
>>> What do you mean by that? (What consequences do you imagine?)
>>> I think diffs with any kind of heuristic can still be applied, no?
>>
>> I mean that if you save any old patch-ids from "git patch-id", they
>> won't match up when compared with new versions of git. We can probably
>> ignore it, though. This isn't the first time that patch-ids might have
>> changed, and I think the advice is already that one should not count on
>> them to be stable in the long term.
>>
>> -Peff
>
> Plus they'll be stable within a version of Git, it's only recorded
> patch ids that change, which hopefully isn't done very much if at all.
>
> Thanks,
> Jake

Some people, like those who did things like 30e12b92 (patch-id: make
it stable against hunk reordering, 2014-04-27), _may_ care.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Jacob Keller
On Tue, Apr 19, 2016 at 10:06 AM, Jeff King  wrote:
> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:
>
>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King  wrote:
>>
>> > I guess this will invalidate old patch-ids, but there's not much to be
>> > done about that.
>>
>> What do you mean by that? (What consequences do you imagine?)
>> I think diffs with any kind of heuristic can still be applied, no?
>
> I mean that if you save any old patch-ids from "git patch-id", they
> won't match up when compared with new versions of git. We can probably
> ignore it, though. This isn't the first time that patch-ids might have
> changed, and I think the advice is already that one should not count on
> them to be stable in the long term.
>
> -Peff

Plus they'll be stable within a version of Git, it's only recorded
patch ids that change, which hopefully isn't done very much if at all.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Jeff King
On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:

> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King  wrote:
> 
> > I guess this will invalidate old patch-ids, but there's not much to be
> > done about that.
> 
> What do you mean by that? (What consequences do you imagine?)
> I think diffs with any kind of heuristic can still be applied, no?

I mean that if you save any old patch-ids from "git patch-id", they
won't match up when compared with new versions of git. We can probably
ignore it, though. This isn't the first time that patch-ids might have
changed, and I think the advice is already that one should not count on
them to be stable in the long term.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Junio C Hamano
Jeff King  writes:

> I guess this will invalidate old patch-ids, but there's not much to be
> done about that.

If we really cared, we could disable this (and any future) change to
the compaction logic to "patch-id --[un]stable" option.

I am not sure if it is worth the effort, though ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Stefan Beller
On Mon, Apr 18, 2016 at 10:03 PM, Jeff King  wrote:

> I guess this will invalidate old patch-ids, but there's not much to be
> done about that.

What do you mean by that? (What consequences do you imagine?)
I think diffs with any kind of heuristic can still be applied, no?

Thanks,
Stefan

>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Stefan Beller
On Tue, Apr 19, 2016 at 12:00 AM, Jeff King  wrote:
> On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote:
>
>> I am convinced the better way to do it is like this:
>>
>> Calculate the entropy for each line and take the last line with the
>> lowest entropy as the last line of the hunk.
>
> I'll be curious to see the results, but I think sometimes predictable
> and stupid may be the best route with these sorts of things. In
> particular, I'd worry that a content-independent measure of entropy
> might miss some subtleties of a particular language (e.g., that "*" is
> more or less meaningful than some other character). But we'll see. :)

I would assume that the "*" would have little entropy when there are lots
of comments, i.e. it just "feels" like an empty line.
If there are no "*", then the entropy is high as it is unusual. And
unusual things
should not be at the border of a hunk I would assume.
So m prediction is that the  'subtleties of a particular language' correlate
highly with the actual use of characters.

Anyway, the experiment can be carried out later. :)

Thanks,
Stefan

>
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Jeff King
On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote:

> I am convinced the better way to do it is like this:
> 
> Calculate the entropy for each line and take the last line with the
> lowest entropy as the last line of the hunk.

I'll be curious to see the results, but I think sometimes predictable
and stupid may be the best route with these sorts of things. In
particular, I'd worry that a content-independent measure of entropy
might miss some subtleties of a particular language (e.g., that "*" is
more or less meaningful than some other character). But we'll see. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-19 Thread Stefan Beller
On Mon, Apr 18, 2016 at 10:03 PM, Jeff King  wrote:
> On Mon, Apr 18, 2016 at 02:12:30PM -0700, Stefan Beller wrote:
>
>> +
>> + /*
>> +  * If a group can be moved back and forth, see if there is an
>> +  * blank line in the moving space. If there is a blank line,
>> +  * make sure the last blank line is the end of the group.
>
> s/an/a/ on the first line

So it looks like I'll be resending another version for this series tomorrow.
Thanks for pointing this out!

>
>> +  * As we shifted the group forward as far as possible, we only
>> +  * need to shift it back if at all.
>
> Maybe because I'm reading it as a diff that only contains this hunk and
> not the whole rest of the function, but the "we" here confused me. You
> mean the earlier, existing loop in xdl_change_compact, right?
>
> Maybe something like:
>
>   As we already shifted the group forward as far as possible in the
>   earlier loop...
>
> would help.

I'll see to get rid of the 'we', otherwise I'll stick with your suggestion.

>
>> + if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
>> + while (ixs > 0 &&
>> +!is_blank_line(recs, ix - 1, flags) &&
>> +recs_match(recs, ixs - 1, ix - 1, flags)) {
>> + rchg[--ixs] = 1;
>> + rchg[--ix] = 0;
>> + }
>> + }
>
> This turned out to be delightfully simple (especially compared to the
> perl monstrosity).
>
> I tried comparing the output to the perl one, but it's not quite the
> same. In that one we had to work with the existing hunks and context
> lines, so any hunk that got shifted ended up with extra context on one
> side, and too little on the other. But here, we can actually bump the
> context lines to give the correct amount on both sides, which is good.
>
> I guess this will invalidate old patch-ids, but there's not much to be
> done about that.

For the record:
I thought about "optimal hunk separation" for a while, specially during my
bike commute. And while this heuristic seems to be a good fit for most of
the cases inspected, we can do better (in the future).

I am convinced the better way to do it is like this:

Calculate the entropy for each line and take the last line with the
lowest entropy as the last line of the hunk.

That heuristic requires more compute though as it will be hard to compute
the entropy for the line. To do that I would imagine, we'd need to loop over
the whole file and count the occurrences for each char (byte) and then
take the negative log of (#number of that byte / #number of bytes in file) [1].

This would model our actual goal a bit more closely to split at parts, where
there is low information density (the definition of entropy).

One example Jacob pointed out was a thing like

/**
 * Comment here. Over
 * more lines.
 *
+ *  Add line here with a blank line
+ *
+ * in between and a trailing blank after.
+ *
 */

I think we had cases like this in the kernel tree and else where,
and for a human it is clear to break after the last "empty line"
(which for comments starts with " * "). To detect those we can use
the entropy as it doesn't convey lots of information.
(git show e1f7037167323461c0415447676262dcb)

It also keeps the false positives out, Jacob pointed at
85ed2f32064b82e541fc7dcf2b0049a05 IIRC, which was bad with
the shortest lines only, but I'd imagine the entropy based
heuristic will do better there.

[1] https://en.wikipedia.org/wiki/Entropy_(information_theory)

Thanks for the review,
Stefan

>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-18 Thread Jeff King
On Mon, Apr 18, 2016 at 02:12:30PM -0700, Stefan Beller wrote:

> +
> + /*
> +  * If a group can be moved back and forth, see if there is an
> +  * blank line in the moving space. If there is a blank line,
> +  * make sure the last blank line is the end of the group.

s/an/a/ on the first line

> +  * As we shifted the group forward as far as possible, we only
> +  * need to shift it back if at all.

Maybe because I'm reading it as a diff that only contains this hunk and
not the whole rest of the function, but the "we" here confused me. You
mean the earlier, existing loop in xdl_change_compact, right?

Maybe something like:

  As we already shifted the group forward as far as possible in the
  earlier loop...

would help.

> + if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
> + while (ixs > 0 &&
> +!is_blank_line(recs, ix - 1, flags) &&
> +recs_match(recs, ixs - 1, ix - 1, flags)) {
> + rchg[--ixs] = 1;
> + rchg[--ix] = 0;
> + }
> + }

This turned out to be delightfully simple (especially compared to the
perl monstrosity).

I tried comparing the output to the perl one, but it's not quite the
same. In that one we had to work with the existing hunks and context
lines, so any hunk that got shifted ended up with extra context on one
side, and too little on the other. But here, we can actually bump the
context lines to give the correct amount on both sides, which is good.

I guess this will invalidate old patch-ids, but there's not much to be
done about that.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-18 Thread Junio C Hamano
Jacob Keller  writes:

> Thanks Stephan and Junio, this looks pretty good. I think before it's
> merged we'd probably want to implement some sort of attributes which
> allows per-path configuration, incase it needs to be configured at
> all.

My take on it is that we'd want to make sure that the shift with
blank line heuristics is "good enough", i.e. there is no need for
end-user configuration or attributes, and then remove the tentative
option, configuration and its documentation, before this is merged.

If we really want to add knobs to handle different kind of payloads
in vastly different way, the right place to do so is to add a set of
bits "use this and that heuristics" to userdiff driver, I would say,
but in the compaction codepath it does not seem to be enough room to
have that many knobs to be tweaked in the first place to me.

> I've got it applied to my local git, and I'm going to try to run a
> diff between enabled vs disabled on a large section of the Linux
> kernel history and a few other projects to see if I spot anything odd.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-18 Thread Jacob Keller
On Mon, Apr 18, 2016 at 2:12 PM, Stefan Beller  wrote:
> In order to produce the smallest possible diff and combine several diff
> hunks together, we implement a heuristic from GNU Diff which moves diff
> hunks forward as far as possible when we find common context above and
> below a diff hunk. This sometimes produces less readable diffs when
> writing C, Shell, or other programming languages, ie:
>
> ...
>  /*
> + *
> + *
> + */
> +
> +/*
> ...
>
> instead of the more readable equivalent of
>
> ...
> +/*
> + *
> + *
> + */
> +
>  /*
> ...
>
> Implement the following heuristic to (optionally) produce the desired
> output.
>
>   If there are diff chunks which can be shifted around, shift each hunk
>   such that the last common empty line is below the chunk with the rest
>   of the context above.
>
> This heuristic appears to resolve the above example and several other
> common issues without producing significantly weird results. However, as
> with any heuristic it is not really known whether this will always be
> more optimal. Thus, it can be disabled via diff.compactionHeuristic.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jacob Keller 
> Signed-off-by: Stefan Beller 
> ---

Thanks Stephan and Junio, this looks pretty good. I think before it's
merged we'd probably want to implement some sort of attributes which
allows per-path configuration, incase it needs to be configured at
all.

I've got it applied to my local git, and I'm going to try to run a
diff between enabled vs disabled on a large section of the Linux
kernel history and a few other projects to see if I spot anything odd.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-18 Thread Stefan Beller
On Mon, Apr 18, 2016 at 12:22 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> I think we're going to make use of xdl_blankline instead of this or
>> our own "is_emptyline"
>
> OK, so perhaps either of you two can do a final version people can
> start having fun with?

Junios proposal seems to be on top of my latest series sent out,
I'll squash it in and send it out as a final version if you don't mind
(though I'll do it later today; currently diving into Gerrits Java)

>
> By the way, I really do not want to see something this low-level to
> be end-user tweakable with "one bit enable/disable"; the end users
> shouldn't have to bother [1].

Ok. Thanks for fixing that mistake.

> I left it in but renamed after "what"
> it enables/disables, not "how" the enabled thing works, to clarify
> that we have this only as a developers' aid.


>
> *1* I am fine with --compaction-heuristic=(shortest|blank|...) that
> allows a choice among many as a developers' aid, but I do not think
> this topic is there yet.

This doesn't bode well with
> +--compaction-heuristic::
> +--no-compaction-heuristic::

in the future? I'd rather have
+--compaction-heuristic=none
+--compaction-heuristic=lastEmptyLine
such that we don't have to worry about further experiments (or matured
heuristics) later?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-18 Thread Junio C Hamano
Jacob Keller  writes:

> I think we're going to make use of xdl_blankline instead of this or
> our own "is_emptyline"

OK, so perhaps either of you two can do a final version people can
start having fun with?

By the way, I really do not want to see something this low-level to
be end-user tweakable with "one bit enable/disable"; the end users
shouldn't have to bother [1].  I left it in but renamed after "what"
it enables/disables, not "how" the enabled thing works, to clarify
that we have this only as a developers' aid.

*1* I am fine with --compaction-heuristic=(shortest|blank|...) that
allows a choice among many as a developers' aid, but I do not think
this topic is there yet.

 Documentation/diff-config.txt  |  9 -
 Documentation/diff-options.txt | 10 +-
 diff.c | 18 +-
 xdiff/xdiff.h  |  2 +-
 xdiff/xdiffi.c | 22 ++
 5 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index c62745b..9bf3e92 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,11 +166,10 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
-diff.shortestLineHeuristic::
-   Set this option to true to enable the shortest line chunk heuristic when
-   producing diff output. This heuristic will attempt to shift hunks such
-   that the last shortest common line occurs below the hunk with the rest 
of
-   the context above it.
+diff.compactionHeuristic::
+   Set this option to enable an experimental heuristic that
+   shifts the hunk boundary in an attempt to make the resulting
+   patch easier to read.
 
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 238f39c..b513023 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,11 +63,11 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
---shortest-line-heuristic::
---no-shortest-line-heuristic::
-   When possible, shift common shortest line in diff hunks below the hunk
-   such that the last common shortest line for each hunk is below, with the
-   rest of the context above the hunk.
+--compaction-heuristic::
+--no-compaction-heuristic::
+   These are to help debugging and tuning an experimental
+   heuristic that shifts the hunk boundary in an attempt to
+   make the resulting patch easier to read.
 
 --minimal::
Spend extra time to make sure the smallest possible
diff --git a/diff.c b/diff.c
index 276174c..02c75c3 100644
--- a/diff.c
+++ b/diff.c
@@ -25,7 +25,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_shortest_line_heuristic = 0;
+static int diff_compaction_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -184,8 +184,8 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
-   if (!strcmp(var, "diff.shortestlineheuristic")) {
-   diff_shortest_line_heuristic = git_config_bool(var, value);
+   if (!strcmp(var, "diff.compactionheuristic")) {
+   diff_compaction_heuristic = git_config_bool(var, value);
return 0;
}
if (!strcmp(var, "diff.autorefreshindex")) {
@@ -3240,8 +3240,8 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
-   if (diff_shortest_line_heuristic)
-   DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
+   if (diff_compaction_heuristic)
+   DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3719,10 +3719,10 @@ int diff_opt_parse(struct diff_options *options, const 
char **av, int ac)
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
-   else if (!strcmp(arg, "--shortest-line-heuristic"))
-   DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
-   else if (!strcmp(arg, "--no-shortest-line-heuristic"))
-   DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC);
+   else if (!strcmp(arg, "--compaction-heuristic"))
+   DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
+   else if (!strcmp(arg, "--no-compaction-heuristic"))
+   DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
else if (!strcmp(arg, "--patience"))
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller
On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +static int line_length(const char *recs)
>> +{
>> + char *s = strchr(recs, '\n');
>> + return s ? s - recs : strlen(recs);
>> +}
>
> It seems that you guys are discarding this "number of bytes on a
> line, no matter what these bytes are" idea, so this may be moot, but
> is there a guarantee that reading through recs until you happen to
> see a NUL is safe?
>
> Shouldn't the code that accesses a "line" be using the same "from
> here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
> having to scan the underlying string in an unbounded way?
>
>

I think we're going to make use of xdl_blankline instead of this or
our own "is_emptyline"

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Stefan Beller
On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +static int line_length(const char *recs)
>> +{
>> + char *s = strchr(recs, '\n');
>> + return s ? s - recs : strlen(recs);
>> +}
>
> It seems that you guys are discarding this "number of bytes on a
> line, no matter what these bytes are" idea, so this may be moot, but
> is there a guarantee that reading through recs until you happen to
> see a NUL is safe?

We discarded this idea as it produces to many errors.
(We'd be back at the 50:50 case, "is it really worth it?")

We will go back to the "empty line" heuristic, which will be solved
via xdl_blankline(rec[i]->ptr, rec[i]->size, flags); which could be inlined.
That will solve the CRLF issue as a CR is covered as a whitespace
(with CRLF you'd have to specify diff to ignore white spaces).

For the safety I assumed
* there is always a \n even on the last line by convention.
* in case it is not, the string is null terminated, hence
  strchr and strlen for the rescue.

>
> Shouldn't the code that accesses a "line" be using the same "from
> here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
> having to scan the underlying string in an unbounded way?

xdl_blankline will use ->size, so we'll be holding it right.

Thanks,
Stefan

>
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Junio C Hamano
Stefan Beller  writes:

> +static int line_length(const char *recs)
> +{
> + char *s = strchr(recs, '\n');
> + return s ? s - recs : strlen(recs);
> +}

It seems that you guys are discarding this "number of bytes on a
line, no matter what these bytes are" idea, so this may be moot, but
is there a guarantee that reading through recs until you happen to
see a NUL is safe?

Shouldn't the code that accesses a "line" be using the same "from
here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
having to scan the underlying string in an unbounded way?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Stefan Beller
On Fri, Apr 15, 2016 at 4:32 PM, Jacob Keller  wrote:
> On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller  wrote:
>> There's a few places that will need cleaning up (comments and such)
>> that mention empty line still, but that's not surprising. I am going
>> to test this for a bit on my local repos, and see if it makes any
>> difference to the old heuristic as well.
>>
>> Thanks,
>> Jake
>>
>
> I ran this heuristic on git.git and it produces tons of false positive
> transforms which are much lease readable (to me at least), far more
> than those produced by the newline/blank link heuristic did.
>
> I think we should stick with the empty line heuristic instead of this
> version, even if it's easier to implement this version.

I agree. The heuristic is worse as we often have these 50:50 chances
of messing stuff up.

>
> We still would need to figure out how to handle CRLF properly but it's
> worth resolving that than this heuristic is.
>
> Thanks,
> Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller
On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller  wrote:
> There's a few places that will need cleaning up (comments and such)
> that mention empty line still, but that's not surprising. I am going
> to test this for a bit on my local repos, and see if it makes any
> difference to the old heuristic as well.
>
> Thanks,
> Jake
>

I ran this heuristic on git.git and it produces tons of false positive
transforms which are much lease readable (to me at least), far more
than those produced by the newline/blank link heuristic did.

I think we should stick with the empty line heuristic instead of this
version, even if it's easier to implement this version.

We still would need to figure out how to handle CRLF properly but it's
worth resolving that than this heuristic is.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller
On Fri, Apr 15, 2016 at 4:01 PM, Stefan Beller  wrote:
> In order to produce the smallest possible diff and combine several diff
> hunks together, we implement a heuristic from GNU Diff which moves diff
> hunks forward as far as possible when we find common context above and
> below a diff hunk. This sometimes produces less readable diffs when
> writing C, Shell, or other programming languages, ie:
>
> ...
>  /*
> + *
> + *
> + */
> +
> +/*
> ...
>
> instead of the more readable equivalent of
>
> ...
> +/*
> + *
> + *
> + */
> +
>  /*
> ...
>
> Original discussion and testing found the following heuristic to be
> producing the desired output:
>
>   If there are diff chunks which can be shifted around, shift each hunk
>   such that the last common empty line is below the chunk with the rest
>   of the context above.
>
> This heuristic appears to resolve the above example and several other
> common issues without producing significantly weird results. When
> implementing this heuristic the handling of empty lines was awkward as
> it is unclear what an empty line is. ('\n' or do we include "\r\n" as it
> is common on Windows?) Instead we implement a slightly different heuristic:
>
>   If there are diff chunks which can be shifted around, find the shortest
>   line in the overlapping parts. Use the line with the shortest length that
>   occurs last as the last line of the chunk with the rest
>   of the context above.
>
> However, as with any heuristic it is not really known whether this will
> always be more optimal. Thus, leave the heuristic disabled by default.
>
> Add an XDIFF flag to enable this heuristic only conditionally. Add
> a diff command line option and diff configuration option to allow users
> to enable this option when desired.
>
> TODO:
> * Add tests
> * Add better/more documentation explaining the heuristic, possibly with
>   examples(?)
> * better name(?)
>

There's a few places that will need cleaning up (comments and such)
that mention empty line still, but that's not surprising. I am going
to test this for a bit on my local repos, and see if it makes any
difference to the old heuristic as well.

Thanks,
Jake

> Signed-off-by: Stefan Beller 
> Signed-off-by: Jacob Keller 
> Signed-off-by: Stefan Beller 
> ---
>  Documentation/diff-config.txt  |  6 ++
>  Documentation/diff-options.txt |  6 ++
>  diff.c | 11 +++
>  xdiff/xdiff.h  |  2 ++
>  xdiff/xdiffi.c | 29 +
>  5 files changed, 54 insertions(+)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index edba565..3d99a90 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -170,6 +170,12 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> +diff.shortestLineHeuristic::
> +   Set this option to true to enable the shortest line chunk heuristic 
> when
> +   producing diff output. This heuristic will attempt to shift hunks such
> +   that the last shortest common line occurs below the hunk with the 
> rest of
> +   the context above it.
> +
>  diff.algorithm::
> Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 4b0318e..b1ca83d 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
> Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> +--shortest-line-heuristic::
> +--no-shortest-line-heuristic::
> +   When possible, shift common shortest line in diff hunks below the hunk
> +   such that the last common shortest line for each hunk is below, with 
> the
> +   rest of the context above the hunk.
> +
>  --minimal::
> Spend extra time to make sure the smallest possible
> diff is produced.
> diff --git a/diff.c b/diff.c
> index 4dfe660..a02aff9 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #endif
>
>  static int diff_detect_rename_default;
> +static int diff_shortest_line_heuristic = 0;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char 
> *value, void *cb)
> diff_detect_rename_default = git_config_rename(var, value);
> return 0;
> }
> +   if (!strcmp(var, "diff.shortestlineheuristic")) {
> +   diff_shortest_line_heuristic = git_config_bool(var, value);
> +   return 0;
> +   }
> if (!strcmp(var, "diff.autorefreshindex")) {
> diff_auto_refresh_index = git_config_bool(var, value);
> return 0;
> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
> options->use_color =