Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-18 Thread Patrick Palka
On Thu, Jun 18, 2015 at 11:50 AM, Junio C Hamano gits...@pobox.com wrote:
 Patrick Palka patr...@parcs.ath.cx writes:

 Currently the diff-highlight script does not try to highlight hunks that
 have different numbers of removed/added lines.  But we can be a little
 smarter than that, without introducing much magic and complexity.

 In the case of unevenly-sized hunks, we could still highlight the first
 few (lexicographical) add/remove pairs.  It is not uncommon for hunks to
 have common prefixes, and in such a case this change is very useful
 for spotting differences.

 Signed-off-by: Patrick Palka patr...@parcs.ath.cx
 ---

 Patrick, git shortlog --no-merges contrib/diff-highlight/ is your
 friend to see who may be able to give you a good feedback.

Sorry about that.  I admit the sending of this patch was rushed for no
good reason.


 Jeff, what do you think?

 I have this nagging feeling that it is just as likely that two
 uneven hunks align at the top as they align at the bottom, so while
 this might not hurt it may not be the right approach for a better
 solution, in the sense that when somebody really wants to do a
 better solution, this change and the original code may need to be
 ripped out and redone from scratch.

Hmm, maybe. I stuck with assuming hunks are top-aligned because it
required less code to implement :)

The benefits of a simple dumb solution like assuming hunks align at
the top or bottom is that it remains very easy to visually match up
each highlighted deleted slice with its corresponding highlighted
added slice. If we start matching up similar lines or something like
that then it seems we would have to mostly forsake this benefit.  A
stupid algorithm in this case is nice because its output is
predictable.

A direct improvement upon this patch that would not require redoing
the whole script from scratch would be to first to calculate the
highlighting assuming the hunk aligns at the top, then to calculate
the highlighting assuming the hunk aligns at the bottom, and to pick
out of the two the highlighting with the least noise.  Though we
would still be out of luck if the hunk is more complicated than being
top-aligned or bottom-aligned.

By the way, what would it take to get something like this script into
git proper?  It is IMHO immensely useful even in its current form, yet
because it's not baked into the application hardly anybody knows about
it.


  contrib/diff-highlight/diff-highlight | 26 +-
  1 file changed, 17 insertions(+), 9 deletions(-)

 diff --git a/contrib/diff-highlight/diff-highlight 
 b/contrib/diff-highlight/diff-highlight
 index ffefc31..0dfbebd 100755
 --- a/contrib/diff-highlight/diff-highlight
 +++ b/contrib/diff-highlight/diff-highlight
 @@ -88,22 +88,30 @@ sub show_hunk {
   return;
   }

 - # If we have mismatched numbers of lines on each side, we could try to
 - # be clever and match up similar lines. But for now we are simple and
 - # stupid, and only handle multi-line hunks that remove and add the same
 - # number of lines.
 - if (@$a != @$b) {
 - print @$a, @$b;
 - return;
 - }
 + # We match up the first MIN(a, b) lines on each side.
 + my $c = @$a  @$b ? @$a : @$b;

 + # Highlight each pair, and print each removed line of that pair.
   my @queue;
 - for (my $i = 0; $i  @$a; $i++) {
 + for (my $i = 0; $i  $c; $i++) {
   my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]);
   print $rm;
   push @queue, $add;
   }
 +
 + # Print the remaining unmatched removed lines of the hunk.
 + for (my $i = $c; $i  @$a; $i++) {
 + print $a-[$i];
 + }
 +
 + # Print the added lines of each highlighted pair.
   print @queue;
 +
 + # Print the remaining unmatched added lines of the hunk.
 + for (my $i = $c; $i  @$b; $i++) {
 + print $b-[$i];
 + }
 +
  }

  sub highlight_pair {
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-18 Thread Patrick Palka

On Thu, 18 Jun 2015, Jeff King wrote:


On Thu, Jun 18, 2015 at 11:08:16AM -0700, Junio C Hamano wrote:


So as I said, I do not think it would hurt to have this as an
incremental improvement (albeit going in a possibly wrong
direction).

Of course, it is a separate question if this change makes the output
worse, by comparing unmatched early parts of two hunks and making
nonsense highlight by calling highlight_pair() more often.  As long
as that is not an issue, I am not opposed to this change, which was
what I meant to say by this might not hurt.


Yes, that is my big concern, and why I punted on mismatched-size hunks
in the first place. Now that we have a patch, it is easy enough to git
log -p | diff-highlight with the old and new versions to compare the
results.

It certainly does improve some cases. E.g.:

 -foo
 +foo 
 +bar

in a test script becomes more clear. But some of the output is not so
great. For instance, the very commit under discussion has a
confusing and useless highlight. Or take a documentation patch like
5c31acfb, where I find the highlights actively distracting. We are saved
a little by the if the whole line is different, do not highlight at
all behavior of 097128d1bc.


To fix the useless highlights for both evenly and unevenly sized hunks
(like when all but a semicolon on a line changes), one can loosen the
criterion for not highlighting from do not highlight if 0% of the
before and after lines are common between them to, say, do not
highlight if less than 10% of the before and after lines are common
between them.  Then most of these useless highlights are gone for both
evenly and unevenly sized hunks.

Here is a patch that changes the criterion as mentioned.  Testing this
change on the documentation patch 5c31acfb, only two pairs of lines are
highlighted instead of six.  On my original patch, the useless highlight
is gone.  The useless semicolon-related highlights on e.g. commit
99a2cfb are gone.

Ten percent is a modest threshold, and perhaps it should be increased
when highlighting unevenly sized hunks and decreased when highlighting
evenly sized hunks.

Of course, these patches are both hacks but they seem to be surprisingly
effective hacks especially when paired together.



So I dunno. IMHO this does more harm than good, and I would not want to
use it myself. But it is somewhat a matter of taste; I am not opposed to
making it a configurable option.


That is something I can do :)



-Peff



-- 8 --

Subject: [PATCH] diff-highlight: don't highlight lines that have little in
 common

---
 contrib/diff-highlight/diff-highlight | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index 85d2eb0..e4829ec 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -218,8 +218,13 @@ sub is_pair_interesting {
my $suffix_a = join('', @$a[($sa+1)..$#$a]);
my $suffix_b = join('', @$b[($sb+1)..$#$b]);

-   return $prefix_a !~ /^$COLOR*-$BORING*$/ ||
-  $prefix_b !~ /^$COLOR*\+$BORING*$/ ||
-  $suffix_a !~ /^$BORING*$/ ||
-  $suffix_b !~ /^$BORING*$/;
+   $prefix_a =~ s/^$COLOR*-$BORING*//;
+   $prefix_b =~ s/^$COLOR*\+$BORING*//;
+   $suffix_a =~ s/$BORING*$//;
+   $suffix_b =~ s/$BORING*$//;
+
+   # Only bother highlighting if at least 10% of each line is common among
+   # the lines.
+   return ((length($prefix_a)+length($suffix_a))*100 = @$a*10) 
+  ((length($prefix_b)+length($suffix_b))*100 = @$b*10);
 }
--
2.4.4.410.g43ed522.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-18 Thread Patrick Palka
On Thu, Jun 18, 2015 at 2:08 PM, Junio C Hamano gits...@pobox.com wrote:
 Patrick Palka patr...@parcs.ath.cx writes:

 I have this nagging feeling that it is just as likely that two
 uneven hunks align at the top as they align at the bottom, so while
 this might not hurt it may not be the right approach for a better
 solution, in the sense that when somebody really wants to do a
 better solution, this change and the original code may need to be
 ripped out and redone from scratch.

 Hmm, maybe. I stuck with assuming hunks are top-aligned because it
 required less code to implement :)

 Yeah, I understand that.

 If we will need to rip out only this change but keep the original in
 order to implement a future better solution, then we might be better
 off not having this change (if we anticipate such a better solution
 to come reasonably soon), because it would make it more work for the
 final improved solution.  But if we need to rip out the original as
 well as this change while we do so, then having this patch would not
 make it more work, either.

 So as I said, I do not think it would hurt to have this as an
 incremental improvement (albeit going in a possibly wrong
 direction).

 Of course, it is a separate question if this change makes the output
 worse, by comparing unmatched early parts of two hunks and making
 nonsense highlight by calling highlight_pair() more often.  As long
 as that is not an issue, I am not opposed to this change, which was
 what I meant to say by this might not hurt.


That makes sense.  The extra useless highlighting indeed is currently
a problem but it may yet be worked around.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-18 Thread Patrick Palka
On Thu, Jun 18, 2015 at 3:08 PM, Jeff King p...@peff.net wrote:
 On Thu, Jun 18, 2015 at 12:28:58PM -0400, Patrick Palka wrote:

 By the way, what would it take to get something like this script into
 git proper?  It is IMHO immensely useful even in its current form, yet
 because it's not baked into the application hardly anybody knows about
 it.

 I think if we were going to make it more official, it would make sense
 to do it inside the diff code itself (i.e., not as a separate script),
 and it might be reasonable at that point to actually do a real
 character-based diff rather than the hacky prefix/suffix thing (or
 possibly even integrate with the color-words patterns to find
 syntactically interesting breaks). There is some discussion in the
 Bugs section of contrib/diff-highlight/README.

 -Peff

Thanks for the pointers.  This is something I am interested in
implementing (though not any time soon).  I was actually in the
process of familiarizing myself with the diff code before I discovered
the existence of diff-highlight by accident.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-18 Thread Patrick Palka
On Thu, Jun 18, 2015 at 5:23 PM, Jeff King p...@peff.net wrote:
 On Thu, Jun 18, 2015 at 04:45:05PM -0400, Jeff King wrote:

 Still, I think this is probably a minority case, and it may be
 outweighed by the improvements. The real solution is to consider the
 hunk as a whole and do an LCS diff on it, which would show that yes,
 it's worth highlighting both of those spots, as they are a small
 percentage of the total hunk.

 I've been meaning to play with this for years, so I took the opportunity
 to spend a little time on it. :)

Cool!


 Below is a (slightly hacky) patch I came up with. It seems to work, and
 produces really great output in some cases. For instance, in 99a2cfb, it
 produces (I put highlighted bits in angle brackets):

   -   hashcpy(peeled, sha1);
   +   oidcpy(peeled, oid);

 It also produces nonsense like:

   -   unsigned char peeled[20];
   +   struct object_id peeled;

That's not even so bad.  The diff of the change itself is... interesting.


 but I think that is simply because my splitting function is terrible (it
 splits each byte, whereas we'd probably want to use whitespace and
 punctuation, or something content-specific).

I hope you can polish this.  It definitely has potential.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-18 Thread Patrick Palka

On Thu, 18 Jun 2015, Jeff King wrote:


On Thu, Jun 18, 2015 at 04:14:19PM -0400, Patrick Palka wrote:


in a test script becomes more clear. But some of the output is not so
great. For instance, the very commit under discussion has a
confusing and useless highlight. Or take a documentation patch like
5c31acfb, where I find the highlights actively distracting. We are saved
a little by the if the whole line is different, do not highlight at
all behavior of 097128d1bc.


To fix the useless highlights for both evenly and unevenly sized hunks
(like when all but a semicolon on a line changes), one can loosen the
criterion for not highlighting from do not highlight if 0% of the
before and after lines are common between them to, say, do not
highlight if less than 10% of the before and after lines are common
between them.  Then most of these useless highlights are gone for both
evenly and unevenly sized hunks.


Yeah, this is an idea I had considered but never actually experimented
with. It does make some things better, but it also makes some a little
worse. For example, in 8dbf3eb, the hunk:

-   const char *plain = diff_get_color(ecb-color_diff,
-  DIFF_PLAIN);
+   const char *context = diff_get_color(ecb-color_diff,
+DIFF_CONTEXT);

currently gets the plain/context change in the first line highlighted,
as well as the DIFF_PLAIN/DIFF_CONTEXT in the second line. With a 10%
limit, the second line isn't highlighted. That's correct by the
heuristic, but it's a bit harder to read, because the highlight draws
your eye to the first change, and it is easy to miss the second.


Good example, this actually exposes a bug in the heuristic.  Each line
is around 15 characters long and the common affix ); is 2 characters
long, which is about 15% of 15.  So the DIFF_PLAIN/DIFF_CONTEXT pair
ought to be highlighted.

The patch was unintentionally comparing the lengths of the common
affixes against the length of the entire line, whitespace and color
codes and all.  In effect this meant that the 10% threshold was much
higher.  We should compare against the length of the non-boring parts of
the whole line when determining the percentage in common.  Attached is a
revised patch.



Still, I think this is probably a minority case, and it may be
outweighed by the improvements. The real solution is to consider the
hunk as a whole and do an LCS diff on it, which would show that yes,
it's worth highlighting both of those spots, as they are a small
percentage of the total hunk.


Here is a patch that changes the criterion as mentioned.  Testing this
change on the documentation patch 5c31acfb, only two pairs of lines are
highlighted instead of six.  On my original patch, the useless highlight
is gone.  The useless semicolon-related highlights on e.g. commit
99a2cfb are gone.


Nice, the ones like 99a2cfb are definitely wrong (I had though to fix
them eventually by treating some punctuation as uninteresting, but I
suspect the percentage heuristic covers that reasonably well in
practice).


Of course, these patches are both hacks but they seem to be surprisingly
effective hacks especially when paired together.


The whole script is a (surprisingly effective) hack. ;)


So I dunno. IMHO this does more harm than good, and I would not want to
use it myself. But it is somewhat a matter of taste; I am not opposed to
making it a configurable option.


That is something I can do :)


Coupled with the 10%-threshold patch, I think it would be OK to include
it unconditionally. So far we've just been diffing the two outputs and
micro-analyzing them. The real test to me will be using it in practice
and seeing if it's helpful or annoying.


-- 8 --

Subject: [PATCH] diff-highlight: don't highlight lines that have little in
 common

---
 contrib/diff-highlight/diff-highlight | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index 85d2eb0..0cc2b60 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -218,8 +218,21 @@ sub is_pair_interesting {
my $suffix_a = join('', @$a[($sa+1)..$#$a]);
my $suffix_b = join('', @$b[($sb+1)..$#$b]);

-   return $prefix_a !~ /^$COLOR*-$BORING*$/ ||
-  $prefix_b !~ /^$COLOR*\+$BORING*$/ ||
-  $suffix_a !~ /^$BORING*$/ ||
-  $suffix_b !~ /^$BORING*$/;
+   $prefix_a =~ s/^$COLOR*-$BORING*//;
+   $prefix_b =~ s/^$COLOR*\+$BORING*//;
+   $suffix_a =~ s/$BORING*$//;
+   $suffix_b =~ s/$BORING*$//;
+
+   my $whole_a = join ('', @$a);
+   $whole_a =~ s/^$COLOR*-$BORING*//;
+   $whole_a =~ s/$BORING*$//;
+
+   my $whole_b = join ('', @$b);
+   $whole_b =~ s/^$COLOR*\+$BORING*//;
+   $whole_b =~ s/$BORING*$//;
+
+   # Only bother highlighting if at least

[PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-15 Thread Patrick Palka
Currently the diff-highlight script does not try to highlight hunks that
have different numbers of removed/added lines.  But we can be a little
smarter than that, without introducing much magic and complexity.

In the case of unevenly-sized hunks, we could still highlight the first
few (lexicographical) add/remove pairs.  It is not uncommon for hunks to
have common prefixes, and in such a case this change is very useful
for spotting differences.

Signed-off-by: Patrick Palka patr...@parcs.ath.cx
---
 contrib/diff-highlight/diff-highlight | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index ffefc31..0dfbebd 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -88,22 +88,30 @@ sub show_hunk {
return;
}
 
-   # If we have mismatched numbers of lines on each side, we could try to
-   # be clever and match up similar lines. But for now we are simple and
-   # stupid, and only handle multi-line hunks that remove and add the same
-   # number of lines.
-   if (@$a != @$b) {
-   print @$a, @$b;
-   return;
-   }
+   # We match up the first MIN(a, b) lines on each side.
+   my $c = @$a  @$b ? @$a : @$b;
 
+   # Highlight each pair, and print each removed line of that pair.
my @queue;
-   for (my $i = 0; $i  @$a; $i++) {
+   for (my $i = 0; $i  $c; $i++) {
my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]);
print $rm;
push @queue, $add;
}
+
+   # Print the remaining unmatched removed lines of the hunk.
+   for (my $i = $c; $i  @$a; $i++) {
+   print $a-[$i];
+   }
+
+   # Print the added lines of each highlighted pair.
print @queue;
+
+   # Print the remaining unmatched added lines of the hunk.
+   for (my $i = $c; $i  @$b; $i++) {
+   print $b-[$i];
+   }
+
 }
 
 sub highlight_pair {
-- 
2.4.3.413.ga5fe668

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] reset: optionally setup worktree and refresh index on --mixed

2014-02-16 Thread Patrick Palka
On Sat, Feb 15, 2014 at 9:28 PM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote:
 Refreshing index requires work tree. So we have to options: always set
 up work tree (and refuse to reset if failing to do so), or make
 refreshing index optional.

 As refreshing index is not the main task, it makes more sense to make
 it optional.

 Reported-by: Patrick Palka patr...@parcs.ath.cx
 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com

Thanks!  I can confirm that this change fixes my use case.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git-reset does not seem to respect GIT_WORK_TREE

2014-02-14 Thread Patrick Palka
Hi everyone,

I noticed that git-reset does not seem to respect GIT_WORK_TREE.  Here
is a simplified test case:

$ mkdir src_dir  cd src_dir
$ git init
$ touch A  git add A  git commit -m Dummy commit.
$ mkdir ../build_dir  cd ../build_dir
$ export GIT_WORK_TREE=../src_dir
$ export GIT_DIR=../src_dir/.git
$ git reset
Unstaged changes after reset:
D   A

The final command git reset erroneously suggests that the file A
does not exist in the working tree.  Does anybody know why git-reset
behaves this way?

Thanks,
Patrick
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Documentation: improve the example of overriding LESS via core.pager

2012-10-28 Thread Patrick Palka
You can override an option set in the LESS variable by simply prefixing
the command line option with `-+`. This is more robust than the previous
example if the default LESS options are to ever change.

Signed-off-by: Patrick Palka patr...@parcs.ath.cx
---
 Documentation/config.txt |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 11f320b..9a0544c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -538,14 +538,14 @@ core.pager::
`LESS` variable to some other value.  Alternately,
these settings can be overridden on a project or
global basis by setting the `core.pager` option.
-   Setting `core.pager` has no affect on the `LESS`
+   Setting `core.pager` has no effect on the `LESS`
environment variable behaviour above, so if you want
to override git's default settings this way, you need
to be explicit.  For example, to disable the S option
in a backward compatible manner, set `core.pager`
-   to `less -+$LESS -FRX`.  This will be passed to the
-   shell by git, which will translate the final command to
-   `LESS=FRSX less -+FRSX -FRX`.
+   to `less -+S`.  This will be passed to the shell by
+   git, which will translate the final command to
+   `LESS=FRSX less -+S`.
 
 core.whitespace::
A comma separated list of common whitespace problems to
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html