Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:
> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
> 
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

It's important to cover most common cases occurring naturally in
people's repositories, while trying to avoid false positives (the latter
is important more now, where gitweb doesn't check for rev name validity).

I guess that most common is to use non-hierarchical tags with ASCII-only
names for describe output, so while "æ/var-gf6727b0" won't be picked,
it is IMVHO also much less likely to occur.

> 
> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
> 
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
> 
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.

All right.  That's good and enough for one patch.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 

Acked-by: Jakub Narębski 

BTW. to add to whole "git describe" output or not discussion: having link
span whole revision marker makes for easier to use UI: larger are to hit.

> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d
> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()
> +-g[0-9a-fA-F]{7,40}
> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b

Nice using of eXplained regexp.  It is much easier to understand with
comments.

> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }
> 



Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I just ran into an example of a better reason for doing it like my
> patch is doing, which is that if you have some tag like:
>
> deployment-20160928-171914-16-g42e13d8
>
> With my patch the whole thing will be a link to the 42e13d8 commit,
> but with this suggestion both 20160928 and 42e13d8 would become commit
> links, the former one would be broken.
>
> Of course we could have some code that would detect that the whole \S+
> is part of one thing that ends in g,...

I think that this example shows a flaw not in the "suffix that looks
like an object name" approach, but more in the boundary regexp,
namely, the \b part; it is probably too loose.

And \S+ is not the right cue, either, for that matter.  IOW, "we
only should take hexstring, optionally prefixed with 'g', that
appears before the whitespace" is too strict, as a sentence

We broke the system with deployment-g42e13d8.

does want to link to 42e13d8, even though full-stop at the end is
not whitespace, and the existing regexp uses \b there as a rough
equivalent to saying "Here must be a whitespace or punctuation".

An attempt to tighten "what a punctuation is" by excluding '-' may
make that "timestamp is in the tagname" example work, but is not a
good solution, either, because two sentences can be concatenated
with an em-dash that is often typed as two hyphen in plain text,
resulting in something like

We broke the system with deployment-g42e13d8--sigh.

and we do want to treat the '-' after 42e13d8 as a punctuation after
a described object name.

So I agree 3/3 is good thing to do as-is.

Thanks.



Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Ævar Arnfjörð Bjarmason
On Sun, Oct 9, 2016 at 1:20 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano  wrote:
>> Ævar Arnfjörð Bjarmason   writes:
>>
>>> Change the log formatting function to know about "git describe" output
>>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>>>
>>> There are still many valid refnames that we don't link to
>>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>>> similarly it's trivially possible to create some refnames like
>>> "æ/var-gf6727b0" or which won't be picked up by this regex.
>>
>> Not a serious counter-proposal or suggestion, and certainly not an
>> objection to the patch I am responding to, but I wonder if it is so
>> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.
>>
>> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
>> allowed an optional 'g' in front of the hex, e.g.
>>
>> -   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> +   $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{
>>
>> wouldn't that be much simpler, covers more cases and sufficient?
>
> It would be more simpler, maybe that's the right approach. I have a
> preference for making the entire reference a link. I think it makes
> more sense for the UI.

I just ran into an example of a better reason for doing it like my
patch is doing, which is that if you have some tag like:

deployment-20160928-171914-16-g42e13d8

 With my patch the whole thing will be a link to the 42e13d8 commit,
but with this suggestion both 20160928 and 42e13d8 would become commit
links, the former one would be broken.

Of course we could have some code that would detect that the whole \S+
is part of one thing that ends in g, but the complexity in
doing that would be equivalent or more to doing what my patch is
doing.

>>> There's surely room for improvement here, but I just wanted to address
>>> the very common case of sticking "git describe" output into commit
>>> messages without trying to link to all possible refnames, that's going
>>> to be a rather futile exercise given that this is free text, and it
>>> would be prohibitively expensive to look up whether the references in
>>> question exist in our repository.
>>>
>>> There was on-list discussion about how we could do better than this
>>> patch. Junio suggested to update parse_commits() to call a new
>>> "gitweb--helper" command which would pass each of the revision
>>> candidates through "rev-parse --verify --quiet". That would cut down
>>> on our false positives (e.g. we'll link to "deadbeef"), and also allow
>>> us to be more aggressive in selecting candidate revisions.
>>>
>>> That may be too expensive to work in practice, or it may
>>> not. Investigating that would be a good follow-up to this patch.
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>>> ---
>>>  gitweb/gitweb.perl | 18 --
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index 92b5e91..7cf68f0 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl
>>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>>   my $line = shift;
>>>
>>>   $line = esc_html($line, -nbsp=>1);
>>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>>> + $line =~ s{
>>> +\b
>>> +(
>>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
>>> +# or hadoop-20160921-113441-20-g094fb7d
>>> +(?>> +[A-Za-z0-9.-]+
>>> +(?!\.) # refs can't end with ".", see check_refname_format()
>>> +-g[0-9a-fA-F]{7,40}
>>> +|
>>> +# Just a normal looking Git SHA1
>>> +[0-9a-fA-F]{7,40}
>>> +)
>>> +\b
>>> +}{
>>>   $cgi->a({-href => href(action=>"object", hash=>$1),
>>>   -class => "text"}, $1);
>>> - }eg;
>>> + }egx;
>>>
>>>   return $line;
>>>  }


Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-09 Thread Ævar Arnfjörð Bjarmason
On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Change the log formatting function to know about "git describe" output
>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>>
>> There are still many valid refnames that we don't link to
>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>> similarly it's trivially possible to create some refnames like
>> "æ/var-gf6727b0" or which won't be picked up by this regex.
>
> Not a serious counter-proposal or suggestion, and certainly not an
> objection to the patch I am responding to, but I wonder if it is so
> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.
>
> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
> allowed an optional 'g' in front of the hex, e.g.
>
> -   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> +   $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{
>
> wouldn't that be much simpler, covers more cases and sufficient?

It would be more simpler, maybe that's the right approach. I have a
preference for making the entire reference a link. I think it makes
more sense for the UI.

>> There's surely room for improvement here, but I just wanted to address
>> the very common case of sticking "git describe" output into commit
>> messages without trying to link to all possible refnames, that's going
>> to be a rather futile exercise given that this is free text, and it
>> would be prohibitively expensive to look up whether the references in
>> question exist in our repository.
>>
>> There was on-list discussion about how we could do better than this
>> patch. Junio suggested to update parse_commits() to call a new
>> "gitweb--helper" command which would pass each of the revision
>> candidates through "rev-parse --verify --quiet". That would cut down
>> on our false positives (e.g. we'll link to "deadbeef"), and also allow
>> us to be more aggressive in selecting candidate revisions.
>>
>> That may be too expensive to work in practice, or it may
>> not. Investigating that would be a good follow-up to this patch.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  gitweb/gitweb.perl | 18 --
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 92b5e91..7cf68f0 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>   my $line = shift;
>>
>>   $line = esc_html($line, -nbsp=>1);
>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> + $line =~ s{
>> +\b
>> +(
>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
>> +# or hadoop-20160921-113441-20-g094fb7d
>> +(?> +[A-Za-z0-9.-]+
>> +(?!\.) # refs can't end with ".", see check_refname_format()
>> +-g[0-9a-fA-F]{7,40}
>> +|
>> +# Just a normal looking Git SHA1
>> +[0-9a-fA-F]{7,40}
>> +)
>> +\b
>> +}{
>>   $cgi->a({-href => href(action=>"object", hash=>$1),
>>   -class => "text"}, $1);
>> - }eg;
>> + }egx;
>>
>>   return $line;
>>  }


Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-06 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

Not a serious counter-proposal or suggestion, and certainly not an
objection to the patch I am responding to, but I wonder if it is so
bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.

IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
allowed an optional 'g' in front of the hex, e.g.

-   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+   $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{

wouldn't that be much simpler, covers more cases and sufficient?

> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
>
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
>
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d
> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()
> +-g[0-9a-fA-F]{7,40}
> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b
> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }


[PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-06 Thread Ævar Arnfjörð Bjarmason
Change the log formatting function to know about "git describe" output
such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".

There are still many valid refnames that we don't link to
e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
similarly it's trivially possible to create some refnames like
"æ/var-gf6727b0" or which won't be picked up by this regex.

There's surely room for improvement here, but I just wanted to address
the very common case of sticking "git describe" output into commit
messages without trying to link to all possible refnames, that's going
to be a rather futile exercise given that this is free text, and it
would be prohibitively expensive to look up whether the references in
question exist in our repository.

There was on-list discussion about how we could do better than this
patch. Junio suggested to update parse_commits() to call a new
"gitweb--helper" command which would pass each of the revision
candidates through "rev-parse --verify --quiet". That would cut down
on our false positives (e.g. we'll link to "deadbeef"), and also allow
us to be more aggressive in selecting candidate revisions.

That may be too expensive to work in practice, or it may
not. Investigating that would be a good follow-up to this patch.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/gitweb.perl | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 92b5e91..7cf68f0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,10 +2036,24 @@ sub format_log_line_html {
my $line = shift;
 
$line = esc_html($line, -nbsp=>1);
-   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+   $line =~ s{
+\b
+(
+# The output of "git describe", e.g. v2.10.0-297-gf6727b0
+# or hadoop-20160921-113441-20-g094fb7d
+(?a({-href => href(action=>"object", hash=>$1),
-class => "text"}, $1);
-   }eg;
+   }egx;
 
return $line;
 }
-- 
2.9.3