Michael Wagner wrote:
> On Tue, May 27, 2014 at 04:22:42PM +0200, Jakub Narębski wrote:

>> Subject: [PATCH] gitweb: Harden UTF-8 handling in generated links
>>
>> esc_html() ensures that its input is properly UTF-8 encoded and marked
>> as UTF-8 with to_utf8().  Make esc_param() (used for query parameters
>> in generated URLs), esc_path_info() (for escaping path_info
>> components) and esc_url() use it too.
>>
>> This hardens gitweb against errors in UTF-8 handling; because
>> to_utf8() is idempotent it won't change correct output.
[...]
>>   sub esc_param {
>>      my $str = shift;
>>      return undef unless defined $str;
>> +
>> +    $str = to_utf8($str);
>>      $str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
>>      $str =~ s/ /\+/g;
>> +
>>      return $str;
>>   }   
 
> While trying to view a "blob_plain" of "Gütekriterien.txt", a 404 error
> occurred. "git_get_hash_by_path" tries to resolve the hash with the wrong
> filename (git ls-tree -z HEAD -- Gütekriterien.txt) and fails.
> 
> The filename needs the correct encoding. Something like this is probably
> needed for all filenames and should be done at a prior stage:

True.

First, I wonder why the tests I did for this situation didn't
show any errors even before the "harden href()" patch. What
is different in your config that you see those errors?

> ---
>   gitweb/gitweb.perl |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 77e1312..e4a50e7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -4725,7 +4725,7 @@ sub git_print_tree_entry {
>                  }
>                  print " | " .
>                          $cgi->a({-href => href(action=>"blob_plain", hash_base=>$hash_base,
> -                                              file_name=>"$basedir$t->{'name'}")},
> +                                              file_name=>"$basedir" . to_utf8($t->{'name'}))},

Second, my "harden href()" patch does not work for this, because
concatenating a non-UTF-8 string with a UTF-8 string screws up Perl's
knowledge of what is and isn't UTF-8.  So calling to_utf8() after the
concatenation doesn't help.
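
To illustrate the concatenation problem, here is a minimal standalone
sketch. It is not gitweb code: the to_utf8() below is a simplified
stand-in (with a hard-coded latin1 fallback), and the "döcs/" prefix and
the variable names are hypothetical values chosen for the demonstration.
It assumes one operand already carries Perl's UTF-8 flag while the other
is still raw UTF-8 bytes, as read from git.

```perl
use strict;
use warnings;
use Encode ();

# Simplified stand-in for gitweb's to_utf8() (assumption, not the real code):
# leave flagged strings alone, otherwise try to decode the bytes as UTF-8,
# and fall back to latin1 if they are not valid UTF-8.
sub to_utf8 {
    my $str = shift;
    return undef unless defined $str;
    return $str if utf8::is_utf8($str) || utf8::decode($str);
    return Encode::decode('latin1', $str);
}

# Raw bytes of "Gütekriterien.txt" encoded as UTF-8 (UTF-8 flag off).
my $name = "G\xc3\xbctekriterien.txt";

# Hypothetical directory prefix "döcs/" that is already decoded (flag on).
my $basedir = to_utf8("d\xc3\xb6cs/");

# Concatenate first, decode afterwards: Perl upgrades the raw bytes as if
# they were Latin-1, so \xc3\xbc becomes two characters, and to_utf8()
# then sees a flagged string and returns it unchanged.
my $broken = to_utf8("$basedir$name");

# Decode each piece before concatenating: both operands are character
# strings, so nothing gets reinterpreted.
my $fixed = $basedir . to_utf8($name);

printf "concat then to_utf8: %d characters\n", length $broken;
printf "to_utf8 then concat: %d characters\n", length $fixed;
```

With these inputs the first string ends up one character longer than the
second, because the "ü" was turned into two characters before to_utf8()
ever saw it.  That is why the decoding has to happen on each part before
they are joined.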


>                                  "raw");
>                  print "</td>\n";
> 
