Re: [PATCH] gitk: don't highlight submodule diff lines outside submodule diffs
06.11.2018 23:06, Stefan Beller wrote:
> On Tue, Nov 6, 2018 at 12:03 PM Роман Донченко wrote:
>>
>> A line that starts with "  <" or "  >" is not necessarily a submodule
>> diff line. It might just be a context line in a normal diff,
>> representing a line starting with " <" or " >" respectively.
>>
>> Use the currdiffsubmod variable to track whether we are currently
>> inside a submodule diff and only highlight these lines if we are.
>
> This explanation makes sense, some prior art is at
> https://public-inbox.org/git/20181021163401.4458-1-du...@example.com/
> which was not taken AFAICT.

Didn't see that patch. That said, I think it's incorrect, since it never
resets currdiffsubmod back to the empty string, so if a normal diff
follows a submodule diff, the same issue will occur.

(The `set $currdiffsubmod ""` lines that are already there are
effectively useless because they set the variable whose name is the
contents of currdiffsubmod, rather than currdiffsubmod itself. I assume
it was a typo.)

-Roman

> Thanks,
> Stefan
>
>> Signed-off-by: Роман Донченко
>> ---
>>  gitk | 8
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gitk b/gitk
>> index a14d7a1..6bb6dc6 100755
>> --- a/gitk
>> +++ b/gitk
>> @@ -8109,6 +8109,8 @@ proc parseblobdiffline {ids line} {
>>          }
>>          # start of a new file
>>          set diffinhdr 1
>> +        set currdiffsubmod ""
>> +
>>          $ctext insert end "\n"
>>          set curdiffstart [$ctext index "end - 1c"]
>>          lappend ctext_file_names ""
>> @@ -8191,12 +8193,10 @@ proc parseblobdiffline {ids line} {
>>          } else {
>>              $ctext insert end "$line\n" filesep
>>          }
>> -    } elseif {![string compare -length 3 "  >" $line]} {
>> -        set $currdiffsubmod ""
>> +    } elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  >" $line]} {
>>          set line [encoding convertfrom $diffencoding $line]
>>          $ctext insert end "$line\n" dresult
>> -    } elseif {![string compare -length 3 "  <" $line]} {
>> -        set $currdiffsubmod ""
>> +    } elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  <" $line]} {
>>          set line [encoding convertfrom $diffencoding $line]
>>          $ctext insert end "$line\n" d0
>>      } elseif {$diffinhdr} {
>> --
>> 2.19.1.windows.1
[PATCH] gitk: don't highlight submodule diff lines outside submodule diffs
A line that starts with "  <" or "  >" is not necessarily a submodule
diff line. It might just be a context line in a normal diff,
representing a line starting with " <" or " >" respectively.

Use the currdiffsubmod variable to track whether we are currently
inside a submodule diff and only highlight these lines if we are.

Signed-off-by: Роман Донченко
---
 gitk | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gitk b/gitk
index a14d7a1..6bb6dc6 100755
--- a/gitk
+++ b/gitk
@@ -8109,6 +8109,8 @@ proc parseblobdiffline {ids line} {
         }
         # start of a new file
         set diffinhdr 1
+        set currdiffsubmod ""
+
         $ctext insert end "\n"
         set curdiffstart [$ctext index "end - 1c"]
         lappend ctext_file_names ""
@@ -8191,12 +8193,10 @@ proc parseblobdiffline {ids line} {
         } else {
             $ctext insert end "$line\n" filesep
         }
-    } elseif {![string compare -length 3 "  >" $line]} {
-        set $currdiffsubmod ""
+    } elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  >" $line]} {
         set line [encoding convertfrom $diffencoding $line]
         $ctext insert end "$line\n" dresult
-    } elseif {![string compare -length 3 "  <" $line]} {
-        set $currdiffsubmod ""
+    } elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  <" $line]} {
         set line [encoding convertfrom $diffencoding $line]
         $ctext insert end "$line\n" d0
     } elseif {$diffinhdr} {
--
2.19.1.windows.1
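
Editorial sketch (not part of the patch): the idea above is a small state machine. The Python below is only an illustration of that idea, not gitk's Tcl code. The function name classify_lines, the tag names, and the assumption that a submodule diff starts with a line beginning "Submodule " are all hypothetical choices made for this example.

```python
def classify_lines(lines):
    """Tag diff lines, treating '  >' and '  <' as submodule log lines
    only while we are inside a submodule diff."""
    in_submodule = False  # plays the role of currdiffsubmod being non-empty
    tags = []
    for line in lines:
        if line.startswith("diff --git "):
            # Start of a new file: reset the state, as the patch does.
            in_submodule = False
            tags.append("header")
        elif line.startswith("Submodule "):
            # Assumed marker for the start of a submodule diff.
            in_submodule = True
            tags.append("submodule-header")
        elif in_submodule and line.startswith("  >"):
            tags.append("submodule-added")
        elif in_submodule and line.startswith("  <"):
            tags.append("submodule-removed")
        else:
            tags.append("plain")
    return tags
```

Without the reset on "diff --git", a context line such as "  > text" in a normal diff that follows a submodule diff would still be tagged as a submodule line, which is the failure mode described in the reply above.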
[PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
More specifically:

* Add "\" to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both "q" and "Q" to identify the encoding.

* Allow lowercase hexadecimal digits in the "Q" encoding.

And, more on the cosmetic side:

* Change the "encoded-text" regex to exclude rather than include
  characters, for clarity and consistency with "token".

Signed-off-by: Роман Донченко
Acked-by: Jeff King
---
 git-send-email.perl | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
     $initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
     local ($_) = @_;
-    my $encoding;
-    s{=\?([^?]+)\?q\?(.*?)\?=}{
-        $encoding = $1;
-        my $e = $2;
-        $e =~ s/_/ /g;
-        $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-        $e;
+    my $charset;
+    s{$re_encoded_word}{
+        $charset = $1;
+        my $encoding = $2;
+        my $text = $3;
+        if ($encoding eq 'q' || $encoding eq 'Q') {
+            $text =~ s/_/ /g;
+            $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+            $text;
+        } else {
+            $&; # other encodings not supported yet
+        }
     }eg;
-    return wantarray ? ($_, $encoding) : $_;
+    return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
     my $s = shift;
-    my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-    my $encoded_text = qr/[!->@-~]+/;
     length($s) <= 75 &&
-    $s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+    $s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
--
2.1.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
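
Editorial sketch (not part of the patch): the three regexes and the "Q" branch above can be mirrored in a few lines of Python. This is an illustration under assumptions, not a port of git-send-email.perl. The names TOKEN, ENCODED_TEXT, ENCODED_WORD, decode_q and unquote_rfc2047 are chosen here to echo the patch, the input is assumed to be an ASCII header string, and decoded octets are returned as code points 0-255 rather than being converted from the charset.

```python
import re

# Character classes transcribed from the patch's Perl regexes.
TOKEN = r'[^\]\[()<>@,;:\\"/?.= \x00-\x1f\x7f-\xff]+'
ENCODED_TEXT = r'[^? \x00-\x1f\x7f-\xff]+'
ENCODED_WORD = re.compile(
    r'=\?(%s)\?(%s)\?(%s)\?=' % (TOKEN, TOKEN, ENCODED_TEXT)
)


def decode_q(text):
    """Undo the "Q" encoding: '_' is a space, '=XX' is one octet.
    Upper- and lowercase hex digits are both accepted."""
    text = text.replace("_", " ")
    return re.sub(r'=([0-9A-Fa-f]{2})',
                  lambda m: chr(int(m.group(1), 16)), text)


def unquote_rfc2047(header):
    """Return (decoded_header, charset of the last encoded word seen)."""
    charset = None

    def repl(m):
        nonlocal charset
        charset = m.group(1)
        if m.group(2) in ("q", "Q"):
            return decode_q(m.group(3))
        return m.group(0)  # other encodings are left untouched

    return ENCODED_WORD.sub(repl, header), charset
```

For example, unquote_rfc2047("=?UTF-8?q?a=c3=a9_b?=") yields the text "a", the two octets 0xC3 0xA9, a space and "b", together with the charset "UTF-8". A word whose charset contains a backslash is not recognized at all, which corresponds to the erratum mentioned in the first bullet.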
[PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

Signed-off-by: Роман Донченко
Acked-by: Jeff King
---
 git-send-email.perl   | 26 --
 t/t9001-send-email.sh |  7 +++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index d461ffb..7d5cc8a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -919,17 +919,23 @@ $time = time - scalar $#files;
 sub unquote_rfc2047 {
     local ($_) = @_;
     my $charset;
-    s{$re_encoded_word}{
-        $charset = $1;
-        my $encoding = $2;
-        my $text = $3;
-        if ($encoding eq 'q' || $encoding eq 'Q') {
-            $text =~ s/_/ /g;
-            $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
-            $text;
-        } else {
-            $&; # other encodings not supported yet
+    my $sep = qr/[ \t]+/;
+    s{$re_encoded_word(?:$sep$re_encoded_word)*}{
+        my @words = split $sep, $&;
+        foreach (@words) {
+            m/$re_encoded_word/;
+            $charset = $1;
+            my $encoding = $2;
+            my $text = $3;
+            if ($encoding eq 'q' || $encoding eq 'Q') {
+                $_ = $text;
+                s/_/ /g;
+                s/=([0-9A-F]{2})/chr(hex($1))/egi;
+            } else {
+                # other encodings not supported yet
+            }
         }
+        join '', @words;
     }eg;
     return wantarray ? ($_, $charset) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..fa965ff 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is suppressed' "
     'non_ascii_self_suppressed'
 "
 
+# This name is long enough to force format-patch to split it into multiple
+# encoded-words, assuming it uses UTF-8 with the "Q" encoding.
+test_expect_success $PREREQ 'long non-ascii self name is suppressed' "
+    test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=m...@example.com' \
+        'long_non_ascii_self_suppressed'
+"
+
 test_expect_success $PREREQ 'sanitized self name is suppressed' "
     test_suppress_self_unquoted '\"A U. Thor\"' 'aut...@example.com' \
         'self_name_sanitized_suppressed'
--
2.1.1
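
Editorial sketch (not part of the patch): the rule that adjacent encoded words are joined with the separating whitespace dropped can be shown in Python as follows. This is a simplified illustration, assuming only "q"/"Q" words and a looser word pattern than the patch's regexes. WORD, RUN, decode_run and unquote are names invented for this example.

```python
import re

# Simplified encoded-word pattern: charset, q/Q, encoded text.
WORD = r'=\?([^?\s]+)\?([qQ])\?([^?\s]+)\?='
WORD_RE = re.compile(WORD)
# One encoded word, optionally followed by more words separated by
# spaces or tabs; the whole run is decoded as a unit.
RUN = re.compile(r'%s(?:[ \t]+%s)*' % (WORD, WORD))


def decode_run(match):
    parts = []
    for word in re.split(r'[ \t]+', match.group(0)):
        text = WORD_RE.fullmatch(word).group(3).replace("_", " ")
        parts.append(re.sub(r'=([0-9A-Fa-f]{2})',
                            lambda h: chr(int(h.group(1), 16)), text))
    # Joining with '' drops the whitespace that separated the words.
    return "".join(parts)


def unquote(header):
    return RUN.sub(decode_run, header)
```

With this sketch, unquote("=?utf-8?q?Hello,?= =?utf-8?q?_world?=") gives "Hello, world": the space in the result comes from the "_" inside the second word, not from the separator. Whitespace next to ordinary text is kept, so "=?utf-8?q?a?= plain =?utf-8?q?b?=" becomes "a plain b".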
Re: [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
Philip Oakley wrote on Sun, 07 Dec 2014 20:48:05 +0300:

> From: "Роман Донченко"
>> Jeff King wrote on Sun, 07 Dec 2014 12:18:59 +0300:
>>> On Sat, Dec 06, 2014 at 10:36:23PM +0300, Роман Донченко wrote:
>>>
>>> One final note on this bit of code: if there are multiple encoded
>>> words, we grab the $charset from the final encoded word, and never
>>> report the earlier charsets. Technically they do not all have to be
>>> the same (rfc2047 even has an example where they are not). I think
>>> we can dismiss this, though, as:
>>>
>>>   1. It was like this before your patches (we might have seen
>>>      multiple non-adjacent encoded words; you're just handling
>>>      adjacent ones), and nobody has complained.
>>>
>>>   2. Using two separate encodings in the same header is sufficiently
>>>      ridiculous that I can live with us not handling it properly.
>>
>> Yeah, that bugs me as well. But I think handling multiple encodings
>> would require substantial reworking of the code, so I chickened out
>> (with the same excuses :-)).
>
> Would that be worth a terse comment in the documentation change part
> of the patch? "Multiple (RFC2047) encodings are not supported.", or
> would that be bike shed noise.

I didn't change any documentation... and in either case, they weren't
supported in the first place, so I don't think it's anything I need to
mention.
Re: [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
Jeff King wrote on Sun, 07 Dec 2014 12:18:59 +0300:

> On Sat, Dec 06, 2014 at 10:36:23PM +0300, Роман Донченко wrote:
>
>> The RFC says that they are to be concatenated after decoding (i.e.
>> the intervening whitespace is ignored).
>
> Thanks. Both patches look good to me, and I'd be happy to have them
> applied as-is. I wrote a few comments below, but in all cases I think
> I convinced myself that what you wrote is best.

I had the same concerns myself, and eventually convinced myself of the
same. :-)

> One final note on this bit of code: if there are multiple encoded
> words, we grab the $charset from the final encoded word, and never
> report the earlier charsets. Technically they do not all have to be
> the same (rfc2047 even has an example where they are not). I think we
> can dismiss this, though, as:
>
>   1. It was like this before your patches (we might have seen multiple
>      non-adjacent encoded words; you're just handling adjacent ones),
>      and nobody has complained.
>
>   2. Using two separate encodings in the same header is sufficiently
>      ridiculous that I can live with us not handling it properly.

Yeah, that bugs me as well. But I think handling multiple encodings
would require substantial reworking of the code, so I chickened out
(with the same excuses :-)).

Roman.
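
Editorial sketch (not part of the thread): the limitation discussed above is that only the charset of the final encoded word is reported. The Python below illustrates what per-word handling could look like. It is an assumption-laden example, not a proposal for git-send-email.perl: it handles only "q"/"Q" words, ignores any text between words, and the names charsets_seen and decode_words are invented here.

```python
import re

WORD_RE = re.compile(r'=\?([^?\s]+)\?([qQ])\?([^?\s]+)\?=')


def charsets_seen(header):
    """List the charset of every encoded word, in order of appearance."""
    return [m.group(1).lower() for m in WORD_RE.finditer(header)]


def decode_words(header):
    """Decode each encoded word with its own charset and concatenate."""
    out = []
    for m in WORD_RE.finditer(header):
        raw = m.group(3).replace("_", " ").encode("ascii")
        octets = re.sub(rb'=([0-9A-Fa-f]{2})',
                        lambda h: bytes([int(h.group(1), 16)]), raw)
        out.append(octets.decode(m.group(1)))
    return "".join(out)
```

For the made-up header "=?ISO-8859-1?Q?caf=E9?= =?UTF-8?Q?_=E2=82=AC?=", charsets_seen returns two different charsets, so reporting only the last one ("utf-8") would misdescribe the first word, while decode_words yields "café €".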
Re: [PATCH] send-email: handle adjacent RFC 2047-encoded words properly
Jeff King wrote on Mon, 24 Nov 2014 18:36:09 +0300:

> On Mon, Nov 24, 2014 at 02:50:04AM +0300, Роман Донченко wrote:
>
>> The RFC says that they are to be concatenated after decoding (i.e.
>> the intervening whitespace is ignored).
>>
>> I change the sender's name to an all-Cyrillic string in the tests so
>> that its encoded form goes over the 76 characters in a line limit,
>> forcing format-patch to split it into multiple encoded words.
>>
>> Since I have to modify the regular expression for an encoded word
>> anyway, I take the opportunity to bring it closer to the spec, most
>> notably disallowing embedded spaces and making it case-insensitive
>> (thus allowing the encoding to be specified as both "q" and "Q").
>
> The overall goal makes sense to me. Thanks for working on this. I have
> a few questions/comments, though.
>
>>  sub unquote_rfc2047 {
>>      local ($_) = @_;
>> +
>> +    my $et = qr/[!->@-~]+/; # encoded-text from RFC 2047
>> +    my $sep = qr/[ \t]+/;
>> +    my $encoded_word = qr/=\?($et)\?q\?($et)\?=/i;
>
> The first $et in $encoded_word is actually the charset, which is
> defined by RFC 2047 as:
>
>   charset = token    ; see section 3
>
>   token = 1*<Any CHAR except SPACE, CTLs, and especials>
>
>   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
>               <"> / "/" / "[" / "]" / "?" / "." / "="
>
> Your regex is a little more liberal. I doubt that it is a big deal in
> practice (actually, in practice, I suspect [a-zA-Z0-9-] would be
> fine). But if we are tightening things up in general, it may make
> sense to do so here (and I notice that is_rfc2047_quoted does a more
> thorough $token definition, and it probably makes sense for the two
> functions to be consistent).

Yeah, I did realize that token is more restrictive than encoded-text,
but I didn't want to stray too far from the subject line of the patch.
What I'll probably do is split the patch into two, one for regex
tweaking and one for multiple-word handling. And yeah, I'll try to make
the two functions use the same regexes.

> For your definition of encoded-text, RFC 2047 says:
>
>   encoded-text = 1*<Any printable ASCII character other than "?"
>                     or SPACE>
>
> It looks like you pulled the definition of $et from
> is_rfc2047_quoted, but I am not clear on where that original came
> from (it is from a3a8262, but that commit message does not explain
> the regex).

No, it's actually an independent discovery. :-) I don't think it needs
explanation, though - it's just a character class with two ranges
covering every printable character but the question mark.

> Also, I note that we handle 'q'-style encodings here, but not 'b'. I
> wonder if it is worth adding that in while we are in the area (it is
> not a big deal if you always send-email git-generated patches, as we
> never generate it).

I could add "b" decoding, but since format-patch never generates "b"
encodings, testing would be a problem. And I'd rather not do it without
any tests.

>> +    s{$encoded_word(?:$sep$encoded_word)+}{
>
> If I am reading this right, it requires at least two $encoded_words.
> Should this "+" be a "*"?

I hang my head in shame. Looks like I'll have to add more tests...

>> +        my @words = split $sep, $&;
>> +        foreach (@words) {
>> +            m/$encoded_word/;
>> +            $encoding = $1;
>> +            $_ = $2;
>> +            s/_/ /g;
>> +            s/=([0-9A-F]{2})/chr(hex($1))/eg;
>
> In the spirit of your earlier change, should this final regex be
> case-insensitive? RFC 2047 says only "Upper case should be used for
> hexadecimal digits "A" through "F." but that does not seem like a
> "MUST" to me.

Sounds reasonable.

Roman.
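
Editorial sketch (not part of the thread): the quantifier question raised above is easy to check in isolation. The Python below is an illustration only. WORD is a simplified stand-in for the patch's $encoded_word, and the names AT_LEAST_TWO, ONE_OR_MORE and matches are invented for this example.

```python
import re

WORD = r'=\?[^?\s]+\?[qQ]\?[^?\s]+\?='  # simplified encoded word
SEP = r'[ \t]+'

# "+" after the group demands a second word; "*" makes it optional.
AT_LEAST_TWO = re.compile(WORD + r'(?:' + SEP + WORD + r')+')
ONE_OR_MORE = re.compile(WORD + r'(?:' + SEP + WORD + r')*')


def matches(pattern, text):
    """True if the whole of text is matched by pattern."""
    return pattern.fullmatch(text) is not None
```

Given a single encoded word such as "=?utf-8?q?caf=C3=A9?=", matches(AT_LEAST_TWO, ...) is False while matches(ONE_OR_MORE, ...) is True, so the "+" form would leave a lone encoded word undecoded. Both patterns accept two words separated by a space.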
Re: [PATCH] send-email: handle adjacent RFC 2047-encoded words properly
Junio C Hamano wrote on Mon, 24 Nov 2014 10:27:51 +0300:

> On Sun, Nov 23, 2014 at 3:50 PM, Роман Донченко wrote:
>> The RFC says that they are to be concatenated after decoding (i.e.
>> the intervening whitespace is ignored).
>>
>> I change the sender's name to an all-Cyrillic string in the tests so
>> that its encoded form goes over the 76 characters in a line limit,
>> forcing format-patch to split it into multiple encoded words.
>>
>> Since I have to modify the regular expression for an encoded word
>> anyway, I take the opportunity to bring it closer to the spec, most
>> notably disallowing embedded spaces and making it case-insensitive
>> (thus allowing the encoding to be specified as both "q" and "Q").
>>
>> Signed-off-by: Роман Донченко
>
> This sounds like a worthy thing to do in general.
>
> I wonder if the C implementation we have for mailinfo needs similar
> update, though. I vaguely recall that we have case-insensitive start
> for q/b segments, but do not remember the details offhand.

That's what git am uses, right? I think that already works correctly
(or at least doesn't have the bug this patch fixes). I didn't do
extensive testing or look at the code, though.

> Was the change to the test to use Cyrillic really necessary, or did
> it suffice if you simply extended the existing "Funny Name" spelled
> with strange accents, but you substituted the whole string anyway?
>
> Until I found out what the new string says by running web-based
> translation on it, I felt somewhat uneasy. As I do not read
> Cyrillic/Russian, we may have been adding some profanity without
> knowing. It turns out that the string just says "Cyrillic Name", so I
> am not against using the new string, but it simply looked odd to
> replace the string whole-sale when you merely need a longer string.
> It made it look as if a bug was specific to Cyrillic when it wasn't.

Ah, if only I had thought of including profanity beforehand. ;-)
Seriously though, I just needed to hit the 76 character limit, and
switching the keyboard layout is a lot easier than copypasting Latin
letters with diacritics (plus I had trouble coming up with a long
enough extension of "Funny Name"...). I can see how that's problematic,
though; I'll change it.

> As you may notice by reading "git log --no-merges" from recent
> history, we tend not to say "I did X, I did Y". If the tone of the
> above message were more similar to them, it may have been easier to
> read.

Technically, I said "I do", not "I did"... but sure, point taken.

Roman.
[PATCH] send-email: handle adjacent RFC 2047-encoded words properly
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

I change the sender's name to an all-Cyrillic string in the tests so
that its encoded form goes over the 76 characters in a line limit,
forcing format-patch to split it into multiple encoded words.

Since I have to modify the regular expression for an encoded word
anyway, I take the opportunity to bring it closer to the spec, most
notably disallowing embedded spaces and making it case-insensitive
(thus allowing the encoding to be specified as both "q" and "Q").

Signed-off-by: Роман Донченко
---
 git-send-email.perl   | 21 +++--
 t/t9001-send-email.sh | 18 +-
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..4bb9f6f 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -913,13 +913,22 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
     local ($_) = @_;
+
+    my $et = qr/[!->@-~]+/; # encoded-text from RFC 2047
+    my $sep = qr/[ \t]+/;
+    my $encoded_word = qr/=\?($et)\?q\?($et)\?=/i;
+
     my $encoding;
-    s{=\?([^?]+)\?q\?(.*?)\?=}{
-        $encoding = $1;
-        my $e = $2;
-        $e =~ s/_/ /g;
-        $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-        $e;
+    s{$encoded_word(?:$sep$encoded_word)+}{
+        my @words = split $sep, $&;
+        foreach (@words) {
+            m/$encoded_word/;
+            $encoding = $1;
+            $_ = $2;
+            s/_/ /g;
+            s/=([0-9A-F]{2})/chr(hex($1))/eg;
+        }
+        join '', @words;
     }eg;
     return wantarray ? ($_, $encoding) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..318b870 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -236,7 +236,7 @@ test_expect_success $PREREQ 'self name with dot is suppressed' "
 "
 
 test_expect_success $PREREQ 'non-ascii self name is suppressed' "
-    test_suppress_self_quoted 'Füñný Nâmé' 'odd_?=m...@example.com' \
+    test_suppress_self_quoted 'Кириллическое Имя' 'odd_?=m...@example.com' \
     'non_ascii_self_suppressed'
 "
 
@@ -946,25 +946,25 @@ test_expect_success $PREREQ 'utf8 author is correctly passed on' '
     clean_fake_sendmail &&
     test_commit weird_author &&
     test_when_finished "git reset --hard HEAD^" &&
-    git commit --amend --author "Füñný Nâmé " &&
-    git format-patch --stdout -1 >funny_name.patch &&
+    git commit --amend --author "Кириллическое Имя " &&
+    git format-patch --stdout -1 >nonascii_name.patch &&
     git send-email --from="Example " \
       --to=nob...@example.com \
       --smtp-server="$(pwd)/fake.sendmail" \
-      funny_name.patch &&
-    grep "^From: Füñný Nâmé " msgtxt1
+      nonascii_name.patch &&
+    grep "^From: Кириллическое Имя " msgtxt1
 '
 
 test_expect_success $PREREQ 'utf8 sender is not duplicated' '
     clean_fake_sendmail &&
     test_commit weird_sender &&
     test_when_finished "git reset --hard HEAD^" &&
-    git commit --amend --author "Füñný Nâmé " &&
-    git format-patch --stdout -1 >funny_name.patch &&
-    git send-email --from="Füñný Nâmé " \
+    git commit --amend --author "Кириллическое Имя " &&
+    git format-patch --stdout -1 >nonascii_name.patch &&
+    git send-email --from="Кириллическое Имя " \
       --to=nob...@example.com \
       --smtp-server="$(pwd)/fake.sendmail" \
-      funny_name.patch &&
+      nonascii_name.patch &&
     grep "^From: " msgtxt1 >msgfrom &&
     test_line_count = 1 msgfrom
 '
--
2.1.1