Re: [PATCH] wildmatch: correct isprint and isspace
Am 15.11.2012 13:19, schrieb Nguyễn Thái Ngọc Duy: On Thu, Nov 15, 2012 at 2:30 AM, René Scharfe rene.scha...@lsrfire.ath.cx wrote: Nevertheless, it's unfortunate that we have an isspace() that *almost* does what the widely known thing of the same name does. I'd shy away from changing git's version directly, because it's used more than a hundred times in the code, and estimating the impact of adding \v and \f to it. Perhaps renaming it to isgitspace() is a good first step, followed by adding a standard version of isspace() for wildmatch? There are just too many call sites of isspace() and there is a risk of new call sites coming in independently. So I think keeping isspace() as-is and using a different name for the standard version is probably a better choice. After having a closer look, where wildmatch is actually used -- matching filenames -- and I've not yet seen \v or \f in a filename, it's possibly unnecessary to do anything about isspace() right now. (It's probably more an issue that filenames can be localized, and we only support unlocalized character classes.) diff --git a/git-compat-util.h b/git-compat-util.h index 02f48f6..d4c3fda 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -486,6 +486,7 @@ extern const unsigned char sane_ctype[256]; #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] (mask)) != 0) #define isascii(x) (((x) ~0x7f) == 0) #define isspace(x) sane_istest(x,GIT_SPACE) +#define isspace_posix(x) (((x) = 9 (x) = 13) || (x) == 32) #define isdigit(x) sane_istest(x,GIT_DIGIT) #define isalpha(x) sane_istest(x,GIT_ALPHA) #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT) @@ -499,7 +500,8 @@ extern const unsigned char sane_ctype[256]; #define isxdigit(x) (hexval_table[x] != -1) This was from a previous patch, but maybe: hexval_table[(unsigned char)x] #define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \ GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \ - GIT_PATHSPEC_MAGIC)) + GIT_PATHSPEC_MAGIC) \ + (x) = 32) May I suggest the current is_print() implementation in master: #define isprint(x) ((x) = 0x20 (x) = 0x7e) To summarize my opinion: I no longer see a reason to correct isspace() (unless somebody with an actual use case complains), and a more POSIXly isprint() is already in master. = Nothing to do. :) Regards Jan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH nd/wildmatch] Correct Git's version of isprint and isspace
Hi. Am 13.11.2012 11:46, schrieb Nguyễn Thái Ngọc Duy: Git's ispace does not include 11 and 12. Git's isprint includes control space characters (10-13). According to glibc-2.14.1 on C locale on Linux, this is wrong. This patch fixes it. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- I wrote a small C program to compare the result of all is* functions that Git replaces against the libc version. These are the only ones that differ. Which matches what Jan Schönherr commented. ctype.c | 6 +++--- git-compat-util.h | 11 ++- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/ctype.c b/ctype.c index 0bfebb4..71311a3 100644 --- a/ctype.c +++ b/ctype.c @@ -14,11 +14,11 @@ enum { P = GIT_PATHSPEC_MAGIC, /* other non-alnum, except for ] and } */ X = GIT_CNTRL, U = GIT_PUNCT, - Z = GIT_CNTRL | GIT_SPACE + Z = GIT_CNTRL_SPACE }; -const unsigned char sane_ctype[256] = { - X, X, X, X, X, X, X, X, X, Z, Z, X, X, Z, X, X, /* 0.. 15 */ +const unsigned int sane_ctype[256] = { + X, X, X, X, X, X, X, X, X, Z, Z, Z, Z, Z, X, X, /* 0.. 15 */ X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, /* 16.. 31 */ S, P, P, P, R, P, P, P, R, R, G, R, P, P, R, P, /* 32.. 47 */ D, D, D, D, D, D, D, D, D, D, P, P, P, P, P, G, /* 48.. 63 */ An alternative to switching from 1-byte to 4-byte values (don't we have a 2-byte datatype?), would be to free up GIT_CNTRL and simply do: #define iscntrl(x) ((x) 0x20) diff --git a/git-compat-util.h b/git-compat-util.h index 02f48f6..4ed3f94 100644 --- a/git-compat-util.h +++ b/git-compat-util.h [...] @@ -483,9 +483,10 @@ extern const unsigned char sane_ctype[256]; #define GIT_PATHSPEC_MAGIC 0x20 #define GIT_CNTRL 0x40 #define GIT_PUNCT 0x80 -#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] (mask)) != 0) +#define GIT_SPACE 0x100 +#define sane_istest(x,mask) ((sane_ctype[(unsigned int)(x)] (mask)) != 0) That should better be left (unsigned char)? We might access values after the array otherwise. (That said, it wasn't really correct before either, when there really is a possibility that x = 0x100.) Regards Jan PS: It looks like my isprint() version was given precedence over your isprint() version during the merge into next. That should also be sorted out, but I've no idea which one is actually better: two comparisons versus one cache lookup and a bitop... (though my guess is that comparisons are cheaper, but then we should also convert isdigit()...) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/7] Cure some format-patch wrapping and encoding issues
Hi all. [This is the second version of this series. If you still remember the first version, you might want to jump directly to the summary of changes below.] The main point of this series is to teach git to encode my name correctly, see patches 5+6, so that the decoded version is actually my name, so that send-email does not insist on adding a wrong superfluous From: line to the mail body. The other patches more mostly by-products that fix other issues I came across. Patch 1 fixes an old off-by-one error, so that wrapped text may now use all available columns. Patches 2 and 3 make the wrapping of header lines more correct, i. e., neither too early nor too late. Patch 4 does some refactoring, which is too unrelated to be included in one of the later patches. Patch 5 improves RFC 2047 encoding; patch 6 removes an old non-RFC conform workaround. Patch 7 is more an RFC, which seems to be a good idea from my point of view. Indeed, I thought the current implementation is erroneous, until Junio C Hamano pointed out, that this might be desired behavior. Thus, make up your mind about this one. The series is currently based on the maint branch, but it applies to master as well. It does also apply to next, but then my implementation of isprint() has to be dropped from patch 5. Changes in v2: - patch 1 is new and is a result of the v1 discussion - patch 5+6 split the old patch 4 into two patches - use of constants for maximum line lengths - even better adherence to RFC 2047 than v1 - updated commit messages/comments Regards Jan Jan H. Schönherr (7): utf8: fix off-by-one wrapping of text format-patch: do not wrap non-rfc2047 headers too early format-patch: do not wrap rfc2047 encoded headers too late format-patch: introduce helper function last_line_length() format-patch: make rfc2047 encoding more strict format-patch: fix rfc2047 address encoding with respect to rfc822 specials format-patch tests: check quoting/encoding in To: and Cc: headers git-compat-util.h | 2 + pretty.c| 149 +++ t/t4014-format-patch.sh | 231 ++-- t/t4202-log.sh | 4 +- utf8.c | 2 +- 5 Dateien geändert, 262 Zeilen hinzugefügt(+), 126 Zeilen entfernt(-) -- 1.7.12 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/7] utf8: fix off-by-one wrapping of text
From: Jan H. Schönherr schn...@cs.tu-berlin.de The wrapping logic in strbuf_add_wrapped_text() does currently not allow lines that entirely fill the allowed width, instead it wraps the line one character too early. For example, the text This is the sixth commit. formatted via %w(11,1,2) (wrap at 11 characters, 1 char indent of first line, 2 char indent of following lines) results in four lines: This is, the, sixth, commit. This is wrong, because the sixth is exactly 11 characters long, and thus allowed. Fix this by allowing the (width+1) character of a line to be a valid wrapping point if it is a whitespace character. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- v2: new patch, result of v1 discussion --- t/t4202-log.sh | 4 ++-- utf8.c | 2 +- 2 Dateien geändert, 3 Zeilen hinzugefügt(+), 3 Zeilen entfernt(-) diff --git a/t/t4202-log.sh b/t/t4202-log.sh index b3ac6be..584e3d8 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -72,9 +72,9 @@ cat expect EOF commit. EOF -test_expect_success 'format %w(12,1,2)' ' +test_expect_success 'format %w(11,1,2)' ' - git log -2 --format=%w(12,1,2)This is the %s commit. actual + git log -2 --format=%w(11,1,2)This is the %s commit. actual test_cmp expect actual ' diff --git a/utf8.c b/utf8.c index a544f15..28791a7 100644 --- a/utf8.c +++ b/utf8.c @@ -353,7 +353,7 @@ retry: c = *text; if (!c || isspace(c)) { - if (w width || !space) { + if (w = width || !space) { const char *start = bol; if (!c text == start) return w; -- 1.7.12 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/7] format-patch: do not wrap non-rfc2047 headers too early
From: Jan H. Schönherr schn...@cs.tu-berlin.de Do not wrap the second and later lines of non-rfc2047-encoded headers substantially before the 78 character limit. Instead of passing the remaining length of the first line as wrapping width, use the correct maximum length and tell strbuf_add_wrapped_bytes() how many characters of the first line are already used. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- v2: - removed off-by-one correction now handled by first patch - commit message clarifications --- pretty.c| 2 +- t/t4014-format-patch.sh | 60 - 2 Dateien geändert, 35 Zeilen hinzugefügt(+), 27 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index 8b1ea9f..71e4024 100644 --- a/pretty.c +++ b/pretty.c @@ -286,7 +286,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, if ((i + 1 len) (ch == '=' line[i+1] == '?')) goto needquote; } - strbuf_add_wrapped_bytes(sb, line, len, 0, 1, max_length - line_len); + strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, max_length); return; needquote: diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 959aa26..d66e358 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -752,16 +752,14 @@ M64=$M8$M8$M8$M8$M8$M8$M8$M8 M512=$M64$M64$M64$M64$M64$M64$M64$M64 cat expect 'EOF' Subject: [PATCH] foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar EOF -test_expect_success 'format-patch wraps extremely long headers (ascii)' ' +test_expect_success 'format-patch wraps extremely long subject (ascii)' ' echo content file git add file git commit -m $M512 @@ -807,28 +805,12 @@ test_expect_success 'format-patch wraps extremely long headers (rfc2047)' ' test_cmp expect subject ' -M8=foo_bar_ -M64=$M8$M8$M8$M8$M8$M8$M8$M8 -cat expect EOF -From: $M64 - foo...@foo.bar -EOF -test_expect_success 'format-patch wraps non-quotable headers' ' - rm -rf patches/ - echo content file - git add file - git commit -mfoo --author $M64 foo...@foo.bar - git format-patch --stdout -1 patch - sed -n /^From: /p; /^ /p; /^$/q patch from - test_cmp expect from -' - check_author() { echo content file git add file GIT_AUTHOR_NAME=$1 git commit -m author-check git format-patch --stdout -1 patch - grep ^From: patch actual + sed -n /^From: /p; /^ /p; /^$/q patch actual test_cmp expect actual } @@ -853,6 +835,32 @@ test_expect_success 'rfc2047-encoded headers also double-quote 822 specials' ' check_author Föo B. Bar ' +cat expect EOF +From: foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_ + aut...@example.com +EOF +test_expect_success 'format-patch wraps moderately long from-header (ascii)' ' + check_author foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_ +' + +cat expect 'EOF' +From: Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar + Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo + Bar Foo Bar Foo Bar Foo Bar aut...@example.com +EOF +test_expect_success 'format-patch wraps extremely long from-header (ascii)' ' + check_author Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar +' + +cat expect 'EOF' +From: Foo.Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar + Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo + Bar Foo Bar Foo Bar Foo Bar aut...@example.com +EOF +test_expect_success 'format-patch wraps extremely long from-header (rfc822)' ' + check_author Foo.Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar +' + cat expect
[PATCH v2 3/7] format-patch: do not wrap rfc2047 encoded headers too late
From: Jan H. Schönherr schn...@cs.tu-berlin.de Encoded characters add more than one character at once to an encoded header. Include all characters that are about to be added in the length calculation for wrapping. Additionally, RFC 2047 imposes a maximum line length of 76 characters if that line contains an rfc2047 encoded word. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- v2: - use constants for both, the 76 and 78 char limit - rephrase comment --- pretty.c| 26 +- t/t4014-format-patch.sh | 58 + 2 Dateien geändert, 51 Zeilen hinzugefügt(+), 33 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index 71e4024..da75879 100644 --- a/pretty.c +++ b/pretty.c @@ -263,6 +263,9 @@ static void add_rfc822_quoted(struct strbuf *out, const char *s, int len) static int is_rfc2047_special(char ch) { + if (ch == ' ' || ch == '\n') + return 1; + return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_')); } @@ -270,6 +273,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, const char *encoding) { static const int max_length = 78; /* per rfc2822 */ + static const int max_encoded_length = 76; /* per rfc2047 */ int i; int line_len; @@ -295,23 +299,25 @@ needquote: line_len += strlen(encoding) + 5; /* 5 for =??q? */ for (i = 0; i len; i++) { unsigned ch = line[i] 0xFF; + int is_special = is_rfc2047_special(ch); + + /* +* According to RFC 2047, we could encode the special character +* ' ' (space) with '_' (underscore) for readability. But many +* programs do not understand this and just leave the +* underscore in place. Thus, we do nothing special here, which +* causes ' ' to be encoded as '=20', avoiding this problem. +*/ - if (line_len = max_length - 2) { + if (line_len + 2 + (is_special ? 3 : 1) max_encoded_length) { strbuf_addf(sb, ?=\n =?%s?q?, encoding); line_len = strlen(encoding) + 5 + 1; /* =??q? plus SP */ } - /* -* We encode ' ' using '=20' even though rfc2047 -* allows using '_' for readability. Unfortunately, -* many programs do not understand this and just -* leave the underscore in place. -*/ - if (is_rfc2047_special(ch) || ch == ' ' || ch == '\n') { + if (is_special) { strbuf_addf(sb, =%02X, ch); line_len += 3; - } - else { + } else { strbuf_addch(sb, ch); line_len++; } diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index d66e358..1d5636d 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -772,30 +772,31 @@ M8=föö bar M64=$M8$M8$M8$M8$M8$M8$M8$M8 M512=$M64$M64$M64$M64$M64$M64$M64$M64 cat expect 'EOF' -Subject: [PATCH] =?UTF-8?q?f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar
[PATCH v2 4/7] format-patch: introduce helper function last_line_length()
From: Jan H. Schönherr schn...@cs.tu-berlin.de Currently, an open-coded loop to calculate the length of the last line of a string buffer is used in multiple places. Move that code into a function of its own. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- pretty.c | 25 + 1 Datei geändert, 13 Zeilen hinzugefügt(+), 12 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index da75879..482402d 100644 --- a/pretty.c +++ b/pretty.c @@ -240,6 +240,17 @@ static int has_rfc822_specials(const char *s, int len) return 0; } +static int last_line_length(struct strbuf *sb) +{ + int i; + + /* How many bytes are already used on the last line? */ + for (i = sb-len - 1; i = 0; i--) + if (sb-buf[i] == '\n') + break; + return sb-len - (i + 1); +} + static void add_rfc822_quoted(struct strbuf *out, const char *s, int len) { int i; @@ -275,13 +286,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, static const int max_length = 78; /* per rfc2822 */ static const int max_encoded_length = 76; /* per rfc2047 */ int i; - int line_len; - - /* How many bytes are already used on the current line? */ - for (i = sb-len - 1; i = 0; i--) - if (sb-buf[i] == '\n') - break; - line_len = sb-len - (i+1); + int line_len = last_line_length(sb); for (i = 0; i len; i++) { int ch = line[i]; @@ -346,7 +351,6 @@ void pp_user_info(const struct pretty_print_context *pp, if (pp-fmt == CMIT_FMT_EMAIL) { char *name_tail = strchr(line, ''); int display_name_length; - int final_line; if (!name_tail) return; while (line name_tail isspace(name_tail[-1])) @@ -361,10 +365,7 @@ void pp_user_info(const struct pretty_print_context *pp, add_rfc2047(sb, quoted.buf, quoted.len, encoding); strbuf_release(quoted); } - for (final_line = 0; final_line sb-len; final_line++) - if (sb-buf[sb-len - final_line - 1] == '\n') - break; - if (namelen - display_name_length + final_line 78) { + if (namelen - display_name_length + last_line_length(sb) 78) { strbuf_addch(sb, '\n'); if (!isspace(name_tail[0])) strbuf_addch(sb, ' '); -- 1.7.12 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/7] format-patch: make rfc2047 encoding more strict
From: Jan H. Schönherr schn...@cs.tu-berlin.de RFC 2047 requires more characters to be encoded than it is currently done. Especially, RFC 2047 distinguishes between allowed remaining characters in encoded words in addresses (From, To, etc.) and other headers, such as Subject. Make add_rfc2047() and is_rfc2047_special() location dependent and include all non-allowed characters to hopefully be RFC 2047 conform. This especially fixes a problem, where RFC 822 specials (e. g. .) were left unencoded in addresses, which was solved with a non-standard-conform workaround in the past (which is going to be removed in a follow-up patch). Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- v2: - part of restructured patch 4 of v1 - disallow even more characters in is_rfc2047_special() The implementation of isprint() should later probably be substituted by the one from Nguyen: http://article.gmane.org/gmane.comp.version-control.git/207666 --- git-compat-util.h | 2 ++ pretty.c| 67 +++-- t/t4014-format-patch.sh | 15 --- 3 Dateien geändert, 72 Zeilen hinzugefügt(+), 12 Zeilen entfernt(-) diff --git a/git-compat-util.h b/git-compat-util.h index 42d..d4ea446 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -475,6 +475,7 @@ extern const char tolower_trans_tbl[256]; #undef isdigit #undef isalpha #undef isalnum +#undef isprint #undef islower #undef isupper #undef tolower @@ -492,6 +493,7 @@ extern unsigned char sane_ctype[256]; #define isdigit(x) sane_istest(x,GIT_DIGIT) #define isalpha(x) sane_istest(x,GIT_ALPHA) #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT) +#define isprint(x) ((x) = 0x20 (x) = 0x7e) #define islower(x) sane_iscase(x, 1) #define isupper(x) sane_iscase(x, 0) #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL) diff --git a/pretty.c b/pretty.c index 482402d..613e4ea 100644 --- a/pretty.c +++ b/pretty.c @@ -272,16 +272,65 @@ static void add_rfc822_quoted(struct strbuf *out, const char *s, int len) strbuf_addch(out, ''); } -static int is_rfc2047_special(char ch) +enum rfc2047_type { + RFC2047_SUBJECT, + RFC2047_ADDRESS, +}; + +static int is_rfc2047_special(char ch, enum rfc2047_type type) { - if (ch == ' ' || ch == '\n') + /* +* rfc2047, section 4.2: +* +*8-bit values which correspond to printable ASCII characters other +*than =, ?, and _ (underscore), MAY be represented as those +*characters. (But see section 5 for restrictions.) In +*particular, SPACE and TAB MUST NOT be represented as themselves +*within encoded words. +*/ + + /* +* rule out non-ASCII characters and non-printable characters (the +* non-ASCII check should be redundant as isprint() is not localized +* and only knows about ASCII, but be defensive about that) +*/ + if (non_ascii(ch) || !isprint(ch)) + return 1; + + /* +* rule out special printable characters (' ' should be the only +* whitespace character considered printable, but be defensive and use +* isspace()) +*/ + if (isspace(ch) || ch == '=' || ch == '?' || ch == '_') return 1; - return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_')); + /* +* rfc2047, section 5.3: +* +*As a replacement for a 'word' entity within a 'phrase', for example, +*one that precedes an address in a From, To, or Cc header. The ABNF +*definition for 'phrase' from RFC 822 thus becomes: +* +*phrase = 1*( encoded-word / word ) +* +*In this case the set of characters that may be used in a Q-encoded +*'encoded-word' is restricted to: upper and lower case ASCII +*letters, decimal digits, !, *, +, -, /, =, and _ +*(underscore, ASCII 95.). An 'encoded-word' that appears within a +*'phrase' MUST be separated from any adjacent 'word', 'text' or +*'special' by 'linear-white-space'. +*/ + + if (type != RFC2047_ADDRESS) + return 0; + + /* '=' and '_' are special cases and have been checked above */ + return !(isalnum(ch) || ch == '!' || ch == '*' || ch == '+' || ch == '-' || ch == '/'); } static void add_rfc2047(struct strbuf *sb, const char *line, int len, - const char *encoding) + const char *encoding, enum rfc2047_type type) { static const int max_length = 78; /* per rfc2822 */ static const int max_encoded_length = 76; /* per rfc2047 */ @@ -304,7 +353,7 @@ needquote: line_len += strlen(encoding) + 5; /* 5 for =??q? */ for (i = 0; i len; i++) { unsigned ch = line[i] 0xFF; - int is_special = is_rfc2047_special(ch
[PATCH v2 6/7] format-patch: fix rfc2047 address encoding with respect to rfc822 specials
From: Jan H. Schönherr schn...@cs.tu-berlin.de According to RFC 2047 and RFC 822, rfc2047 encoded words and and rfc822 quoted strings do not mix. Since add_rfc2047() no longer leaves RFC 822 specials behind, the quoting is also no longer necessary to create a standard-conform mail. Remove the quoting, when RFC 2047 encoding takes place. This actually requires to refactor add_rfc2047() a bit, so that the different cases can be distinguished. With this patch, my own name gets correctly decoded as Jan H. Schönherr (without quotes) and not as Jan H. Schönherr (with quotes). Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- v2: - part of restructured patch 4 of v1 - use constants for both, the 76 and 78 char limit - select correct maximum length for possible final folding - removed off-by-one correction now handled by first patch --- pretty.c| 49 - t/t4014-format-patch.sh | 2 +- 2 Dateien geändert, 33 Zeilen hinzugefügt(+), 18 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index 613e4ea..413e758 100644 --- a/pretty.c +++ b/pretty.c @@ -231,7 +231,7 @@ static int is_rfc822_special(char ch) } } -static int has_rfc822_specials(const char *s, int len) +static int needs_rfc822_quoting(const char *s, int len) { int i; for (i = 0; i len; i++) @@ -329,25 +329,29 @@ static int is_rfc2047_special(char ch, enum rfc2047_type type) return !(isalnum(ch) || ch == '!' || ch == '*' || ch == '+' || ch == '-' || ch == '/'); } -static void add_rfc2047(struct strbuf *sb, const char *line, int len, - const char *encoding, enum rfc2047_type type) +static int needs_rfc2047_encoding(const char *line, int len, + enum rfc2047_type type) { - static const int max_length = 78; /* per rfc2822 */ - static const int max_encoded_length = 76; /* per rfc2047 */ int i; - int line_len = last_line_length(sb); for (i = 0; i len; i++) { int ch = line[i]; if (non_ascii(ch) || ch == '\n') - goto needquote; + return 1; if ((i + 1 len) (ch == '=' line[i+1] == '?')) - goto needquote; + return 1; } - strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, max_length); - return; -needquote: + return 0; +} + +static void add_rfc2047(struct strbuf *sb, const char *line, int len, + const char *encoding, enum rfc2047_type type) +{ + static const int max_encoded_length = 76; /* per rfc2047 */ + int i; + int line_len = last_line_length(sb); + strbuf_grow(sb, len * 3 + strlen(encoding) + 100); strbuf_addf(sb, =?%s?q?, encoding); line_len += strlen(encoding) + 5; /* 5 for =??q? */ @@ -383,6 +387,7 @@ void pp_user_info(const struct pretty_print_context *pp, const char *what, struct strbuf *sb, const char *line, const char *encoding) { + int max_length = 78; /* per rfc2822 */ char *date; int namelen; unsigned long time; @@ -406,17 +411,21 @@ void pp_user_info(const struct pretty_print_context *pp, name_tail--; display_name_length = name_tail - line; strbuf_addstr(sb, From: ); - if (!has_rfc822_specials(line, display_name_length)) { + if (needs_rfc2047_encoding(line, display_name_length, RFC2047_ADDRESS)) { add_rfc2047(sb, line, display_name_length, encoding, RFC2047_ADDRESS); - } else { + max_length = 76; /* per rfc2047 */ + } else if (needs_rfc822_quoting(line, display_name_length)) { struct strbuf quoted = STRBUF_INIT; add_rfc822_quoted(quoted, line, display_name_length); - add_rfc2047(sb, quoted.buf, quoted.len, - encoding, RFC2047_ADDRESS); + strbuf_add_wrapped_bytes(sb, quoted.buf, quoted.len, + -6, 1, max_length); strbuf_release(quoted); + } else { + strbuf_add_wrapped_bytes(sb, line, display_name_length, + -6, 1, max_length); } - if (namelen - display_name_length + last_line_length(sb) 78) { + if (namelen - display_name_length + last_line_length(sb) max_length) { strbuf_addch(sb, '\n'); if (!isspace(name_tail[0])) strbuf_addch(sb, ' '); @@ -1336,6 +1345,7 @@ void pp_title_line(const struct pretty_print_context *pp
[PATCH v2 7/7] format-patch tests: check quoting/encoding in To: and Cc: headers
From: Jan H. Schönherr schn...@cs.tu-berlin.de git-format-patch does currently not parse user supplied extra header values (e. g., --cc, --add-header) and just replays them. That forces users to add them RFC 2822/2047 conform in encoded form, e. g. --cc '=?UTF-8?q?Jan=20H=2E=20Sch=C3=B6nherr?= ...' which is inconvenient. We would want to update git-format-patch to accept human-readable input --cc 'Jan H. Schönherr ...' and handle the encoding, wrapping and quoting internally in the future, similar to what is already done in git-send-email. The necessary code should mostly exist in the code paths that handle the From: and Subject: headers. Whether we want to do this only for the git-format-patch options --to and --cc (and the corresponding config options) or also for user supplied headers via --add-header, is open for discussion. For now, add test_expect_failure tests for To: and Cc: headers as a reminder and fix tests that would otherwise fail should this get implemented. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- This patch is RFC material. There are a few reasons, why this is a good idea and also a few, why it is bad: Pro: - current git-format-patch behavior differs from git-send-email - we should be able to use the address format that git uses elsewhere (e. g., author and committer info) - necessary code mostly exists Con: - changes current behavior - make code more complex (Feel free to add more.) The first drawback can be mitigated by checking whether the input is already properly encoded, so that we do not accidentally double-encode things. git-send-email does that, but that's written in Perl, so we would need even more code. For now, this is only about _addresses_ supplied to git-format-patch, not _headers_. We could also validate/encode/wrap user supplied headers. RFC 2822/2047 is specific enough to allow that. But there is no point thinking about that without the intention of encoding addresses. v2: - updated commit message as suggested by Junio C Hamano --- t/t4014-format-patch.sh | 98 + 1 Datei geändert, 66 Zeilen hinzugefügt(+), 32 Zeilen entfernt(-) diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index e024eb8..ad9f69e 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -110,73 +110,107 @@ test_expect_success 'replay did not screw up the log message' ' test_expect_success 'extra headers' ' - git config format.headers To: R. E. Cipient rcipi...@example.com + git config format.headers To: R E Cipient rcipi...@example.com - git config --add format.headers Cc: S. E. Cipient scipi...@example.com + git config --add format.headers Cc: S E Cipient scipi...@example.com git format-patch --stdout master..side patch2 sed -e /^\$/q patch2 hdrs2 - grep ^To: R. E. Cipient rcipi...@example.com\$ hdrs2 - grep ^Cc: S. E. Cipient scipi...@example.com\$ hdrs2 + grep ^To: R E Cipient rcipi...@example.com\$ hdrs2 + grep ^Cc: S E Cipient scipi...@example.com\$ hdrs2 ' test_expect_success 'extra headers without newlines' ' - git config --replace-all format.headers To: R. E. Cipient rcipi...@example.com - git config --add format.headers Cc: S. E. Cipient scipi...@example.com + git config --replace-all format.headers To: R E Cipient rcipi...@example.com + git config --add format.headers Cc: S E Cipient scipi...@example.com git format-patch --stdout master..side patch3 sed -e /^\$/q patch3 hdrs3 - grep ^To: R. E. Cipient rcipi...@example.com\$ hdrs3 - grep ^Cc: S. E. Cipient scipi...@example.com\$ hdrs3 + grep ^To: R E Cipient rcipi...@example.com\$ hdrs3 + grep ^Cc: S E Cipient scipi...@example.com\$ hdrs3 ' test_expect_success 'extra headers with multiple To:s' ' - git config --replace-all format.headers To: R. E. Cipient rcipi...@example.com - git config --add format.headers To: S. E. Cipient scipi...@example.com + git config --replace-all format.headers To: R E Cipient rcipi...@example.com + git config --add format.headers To: S E Cipient scipi...@example.com git format-patch --stdout master..side patch4 sed -e /^\$/q patch4 hdrs4 - grep ^To: R. E. Cipient rcipi...@example.com,\$ hdrs4 - grep ^ *S. E. Cipient scipi...@example.com\$ hdrs4 + grep ^To: R E Cipient rcipi...@example.com,\$ hdrs4 + grep ^ *S E Cipient scipi...@example.com\$ hdrs4 ' -test_expect_success 'additional command line cc' ' +test_expect_success 'additional command line cc (ascii)' ' - git config --replace-all format.headers Cc: R. E. Cipient rcipi...@example.com + git config --replace-all format.headers Cc: R E Cipient rcipi...@example.com + git format-patch --cc=S E Cipient scipi...@example.com --stdout master..side | sed -e /^\$/q patch5 + grep
Re: [PATCH v5 02/12] ctype: support iscntrl, ispunct, isxdigit and isprint
Hi Nguyen. I just had a need for isprint() myself, and then I found your code here. I had a look at the POSIX locale as describe here: http://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/POSIX Some remarks below. Am 14.10.2012 16:26, schrieb Nguyen Thai Ngoc Duy: -- 8 -- diff --git a/ctype.c b/ctype.c index faeaf34..0bfebb4 100644 --- a/ctype.c +++ b/ctype.c @@ -11,18 +11,21 @@ enum { D = GIT_DIGIT, G = GIT_GLOB_SPECIAL, /* *, ?, [, \\ */ R = GIT_REGEX_SPECIAL, /* $, (, ), +, ., ^, {, | */ - P = GIT_PATHSPEC_MAGIC /* other non-alnum, except for ] and } */ + P = GIT_PATHSPEC_MAGIC, /* other non-alnum, except for ] and } */ + X = GIT_CNTRL, + U = GIT_PUNCT, + Z = GIT_CNTRL | GIT_SPACE }; const unsigned char sane_ctype[256] = { - 0, 0, 0, 0, 0, 0, 0, 0, 0, S, S, 0, 0, S, 0, 0, /* 0.. 15 */ - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 16.. 31 */ + X, X, X, X, X, X, X, X, X, Z, Z, X, X, Z, X, X, /* 0.. 15 */ + X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, /* 16.. 31 */ Normal isspace() also includes vertical tab (11) and form-feed (12) as white-space characters. Is there a reason, why they are not included here? S, P, P, P, R, P, P, P, R, R, G, R, P, P, R, P, /* 32.. 47 */ D, D, D, D, D, D, D, D, D, D, P, P, P, P, P, G, /* 48.. 63 */ P, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, /* 64.. 79 */ - A, A, A, A, A, A, A, A, A, A, A, G, G, 0, R, P, /* 80.. 95 */ + A, A, A, A, A, A, A, A, A, A, A, G, G, U, R, P, /* 80.. 95 */ P, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, /* 96..111 */ - A, A, A, A, A, A, A, A, A, A, A, R, R, 0, P, 0, /* 112..127 */ + A, A, A, A, A, A, A, A, A, A, A, R, R, U, P, X, /* 112..127 */ /* Nothing in the 128.. range */ }; diff --git a/git-compat-util.h b/git-compat-util.h index f8b859c..db77f3e 100644 --- a/git-compat-util.h +++ b/git-compat-util.h [...] @@ -527,6 +533,13 @@ extern const unsigned char sane_ctype[256]; #define isupper(x) sane_iscase(x, 0) #define is_glob_special(x) sane_istest(x,GIT_GLOB_SPECIAL) #define is_regex_special(x) sane_istest(x,GIT_GLOB_SPECIAL | GIT_REGEX_SPECIAL) +#define iscntrl(x) (sane_istest(x,GIT_CNTRL)) +#define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \ + GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC) +#define isxdigit(x) (hexval_table[x] != -1) +#define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \ + GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \ + GIT_PATHSPEC_MAGIC)) Normal isprint() only includes space (32) from the white-space characters. The other white-space characters are not considered printable. Do we want to stay close to the original, or not? Regards Jan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] format-patch: do not wrap rfc2047 encoded headers too late
Am 09.10.2012 21:30, schrieb Junio C Hamano: Jan H. Schönherr schn...@cs.tu-berlin.de writes: ... static int is_rfc2047_special(char ch) { +/* + * We encode ' ' using '=20' even though rfc2047 + * allows using '_' for readability. Unfortunately, + * many programs do not understand this and just + * leave the underscore in place. + */ The sentence break made me read the above three times to understand what it is trying to say. Unfortunately refers to what happens if we were to use '_', but it initially appeared to be describing some bug due to our encoding ' ' as '=20'. Perhaps like this? /* * rfc2047 allows '_' to encode ' ' for readability, but * many programs do not understand ...; encode ' ' using * '=20' instead to avoid the problem. */ I was just moving that comment (and the following check) around, but I'll update the comment in the next version. +if (ch == ' ' || ch == '\n') +return 1; The comment justifies why this if (ch == ' '), which could be part of the return below, separately is done, but nothing explains why you add '\n' (and not other controls, e.g. '\t') to the mix. The check for '\n' was introduced in commit c22e7de3 (format-patch: rfc2047-encode newlines in headers). The commit log was: These should generally never happen, as we already concatenate multiples in subjects into a single line. But let's be defensive, since not encoding them means we will output malformed headers. Having again a look at RFC 2047, I see that we should be even more strict and not allow any non-printable character to be passed through unencoded. I guess that adds another patch to the series. Hmm... Maybe I can split patch 4 into two patches, one that mostly fixes is_rfc2047_special() and one that avoids 822 quoting when doing 2047 encoding. return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_')); } static void add_rfc2047(struct strbuf *sb, const char *line, int len, const char *encoding) { -static const int max_length = 78; /* per rfc2822 */ +static const int max_length = 76; /* per rfc2047 */ int i; int line_len; @@ -286,7 +295,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, if ((i + 1 len) (ch == '=' line[i+1] == '?')) goto needquote; } -strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, max_length+1); +strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, 78+1); return; Yuck. If you do want to retain 78 for non-quoted output for backward compatibility, that is OK, but if that is the case, please introduce a new constant max_quoted_length or something to stand for 76 and use it in the needquote: part below. Will do. Regards Jan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Cure some format-patch wrapping and encoding issues
Am 09.10.2012 21:07, schrieb Junio C Hamano: Jan H. Schönherr schn...@cs.tu-berlin.de writes: During the creation of this series, I came across the strbuf wrapping functions, and I wonder if there is an off-by-one issue. Consider the following excerpt from t4202: ... Yeah, that does sound like an off-by-one bug. When we as end users say %w(72), we do expect some lines fill to the 72nd column, not stopping at the 71st. I suspect that dates back to the very first implementation of %w() but I think we should fix it (perhaps as a separate patch either the earliest or the last in the series). I will include a fix for that, then. (But I won't be able the send out the next round of this series before next week.) Regards Jan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] format-patch: fix rfc2047 address encoding with respect to rfc822 specials
From: Jan H. Schönherr schn...@cs.tu-berlin.de According to RFC 2047 and RFC 822, rfc2047 encoded words and and rfc822 quoted strings do not mix. Be more strict about rfc2047 encoded words in addresses, so that it is a bit more conform to RFC 2047. (Especially, my own name gets correctly decoded as Jan H. Schönherr (without quotes) and not as Jan H. Schönherr (with quotes).) Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- pretty.c| 80 ++--- t/t4014-format-patch.sh | 11 +-- 2 Dateien geändert, 71 Zeilen hinzugefügt(+), 20 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index ee76219..f3a7383 100644 --- a/pretty.c +++ b/pretty.c @@ -231,7 +231,7 @@ static int is_rfc822_special(char ch) } } -static int has_rfc822_specials(const char *s, int len) +static int needs_rfc822_quoting(const char *s, int len) { int i; for (i = 0; i len; i++) @@ -272,7 +272,12 @@ static void add_rfc822_quoted(struct strbuf *out, const char *s, int len) strbuf_addch(out, ''); } -static int is_rfc2047_special(char ch) +enum rfc2047_type { + RFC2047_SUBJECT, + RFC2047_ADDRESS, +}; + +static int is_rfc2047_special(char ch, enum rfc2047_type type) { /* * We encode ' ' using '=20' even though rfc2047 @@ -283,33 +288,62 @@ static int is_rfc2047_special(char ch) if (ch == ' ' || ch == '\n') return 1; - return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_')); + if (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_')) + return 1; + + if (type != RFC2047_ADDRESS) + return 0; + + /* +* rfc2047, section 5.3: +* +*As a replacement for a 'word' entity within a 'phrase', for example, +*one that precedes an address in a From, To, or Cc header. The ABNF +*definition for 'phrase' from RFC 822 thus becomes: +* +*phrase = 1*( encoded-word / word ) +* +*In this case the set of characters that may be used in a Q-encoded +*'encoded-word' is restricted to: upper and lower case ASCII +*letters, decimal digits, !, *, +, -, /, =, and _ +*(underscore, ASCII 95.). An 'encoded-word' that appears within a +*'phrase' MUST be separated from any adjacent 'word', 'text' or +*'special' by 'linear-white-space'. +*/ + + /* '=' and '_' are special cases and have been checked above */ + return !(isalnum(ch) || ch == '!' || ch == '*' || ch == '+' || ch == '-' || ch == '/'); } -static void add_rfc2047(struct strbuf *sb, const char *line, int len, - const char *encoding) +static int needs_rfc2047_encoding(const char *line, int len, + enum rfc2047_type type) { - static const int max_length = 76; /* per rfc2047 */ int i; - int line_len = last_line_length(sb); for (i = 0; i len; i++) { int ch = line[i]; if (non_ascii(ch) || ch == '\n') - goto needquote; + return 1; if ((i + 1 len) (ch == '=' line[i+1] == '?')) - goto needquote; + return 1; } - strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, 78+1); - return; -needquote: + return 0; +} + +static void add_rfc2047(struct strbuf *sb, const char *line, int len, + const char *encoding, enum rfc2047_type type) +{ + static const int max_length = 76; /* per rfc2047 */ + int i; + int line_len = last_line_length(sb); + strbuf_grow(sb, len * 3 + strlen(encoding) + 100); strbuf_addf(sb, =?%s?q?, encoding); line_len += strlen(encoding) + 5; /* 5 for =??q? */ for (i = 0; i len; i++) { unsigned ch = line[i] 0xFF; - int is_special = is_rfc2047_special(ch); + int is_special = is_rfc2047_special(ch, type); if (line_len + 2 + (is_special ? 3 : 1) max_length) { strbuf_addf(sb, ?=\n =?%s?q?, encoding); @@ -355,13 +389,18 @@ void pp_user_info(const struct pretty_print_context *pp, name_tail--; display_name_length = name_tail - line; strbuf_addstr(sb, From: ); - if (!has_rfc822_specials(line, display_name_length)) { - add_rfc2047(sb, line, display_name_length, encoding); - } else { + if (needs_rfc2047_encoding(line, display_name_length, RFC2047_ADDRESS)) { + add_rfc2047(sb, line, display_name_length, + encoding, RFC2047_ADDRESS); + } else if (needs_rfc822_quoting(line, display_name_length
[PATCH 5/5] format-patch: tests: check rfc822+rfc2047 in to+cc headers
From: Jan H. Schönherr schn...@cs.tu-berlin.de Do some checks for RFC 822 and RFC 2047 support in To: and Cc: headers and fix ambiguous old checks. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- t/t4014-format-patch.sh | 98 + 1 Datei geändert, 66 Zeilen hinzugefügt(+), 32 Zeilen entfernt(-) diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 1a3b6e8..65ab4c9 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -110,73 +110,107 @@ test_expect_success 'replay did not screw up the log message' ' test_expect_success 'extra headers' ' - git config format.headers To: R. E. Cipient rcipi...@example.com + git config format.headers To: R E Cipient rcipi...@example.com - git config --add format.headers Cc: S. E. Cipient scipi...@example.com + git config --add format.headers Cc: S E Cipient scipi...@example.com git format-patch --stdout master..side patch2 sed -e /^\$/q patch2 hdrs2 - grep ^To: R. E. Cipient rcipi...@example.com\$ hdrs2 - grep ^Cc: S. E. Cipient scipi...@example.com\$ hdrs2 + grep ^To: R E Cipient rcipi...@example.com\$ hdrs2 + grep ^Cc: S E Cipient scipi...@example.com\$ hdrs2 ' test_expect_success 'extra headers without newlines' ' - git config --replace-all format.headers To: R. E. Cipient rcipi...@example.com - git config --add format.headers Cc: S. E. Cipient scipi...@example.com + git config --replace-all format.headers To: R E Cipient rcipi...@example.com + git config --add format.headers Cc: S E Cipient scipi...@example.com git format-patch --stdout master..side patch3 sed -e /^\$/q patch3 hdrs3 - grep ^To: R. E. Cipient rcipi...@example.com\$ hdrs3 - grep ^Cc: S. E. Cipient scipi...@example.com\$ hdrs3 + grep ^To: R E Cipient rcipi...@example.com\$ hdrs3 + grep ^Cc: S E Cipient scipi...@example.com\$ hdrs3 ' test_expect_success 'extra headers with multiple To:s' ' - git config --replace-all format.headers To: R. E. Cipient rcipi...@example.com - git config --add format.headers To: S. E. Cipient scipi...@example.com + git config --replace-all format.headers To: R E Cipient rcipi...@example.com + git config --add format.headers To: S E Cipient scipi...@example.com git format-patch --stdout master..side patch4 sed -e /^\$/q patch4 hdrs4 - grep ^To: R. E. Cipient rcipi...@example.com,\$ hdrs4 - grep ^ *S. E. Cipient scipi...@example.com\$ hdrs4 + grep ^To: R E Cipient rcipi...@example.com,\$ hdrs4 + grep ^ *S E Cipient scipi...@example.com\$ hdrs4 ' -test_expect_success 'additional command line cc' ' +test_expect_success 'additional command line cc (ascii)' ' - git config --replace-all format.headers Cc: R. E. Cipient rcipi...@example.com + git config --replace-all format.headers Cc: R E Cipient rcipi...@example.com + git format-patch --cc=S E Cipient scipi...@example.com --stdout master..side | sed -e /^\$/q patch5 + grep ^Cc: R E Cipient rcipi...@example.com,\$ patch5 + grep ^ *S E Cipient scipi...@example.com\$ patch5 +' + +test_expect_failure 'additional command line cc (rfc822)' ' + + git config --replace-all format.headers Cc: R E Cipient rcipi...@example.com git format-patch --cc=S. E. Cipient scipi...@example.com --stdout master..side | sed -e /^\$/q patch5 - grep ^Cc: R. E. Cipient rcipi...@example.com,\$ patch5 - grep ^ *S. E. Cipient scipi...@example.com\$ patch5 + grep ^Cc: R E Cipient rcipi...@example.com,\$ patch5 + grep ^ *S. E. Cipient scipi...@example.com\$ patch5 ' test_expect_success 'command line headers' ' git config --unset-all format.headers - git format-patch --add-header=Cc: R. E. Cipient rcipi...@example.com --stdout master..side | sed -e /^\$/q patch6 - grep ^Cc: R. E. Cipient rcipi...@example.com\$ patch6 + git format-patch --add-header=Cc: R E Cipient rcipi...@example.com --stdout master..side | sed -e /^\$/q patch6 + grep ^Cc: R E Cipient rcipi...@example.com\$ patch6 ' test_expect_success 'configuration headers and command line headers' ' - git config --replace-all format.headers Cc: R. E. Cipient rcipi...@example.com - git format-patch --add-header=Cc: S. E. Cipient scipi...@example.com --stdout master..side | sed -e /^\$/q patch7 - grep ^Cc: R. E. Cipient rcipi...@example.com,\$ patch7 - grep ^ *S. E. Cipient scipi...@example.com\$ patch7 + git config --replace-all format.headers Cc: R E Cipient rcipi...@example.com + git format-patch --add-header=Cc: S E Cipient scipi...@example.com --stdout master..side | sed -e /^\$/q patch7 + grep ^Cc: R E Cipient rcipi...@example.com,\$ patch7 + grep ^ *S E Cipient scipi...@example.com
[PATCH 0/5] Cure some format-patch wrapping and encoding issues
Hi all. The main point of this series is to teach git to encode my name correctly, see patch 4, so that the decoded version is actually my name, so that send-email does not insist on adding a wrong superfluous From: line to the mail body. But as always, you notice some other things going wrong. Here, patches 1 and 2 make the wrapping of header lines more correct, i. e., neither too early nor too late. Patch 3 does some refactoring, which is too unrelated to be included in patch 4 itself. Patch 5 points out further problems, but leaves the actual fixing to someone else. The series is currently based on the maint branch, but it applies to the others as well. During the creation of this series, I came across the strbuf wrapping functions, and I wonder if there is an off-by-one issue. Consider the following excerpt from t4202: cat expect EOF This is the sixth commit. This is the fifth commit. EOF test_expect_success 'format %w(12,1,2)' ' git log -2 --format=%w(12,1,2)This is the %s commit. actual test_cmp expect actual ' So this sets a maximum width of 12 characters. Is that 12 character limit supposed to include the final newline, or not? Because the test above and my series are only correct if the final newline is included, i. e., at most eleven visible characters. If this should mean at most 12 visible characters instead, then the output should look like this: This is the sixth commit. This is the fifth commit. (In that case, I would repost an updated version of this series.) Regards Jan Jan H. Schönherr (5): format-patch: do not wrap non-rfc2047 headers too early format-patch: do not wrap rfc2047 encoded headers too late format-patch: introduce helper function last_line_length() format-patch: fix rfc2047 address encoding with respect to rfc822 specials format-patch: tests: check rfc822+rfc2047 in to+cc headers pretty.c| 121 ++ t/t4014-format-patch.sh | 227 ++-- 2 Dateien geändert, 229 Zeilen hinzugefügt(+), 119 Zeilen entfernt(-) -- 1.7.12 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] format-patch: do not wrap non-rfc2047 headers too early
From: Jan H. Schönherr schn...@cs.tu-berlin.de Do not wrap the second and later lines of an ASCII header substantially before the 78 character limit. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- pretty.c| 2 +- t/t4014-format-patch.sh | 60 - 2 Dateien geändert, 35 Zeilen hinzugefügt(+), 27 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index 8b1ea9f..f5caecb 100644 --- a/pretty.c +++ b/pretty.c @@ -286,7 +286,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, if ((i + 1 len) (ch == '=' line[i+1] == '?')) goto needquote; } - strbuf_add_wrapped_bytes(sb, line, len, 0, 1, max_length - line_len); + strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, max_length+1); return; needquote: diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 959aa26..d66e358 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -752,16 +752,14 @@ M64=$M8$M8$M8$M8$M8$M8$M8$M8 M512=$M64$M64$M64$M64$M64$M64$M64$M64 cat expect 'EOF' Subject: [PATCH] foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo - bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar - foo bar foo bar foo bar foo bar + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar EOF -test_expect_success 'format-patch wraps extremely long headers (ascii)' ' +test_expect_success 'format-patch wraps extremely long subject (ascii)' ' echo content file git add file git commit -m $M512 @@ -807,28 +805,12 @@ test_expect_success 'format-patch wraps extremely long headers (rfc2047)' ' test_cmp expect subject ' -M8=foo_bar_ -M64=$M8$M8$M8$M8$M8$M8$M8$M8 -cat expect EOF -From: $M64 - foo...@foo.bar -EOF -test_expect_success 'format-patch wraps non-quotable headers' ' - rm -rf patches/ - echo content file - git add file - git commit -mfoo --author $M64 foo...@foo.bar - git format-patch --stdout -1 patch - sed -n /^From: /p; /^ /p; /^$/q patch from - test_cmp expect from -' - check_author() { echo content file git add file GIT_AUTHOR_NAME=$1 git commit -m author-check git format-patch --stdout -1 patch - grep ^From: patch actual + sed -n /^From: /p; /^ /p; /^$/q patch actual test_cmp expect actual } @@ -853,6 +835,32 @@ test_expect_success 'rfc2047-encoded headers also double-quote 822 specials' ' check_author Föo B. Bar ' +cat expect EOF +From: foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_ + aut...@example.com +EOF +test_expect_success 'format-patch wraps moderately long from-header (ascii)' ' + check_author foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_foo_bar_ +' + +cat expect 'EOF' +From: Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar + Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo + Bar Foo Bar Foo Bar Foo Bar aut...@example.com +EOF +test_expect_success 'format-patch wraps extremely long from-header (ascii)' ' + check_author Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar +' + +cat expect 'EOF' +From: Foo.Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar + Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo + Bar Foo Bar Foo Bar Foo Bar aut...@example.com +EOF +test_expect_success 'format-patch wraps extremely long from-header (rfc822)' ' + check_author Foo.Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar +' + cat expect 'EOF' Subject: header with . in it EOF -- 1.7.12 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] format-patch: introduce helper function last_line_length()
From: Jan H. Schönherr schn...@cs.tu-berlin.de Currently, an open-coded loop to calculate the length of the last line of a string buffer is used in multiple places. Move that code into a function of its own. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- pretty.c | 25 + 1 Datei geändert, 13 Zeilen hinzugefügt(+), 12 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index daf8581..ee76219 100644 --- a/pretty.c +++ b/pretty.c @@ -240,6 +240,17 @@ static int has_rfc822_specials(const char *s, int len) return 0; } +static int last_line_length(struct strbuf *sb) +{ + int i; + + /* How many bytes are already used on the last line? */ + for (i = sb-len - 1; i = 0; i--) + if (sb-buf[i] == '\n') + break; + return sb-len - (i + 1); +} + static void add_rfc822_quoted(struct strbuf *out, const char *s, int len) { int i; @@ -280,13 +291,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, { static const int max_length = 76; /* per rfc2047 */ int i; - int line_len; - - /* How many bytes are already used on the current line? */ - for (i = sb-len - 1; i = 0; i--) - if (sb-buf[i] == '\n') - break; - line_len = sb-len - (i+1); + int line_len = last_line_length(sb); for (i = 0; i len; i++) { int ch = line[i]; @@ -344,7 +349,6 @@ void pp_user_info(const struct pretty_print_context *pp, if (pp-fmt == CMIT_FMT_EMAIL) { char *name_tail = strchr(line, ''); int display_name_length; - int final_line; if (!name_tail) return; while (line name_tail isspace(name_tail[-1])) @@ -359,10 +363,7 @@ void pp_user_info(const struct pretty_print_context *pp, add_rfc2047(sb, quoted.buf, quoted.len, encoding); strbuf_release(quoted); } - for (final_line = 0; final_line sb-len; final_line++) - if (sb-buf[sb-len - final_line - 1] == '\n') - break; - if (namelen - display_name_length + final_line 78) { + if (namelen - display_name_length + last_line_length(sb) 78) { strbuf_addch(sb, '\n'); if (!isspace(name_tail[0])) strbuf_addch(sb, ' '); -- 1.7.12 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] format-patch: do not wrap rfc2047 encoded headers too late
From: Jan H. Schönherr schn...@cs.tu-berlin.de Encoded characters add more than one character at once to an encoded header. Include all characters that are about to be added in the length calculation for wrapping. Additionally, RFC 2047 imposes a maximum line length of 76 characters if that line contains an rfc2047 encoded word. Signed-off-by: Jan H. Schönherr schn...@cs.tu-berlin.de --- pretty.c| 24 +++- t/t4014-format-patch.sh | 58 + 2 Dateien geändert, 49 Zeilen hinzugefügt(+), 33 Zeilen entfernt(-) diff --git a/pretty.c b/pretty.c index f5caecb..daf8581 100644 --- a/pretty.c +++ b/pretty.c @@ -263,13 +263,22 @@ static void add_rfc822_quoted(struct strbuf *out, const char *s, int len) static int is_rfc2047_special(char ch) { + /* +* We encode ' ' using '=20' even though rfc2047 +* allows using '_' for readability. Unfortunately, +* many programs do not understand this and just +* leave the underscore in place. +*/ + if (ch == ' ' || ch == '\n') + return 1; + return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_')); } static void add_rfc2047(struct strbuf *sb, const char *line, int len, const char *encoding) { - static const int max_length = 78; /* per rfc2822 */ + static const int max_length = 76; /* per rfc2047 */ int i; int line_len; @@ -286,7 +295,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, if ((i + 1 len) (ch == '=' line[i+1] == '?')) goto needquote; } - strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, max_length+1); + strbuf_add_wrapped_bytes(sb, line, len, -line_len, 1, 78+1); return; needquote: @@ -295,19 +304,14 @@ needquote: line_len += strlen(encoding) + 5; /* 5 for =??q? */ for (i = 0; i len; i++) { unsigned ch = line[i] 0xFF; + int is_special = is_rfc2047_special(ch); - if (line_len = max_length - 2) { + if (line_len + 2 + (is_special ? 3 : 1) max_length) { strbuf_addf(sb, ?=\n =?%s?q?, encoding); line_len = strlen(encoding) + 5 + 1; /* =??q? plus SP */ } - /* -* We encode ' ' using '=20' even though rfc2047 -* allows using '_' for readability. Unfortunately, -* many programs do not understand this and just -* leave the underscore in place. -*/ - if (is_rfc2047_special(ch) || ch == ' ' || ch == '\n') { + if (is_special) { strbuf_addf(sb, =%02X, ch); line_len += 3; } diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index d66e358..1d5636d 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -772,30 +772,31 @@ M8=föö bar M64=$M8$M8$M8$M8$M8$M8$M8$M8 M512=$M64$M64$M64$M64$M64$M64$M64$M64 cat expect 'EOF' -Subject: [PATCH] =?UTF-8?q?f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= - =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar?= +Subject: [PATCH] =?UTF-8?q?f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f