[PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec
More specifically: * Add \ to the list of characters not allowed in a token (see RFC 2047 errata). * Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides removing duplication, this also makes unquote_rfc2047 more stringent. * Allow both q and Q to identify the encoding. * Allow lowercase hexadecimal digits in the Q encoding. And, more on the cosmetic side: * Change the encoded-text regex to exclude rather than include characters, for clarity and consistency with token. Signed-off-by: Роман Донченко d...@corrigendum.ru Acked-by: Jeff King p...@peff.net --- git-send-email.perl | 30 +++--- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index 9949db0..d461ffb 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 }; my $smtp; my $auth; +# Regexes for RFC 2047 productions. +my $re_token = qr/[^][()@,;:\\\/?.= \000-\037\177-\377]+/; +my $re_encoded_text = qr/[^? \000-\037\177-\377]+/; +my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/; + # Variables we fill in automatically, or via prompting: my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh, $initial_reply_to,$initial_subject,@files, @@ -913,15 +918,20 @@ $time = time - scalar $#files; sub unquote_rfc2047 { local ($_) = @_; - my $encoding; - s{=\?([^?]+)\?q\?(.*?)\?=}{ - $encoding = $1; - my $e = $2; - $e =~ s/_/ /g; - $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg; - $e; + my $charset; + s{$re_encoded_word}{ + $charset = $1; + my $encoding = $2; + my $text = $3; + if ($encoding eq 'q' || $encoding eq 'Q') { + $text =~ s/_/ /g; + $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi; + $text; + } else { + $; # other encodings not supported yet + } }eg; - return wantarray ? ($_, $encoding) : $_; + return wantarray ? ($_, $charset) : $_; } sub quote_rfc2047 { @@ -934,10 +944,8 @@ sub quote_rfc2047 { sub is_rfc2047_quoted { my $s = shift; - my $token = qr/[^][()@,;:\/?.= \000-\037\177-\377]+/; - my $encoded_text = qr/[!-@-~]+/; length($s) = 75 - $s =~ m/^(?:[[:ascii:]]*|=\?$token\?$token\?$encoded_text\?=)$/o; + $s =~ m/^(?:[[:ascii:]]*|$re_encoded_word)$/o; } sub subject_needs_rfc2047_quoting { -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
The RFC says that they are to be concatenated after decoding (i.e. the intervening whitespace is ignored). Signed-off-by: Роман Донченко d...@corrigendum.ru Acked-by: Jeff King p...@peff.net --- git-send-email.perl | 26 -- t/t9001-send-email.sh | 7 +++ 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index d461ffb..7d5cc8a 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -919,17 +919,23 @@ $time = time - scalar $#files; sub unquote_rfc2047 { local ($_) = @_; my $charset; - s{$re_encoded_word}{ - $charset = $1; - my $encoding = $2; - my $text = $3; - if ($encoding eq 'q' || $encoding eq 'Q') { - $text =~ s/_/ /g; - $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi; - $text; - } else { - $; # other encodings not supported yet + my $sep = qr/[ \t]+/; + s{$re_encoded_word(?:$sep$re_encoded_word)*}{ + my @words = split $sep, $; + foreach (@words) { + m/$re_encoded_word/; + $charset = $1; + my $encoding = $2; + my $text = $3; + if ($encoding eq 'q' || $encoding eq 'Q') { + $_ = $text; + s/_/ /g; + s/=([0-9A-F]{2})/chr(hex($1))/egi; + } else { + # other encodings not supported yet + } } + join '', @words; }eg; return wantarray ? ($_, $charset) : $_; } diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh index 19a3ced..fa965ff 100755 --- a/t/t9001-send-email.sh +++ b/t/t9001-send-email.sh @@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is suppressed' 'non_ascii_self_suppressed' +# This name is long enough to force format-patch to split it into multiple +# encoded-words, assuming it uses UTF-8 with the Q encoding. +test_expect_success $PREREQ 'long non-ascii self name is suppressed' + test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=m...@example.com' \ + 'long_non_ascii_self_suppressed' + + test_expect_success $PREREQ 'sanitized self name is suppressed' test_suppress_self_unquoted '\A U. Thor\' 'aut...@example.com' \ 'self_name_sanitized_suppressed' -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] strbuf.h: format asciidoc code blocks as 4-space indent
On 12/12/2014 11:39 PM, Jonathan Nieder wrote: Jeff King wrote: This is much easier to read when the whole thing is stuffed inside a comment block. And there is precedent for this convention in markdown (and just in general ascii text). Signed-off-by: Jeff King p...@peff.net --- Reviewed-by: Jonathan Nieder jrnie...@gmail.com As a side note, I actually find markdown much more pleasant to read and write than asciidoc. I do, too. Quoting in asciidoc is a nightmare. Peff, thanks for working on this. I think it is a definite improvement. I suggest that we accept the use of asciidoc/markdown's convention of using backwards quotes to mark code snippets (especially identifier names) within comments *anywhere* in our code base. For example, this appears in refs.c: /* * Create a struct ref_entry object for the specified dirname. * dirname is the name of the directory with a trailing slash * (e.g., refs/heads/) or for the top-level directory. */ I claim that it is more readable with a tiny bit of markup: /* * Create a `struct ref_entry` object for the specified `dirname`. * `dirname` is the name of the directory with a trailing slash * (e.g., refs/heads/) or for the top-level directory. */ Marking up `struct ref_entry` helps make it clear that the two words belong together, and marking up `dirname` makes it clear that we are talking about a specific identifier (in this case, a function parameter). Currently, comments use a mix of unadorned text, single-quoted text, and double-quoted text when talking about code. I think the asciidoc/markdown convention is clearer [1]. I think we shouldn't be pedantic about this. When a comment is readable with no markup, there's no need to add markup. And incorrect markup shouldn't by itself be reason to reject a patch. But in many examples, a little bit of markup makes the text less ambiguous and easier to read. Michael [1] Yes, I see the irony in trying to improve a mixture of three conventions by adding a fourth one. -- Michael Haggerty mhag...@alum.mit.edu -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [wish] Revert changes in git gui
Hi Bert, yes and no. I couldn't find a message from you requesting a merge of this patch. Maybe it's me, I am not familiar with the way it works for Git. @Git developers: Do you consider merging Bert's patch? If not, what's the reason? Bye Christoph On 12.12.2014 at 10:28, Bert Wesarg wrote: On Fri, Dec 12, 2014 at 9:27 AM, Christoph Grüninger f...@grueninger.de wrote: Hi Bert, your commit is more than half a year old. Do you intent to include that into Git master? If not, what's the reason? Thats a really odd question to a person who posted this patch to the mailling list the fist place, isn't it? If anything you should have asked the git gui developers and community, why they didn't show interest to have this in master, right? Bert -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [wish] Revert changes in git gui
From: Christoph Grüninger f...@grueninger.de Hi Bert, yes and no. I couldn't find a message from you requesting a merge of this patch. Maybe it's me, I am not familiar with the way it works for Git. @Git developers: Do you consider merging Bert's patch? If not, what's the reason? Bye Christoph On 12.12.2014 at 10:28, Bert Wesarg wrote: On Fri, Dec 12, 2014 at 9:27 AM, Christoph Grüninger f...@grueninger.de wrote: Hi Bert, your commit is more than half a year old. Do you intent to include that into Git master? If not, what's the reason? Thats a really odd question to a person who posted this patch to the mailling list the fist place, isn't it? If anything you should have asked the git gui developers and community, why they didn't show interest to have this in master, right? Hi, Git gui isn't maintained by Junio himself : http://git-blame.blogspot.co.uk/2011/04/note-from-maintainer.html quote Although the following are included in git.git repository, they have their own authoritative repository and maintainers: a.. git-gui/ comes from git-gui project, maintained by Pat Thoyts: git://repo.or.cz/git-gui.git /quote Perhaps copy the original patch to Pat, with the justifications, reviews and Acks, to see if it's acceptable. -- Philip -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8] Making reflog modifications part of the transactions API
On 12/12/2014 10:16 PM, ronnie sahlberg wrote: On Fri, Dec 12, 2014 at 11:17 AM, Michael Haggerty mhag...@alum.mit.edu wrote: [...] What am I missing? My original idea was to clean up a bit of the reflog handling API and have one single transaction API for all ref and reflog operations. Think future use cases where you have a database backend for both refs and reflogs. It would be very nice to have a single atomic transaction that would either commit or fail atomically any update to refs and/or reflogs. Otherwise you would have all consumers of the API have to invent their own transaction and rewind support : 'oh the ref transaction failed and I have already done the reflog commit, have to manually uncommit And this quickly becomes quite burdensome for consumers. I think a transaction API should remove this burden from consumers and make it as easy as possible to use the API. Conditional of if it is desireable to have transactions for reflogs at all. Nobody is against ACID. But the API to trigger an ACID update doesn't always have to look like ref_transaction_begin() ref_transaction_update_XXX() /* ... */ ref_transaction_commit() The reflog_expire() function that I wrote does everything within an internal transaction that the caller doesn't have to know about. Similarly, the reflog update that happens when a reference is updated also occurs within a transaction, even without the caller having to ask for the reflog update explicitly. About the cleanup part. The current API, and I think also the current direction of both my old patches (which I think did not go far enough in the transactifications) or this current patchseries is that they all have a very confusing and inconsistent API for reflog updates. With this I mean, sometimes reflog updates happen within a transaction as a side effect of a ref_transaction_update(). Other times reflog updates happens ooutside of transactions by calling a special reflog API. I.e. reflogs sometimes update as part of a transaction and sometimes not. A follow up question then on this API is what should happen if you have a transaction open, but not committed, and while the transaction is open you call the non-transactional reflog API for a reflog for the same ref that is already beeing/or going to be/ updated as the ref-update-side-effect ? The same thing happens as when two independent processes try to update the same reference simultaneously: one fails. But there is currently no need for any command that would do such a thing. I think an api where sometimes you operate on foo from within a transaction and sometimes you do not, and if you do the latter when a transaction is open, it is unclear what should happen is not great. IMHO, refs and reflog updates are related and I think: * a transaction should be the ONLY way to mutate either a ref or a reflog or both. * if you update both a ref and a reflog then that should happen as part of one single transaction. * (later) it would probably make the API better if the code was refactored so write_ref_sha1() will NOT call log_ref_write() anymore and instead make the reflog update that happens explicit perhaps by calling something like a ref_transaction_update_reflog() as part of the ref_transaction_update() call. I disagree. The reflog should be updated *whenever* a reference is updated (for references that have reflogs enabled). So why should callers have to remember to trigger the reflog update as an extra step? It's better that the reflog update is an intrinsic part of updating the reference; that way nobody can forget it. Michael -- Michael Haggerty mhag...@alum.mit.edu -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 4/4] attr: avoid heavy work when we know the specified attr is not defined
On Tue, Dec 09, 2014 at 04:18:57PM -0800, Junio C Hamano wrote: Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: +static void collect_selected_attrs(const char *path, int num, + struct git_attr_check *check) +{ + struct attr_stack *stk; + int i, pathlen, rem, dirlen; + int basename_offset; + + pathlen = split_path(path, dirlen, basename_offset); + prepare_attr_stack(path, dirlen); + if (cannot_trust_maybe_real) { + for (i = 0; i git_attr_nr; i++) + check_all_attr[i].value = ATTR__UNKNOWN; Judging from the fact that (1) the only caller calls this function in this fashion based on the setting of cannot-trust bit, (2) this and the other function the only caller calls share the same code in their beginning part, and (3) the body of the if() statement here duplicates the code from collect_all_attrs(), I smell that a much better split is possible. Why isn't this all inside a single function collect_all_attrs()? That single function may no longer be collect_ALL_attrs, so renaming it to collect_attrs() is fine, but then that function may have this if () to initialize all of them to ATTR__UNKNOWN or do the else part we see below, and when organized that way we do not need to have duplicated code (or split_path() helper function), no? Something like this? Definitely looks better. -- 8 -- diff --git a/attr.c b/attr.c index b80e52b..0f828e3 100644 --- a/attr.c +++ b/attr.c @@ -33,9 +33,11 @@ struct git_attr { unsigned h; int attr_nr; int maybe_macro; + int maybe_real; char name[FLEX_ARRAY]; }; static int attr_nr; +static int cannot_trust_maybe_real; static struct git_attr_check *check_all_attr; static struct git_attr *(git_attr_hash[HASHSIZE]); @@ -97,6 +99,7 @@ static struct git_attr *git_attr_internal(const char *name, int len) a-next = git_attr_hash[pos]; a-attr_nr = attr_nr++; a-maybe_macro = 0; + a-maybe_real = 0; git_attr_hash[pos] = a; REALLOC_ARRAY(check_all_attr, attr_nr); @@ -269,6 +272,10 @@ static struct match_attr *parse_attr_line(const char *line, const char *src, /* Second pass to fill the attr_states */ for (cp = states, i = 0; *cp; i++) { cp = parse_attr(src, lineno, cp, (res-state[i])); + if (!is_macro) + res-state[i].attr-maybe_real = 1; + if (res-state[i].attr-maybe_macro) + cannot_trust_maybe_real = 1; } return res; @@ -713,7 +720,9 @@ static int macroexpand_one(int nr, int rem) * Collect all attributes for path into the array pointed to by * check_all_attr. */ -static void collect_all_attrs(const char *path) +static void collect_some_attrs(const char *path, int num, + struct git_attr_check *check) + { struct attr_stack *stk; int i, pathlen, rem, dirlen; @@ -736,6 +745,19 @@ static void collect_all_attrs(const char *path) prepare_attr_stack(path, dirlen); for (i = 0; i attr_nr; i++) check_all_attr[i].value = ATTR__UNKNOWN; + if (num !cannot_trust_maybe_real) { + rem = 0; + for (i = 0; i num; i++) { + if (!check[i].attr-maybe_real) { + struct git_attr_check *c; + c = check_all_attr + check[i].attr-attr_nr; + c-value = ATTR__UNSET; + rem++; + } + } + if (rem == num) + return; + } rem = attr_nr; for (stk = attr_stack; 0 rem stk; stk = stk-prev) @@ -746,7 +768,7 @@ int git_check_attr(const char *path, int num, struct git_attr_check *check) { int i; - collect_all_attrs(path); + collect_some_attrs(path, num, check); for (i = 0; i num; i++) { const char *value = check_all_attr[check[i].attr-attr_nr].value; @@ -762,7 +784,7 @@ int git_all_attrs(const char *path, int *num, struct git_attr_check **check) { int i, count, j; - collect_all_attrs(path); + collect_some_attrs(path, 0, NULL); /* Count the number of attributes that are set. */ count = 0; -- 8 -- -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 23/23] untracked cache: guard and disable on system changes
On Fri, Dec 12, 2014 at 3:41 AM, Torsten Bögershausen tbo...@web.de wrote: Even if I share the the concerns that the cache may work on one system, but not on the other, there should be better ways to protect from that. Using the uname does not really help, if you move one repo from NTFS to VFAT, we will not detect it (assuming we use Windows). (And how much do we need to support the move of a repo ?) There is a concern that this may not work, when different clients are accessing the repo, using the UNTR extension. Some kind of sanity check would be good to have, what can be done ? The most important things are the timestamps. I can think of 2 sanity checks: - If the modified time stamp of a directory is older then the create time of any file, the UNTR cache can not be used. - If the timestamp of a file changes, but the sha1 sum is the same, what does this mean? The file (or the whole repo) has been copied, or the time stamping does not work. A simple verification of the FS could be to stat() .git/, create a temp file, delete it and stat() again. If mtime does not change, the FS is unusable for UNTR. This is a slow test. Some filesytem only supports second resolution timestamps. If you create and delete the file so fast, mtime may remain in the same second even if the fs is supported. We need to wait a second between those operations (this is why update-index --untracked-cache takes several seconds). So it cannot be done often (i.e. at startup of every command) Then we could extend the uname idea: Create a string in UNTR which is a collection of lines like this: Working-For: Linux;/mnt/nfs/projects/project1 Not-OK-For: WIndows:/a:/project1 (Of course the strings can be made nicer, and '\n' is URL-encoded.) Each system that is not listed needs to probe the repo, add another line and re-write the index. We can even add a best-for line, and invalidate the UNTR every 12 hours or so. It starts to look complicated. How about letting the user deal with it? UNTR will support storing multiple location lines. Whenever UNTR is needed at a new location, it's disabled. The user has to use 'update-index' to test the new location and add it to UNTR. The user can choose to keep current locations (network access), or replace them all with the new one (repo move), or just mark the new location unusable so the extension is kept for use in other places, but warning at this place is suppressed. THe localtion consists of worktree path, system name and host name. Should we think about having an ASCII area for additional information, which is part of the stone, but the content is flexible ? These lines are already in free form. If the running system generates the same string as stored in UNTR, it's allowed to use the extension. We need code to understand this content, so flexibility must be within limit.. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html