[PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec

2014-12-14 Thread Роман Донченко
More specifically:

* Add \ to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both q and Q to identify the encoding.

* Allow lowercase hexadecimal digits in the Q encoding.

And, more on the cosmetic side:

* Change the encoded-text regex to exclude rather than include characters,
  for clarity and consistency with token.

Signed-off-by: Роман Донченко d...@corrigendum.ru
Acked-by: Jeff King p...@peff.net
---
 git-send-email.perl | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()@,;:\\\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
local ($_) = @_;
-   my $encoding;
-   s{=\?([^?]+)\?q\?(.*?)\?=}{
-   $encoding = $1;
-   my $e = $2;
-   $e =~ s/_/ /g;
-   $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-   $e;
+   my $charset;
+   s{$re_encoded_word}{
+   $charset = $1;
+   my $encoding = $2;
+   my $text = $3;
+   if ($encoding eq 'q' || $encoding eq 'Q') {
+   $text =~ s/_/ /g;
+   $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+   $text;
+   } else {
+   $; # other encodings not supported yet
+   }
}eg;
-   return wantarray ? ($_, $encoding) : $_;
+   return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
my $s = shift;
-   my $token = qr/[^][()@,;:\/?.= \000-\037\177-\377]+/;
-   my $encoded_text = qr/[!-@-~]+/;
length($s) = 75 
-   $s =~ m/^(?:[[:ascii:]]*|=\?$token\?$token\?$encoded_text\?=)$/o;
+   $s =~ m/^(?:[[:ascii:]]*|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly

2014-12-14 Thread Роман Донченко
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

Signed-off-by: Роман Донченко d...@corrigendum.ru
Acked-by: Jeff King p...@peff.net
---
 git-send-email.perl   | 26 --
 t/t9001-send-email.sh |  7 +++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index d461ffb..7d5cc8a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -919,17 +919,23 @@ $time = time - scalar $#files;
 sub unquote_rfc2047 {
local ($_) = @_;
my $charset;
-   s{$re_encoded_word}{
-   $charset = $1;
-   my $encoding = $2;
-   my $text = $3;
-   if ($encoding eq 'q' || $encoding eq 'Q') {
-   $text =~ s/_/ /g;
-   $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
-   $text;
-   } else {
-   $; # other encodings not supported yet
+   my $sep = qr/[ \t]+/;
+   s{$re_encoded_word(?:$sep$re_encoded_word)*}{
+   my @words = split $sep, $;
+   foreach (@words) {
+   m/$re_encoded_word/;
+   $charset = $1;
+   my $encoding = $2;
+   my $text = $3;
+   if ($encoding eq 'q' || $encoding eq 'Q') {
+   $_ = $text;
+   s/_/ /g;
+   s/=([0-9A-F]{2})/chr(hex($1))/egi;
+   } else {
+   # other encodings not supported yet
+   }
}
+   join '', @words;
}eg;
return wantarray ? ($_, $charset) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..fa965ff 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is 
suppressed' 
'non_ascii_self_suppressed'
 
 
+# This name is long enough to force format-patch to split it into multiple
+# encoded-words, assuming it uses UTF-8 with the Q encoding.
+test_expect_success $PREREQ 'long non-ascii self name is suppressed' 
+   test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=m...@example.com' \
+   'long_non_ascii_self_suppressed'
+
+
 test_expect_success $PREREQ 'sanitized self name is suppressed' 
test_suppress_self_unquoted '\A U. Thor\' 'aut...@example.com' \
'self_name_sanitized_suppressed'
-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] strbuf.h: format asciidoc code blocks as 4-space indent

2014-12-14 Thread Michael Haggerty
On 12/12/2014 11:39 PM, Jonathan Nieder wrote:
 Jeff King wrote:
 
 This is much easier to read when the whole thing is stuffed
 inside a comment block. And there is precedent for this
 convention in markdown (and just in general ascii text).

 Signed-off-by: Jeff King p...@peff.net
 ---
 
 Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 
 As a side note, I actually find markdown much more pleasant to read and
 write than asciidoc.
 
 I do, too.  Quoting in asciidoc is a nightmare.

Peff, thanks for working on this. I think it is a definite improvement.

I suggest that we accept the use of asciidoc/markdown's convention of
using backwards quotes to mark code snippets (especially identifier
names) within comments *anywhere* in our code base. For example, this
appears in refs.c:

/*
 * Create a struct ref_entry object for the specified dirname.
 * dirname is the name of the directory with a trailing slash
 * (e.g., refs/heads/) or  for the top-level directory.
 */

I claim that it is more readable with a tiny bit of markup:

/*
 * Create a `struct ref_entry` object for the specified `dirname`.
 * `dirname` is the name of the directory with a trailing slash
 * (e.g., refs/heads/) or  for the top-level directory.
 */

Marking up `struct ref_entry` helps make it clear that the two words
belong together, and marking up `dirname` makes it clear that we are
talking about a specific identifier (in this case, a function parameter).

Currently, comments use a mix of unadorned text, single-quoted text, and
double-quoted text when talking about code. I think the
asciidoc/markdown convention is clearer [1].

I think we shouldn't be pedantic about this. When a comment is readable
with no markup, there's no need to add markup. And incorrect markup
shouldn't by itself be reason to reject a patch. But in many examples, a
little bit of markup makes the text less ambiguous and easier to read.

Michael

[1] Yes, I see the irony in trying to improve a mixture of three
conventions by adding a fourth one.

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [wish] Revert changes in git gui

2014-12-14 Thread Christoph Grüninger
Hi Bert,
yes and no. I couldn't find a message from you requesting a merge of
this patch. Maybe it's me, I am not familiar with the way it works for Git.

@Git developers: Do you consider merging Bert's patch? If not, what's
the reason?

Bye
Christoph


On 12.12.2014 at 10:28, Bert Wesarg wrote:
 On Fri, Dec 12, 2014 at 9:27 AM, Christoph Grüninger f...@grueninger.de 
 wrote:
 Hi Bert,
 your commit is more than half a year old. Do you intent to include that
 into Git master? If not, what's the reason?
 
 Thats a really odd question to a person who posted this patch to the
 mailling list the fist place, isn't it? If anything you should have
 asked the git gui developers and community, why they didn't show
 interest to have this in master, right?
 
 Bert
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [wish] Revert changes in git gui

2014-12-14 Thread Philip Oakley

From: Christoph Grüninger f...@grueninger.de

Hi Bert,
yes and no. I couldn't find a message from you requesting a merge of
this patch. Maybe it's me, I am not familiar with the way it works for
Git.

@Git developers: Do you consider merging Bert's patch? If not, what's
the reason?

Bye
Christoph


On 12.12.2014 at 10:28, Bert Wesarg wrote:

On Fri, Dec 12, 2014 at 9:27 AM, Christoph Grüninger
f...@grueninger.de wrote:

Hi Bert,
your commit is more than half a year old. Do you intent to include
that
into Git master? If not, what's the reason?


Thats a really odd question to a person who posted this patch to the
mailling list the fist place, isn't it? If anything you should have
asked the git gui developers and community, why they didn't show
interest to have this in master, right?


Hi,
Git gui isn't maintained by Junio himself :
http://git-blame.blogspot.co.uk/2011/04/note-from-maintainer.html
quote
Although the following are included in git.git repository, they have
their own authoritative repository and maintainers:

 a.. git-gui/ comes from git-gui project, maintained by Pat Thoyts:

git://repo.or.cz/git-gui.git
/quote

Perhaps copy the original patch to Pat, with the justifications, reviews 
and Acks, to see if it's acceptable.

--
Philip

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] Making reflog modifications part of the transactions API

2014-12-14 Thread Michael Haggerty
On 12/12/2014 10:16 PM, ronnie sahlberg wrote:
 On Fri, Dec 12, 2014 at 11:17 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 [...]
 What am I missing?
 
 My original idea was to clean up a bit of the reflog handling API and
 have one single transaction API for all  ref and reflog operations.
 
 Think future use cases where you have a database backend for both refs
 and reflogs. It would be very nice to have a single atomic transaction
 that would either commit or fail atomically any update to refs and/or
 reflogs.
 Otherwise you would have all consumers of the API have to invent their
 own transaction and rewind support : 'oh the ref transaction failed
 and I have already done the reflog commit,   have to manually uncommit
 
 And this quickly becomes quite burdensome for consumers.
 
 I think a transaction API should remove this burden from consumers and
 make it as easy as possible to use the API.
 
 Conditional of if it is desireable to have transactions for reflogs at all.

Nobody is against ACID. But the API to trigger an ACID update doesn't
always have to look like

ref_transaction_begin()
ref_transaction_update_XXX()
/* ... */
ref_transaction_commit()

The reflog_expire() function that I wrote does everything within an
internal transaction that the caller doesn't have to know about.
Similarly, the reflog update that happens when a reference is updated
also occurs within a transaction, even without the caller having to ask
for the reflog update explicitly.

 About the cleanup part. The current API, and I think also the current
 direction of both my old patches (which I think did not go far enough
 in the transactifications) or this current patchseries is that they
 all
 have a very confusing and inconsistent API for reflog updates.
 With this I mean,   sometimes reflog updates happen within a
 transaction as a side effect of a ref_transaction_update().
 Other times reflog updates happens ooutside of transactions by calling
 a special reflog API.
 
 I.e.  reflogs sometimes update as part of a transaction and sometimes not.
 A follow up question then on this API is what should happen if you
 have a transaction open, but not committed, and while the transaction
 is open you call the non-transactional reflog API for a reflog for the
 same ref that is already beeing/or going to be/ updated as the
 ref-update-side-effect ?

The same thing happens as when two independent processes try to update
the same reference simultaneously: one fails. But there is currently no
need for any command that would do such a thing.

 I think an api where sometimes you operate on foo from within a
 transaction and sometimes you do not, and if you do the latter when a
 transaction is open, it is unclear what should happen is not great.
 IMHO, refs and reflog updates are related and I think:
 
 * a transaction should be the ONLY way to mutate either a ref or a
 reflog or both.
 * if you update both a ref and a reflog then that should happen as
 part of one single transaction.
 * (later) it would probably make the API better if the code was
 refactored so write_ref_sha1() will NOT call log_ref_write() anymore
 and instead make the reflog update that happens explicit perhaps by
 calling something like a ref_transaction_update_reflog() as part of
 the ref_transaction_update() call.

I disagree. The reflog should be updated *whenever* a reference is
updated (for references that have reflogs enabled). So why should
callers have to remember to trigger the reflog update as an extra step?
It's better that the reflog update is an intrinsic part of updating the
reference; that way nobody can forget it.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 4/4] attr: avoid heavy work when we know the specified attr is not defined

2014-12-14 Thread Duy Nguyen
On Tue, Dec 09, 2014 at 04:18:57PM -0800, Junio C Hamano wrote:
 Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:
 
  +static void collect_selected_attrs(const char *path, int num,
  +  struct git_attr_check *check)
  +{
  +   struct attr_stack *stk;
  +   int i, pathlen, rem, dirlen;
  +   int basename_offset;
  +
  +   pathlen = split_path(path, dirlen, basename_offset);
  +   prepare_attr_stack(path, dirlen);
  +   if (cannot_trust_maybe_real) {
  +   for (i = 0; i  git_attr_nr; i++)
  +   check_all_attr[i].value = ATTR__UNKNOWN;
 
 Judging from the fact that
 
  (1) the only caller calls this function in this fashion based on the
  setting of cannot-trust bit,
 
  (2) this and the other function the only caller calls share the
  same code in their beginning part, and
 
  (3) the body of the if() statement here duplicates the code from
  collect_all_attrs(),
 
 I smell that a much better split is possible.
 
 Why isn't this all inside a single function collect_all_attrs()?
 That single function may no longer be collect_ALL_attrs, so renaming
 it to collect_attrs() is fine, but then that function may have this
 if () to initialize all of them to ATTR__UNKNOWN or do the else part
 we see below, and when organized that way we do not need to have
 duplicated code (or split_path() helper function), no?

Something like this? Definitely looks better.

-- 8 --
diff --git a/attr.c b/attr.c
index b80e52b..0f828e3 100644
--- a/attr.c
+++ b/attr.c
@@ -33,9 +33,11 @@ struct git_attr {
unsigned h;
int attr_nr;
int maybe_macro;
+   int maybe_real;
char name[FLEX_ARRAY];
 };
 static int attr_nr;
+static int cannot_trust_maybe_real;
 
 static struct git_attr_check *check_all_attr;
 static struct git_attr *(git_attr_hash[HASHSIZE]);
@@ -97,6 +99,7 @@ static struct git_attr *git_attr_internal(const char *name, 
int len)
a-next = git_attr_hash[pos];
a-attr_nr = attr_nr++;
a-maybe_macro = 0;
+   a-maybe_real = 0;
git_attr_hash[pos] = a;
 
REALLOC_ARRAY(check_all_attr, attr_nr);
@@ -269,6 +272,10 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
/* Second pass to fill the attr_states */
for (cp = states, i = 0; *cp; i++) {
cp = parse_attr(src, lineno, cp, (res-state[i]));
+   if (!is_macro)
+   res-state[i].attr-maybe_real = 1;
+   if (res-state[i].attr-maybe_macro)
+   cannot_trust_maybe_real = 1;
}
 
return res;
@@ -713,7 +720,9 @@ static int macroexpand_one(int nr, int rem)
  * Collect all attributes for path into the array pointed to by
  * check_all_attr.
  */
-static void collect_all_attrs(const char *path)
+static void collect_some_attrs(const char *path, int num,
+  struct git_attr_check *check)
+
 {
struct attr_stack *stk;
int i, pathlen, rem, dirlen;
@@ -736,6 +745,19 @@ static void collect_all_attrs(const char *path)
prepare_attr_stack(path, dirlen);
for (i = 0; i  attr_nr; i++)
check_all_attr[i].value = ATTR__UNKNOWN;
+   if (num  !cannot_trust_maybe_real) {
+   rem = 0;
+   for (i = 0; i  num; i++) {
+   if (!check[i].attr-maybe_real) {
+   struct git_attr_check *c;
+   c = check_all_attr + check[i].attr-attr_nr;
+   c-value = ATTR__UNSET;
+   rem++;
+   }
+   }
+   if (rem == num)
+   return;
+   }
 
rem = attr_nr;
for (stk = attr_stack; 0  rem  stk; stk = stk-prev)
@@ -746,7 +768,7 @@ int git_check_attr(const char *path, int num, struct 
git_attr_check *check)
 {
int i;
 
-   collect_all_attrs(path);
+   collect_some_attrs(path, num, check);
 
for (i = 0; i  num; i++) {
const char *value = 
check_all_attr[check[i].attr-attr_nr].value;
@@ -762,7 +784,7 @@ int git_all_attrs(const char *path, int *num, struct 
git_attr_check **check)
 {
int i, count, j;
 
-   collect_all_attrs(path);
+   collect_some_attrs(path, 0, NULL);
 
/* Count the number of attributes that are set. */
count = 0;
-- 8 --
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 23/23] untracked cache: guard and disable on system changes

2014-12-14 Thread Duy Nguyen
On Fri, Dec 12, 2014 at 3:41 AM, Torsten Bögershausen tbo...@web.de wrote:
 Even if I share the the concerns that the cache may work on one system,
 but not on the other, there should be better ways to protect from that.

 Using the uname does not really help, if you move one repo from NTFS to VFAT,
 we will not detect it (assuming we use Windows).
 (And how much do we need to support the move of a repo ?)

 There is a concern that this may not work, when different clients are 
 accessing
 the repo, using the UNTR extension.

 Some kind of sanity check would be good to have, what can be done ?
 The most important things are the timestamps.
 I can think of 2 sanity checks:
 - If the modified time stamp of a directory is older then the create time of 
 any file,
   the UNTR cache can not be used.
 - If the timestamp of a file changes, but the sha1 sum is the same, what does 
 this mean?
   The file (or the whole repo) has been copied, or the time stamping does not 
 work.

 A simple verification of the FS could be to stat() .git/, create a temp file, 
 delete it and
 stat() again. If mtime does not change, the FS is unusable for UNTR.

This is a slow test. Some filesytem only supports second resolution
timestamps. If you create and delete the file so fast, mtime may
remain in the same second even if the fs is supported. We need to wait
a second between those operations (this is why update-index
--untracked-cache takes several seconds). So it cannot be done often
(i.e. at startup of every command)

 Then we could extend the uname idea:
 Create a string in UNTR which is a collection of lines like this:

 Working-For: Linux;/mnt/nfs/projects/project1
 Not-OK-For: WIndows:/a:/project1
 (Of course the strings can be made nicer, and '\n' is URL-encoded.)

 Each system that is not listed needs to probe the repo, add another line
 and re-write the index.

 We can even add a best-for line, and invalidate the UNTR every 12 hours or 
 so.

It starts to look complicated. How about letting the user deal with
it? UNTR will support storing multiple location lines. Whenever UNTR
is needed at a new location, it's disabled. The user has to use
'update-index' to test the new location and add it to UNTR. The user
can choose to keep current locations (network access), or replace them
all with the new one (repo move), or just mark the new location
unusable so the extension is kept for use in other places, but warning
at this place is suppressed. THe localtion consists of worktree
path, system name and host name.

 Should we think about having an ASCII area for additional information, which 
 is part
 of the stone, but the content is flexible ?

These lines are already in free form. If the running system generates
the same string as stored in UNTR, it's allowed to use the extension.
We need code to understand this content, so flexibility must be within
limit..
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html