Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-19 Thread Johannes Schindelin
Hi Junio,

On Wed, 18 Jan 2017, Junio C Hamano wrote:

> Johannes Sixt  writes:
> 
> > Am 17.01.2017 um 20:17 schrieb Junio C Hamano:
> >> So... can we move this forward?
> >
> > I have no objects anymore.
> 
> Alright, thanks.  
> 
> Dscho what's your assessment?

I still think it will be a problem. I'll address that problem when I get
bug reports, to unblock you.

Ciao,
Johannes


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-18 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 17.01.2017 um 20:17 schrieb Junio C Hamano:
>> So... can we move this forward?
>
> I have no objects anymore.
>
> -- Hannes

Alright, thanks.  

Dscho what's your assessment?  My "I do not think" before the quoted
"can we move" above was more about giving a statement for people to
argue against, by saying "no your understanding is wrong", in the
hope that it would highlight why we need more work in this area if
needed and in what way.

Thanks.


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-18 Thread Johannes Schindelin
Hi Junio,

On Tue, 17 Jan 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Sun, 15 Jan 2017, Junio C Hamano wrote:
> >
> >> * js/prepare-sequencer-more (2017-01-09) 38 commits
> 
> > The only outstanding review comments I know about are your objection
> > to the name of the read_env_script() function (which I shot down as
> > bogus),
> 
> Not the "name", but the implementation not sharing the same code
> with "am" codepath even though they originate from the same shell
> function and serve the same purpose.

They do not originate from the same shell function at all. The scripted
git-am used a function called get_author_ident_from_commit to read the
author-script file. It also used "eval $(cat .../author-script)" to
evaluate the script later on.

And while git-rebase--interactive.sh's ". .../author-script" is very
similar to that latter invocation, I grant you that, the claim that those
codepaths originate from the same shell function is false.

Worse.

git-am.sh not only used a different method to read out the author-script,
and not only left no way for the user to modify that author-script, there
are many more differences in the exact code paths when comparing git-am.sh
to git-rebase--interactive.sh.

git-am.sh goes out of its way to not only apply the patch manually but
also to use the low-level `write-tree` and `commit-tree` commands. The
builtin am reproduces this as faithfully as possible (the author-script is
not actually evaluated, of course, but the builtin am knows that it has
to validate the author-script's contents for the first code path reading
the author-script anyway, so it reuses that function for the second code
path, knowing fully well the exact format of that file because it knows it
wrote it without any chance of the user messing it up).

The builtin am also avoids spawning the low-level commands, opting instead
to call the functions in libgit.a directly. This is all done properly, and
as a consequence, the environment variables GIT_AUTHOR_* never become,
ahem, environment variables in the builtin am, but an "author" parameter
(which is a const char * pointing to a ready-to-use ident line) that gets
passed to the commit_tree() function.

Now, let's have a brief look at git-rebase--interactive.sh.

It was rightfully nicknamed "cherry-pick on drugs" in its early days,
because that is what it does: it calls `git cherry-pick` *most* of the
time.

So what does the (sequencer via being called from the) rebase--helper do?
Yep, exactly: it calls the internal do_pick() function that is the working
horse of the cherry-pick. And does that working horse call write_tree()
and commit_tree()? No, it does not. It *spawns a git-commit process*! And
the way to specify the author there is to pass the GIT_AUTHOR_*
environment variables.

Let's recapitulate.

Not only did the scripted am use a totally different code path to *read*
the author-script than rebase -i, the code path after that also differed
substantially from rebase -i to the point that very, very different Git
commands were called.

It also so happens that the proper way to convert those code paths into
builtin code uses very, very different functions from libgit.a that
require very, very different handling.

The builtin am never sets the environment variables GIT_AUTHOR_* but
instead constructs a complete ident line from it, and therefore *must*
validate the contents of author-script.

The sequencer *has to* set the GIT_AUTHOR_* environment variables because
it calls `git commit` in a different process *that already validates the
GIT_AUTHOR_* variables*.

Even worse (and this is not the first, or second, or third time I pointed
this out): if you mistook the identical file name, and identical content,
to mean that the different code paths *must* use a *single* function to
read the author-script (nevermind that the am code path reads it into a
structure that is specifically designed to support the "am" operation, and
nothing else), you would not only force the sequencer call to provide a
data structure it does not want, not only to validate the contents
unnecessarily, but it would have to lego the parsed contents *back into
almost the original shape as the original contents of the author-script*.
So it would have to add tons of code relative to the current version, for
no benefit at all, *increasing your maintenance burden for no good
reason*.

This repeated suggestion to reuse the code path to read the author-script
repeatedly ignores the fact that we have to do very, very different things
with the contents in those two code paths.

And if that was not enough: we are talking about code that is rock solid,
has been tested for almost a *year*, half of that year with a painful
cross-validation by running old and new rebase -i and comparing their
output. And this rock solid code is what you want to force me to change to
code that neither makes sense nor is tested well.

And I refuse to be forced to do that: we just proved colle

Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Johannes Sixt

Am 17.01.2017 um 20:17 schrieb Junio C Hamano:

So... can we move this forward?


I have no objects anymore.

-- Hannes



Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Jeff King
On Tue, Jan 17, 2017 at 12:32:26PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > That was my general impression, too. But I seem to recall it was you in
> > a nearby thread saying that:
> >
> >   if (foo)
> > bar();
> >   else {
> > one();
> > two();
> >   }
> >
> > was wrong. Maybe I misunderstood.
> 
> If it were a new code written like the above, that would have been
> fine.  If a new code written with both sides inside {}, that would
> have been fine, too.
> 
> IIRC, it was that the original had {} on both, and a patch tried to
> turn that into the above, triggering "why are we churning between
> two acceptable forms?"

Ah, OK. I didn't follow that discussion closely enough to realize that.

-Peff


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Junio C Hamano
Jeff King  writes:

> That was my general impression, too. But I seem to recall it was you in
> a nearby thread saying that:
>
>   if (foo)
>   bar();
>   else {
> one();
>   two();
>   }
>
> was wrong. Maybe I misunderstood.

If it were a new code written like the above, that would have been
fine.  If a new code written with both sides inside {}, that would
have been fine, too.

IIRC, it was that the original had {} on both, and a patch tried to
turn that into the above, triggering "why are we churning between
two acceptable forms?"


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Sun, 15 Jan 2017, Junio C Hamano wrote:
>
>> * js/prepare-sequencer-more (2017-01-09) 38 commits
>
> I think that it adds confusion rather than value to specifically use a
> different branch name than I indicated in my submission, unless there is a
> really good reason to do so (which I fail to see here).

I've been meaning to rename it to match yours, for the exact reason.
The only reason was I needed my time spent to deal with other
topics, and reusing the same topic name as I used very first time
was less work.  I'll find time to update it (if you are curious, it
is not just "git branch -m").

Thanks.

> The only outstanding review comments I know about are your objection to
> the name of the read_env_script() function (which I shot down as bogus),

Not the "name", but the implementation not sharing the same code
with "am" codepath even though they originate from the same shell
function and serve the same purpose.

> and the rather real bug fix I sent out as a fixup! which you may want to
> squash in (in the alternative, I can mailbomb v4 of the entire sequencer-i
> patch series, that is your choice).

I'd rather see the "make elengant" thing redone in a more sensible
way, but I feel that it is waste of my time to help you see the
distinction between maintainable code and code that happens to work
with today's requirement, so I give up, hoping that other people
will later refactor the code that should be shared after the series
lands.  I'll squash the fixup! thing in, and as I already said a few
days ago, I do not think we'd want v4 (rather we'd want to go
incremental).


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Jeff King
On Tue, Jan 17, 2017 at 11:20:58AM -0800, Junio C Hamano wrote:

> > Documentation/CodingGuidelines says:
> >
> >  - We avoid using braces unnecessarily.  I.e.
> >
> > if (bla) {
> > x = 1;
> > }
> >
> >is frowned upon.  A gray area is when the statement extends
> >over a few lines, and/or you have a lengthy comment atop of
> >it.  Also, like in the Linux kernel, if there is a long list
> >of "else if" statements, it can make sense to add braces to
> >single line blocks.
> >
> > I think this is pretty clearly the "gray area" mentioned there. Which
> > yes, does not say "definitely do it this way", but I hope makes it clear
> > that you're supposed to use judgement about readability.
> 
> I always took "gray area" to mean "we do not have strong preference
> either way, i.e.
> 
>  * It is OK for you to write your new code in either style (the
>usual "match existing style in surrounding code" applies,
>obviously);
> 
>  * It is not OK for you to churn the codebase with a patch that only
>changes existing code to flip between the two styles.

That was my general impression, too. But I seem to recall it was you in
a nearby thread saying that:

  if (foo)
bar();
  else {
one();
two();
  }

was wrong. Maybe I misunderstood.

-Peff


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Junio C Hamano
Jeff King  writes:

> Documentation/CodingGuidelines says:
>
>  - We avoid using braces unnecessarily.  I.e.
>
> if (bla) {
> x = 1;
> }
>
>is frowned upon.  A gray area is when the statement extends
>over a few lines, and/or you have a lengthy comment atop of
>it.  Also, like in the Linux kernel, if there is a long list
>of "else if" statements, it can make sense to add braces to
>single line blocks.
>
> I think this is pretty clearly the "gray area" mentioned there. Which
> yes, does not say "definitely do it this way", but I hope makes it clear
> that you're supposed to use judgement about readability.

I always took "gray area" to mean "we do not have strong preference
either way, i.e.

 * It is OK for you to write your new code in either style (the
   usual "match existing style in surrounding code" applies,
   obviously);

 * It is not OK for you to churn the codebase with a patch that only
   changes existing code to flip between the two styles.


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jan 16, 2017 at 09:33:07PM +0100, Johannes Sixt wrote:
>
>> However, Jeff's patch is intended to catch exactly these cases (not for the
>> cases where this happens accidentally, but when they happen with malicious
>> intent).
>> 
>> We are talking about user-provided data that is reproduced by die() or
>> error(). I daresay that we do not have a single case where it is intended
>> that this data is intentionally multi-lined, like a commit message. It can
>> only be an accident or malicious when it spans across lines.
>> 
>> I know we allow CR and LF in file names, but in all cases where such a name
>> appears in an error message, it is *not important* that the data is
>> reproduced exactly. On the contrary, it is usually more helpful to know that
>> something strange is going on. The question marks are a strong indication to
>> the user for this.
>
> Yes, exactly. Thanks for explaining this better than I obviously was
> doing. :)

Yup.  In the "funny filename" case, the updated code indeed is doing
the right thing.  So one remaining possible issue is if we ourselves
deliberately feed a raw CR (or any non-filtered control code) to
error() and friends because their native effect (like returning the
carriage to overwrite what we have already said with the remainder
of the message by using CR) to functions that go through vreportf()
interface.  I personally do not think there is any---the progress
meter code does use CR for that but they don't do vreportf().

I do not think we write "message\r\n" even for Windows; we rely on
vfprintf() and other stdio layer to turn "message\n" into
"message\r\n" before it hits write(2) or its equivalent?  So I
understand that this should affect LF and CRLF systems the same way
(meaning, no negative effect).

So... can we move this forward?



Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-17 Thread Jeff King
On Mon, Jan 16, 2017 at 05:33:44PM -0800, Jacob Keller wrote:

> > I am certainly sympathetic to the idea that the "?" is ugly and
> > potentially confusing. But I think it's at least a step forward for this
> > particular example.
> 
> Would it be possible to convert the CR to a literal \r printing? Since
> it's pretty common to show these characters as slash-escaped? (Or is
> that too much of a Unix world thing?) I know I'd find \r less
> confusing than '?'

I discussed this in the original commit message.

Yes, it's possible, but it's a little tricky. We want to fprintf() the
whole thing as a unit (to increase the chances of it being done in an
atomic write()). We don't want to use a dynamic buffer, since we might
be called from a signal handler, or when malloc has failed, etc.

So I tried to stick to something that could reliably done in place. We
could probably get by with 2 static buffers and copy from one to the
other (if stack space is an issue, the current one is 4K, which is
probably excessively large anyway).

Mostly I just assumed that it was highly unlikely anybody would see this
escaping in the first place. So I tried to go with the simplest
solution.

-Peff


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Jacob Keller
On Mon, Jan 16, 2017 at 2:00 PM, Jeff King  wrote:
> On Mon, Jan 16, 2017 at 06:06:35PM +0100, Johannes Schindelin wrote:
>
>> >  And I think that would apply to any input parameter we show via
>> >  error(), etc, if it is connected to a newline (ideally we would
>> >  show newlines as "?", too, but we cannot tell the difference
>> >  between ones from parameters, and ones that are part of the error
>> >  message).
>>
>> I think it is doing users a really great disservice to munge up CR or LF
>> into question marks. I *guarantee* you that it confuses users. And not
>> because they are dumb, but because the code violates the Law of Least
>> Surprise.
>
> I'm not sure if you realize that with stock git, the example from your
> test looks like this (at least in my terminal):
>
>   $ git.v2.11.0 rev-parse --abbrev-ref "$(printf 'CR/LF\r\n')" >/dev/null
>   ': unknown revision or path not in the working tree.
>   Use '--' to separate paths from revisions, like this:
>   'git  [...] -- [...]'
>
> The "\r" causes us to overwrite the rest of the message, including the
> actual filename. With my patch it's:
>
>   $ git rev-parse --abbrev-ref "$(printf 'CR/LF\r\n')" >/dev/null
>   fatal: ambiguous argument 'CR/LF?': unknown revision or path not in the 
> working tree.
>   Use '--' to separate paths from revisions, like this:
>   'git  [...] -- [...]'
>
> I am certainly sympathetic to the idea that the "?" is ugly and
> potentially confusing. But I think it's at least a step forward for this
> particular example.
>

Would it be possible to convert the CR to a literal \r printing? Since
it's pretty common to show these characters as slash-escaped? (Or is
that too much of a Unix world thing?) I know I'd find \r less
confusing than '?'

Thanks,
Jake


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Jeff King
On Mon, Jan 16, 2017 at 06:06:35PM +0100, Johannes Schindelin wrote:

> >  And I think that would apply to any input parameter we show via
> >  error(), etc, if it is connected to a newline (ideally we would
> >  show newlines as "?", too, but we cannot tell the difference
> >  between ones from parameters, and ones that are part of the error
> >  message).
> 
> I think it is doing users a really great disservice to munge up CR or LF
> into question marks. I *guarantee* you that it confuses users. And not
> because they are dumb, but because the code violates the Law of Least
> Surprise.

I'm not sure if you realize that with stock git, the example from your
test looks like this (at least in my terminal):

  $ git.v2.11.0 rev-parse --abbrev-ref "$(printf 'CR/LF\r\n')" >/dev/null
  ': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git  [...] -- [...]'

The "\r" causes us to overwrite the rest of the message, including the
actual filename. With my patch it's:

  $ git rev-parse --abbrev-ref "$(printf 'CR/LF\r\n')" >/dev/null
  fatal: ambiguous argument 'CR/LF?': unknown revision or path not in the 
working tree.
  Use '--' to separate paths from revisions, like this:
  'git  [...] -- [...]'

I am certainly sympathetic to the idea that the "?" is ugly and
potentially confusing. But I think it's at least a step forward for this
particular example.

I'll snip liberally from the rest of your response, because I think what
JSixt wrote covers it.

> > > While at it, let's lose the unnecessary curly braces.
> > 
> > Please don't. Obviously C treats the "if/else" as a single unit, but
> > IMHO it's less error-prone to include the braces any time there are
> > multiple visual lines. E.g., something like:
> > 
> >   while (foo)
> > if (bar)
> > one();
> > else
> > two();
> > three();
> > 
> > is much easier to spot as wrong when you would require braces either
> > way (and not relevant here, but I'd say that even an inner block with a
> > comment deserves braces for the same reason).
> 
> There is no documentation about the preferred coding style.

Documentation/CodingGuidelines says:

 - We avoid using braces unnecessarily.  I.e.

if (bla) {
x = 1;
}

   is frowned upon.  A gray area is when the statement extends
   over a few lines, and/or you have a lengthy comment atop of
   it.  Also, like in the Linux kernel, if there is a long list
   of "else if" statements, it can make sense to add braces to
   single line blocks.

I think this is pretty clearly the "gray area" mentioned there. Which
yes, does not say "definitely do it this way", but I hope makes it clear
that you're supposed to use judgement about readability.

-Peff


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Jeff King
On Mon, Jan 16, 2017 at 09:33:07PM +0100, Johannes Sixt wrote:

> However, Jeff's patch is intended to catch exactly these cases (not for the
> cases where this happens accidentally, but when they happen with malicious
> intent).
> 
> We are talking about user-provided data that is reproduced by die() or
> error(). I daresay that we do not have a single case where it is intended
> that this data is intentionally multi-lined, like a commit message. It can
> only be an accident or malicious when it spans across lines.
> 
> I know we allow CR and LF in file names, but in all cases where such a name
> appears in an error message, it is *not important* that the data is
> reproduced exactly. On the contrary, it is usually more helpful to know that
> something strange is going on. The question marks are a strong indication to
> the user for this.

Yes, exactly. Thanks for explaining this better than I obviously was
doing. :)

> > If you absolutely insist, I will spend time to find a plausible example
> > and use that in the regression test.
> 
> I don't want to see you on an endeavor with dubious results. I'd prefer to
> wait until the first case of "incorrectly munged data" is reported because,
> as I said, I have a gut feeling that there is none.

Agreed.

-Peff


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Johannes Sixt

Am 16.01.2017 um 18:06 schrieb Johannes Schindelin:

On Mon, 16 Jan 2017, Jeff King wrote:

Hmm.  I am not sure to what degree CRLFs are actually a problem here.
Keep in mind these are error messages generated via error(), and so not
processing arbitrary data. I can imagine that CRs might come from:


Please note the regression test I added. It uses rev-parse's --abbrev-ref
option which quotes the argument when erroring out. This argument then
gets munged.

So error() (or in this case, die()) *very much* processes arbitrary data.

I *know* that rev-parse --abbrev-ref is an artificial example, it is
highly unlikely that anybody will use

git rev-parse --abbrev-ref "$()"

However, there are plenty other cases in regular Git usage where arguments
are generated by external programs to which we have no business dictating a
specific line ending style.


However, Jeff's patch is intended to catch exactly these cases (not for 
the cases where this happens accidentally, but when they happen with 
malicious intent).


We are talking about user-provided data that is reproduced by die() or 
error(). I daresay that we do not have a single case where it is 
intended that this data is intentionally multi-lined, like a commit 
message. It can only be an accident or malicious when it spans across lines.


I know we allow CR and LF in file names, but in all cases where such a 
name appears in an error message, it is *not important* that the data is 
reproduced exactly. On the contrary, it is usually more helpful to know 
that something strange is going on. The question marks are a strong 
indication to the user for this.



If you absolutely insist, I will spend time to find a plausible example
and use that in the regression test.


I don't want to see you on an endeavor with dubious results. I'd prefer 
to wait until the first case of "incorrectly munged data" is reported 
because, as I said, I have a gut feeling that there is none.



I am certainly open to loosening the sanitizing for CR to make things
work seamlessly on Windows. But I am having trouble imagining a case
that is actually negatively impacted.


I came to the same conclusion. I regret having sent out a warning 
message in, well, such a haste(*), without thinking the case through 
first. IMHO, Jeff's patch should be fine as is.


(*) literally; I had to catch a train.

-- Hannes



Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Johannes Schindelin
Hi Peff,

On Mon, 16 Jan 2017, Jeff King wrote:

> On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:
> 
> > > >  An error message with an ASCII control character like '\r' in it
> > > >  can alter the message to hide its early part, which is problematic
> > > >  when a remote side gives such an error message that the local side
> > > >  will relay with a "remote: " prefix.
> > > >
> > > >  Will merge to 'next'.
> > > 
> > > Please be not too hasty with advancing this topic to master. I could 
> > > imagine
> > > that there is some unwanted fallout on systems where the end-of-line 
> > > marker
> > > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > > expect regressions in *my* typical workflow.
> > 
> > Thank you for being so attentive.
> > 
> > This topic branch would indeed have caused problems. Worse: it would have
> > caused problems that are not covered by our test suite, as Git for
> > Windows' own utilities do not generate CR/LF line endings. So this
> > regression would have bit our users. Nasty.
> 
> Hmm.  I am not sure to what degree CRLFs are actually a problem here.
> Keep in mind these are error messages generated via error(), and so not
> processing arbitrary data. I can imagine that CRs might come from:

Please note the regression test I added. It uses rev-parse's --abbrev-ref
option which quotes the argument when erroring out. This argument then
gets munged.

So error() (or in this case, die()) *very much* processes arbitrary data.

I *know* that rev-parse --abbrev-ref is an artificial example, it is
highly unlikely that anybody will use

git rev-parse --abbrev-ref "$()"

However, there are plenty other cases in regular Git usage where arguments
are generated by external programs to which we have no business dictating a
specific line ending style.

If you absolutely insist, I will spend time to find a plausible example
and use that in the regression test. However, that would not really
improve anything, as the purpose of the regression test is simply to
demonstrate that a user-provided argument's CR/LF does not get munged
incorrectly. And the test I provided serves that purpose perfectly.

>   1. A parameter like a filename or revision name. If I ask for a
>  rev-parse of "foo\r\n", then it's probably useful to mention the
>  "\r" in the error message, rather than ignoring it (or converting
>  it to a line-feed).
> 
>  And I think that would apply to any input parameter we show via
>  error(), etc, if it is connected to a newline (ideally we would
>  show newlines as "?", too, but we cannot tell the difference
>  between ones from parameters, and ones that are part of the error
>  message).

I think it is doing users a really great disservice to munge up CR or LF
into question marks. I *guarantee* you that it confuses users. And not
because they are dumb, but because the code violates the Law of Least
Surprise.

> I am certainly open to loosening the sanitizing for CR to make things
> work seamlessly on Windows. But I am having trouble imagining a case
> that is actually negatively impacted.

Git accepts all kinds of arguments, not just file names. It is totally
legitimate, and you probably can show use cases of that yourself, to
generate those arguments by other programs.

These programs can generate CR/LF line endings.

While well-intentioned, your changes could make things even unreadable.
Certainly confusing.

> > -- snipsnap --
> > Subject: [PATCH] fixup! vreport: sanitize ASCII control chars
> 
> Given the subtlety here, I'd much rather have a patch on top.

Fine.

> > The original patch is incorrect, as it turns carriage returns into
> > question marks.
> > 
> > However, carriage returns should be left alone when preceding a line feed,
> > and simply turned into line feeds otherwise.
> 
> The question of whether to leave CRLFs alone is addressed above. But I
> do not understand why you'd want a lone CR to be converted to a line
> feed. Running:
> 
>   git rev-parse --abbrev-ref "$(printf "foo\r")"
> 
> with my patch gets you:
> 
>   fatal: ambiguous argument 'foo?': unknown revision or path not in the 
> working tree.
> 
> But with yours:
> 
>   fatal: ambiguous argument 'foo
>   ': unknown revision or path not in the working tree.
> 
> Obviously the "?" thing is a lossy transformation,

s/lossy/confusing/

Probably not to you, because you came up with the transformation, but to
literally everybody else.

> but I do not see how a newline is any less confusing (but see below for
> some thoughts).

The more common bug report to be expected would show that multi-line
arguments are converted to ending in question marks. That is not the user
experience for which I am aiming.

> > To make the end result more readable, the logic is inverted so that the
> > common case (no substitution) is handled first.
> > 
> > While at it, let's lose the unnecessary curly braces.
> 
> Please don't. Obviously C treats the

Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Jeff King
On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:

> > >  An error message with an ASCII control character like '\r' in it
> > >  can alter the message to hide its early part, which is problematic
> > >  when a remote side gives such an error message that the local side
> > >  will relay with a "remote: " prefix.
> > >
> > >  Will merge to 'next'.
> > 
> > Please be not too hasty with advancing this topic to master. I could imagine
> > that there is some unwanted fallout on systems where the end-of-line marker
> > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > expect regressions in *my* typical workflow.
> 
> Thank you for being so attentive.
> 
> This topic branch would indeed have caused problems. Worse: it would have
> caused problems that are not covered by our test suite, as Git for
> Windows' own utilities do not generate CR/LF line endings. So this
> regression would have bit our users. Nasty.

Hmm.  I am not sure to what degree CRLFs are actually a problem here.
Keep in mind these are error messages generated via error(), and so not
processing arbitrary data. I can imagine that CRs might come from:

  1. A parameter like a filename or revision name. If I ask for a
 rev-parse of "foo\r\n", then it's probably useful to mention the
 "\r" in the error message, rather than ignoring it (or converting
 it to a line-feed).

 And I think that would apply to any input parameter we show via
 error(), etc, if it is connected to a newline (ideally we would
 show newlines as "?", too, but we cannot tell the difference
 between ones from parameters, and ones that are part of the error
 message).

  2. The printf-fmt strings themselves. But these come from C code,
 which just uses "\n". My impression is that it is fprintf() which
 is responsible for converting that to "\r\n". And we are doing our
 sanitizing here between an snprintf(), and an fprintf() of the
 result. So it should see only the raw "\n" fields.

I am certainly open to loosening the sanitizing for CR to make things
work seamlessly on Windows. But I am having trouble imagining a case
that is actually negatively impacted.

> -- snipsnap --
> Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

Given the subtlety here, I'd much rather have a patch on top.

> The original patch is incorrect, as it turns carriage returns into
> question marks.
> 
> However, carriage returns should be left alone when preceding a line feed,
> and simply turned into line feeds otherwise.

The question of whether to leave CRLFs alone is addressed above. But I
do not understand why you'd want a lone CR to be converted to a line
feed. Running:

  git rev-parse --abbrev-ref "$(printf "foo\r")"

with my patch gets you:

  fatal: ambiguous argument 'foo?': unknown revision or path not in the working 
tree.

But with yours:

  fatal: ambiguous argument 'foo
  ': unknown revision or path not in the working tree.

Obviously the "?" thing is a lossy transformation, but I do not see how
a newline is any less confusing (but see below for some thoughts).

> To make the end result more readable, the logic is inverted so that the
> common case (no substitution) is handled first.
> 
> While at it, let's lose the unnecessary curly braces.

Please don't. Obviously C treats the "if/else" as a single unit, but
IMHO it's less error-prone to include the braces any time there are
multiple visual lines. E.g., something like:

  while (foo)
if (bar)
one();
else
two();
three();

is much easier to spot as wrong when you would require braces either
way (and not relevant here, but I'd say that even an inner block with a
comment deserves braces for the same reason).

> We also add a regression test that verifies that carriage returns are
> handled properly. And as we expect CR/LF to be handled properly also on
> platforms other than Windows, this test case is not guarded by the MINGW
> prerequisite.

I am not sure what "properly" means here. In your test:

> +# This test verifies that the error reporting functions handle CR correctly.
> +# --abbrev-ref is simply used out of convenience, as it reports an error 
> including
> +# a user-specified argument.
> +test_expect_success 'error messages treat CR/LF correctly' '
> + test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 
> 2>err &&
> + grep "^fatal: .*CR/LF$" err
> +'

The "\n" is eaten by the shell, and git sees only "CR/LF\r". So we are
not testing the CRLF case in vreportf() at all.

We do end up with "CR/LF\n" in vreportf(), which is presumably converted
by fprintf() to "CR/LF\r\n" on Windows. And so perhaps that is why you
are doing the "convert \r to \n" thing above.

But I still think it's not doing the right thing. Git _didn't_ see CRLF,
it saw CR.

-Peff


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Johannes Schindelin
Hi Hannes,

On Mon, 16 Jan 2017, Johannes Sixt wrote:

> Am 16.01.2017 um 02:51 schrieb Junio C Hamano:
> > * jk/vreport-sanitize (2017-01-11) 2 commits
> >  - vreport: sanitize ASCII control chars
> >  - Revert "vreportf: avoid intermediate buffer"
> >
> >  An error message with an ASCII control character like '\r' in it
> >  can alter the message to hide its early part, which is problematic
> >  when a remote side gives such an error message that the local side
> >  will relay with a "remote: " prefix.
> >
> >  Will merge to 'next'.
> 
> Please be not too hasty with advancing this topic to master. I could imagine
> that there is some unwanted fallout on systems where the end-of-line marker
> CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> expect regressions in *my* typical workflow.

Thank you for being so attentive.

This topic branch would indeed have caused problems. Worse: it would have
caused problems that are not covered by our test suite, as Git for
Windows' own utilities do not generate CR/LF line endings. So this
regression would have bit our users. Nasty.

Something like this is necessary, at least:

-- snipsnap --
Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

The original patch is incorrect, as it turns carriage returns into
question marks.

However, carriage returns should be left alone when preceding a line feed,
and simply turned into line feeds otherwise.

To make the end result more readable, the logic is inverted so that the
common case (no substitution) is handled first.

While at it, let's lose the unnecessary curly braces.

We also add a regression test that verifies that carriage returns are
handled properly. And as we expect CR/LF to be handled properly also on
platforms other than Windows, this test case is not guarded by the MINGW
prerequisite.

Spotted by Hannes Sixt.

Signed-off-by: Johannes Schindelin 
---
 t/t-basic.sh | 8 
 usage.c  | 9 ++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 60811a3a7c..d494cb7c05 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -1063,4 +1063,12 @@ test_expect_success 'very long name in the index handled 
sanely' '
test $len = 4098
 '
 
+# This test verifies that the error reporting functions handle CR correctly.
+# --abbrev-ref is simply used out of convenience, as it reports an error 
including
+# a user-specified argument.
+test_expect_success 'error messages treat CR/LF correctly' '
+   test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 
2>err &&
+   grep "^fatal: .*CR/LF$" err
+'
+
 test_done
diff --git a/usage.c b/usage.c
index 50a6ccee44..71bc7c0329 100644
--- a/usage.c
+++ b/usage.c
@@ -15,10 +15,13 @@ void vreportf(const char *prefix, const char *err, va_list 
params)
char *p;
 
vsnprintf(msg, sizeof(msg), err, params);
-   for (p = msg; *p; p++) {
-   if (iscntrl(*p) && *p != '\t' && *p != '\n')
+   for (p = msg; *p; p++)
+   if (!iscntrl(*p) || *p == '\t' || *p == '\n')
+   continue;
+   else if (*p != '\r')
*p = '?';
-   }
+   else if (p[1] != '\n')
+   *p = '\n';
fprintf(fh, "%s%s\n", prefix, msg);
 }
 
-- 
2.11.0.windows.3



Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Johannes Schindelin
Hi Junio,

On Sun, 15 Jan 2017, Junio C Hamano wrote:

> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

A suggestion: since it is very, very tedious to find the latest
iteration's thread, as well as the corresponding commit in `pu` (or
whatever other branch you use), and since you auto-generate these lists
anyway, it would make things less cumbersome if the commit names of the
tips, as well as the Message-IDs of the cover-letter (or first patch) were
displayed with the topic branches.

It still would not address e.g. the problem that the original authors are
not notified about the current state of their submission by these What's
Cooking mails.

But it would improve the situation.

> * js/difftool-builtin (2017-01-09) 5 commits
>  - t7800: run both builtin and scripted difftool, for now
>  - difftool: implement the functionality in the builtin
>  - difftool: add a skeleton for the upcoming builtin
>  - git_exec_path: do not return the result of getenv()

This patch was not in my patch submission. Sneaking it into this topic
branch is not incorrect.

It does not matter, though, as I will drop the Coverity patches from this
patch series: the conflation of Coverity fixes with the builtin difftool
was a mistake, and I will thereby address that mistake.

Ciao,
Johannes


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-16 Thread Johannes Schindelin
Hi Junio,

On Sun, 15 Jan 2017, Junio C Hamano wrote:

> * js/prepare-sequencer-more (2017-01-09) 38 commits

I think that it adds confusion rather than value to specifically use a
different branch name than I indicated in my submission, unless there is a
really good reason to do so (which I fail to see here).

>  - sequencer (rebase -i): write out the final message
>  - sequencer (rebase -i): write the progress into files
>  - sequencer (rebase -i): show the progress
>  - sequencer (rebase -i): suggest --edit-todo upon unknown command
>  - sequencer (rebase -i): show only failed cherry-picks' output
>  - sequencer (rebase -i): show only failed `git commit`'s output
>  - sequencer: use run_command() directly
>  - sequencer: make reading author-script more elegant
>  - sequencer (rebase -i): differentiate between comments and 'noop'
>  - sequencer (rebase -i): implement the 'drop' command
>  - sequencer (rebase -i): allow rescheduling commands
>  - sequencer (rebase -i): respect strategy/strategy_opts settings
>  - sequencer (rebase -i): respect the rebase.autostash setting
>  - sequencer (rebase -i): run the post-rewrite hook, if needed
>  - sequencer (rebase -i): record interrupted commits in rewritten, too
>  - sequencer (rebase -i): copy commit notes at end
>  - sequencer (rebase -i): set the reflog message consistently
>  - sequencer (rebase -i): refactor setting the reflog message
>  - sequencer (rebase -i): allow fast-forwarding for edit/reword
>  - sequencer (rebase -i): implement the 'reword' command
>  - sequencer (rebase -i): leave a patch upon error
>  - sequencer (rebase -i): update refs after a successful rebase
>  - sequencer (rebase -i): the todo can be empty when continuing
>  - sequencer (rebase -i): skip some revert/cherry-pick specific code path
>  - sequencer (rebase -i): remove CHERRY_PICK_HEAD when no longer needed
>  - sequencer (rebase -i): allow continuing with staged changes
>  - sequencer (rebase -i): write an author-script file
>  - sequencer (rebase -i): implement the short commands
>  - sequencer (rebase -i): add support for the 'fixup' and 'squash' commands
>  - sequencer (rebase -i): write the 'done' file
>  - sequencer (rebase -i): learn about the 'verbose' mode
>  - sequencer (rebase -i): implement the 'exec' command
>  - sequencer (rebase -i): implement the 'edit' command
>  - sequencer (rebase -i): implement the 'noop' command
>  - sequencer: support a new action: 'interactive rebase'
>  - sequencer: use a helper to find the commit message
>  - sequencer: move "else" keyword onto the same line as preceding brace
>  - sequencer: avoid unnecessary curly braces
> 
>  The sequencer has further been extended in preparation to act as a
>  back-end for "rebase -i".
> 
>  Waiting for review comments to be addressed.

The only outstanding review comments I know about are your objection to
the name of the read_env_script() function (which I shot down as bogus),
and the rather real bug fix I sent out as a fixup! which you may want to
squash in (in the alternative, I can mailbomb v4 of the entire sequencer-i
patch series, that is your choice).

Ciao,
Johannes


Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-15 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 16.01.2017 um 02:51 schrieb Junio C Hamano:
>> * jk/vreport-sanitize (2017-01-11) 2 commits
>>  - vreport: sanitize ASCII control chars
>>  - Revert "vreportf: avoid intermediate buffer"
>>
>>  An error message with an ASCII control character like '\r' in it
>>  can alter the message to hide its early part, which is problematic
>>  when a remote side gives such an error message that the local side
>>  will relay with a "remote: " prefix.
>>
>>  Will merge to 'next'.
>
> Please be not too hasty with advancing this topic to master. I could
> imagine that there is some unwanted fallout on systems where the
> end-of-line marker CRLF is common. Though, I haven't tested the topic
> myself, yet, nor do I expect regressions in *my* typical workflow.

Thanks; will wait for a further discussion on the topic's thread
then.



Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-15 Thread Johannes Sixt

Am 16.01.2017 um 02:51 schrieb Junio C Hamano:

* jk/vreport-sanitize (2017-01-11) 2 commits
 - vreport: sanitize ASCII control chars
 - Revert "vreportf: avoid intermediate buffer"

 An error message with an ASCII control character like '\r' in it
 can alter the message to hide its early part, which is problematic
 when a remote side gives such an error message that the local side
 will relay with a "remote: " prefix.

 Will merge to 'next'.


Please be not too hasty with advancing this topic to master. I could 
imagine that there is some unwanted fallout on systems where the 
end-of-line marker CRLF is common. Though, I haven't tested the topic 
myself, yet, nor do I expect regressions in *my* typical workflow.


-- Hannes



What's cooking in git.git (Jan 2017, #02; Sun, 15)

2017-01-15 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* bw/read-blob-data-does-not-modify-index-state (2017-01-11) 1 commit
 - index: improve constness for reading blob data

 Code clean-up.

 Will merge to 'next'.


* ep/commit-static-buf-cleanup (2017-01-13) 2 commits
 - builtin/commit.c: switch to xstrfmt(), instead of snprintf()
 - builtin/commit.c: remove the PATH_MAX limitation via dynamic allocation

 Code clean-up.

 Expecting a reroll.
 The tip one would instead be done with strbuf.
 cf. 


* jk/grep-e-could-be-extended-beyond-posix (2017-01-11) 1 commit
 - t7810: avoid assumption about invalid regex syntax

 Tighten a test to avoid mistaking an extended ERE regexp engine as
 a PRE regexp engine.

 Will merge to 'next'.


* jk/vreport-sanitize (2017-01-11) 2 commits
 - vreport: sanitize ASCII control chars
 - Revert "vreportf: avoid intermediate buffer"

 An error message with an ASCII control character like '\r' in it
 can alter the message to hide its early part, which is problematic
 when a remote side gives such an error message that the local side
 will relay with a "remote: " prefix.

 Will merge to 'next'.


* sb/unpack-trees-super-prefix (2017-01-12) 5 commits
 - SQUASH
 - unpack-trees: support super-prefix option
 - t1001: modernize style
 - t1000: modernize style
 - read-tree: use OPT_BOOL instead of OPT_SET_INT

 "git read-tree" and its underlying unpack_trees() machinery learned
 to report problematic paths prefixed with the --super-prefix option.

 Expecting a reroll.
 The first three are in good shape.  The last one needs a better
 explanation and possibly an update to its test.
 cf. 


* bw/grep-recurse-submodules (2016-12-22) 12 commits
  (merged to 'next' on 2016-12-22 at 1ede815b8d)
 + grep: search history of moved submodules
 + grep: enable recurse-submodules to work on  objects
 + grep: optionally recurse into submodules
 + grep: add submodules as a grep source type
 + submodules: load gitmodules file from commit sha1
 + submodules: add helper to determine if a submodule is initialized
 + submodules: add helper to determine if a submodule is populated
  (merged to 'next' on 2016-12-22 at fea8fa870f)
 + real_path: canonicalize directory separators in root parts
 + real_path: have callers use real_pathdup and strbuf_realpath
 + real_path: create real_pathdup
 + real_path: convert real_path_internal to strbuf_realpath
 + real_path: resolve symlinks by hand
 (this branch is tangled with bw/realpath-wo-chdir.)

 "git grep" has been taught to optionally recurse into submodules.

 Will merge to 'master'.


* sb/submodule-config-tests (2017-01-12) 2 commits
 - t7411: test lookup of uninitialized submodules
 - t7411: quote URLs

 Will merge to 'next'.


* sb/submodule-doc (2017-01-12) 3 commits
 - submodules: add a background story
 - submodule update documentation: don't repeat ourselves
 - submodule documentation: add options to the subcommand

 Needs review.


* sb/submodule-embed-gitdir (2017-01-12) 1 commit
 - submodule absorbgitdirs: mention in docstring help

 Help-text fix.

 Will merge to 'next'.


* sb/submodule-init (2017-01-12) 1 commit
 - submodule update --init: display correct path from submodule

 Error message fix.

 Will merge to 'next'.


* vn/diff-ihc-config (2017-01-12) 1 commit
 - diff: add interhunk context config option

 "git diff" learned diff.interHunkContext configuration variable
 that gives the default value for its --inter-hunk-context option.

 Will merge to 'next'.


* ad/bisect-terms (2017-01-13) 1 commit
 - Documentation/bisect: improve on (bad|new) and (good|bad)

 Documentation fix.

 Will merge to 'next'.


* bw/attr (2017-01-15) 27 commits
 - attr: reformat git_attr_set_direction() function
 - attr: push the bare repo check into read_attr()
 - attr: store attribute stacks in hashmap
 - attr: tighten const correctness with git_attr and match_attr
 - attr: remove maybe-real, maybe-macro from git_attr
 - attr: eliminate global check_all_attr array
 - attr: use hashmap for attribute dictionary
 - attr: change validity check for attribute names to use positive logic
 - attr: pass struct attr_check to collect_some_attrs
 - attr: retire git_check_attrs() API
 - attr: convert git_check_attrs() callers to use the new API
 - attr: convert git_all_attrs() to use "struct attr_check"
 - attr: (re)introduce git_check_attr() and struct attr_check
 - attr: rename function and struct related to checking attributes
 - attr.c: outline the future plans by heavily commenting
 - Documentation/gitattributes.txt: fix a typo
 - attr.c: add