Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-11 Thread Joachim Durchholz

Am 11.04.2017 um 01:23 schrieb Ævar Arnfjörð Bjarmason:
> * Much of it fails due to GIT_CEILING_DIRECTORIES not working with
> dirs with ":" in the name.

Oh. That might hit me: I'm using URLs for parent directory names in a 
cache directory.


urlencode may or may not work:

> t0021-conversion.sh = 37(%), 58(:)

Even if it works, I'll have a cache directory with pretty unreadable 
names. I had hoped I could avoid this kind of ugliness.


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Jeff King
On Tue, Apr 11, 2017 at 01:23:32AM +0200, Ævar Arnfjörð Bjarmason wrote:

> * Most of the tests fail because git clone can't deal with cloning a
> repo with a \r in the path. The error we produce when we try is quite
> bad and doesn't indicate what went wrong:
> 
> $ rm -rf /tmp/git.*; mkdir /tmp/git.$(perl -e 'print chr 13') && cd
> /tmp/git.* && git init --bare b && file b && git clone b c
> /b/tialized empty Git repository in /tmp/git.
> b: directory
> Cloning into 'c'...
> fatal: '/tmp/git. /b' does not appear to be a git repository
> fatal: Could not read from remote repository.
> 
> Please make sure you have the correct access rights
> and the repository exists.
> 
> So file(1) shows it exists, a strace shows that git knows it exists at
> some point, but something gets lost along the way.

It's the round-trip through the config. It writes:

  [remote "origin"]
  url = /tmp/git.^M/b

into the config (where ^M is a literal CR). And then on reading it back,
unquoted whitespace in config values is converted to spaces, which is
why you see "git. /b" in the error message (the other ones are ugly
because it's writing the raw CR in via printf, and your terminal
respects it. Try "2>&1 | cat -A" when debugging this sort of thing (our
error and warning messages clean up cruft like this already, but
informational messages don't always go through vreportf()).

Anyway, something like this would fix it (it actually adds "\r" to the
config format, which is how we handle \n and \t; we could do it without
touching the parsing side if we taught store_write_pair() to recognize
that \r needs to go in double-quotes).

diff --git a/config.c b/config.c
index 1a4d85537..4a36a37ba 100644
--- a/config.c
+++ b/config.c
@@ -526,6 +526,9 @@ static char *parse_value(void)
case 'n':
c = '\n';
break;
+   case 'r':
+   c = '\r';
+   break;
/* Some characters escape as themselves */
case '\\': case '"':
break;
@@ -2172,6 +2175,9 @@ static int store_write_pair(int fd, const char *key, 
const char *value)
case '\t':
strbuf_addstr(, "\\t");
break;
+   case '\r':
+   strbuf_addstr(, "\\r");
+   break;
case '"':
case '\\':
strbuf_addch(, '\\');

-Peff


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Ævar Arnfjörð Bjarmason
On Sun, Apr 9, 2017 at 9:11 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Change the test library to insert non-alphanumeric ASCII characters
> into the TRASH_DIRECTORY name, that's the directory the test library
> creates, chdirs to and runs each individual test from.

I did a bit more work on getting more granularity on why things were
failing. I wrote a script[1] that runs each of the failing tests once
with each individual character in the non-alphanumeric ASCII range to
narrow down issues.

I skipped the svn tests, and something went wrong in the middle of my
run to make the http tests fail on everything, but the gist of it is:

* Most of the tests fail because git clone can't deal with cloning a
repo with a \r in the path. The error we produce when we try is quite
bad and doesn't indicate what went wrong:

$ rm -rf /tmp/git.*; mkdir /tmp/git.$(perl -e 'print chr 13') && cd
/tmp/git.* && git init --bare b && file b && git clone b c
/b/tialized empty Git repository in /tmp/git.
b: directory
Cloning into 'c'...
fatal: '/tmp/git. /b' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

So file(1) shows it exists, a strace shows that git knows it exists at
some point, but something gets lost along the way.

* Much of it fails due to GIT_CEILING_DIRECTORIES not working with
dirs with ":" in the name.

* That leaves the rest at:

$ grep FAIL results | grep -v -e svn -e svk  | awk '{print $1 " "
$2}'|perl -nE 'chomp;my ($f, $c) = split / /, $_; push @{$f{$f}} =>
$c; END { for my $k (sort keys %f) { say "$k = " . join ", ", map { $_
< 32 ? $_ : "$_(" . chr($_) . ")" } @{$f{$k}} } }'|grep -E -v
'^.{150,}$'|grep -E -v -e '= 13$' -e '= 58\(:\)$' -e '= 13, 58\(:\)$'
t0021-conversion.sh = 37(%), 58(:)
t0060-path-utils.sh = 39(')
t0302-credential-store.sh = 9, 32( ), 34("), 40((), 39('), 38(&),
41()), 59(;), 60(<), 62(>), 96(`), 124(|)
t1305-config-include.sh = 9, 13, 34("), 35(#), 59(;), 92(\), 91([)
t1500-rev-parse.sh = 9, 13
t1510-repo-setup.sh = 13, 92(\)
t3900-i18n-commit.sh = 9, 32( ), 40((), 41()), 38(&), 59(;), 60(<),
62(>), 124(|)
t4030-diff-textconv.sh = 13, 34("), 96(`)
t4031-diff-rewrite-binary.sh = 13, 34("), 96(`)
t5150-request-pull.sh = 13, 92(\)
t5310-pack-bitmaps.sh = 9, 13
t5407-post-rewrite-hook.sh = 34("), 96(`)
t5505-remote.sh = 13, 35(#)
t5516-fetch-push.sh = 13, 39(')
t5601-clone.sh = 13, 34("), 39('), 96(`)
t7003-filter-branch.sh = 34("), 96(`)
t7008-grep-binary.sh = 13, 34("), 96(`)
t7402-submodule-rebase.sh = 13, 34("), 96(`)
t7405-submodule-merge.sh = 34("), 92(\)
t7406-submodule-update.sh = 13, 9, 34("), 35(#), 38(&), 59(;), 92(\)
t7407-submodule-foreach.sh = 13, 9, 42(*)
t7504-commit-msg-hook.sh = 34("), 96(`)
t7700-repack.sh = 9, 42(*)
t9001-send-email.sh = 13, 34("), 96(`)
t9200-git-cvsexportcommit.sh = 42(*), 63(?)
t9300-fast-import.sh = 34("), 92(\)
t9400-git-cvsserver-server.sh = 13, 59(;)
t9401-git-cvsserver-crlf.sh = 13, 59(;)
t9402-git-cvsserver-refs.sh = 13, 59(;)
t9600-cvsimport.sh = 42(*), 58(:), 63(?)
t9601-cvsimport-vendor-branch.sh = 42(*), 58(:), 63(?)
t9602-cvsimport-branches-tags.sh = 42(*), 63(?), 58(:)
t9604-cvsimport-timestamps.sh = 42(*), 58(:), 63(?)
t9700-perl-git.sh = 13, 92(\)

I've fixed a few in https://github.com/avar/git/commit/55395613fe

There's one segfault in there:

$ ./t5601-clone.sh --root="xtmp.$(perl -e 'print chr 39')" -v -i -d
[...]
Cloning into 'ssh-bracket-clone-plink-4'...
Segmentation fault
not ok 45 - single quoted plink.exe in GIT_SSH_COMMAND

1. https://github.com/avar/git/commit/edc439c462


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Jeff King
On Mon, Apr 10, 2017 at 08:19:40PM +0200, Joachim Durchholz wrote:

> > I very much disagree with that. Git's test operate under a set of
> > assumptions, and if you violate those assumptions, then the failures are
> > not meaningful.
> 
> In that case the tests do not validate that git can properly work with
> special characters.
> That's a pretty big coverage gap.

That's not necessarily true either; there may be specific tests that
create exotic paths and check them. My point is that the outcome depends
on that paths. So you cannot just take a test which runs "git clone -s"
and expect it to work both with and without paths with newlines. You
need two tests, because there are two different outcomes, depending on
the test environment.

So if you're proposing to write a bunch of new tests that check the
proper behavior under various conditions, go for it. But I don't think
running the entire existing test suite with exotic paths tells you much.
A failure might be a bug, or it might be that the thing is untestable
given the environment.

> > Sure, and I'd encourage people who are interested to dig through the
> > results and see if they can find a real problem. I looked at several and
> > didn't find anything that wasn't an example of the "test assumptions"
> > thing above.
> 
> Don't assume that there's no risk just because you didn't find anything.

I'm not assuming that at all. Didn't I say somebody would need to dig
into all of these to find out the real answer? I'm only arguing that
blindly adding this feature to the test suite has no value. It's the
digging that has value, and you do not even need to modify the test
suite to do it (you can just use --root).

I've been trying to invite you to do that digging, if it's something you
care about.

-Peff


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Joachim Durchholz

Am 10.04.2017 um 18:57 schrieb Jeff King:

If there are security bugs where a malicious input can cause us
to do something bad, that's something to care about. But that's very
different than asking "do these tests run to completion with a funny
input".


If the tests do not complete, git is doing something unexpected.


I very much disagree with that. Git's test operate under a set of
assumptions, and if you violate those assumptions, then the failures are
not meaningful.


In that case the tests do not validate that git can properly work with 
special characters.

That's a pretty big coverage gap.


Take alternates, for instance. The on-disk storage format cannot
represent paths with newlines in them. If a test does:

  git clone -s "$(pwd)" parent.git child &&
  test -d child

then that test is going to fail if the test directory has a newline in
it. But that doesn't tell us anything meaningful. Maybe there is a bug
and maybe there isn't, but we cannot know because the thing being tested
cannot possibly work under the test environment given.


Sure. Not all tests are meaningful in all environments.
That doesn't mean that the tests are generally meaningless.

Also, we're talking about two pretty different things here: gits 
interaction with the file system, and git's interaction with whatever 
shell its scripts are using.
In an ideal world, these two aspects would be orthogonal and could be 
tested independently of each other. Since in practice we do have 
correlations and dependencies, this isn't always possible, but it's what 
we should aim for.



You can rewrite all the tests to say "skip this test if there's a
newline in the test directory". But to what end? It's work to write and
to maintain, and we haven't gained new information.


Not on that "alternates" thing (whatever that is), but we have a test 
that will work and provide information on systems that do allow newlines.



That in itself is not a security hole, but there's a pretty good chance that
at least one of these ~230 unexpected things can be turned into one, given
enough time and motivation. The risk multiplies as this is shell scripting,
where the path from "string is misinterpreted" to "string is run as a
command" is considerably shorter than in other languages.


Sure, and I'd encourage people who are interested to dig through the
results and see if they can find a real problem. I looked at several and
didn't find anything that wasn't an example of the "test assumptions"
thing above.


Don't assume that there's no risk just because you didn't find anything.

Also, git might not be the actual hole, but other software that relies 
on git not doing anything awkward might fail that assumption and expose 
the actual breach of security.
You could argue that it's not a problem in git, but it is. Unless and 
until the git docs clearly state that things may break if "funny 
characters" are being used, and where.



But again, I'm happy to be proven wrong.


With security, you need to be confident about the absence of /any/ type 
of hole, not the absence of a /specific/ type of hole such as a shell 
injection.
So the way forward isn't proving you wrong by providing a specific 
exploit, it's making sure that no exploit with URLs and file names can 
possibly exist.


Now if I'm reading things like "heuristics" and "git uses URL-specific 
code even after it has determined it's not a URL"... well, that's the 
exact opposite of reassuring messages.


> I just don't think

plastering control characters into the test directory names all the time
is a good way of finding those problems (and doesn't balance out the
cost).


Fair enough.


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Jeff King
On Mon, Apr 10, 2017 at 04:59:57PM +0200, Joachim Durchholz wrote:

> Am 10.04.2017 um 15:38 schrieb Jeff King:
> > Are those bugs? Maybe. Certainly they are limitations. But are they ones
> > anybody _cares_ about?  I think this may fall under "if it hurts, don't
> > do it".
> 
> It's not always possible to avoid that.

Sort of. I don't find anything wrong with saying "your local filesystem
path for a repository cannot contain newlines; if it does, some features
may be unavailable".

> URLs, for example, may contain "funny characters", including multi-byte
> characters of which the second byte is 0x0a. If they are guaranteed to
> always be URL-encoded this isn't a problem, but then we still need to make
> sure that URL-encoding does happen.

Sure, but URLs have a way of encoding. And if we're not encoding when we
should, then that's a bug. But the arguments that are fed to things like
git-clone _aren't_ URLs. They're a specifier that uses some heuristics
to decide between the various cases (URLs, host:path specifiers, local
paths). If you feed syntactic garbage, aborting the operation (and
failing the test!) may be the right thing for git to do.

> > If there are security bugs where a malicious input can cause us
> > to do something bad, that's something to care about. But that's very
> > different than asking "do these tests run to completion with a funny
> > input".
> 
> If the tests do not complete, git is doing something unexpected.

I very much disagree with that. Git's test operate under a set of
assumptions, and if you violate those assumptions, then the failures are
not meaningful.

Take alternates, for instance. The on-disk storage format cannot
represent paths with newlines in them. If a test does:

  git clone -s "$(pwd)" parent.git child &&
  test -d child

then that test is going to fail if the test directory has a newline in
it. But that doesn't tell us anything meaningful. Maybe there is a bug
and maybe there isn't, but we cannot know because the thing being tested
cannot possibly work under the test environment given.

You can rewrite all the tests to say "skip this test if there's a
newline in the test directory". But to what end? It's work to write and
to maintain, and we haven't gained new information.

> That in itself is not a security hole, but there's a pretty good chance that
> at least one of these ~230 unexpected things can be turned into one, given
> enough time and motivation. The risk multiplies as this is shell scripting,
> where the path from "string is misinterpreted" to "string is run as a
> command" is considerably shorter than in other languages.

Sure, and I'd encourage people who are interested to dig through the
results and see if they can find a real problem. I looked at several and
didn't find anything that wasn't an example of the "test assumptions"
thing above.

I'll actually be surprised if there are shell injection problems in Git,
because our scripts are usually pretty meticulous about quoting
variables and not doing crazy things with eval. I think the issues are
much more likely to be like the "submodule--helper --list" thing I
pointed out, where we get phantom records in lists.

But again, I'm happy to be proven wrong. If there's a shell injection in
Git it's clearly a bug and should be fixed. I just don't think
plastering control characters into the test directory names all the time
is a good way of finding those problems (and doesn't balance out the
cost). Fuzzing the directory names and digging into specific cases _is_
a reasonable way to do it.

-Peff


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Joachim Durchholz

Am 10.04.2017 um 15:38 schrieb Jeff King:

Are those bugs? Maybe. Certainly they are limitations. But are they ones
anybody _cares_ about?  I think this may fall under "if it hurts, don't
do it".


It's not always possible to avoid that.

URLs, for example, may contain "funny characters", including multi-byte 
characters of which the second byte is 0x0a. If they are guaranteed to 
always be URL-encoded this isn't a problem, but then we still need to 
make sure that URL-encoding does happen.


Next source of funny characters that comes to my mind is submodules. 
They derive their name from the URL by default, and the subdirectory 
name as well. Again, consider the multibyte name where the second 
character is 0x0a. Or 0x80: À (uppercase A with accent grave) happens to 
have that byte in UTF-8 encoding, Ẁ is U+1E80 which would be encoded as 
0x80 0x1e on an NTFS filesystem (barring additional coding steps in APIs 
or webservices, which further complicate the situation but don't usually 
eliminate the problem, they just shift it around).


> If there are security bugs where a malicious input can cause us

to do something bad, that's something to care about. But that's very
different than asking "do these tests run to completion with a funny
input".


If the tests do not complete, git is doing something unexpected.
That in itself is not a security hole, but there's a pretty good chance 
that at least one of these ~230 unexpected things can be turned into 
one, given enough time and motivation. The risk multiplies as this is 
shell scripting, where the path from "string is misinterpreted" to 
"string is run as a command" is considerably shorter than in other 
languages.


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread SZEDER Gábor
On Mon, Apr 10, 2017 at 1:40 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Mon, Apr 10, 2017 at 1:19 PM, SZEDER Gábor  wrote:

>> A few other failures are triggered by the ':' in the trash directory's
>> name, breaking the following commonly used pattern:
>>
>>   export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" &&
>>   cd subdir &&
>>   test-git-pretending-it's-run-outside-of-a-repository
>
> Does GIT_CEILING_DIRECTORIES support escaping somehow? E.g.
> "foo\:bar". If so maybe we could use a wrapper to set it, if not
> that's a bug in the ceiling dir feature, surely.

It doesn't, these $PATH-style variables in git, in the shell or
elsewhere tend not to.


>>> b) I think any sort of magic like using it with 'make test', but not
>>> when the *.sh is manually run, will just lead to frustrating seemingly
>>> heisenbugs from people trying to debug the test suite when things do
>>> fail, i.e. you run 'make test' on some obscure platform we haven't
>>> fixed path bugs on, 10 fail, you manually inspect them and every one
>>> of them succeeds, because some --use-garbage-dirs option wasn't
>>> passed.
>>
>> That's not really an issue.  When a test fails during 'make test' with
>> garbage in trash dir names, the dev comes and attempts to cd into the
>> trash dir, and will be instantly reminded that non-printable
>> characters might play a role in the failure when he can't do so with
>> ordinary means.
>
> When a test fails for me I cd to t/ and re-run the test *.sh manually.
> I don't go straight to inspecting the existing trash.
>
> If those manual invocations were running in some different mode &
> succeeded that would be very confusing.

On the contrary, that's a clue to where you might want to look.
Besides, this is already the case when someone sets some options in
GIT_TEST_OPTS in his config.mak.


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Jeff King
On Mon, Apr 10, 2017 at 01:40:13PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > A few other failures are triggered by the ':' in the trash directory's
> > name, breaking the following commonly used pattern:
> >
> >   export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" &&
> >   cd subdir &&
> >   test-git-pretending-it's-run-outside-of-a-repository
> 
> Does GIT_CEILING_DIRECTORIES support escaping somehow? E.g.
> "foo\:bar". If so maybe we could use a wrapper to set it, if not
> that's a bug in the ceiling dir feature, surely.

I don't think it does. But nor does $PATH (which is almost certainly
another source of breakage). And neither does info/alternates allow
pathnames with newline in them (that is the likely cause of a large
number of failures, as it hits anywhere we use "clone -s").

Are those bugs? Maybe. Certainly they are limitations. But are they ones
anybody _cares_ about?  I think this may fall under "if it hurts, don't
do it". If there are security bugs where a malicious input can cause us
to do something bad, that's something to care about. But that's very
different than asking "do these tests run to completion with a funny
input".

  As an aside, I'd also question whether your patch might actually
  _hide_ bugs. It's applying a blanket change to the on-disk state that
  is obviously breaking some features. How many of those breakages are
  things that show up as a test failure, and how many of them quietly
  cause a test to do something else entirely, like failing its setup in
  a way that makes the rest of the test a noop?

So given the pain this will cause to people actually looking at tests,
and given that it's not clear to me if it has or will find any actual
bugs, it seems premature to flip it on by default. If somebody wants to
actually dig into these cases and look for actual bugs, I'm all for it.

But flipping the default and marking a bunch of tests blindly as "well,
this fails" hasn't made the world a better place. It's made it slightly
worse.

-Peff

PS If you want to test various characters, I think the simplest way is
   just:

 make GIT_TEST_OPTS=--root=/tmp/look-a-col:on test


Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 10, 2017 at 1:19 PM, SZEDER Gábor  wrote:
> On Mon, Apr 10, 2017 at 10:02 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> On Mon, Apr 10, 2017 at 3:47 AM, SZEDER Gábor  wrote:
 Change the test library to insert non-alphanumeric ASCII characters
 into the TRASH_DIRECTORY name, that's the directory the test library
 creates, chdirs to and runs each individual test from.

 Unless test_fails_on_unusual_directory_names=1 is declared before
 importing test-lib.sh (and if perl isn't available on the system), the
 trash directory will contain every non-alphanumeric character in
 ASCII, in order.
>>>
>>> At the very least there must be an easier way to disable this, e.g. a
>>> command line option.
>>>
>>> This change is sure effective in smoking out bugs, but it's a major
>>> annoyance during development when it comes to debugging a test.  At
>>> first I could not even cd into the trash directory, because TAB
>>> completing the directory name with all those non-printable characters
>>> didn't work (this may be a bug in the bash-completion package).  And
>>> simply copy-pasting the dirname didn't work either, because 'ls'
>>>
>>>   trash directory.t9902-completion.??? 
>>> !"#$%&'()*+,-:;<=>?@[\]^_`{|}~?
>
> Btw, it seems most of the failures in t9902-completion are triggered
> by remote URL parsing.  The trash directory's new name contains '[',
> ']' and even "@[", all of which are treated special by
> connect.c:host_end(), a helper function of parse_connect_url(),
> basically breaking anything trying to e.g.:
>
>   git fetch "$(pwd)/other"

I'm going to work on this patch so that I can report on tests by type
of character that triggers a failure.

> What puzzles me most is that parse_connect_url() recognizes right at
> its beginning that a remote URL like this is not actually an URL, so
> why does it continue parsing it as if it were one?
>
> A few other failures are triggered by the ':' in the trash directory's
> name, breaking the following commonly used pattern:
>
>   export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" &&
>   cd subdir &&
>   test-git-pretending-it's-run-outside-of-a-repository

Does GIT_CEILING_DIRECTORIES support escaping somehow? E.g.
"foo\:bar". If so maybe we could use a wrapper to set it, if not
that's a bug in the ceiling dir feature, surely.

> I think ':' should therefore be excluded from the trash directory, too.

I think it's preferable to have some mode to use : in dirnames for
those tests that don't fail already, to protect them against future
regressions. Disabling the use of a tricky character like ":" invites
future bugs & regressions.

>>> After some headscratching, Sunday night may be my excuse, I figured
>>> out that 'cd tr*' works...  only to be greeted with the ugliest-ever
>>> three-line(!) shell prompt.
>>>
>>> Therefore I would say that this should not even be enabled by default
>>> in test-lib.sh, so at least running a test directly from the command
>>> line as ./t1234-foo.sh would considerately give us an easily
>>> accessible trash directory even without any command line options.  We
>>> could enable it for 'make test' by default via GIT_TEST_OPTS in
>>> t/Makefile, though.
>>
>> This definitely needs some tweaking as you and Joachim point out. E.g.
>> some capabilities check in the test suite to check if we can even
>> create these sorts of paths on the local filesystem.
>>
>> A couple of comments on the above though:
>>
>> a) If we have something that's a more strict mode that makes tests
>> fail due to buggy code in various scenarios, we gain the most from
>> having it on by default
>
> I know, and I basically agree...
>
>> and having some optional mode to have devs
>> e.g. disable it for manual inspection of the test directories.
>
> ... but this is just too gross to live as default outside of a CI
> environment.
>
>> Most of the running of the test suite that really matters, i.e. just
>> before the software is delivered to end users, is going to be running
>> in some non-interactive build system preparing a package.
>>
>> b) I think any sort of magic like using it with 'make test', but not
>> when the *.sh is manually run, will just lead to frustrating seemingly
>> heisenbugs from people trying to debug the test suite when things do
>> fail, i.e. you run 'make test' on some obscure platform we haven't
>> fixed path bugs on, 10 fail, you manually inspect them and every one
>> of them succeeds, because some --use-garbage-dirs option wasn't
>> passed.
>
> That's not really an issue.  When a test fails during 'make test' with
> garbage in trash dir names, the dev comes and attempts to cd into the
> trash dir, and will be instantly reminded that non-printable
> characters might play a role in the failure when he can't do so with
> ordinary means.

When a test fails for me I cd to t/ and re-run the test *.sh manually.
I don't go straight to 

Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread SZEDER Gábor
On Mon, Apr 10, 2017 at 10:02 AM, Ævar Arnfjörð Bjarmason
 wrote:
> On Mon, Apr 10, 2017 at 3:47 AM, SZEDER Gábor  wrote:
>>> Change the test library to insert non-alphanumeric ASCII characters
>>> into the TRASH_DIRECTORY name, that's the directory the test library
>>> creates, chdirs to and runs each individual test from.
>>>
>>> Unless test_fails_on_unusual_directory_names=1 is declared before
>>> importing test-lib.sh (and if perl isn't available on the system), the
>>> trash directory will contain every non-alphanumeric character in
>>> ASCII, in order.
>>
>> At the very least there must be an easier way to disable this, e.g. a
>> command line option.
>>
>> This change is sure effective in smoking out bugs, but it's a major
>> annoyance during development when it comes to debugging a test.  At
>> first I could not even cd into the trash directory, because TAB
>> completing the directory name with all those non-printable characters
>> didn't work (this may be a bug in the bash-completion package).  And
>> simply copy-pasting the dirname didn't work either, because 'ls'
>>
>>   trash directory.t9902-completion.??? 
>> !"#$%&'()*+,-:;<=>?@[\]^_`{|}~?

Btw, it seems most of the failures in t9902-completion are triggered
by remote URL parsing.  The trash directory's new name contains '[',
']' and even "@[", all of which are treated special by
connect.c:host_end(), a helper function of parse_connect_url(),
basically breaking anything trying to e.g.:

  git fetch "$(pwd)/other"

What puzzles me most is that parse_connect_url() recognizes right at
its beginning that a remote URL like this is not actually an URL, so
why does it continue parsing it as if it were one?

A few other failures are triggered by the ':' in the trash directory's
name, breaking the following commonly used pattern:

  export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" &&
  cd subdir &&
  test-git-pretending-it's-run-outside-of-a-repository

I think ':' should therefore be excluded from the trash directory, too.

>> After some headscratching, Sunday night may be my excuse, I figured
>> out that 'cd tr*' works...  only to be greeted with the ugliest-ever
>> three-line(!) shell prompt.
>>
>> Therefore I would say that this should not even be enabled by default
>> in test-lib.sh, so at least running a test directly from the command
>> line as ./t1234-foo.sh would considerately give us an easily
>> accessible trash directory even without any command line options.  We
>> could enable it for 'make test' by default via GIT_TEST_OPTS in
>> t/Makefile, though.
>
> This definitely needs some tweaking as you and Joachim point out. E.g.
> some capabilities check in the test suite to check if we can even
> create these sorts of paths on the local filesystem.
>
> A couple of comments on the above though:
>
> a) If we have something that's a more strict mode that makes tests
> fail due to buggy code in various scenarios, we gain the most from
> having it on by default

I know, and I basically agree...

> and having some optional mode to have devs
> e.g. disable it for manual inspection of the test directories.

... but this is just too gross to live as default outside of a CI
environment.

> Most of the running of the test suite that really matters, i.e. just
> before the software is delivered to end users, is going to be running
> in some non-interactive build system preparing a package.
>
> b) I think any sort of magic like using it with 'make test', but not
> when the *.sh is manually run, will just lead to frustrating seemingly
> heisenbugs from people trying to debug the test suite when things do
> fail, i.e. you run 'make test' on some obscure platform we haven't
> fixed path bugs on, 10 fail, you manually inspect them and every one
> of them succeeds, because some --use-garbage-dirs option wasn't
> passed.

That's not really an issue.  When a test fails during 'make test' with
garbage in trash dir names, the dev comes and attempts to cd into the
trash dir, and will be instantly reminded that non-printable
characters might play a role in the failure when he can't do so with
ordinary means.

>>> This includes all the control characters, !, [], {} etc. the "."
>>> character isn't included because it's already in the directory name,
>>> and nor is "/" for obvious reasons, although that would actually work,
>>> we'd just create a subdirectory, which would make the tests harder to
>>> inspect when they fail.i
>>
>> 1. Heh.  How an additional subdirectory would make the tests harder to
>>inspect is nothing compared to the effect of all the other
>>characters.
>>
>> 2. s/i$//
>>
>>> This change is inspired by the "submodule: prevent backslash expantion
>>> in submodule names" patch[1]. If we'd had backslashes in the
>>> TRASH_DIRECTORY all along that bug would have been fixed a long time
>>> ago. This will flag such issues by marking tests that currently fail
>>> with 

Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-10 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 10, 2017 at 3:47 AM, SZEDER Gábor  wrote:
>> Change the test library to insert non-alphanumeric ASCII characters
>> into the TRASH_DIRECTORY name, that's the directory the test library
>> creates, chdirs to and runs each individual test from.
>>
>> Unless test_fails_on_unusual_directory_names=1 is declared before
>> importing test-lib.sh (and if perl isn't available on the system), the
>> trash directory will contain every non-alphanumeric character in
>> ASCII, in order.
>
> At the very least there must be an easier way to disable this, e.g. a
> command line option.
>
> This change is sure effective in smoking out bugs, but it's a major
> annoyance during development when it comes to debugging a test.  At
> first I could not even cd into the trash directory, because TAB
> completing the directory name with all those non-printable characters
> didn't work (this may be a bug in the bash-completion package).  And
> simply copy-pasting the dirname didn't work either, because 'ls'
>
>   trash directory.t9902-completion.??? 
> !"#$%&'()*+,-:;<=>?@[\]^_`{|}~?
>
> After some headscratching, Sunday night may be my excuse, I figured
> out that 'cd tr*' works...  only to be greeted with the ugliest-ever
> three-line(!) shell prompt.
>
> Therefore I would say that this should not even be enabled by default
> in test-lib.sh, so at least running a test directly from the command
> line as ./t1234-foo.sh would considerately give us an easily
> accessible trash directory even without any command line options.  We
> could enable it for 'make test' by default via GIT_TEST_OPTS in
> t/Makefile, though.

This definitely needs some tweaking as you and Joachim point out. E.g.
some capabilities check in the test suite to check if we can even
create these sorts of paths on the local filesystem.

A couple of comments on the above though:

a) If we have something that's a more strict mode that makes tests
fail due to buggy code in various scenarios, we gain the most from
having it on by default, and having some optional mode to have devs
e.g. disable it for manual inspection of the test directories.

Most of the running of the test suite that really matters, i.e. just
before the software is delivered to end users, is going to be running
in some non-interactive build system preparing a package.

b) I think any sort of magic like using it with 'make test', but not
when the *.sh is manually run, will just lead to frustrating seemingly
heisenbugs from people trying to debug the test suite when things do
fail, i.e. you run 'make test' on some obscure platform we haven't
fixed path bugs on, 10 fail, you manually inspect them and every one
of them succeeds, because some --use-garbage-dirs option wasn't
passed.

>> This includes all the control characters, !, [], {} etc. the "."
>> character isn't included because it's already in the directory name,
>> and nor is "/" for obvious reasons, although that would actually work,
>> we'd just create a subdirectory, which would make the tests harder to
>> inspect when they fail.i
>
> 1. Heh.  How an additional subdirectory would make the tests harder to
>inspect is nothing compared to the effect of all the other
>characters.
>
> 2. s/i$//
>
>> This change is inspired by the "submodule: prevent backslash expantion
>> in submodule names" patch[1]. If we'd had backslashes in the
>> TRASH_DIRECTORY all along that bug would have been fixed a long time
>> ago. This will flag such issues by marking tests that currently fail
>> with "test_fails_on_unusual_directory_names=1", ensure that new tests
>> aren't added unless a discussion is had about why the code can't
>> handle unusual pathnames, and prevent future regressions.
>>
>> 1. <20170407172306.172673-1-bmw...@google.com>
>> ---
>>  t/README  | 12 
>>  t/test-lib.sh |  4 
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/t/README b/t/README
>> index ab386c3681..314dd40221 100644
>> --- a/t/README
>> +++ b/t/README
>> @@ -345,6 +345,18 @@ assignment to variable 'test_description', like this:
>>   This test registers the following structure in the cache
>>   and tries to run git-ls-files with option --frotz.'
>>
>> +By default the tests will be run from a directory with a highly
>> +unusual filename that includes control characters, a newline, various
>> +punctuation etc., this is done to smoke out any bugs related to path
>> +handling. If for whatever reason the tests can't deal with such
>> +unusual path names, set:
>> +
>> +test_fails_on_unusual_directory_names=1
>> +
>> +Before sourcing 'test-lib.sh' as described below. This option is
>> +mainly intended to grandfather in existing broken tests & code, and
>> +should usually not be used in new code, instead your tests or code
>> +probably need fixing.
>>
>>  Source 'test-lib.sh'
>>  
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index 13b5696822..089ff5ac7d 100644
>> --- 

Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

2017-04-09 Thread SZEDER Gábor
> Change the test library to insert non-alphanumeric ASCII characters
> into the TRASH_DIRECTORY name, that's the directory the test library
> creates, chdirs to and runs each individual test from.
> 
> Unless test_fails_on_unusual_directory_names=1 is declared before
> importing test-lib.sh (and if perl isn't available on the system), the
> trash directory will contain every non-alphanumeric character in
> ASCII, in order.

At the very least there must be an easier way to disable this, e.g. a
command line option.

This change is sure effective in smoking out bugs, but it's a major
annoyance during development when it comes to debugging a test.  At
first I could not even cd into the trash directory, because TAB
completing the directory name with all those non-printable characters
didn't work (this may be a bug in the bash-completion package).  And
simply copy-pasting the dirname didn't work either, because 'ls'
displayed it as

  trash directory.t9902-completion.??? 
!"#$%&'()*+,-:;<=>?@[\]^_`{|}~?

After some headscratching, Sunday night may be my excuse, I figured
out that 'cd tr*' works...  only to be greeted with the ugliest-ever
three-line(!) shell prompt.

Therefore I would say that this should not even be enabled by default
in test-lib.sh, so at least running a test directly from the command
line as ./t1234-foo.sh would considerately give us an easily
accessible trash directory even without any command line options.  We
could enable it for 'make test' by default via GIT_TEST_OPTS in
t/Makefile, though.


> This includes all the control characters, !, [], {} etc. the "."
> character isn't included because it's already in the directory name,
> and nor is "/" for obvious reasons, although that would actually work,
> we'd just create a subdirectory, which would make the tests harder to
> inspect when they fail.i

1. Heh.  How an additional subdirectory would make the tests harder to
   inspect is nothing compared to the effect of all the other
   characters.

2. s/i$//

> This change is inspired by the "submodule: prevent backslash expantion
> in submodule names" patch[1]. If we'd had backslashes in the
> TRASH_DIRECTORY all along that bug would have been fixed a long time
> ago. This will flag such issues by marking tests that currently fail
> with "test_fails_on_unusual_directory_names=1", ensure that new tests
> aren't added unless a discussion is had about why the code can't
> handle unusual pathnames, and prevent future regressions.
> 
> 1. <20170407172306.172673-1-bmw...@google.com>
> ---
>  t/README  | 12 
>  t/test-lib.sh |  4 
>  2 files changed, 16 insertions(+)
> 
> diff --git a/t/README b/t/README
> index ab386c3681..314dd40221 100644
> --- a/t/README
> +++ b/t/README
> @@ -345,6 +345,18 @@ assignment to variable 'test_description', like this:
>   This test registers the following structure in the cache
>   and tries to run git-ls-files with option --frotz.'
>  
> +By default the tests will be run from a directory with a highly
> +unusual filename that includes control characters, a newline, various
> +punctuation etc., this is done to smoke out any bugs related to path
> +handling. If for whatever reason the tests can't deal with such
> +unusual path names, set:
> +
> +test_fails_on_unusual_directory_names=1
> +
> +Before sourcing 'test-lib.sh' as described below. This option is
> +mainly intended to grandfather in existing broken tests & code, and
> +should usually not be used in new code, instead your tests or code
> +probably need fixing.
> 
>  Source 'test-lib.sh'
>  
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 13b5696822..089ff5ac7d 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -914,6 +914,10 @@ fi
>  
>  # Test repository
>  TRASH_DIRECTORY="trash directory.$(basename "$0" .sh)"
> +if test -z "$test_fails_on_unusual_directory_names" -a "$(perl -e 'print 
> 1+1' 2>/dev/null)" = "2"
> +then
> +   TRASH_DIRECTORY="$TRASH_DIRECTORY.$(perl -e 'print join q[], grep { 
> /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f')"
> +fi
>  test -n "$root" && TRASH_DIRECTORY="$root/$TRASH_DIRECTORY"
>  case "$TRASH_DIRECTORY" in
>  /*) ;; # absolute path is good
> -- 
> 2.11.0