Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
Am 11.04.2017 um 01:23 schrieb Ævar Arnfjörð Bjarmason: > * Much of it fails due to GIT_CEILING_DIRECTORIES not working with > dirs with ":" in the name. Oh. That might hit me: I'm using URLs for parent directory names in a cache directory. urlencode may or may not work: > t0021-conversion.sh = 37(%), 58(:) Even if it works, I'll have a cache directory with pretty unreadable names. I had hoped I could avoid this kind of ugliness.
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Tue, Apr 11, 2017 at 01:23:32AM +0200, Ævar Arnfjörð Bjarmason wrote: > * Most of the tests fail because git clone can't deal with cloning a > repo with a \r in the path. The error we produce when we try is quite > bad and doesn't indicate what went wrong: > > $ rm -rf /tmp/git.*; mkdir /tmp/git.$(perl -e 'print chr 13') && cd > /tmp/git.* && git init --bare b && file b && git clone b c > /b/tialized empty Git repository in /tmp/git. > b: directory > Cloning into 'c'... > fatal: '/tmp/git. /b' does not appear to be a git repository > fatal: Could not read from remote repository. > > Please make sure you have the correct access rights > and the repository exists. > > So file(1) shows it exists, a strace shows that git knows it exists at > some point, but something gets lost along the way. It's the round-trip through the config. It writes: [remote "origin"] url = /tmp/git.^M/b into the config (where ^M is a literal CR). And then on reading it back, unquoted whitespace in config values is converted to spaces, which is why you see "git. /b" in the error message (the other ones are ugly because it's writing the raw CR in via printf, and your terminal respects it. Try "2>&1 | cat -A" when debugging this sort of thing (our error and warning messages clean up cruft like this already, but informational messages don't always go through vreportf()). Anyway, something like this would fix it (it actually adds "\r" to the config format, which is how we handle \n and \t; we could do it without touching the parsing side if we taught store_write_pair() to recognize that \r needs to go in double-quotes). diff --git a/config.c b/config.c index 1a4d85537..4a36a37ba 100644 --- a/config.c +++ b/config.c @@ -526,6 +526,9 @@ static char *parse_value(void) case 'n': c = '\n'; break; + case 'r': + c = '\r'; + break; /* Some characters escape as themselves */ case '\\': case '"': break; @@ -2172,6 +2175,9 @@ static int store_write_pair(int fd, const char *key, const char *value) case '\t': strbuf_addstr(, "\\t"); break; + case '\r': + strbuf_addstr(, "\\r"); + break; case '"': case '\\': strbuf_addch(, '\\'); -Peff
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Sun, Apr 9, 2017 at 9:11 PM, Ævar Arnfjörð Bjarmasonwrote: > Change the test library to insert non-alphanumeric ASCII characters > into the TRASH_DIRECTORY name, that's the directory the test library > creates, chdirs to and runs each individual test from. I did a bit more work on getting more granularity on why things were failing. I wrote a script[1] that runs each of the failing tests once with each individual character in the non-alphanumeric ASCII range to narrow down issues. I skipped the svn tests, and something went wrong in the middle of my run to make the http tests fail on everything, but the gist of it is: * Most of the tests fail because git clone can't deal with cloning a repo with a \r in the path. The error we produce when we try is quite bad and doesn't indicate what went wrong: $ rm -rf /tmp/git.*; mkdir /tmp/git.$(perl -e 'print chr 13') && cd /tmp/git.* && git init --bare b && file b && git clone b c /b/tialized empty Git repository in /tmp/git. b: directory Cloning into 'c'... fatal: '/tmp/git. /b' does not appear to be a git repository fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists. So file(1) shows it exists, a strace shows that git knows it exists at some point, but something gets lost along the way. * Much of it fails due to GIT_CEILING_DIRECTORIES not working with dirs with ":" in the name. * That leaves the rest at: $ grep FAIL results | grep -v -e svn -e svk | awk '{print $1 " " $2}'|perl -nE 'chomp;my ($f, $c) = split / /, $_; push @{$f{$f}} => $c; END { for my $k (sort keys %f) { say "$k = " . join ", ", map { $_ < 32 ? $_ : "$_(" . chr($_) . ")" } @{$f{$k}} } }'|grep -E -v '^.{150,}$'|grep -E -v -e '= 13$' -e '= 58\(:\)$' -e '= 13, 58\(:\)$' t0021-conversion.sh = 37(%), 58(:) t0060-path-utils.sh = 39(') t0302-credential-store.sh = 9, 32( ), 34("), 40((), 39('), 38(&), 41()), 59(;), 60(<), 62(>), 96(`), 124(|) t1305-config-include.sh = 9, 13, 34("), 35(#), 59(;), 92(\), 91([) t1500-rev-parse.sh = 9, 13 t1510-repo-setup.sh = 13, 92(\) t3900-i18n-commit.sh = 9, 32( ), 40((), 41()), 38(&), 59(;), 60(<), 62(>), 124(|) t4030-diff-textconv.sh = 13, 34("), 96(`) t4031-diff-rewrite-binary.sh = 13, 34("), 96(`) t5150-request-pull.sh = 13, 92(\) t5310-pack-bitmaps.sh = 9, 13 t5407-post-rewrite-hook.sh = 34("), 96(`) t5505-remote.sh = 13, 35(#) t5516-fetch-push.sh = 13, 39(') t5601-clone.sh = 13, 34("), 39('), 96(`) t7003-filter-branch.sh = 34("), 96(`) t7008-grep-binary.sh = 13, 34("), 96(`) t7402-submodule-rebase.sh = 13, 34("), 96(`) t7405-submodule-merge.sh = 34("), 92(\) t7406-submodule-update.sh = 13, 9, 34("), 35(#), 38(&), 59(;), 92(\) t7407-submodule-foreach.sh = 13, 9, 42(*) t7504-commit-msg-hook.sh = 34("), 96(`) t7700-repack.sh = 9, 42(*) t9001-send-email.sh = 13, 34("), 96(`) t9200-git-cvsexportcommit.sh = 42(*), 63(?) t9300-fast-import.sh = 34("), 92(\) t9400-git-cvsserver-server.sh = 13, 59(;) t9401-git-cvsserver-crlf.sh = 13, 59(;) t9402-git-cvsserver-refs.sh = 13, 59(;) t9600-cvsimport.sh = 42(*), 58(:), 63(?) t9601-cvsimport-vendor-branch.sh = 42(*), 58(:), 63(?) t9602-cvsimport-branches-tags.sh = 42(*), 63(?), 58(:) t9604-cvsimport-timestamps.sh = 42(*), 58(:), 63(?) t9700-perl-git.sh = 13, 92(\) I've fixed a few in https://github.com/avar/git/commit/55395613fe There's one segfault in there: $ ./t5601-clone.sh --root="xtmp.$(perl -e 'print chr 39')" -v -i -d [...] Cloning into 'ssh-bracket-clone-plink-4'... Segmentation fault not ok 45 - single quoted plink.exe in GIT_SSH_COMMAND 1. https://github.com/avar/git/commit/edc439c462
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 08:19:40PM +0200, Joachim Durchholz wrote: > > I very much disagree with that. Git's test operate under a set of > > assumptions, and if you violate those assumptions, then the failures are > > not meaningful. > > In that case the tests do not validate that git can properly work with > special characters. > That's a pretty big coverage gap. That's not necessarily true either; there may be specific tests that create exotic paths and check them. My point is that the outcome depends on that paths. So you cannot just take a test which runs "git clone -s" and expect it to work both with and without paths with newlines. You need two tests, because there are two different outcomes, depending on the test environment. So if you're proposing to write a bunch of new tests that check the proper behavior under various conditions, go for it. But I don't think running the entire existing test suite with exotic paths tells you much. A failure might be a bug, or it might be that the thing is untestable given the environment. > > Sure, and I'd encourage people who are interested to dig through the > > results and see if they can find a real problem. I looked at several and > > didn't find anything that wasn't an example of the "test assumptions" > > thing above. > > Don't assume that there's no risk just because you didn't find anything. I'm not assuming that at all. Didn't I say somebody would need to dig into all of these to find out the real answer? I'm only arguing that blindly adding this feature to the test suite has no value. It's the digging that has value, and you do not even need to modify the test suite to do it (you can just use --root). I've been trying to invite you to do that digging, if it's something you care about. -Peff
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
Am 10.04.2017 um 18:57 schrieb Jeff King: If there are security bugs where a malicious input can cause us to do something bad, that's something to care about. But that's very different than asking "do these tests run to completion with a funny input". If the tests do not complete, git is doing something unexpected. I very much disagree with that. Git's test operate under a set of assumptions, and if you violate those assumptions, then the failures are not meaningful. In that case the tests do not validate that git can properly work with special characters. That's a pretty big coverage gap. Take alternates, for instance. The on-disk storage format cannot represent paths with newlines in them. If a test does: git clone -s "$(pwd)" parent.git child && test -d child then that test is going to fail if the test directory has a newline in it. But that doesn't tell us anything meaningful. Maybe there is a bug and maybe there isn't, but we cannot know because the thing being tested cannot possibly work under the test environment given. Sure. Not all tests are meaningful in all environments. That doesn't mean that the tests are generally meaningless. Also, we're talking about two pretty different things here: gits interaction with the file system, and git's interaction with whatever shell its scripts are using. In an ideal world, these two aspects would be orthogonal and could be tested independently of each other. Since in practice we do have correlations and dependencies, this isn't always possible, but it's what we should aim for. You can rewrite all the tests to say "skip this test if there's a newline in the test directory". But to what end? It's work to write and to maintain, and we haven't gained new information. Not on that "alternates" thing (whatever that is), but we have a test that will work and provide information on systems that do allow newlines. That in itself is not a security hole, but there's a pretty good chance that at least one of these ~230 unexpected things can be turned into one, given enough time and motivation. The risk multiplies as this is shell scripting, where the path from "string is misinterpreted" to "string is run as a command" is considerably shorter than in other languages. Sure, and I'd encourage people who are interested to dig through the results and see if they can find a real problem. I looked at several and didn't find anything that wasn't an example of the "test assumptions" thing above. Don't assume that there's no risk just because you didn't find anything. Also, git might not be the actual hole, but other software that relies on git not doing anything awkward might fail that assumption and expose the actual breach of security. You could argue that it's not a problem in git, but it is. Unless and until the git docs clearly state that things may break if "funny characters" are being used, and where. But again, I'm happy to be proven wrong. With security, you need to be confident about the absence of /any/ type of hole, not the absence of a /specific/ type of hole such as a shell injection. So the way forward isn't proving you wrong by providing a specific exploit, it's making sure that no exploit with URLs and file names can possibly exist. Now if I'm reading things like "heuristics" and "git uses URL-specific code even after it has determined it's not a URL"... well, that's the exact opposite of reassuring messages. > I just don't think plastering control characters into the test directory names all the time is a good way of finding those problems (and doesn't balance out the cost). Fair enough.
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 04:59:57PM +0200, Joachim Durchholz wrote: > Am 10.04.2017 um 15:38 schrieb Jeff King: > > Are those bugs? Maybe. Certainly they are limitations. But are they ones > > anybody _cares_ about? I think this may fall under "if it hurts, don't > > do it". > > It's not always possible to avoid that. Sort of. I don't find anything wrong with saying "your local filesystem path for a repository cannot contain newlines; if it does, some features may be unavailable". > URLs, for example, may contain "funny characters", including multi-byte > characters of which the second byte is 0x0a. If they are guaranteed to > always be URL-encoded this isn't a problem, but then we still need to make > sure that URL-encoding does happen. Sure, but URLs have a way of encoding. And if we're not encoding when we should, then that's a bug. But the arguments that are fed to things like git-clone _aren't_ URLs. They're a specifier that uses some heuristics to decide between the various cases (URLs, host:path specifiers, local paths). If you feed syntactic garbage, aborting the operation (and failing the test!) may be the right thing for git to do. > > If there are security bugs where a malicious input can cause us > > to do something bad, that's something to care about. But that's very > > different than asking "do these tests run to completion with a funny > > input". > > If the tests do not complete, git is doing something unexpected. I very much disagree with that. Git's test operate under a set of assumptions, and if you violate those assumptions, then the failures are not meaningful. Take alternates, for instance. The on-disk storage format cannot represent paths with newlines in them. If a test does: git clone -s "$(pwd)" parent.git child && test -d child then that test is going to fail if the test directory has a newline in it. But that doesn't tell us anything meaningful. Maybe there is a bug and maybe there isn't, but we cannot know because the thing being tested cannot possibly work under the test environment given. You can rewrite all the tests to say "skip this test if there's a newline in the test directory". But to what end? It's work to write and to maintain, and we haven't gained new information. > That in itself is not a security hole, but there's a pretty good chance that > at least one of these ~230 unexpected things can be turned into one, given > enough time and motivation. The risk multiplies as this is shell scripting, > where the path from "string is misinterpreted" to "string is run as a > command" is considerably shorter than in other languages. Sure, and I'd encourage people who are interested to dig through the results and see if they can find a real problem. I looked at several and didn't find anything that wasn't an example of the "test assumptions" thing above. I'll actually be surprised if there are shell injection problems in Git, because our scripts are usually pretty meticulous about quoting variables and not doing crazy things with eval. I think the issues are much more likely to be like the "submodule--helper --list" thing I pointed out, where we get phantom records in lists. But again, I'm happy to be proven wrong. If there's a shell injection in Git it's clearly a bug and should be fixed. I just don't think plastering control characters into the test directory names all the time is a good way of finding those problems (and doesn't balance out the cost). Fuzzing the directory names and digging into specific cases _is_ a reasonable way to do it. -Peff
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
Am 10.04.2017 um 15:38 schrieb Jeff King: Are those bugs? Maybe. Certainly they are limitations. But are they ones anybody _cares_ about? I think this may fall under "if it hurts, don't do it". It's not always possible to avoid that. URLs, for example, may contain "funny characters", including multi-byte characters of which the second byte is 0x0a. If they are guaranteed to always be URL-encoded this isn't a problem, but then we still need to make sure that URL-encoding does happen. Next source of funny characters that comes to my mind is submodules. They derive their name from the URL by default, and the subdirectory name as well. Again, consider the multibyte name where the second character is 0x0a. Or 0x80: À (uppercase A with accent grave) happens to have that byte in UTF-8 encoding, Ẁ is U+1E80 which would be encoded as 0x80 0x1e on an NTFS filesystem (barring additional coding steps in APIs or webservices, which further complicate the situation but don't usually eliminate the problem, they just shift it around). > If there are security bugs where a malicious input can cause us to do something bad, that's something to care about. But that's very different than asking "do these tests run to completion with a funny input". If the tests do not complete, git is doing something unexpected. That in itself is not a security hole, but there's a pretty good chance that at least one of these ~230 unexpected things can be turned into one, given enough time and motivation. The risk multiplies as this is shell scripting, where the path from "string is misinterpreted" to "string is run as a command" is considerably shorter than in other languages.
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 1:40 PM, Ævar Arnfjörð Bjarmasonwrote: > On Mon, Apr 10, 2017 at 1:19 PM, SZEDER Gábor wrote: >> A few other failures are triggered by the ':' in the trash directory's >> name, breaking the following commonly used pattern: >> >> export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" && >> cd subdir && >> test-git-pretending-it's-run-outside-of-a-repository > > Does GIT_CEILING_DIRECTORIES support escaping somehow? E.g. > "foo\:bar". If so maybe we could use a wrapper to set it, if not > that's a bug in the ceiling dir feature, surely. It doesn't, these $PATH-style variables in git, in the shell or elsewhere tend not to. >>> b) I think any sort of magic like using it with 'make test', but not >>> when the *.sh is manually run, will just lead to frustrating seemingly >>> heisenbugs from people trying to debug the test suite when things do >>> fail, i.e. you run 'make test' on some obscure platform we haven't >>> fixed path bugs on, 10 fail, you manually inspect them and every one >>> of them succeeds, because some --use-garbage-dirs option wasn't >>> passed. >> >> That's not really an issue. When a test fails during 'make test' with >> garbage in trash dir names, the dev comes and attempts to cd into the >> trash dir, and will be instantly reminded that non-printable >> characters might play a role in the failure when he can't do so with >> ordinary means. > > When a test fails for me I cd to t/ and re-run the test *.sh manually. > I don't go straight to inspecting the existing trash. > > If those manual invocations were running in some different mode & > succeeded that would be very confusing. On the contrary, that's a clue to where you might want to look. Besides, this is already the case when someone sets some options in GIT_TEST_OPTS in his config.mak.
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 01:40:13PM +0200, Ævar Arnfjörð Bjarmason wrote: > > A few other failures are triggered by the ':' in the trash directory's > > name, breaking the following commonly used pattern: > > > > export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" && > > cd subdir && > > test-git-pretending-it's-run-outside-of-a-repository > > Does GIT_CEILING_DIRECTORIES support escaping somehow? E.g. > "foo\:bar". If so maybe we could use a wrapper to set it, if not > that's a bug in the ceiling dir feature, surely. I don't think it does. But nor does $PATH (which is almost certainly another source of breakage). And neither does info/alternates allow pathnames with newline in them (that is the likely cause of a large number of failures, as it hits anywhere we use "clone -s"). Are those bugs? Maybe. Certainly they are limitations. But are they ones anybody _cares_ about? I think this may fall under "if it hurts, don't do it". If there are security bugs where a malicious input can cause us to do something bad, that's something to care about. But that's very different than asking "do these tests run to completion with a funny input". As an aside, I'd also question whether your patch might actually _hide_ bugs. It's applying a blanket change to the on-disk state that is obviously breaking some features. How many of those breakages are things that show up as a test failure, and how many of them quietly cause a test to do something else entirely, like failing its setup in a way that makes the rest of the test a noop? So given the pain this will cause to people actually looking at tests, and given that it's not clear to me if it has or will find any actual bugs, it seems premature to flip it on by default. If somebody wants to actually dig into these cases and look for actual bugs, I'm all for it. But flipping the default and marking a bunch of tests blindly as "well, this fails" hasn't made the world a better place. It's made it slightly worse. -Peff PS If you want to test various characters, I think the simplest way is just: make GIT_TEST_OPTS=--root=/tmp/look-a-col:on test
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 1:19 PM, SZEDER Gáborwrote: > On Mon, Apr 10, 2017 at 10:02 AM, Ævar Arnfjörð Bjarmason > wrote: >> On Mon, Apr 10, 2017 at 3:47 AM, SZEDER Gábor wrote: Change the test library to insert non-alphanumeric ASCII characters into the TRASH_DIRECTORY name, that's the directory the test library creates, chdirs to and runs each individual test from. Unless test_fails_on_unusual_directory_names=1 is declared before importing test-lib.sh (and if perl isn't available on the system), the trash directory will contain every non-alphanumeric character in ASCII, in order. >>> >>> At the very least there must be an easier way to disable this, e.g. a >>> command line option. >>> >>> This change is sure effective in smoking out bugs, but it's a major >>> annoyance during development when it comes to debugging a test. At >>> first I could not even cd into the trash directory, because TAB >>> completing the directory name with all those non-printable characters >>> didn't work (this may be a bug in the bash-completion package). And >>> simply copy-pasting the dirname didn't work either, because 'ls' >>> >>> trash directory.t9902-completion.??? >>> !"#$%&'()*+,-:;<=>?@[\]^_`{|}~? > > Btw, it seems most of the failures in t9902-completion are triggered > by remote URL parsing. The trash directory's new name contains '[', > ']' and even "@[", all of which are treated special by > connect.c:host_end(), a helper function of parse_connect_url(), > basically breaking anything trying to e.g.: > > git fetch "$(pwd)/other" I'm going to work on this patch so that I can report on tests by type of character that triggers a failure. > What puzzles me most is that parse_connect_url() recognizes right at > its beginning that a remote URL like this is not actually an URL, so > why does it continue parsing it as if it were one? > > A few other failures are triggered by the ':' in the trash directory's > name, breaking the following commonly used pattern: > > export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" && > cd subdir && > test-git-pretending-it's-run-outside-of-a-repository Does GIT_CEILING_DIRECTORIES support escaping somehow? E.g. "foo\:bar". If so maybe we could use a wrapper to set it, if not that's a bug in the ceiling dir feature, surely. > I think ':' should therefore be excluded from the trash directory, too. I think it's preferable to have some mode to use : in dirnames for those tests that don't fail already, to protect them against future regressions. Disabling the use of a tricky character like ":" invites future bugs & regressions. >>> After some headscratching, Sunday night may be my excuse, I figured >>> out that 'cd tr*' works... only to be greeted with the ugliest-ever >>> three-line(!) shell prompt. >>> >>> Therefore I would say that this should not even be enabled by default >>> in test-lib.sh, so at least running a test directly from the command >>> line as ./t1234-foo.sh would considerately give us an easily >>> accessible trash directory even without any command line options. We >>> could enable it for 'make test' by default via GIT_TEST_OPTS in >>> t/Makefile, though. >> >> This definitely needs some tweaking as you and Joachim point out. E.g. >> some capabilities check in the test suite to check if we can even >> create these sorts of paths on the local filesystem. >> >> A couple of comments on the above though: >> >> a) If we have something that's a more strict mode that makes tests >> fail due to buggy code in various scenarios, we gain the most from >> having it on by default > > I know, and I basically agree... > >> and having some optional mode to have devs >> e.g. disable it for manual inspection of the test directories. > > ... but this is just too gross to live as default outside of a CI > environment. > >> Most of the running of the test suite that really matters, i.e. just >> before the software is delivered to end users, is going to be running >> in some non-interactive build system preparing a package. >> >> b) I think any sort of magic like using it with 'make test', but not >> when the *.sh is manually run, will just lead to frustrating seemingly >> heisenbugs from people trying to debug the test suite when things do >> fail, i.e. you run 'make test' on some obscure platform we haven't >> fixed path bugs on, 10 fail, you manually inspect them and every one >> of them succeeds, because some --use-garbage-dirs option wasn't >> passed. > > That's not really an issue. When a test fails during 'make test' with > garbage in trash dir names, the dev comes and attempts to cd into the > trash dir, and will be instantly reminded that non-printable > characters might play a role in the failure when he can't do so with > ordinary means. When a test fails for me I cd to t/ and re-run the test *.sh manually. I don't go straight to
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 10:02 AM, Ævar Arnfjörð Bjarmasonwrote: > On Mon, Apr 10, 2017 at 3:47 AM, SZEDER Gábor wrote: >>> Change the test library to insert non-alphanumeric ASCII characters >>> into the TRASH_DIRECTORY name, that's the directory the test library >>> creates, chdirs to and runs each individual test from. >>> >>> Unless test_fails_on_unusual_directory_names=1 is declared before >>> importing test-lib.sh (and if perl isn't available on the system), the >>> trash directory will contain every non-alphanumeric character in >>> ASCII, in order. >> >> At the very least there must be an easier way to disable this, e.g. a >> command line option. >> >> This change is sure effective in smoking out bugs, but it's a major >> annoyance during development when it comes to debugging a test. At >> first I could not even cd into the trash directory, because TAB >> completing the directory name with all those non-printable characters >> didn't work (this may be a bug in the bash-completion package). And >> simply copy-pasting the dirname didn't work either, because 'ls' >> >> trash directory.t9902-completion.??? >> !"#$%&'()*+,-:;<=>?@[\]^_`{|}~? Btw, it seems most of the failures in t9902-completion are triggered by remote URL parsing. The trash directory's new name contains '[', ']' and even "@[", all of which are treated special by connect.c:host_end(), a helper function of parse_connect_url(), basically breaking anything trying to e.g.: git fetch "$(pwd)/other" What puzzles me most is that parse_connect_url() recognizes right at its beginning that a remote URL like this is not actually an URL, so why does it continue parsing it as if it were one? A few other failures are triggered by the ':' in the trash directory's name, breaking the following commonly used pattern: export GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY" && cd subdir && test-git-pretending-it's-run-outside-of-a-repository I think ':' should therefore be excluded from the trash directory, too. >> After some headscratching, Sunday night may be my excuse, I figured >> out that 'cd tr*' works... only to be greeted with the ugliest-ever >> three-line(!) shell prompt. >> >> Therefore I would say that this should not even be enabled by default >> in test-lib.sh, so at least running a test directly from the command >> line as ./t1234-foo.sh would considerately give us an easily >> accessible trash directory even without any command line options. We >> could enable it for 'make test' by default via GIT_TEST_OPTS in >> t/Makefile, though. > > This definitely needs some tweaking as you and Joachim point out. E.g. > some capabilities check in the test suite to check if we can even > create these sorts of paths on the local filesystem. > > A couple of comments on the above though: > > a) If we have something that's a more strict mode that makes tests > fail due to buggy code in various scenarios, we gain the most from > having it on by default I know, and I basically agree... > and having some optional mode to have devs > e.g. disable it for manual inspection of the test directories. ... but this is just too gross to live as default outside of a CI environment. > Most of the running of the test suite that really matters, i.e. just > before the software is delivered to end users, is going to be running > in some non-interactive build system preparing a package. > > b) I think any sort of magic like using it with 'make test', but not > when the *.sh is manually run, will just lead to frustrating seemingly > heisenbugs from people trying to debug the test suite when things do > fail, i.e. you run 'make test' on some obscure platform we haven't > fixed path bugs on, 10 fail, you manually inspect them and every one > of them succeeds, because some --use-garbage-dirs option wasn't > passed. That's not really an issue. When a test fails during 'make test' with garbage in trash dir names, the dev comes and attempts to cd into the trash dir, and will be instantly reminded that non-printable characters might play a role in the failure when he can't do so with ordinary means. >>> This includes all the control characters, !, [], {} etc. the "." >>> character isn't included because it's already in the directory name, >>> and nor is "/" for obvious reasons, although that would actually work, >>> we'd just create a subdirectory, which would make the tests harder to >>> inspect when they fail.i >> >> 1. Heh. How an additional subdirectory would make the tests harder to >>inspect is nothing compared to the effect of all the other >>characters. >> >> 2. s/i$// >> >>> This change is inspired by the "submodule: prevent backslash expantion >>> in submodule names" patch[1]. If we'd had backslashes in the >>> TRASH_DIRECTORY all along that bug would have been fixed a long time >>> ago. This will flag such issues by marking tests that currently fail >>> with
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
On Mon, Apr 10, 2017 at 3:47 AM, SZEDER Gáborwrote: >> Change the test library to insert non-alphanumeric ASCII characters >> into the TRASH_DIRECTORY name, that's the directory the test library >> creates, chdirs to and runs each individual test from. >> >> Unless test_fails_on_unusual_directory_names=1 is declared before >> importing test-lib.sh (and if perl isn't available on the system), the >> trash directory will contain every non-alphanumeric character in >> ASCII, in order. > > At the very least there must be an easier way to disable this, e.g. a > command line option. > > This change is sure effective in smoking out bugs, but it's a major > annoyance during development when it comes to debugging a test. At > first I could not even cd into the trash directory, because TAB > completing the directory name with all those non-printable characters > didn't work (this may be a bug in the bash-completion package). And > simply copy-pasting the dirname didn't work either, because 'ls' > > trash directory.t9902-completion.??? > !"#$%&'()*+,-:;<=>?@[\]^_`{|}~? > > After some headscratching, Sunday night may be my excuse, I figured > out that 'cd tr*' works... only to be greeted with the ugliest-ever > three-line(!) shell prompt. > > Therefore I would say that this should not even be enabled by default > in test-lib.sh, so at least running a test directly from the command > line as ./t1234-foo.sh would considerately give us an easily > accessible trash directory even without any command line options. We > could enable it for 'make test' by default via GIT_TEST_OPTS in > t/Makefile, though. This definitely needs some tweaking as you and Joachim point out. E.g. some capabilities check in the test suite to check if we can even create these sorts of paths on the local filesystem. A couple of comments on the above though: a) If we have something that's a more strict mode that makes tests fail due to buggy code in various scenarios, we gain the most from having it on by default, and having some optional mode to have devs e.g. disable it for manual inspection of the test directories. Most of the running of the test suite that really matters, i.e. just before the software is delivered to end users, is going to be running in some non-interactive build system preparing a package. b) I think any sort of magic like using it with 'make test', but not when the *.sh is manually run, will just lead to frustrating seemingly heisenbugs from people trying to debug the test suite when things do fail, i.e. you run 'make test' on some obscure platform we haven't fixed path bugs on, 10 fail, you manually inspect them and every one of them succeeds, because some --use-garbage-dirs option wasn't passed. >> This includes all the control characters, !, [], {} etc. the "." >> character isn't included because it's already in the directory name, >> and nor is "/" for obvious reasons, although that would actually work, >> we'd just create a subdirectory, which would make the tests harder to >> inspect when they fail.i > > 1. Heh. How an additional subdirectory would make the tests harder to >inspect is nothing compared to the effect of all the other >characters. > > 2. s/i$// > >> This change is inspired by the "submodule: prevent backslash expantion >> in submodule names" patch[1]. If we'd had backslashes in the >> TRASH_DIRECTORY all along that bug would have been fixed a long time >> ago. This will flag such issues by marking tests that currently fail >> with "test_fails_on_unusual_directory_names=1", ensure that new tests >> aren't added unless a discussion is had about why the code can't >> handle unusual pathnames, and prevent future regressions. >> >> 1. <20170407172306.172673-1-bmw...@google.com> >> --- >> t/README | 12 >> t/test-lib.sh | 4 >> 2 files changed, 16 insertions(+) >> >> diff --git a/t/README b/t/README >> index ab386c3681..314dd40221 100644 >> --- a/t/README >> +++ b/t/README >> @@ -345,6 +345,18 @@ assignment to variable 'test_description', like this: >> This test registers the following structure in the cache >> and tries to run git-ls-files with option --frotz.' >> >> +By default the tests will be run from a directory with a highly >> +unusual filename that includes control characters, a newline, various >> +punctuation etc., this is done to smoke out any bugs related to path >> +handling. If for whatever reason the tests can't deal with such >> +unusual path names, set: >> + >> +test_fails_on_unusual_directory_names=1 >> + >> +Before sourcing 'test-lib.sh' as described below. This option is >> +mainly intended to grandfather in existing broken tests & code, and >> +should usually not be used in new code, instead your tests or code >> +probably need fixing. >> >> Source 'test-lib.sh' >> >> diff --git a/t/test-lib.sh b/t/test-lib.sh >> index 13b5696822..089ff5ac7d 100644 >> ---
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
> Change the test library to insert non-alphanumeric ASCII characters > into the TRASH_DIRECTORY name, that's the directory the test library > creates, chdirs to and runs each individual test from. > > Unless test_fails_on_unusual_directory_names=1 is declared before > importing test-lib.sh (and if perl isn't available on the system), the > trash directory will contain every non-alphanumeric character in > ASCII, in order. At the very least there must be an easier way to disable this, e.g. a command line option. This change is sure effective in smoking out bugs, but it's a major annoyance during development when it comes to debugging a test. At first I could not even cd into the trash directory, because TAB completing the directory name with all those non-printable characters didn't work (this may be a bug in the bash-completion package). And simply copy-pasting the dirname didn't work either, because 'ls' displayed it as trash directory.t9902-completion.??? !"#$%&'()*+,-:;<=>?@[\]^_`{|}~? After some headscratching, Sunday night may be my excuse, I figured out that 'cd tr*' works... only to be greeted with the ugliest-ever three-line(!) shell prompt. Therefore I would say that this should not even be enabled by default in test-lib.sh, so at least running a test directly from the command line as ./t1234-foo.sh would considerately give us an easily accessible trash directory even without any command line options. We could enable it for 'make test' by default via GIT_TEST_OPTS in t/Makefile, though. > This includes all the control characters, !, [], {} etc. the "." > character isn't included because it's already in the directory name, > and nor is "/" for obvious reasons, although that would actually work, > we'd just create a subdirectory, which would make the tests harder to > inspect when they fail.i 1. Heh. How an additional subdirectory would make the tests harder to inspect is nothing compared to the effect of all the other characters. 2. s/i$// > This change is inspired by the "submodule: prevent backslash expantion > in submodule names" patch[1]. If we'd had backslashes in the > TRASH_DIRECTORY all along that bug would have been fixed a long time > ago. This will flag such issues by marking tests that currently fail > with "test_fails_on_unusual_directory_names=1", ensure that new tests > aren't added unless a discussion is had about why the code can't > handle unusual pathnames, and prevent future regressions. > > 1. <20170407172306.172673-1-bmw...@google.com> > --- > t/README | 12 > t/test-lib.sh | 4 > 2 files changed, 16 insertions(+) > > diff --git a/t/README b/t/README > index ab386c3681..314dd40221 100644 > --- a/t/README > +++ b/t/README > @@ -345,6 +345,18 @@ assignment to variable 'test_description', like this: > This test registers the following structure in the cache > and tries to run git-ls-files with option --frotz.' > > +By default the tests will be run from a directory with a highly > +unusual filename that includes control characters, a newline, various > +punctuation etc., this is done to smoke out any bugs related to path > +handling. If for whatever reason the tests can't deal with such > +unusual path names, set: > + > +test_fails_on_unusual_directory_names=1 > + > +Before sourcing 'test-lib.sh' as described below. This option is > +mainly intended to grandfather in existing broken tests & code, and > +should usually not be used in new code, instead your tests or code > +probably need fixing. > > Source 'test-lib.sh' > > diff --git a/t/test-lib.sh b/t/test-lib.sh > index 13b5696822..089ff5ac7d 100644 > --- a/t/test-lib.sh > +++ b/t/test-lib.sh > @@ -914,6 +914,10 @@ fi > > # Test repository > TRASH_DIRECTORY="trash directory.$(basename "$0" .sh)" > +if test -z "$test_fails_on_unusual_directory_names" -a "$(perl -e 'print > 1+1' 2>/dev/null)" = "2" > +then > + TRASH_DIRECTORY="$TRASH_DIRECTORY.$(perl -e 'print join q[], grep { > /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f')" > +fi > test -n "$root" && TRASH_DIRECTORY="$root/$TRASH_DIRECTORY" > case "$TRASH_DIRECTORY" in > /*) ;; # absolute path is good > -- > 2.11.0