0.braceexp.cpp

Travis Vitek Mon, 18 Feb 2008 23:47:09 -0800


Martin,

Thank you for the additional testcases. They point out a few issues that I
didn't interpret from the description in the Bash Reference Manual
[http://www.gnu.org/software/bash/manual/bashref.html#Brace-Expansion]. Note
that below I refer to paragraphs from this documentation.

I do have a few issues with the expectations you've laid out. Comments
follow...

sebor-2 wrote:
> 
> +
> +    TEST ("foo {1,2} bar", "foo 1 2 bar");
> +
> 

This isn't a single brace expansion. It is a literal string, followed by a
brace expansion, followed by a literal string. When you run
'echo foo {1,2} bar' in the shell, each of the args is brace expanded
individually, so the only thing that is brace expanded is the '{1,2}' and
everything else is written literally. I believe this testcase is invalid.

sebor-2 wrote:
> 
> +    // we don't have eval
> +    // TEST ("`zecho foo {1,2} bar`",  "foo 1 2 bar");
> +    // TEST ("$(zecho foo {1,2} bar)", "foo 1 2 bar");
> 

Same problem here.

sebor-2 wrote:
> 
> +#if 0   // not implemented yet
> +
> +    // set the three variables
> +    rw_putenv ("var=baz:varx=vx:vary=vy");
> +
> +    TEST ("foo{bar,${var}.}", "foobar foobaz.");
> +    TEST ("foo{bar,${var}}",  "foobar foobaz");
> +
> +    TEST ("${var}\"{x,y}",    "bazx bazy");
> +    TEST ("$var{x,y}",        "vx vy");
> +    TEST ("${var}{x,y}",      "bazx bazy");
> +
> +    // unset all three variables
> +    rw_putenv ("var=:varx=:vary=");
> +
> +#endif   // 0
> 

I don't expect this functionality to ever be implemented inside
rw_brace_expand(). As mentioned in paragraph 4, the brace expansion itself
is done before other expansions, and it does not interpret the text between
the braces.

Given this, I feel that the environment variable expansion must be done at
some later stage, by some other function, and the above test block is
inappropriate for this test.
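
To illustrate the ordering, here is a minimal sketch of what such a later
stage might look like. The name substitute_vars() and the VarMap type are
hypothetical and are not part of the test driver; the sketch assumes it is
handed one word at a time, after brace expansion has already run, and it
looks names up in a plain map instead of the real environment.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> VarMap;

// Hypothetical later-stage helper (not rw_brace_expand): replaces $name
// and ${name} in a single, already brace-expanded word.
std::string substitute_vars (const std::string &word, const VarMap &vars)
{
    std::string out;
    std::size_t i = 0;

    while (i < word.size ()) {
        if ('$' != word [i]) {
            out += word [i++];
            continue;
        }

        std::size_t begin = i + 1;   // first character of the name
        std::size_t end;             // one past the last character of the name
        std::size_t next;            // where scanning resumes

        if (begin < word.size () && '{' == word [begin]) {
            ++begin;
            end = word.find ('}', begin);
            if (std::string::npos == end) {
                out += word [i++];   // unterminated "${": keep '$' literally
                continue;
            }
            next = end + 1;
        }
        else {
            end = begin;
            while (   end < word.size ()
                   && (   std::isalnum ((unsigned char)word [end])
                       || '_' == word [end]))
                ++end;
            next = end;
        }
        assert (next > i);   // every iteration consumes at least the '$'

        const std::string name = word.substr (begin, end - begin);
        const VarMap::const_iterator it = vars.find (name);

        if (name.empty ())
            out += word.substr (i, next - i);   // "$" or "${}": keep as is
        else if (it != vars.end ())
            out += it->second;                  // unset names expand to nothing

        i = next;
    }
    return out;
}
```

With var=baz, varx=vx and vary=vy, brace expansion would first turn
'$var{x,y}' into the two words '$varx' and '$vary', and only then would
this stage produce 'vx' and 'vy', which is the result the quoted test
expects.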

sebor-2 wrote:
> 
> +
> +    TEST ("{1..10}", "1 2 3 4 5 6 7 8 9 10");
> +
> 

This is a case that I should be handling. I need to go back and add complete
support for integer ranges, specifically ranges that include multidigit
numbers and sign.
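
For reference, a rough sketch of the integer range handling I have in mind.
The names parse_int() and expand_int_range() are made up for this message
and are not the code in the test driver. The sketch takes the text between
the braces, and it assumes that both a leading '-' and a leading '+' are
acceptable signs, which I have not checked against bash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: accepts an optional sign followed by one or more
// decimal digits and stores the value; returns false for anything else.
bool parse_int (const std::string &s, long &value)
{
    std::size_t i = 0;
    if (!s.empty () && ('-' == s [0] || '+' == s [0]))
        i = 1;

    if (i == s.size ())
        return false;   // empty string, or a sign with no digits

    for (std::size_t j = i; j < s.size (); ++j)
        if (s [j] < '0' || '9' < s [j])
            return false;

    value = std::strtol (s.c_str (), 0, 10);
    return true;
}

// Hypothetical helper: expands the body of a sequence expression such as
// "1..10" or "3..-2". Returns an empty vector if the body is not an
// integer range, so the caller can leave the text unchanged.
std::vector<std::string> expand_int_range (const std::string &body)
{
    std::vector<std::string> out;

    const std::string::size_type dots = body.find ("..");
    if (std::string::npos == dots)
        return out;

    long first = 0;
    long last = 0;
    if (   !parse_int (body.substr (0, dots), first)
        || !parse_int (body.substr (dots + 2), last))
        return out;

    const long step = first <= last ? 1 : -1;   // count up or down

    for (long n = first; ; n += step) {
        std::ostringstream os;
        os << n;
        out.push_back (os.str ());
        if (n == last)
            break;
    }

    assert (!out.empty ());   // a valid range always yields its endpoints
    return out;
}
```

Passing "1..10" yields the ten strings "1" through "10", and a descending
range such as "3..-2" counts down through zero.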

sebor-2 wrote:
> 
> +    // this doesn't work in Bash 3.2
> +    // TEST ("{0..10,braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
> +
> 

I don't know how anyone could expect this to work. The first subexpression
of the brace expansion list is '0..10', which itself is not a brace
expansion, so it should not be expanded. It should be left as a literal.
This happens to be the behavior I see with Bash 3.0.

sebor-2 wrote:
> 
> +    // but this does
> +    TEST ("{{0..10},braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
> +    TEST ("x{{0..10},braces}y",
> +          "x0y x1y x2y x3y x4y x5y x6y x7y x8y x9y x10y xbracesy");
> +
> 

Obviously, both of these are valid versions of the previous test expression.

sebor-2 wrote:
> 
> +    TEST ("{a..A}",
> +          "a ` _ ^ ]  [ Z Y X W V U T S R Q P O N M L K J I H G F E D C B
> A");
> +    TEST ("{A..a}",
> +          "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [  ] ^ _ `
> a");
> +
> 

Interesting. I didn't think it would make sense to allow mixing of lower and
uppercase characters in the sequence expression because of the characters
between 'Z' and 'a'. Obviously I was wrong. BTW, any idea what happened to
ASCII 92? It is the backslash character that should appear between '[' and
']'.
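
As a sanity check, a plain walk over the character codes does include the
backslash. The sketch below is only an illustration: expand_char_range() is
a made-up name, and it assumes an ASCII execution character set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the characters from `first` to `last`,
// inclusive, separated by single spaces, counting up or down by
// character code. Assumes ASCII.
std::string expand_char_range (char first, char last)
{
    const int from = (unsigned char)first;
    const int to   = (unsigned char)last;

    assert (from < 128 && to < 128);   // the sketch only covers ASCII

    const int step = from <= to ? 1 : -1;

    std::string out;
    for (int c = from; ; c += step) {
        if (!out.empty ())
            out += ' ';
        out += char (c);
        if (c == to)
            break;
    }
    return out;
}
```

expand_char_range ('A', 'a') produces 33 characters, with ASCII 92 between
'[' and ']'. Note that in a C++ string literal the expected value would
have to spell that character as "\\".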

sebor-2 wrote:
> 
> +
> +    TEST ("0{1..9} {10..20}",
> +          "01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20");
> 

This has the same problem as the first issue I brought up. This is actually
two separate brace expansions, the first is '0{1..9}' and the second is
'{10..20}'. This is how the shell handles them, and this is how I handle
them.

If they were treated as one brace expansion by the shell, I would expect the
postscript '{10..20}' to be expanded for each prefix/body expansion, much
like you would see if you escaped the space.

sebor-2 wrote:
> 
> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
> 

I don't understand how this could be interpreted as a valid brace expansion
at all. The body of the expansion is '{b{d,e}}'. Paragraph 5 [and paragraph
1 for that matter] require that a correctly-formed brace expansion have
unquoted [unescaped?] opening and closing braces, and at least one unquoted
comma or a valid sequence expression. The body does not meet either of these
requirements, so it must be invalid.
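
To make the comma requirement concrete, here is a small sketch of the check
as I read it. has_top_level_comma() is a made-up name, it takes the text
between the outermost pair of braces, and it treats a backslash as escaping
the next character. It does not cover sequence expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `body` contains at least one
// unescaped comma that is not inside a nested pair of braces.
bool has_top_level_comma (const std::string &body)
{
    int depth = 0;   // nesting depth relative to the enclosing braces

    for (std::size_t i = 0; i < body.size (); ++i) {
        const char c = body [i];

        if ('\\' == c) {
            ++i;   // skip the escaped character
            continue;
        }

        if ('{' == c)
            ++depth;
        else if ('}' == c) {
            if (depth > 0)
                --depth;
        }
        else if (',' == c && 0 == depth)
            return true;

        assert (depth >= 0);
    }
    return false;
}
```

For the outer braces in 'a-{b{d,e}}-c' the text between them is 'b{d,e}',
and the only comma there sits inside the nested braces, so the check fails.
For the inner braces the text is 'd,e' and the check passes.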

To get the result shown, the obvious thing to do is to escape the outer
braces. This would give us the valid expression 'a-\{b{d,e}\}-c', which
happens to work with previous versions of bash as well.

sebor-2 wrote:
> 
> +    TEST ("a-{bdef-{g,i}-c", "a-{bdef-g-c a-{bdef-i-c");
> 

Again, this does not seem correct according to the requirements of paragraph
5 [and 1].

If the body is supposed to be between a pair of braces, shouldn't the first
unescaped opening brace match the first unescaped close brace at the same
brace depth? If so, then the outer brace expansion isn't valid because it
doesn't have a terminating close brace. Even if one was added, the resulting
expression has the same problem as the previous example. The nested
expression 'bdef-{g,i}-c' isn't a series of comma-separated strings or a
sequence expression.

If you wanted the first brace to be ignored, as it is in the test, then it
should be escaped. Then we would have 'a-\{bdef-{g,i}-c'. That expression
follows the requirements outlined in the manual, and works with old versions
of bash, and a human can pretty easily figure out what the expected result
would be.

Now I suppose that since invalid brace expansions are to be left unchanged,
you could say that the first brace expansion is copied literally because it
is invalid, but the second is valid and should be expanded. This almost
explains how bash 3.2 gets these results, but it still seems wrong. If a
subexpression is invalid it seems that the whole expression is invalid.

sebor-2 wrote:
> 
> +    TEST ("{",     "{");
> +    TEST ("}",     "}");
> +    TEST ("{}",    "{}");
> +    TEST ("{ }",   "{ }");
> +    TEST ("{  }",  "{  }");   // is this right?
> 

I sure think it is. Again, the requirements say that these are not valid
brace expansions, so they should be left unchanged. I'm wondering if the
shell is doing some sort of whitespace collapse. Everything seems to work
fine if you escape the spaces, so I'm thinking that is why you see the
behavior that you do.

So, with all that said, I've got a few thoughts.

1. I don't really like the idea of trying to emulate all behavior of the
shell in rw_brace_expand. If we want that, then we should have made a bug
entitled 'provide a complete implementation of bash'.
2. I don't feel comfortable trying to maintain compatibility with version
3.2 of bash. It doesn't seem to follow the documented requirements, and I
believe that the odd behavior may be difficult to implement. The bash 3.0
implementation seems much more sane and that is what I tried to emulate when
writing this code.
3. If you, er, we want to do brace expansion exactly like you see within
bash, then we should write another function that tokenizes a string on
whitespace and does brace expansion on each token. I was expecting the
caller of rw_brace_expand() to expect the function to do brace expansion,
not complete shell emulation.
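
A rough sketch of what the wrapper from point 3 might look like.
expand_token() is a simplified stand-in for rw_brace_expand(), whose real
signature isn't shown in this message, and expand_line() is a made-up name.
The stand-in handles only unescaped comma lists (no sequence expressions, no
escapes) and, following the stricter reading argued for above, leaves a
token unchanged when its first brace group is invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> StrVec;

// Simplified stand-in for rw_brace_expand() (hypothetical): expands the
// comma lists in one whitespace-free token.
StrVec expand_token (const std::string &tok)
{
    const std::size_t open = tok.find ('{');
    if (std::string::npos == open)
        return StrVec (1, tok);

    int depth = 0;
    std::size_t close = std::string::npos;
    std::vector<std::size_t> ends;   // positions that end each alternative

    for (std::size_t i = open; i < tok.size (); ++i) {
        if ('{' == tok [i])
            ++depth;
        else if ('}' == tok [i]) {
            assert (depth > 0);      // the scan starts on a '{'
            if (0 == --depth) {
                close = i;
                break;
            }
        }
        else if (',' == tok [i] && 1 == depth)
            ends.push_back (i);
    }

    // no matching close brace or no top-level comma: leave unchanged
    if (std::string::npos == close || ends.empty ())
        return StrVec (1, tok);

    ends.push_back (close);

    const std::string prefix = tok.substr (0, open);
    const StrVec tails = expand_token (tok.substr (close + 1));

    StrVec out;
    std::size_t begin = open + 1;
    for (std::size_t e = 0; e < ends.size (); ++e) {
        // each alternative may itself contain a nested brace expansion
        const StrVec alts =
            expand_token (tok.substr (begin, ends [e] - begin));
        for (std::size_t a = 0; a < alts.size (); ++a)
            for (std::size_t t = 0; t < tails.size (); ++t)
                out.push_back (prefix + alts [a] + tails [t]);
        begin = ends [e] + 1;
    }
    return out;
}

// Hypothetical wrapper: splits `line` on whitespace, expands each token
// on its own, and joins all the results with single spaces.
std::string expand_line (const std::string &line)
{
    std::istringstream in (line);
    std::string tok;
    std::string out;

    while (in >> tok) {
        const StrVec words = expand_token (tok);
        for (std::size_t i = 0; i < words.size (); ++i) {
            if (!out.empty ())
                out += ' ';
            out += words [i];
        }
    }
    return out;
}
```

With this split, expand_line ("foo {1,2} bar") gives "foo 1 2 bar" while
expand_token() only ever sees 'foo', '{1,2}' and 'bar'. Because the wrapper
tokenizes on whitespace, "{  }" comes back as "{ }", which may be the kind
of whitespace collapse mentioned above.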

Travis

-- 
View this message in context: 
http://www.nabble.com/svn-commit%3A-r628839----stdcxx-trunk-tests-self-0.braceexp.cpp-tp15551361p15560343.html
Sent from the stdcxx-commits mailing list archive at Nabble.com.

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp
