Re: Line continuation and variables
On Wed, Oct 29, 2014 at 10:52:30PM +0100, Jilles Tjoelker wrote:
>
> This implementation of pgetc_eatbnl() does not allow pushing back a
> backslash, since that would call pungetc() twice without an intervening
> pgetc(). However, some places do attempt to push back a backslash. As a
> result, a script file containing many repeated ${w#\#} will not be
> parsed correctly. There is a similar bug with repeated $\# but this is
> not specified by POSIX.

Good catch! I guess I'll do something similar to tokpushback to
handle this.

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Line continuation and variables
On Mon, Sep 29, 2014 at 10:55:07PM +0800, Herbert Xu wrote:
> On Tue, Aug 26, 2014 at 12:34:42PM +, Eric Blake wrote:

> [snip]
> > So the fact that dash is treating the elided backslash-newline as a
> > token separator, and parsing your input as if ${EDIT}OR instead of
> > ${EDITOR} is a bug in dash.

> I agree. The following patch should fix this:

> commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
> Author: Herbert Xu
> Date: Mon Sep 29 22:52:41 2014 +0800

> [PARSER] Handle backslash newlines properly after dollar sign

> [snip]
> diff --git a/ChangeLog b/ChangeLog
> index 0fbc514..398bd15 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,6 +1,7 @@
>  2014-09-29 Herbert Xu
>
>  	* Kill pgetc_macro.
> +	* Handle backslash newlines properly after dollar sign.
>
>  2014-09-28 Herbert Xu
>
> diff --git a/src/parser.c b/src/parser.c
> index c4eaae2..2b07437 100644
> --- a/src/parser.c
> +++ b/src/parser.c
> @@ -827,6 +827,24 @@ breakloop:
>  #undef RETURN
>  }
>
> +static int pgetc_eatbnl(void)
> +{
> +	int c;
> +
> +	while ((c = pgetc()) == '\\') {
> +		if (pgetc() != '\n') {
> +			pungetc();
> +			break;
> +		}
> +
> +		plinno++;
> +		if (doprompt)
> +			setprompt(2);
> +	}
> +
> +	return c;
> +}
> +
>
>
>  /*

This implementation of pgetc_eatbnl() does not allow pushing back a
backslash, since that would call pungetc() twice without an intervening
pgetc(). However, some places do attempt to push back a backslash. As a
result, a script file containing many repeated ${w#\#} will not be
parsed correctly. There is a similar bug with repeated $\# but this is
not specified by POSIX.

-- 
Jilles Tjoelker
Re: Line continuation and variables
On Mon, Sep 29, 2014 at 10:55:07PM +0800, Herbert Xu wrote:
>
> I agree. The following patch should fix this:
>
> commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
> Author: Herbert Xu
> Date: Mon Sep 29 22:52:41 2014 +0800
>
> [PARSER] Handle backslash newlines properly after dollar sign

Here is a small clean-up on top of it:

commit 6df87cf1d4b7c0c490ab1803b863de10579df92e
Author: Herbert Xu
Date: Mon Sep 29 22:53:53 2014 +0800

    [PARSER] Add nlprompt/nlnoprompt helpers

    This patch adds the nlprompt/nlnoprompt helpers to isolate code
    dealing with newlines and prompting.

    Signed-off-by: Herbert Xu

diff --git a/ChangeLog b/ChangeLog
index 398bd15..f161a13 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,7 @@
 
 	* Kill pgetc_macro.
 	* Handle backslash newlines properly after dollar sign.
+	* Add nlprompt/nlnoprompt helpers.
 
 2014-09-28 Herbert Xu
 
diff --git a/src/parser.c b/src/parser.c
index 2b07437..f6c43be 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -743,6 +743,19 @@ out:
 	return (t);
 }
 
+static void nlprompt(void)
+{
+	plinno++;
+	if (doprompt)
+		setprompt(2);
+}
+
+static void nlnoprompt(void)
+{
+	plinno++;
+	needprompt = doprompt;
+}
+
 
 /*
  * Read the next input token.
@@ -786,16 +799,13 @@ xxreadtoken(void)
 			continue;
 		case '\\':
 			if (pgetc() == '\n') {
-				plinno++;
-				if (doprompt)
-					setprompt(2);
+				nlprompt();
 				continue;
 			}
 			pungetc();
 			goto breakloop;
 		case '\n':
-			plinno++;
-			needprompt = doprompt;
+			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
@@ -837,9 +847,7 @@ static int pgetc_eatbnl(void)
 			break;
 		}
 
-		plinno++;
-		if (doprompt)
-			setprompt(2);
+		nlprompt();
 	}
 
 	return c;
@@ -913,9 +921,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 				if (syntax == BASESYNTAX)
 					goto endword;	/* exit outer loop */
 				USTPUTC(c, out);
-				plinno++;
-				if (doprompt)
-					setprompt(2);
+				nlprompt();
 				c = pgetc();
 				goto loop;		/* continue outer loop */
 			case CWORD:
@@ -934,9 +940,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 					USTPUTC('\\', out);
 					pungetc();
 				} else if (c == '\n') {
-					plinno++;
-					if (doprompt)
-						setprompt(2);
+					nlprompt();
 				} else {
 					if (
 						dblquote &&
@@ -1092,8 +1096,7 @@ checkend: {
 		if (c == '\n' || c == PEOF) {
 			c = PEOF;
-			plinno++;
-			needprompt = doprompt;
+			nlnoprompt();
 		} else {
 			int len;
 
@@ -1342,9 +1345,7 @@ parsebackq: {
 
 			case '\\':
 				if ((pc = pgetc()) == '\n') {
-					plinno++;
-					if (doprompt)
-						setprompt(2);
+					nlprompt();
 					/*
 					 * If eating a newline, avoid putting
 					 * the newline into the new character
@@ -1366,8 +1367,7 @@ parsebackq: {
 				synerror("EOF in backquote substitution");
 
 			case '\n':
-				plinno++;
-				needprompt = doprompt;
+				nlnoprompt();
 				break;
 
 			default:

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Line continuation and variables
On Tue, Aug 26, 2014 at 12:34:42PM +, Eric Blake wrote:
> On 08/26/2014 06:15 AM, Oleg Bulatov wrote:
> > Hi!
> >
> > While playing with sh generators I found that dash and bash have different
> > interpretations for <backslash><newline> sequence.
> >
> > $ dash -c 'EDIT=xxx; echo $EDIT\
> >> OR'
> > xxxOR
>
> Buggy.
>
> > $ bash -c 'EDIT=xxx; echo $EDIT\
> > OR'
> > /usr/bin/vim
>
> Correct behavior.
>
> >
> > $ dash -c 'echo "$\
> > (pwd)"'
> > $(pwd)
> >
> > Is it undefined behaviour in POSIX?
>
> No, it's well-defined, and dash is buggy. POSIX says:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
>
> "the shell shall break its input into tokens by applying the first
> applicable rule below to the next character in its input"
>
> Rule 4 covers backslash handling, while rule 5 covers locating the end
> of a word to be subject to $ expansion. Therefore, rule 4 should happen
> first. Rule 4 defers to the section on quoting, with the caveat that
> <backslash><newline> joining is the only substitution that happens immediately as
> part of the parsing:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02
>
> "If a <newline> follows the <backslash>, the shell shall interpret this
> as line continuation. The <backslash> and <newline> shall be removed
> before splitting the input into tokens. Since the escaped <newline> is
> removed entirely from the input and is not replaced by any white space,
> it cannot serve as a token separator."
>
> So the fact that dash is treating the elided backslash-newline as a
> token separator, and parsing your input as if ${EDIT}OR instead of
> ${EDITOR} is a bug in dash.

I agree. The following patch should fix this:

commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
Author: Herbert Xu
Date: Mon Sep 29 22:52:41 2014 +0800

    [PARSER] Handle backslash newlines properly after dollar sign

    This patch should resolve this problem and similar ones affecting
    backslash newlines after we encounter a dollar sign.

    Signed-off-by: Herbert Xu

diff --git a/ChangeLog b/ChangeLog
index 0fbc514..398bd15 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,6 +1,7 @@
 2014-09-29 Herbert Xu
 
 	* Kill pgetc_macro.
+	* Handle backslash newlines properly after dollar sign.
 
 2014-09-28 Herbert Xu
 
diff --git a/src/parser.c b/src/parser.c
index c4eaae2..2b07437 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -827,6 +827,24 @@ breakloop:
 #undef RETURN
 }
 
+static int pgetc_eatbnl(void)
+{
+	int c;
+
+	while ((c = pgetc()) == '\\') {
+		if (pgetc() != '\n') {
+			pungetc();
+			break;
+		}
+
+		plinno++;
+		if (doprompt)
+			setprompt(2);
+	}
+
+	return c;
+}
+
 
 
 /*
@@ -1179,7 +1197,7 @@ parsesub: {
 	char *p;
 	static const char types[] = "}-+?=";
 
-	c = pgetc();
+	c = pgetc_eatbnl();
 	if (
 		(checkkwd & CHKEOFMARK) ||
 		c <= PEOA ||
@@ -1188,7 +1206,7 @@ parsesub: {
 		USTPUTC('$', out);
 		pungetc();
 	} else if (c == '(') {	/* $(
Re: Line continuation and variables
On 08/26/2014 06:15 AM, Oleg Bulatov wrote:
> Hi!
>
> While playing with sh generators I found that dash and bash have different
> interpretations for <backslash><newline> sequence.
>
> $ dash -c 'EDIT=xxx; echo $EDIT\
>> OR'
> xxxOR

Buggy.

> $ bash -c 'EDIT=xxx; echo $EDIT\
> OR'
> /usr/bin/vim

Correct behavior.

>
> $ dash -c 'echo "$\
> (pwd)"'
> $(pwd)
>
> Is it undefined behaviour in POSIX?

No, it's well-defined, and dash is buggy. POSIX says:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03

"the shell shall break its input into tokens by applying the first
applicable rule below to the next character in its input"

Rule 4 covers backslash handling, while rule 5 covers locating the end
of a word to be subject to $ expansion. Therefore, rule 4 should happen
first. Rule 4 defers to the section on quoting, with the caveat that
<backslash><newline> joining is the only substitution that happens immediately as
part of the parsing:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02

"If a <newline> follows the <backslash>, the shell shall interpret this
as line continuation. The <backslash> and <newline> shall be removed
before splitting the input into tokens. Since the escaped <newline> is
removed entirely from the input and is not replaced by any white space,
it cannot serve as a token separator."

So the fact that dash is treating the elided backslash-newline as a
token separator, and parsing your input as if ${EDIT}OR instead of
${EDITOR} is a bug in dash.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Line continuation and variables
Hi!

While playing with sh generators I found that dash and bash have different
interpretations for <backslash><newline> sequence.

$ dash -c 'EDIT=xxx; echo $EDIT\
> OR'
xxxOR

$ bash -c 'EDIT=xxx; echo $EDIT\
OR'
/usr/bin/vim

$ dash -c 'echo "$\
(pwd)"'
$(pwd)

Is it undefined behaviour in POSIX?

-- 
WBR,
Oleg Bulatov