Re: Line continuation and variables

2014-10-29 Thread Herbert Xu
On Wed, Oct 29, 2014 at 10:52:30PM +0100, Jilles Tjoelker wrote:
>
> This implementation of pgetc_eatbnl() does not allow pushing back a
> backslash, since that would call pungetc() twice without an intervening
> pgetc(). However, some places do attempt to push back a backslash. As a
> result, a script file containing many repeated  ${w#\#}  will not be
> parsed correctly. There is a similar bug with repeated  $\#  but this is
> not specified by POSIX.

Good catch! I guess I'll do something similar to tokpushback
to handle this.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Line continuation and variables

2014-10-29 Thread Jilles Tjoelker
On Mon, Sep 29, 2014 at 10:55:07PM +0800, Herbert Xu wrote:
> On Tue, Aug 26, 2014 at 12:34:42PM +0000, Eric Blake wrote:
> [snip]
> > So the fact that dash is treating the elided backslash-newline as a
> > token separator, and parsing your input as if ${EDIT}OR instead of
> > ${EDITOR} is a bug in dash.

> I agree.  The following patch should fix this:

> commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
> Author: Herbert Xu 
> Date:   Mon Sep 29 22:52:41 2014 +0800

> [PARSER] Handle backslash newlines properly after dollar sign
> [snip]

> diff --git a/ChangeLog b/ChangeLog
> index 0fbc514..398bd15 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,6 +1,7 @@
>  2014-09-29  Herbert Xu 
>  
>   * Kill pgetc_macro.
> + * Handle backslash newlines properly after dollar sign.
>  
>  2014-09-28  Herbert Xu 
>  
> diff --git a/src/parser.c b/src/parser.c
> index c4eaae2..2b07437 100644
> --- a/src/parser.c
> +++ b/src/parser.c
> @@ -827,6 +827,24 @@ breakloop:
>  #undef RETURN
>  }
>  
> +static int pgetc_eatbnl(void)
> +{
> +   int c;
> +
> +   while ((c = pgetc()) == '\\') {
> +       if (pgetc() != '\n') {
> +           pungetc();
> +           break;
> +       }
> +
> +       plinno++;
> +       if (doprompt)
> +           setprompt(2);
> +   }
> +
> +   return c;
> +}
> +
>  
>  
>  /*

This implementation of pgetc_eatbnl() does not allow pushing back a
backslash, since that would call pungetc() twice without an intervening
pgetc(). However, some places do attempt to push back a backslash. As a
result, a script file containing many repeated  ${w#\#}  will not be
parsed correctly. There is a similar bug with repeated  $\#  but this is
not specified by POSIX.
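
To make the failure mode concrete, here is a minimal sketch in C. It
assumes a reader with a single pushback slot; struct reader, rd_getc(),
rd_ungetc() and rd_getc_eatbnl() are hypothetical names that only model
the pgetc()/pungetc() calls in the quoted patch, they are not dash's
actual input code. The point it illustrates is that once the helper has
pushed back the character after the backslash, a caller that also wants
to push back the backslash itself has no way to do so.

```c
#include <assert.h>

#define RD_EOF  (-1)
#define RD_NONE (-2)    /* pushback slot empty / nothing read yet */

/* Hypothetical reader with ONE pushback slot (a model, not dash's input.c). */
struct reader {
    const char *s;      /* unread input */
    int last;           /* result of the most recent rd_getc() */
    int held;           /* pushed-back character, or RD_NONE */
};

static void rd_init(struct reader *r, const char *s)
{
    r->s = s;
    r->last = RD_NONE;
    r->held = RD_NONE;
}

static int rd_getc(struct reader *r)
{
    int c;

    if (r->held != RD_NONE) {
        c = r->held;
        r->held = RD_NONE;
    } else if (*r->s != '\0') {
        c = (unsigned char)*r->s++;
    } else {
        c = RD_EOF;
    }
    r->last = c;
    return c;
}

/* Undo the most recent rd_getc(). Only that one character can come back. */
static void rd_ungetc(struct reader *r)
{
    assert(r->last != RD_NONE);     /* something must have been read */
    r->held = r->last;
}

/* Modelled on the quoted pgetc_eatbnl(): skip backslash-newline pairs. */
static int rd_getc_eatbnl(struct reader *r)
{
    int c;

    while ((c = rd_getc(r)) == '\\') {
        if (rd_getc(r) != '\n') {
            rd_ungetc(r);           /* first pushback: char after the backslash */
            break;
        }
    }
    return c;
}

/* Returns what the next read yields after a caller tries to push back
 * the backslash that rd_getc_eatbnl() handed to it. */
static int demo_lost_backslash(void)
{
    struct reader r;

    rd_init(&r, "\\#}");            /* the three characters  \ # }  */
    (void)rd_getc_eatbnl(&r);       /* returns '\\'; '#' is already held */
    rd_ungetc(&r);                  /* second pushback without a read between */
    return rd_getc(&r);             /* '#', so the backslash has been lost */
}
```

In this model demo_lost_backslash() returns '#' rather than the
backslash, which is the kind of misparse described above for  ${w#\#} .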

-- 
Jilles Tjoelker


Re: Line continuation and variables

2014-09-29 Thread Herbert Xu
On Mon, Sep 29, 2014 at 10:55:07PM +0800, Herbert Xu wrote:
>
> I agree.  The following patch should fix this:
> 
> commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
> Author: Herbert Xu 
> Date:   Mon Sep 29 22:52:41 2014 +0800
> 
> [PARSER] Handle backslash newlines properly after dollar sign

Here is a small clean-up on top of it:

commit 6df87cf1d4b7c0c490ab1803b863de10579df92e
Author: Herbert Xu 
Date:   Mon Sep 29 22:53:53 2014 +0800

[PARSER] Add nlprompt/nlnoprompt helpers

This patch adds the nlprompt/nlnoprompt helpers to isolate code
dealing with newlines and prompting.

Signed-off-by: Herbert Xu 

diff --git a/ChangeLog b/ChangeLog
index 398bd15..f161a13 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,7 @@
 
    * Kill pgetc_macro.
    * Handle backslash newlines properly after dollar sign.
+   * Add nlprompt/nlnoprompt helpers.
 
 2014-09-28  Herbert Xu 
 
diff --git a/src/parser.c b/src/parser.c
index 2b07437..f6c43be 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -743,6 +743,19 @@ out:
    return (t);
 }
 
+static void nlprompt(void)
+{
+   plinno++;
+   if (doprompt)
+       setprompt(2);
+}
+
+static void nlnoprompt(void)
+{
+   plinno++;
+   needprompt = doprompt;
+}
+
+
 
 /*
  * Read the next input token.
@@ -786,16 +799,13 @@ xxreadtoken(void)
            continue;
        case '\\':
            if (pgetc() == '\n') {
-               plinno++;
-               if (doprompt)
-                   setprompt(2);
+               nlprompt();
                continue;
            }
            pungetc();
            goto breakloop;
        case '\n':
-           plinno++;
-           needprompt = doprompt;
+           nlnoprompt();
            RETURN(TNL);
        case PEOF:
            RETURN(TEOF);
@@ -837,9 +847,7 @@ static int pgetc_eatbnl(void)
            break;
        }
 
-       plinno++;
-       if (doprompt)
-           setprompt(2);
+       nlprompt();
    }
 
    return c;
@@ -913,9 +921,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
                if (syntax == BASESYNTAX)
                    goto endword;   /* exit outer loop */
                USTPUTC(c, out);
-               plinno++;
-               if (doprompt)
-                   setprompt(2);
+               nlprompt();
                c = pgetc();
                goto loop;  /* continue outer loop */
            case CWORD:
@@ -934,9 +940,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
                    USTPUTC('\\', out);
                    pungetc();
                } else if (c == '\n') {
-                   plinno++;
-                   if (doprompt)
-                       setprompt(2);
+                   nlprompt();
                } else {
                    if (
                        dblquote &&
@@ -1092,8 +1096,7 @@ checkend: {
 
        if (c == '\n' || c == PEOF) {
            c = PEOF;
-           plinno++;
-           needprompt = doprompt;
+           nlnoprompt();
        } else {
            int len;
 
@@ -1342,9 +1345,7 @@ parsebackq: {
 
            case '\\':
                if ((pc = pgetc()) == '\n') {
-                   plinno++;
-                   if (doprompt)
-                       setprompt(2);
+                   nlprompt();
                    /*
                     * If eating a newline, avoid putting
                     * the newline into the new character
@@ -1366,8 +1367,7 @@ parsebackq: {
                synerror("EOF in backquote substitution");
 
            case '\n':
-               plinno++;
-               needprompt = doprompt;
+               nlnoprompt();
                break;
 
            default:

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: Line continuation and variables

2014-09-29 Thread Herbert Xu
On Tue, Aug 26, 2014 at 12:34:42PM +0000, Eric Blake wrote:
> On 08/26/2014 06:15 AM, Oleg Bulatov wrote:
> > Hi!
> > 
> > While playing with sh generators I found that dash and bash have different
> > interpretations for <backslash><newline> sequence.
> > 
> > $ dash -c 'EDIT=xxx; echo $EDIT\
> >> OR'
> > xxxOR
> 
> Buggy.
> 
> > $ bash -c 'EDIT=xxx; echo $EDIT\
> > OR'
> > /usr/bin/vim
> 
> Correct behavior.
> 
> > 
> > $ dash -c 'echo "$\
> > (pwd)"'
> > $(pwd)
> > 
> > Is it undefined behaviour in POSIX?
> 
> No, it's well-defined, and dash is buggy.  POSIX says:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
> 
> "the shell shall break its input into tokens by applying the first
> applicable rule below to the next character in its input"
> 
> Rule 4 covers backslash handling, while rule 5 covers locating the end
> of a word to be subject to $ expansion.  Therefore, rule 4 should happen
> first.  Rule 4 defers to the section on quoting, with the caveat that
> <newline> joining is the only substitution that happens immediately as
> part of the parsing:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02
> 
> "If a <newline> follows the <backslash>, the shell shall interpret this
> as line continuation. The <backslash> and <newline> shall be removed
> before splitting the input into tokens. Since the escaped <newline> is
> removed entirely from the input and is not replaced by any white space,
> it cannot serve as a token separator."
> 
> So the fact that dash is treating the elided backslash-newline as a
> token separator, and parsing your input as if ${EDIT}OR instead of
> ${EDITOR} is a bug in dash.

I agree.  The following patch should fix this:

commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
Author: Herbert Xu 
Date:   Mon Sep 29 22:52:41 2014 +0800

[PARSER] Handle backslash newlines properly after dollar sign

On Tue, Aug 26, 2014 at 12:34:42PM +0000, Eric Blake wrote:
> On 08/26/2014 06:15 AM, Oleg Bulatov wrote:
> > Hi!
> >
> > While playing with sh generators I found that dash and bash have different
> > interpretations for <backslash><newline> sequence.
> >
> > $ dash -c 'EDIT=xxx; echo $EDIT\
> >> OR'
> > xxxOR
>
> Buggy.
>
> > $ bash -c 'EDIT=xxx; echo $EDIT\
> > OR'
> > /usr/bin/vim
>
> Correct behavior.
>
> >
> > $ dash -c 'echo "$\
> > (pwd)"'
> > $(pwd)
> >
> > Is it undefined behaviour in POSIX?
>
> No, it's well-defined, and dash is buggy.  POSIX says:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
>
> "the shell shall break its input into tokens by applying the first
> applicable rule below to the next character in its input"
>
> Rule 4 covers backslash handling, while rule 5 covers locating the end
> of a word to be subject to $ expansion.  Therefore, rule 4 should happen
> first.  Rule 4 defers to the section on quoting, with the caveat that
> <newline> joining is the only substitution that happens immediately as
> part of the parsing:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02
>
> "If a <newline> follows the <backslash>, the shell shall interpret this
> as line continuation. The <backslash> and <newline> shall be removed
> before splitting the input into tokens. Since the escaped <newline> is
> removed entirely from the input and is not replaced by any white space,
> it cannot serve as a token separator."
>
> So the fact that dash is treating the elided backslash-newline as a
> token separator, and parsing your input as if ${EDIT}OR instead of
> ${EDITOR} is a bug in dash.

I agree.  This patch should resolve this problem and similar ones
affecting backslash newlines after we encounter a dollar sign.

Signed-off-by: Herbert Xu 

diff --git a/ChangeLog b/ChangeLog
index 0fbc514..398bd15 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,6 +1,7 @@
 2014-09-29  Herbert Xu 
 
    * Kill pgetc_macro.
+   * Handle backslash newlines properly after dollar sign.
 
 2014-09-28  Herbert Xu 
 
diff --git a/src/parser.c b/src/parser.c
index c4eaae2..2b07437 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -827,6 +827,24 @@ breakloop:
 #undef RETURN
 }
 
+static int pgetc_eatbnl(void)
+{
+   int c;
+
+   while ((c = pgetc()) == '\\') {
+       if (pgetc() != '\n') {
+           pungetc();
+           break;
+       }
+
+       plinno++;
+       if (doprompt)
+           setprompt(2);
+   }
+
+   return c;
+}
+
 
 
 /*
@@ -1179,7 +1197,7 @@ parsesub: {
    char *p;
    static const char types[] = "}-+?=";
 
-   c = pgetc();
+   c = pgetc_eatbnl();
    if (
        (checkkwd & CHKEOFMARK) ||
        c <= PEOA  ||
@@ -1188,7 +1206,7 @@ parsesub: {
        USTPUTC('$', out);
        pungetc();
    } else if (c == '(') {  /* $(

Re: Line continuation and variables

2014-08-26 Thread Eric Blake
On 08/26/2014 06:15 AM, Oleg Bulatov wrote:
> Hi!
> 
> While playing with sh generators I found that dash and bash have different
> interpretations for <backslash><newline> sequence.
> 
> $ dash -c 'EDIT=xxx; echo $EDIT\
>> OR'
> xxxOR

Buggy.

> $ bash -c 'EDIT=xxx; echo $EDIT\
> OR'
> /usr/bin/vim

Correct behavior.

> 
> $ dash -c 'echo "$\
> (pwd)"'
> $(pwd)
> 
> Is it undefined behaviour in POSIX?

No, it's well-defined, and dash is buggy.  POSIX says:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03

"the shell shall break its input into tokens by applying the first
applicable rule below to the next character in its input"

Rule 4 covers backslash handling, while rule 5 covers locating the end
of a word to be subject to $ expansion.  Therefore, rule 4 should happen
first.  Rule 4 defers to the section on quoting, with the caveat that
<newline> joining is the only substitution that happens immediately as
part of the parsing:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02

"If a <newline> follows the <backslash>, the shell shall interpret this
as line continuation. The <backslash> and <newline> shall be removed
before splitting the input into tokens. Since the escaped <newline> is
removed entirely from the input and is not replaced by any white space,
it cannot serve as a token separator."

So the fact that dash is treating the elided backslash-newline as a
token separator, and parsing your input as if ${EDIT}OR instead of
${EDITOR} is a bug in dash.
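
As a rough illustration of the rule quoted above, the following C sketch
removes each backslash-newline pair before any tokenizing happens, so
the two halves of a name end up adjacent. join_lines() is a hypothetical
helper, not a function from dash or any other shell, and it deliberately
ignores single quotes, comments and here-documents, where a backslash
can be literal; it only covers unquoted and double-quoted input such as
the examples in this thread.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src to dst, dropping every backslash-newline pair (line
 * continuation). A backslash followed by any other character is copied
 * together with that character, so an escaped backslash directly before
 * a newline is not mistaken for a continuation.
 *
 * Assumption: dst has room for strlen(src) + 1 bytes. Returns dst.
 */
static char *join_lines(char *dst, const char *src)
{
    char *out = dst;

    assert(dst != NULL && src != NULL);

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;               /* remove both characters, add nothing */
        } else if (src[0] == '\\' && src[1] != '\0') {
            *out++ = *src++;        /* keep the backslash ... */
            *out++ = *src++;        /* ... and the character it escapes */
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
    return dst;
}
```

Given the input  echo $EDIT\<newline>OR  this yields  echo $EDITOR ,
so a tokenizer that runs afterwards sees the single name EDITOR, not
${EDIT} followed by OR.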

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Line continuation and variables

2014-08-26 Thread Oleg Bulatov
Hi!

While playing with sh generators I found that dash and bash have different
interpretations for <backslash><newline> sequence.

$ dash -c 'EDIT=xxx; echo $EDIT\
> OR'
xxxOR
$ bash -c 'EDIT=xxx; echo $EDIT\
OR'
/usr/bin/vim

$ dash -c 'echo "$\
(pwd)"'
$(pwd)

Is it undefined behaviour in POSIX?

-- 
WBR, Oleg Bulatov