Re: [1003.1(2016/18)/Issue7+TC2 0001454]: Conflict between "case" description and grammar

Robert Elz via austin-group-l at The Open Group Fri, 19 Feb 2021 10:45:14 -0800

    Date:        Fri, 19 Feb 2021 18:13:09 +0000
    From:        "Harald van Dijk via austin-group-l at The Open Group" <austin-group-l@opengroup.org>
    Message-ID:  <b7991fd5-e969-f215-0105-3efdf7113...@gigawatt.nl>



  | The grammar only allows the '(' in a case_item or case_item_ns.

Yes, as you will have seen from my later reply (to my own message), I
realised that later.

  |    case x in ( (x) echo match ;; esac
  |
  | is rejected because that first '(' does change the parse state, making 
  | the second '(' invalid.

Only a single '(' was ever dropped. Its point was to balance the
')' that follows the pattern, for things that count parentheses.
One of those was (unworkably) finding cmdsub termination in shells;
another is simply editors which believe that one should never write a ')'
without a matching preceding '(', and which object, or do other strange
things, if you try.

But I recall the original NetBSD sh code which did this (and so probably
the original ash code, though I haven't checked). Paraphrased, while
parsing the patterns (and their code) in a "case" statement, it was:

        if (token == '(')
                token = next_token();

        if (token == ESAC)      /* not quite as simple as that, but ... */
                break;          /* from the enclosing loop which
                                   loops over the patterns, and code */

        if (token != WORD)
                error(...);

        parse_pattern();
        while (token == '|') {
                token = next_token();
                parse_pattern();
        }

        parse_cmd_list("end with ;; or esac");
        
        if (token == DSEMI)
                token = next_token();
        else if (token != ESAC)
                error(...);

        if (token == ESAC)
                break;


(or something like that). When a '(' was seen just before a pattern might
appear, it was simply discarded, and the next token considered instead.

A second '(' at that point would be an error, as expected.
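
For anyone who wants to experiment, here is a minimal self-contained
sketch of that logic. It is an illustration only, not the NetBSD sh or
ash source: the token names, cur() and parse_case_items() are invented
for this example, the input is a pre-tokenized array, and the command
list of each item is reduced to a run of words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, invented for this sketch. */
enum tok { T_LPAREN, T_RPAREN, T_WORD, T_PIPE, T_DSEMI, T_ESAC, T_EOF };

/* Current token, or T_EOF once the input is exhausted. */
static enum tok cur(const enum tok *t, size_t n, size_t i)
{
        return i < n ? t[i] : T_EOF;
}

/*
 * Parse the tokens that follow "case WORD in".
 * Returns the number of case items seen, or -1 on a syntax error.
 */
int parse_case_items(const enum tok *t, size_t n)
{
        size_t i = 0;
        int items = 0;

        assert(t != NULL);

        for (;;) {
                if (cur(t, n, i) == T_LPAREN)
                        i++;                    /* drop one optional '(' */

                if (cur(t, n, i) == T_ESAC)
                        return items;

                if (cur(t, n, i) != T_WORD)
                        return -1;              /* a second '(' fails here */
                i++;                            /* first pattern */

                while (cur(t, n, i) == T_PIPE) {
                        i++;
                        if (cur(t, n, i) != T_WORD)
                                return -1;
                        i++;                    /* alternative pattern */
                }

                if (cur(t, n, i) != T_RPAREN)
                        return -1;
                i++;

                while (cur(t, n, i) == T_WORD)
                        i++;                    /* simplified command list */

                items++;

                if (cur(t, n, i) == T_DSEMI)
                        i++;
                else if (cur(t, n, i) != T_ESAC)
                        return -1;

                if (cur(t, n, i) == T_ESAC)
                        return items;
        }
}
```

Given the tokens of "( (x) echo match ;; esac" this sketch reports a
syntax error, while the tokens of "(x) echo match ;; esac" parse as one
item, with or without the leading '('.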

Our code has since changed a little.

So, perhaps "changes nothing about the parse state" is not quite correct,
in that having seen a '(' we don't look for any more of them, so we have
in effect remembered that it has already been dropped (if it appeared),
but at least originally, it truly had no other effect at all.

  | What I believe POSIX currently specifies is that "esac" is treated as an 
  | Esac token even when it follows '(', resulting in a syntax error.
  |
  | What I believe shells should do is what Geoff believes POSIX already 
  | requires, which is treat "esac" as a WORD token when it follows "(".

I can believe that either of those is what the standard currently says.
That's partly why we need an improvement.

And yes, I support making it very clear that no Esac can follow the '('.

kre
