Re: [1003.1(2016/18)/Issue7+TC2 0001454]: Conflict between "case" description and grammar

Robert Elz via austin-group-l at The Open Group Fri, 19 Feb 2021 09:57:13 -0800

    Date:        Fri, 19 Feb 2021 15:11:58 +0000
    From:        "Harald van Dijk via austin-group-l at The Open Group" <austin-group-l@opengroup.org>
    Message-ID:  <4b4f2cbf-2a2e-f0bf-34ca-a7357f99c...@gigawatt.nl>


  | Observe that rule 4 is applied for the first word in a pattern even if 
  | that pattern follows an opening parenthesis.

Yes.

  | Because of that, in my 
  | example, the esac in parentheses is interpreted as the esac keyword 
  | token, not a regular WORD token that makes for a valid pattern.

Yes.

  | Your change would make it so that since the esac keyword is not 
  | acceptable at that point in the grammar,

Is it not?   Why not?

The statement "case foo in (esac" is valid according to the grammar,
just as "case foo in esac" is.
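
The unparenthesized form is easy to check. What follows is only a minimal
sketch, assuming some POSIX-like shell is installed as "sh"; the function
name check_empty_case is made up for illustration. It says nothing about
"case foo in (esac", which is the form shells disagree about.

```shell
# Run an empty case statement in a child shell and report its exit status.
# "esac" directly after "in" is taken as the reserved word, so the case
# statement has no patterns at all and simply completes.
check_empty_case() {
    sh -c 'case foo in esac; echo "status=$?"'
}

check_empty_case
```

A case statement in which no pattern matches exits with status zero, so a
shell that accepts the input should report a zero status here.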

When the '(' was added, it was added (in shells) as a throwaway token,
which changes nothing about the parse state, and is permitted merely to
match the ')' required after the pattern (both for user style reasons,
and to handle the common usage of a lone ')' for shells that used
parenthesis counting to find the end of a $() cmdsub).  The latter doesn't
work anyway, and is largely gone everywhere now, but the optional (and
supposedly meaningless) '(' remains.

The issue here is that you seem to be expecting the shell to convert
"esac" to Esac only when the complete statement would then be valid.

That is, in:

    case esac in
    (esac) echo match
    esac

the first esac is clearly just a WORD (must be); the second is the
one in question.  If that becomes Esac, the statement to that point is
valid.  However, the following ')' is now not valid, as it would be
the termination of something (a subshell, a cmdsub, arithmetic, ...) and
no such thing exists.   Similarly the 3rd "esac" (which is in a command
word position, so rule 1 applies, and it becomes Esac) would be invalid,
as there's no (unterminated) preceding "case" for it to be connected to.

On the other hand, interpret the 2nd "esac" as WORD, it becomes a pattern,
and the whole statement parses correctly.   Looks like that is what should
be done.

But that's impossible, and isn't what Geoff proposed.  We cannot have the
shell parse the entire input looking for every possible variation of what
is a reserved word, or just WORD, hoping to find some combination that
might just work to make the whole script (or even just list, or compound
statement) legal.   "acceptable at that point" means the parser does a
shift or reduce when it encounters the token, rather than backtrack.
That's all it can reasonably mean (though what "acceptable" means
perhaps ought to be better spelled out).


Consider two scripts (written here side by side in 2 columns
for comparison)

        case esac in                      case esac in
        (esac)                            (esac)
                echo match                        echo match

                # 500 more lines of               # the same 500 more lines of
                # executable code dropped         # executable code dropped

        esac                              # nothing here

Are you really saying that in the left side script, the 2nd "esac" must be
WORD, and in the right side script, the 2nd "esac" must be Esac, because
those are the only interpretations that make the existence of that final
"esac" on the left side, and its absence on the right side, valid?

Really?

That would be completely unworkable.

We could, of course, change the grammar, to match what most shells do
now, and not allow the Esac after a '(' (just as it isn't allowed after
a '|', though in that context it would make no sense at all: empty
patterns are not defined (omitted patterns, yes, but not empty ones)).

As things are now, to me it looks as if (of the shells I test) only
zsh gets things right (as defined by posix currently):

zsh $    case esac in
   (esac) echo match
   esac
zsh: parse error near `)'

which is exactly as I predicted above.

bash's behaviour is a little weird:

bash5 $    case esac in
   (esac) echo match
-bash: syntax error near unexpected token `esac'
bash5 $    esac
-bash: syntax error near unexpected token `esac'

It is obviously converting the "esac" to Esac, which is correct according
to POSIX, but then apparently expecting it to be a pattern, which is
not correct; it should be terminating the case statement (as zsh
does), which makes the following ')' incorrect.

Everything else I tested treats the entire statement as valid (and echoes
"match"); the only way that can happen is if that 2nd "esac" turns into
WORD after the '(', which gives the '(' a grammatical meaning it never
originally had.   (And yes, that includes the NetBSD sh).
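
For anyone who wants to repeat the test non-interactively, a rough sketch
follows. The helper name probe_paren_case is invented here, the list of
shells is only a guess at what might be installed, and each shell is
assumed to accept -c. No result is claimed for any particular shell or
version beyond what is reported above.

```shell
# Feed the parenthesized test case to a given shell and classify the result.
#   word      the shell ran the case and printed "match", so the 2nd "esac"
#             was taken as a WORD (a pattern)
#   rejected  the shell refused the input, or printed something else
probe_paren_case() {
    out=$("$1" -c 'case esac in
(esac) echo match
esac' 2>/dev/null) || true
    if [ "$out" = match ]; then
        echo word
    else
        echo rejected
    fi
}

# Try whichever of these shells happen to be installed.
for sh_name in sh bash dash zsh; do
    if command -v "$sh_name" >/dev/null 2>&1; then
        printf '%s: %s\n' "$sh_name" "$(probe_paren_case "$sh_name")"
    fi
done
```

The probe only distinguishes "ran and matched" from everything else; it
does not tell apart the zsh style of error from the bash one.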

  | it would not be interpreted as 
  | the esac keyword, unless I am misreading your change.

I don't see that in the proposed change.   However, given that
almost all shells interpret it as a WORD, and not Esac, perhaps the
grammar ought to be changed to make that the correct interpretation.

Geoff, I will ponder more whether all other rules, except rule 1, work
properly with the "acceptable at that point in the grammar" included.
I'm not sure yet (because I haven't gone through them all).

kre

ps: note that in the example tested, if that '(' before the "pattern" is
omitted, all shells appear to do the right thing, interpreting the 2nd
"esac" as Esac and using that to terminate the case (what problem they
object to in what comes next, which is clearly not valid, varies from
shell to shell).
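
A sketch of that check without the '(' (again assuming a POSIX-like shell
installed as "sh"; the name probe_plain_case is made up for illustration):

```shell
# Same test input with the leading '(' removed.  The 2nd "esac" should be
# taken as the reserved word, closing the case at once and leaving a
# stray ')' behind, so the shell is expected to refuse the input.
probe_plain_case() {
    if sh -c 'case esac in
esac) echo match
esac' >/dev/null 2>&1; then
        echo accepted
    else
        echo rejected
    fi
}

probe_plain_case
```

The error message itself is discarded here, since its wording differs
from shell to shell; only acceptance or rejection is reported.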

pps: everyone, please ignore shareware_systems' utter nonsense.
