    Date:        Fri, 19 Feb 2021 15:11:58 +0000
    From:        "Harald van Dijk via austin-group-l at The Open Group" <austin-group-l@opengroup.org>
    Message-ID:  <4b4f2cbf-2a2e-f0bf-34ca-a7357f99c...@gigawatt.nl>

  | Observe that rule 4 is applied for the first word in a pattern even if
  | that pattern follows an opening parenthesis.

Yes.

  | Because of that, in my
  | example, the esac in parentheses is interpreted as the esac keyword
  | token, not a regular WORD token that makes for a valid pattern.

Yes.

  | Your change would make it so that since the esac keyword is not
  | acceptable at that point in the grammar,

Is it not?   Why not?

The statement "case foo in (esac" is valid according to the grammar,
just as "case foo in esac" is.

When the '(' was added, it was added (in shells) as a throwaway token,
which changes nothing about the parse state, and is permitted merely to
match the ')' required after the pattern (both for user style reasons,
and to handle this common usage of a lone ')' for shells that used
parentheses counting to find the end of a $() cmdsub) ... the latter
doesn't work anyway, and is largely gone everywhere now, but the optional
(and supposedly meaningless) '(' remains.

The issue here is that you seem to be expecting the shell to convert
"esac" to Esac only when the complete statement would then be valid.

That is, in:

        case esac in (esac) echo match
        esac

the first esac is clearly just a WORD (must be), the second is the one
in question. If that becomes Esac, the statement to that point is valid.
However, the following ')' is now not valid, as it would be the
termination of something (a subshell, a cmdsub, arithmetic, ...) and no
such thing exists. Similarly the 3rd "esac" (which is in a command word
position, so rule 1 applies, and it becomes Esac) would be invalid, as
there's no (unterminated) preceding "case" for it to be connected to.

On the other hand, interpret the 2nd "esac" as WORD, and it becomes a
pattern, and the whole statement parses correctly. Looks like that is
what should be done.

But that's impossible, and isn't what Geoff proposed; we cannot have the
shell parse the entire input looking for every possible variation of
what is a reserved word, or just WORD, hoping to find some combination
that might just work to make the whole script (or even just list, or
compound statement) legal.

"acceptable at that point" means the parser does a shift or reduce
when it encounters the token, rather than backtrack. That's all it
can reasonably possibly mean (though what "acceptable" means perhaps
ought to be better spelled out).

Consider two scripts (written here side by side in 2 columns for
comparison)

        case esac in                    case esac in
        (esac)                          (esac)
                echo match                      echo match
        # 500 more lines of             # the same 500 more lines of
        # executable code dropped       # executable code dropped
        esac                            # nothing here

Are you really saying that in the left side, the 2nd "esac" must be
WORD, and in the right side script, the 2nd "esac" must be Esac, because
those are the only interpretations that make the existence of that final
"esac" in the left side, and its lack, on the right side, be valid?

Really?

That would be completely unworkable.

We could, of course, change the grammar, to match what most shells do
now, and not allow the Esac after a '(' (just as it isn't allowed after
a '|' - though in that context it would make no sense at all, empty
patterns are not defined (omitted patterns, yes, but not empty ones)).

As things are now, to me it looks as if (of the shells I test) only zsh
gets things right (as defined by POSIX currently):

        zsh $ case esac in (esac) echo match
        esac
        zsh: parse error near `)'

which is exactly as I predicted above.

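For anyone wanting to repeat that kind of comparison, something along
these lines might do. It is only a rough sketch: the helper name
try_shell and the list of shell names are made up for illustration, and
what gets reported will depend on which shells (and which versions of
them) happen to be installed.

```shell
# Sketch only: try_shell and the list of shell names below are
# assumptions for illustration, not anything taken from the standard
# or from the proposed change.
snippet='case esac in (esac) echo match
esac'

try_shell() {
    # $1 = name of a shell to try; report and move on if it is missing.
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1: not installed"
        return 0
    fi
    # Feed the snippet on stdin, capturing stdout and stderr together.
    # "|| true" stops a failing shell aborting the harness under set -e.
    out=$(printf '%s\n' "$snippet" | "$1" 2>&1) || true
    if [ "$out" = "match" ]; then
        echo "$1: 2nd esac taken as WORD (printed match)"
    else
        echo "$1: did not simply print match: $out"
    fi
}

for name in sh bash dash ksh mksh yash zsh; do
    try_shell "$name"
done
```

Note that this feeds the snippet non-interactively, so the diagnostics
may differ in detail from what is seen when typing at a prompt.
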
bash's behaviour is a little weird:

        bash5 $ case esac in (esac) echo match
        -bash: syntax error near unexpected token `esac'
        bash5 $ esac
        -bash: syntax error near unexpected token `esac'

It is obviously converting the "esac" to Esac, which is correct
according to POSIX, but then apparently expecting it to be a pattern,
which is not correct; it should be terminating the case statement (as
zsh does), making the following ')' incorrect.

Everything else I tested treats the entire statement as valid (and
echoes "match"). The only way that can happen is if that 2nd "esac"
turns into WORD after the '(', which gives the '(' a grammatical meaning
it never originally had. (And yes, that includes the NetBSD sh).

  | it would not be interpreted as
  | the esac keyword, unless I am misreading your change.

I don't see that in the proposed change. However, given that almost all
shells interpret it as a WORD, and not Esac, perhaps the grammar ought
to be changed to make that the correct interpretation.

Geoff, I will ponder more whether all other rules, except rule 1,
work properly with the "acceptable at that point in the grammar"
included. I'm not sure yet (because I haven't gone through them all).

kre

ps: note that in the example tested, if that '(' before the "pattern" is
omitted, all shells appear to do the right thing, interpreting the 2nd
"esac" as Esac and using that to terminate the case (what problem they
object to in what comes next, which is clearly not valid, varies from
shell to shell).

pps: everyone, please ignore shareware_systems' utter nonsense.