[CC'ing the JPMS expert mailing list because it's an important issue IMO]

I have a counter-proposal.
I do not like your proposal because, from the user's point of view, '^' looks like a hack: it's not used anywhere else in the grammar.

I agree that restricted keywords are not properly specified in the JLS. Reading your mail, I've discovered that what I was calling restricted keywords is not what javac implements :(
I agree that restricted keywords should only be enabled when parsing module-info.java.
I agree that doing error recovery on the grammar for module-info as currently implemented in javac leads to less than ideal error messages.

In my opinion, both
   module m { requires transitive transitive; }
   module m { requires transitive; }
should be rejected, because what javac implements is something closer to the JavaScript ASI rules than to restricted keywords as currently specified by Alex.

For me, a restricted keyword is a keyword which is activated if you are at a position in the grammar where it can be recognized and, because it's a keyword, it takes precedence over an identifier.
For example, for
   module m {
if the next token is 'requires', it should be recognized as a keyword because you can parse a directive 'requires ...', so there is a production that starts with the 'requires' keyword.

So
   module m { requires transitive; }
should be rejected because transitive should be recognized as a keyword after requires, and the compiler should report a missing module name.

And
   module m { requires transitive transitive; }
should be rejected because the grammar that parses the modifiers is defined as "a loop", so from the grammar's point of view it's like
   module m { requires Modifier Modifier; }
so the front end of the compiler should report a missing module name and a later phase should report that the same modifier 'transitive' appears twice.

I believe that with this definition of 'restricted keyword', the compiler can recover from errors more easily and offer meaningful error messages, and the module-info part of the grammar is LR(1).
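To make this definition concrete, here is a minimal sketch in Java of how a 'requires' directive could be checked under it. This is not javac's code: the class name RestrictedKeywordSketch, the method checkRequires and the error strings are made up for illustration, the tokenization is deliberately naive, and the modifier set (transitive, static) is assumed. It only handles the shape "requires {Modifier} Name ;".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not javac's parser. Handles only: requires {Modifier} Name ; */
final class RestrictedKeywordSketch {
    // Assumed modifier set for 'requires'.
    private static final Set<String> MODIFIERS = Set.of("transitive", "static");

    /** Returns the diagnostics for one directive; an empty list means it is accepted. */
    static List<String> checkRequires(String directive) {
        List<String> errors = new ArrayList<>();
        // Naive tokenization: isolate ';' and split on whitespace.
        String[] tokens = directive.replace(";", " ; ").trim().split("\\s+");
        int pos = 0;
        if (!tokens[pos].equals("requires")) {
            errors.add("expected 'requires'");
            return errors;
        }
        pos++;
        // The modifier list is "a loop": at this position a modifier keyword can be
        // recognized, so the keyword reading always wins over the identifier reading.
        List<String> seen = new ArrayList<>();
        while (pos < tokens.length && MODIFIERS.contains(tokens[pos])) {
            seen.add(tokens[pos++]);
        }
        // Front end: a module name must follow the modifiers.
        if (pos >= tokens.length || tokens[pos].equals(";")) {
            errors.add("missing module name");
        } else {
            pos++; // consume the module name (a dotted name is one token here)
            if (pos >= tokens.length || !tokens[pos].equals(";")) {
                errors.add("expected ';'");
            }
        }
        // Later phase: report a modifier that appears more than once.
        for (int i = 0; i < seen.size(); i++) {
            if (seen.indexOf(seen.get(i)) < i) {
                errors.add("repeated modifier '" + seen.get(i) + "'");
            }
        }
        return errors;
    }
}
```

With this sketch, "requires transitive;" yields a single "missing module name" diagnostic, "requires transitive transitive;" yields that diagnostic plus a repeated-modifier one, and "requires transitive java.sql;" is accepted. One token of lookahead is enough at every step.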
regards,
Rémi

----- Original Message -----
> From: "Stephan Herrmann" <stephan.herrm...@berlin.de>
> To: jigsaw-dev@openjdk.java.net
> Sent: Tuesday, 9 May 2017 16:56:11
> Subject: An alternative to "restricted keywords"

> (1) I understand the need for avoiding that new module-related
> keywords conflict with existing code, where these words may be used
> as identifiers. Moreover, it must be possible for a module declaration
> to refer to packages or types thusly named.
>
> However,
>
> (2) The currently proposed "restricted keywords" are not appropriately
> specified in JLS.
>
> (3) The currently proposed "restricted keywords" pose difficulties to
> the implementation of all tools that need to parse a module declaration.
>
> (4) A simple alternative to "restricted keywords" exists, which has not
> received the attention it deserves.
>
> Details:
>
> (2) The current specification implicitly violates the assumption that
> parsing can be performed on the basis of a token stream produced by
> a scanner (aka lexer). From discussion on this list we learned that
> the following examples are intended to be syntactically legal:
>    module m { requires transitive transitive; }
>    module m { requires transitive; }
> (Please for the moment disregard heuristic solutions, while we are
> investigating whether generally "restricted keywords" is a well-defined
> concept, or not.)
> Of the three occurrences of "transitive", #1 is a keyword, the others
> are identifiers. At the point when the parser has consumed "requires"
> and now asks about classification of the word "transitive", the scanner
> cannot possibly answer this classification. It can only answer for sure,
> after the *parser* has accepted the full declaration. Put differently,
> the parser must consume more tokens than have been classified by the
> Scanner.
> Put differently, to faithfully parse arbitrary grammars using
> a concept of "restricted keywords", scanners must provide speculative
> answers, which may later need to be revised by backtracking or similar
> exhaustive exploration of the space of possible interpretations.
>
> The specification is totally silent about this fundamental change.
>
>
> (3) "restricted keywords" pose three problems to tool implementations:
>
> (3.a) Any known practical approach to implement a parser with
> "restricted keywords" requires leveraging heuristics, which are based
> on the exact set of rules defined in the grammar. Such heuristics
> reduce the look-ahead that needs to be performed by the scanner,
> in order to avoid the full exhaustive exploration mentioned above.
> A set of such heuristics is extremely fragile and can easily break when
> later more rules are added to the grammar. This means small future
> language changes can easily break any chosen strategy.
>
> (3.b) If parsing works for error-free input, this doesn't imply that
> a parser will be able to give any useful answer for input with syntax
> errors. As a worst-case example consider an arbitrary input sequence
> consisting of just the two words "requires" and "transitive" in random
> order and with no punctuation.
> A parser will not be able to detect any structure in this sequence.
> By comparison, normal keywords serve as a baseline, where parsing
> typically can resume regardless of any leading garbage.
> While this is not relevant for normal compilation, it is paramount
> for assistive functions, which most of the time operate on incomplete
> text, likely to contain even syntax errors.
> Strictly speaking, any "module declaration" with syntax errors is
> not a ModuleDeclaration, and thus none of the "restricted keywords"
> can be interpreted as keywords (which per JLS can only happen inside
> a ModuleDeclaration).
> All this means that functionality like code completion is
> systematically broken in a language using "restricted keywords".
>
> (3.c) Other IDE functionality assumes that small fragments of the
> input text can be scanned out of context. The classical example here
> is syntax highlighting but there are more examples.
> Any such functionality has to be re-implemented, replacing the
> highly efficient local scanning with full parsing of the input text.
> For functionality that is implicitly invoked per keystroke, or on
> mouse hover etc, this difference in efficiency negatively affects
> the overall user experience of an IDE.
>
>
> (4) The following proposal avoids all difficulties described above:
>
> * open, module, requires, transitive, exports, opens, to, uses,
>   provides, and with are "module words", to which the following
>   interpretation is applied:
>   * within any ordinary compilation unit, a module word is a normal
>     identifier.
>   * within a modular compilation unit, all module words are
>     (unconditional) keywords.
> * We introduce three new auxiliary non-terminals:
>     LegacyPackageName:
>       LegacyIdentifier
>       LegacyPackageName . LegacyIdentifier
>     LegacyTypeName:
>       LegacyIdentifier
>       LegacyTypeName . LegacyIdentifier
>     LegacyIdentifier:
>       Identifier
>       ^open
>       ^module
>       ...
>       ^with
> * We modify all productions in 7.7, replacing PackageName with
>   LegacyPackageName and replacing TypeName with LegacyTypeName.
> * After parsing, each of the words '^open', '^module' etc.
>   is interpreted by removing the leading '^' (escape character).
>
> Here, '^' is chosen as the escape character following the precedent
> of Xtext. Plenty of other options for this purpose are possible, too.
>
>
>
> This proposal completely satisfies the requirements (1), and avoids
> all of the problems (2) and (3). There's an obvious price to pay:
> users will have to add the escape character when referring to code
> that uses a module word as a package name or type name.
>
> Not only is this a very low price compared to the benefits; one can
> even argue that it also helps the human reader of a module declaration,
> because it clearly marks which occurrences of a module word are indeed
> identifiers.
>
> An IDE can easily help in interactively adding escapes where necessary.
>
> Finally, in this trade-off it is relevant to consider the expected
> frequencies: legacy names (needing escape) will surely be the exception
> - by orders of magnitude. So, the little price needing to be paid will
> only affect a comparatively small number of locations.
>
>
> Stephan
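
For comparison, the scanner side of the quoted proposal (4) could look roughly like the following Java sketch. It is only an illustration under assumptions: the names ModuleWordScanner, classify and identifierText are hypothetical, and a '^' in front of a word that is not a module word is simply left untouched, since the proposal does not say what should happen there.

```java
import java.util.Set;

/** Hypothetical sketch of the quoted proposal (4); not an existing tool's scanner. */
final class ModuleWordScanner {
    enum Kind { KEYWORD, IDENTIFIER }

    // The ten "module words" listed in the proposal.
    private static final Set<String> MODULE_WORDS = Set.of(
            "open", "module", "requires", "transitive", "exports",
            "opens", "to", "uses", "provides", "with");

    /** Classifies one word of a modular compilation unit without any parser context. */
    static Kind classify(String word) {
        // Unescaped module words are unconditional keywords; everything else,
        // including escaped forms such as "^open", is an identifier.
        return MODULE_WORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER;
    }

    /** After parsing, an escaped module word is interpreted by dropping the leading '^'. */
    static String identifierText(String word) {
        if (word.startsWith("^") && MODULE_WORDS.contains(word.substring(1))) {
            return word.substring(1);
        }
        return word;
    }
}
```

Here classification depends only on the word itself, so a fragment can be scanned out of context, at the cost of writing e.g. "requires transitive ^transitive;" to require a module named transitive.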