Re: An alternative to "restricted keywords" + helping automatic modules

Stephan Herrmann Fri, 19 May 2017 14:37:38 -0700

Meanwhile we seem to have (at least) 4 proposals on the table.


Here's my biased summary:

(A) JLS up until 2017-05-18
PRO:
+ we have a specification
+ spec can be interpreted as allowing all module words to be mentionable
  in all relevant directives.
CON:
- the specification is interpreted differently by different experts
- the interpretation that Alex intends requires parsing to be done
  ahead of scanning, which breaks established compiler technology.


(B) Remi: in case of ambiguity interpret as keyword, not identifier
PRO:
+ removes the need to parse more than you have scanned.
CON:
- makes "transitive" unmentionable as the first segment in a module name,
  and will cause the same effect on any modifier that may be added in
  the future.
- still requires stateful scanning, so syntax highlighting will
  typically still be partly broken. (Isn't that an aesthetic aspect, too?)
  Other IDE functions are affected, too.


(C) Remi + Stephen: disambiguate by adding "module" before each
module reference.
PRO:
+ removes the need to parse more than you have scanned.
+ avoids restriction regarding "transitive" and future modifiers.
CON:
- still requires stateful scanning; for the implications, see (B)


(D) Stephan: Escape module words to use them as identifiers
PRO:
+ all module words can be used in package & module references
+ allows referring to modules which have Java keywords in their name
+ avoids adding new technical complexity
+ clearly specified in the proposal, mostly using standard concepts
CON:
- some find the occasional escape character (aesthetically) unpleasant


People favoring (B) or (C) could further promote their case by
providing a (near) formal specification.


For anybody interested in further technical implications, I'd be happy
to provide pointers to plenty of IDE functions that would be broken
(to different degrees) by (A) - (C).
When I spoke about bad error recovery, I wasn't complaining about
a few more man days of engineering work, but about a conceptual
impossibility to achieve the quality that users should expect.


Let me add my interpretation of why we are in this strange situation
in the first place:
The language of module-info is an unusual mix of:
- something like a DSL for declaring API and dependencies of modules
- a subset of Java
If we took away the complexities of Java annotations and Java comments,
nobody would mind hand-coding an arbitrarily tricky parser that easily meets
all relevant goals. But nobody will hand-code a parser that is able to parse
a significant subset of Java.
It's the mix of both natures in one language that creates the conflict.

Finally, please don't take this as an issue of
    language design *vs.* tool implementation.
We can only make our users happy if both aspects integrate smoothly.

Stephan


On 19.05.2017 18:51, fo...@univ-mlv.fr wrote:

----- Original Message -----

From: "Stephan Herrmann" <stephan.herrm...@berlin.de>
To: fo...@univ-mlv.fr, jigsaw-dev@openjdk.java.net
Sent: Friday, 19 May 2017 17:26:02
Subject: Re: An alternative to "restricted keywords" + helping automatic modules

Inline

On 19.05.2017 15:53, fo...@univ-mlv.fr wrote:



------------------------------------------------------------------------------------------------------------------------------------

    *From: *"Stephan Herrmann" <stephan.herrm...@berlin.de>
    *To: *"John Rose" <john.r.r...@oracle.com>, jigsaw-dev@openjdk.java.net
    *Cc: *"Rémi Forax" <fo...@univ-mlv.fr>
    *Sent: *Friday, 19 May 2017 12:37:07
    *Subject: *Re: Re: An alternative to "restricted keywords" + helping automatic
    modules

    A quick question to keep the ball rolling:

    Do we agree on the following assessment of the status quo?

      The definition of "restricted keywords" implies (without explicitly
      saying so) that classification of a word as keyword vs. identifier can
      only be made *after* parsing has accepted the enclosing
      ModuleDeclaration.
      (With some tweaks, this can be narrowed down to
       "after the enclosing ModuleDirective has been accepted")

      This definition is not acceptable.


I agree that this is not acceptable but this is not what we are proposing.


Who is "we"?

Note that your proposal leads me to conclude that "transitive" is not a legal
start of a module reference. If that is not what you intend, please provide
a specification-like description of what you have in mind.
Probably Stephen's proposal will come in handy for this issue?


"transitive" is not a valid start of a module name if you want to use it in a
requires directive in Java, but it is a valid module name for the JVM: you can
create a module-info.class in a language other than Java.


Your notes about possible implementation may help when we come to implementing,
but right now they may also distract from understanding the intention.


We have gone down the rabbit hole of talking about implementation because you
asked for it; it is your point (3), '"restricted keywords" pose three problems
to tool implementations'.
The intention is to introduce restricted keywords (I prefer "local keywords").
To quote the current draft of the JLS: "They are keywords solely where they
appear as terminals in the ModuleDeclaration production (§7.7), and are
identifiers everywhere else". Developers will therefore not have to change
their existing Java code, because open, module, requires, transitive, exports,
opens, to, uses, provides, and with are keywords only locally, in
module-info.java.
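
As a small illustration of the point about existing code: in an ordinary
compilation unit (not module-info.java) all ten words stay plain identifiers.
The class and method names below are made up for the example; it is only a
sketch, not taken from the JLS draft.

```java
// Ordinary (non-module) compilation unit: the module words are identifiers here.
class OrdinaryCodeSketch {
    static int sum() {
        // Each restricted keyword is used as a local variable name.
        int open = 1, module = 2, requires = 3, transitive = 4, exports = 5;
        int opens = 6, to = 7, uses = 8, provides = 9, with = 10;
        return open + module + requires + transitive + exports
                + opens + to + uses + provides + with;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 55
    }
}
```

The same words would be classified as keywords only inside a module
declaration.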


Stephan


Rémi


You do not have to wait for the reduction of ModuleDeclaration (or
ModuleDirective): the parser knows its parsing state (the LR item) during
parsing, not only at the end.
LR analysis cannot know, at a given point during parsing, which production
will be reduced later, but it does know which terminals can be shifted next
without leading to an error.

In the middle of parsing, the parser shifts a terminal to go from one state to
another. So for each state the parser knows whether it can shift a terminal
that is in the set of restricted keywords. It can then either instruct the
lexer, before the token is scanned, to activate the restricted-keyword
automaton, or, after the token has been scanned, classify it as a keyword
instead of as an identifier.

The idea is that the parser reports not only when it reduces a production but
also when it is about to shift a restricted keyword. A token can therefore be
classified as an identifier or as a keyword, because the parser is able to
bubble up that its state (the LR item) may recognize a keyword.
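
A rough sketch of that idea, under heavy simplifying assumptions: instead of a
real LR automaton it uses three hand-written states for a single
"requires [transitive] Name ;" directive, and each state carries the set of
restricted keywords it could shift. All names (RestrictedKeywordSketch, State,
classify) are hypothetical; this is not how javac or any existing parser is
implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RestrictedKeywordSketch {
    // Hypothetical parser states; each knows which restricted keywords it can shift.
    enum State {
        DIRECTIVE_START(Set.of("requires")),
        AFTER_REQUIRES(Set.of("transitive")),
        IN_NAME(Set.<String>of());

        final Set<String> shiftableKeywords;

        State(Set<String> shiftableKeywords) {
            this.shiftableKeywords = shiftableKeywords;
        }
    }

    // Classifies each word of one directive using only the current parser state.
    static List<String> classify(String directive) {
        List<String> result = new ArrayList<>();
        State state = State.DIRECTIVE_START;
        // Simplification: whitespace, '.' and ';' only separate words here.
        for (String word : directive.trim().split("[\\s.;]+")) {
            // A word is a keyword only if the current state can shift it as one.
            boolean keyword = state.shiftableKeywords.contains(word);
            result.add(word + (keyword ? ":KEYWORD" : ":IDENTIFIER"));
            if (state == State.DIRECTIVE_START && keyword) {
                state = State.AFTER_REQUIRES;
            } else if (state == State.AFTER_REQUIRES) {
                // Sketch allows at most one modifier, then the module name starts.
                state = State.IN_NAME;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("requires transitive transitive;"));
        System.out.println(classify("requires foo.transitive;"));
    }
}
```

With this rule "requires transitive.foo;" classifies "transitive" as a keyword,
so a module name cannot start with that word.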



    comments?
    Stephan


Rémi


    ----- Original message ---------

    Subject: Re: An alternative to "restricted keywords" + helping automatic
    modules
    Date: Fri 19 May 2017 07:27:31 CEST
    From: John Rose<john.r.r...@oracle.com>
    To: Stephan Herrmann<stephan.herrm...@berlin.de>

    On May 18, 2017, at 1:59 AM, Stephan Herrmann <stephan.herrm...@berlin.de>
    wrote:


        In all posts I could not find a real reason against escaping,
        aside from aesthetics. I don't see this as sufficient motivation
        for a less-than-perfect solution.


    So, by disregarding esthetics...


        Clarity:
        I'm still not completely following your explanations, partly because
        of the jargon you are using. I'll leave it to Alex to decide if he
        likes the idea that JLS would have to explain terms like dotted
        production.

        Compare this to just adding a few more rules to the grammar,
        where no hand-waving is needed for an explanation.
        No, I did not say that escaping is a pervasive change.
        I never said that the grammar for ordinary compilation units
        should be changed.
        If you like we only need to extend one rule for the scope of
        modular compilation units: Identifier. It can't get simpler.


        Completeness:
        I understand you as saying that module names cannot start with
        "transitive". Mind you, every modifier that will be added
        to the grammar for modules in the future will cause conflicts for
        names that are now legal, and you won't have a means to resolve this.

        By contrast, we can use the escaping approach even to solve one
        more problem that has been briefly touched on this list before:

        Automatic modules suffer from the fact that some artifact names may
        have Java keywords in their name, which means that these artifacts
        simply cannot be used as automatic modules, right?
        Why not apply escaping also here? *Any* dot-separated sequence
        of words could be used as module name, as long as module references
        have a means to escape any keywords in that sequence.
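
To make the automatic-module problem concrete, here is a minimal sketch that
lists the segments of a dot-separated name that are Java keywords, i.e. the
segments that would need escaping under such a scheme. It relies on the
standard javax.lang.model.SourceVersion.isKeyword method; the class name, the
method name and the sample module names are invented for illustration, and the
escape syntax itself is deliberately not shown.

```java
import java.util.ArrayList;
import java.util.List;
import javax.lang.model.SourceVersion;

class ModuleNameSketch {
    // Returns the segments of a dot-separated name that are Java keywords.
    static List<String> keywordSegments(String dottedName) {
        List<String> hits = new ArrayList<>();
        for (String segment : dottedName.split("\\.")) {
            // isKeyword covers keywords and the literals true, false and null.
            if (SourceVersion.isKeyword(segment)) {
                hits.add(segment);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical artifact-derived names.
        System.out.println(keywordSegments("org.native.lib"));
        System.out.println(keywordSegments("org.example.lib"));
    }
}
```

A name with an empty result can already be written in a requires directive;
a non-empty result marks the segments that block it today.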


        Suitability for implementation:
        As said, your proposal resolves one problem, but still IDE
        functionality suffers from restricted keywords, because scanning
        and parsing need more context information than normal.


    …we obtain the freedom for IDEs to disregard abnormal
    amounts of context, saving uncounted machine cycles,

        - Recovery after a syntax error will regress.


    …and we make life easier for all ten writers of error recovery
    functions,

        - Scanning arbitrary regions of code is not possible.


    …we unleash the power of an army of grad students to study
    bidirectional parsing of module files,

        Remember:
        In an IDE code with syntax errors is the norm, not an exception,
        as the IDE provides functionality to work on incomplete code.


    …and ease the burdens of the thousands who must spend their
    time looking at syntax errors for their broken module files.

    Nope, not for me.  Give me esthetics, please.  Really.

    — John


    ---- End of original message ----
