Re: An alternative to "restricted keywords" + helping automatic modules

forax Thu, 18 May 2017 23:24:01 -0700

----- Original Message -----
> From: "Stephan Herrmann" <stephan.herrm...@berlin.de>
> To: "Remi Forax" <fo...@univ-mlv.fr>, jigsaw-dev@openjdk.java.net
> Sent: Thursday, 18 May 2017 10:59:09
> Subject: Re: An alternative to "restricted keywords" + helping automatic modules


> Remi,

Stephan,

>
> I see your proposal as a minimal compromise, avoiding the worst
> of difficulties, but I think we can do better.

better is usually a bitter enemy

>
> Trade-off:
> In all posts I could not find a real reason against escaping,
> aside from aesthetics. I don't see this as sufficient motivation
> for a less-than-perfect solution.
>
>
> Clarity:
> I'm still not completely following your explanations, partly because
> of the jargon you are using. I'll leave it to Alex to decide if he
> likes the idea that JLS would have to explain terms like dotted
> production.

Sorry for the jargon: a dotted production is the same thing as an LR parser
item [1]; the dot marks the parsing position inside a production.
I used 'dotted production' instead of 'parser item' because it is usually
clearer for my students.
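
To make the term concrete, here is a minimal Java sketch of an LR item. The
class name DottedProduction and the production ExportsDirective are invented
for illustration; they are not taken from javac, Eclipse or the JLS grammar.

```java
import java.util.List;

// Hypothetical illustration of an LR item ("dotted production"):
// a production plus a dot marking how much of it has been parsed.
final class DottedProduction {
    private final String lhs;
    private final List<String> rhs;
    private final int dot; // index in rhs of the next symbol to recognize

    DottedProduction(String lhs, List<String> rhs, int dot) {
        this.lhs = lhs;
        this.rhs = rhs;
        this.dot = dot;
    }

    // Move the dot one symbol to the right, i.e. one more symbol recognized.
    DottedProduction advance() {
        return new DottedProduction(lhs, rhs, dot + 1);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(lhs).append(" ->");
        for (int i = 0; i <= rhs.size(); i++) {
            if (i == dot) sb.append(" .");
            if (i < rhs.size()) sb.append(' ').append(rhs.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DottedProduction item = new DottedProduction(
            "ExportsDirective", List.of("exports", "PackageName", ";"), 1);
        System.out.println(item);           // 'exports' parsed, name expected
        System.out.println(item.advance()); // name parsed, ';' expected
    }
}
```

With the dot right after 'exports', the only thing that can come next is the
start of a package name, which is the "position" I keep referring to.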

>
> Compare this to just adding a few more rules to the grammar,
> where no hand-waving is needed for an explanation.
> No, I did not say that escaping is a pervasive change.
> I never said that the grammar for ordinary compilation units
> should be changed.
> If you like we only need to extend one rule for the scope of
> modular compilation units: Identifier. It can't get simpler.
>

I do not like ^ because
- as John said, aesthetics is important
- it pushes the burden onto the developers and not onto the people who
  implement the grammar.
  I'm OK with making your life, and the life of the people who implement the
  Java grammar (me included), less fun if it just works for all other Java
  developers; given the scale of the Java community, it seems to be a good
  compromise.
- ^ has to be a pervasive change. I mean it can be specified as a change only
  for module-info, but from the developers' point of view it will be weird if
  you introduce ^ in module-info and not in the whole grammar,
  so it's a global solution to a local problem.

So in my opinion, it's not that ^ does not work; as you said, it works in
Xtend. It's that ^ is an escape hatch, and it's better to use it only when
all other solutions do not work.

>
> Completeness:
> I understand you as saying, module names cannot start with
> "transitive". Mind you, that every modifier that will be added
> to the grammar for modules in the future will cause conflicts for
> names that are now legal, and you won't have a means to resolve this.
>
> By contrast, we can use the escaping approach even to solve one
> more problem that has been briefly touched on this list before:
>
> Automatic modules suffer from the fact that some artifact names may
> have Java keywords in their name, which means that these artifacts
> simply cannot be used as automatic modules, right?
> Why not apply escaping also here? *Any* dot-separated sequence
> of words could be used as module name, as long as module references
> have a means to escape any keywords in that sequence.
>
>
> Suitability for implementation:
> As said, your proposal resolves one problem, but still IDE
> functionality suffers from restricted keywords, because scanning
> and parsing need more context information than normal.
> - Recovery after a syntax error will regress.

Error recovery will not regress in any existing Java file because restricted
keywords only apply when parsing a module-info.
And technically, no regression is possible because module-info did not exist
before.
So error recovery after a syntax error in a module-info may be less fun to
handle; as I said above, I'm OK with that.


> - Scanning arbitrary regions of code is not possible.

Scanning an arbitrary region is not easy in general; for example, you have to
know whether you are inside or outside a string, so you have to keep some
information to be able to scan a region. Why not try to keep the parser state
when necessary?
As John said, it seems to be a nice problem for grad students, and at worst
you can use the existing code: it will display a restricted keyword in bold in
the middle of a package name, that's all.
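
The idea of checking the parser state before deciding whether a restricted
keyword is a keyword or an identifier can be sketched as follows. This is only
an illustration under simplifying assumptions: the class and method names are
invented, it is not javac or Eclipse code, it only models 'exports'/'opens'
style directives (not the modifiers after 'requires'), and it does no error
reporting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the scanner sees restricted keywords as keywords,
// and they are lowered to identifiers when the parser expects a name.
final class RestrictedKeywordSketch {
    // Restricted keywords appearing in the module-info grammar of this thread.
    static final Set<String> RESTRICTED = Set.of(
        "open", "module", "requires", "transitive", "static",
        "exports", "opens", "to", "uses", "provides", "with");

    // Words, or one of the punctuation marks '.', ',' and ';'.
    static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[.,;]");

    static List<String> classify(String directive) {
        List<String> out = new ArrayList<>();
        // The "parser state": true when the only legal next token is an
        // identifier (start of a name, or right after '.' or ',').
        boolean expectName = false;
        Matcher m = TOKEN.matcher(directive);
        while (m.find()) {
            String t = m.group();
            if (t.equals(".") || t.equals(",")) {
                out.add("PUNCT:" + t);
                expectName = true;
            } else if (t.equals(";")) {
                out.add("PUNCT:;");
                expectName = false;
            } else if (!expectName && RESTRICTED.contains(t)) {
                out.add("KEYWORD:" + t); // keyword is active at this position
                expectName = true;       // a name must follow
            } else {
                out.add("IDENT:" + t);   // lowered to (or already) an identifier
                expectName = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("exports to.transitive.with to to.module;"));
    }
}
```

In "exports to.transitive.with to to.module;" only the first word and the
'to' that follows the complete package name come out as keywords; every other
restricted word sits where only an identifier can appear.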

> Remember:
> In an IDE code with syntax errors is the norm, not an exception,
> as the IDE provides functionality to work on incomplete code.
>
>
> Stephan

Rémi

[1] https://en.wikipedia.org/wiki/Canonical_LR_parser

>
>
> On 18.05.2017 00:34, Remi Forax wrote:
>> I want to answer this before we start the meetings because I really think
>> that restricted keywords, as I propose them, solve the issues Stephan raised.
>>
>>
>> ----- Original Message -----
>>> From: "Stephan Herrmann" <stephan.herrm...@berlin.de>
>>> To: jigsaw-dev@openjdk.java.net
>>> Sent: Tuesday, 16 May 2017 11:49:45
>>> Subject: Re: An alternative to "restricted keywords"
>>
>>> Thanks, Remi, for taking this to the EG list.
>>>
>>> Some collected responses:
>>>
>>>
>>> Remi: "from the user point of view, '^' looks like a hack"
>>>
>>> This is, of course, a subjective statement. I don't share this view
>>> and in years of experience with Xtext-languages (where this concept
>>> is used by default) I never heard any user complain about this.
>>>
>>> More importantly, I hold that such aesthetic considerations are of
>>> much lesser significance than the question, whether we can explain
>>> - unambiguously explain - the concept in a few simple sentences.
>>> Explaining must be possible at two levels: in a rigorous specification
>>> and in simple words for users of the language.
>>
>> I'm not against ^, or ` as has already been requested to escape an
>> identifier, but as you said it's a pervasive change that applies to the
>> whole grammar, while I think that with restricted keywords (which really
>> should be called local keywords) the changes only impact the grammar that
>> specifies a module-info.java.
>>
>>>
>>> Remi: "a keyword which is activated if you are at a position in the
>>>  grammar where it can be recognized".
>>>
>>> I don't think 'being at a position in the grammar' is a good way of
>>> explaining. Parsing doesn't generally have one position in a grammar,
>>> multiple productions can be active in the same parser state.
>>> Also speaking of a "loop" for modifiers seems to complicate matters
>>> more than necessary.
>>>
>>> Under these considerations I still see '^' as the clearest of all
>>> solutions. Clear as a specification, simple to explain to users.
>>
>> Eclipse uses an LR parser; for an LR parser, position == dotted production
>> as I have written earlier, so no problem because it corresponds to only one
>> parser state.  Note that even if one does not use an LR or an LL parser,
>> most hand-written parsers I've seen (javac is one of them) also refer to
>> dotted productions in the comments of the corresponding methods.
>>
>>>
>>>
>>>
>>> Peter spoke about module names vs. package names.
>>>
>>> I think we agree, that module names cannot use "module words",
>>> whereas package names should be expected to contain them.
>>
>> Yes, that's the main issue: package names may contain unqualified names
>> like 'transitive', 'with' or 'to'.
>> But I think people will also want to use an existing package, or more
>> exactly a prefix of an existing package, as a module name, so we should also
>> support having a restricted keyword as part of a module name.
>>
>> The grammar is:
>>
>>   open? module module_name {
>>     requires (transitive | static)* module_name;
>>     exports package_name;
>>     exports package_name to module_name1, module_name2;
>>     opens package_name;
>>     opens package_name to module_name1, module_name2;
>>     uses xxx;
>>     provides xxx with xxx, yyy;
>>   }
>>
>> If we just consider package names, only 'opens' and 'exports' are followed
>> by a package name, and a package name can only be followed by ';' or 'to'.
>> So once 'opens' is parsed, you know that you can have only an identifier;
>> if the next token is not an identifier but one of the restricted keywords,
>> it should be considered as an identifier.
>>
>> As I said earlier, the scanner can see the restricted keyword as a keyword,
>> and before feeding the token to the parser, you can check the parser state
>> to see if the keyword has to be lowered to an identifier or not.
>>
>> For module names, there is the supplementary problem of transitive, because
>> if a module name starts with transitive, you can have a conflict. As I said
>> earlier, instead of using the next token to know if transitive is the
>> keyword or part of the module name, I think we should consider it as a
>> keyword; as the JLS says, a restricted keyword is activated when it can
>> appear, so "requires transitive" is not a valid directive.
>>
>>>
>>> Remi: "you should use reverse DNS naming for package so no problem :)"
>>>
>>> "to" is a "module word" and a TLD.
>>> I think we should be very careful in judging that an existing conflict
>>> is not a real problem. Better to clearly and rigorously avoid the
>>> conflict in the first place.
>>
>> 'to' as the first part of a package/module name and 'to' as in
>> exports ... to cannot be present in the same dotted production, because
>> exports has to be followed by a package_name, so 'to' here means the start
>> of a package name. Then, because a package name cannot end with '.', you
>> always know if you are inside the production recognizing the package_name or
>> outside, matching the 'to' of the exports directive.
>>
>>>
>>>
>>>
>>> Some additional notes from my side:
>>>
>>> In the escape-approach, it may be prudent to technically allow
>>> escaping even words that are identifiers in Java 9, but could become
>>> keywords in a future version. This ensures that modules which need
>>> more escaping in Java 9+X can still be parsed in Java 9.
>>
>> Yes, that's why I think that escaping is not the right mechanism here:
>> because we want to solve a very local problem, we do not need a global
>> grammar-wide way to solve our problem.
>>
>>>
>>>
>>> Current focus was on names of modules, packages and types.
>>> A complete solution must also give an answer for annotations on modules.
>>> Some possible solutions:
>>> a. Assume that annotations for modules are designed with modules in mind
>>>    and thus have to avoid any module words in their names.
>>> b. Support escaping also in annotations
>>> c. Refine the scope where "module words" are keywords, let it start only
>>>    when the word "module" or the group "open module" has been consumed.
>>>    This would make the words "module" and "open" special, as being
>>>    switch words, where we switch from one language to another.
>>>    (For this I previously coined the term "scoped keywords" [1])
>>
>> For annotations, again, because annotation names are qualified, you know
>> when you see 'module' whether you are in the middle of the annotation name
>> or outside.
>>
>>>
>>>
>>> I think we all agree that the conflicts we are solving here are rare
>>> corner cases. Most names do not contain module words. Still, from a
>>> conceptual and technical p.o.v. the solution must be bullet proof.
>>> But there's no need to be afraid of module declarations being spammed
>>> with dozens of '^' characters. Realistically, this will not happen.
>>>
>>
>> I agree, and I strongly believe that scoped keywords, local keywords or
>> restricted keywords, i.e. whatever the name, keywords that are keywords or
>> identifiers depending on the parser state, are the general mechanism that
>> solves our problem.
>>
>>> Stephan
>>>
>>> [1] http://www.objectteams.org/def/1.3/sA.html#sA.0.1
>>>
>>
>> Rémi
