[ 
https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171092#comment-15171092
 ] 

Jack Krupansky commented on SOLR-8110:
--------------------------------------

bq.  lucene expressions

I was going to say that Lucene Expressions are basically JavaScript, but 
they are only loosely based on JS: the basis is more conceptual than 
literal. Here's Lucene's grammar rule for VARIABLE:

{code}
VARIABLE: ID ARRAY* ( [.] ID ARRAY* )*;
fragment ARRAY: [[] ( STRING | INTEGER ) [\]];
fragment ID: [_$a-zA-Z] [_$a-zA-Z0-9]*;
fragment STRING
    : ['] ( '\\\'' | '\\\\' | ~[\\'] )*? [']
    | ["] ( '\\"' | '\\\\' | ~[\\"] )*? ["]
    ;
{code}

See:
https://github.com/apache/lucene-solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressions/js/Javascript.g4

No Unicode support and no arbitrary special characters in an ID, just $ and _, 
although a VARIABLE can apparently contain dots as well.

An ID is:

{code}
ID: [_$a-zA-Z] [_$a-zA-Z0-9]*
{code}

And any number of IDs can be written with dots between them to represent a 
single VARIABLE token.
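
As a rough illustration of what that grammar accepts, here is a minimal sketch that translates the quoted rules into Python regular expressions. The INTEGER pattern is an assumption (that fragment is not quoted above), and the helper name is_lucene_variable is made up for this example; this is not how Lucene itself parses expressions.

```python
import re

# ID and STRING follow the grammar fragments quoted above.
ID = r"[_$a-zA-Z][_$a-zA-Z0-9]*"
STRING = (
    r"(?:'(?:\\'|\\\\|[^\\'])*?'"    # single-quoted string
    r'|"(?:\\"|\\\\|[^\\"])*?")'     # double-quoted string
)
# Assumption: INTEGER is not quoted in the comment, so a plain run of
# ASCII digits stands in for it here.
INTEGER = r"[0-9]+"
ARRAY = r"\[(?:" + STRING + r"|" + INTEGER + r")\]"
# VARIABLE: ID ARRAY* ( [.] ID ARRAY* )*
VARIABLE = re.compile(
    ID + r"(?:" + ARRAY + r")*(?:\." + ID + r"(?:" + ARRAY + r")*)*"
)


def is_lucene_variable(name: str) -> bool:
    """Return True if the whole string matches the VARIABLE sketch."""
    return VARIABLE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["price", "doc.field_1", "doc['my field'].value",
                      "1abc", "my-field", "caf\u00e9"]:
        print(candidate, is_lucene_variable(candidate))
```

Under these assumptions, dotted names and bracketed subscripts match, while names that start with a digit, contain a hyphen, or use non-ASCII letters do not.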

JavaScript identifiers are defined in the ECMAScript spec:
https://tc39.github.io/ecma262/#prod-IdentifierName

Letters in Java/ECMAScript are Unicode, as defined by the Unicode properties 
"ID_Start" and "ID_Continue". Java/ECMAScript supports $ and _ in addition to 
letters.

Identifier start and continue character types are defined by the Unicode 
UAX#31 Identifier spec:
http://unicode.org/reports/tr31/
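
To make the contrast with the ASCII-only ID rule concrete, here is a small sketch. It rests on a loud assumption: Python's str.isidentifier(), which uses the closely related XID_Start/XID_Continue properties, stands in for ID_Start/ID_Continue, and $ is mapped to _ because Python does not accept $ in identifiers. Both function names are hypothetical, and the result only approximates the ECMAScript rules.

```python
import re

# ASCII-only ID rule, as quoted from the Lucene expressions grammar.
ASCII_ID = re.compile(r"[_$a-zA-Z][_$a-zA-Z0-9]*")


def is_lucene_id(name: str) -> bool:
    """ASCII letters, digits, '$' and '_'; must not start with a digit."""
    return ASCII_ID.fullmatch(name) is not None


def is_unicode_identifier(name: str) -> bool:
    """Approximate a Unicode-aware identifier check.

    Assumption: str.isidentifier() (XID_Start / XID_Continue plus '_')
    is used as a stand-in for ID_Start / ID_Continue, and '$' is
    replaced with '_' so that it is accepted as well.
    """
    return name.replace("$", "_").isidentifier()


if __name__ == "__main__":
    for candidate in ["$price_1", "caf\u00e9", "1abc", "my-field"]:
        print(candidate, is_lucene_id(candidate),
              is_unicode_identifier(candidate))
```

A name with a non-ASCII letter passes the Unicode-aware check but fails the ASCII-only one, which is the gap between the two definitions described above.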


> Start enforcing field naming recomendations in next X.0 release?
> ----------------------------------------------------------------
>
>                 Key: SOLR-8110
>                 URL: https://issues.apache.org/jira/browse/SOLR-8110
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>         Attachments: SOLR-8110.patch, SOLR-8110.patch
>
>
> For a very long time now, Solr has made the following "recommendation" 
> regarding field naming conventions...
> bq. field names should consist of alphanumeric or underscore characters only 
> and not start with a digit.  This is not currently strictly enforced, but 
> other field names will not have first class support from all components and 
> back compatibility is not guaranteed.  ...
> I'm opening this issue to track discussion about if/how we should start 
> enforcing this as a rule instead (instead of just a "recommendation") in our 
> next/future X.0 (ie: major) release.
> The goals of doing so being:
> * simplify some existing code/apis that currently use heuristics to deal with 
> lists of fields and produce strange errors when the heuristic fails (example: 
> ReturnFields.add)
> * reduce confusion/pain for new users who might start out unaware of the 
> recommended conventions and then only later encountering a situation where 
> their field names are not supported by some feature and get frustrated 
> because they have to change their schema, reindex, update index/query client 
> expectations, etc...


