[ 
https://issues.apache.org/jira/browse/LUCENE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-5014:
--------------------------------

    Attachment: LUCENE-5014.txt

Patch without binary files (if possible, use the other patch)
                
> ANTLR Lucene query parser
> -------------------------
>
>                 Key: LUCENE-5014
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5014
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, modules/queryparser
>    Affects Versions: 4.3
>         Environment: all
>            Reporter: Roman Chyla
>              Labels: antlr, query, queryparser
>         Attachments: LUCENE-5014.txt
>
>
> I would like to propose a new way of building query parsers for Lucene.
> Currently, most Lucene parsers are hard to extend because they are either
> written in Java (e.g. the Solr query parser, or edismax) or the parsing logic
> is 'married' to the query-building logic (i.e. the standard Lucene parser,
> generated by JavaCC), which makes any extension really hard.
> A few years back, Lucene got the contrib/modern query parser (later renamed
> to 'flexible'), yet that parser didn't become a star (it must be very
> confusing for many users). However, that parsing framework is very powerful!
> And it is a real pity that there aren't more parsers already using it,
> because it allows us to add/extend/change almost any aspect of the query
> parsing.
> So, if we combine ANTLR + queryparser.flexible, we can get a very powerful
> framework for building almost any query language one can think of. And I hope
> this extension can become useful.
> The details:
>  - every new query syntax is written in EBNF and lives in separate files (so
> it can be tested/developed independently, using 'gunit')
>  - ANTLR generates the parsing code (it can generate parsers in several
> languages; the main target is Java, but it can also do Python, which may be
> interesting for pylucene)
>  - the parser generates an AST (abstract syntax tree) which is consumed by a
> 'pipeline' of processors; users can easily modify this pipeline to add the
> desired functionality
>  - the new parser contains a few (very important) debugging functions: it can
> print the results of every stage of the build and generate ASTs as graphical
> charts; ant targets help to build/test/debug grammars
>  - I've tried to reuse the existing queryparser.flexible components as much
> as possible, only adding new processors when necessary
> Assumptions about the grammar:
>  - every grammar must have one top parse rule called 'mainQ'
>  - parsers must generate an AST (Abstract Syntax Tree)
> The structure of the AST is left open. There are components which make
> assumptions about the shape of the AST (e.g. that MODIFIER is the parent of a
> FIELD); however, users are free to choose/write different processors with
> different assumptions about the AST shape.
> More documentation on how to use the parser can be seen here:
> http://29min.wordpress.com/category/antlrqueryparser/
> The parser was created more than a year ago and is used in production
> (http://labs.adsabs.harvard.edu/adsabs/). Different dialects of query
> languages (with proximity operators, functions, special logic, etc.) can be
> seen here:
> https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs
> https://github.com/romanchyla/montysolr/tree/master/contrib/invenio
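
The "AST consumed by a pipeline of processors" idea from the description can be illustrated with a small, self-contained Java sketch. Everything in it is a hypothetical stand-in: the Node class, the two processors, and the example tree shape are assumptions made for illustration, not the API of the attached patch or of queryparser.flexible. It only shows the general technique: each processor takes a tree and returns a (possibly rewritten) tree, and the pipeline applies them in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins only; none of these names come from the patch or from Lucene.
class AstPipelineSketch {

    /** Minimal AST node: a label plus ordered children. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this(label, Arrays.asList(children));
        }

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }

        /** Prints the tree in a LISP-like form, e.g. (FIELD title lucene). */
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** Runs every processor in order; each one receives the previous one's output. */
    static Node run(List<UnaryOperator<Node>> pipeline, Node ast) {
        Node current = ast;
        for (UnaryOperator<Node> step : pipeline) {
            current = step.apply(current);
        }
        return current;
    }

    /** Example processor: lower-cases leaf labels, leaves inner labels untouched. */
    static Node lowerCaseLeaves(Node n) {
        if (n.children.isEmpty()) {
            return new Node(n.label.toLowerCase(Locale.ROOT));
        }
        List<Node> kids = new ArrayList<>();
        for (Node c : n.children) {
            kids.add(lowerCaseLeaves(c));
        }
        return new Node(n.label, kids);
    }

    /** Example processor: replaces a MODIFIER node that has one child with that child. */
    static Node unwrapModifiers(Node n) {
        List<Node> kids = new ArrayList<>();
        for (Node c : n.children) {
            kids.add(unwrapModifiers(c));
        }
        if (n.label.equals("MODIFIER") && kids.size() == 1) {
            return kids.get(0);
        }
        return new Node(n.label, kids);
    }

    /** Builds an assumed tree under a 'mainQ' root and pushes it through the pipeline. */
    static String demo() {
        Node ast = new Node("mainQ",
                new Node("MODIFIER",
                        new Node("FIELD", new Node("Title"), new Node("Lucene"))));
        List<UnaryOperator<Node>> pipeline = new ArrayList<>();
        pipeline.add(AstPipelineSketch::lowerCaseLeaves);
        pipeline.add(AstPipelineSketch::unwrapModifiers);
        return run(pipeline, ast).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Adding, removing, or reordering entries in the list is all it takes to change the behaviour, which is the kind of extensibility the proposal attributes to the processor pipeline. A processor that assumes a particular tree shape (here, MODIFIER wrapping FIELD) can be swapped for one with different assumptions without touching the grammar.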
