[ https://issues.apache.org/jira/browse/LUCENE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Chyla updated LUCENE-5014:
--------------------------------

    Attachment: LUCENE-5014.txt

The same patch, plus the Lucene grammar extended with the NEARx operator.

> ANTLR Lucene query parser
> -------------------------
>
>                 Key: LUCENE-5014
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5014
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser, modules/queryparser
>    Affects Versions: 4.3
>         Environment: all
>            Reporter: Roman Chyla
>              Labels: antlr, query, queryparser
>         Attachments: LUCENE-5014.txt, LUCENE-5014.txt, LUCENE-5014.txt
>
>
> I would like to propose a new way of building query parsers for Lucene.
> Currently, most Lucene parsers are hard to extend because they are either
> written in Java (e.g. the Solr query parser, or edismax) or the parsing
> logic is 'married' to the query building logic (e.g. the standard Lucene
> parser, generated by JavaCC), which makes any extension really hard.
>
> A few years back, Lucene got the contrib/modern query parser (later renamed
> to 'flexible'), yet that parser never became a star (it must be very
> confusing for many users). However, that parsing framework is very
> powerful! It is a real pity that there aren't more parsers already using
> it, because it allows us to add/extend/change almost any aspect of the
> query parsing.
>
> So, if we combine ANTLR + queryparser.flexible, we get a very powerful
> framework for building almost any query language one can think of. I hope
> this extension can become useful.
>
> The details:
> - every new query syntax is written in EBNF and lives in separate files
>   (so it can be developed and tested independently, using 'gunit')
> - ANTLR generates the parsing code (it can generate parsers in several
>   languages; the main target is Java, but it can also do Python, which
>   may be interesting for pylucene)
> - the parser generates an AST (abstract syntax tree) which is consumed by
>   a 'pipeline' of processors; users can easily modify this pipeline to
>   add any desired functionality
> - the new parser contains a few (very important) debugging functions: it
>   can print the results of every stage of the build and generate ASTs as
>   graphical charts; ant targets help to build/test/debug grammars
> - I've tried to reuse the existing queryparser.flexible components as
>   much as possible, only adding new processors when necessary
>
> Assumptions about the grammar:
> - every grammar must have one top parse rule called 'mainQ'
> - parsers must generate an AST (Abstract Syntax Tree)
>
> The structure of the AST is left open. There are components which make
> assumptions about the shape of the AST (e.g. that MODIFIER is the parent
> of a FIELD); however, users are free to choose/write different processors
> with different assumptions about the AST shape.
>
> More documentation on how to use the parser can be found here:
> http://29min.wordpress.com/category/antlrqueryparser/
>
> The parser was created more than a year ago and is used in production
> (http://labs.adsabs.harvard.edu/adsabs/). Different dialects of query
> languages (with proximity operators, functions, special logic etc.) can
> be seen here:
> https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs
> https://github.com/romanchyla/montysolr/tree/master/contrib/invenio
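
The "AST consumed by a pipeline of processors" idea from the issue description can be illustrated with a small, self-contained Python sketch. This is not code from the patch and does not use the queryparser.flexible API; the `Node` class, the two processors and `run_pipeline` are hypothetical names chosen for illustration. Only the MODIFIER/FIELD token labels and the idea of printing every stage come from the description; the TERM label is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Toy AST node; 'label' is a token type such as MODIFIER, FIELD or TERM."""
    label: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


# A processor takes an AST and returns a (possibly rewritten) AST.
Processor = Callable[[Node], Node]


def lowercase_terms(node: Node) -> Node:
    """Hypothetical processor: lowercase the value of every TERM node."""
    value = node.value.lower() if node.label == "TERM" else node.value
    return Node(node.label, value, [lowercase_terms(c) for c in node.children])


def drop_empty_modifiers(node: Node) -> Node:
    """Hypothetical processor: replace a valueless MODIFIER by its only child."""
    children = [drop_empty_modifiers(c) for c in node.children]
    if node.label == "MODIFIER" and not node.value and len(children) == 1:
        return children[0]
    return Node(node.label, node.value, children)


def run_pipeline(ast: Node, processors: List[Processor], debug: bool = False) -> Node:
    """Apply each processor in order; optionally print the tree after each stage."""
    for proc in processors:
        ast = proc(ast)
        if debug:
            print(f"after {proc.__name__}: {ast}")
    return ast


# An AST in which MODIFIER is the parent of a FIELD, as some processors assume.
ast = Node("MODIFIER", "", [Node("FIELD", "title", [Node("TERM", "Foo")])])
result = run_pipeline(ast, [lowercase_terms, drop_empty_modifiers])
```

Because the pipeline is just an ordered list, adding, removing or reordering a processor changes the parser's behaviour without touching the grammar, which is the extensibility argument the proposal makes.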