[jira] Issue Comment Edited: (LUCENE-1567) New flexible query parser

Luis Alves (JIRA) Sat, 01 Aug 2009 15:08:41 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737957#action_12737957
 ]


Luis Alves edited comment on LUCENE-1567 at 8/1/09 3:07 PM:
------------------------------------------------------------

Hi Michael,

{quote}
OK, didn't know there was another patch coming.... I guess I'll redo my 
verification then...
{quote}

I added that comment when I created a block dependency on LUCENE-1486.

I'm still learning JIRA :).

I didn't know the comment was going to get posted in this thread;
I assumed it would be posted to LUCENE-1486.

      was (Author: lafa):
    Hi Michael,

{quote}
OK, didn't know there was another patch coming.... I guess I'll redo my 
verification then...
{quote}

I added that comment when I created a block dependency on LUCENE-1486.
I'm still learning JIRA :), I didn't the comment know was going to get posted 
in this thread,
I was assuming it was the LUCENE-1486, that was going to get the comment.
  
> New flexible query parser
> -------------------------
>
>                 Key: LUCENE-1567
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1567
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: QueryParser
>         Environment: N/A
>            Reporter: Luis Alves
>            Assignee: Michael Busch
>             Fix For: 2.9
>
>         Attachments: lucene-1567.patch, 
> lucene_1567_adriano_crestani_07_13_2009.patch, 
> lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
> lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
> lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
> lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
> lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
> lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
> lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
> lucene_trunk_FlexQueryParser_2009july30_v12.patch, 
> lucene_trunk_FlexQueryParser_2009july31_v14.patch, 
> lucene_trunk_FlexQueryParser_2009March24.patch, 
> lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
> QueryParser_restructure_meetup_june2009_v2.pdf, 
> wiki_switching_to_the_new_query_parser.txt
>
>
> From the "New flexible query parser" thread by Michael Busch:
> In my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC member
> on the Tuscany project and getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here;
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>   AND
>  /   \
> A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. This layer is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow resource bundles to be attached. This makes it possible to translate
> messages, which is an important feature of a query parser.
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying processors and builders in a few days. We now have a 100%
> compatible Lucene query parser, which means the syntax is identical and
> all query parser test cases pass on the new one too using a wrapper.
> Recent posts show that there is demand for query syntax improvements,
> e.g. improved range query syntax or operator precedence. There are
> already different QP implementations in Lucene+contrib, however I think
> we did not keep them all up to date and in sync. This is not too
> surprising, because usually when fixes and changes are made to the main
> query parser, people don't make the corresponding changes in the contrib
> parsers. (I'm guilty here too)
> With this new architecture it will be much easier to maintain different
> query syntaxes, as the first layer requires very little code.
> All syntaxes would benefit from patches and improvements we make to the
> underlying layers, which will make supporting different syntaxes much
> more manageable.
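
The three-layer design described in the issue summary above (syntax parser, processor chain, builder) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names Node, parseInfix, parsePlus, lowercase, collapseSingle, defaultChain, build and translate are hypothetical and are not the classes in the attached patches, and the builder renders a plain string instead of Lucene Query objects so that the example stays self-contained. It only shows how two syntaxes ('a AND b' and '+a +b') might share one tree and one set of processors and builders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class FlexParserSketch {

    /** Syntax-neutral tree node: a leaf term, or an operator with children. */
    static final class Node {
        final String op;    // operator name such as "AND"; null for a leaf
        final String term;  // term text; null for an operator node
        final List<Node> children = new ArrayList<>();

        Node(String op, String term) { this.op = op; this.term = term; }
    }

    // Layer 1 (syntax A): infix form, e.g. "a AND b". Hypothetical toy parser.
    static Node parseInfix(String text) {
        Node and = new Node("AND", null);
        for (String t : text.trim().split("\\s+AND\\s+")) {
            and.children.add(new Node(null, t));
        }
        return and;
    }

    // Layer 1 (syntax B): plus-prefix form, e.g. "+a +b". Hypothetical toy parser.
    static Node parsePlus(String text) {
        Node and = new Node("AND", null);
        for (String t : text.trim().split("\\s+")) {
            and.children.add(new Node(null, t.startsWith("+") ? t.substring(1) : t));
        }
        return and;
    }

    // Layer 2: a processor that normalizes terms to lower case (returns a new tree).
    static Node lowercase(Node n) {
        if (n.op == null) return new Node(null, n.term.toLowerCase(Locale.ROOT));
        Node copy = new Node(n.op, null);
        for (Node c : n.children) copy.children.add(lowercase(c));
        return copy;
    }

    // Layer 2: a processor that changes the tree structure by replacing
    // an operator that has a single child with that child.
    static Node collapseSingle(Node n) {
        if (n.op == null) return n;
        Node copy = new Node(n.op, null);
        for (Node c : n.children) copy.children.add(collapseSingle(c));
        return copy.children.size() == 1 ? copy.children.get(0) : copy;
    }

    // The configurable chain: processors are applied in list order.
    static List<UnaryOperator<Node>> defaultChain() {
        List<UnaryOperator<Node>> chain = new ArrayList<>();
        chain.add(FlexParserSketch::lowercase);
        chain.add(FlexParserSketch::collapseSingle);
        return chain;
    }

    // Layer 3: a builder. A real one would create Lucene Query objects;
    // this stand-in renders every child of an operator as a required clause.
    static String build(Node n) {
        if (n.op == null) return n.term;
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            if (sb.length() > 0) sb.append(' ');
            String child = build(c);
            sb.append('+').append(c.op == null ? child : "(" + child + ")");
        }
        return sb.toString();
    }

    // Runs all three layers; only the first layer depends on the syntax.
    static String translate(String text, boolean infix) {
        Node tree = infix ? parseInfix(text) : parsePlus(text);
        for (UnaryOperator<Node> p : defaultChain()) tree = p.apply(tree);
        return build(tree);
    }

    public static void main(String[] args) {
        System.out.println(translate("A AND B", true));
        System.out.println(translate("+a +b", false));
    }
}
```

In this sketch, adding a third syntax such as 'AND(a,b)' would only need another layer-1 method; the processors and the builder would be reused unchanged, which is the maintenance benefit the description argues for.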

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
