[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737957#action_12737957 ] Luis Alves commented on LUCENE-1567: Hi Michael, {quote} OK, didn't know there was another patch coming I guess I'll redo my verification then... {quote} I added that comment when I created a block dependency on LUCENE-1486. I'm still learning JIRA :), I didn't the comment know was going to get posted in this thread, I was assuming it was the LUCENE-1486, that was going to get the comment. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009july30_v12.patch, lucene_trunk_FlexQueryParser_2009july31_v14.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737958#action_12737958 ] Michael Busch commented on LUCENE-1567: --- No worries. I just tested the latest patch. All tests (core, contrib, tag) pass and javadocs look good. I'll commit this in a day or two! New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009july30_v12.patch, lucene_trunk_FlexQueryParser_2009july31_v14.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737750#action_12737750 ] Michael Busch commented on LUCENE-1567: --- Good work! All tests (core, contrib, bw-comp) pass, the warnings are gone, and the All javadocs section contains the new packages now too. I will commit this in a couple of days if nobody objects. If there are small outstanding javadoc improvements or so, we should open a separate JIRA issue... this patch is so big that changes are hard to track. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009july30_v12.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737752#action_12737752 ] Luis Alves commented on LUCENE-1567: Michael, I will submit a new patch in a few hours, I just need to finished the testing. It includes: - javadocs for the messages package - rename all classes that started with Lucene to Original. - refactor the OriginalConfigHandler and OriginalQueryParserHelper - remove the testcases used the ConfigHandler directly, since it was a duplication of the ones that use the QPHelper. - new FieldBostMapFCListener, to handle boost maps in a cleaner way - rename method in FieldConfigListener to buildFieldConfig - new DateResolutionFCListener to handle dateresolution in a cleaner way - A fix to a problem I found in BoostQueryNodeProcessor New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009july30_v12.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737022#action_12737022 ] Luis Alves commented on LUCENE-1567: Hi Adriano There is something wrong with your patch v11, the size almost doubled. I did a diff with v10 and it looks like all the files show up twice on v11 patch. Can you resubmit it again, I'll add the docs for messages after you resubmit it. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009july29_v11.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737031#action_12737031 ] Uwe Schindler commented on LUCENE-1567: --- {quote} Can you create a new jira issue with the description of the feature, so we can discuss the details there. I'll try to implement that once we agree on all the details. {quote} Will do! Thanks. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009july29_v11.patch, lucene_trunk_FlexQueryParser_2009july30_v12.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736662#action_12736662 ] Uwe Schindler commented on LUCENE-1567: --- Just a question: Will it be possible to specify some type of schema for the query parser in future, to automatically create NumericRangeQuery for different numeric types? It would then be possible to index a numeric value (double,float,long,int) using NumericField and then the query parser knows, which type of field this is and so it correctly creates a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. NumericRangeQuery also supports the rewrite modes, only some type of schema support is missing. I ask this, because someone asked on java-user for such a feature in query parser. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736879#action_12736879 ] Michael Busch commented on LUCENE-1567: --- {quote} Could you also please fix the javadocs? When I'm building the javadocs I'm getting a lot of warnings about not found references. {quote} The warnings occur because you put links to the new contrib queryparser into the core queryparser. That doesn't work as the contribs are not in the classpath of the core, so I think we should remove those links and change them just to plain text. Also, please make sure to add to the main build.xml appropriate entries for the javadocs, otherwise the All javadocs will not contain the contrib QP classes. There are also some TODOs in the docs; especially in top-level places, such as the package.html of your new package, we should not have TODOs in the docs. Please fix that soon, 2.9 is coming quickly. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736982#action_12736982 ] Luis Alves commented on LUCENE-1567: Hi Uwe, {quote} Will it be possible to specify some type of schema for the query parser in future, to automatically create NumericRangeQuery for different numeric types? It would then be possible to index a numeric value (double,float,long,int) using NumericField and then the query parser knows, which type of field this is and so it correctly creates a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. NumericRangeQuery also supports the rewrite modes, only some type of schema support is missing. {quote} I think this is doable. I don't think there is a way to extract if a field is numeric from the index, so the user will have to configure the FieldConfig objects in the ConfigHandler. But if this is done, it will not be that difficult to implement the rest. Can you create a new jira issue with the description of the feature, so we can discuss the details there. I'll try to implement that once we agree on all the details. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736270#action_12736270 ] Luis Alves commented on LUCENE-1567: Hi Adriano, You just forgot to delete the query parser filesgenerated by javacc, and regenerate the code from the TextParser.jj using javacc. This created 2 new classes called TextParserConstants and TextParserTokenManager, but your re-factored code was still using the QueryParser named classes. Not a big deal, but we should always use the generated code from javacc, and avoid editing those generated files, this will avoid inconsistencies. I'll try to add a javacc target to the queryparser build file, in the near future to make that easier :) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736285#action_12736285 ] Adriano Crestani commented on LUCENE-1567: -- Hi Michael, I noticed you renamed LuceneQueryConfigHandler.setConstantScoreRewrite to setMultiTermRewriteMethod. Shouldn't ConstantScoreRewriteAttribute class (and its impl) be renamed too? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736289#action_12736289 ] Adriano Crestani commented on LUCENE-1567: -- {quote} You just forgot to delete the query parser files generated by javacc, and regenerate the code from the TextParser.jj using javacc. This created 2 new classes called TextParserConstants and TextParserTokenManager, but your re-factored code was still using the QueryParser named classes. Not a big deal. We should always use the generated code from javacc, and avoid editing those generated files, this will prevent inconsistencies. {quote} I see now. I just regenerated the code and replaced the main QueryParser.java by TextParser.java. Sorry for the mistake. {quote} I'll try to add a javacc target to the queryparser build file, in the near future to make that easier {quote} I would love that! New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736293#action_12736293 ] Uwe Schindler commented on LUCENE-1567: --- bq. I'll try to add a javacc target to the queryparser build file, in the near future to make that easier The main Lucene build.xml/common-build.xml has support for this, maybe it is just using an already available target/macro. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736439#action_12736439 ] Michael Busch commented on LUCENE-1567: --- {quote} I noticed you renamed LuceneQueryConfigHandler.setConstantScoreRewrite to setMultiTermRewriteMethod. Shouldn't ConstantScoreRewriteAttribute class (and its impl) be renamed too? {quote} Oh yeah of course! I meant to do that, but then forgot... Are you going to submit a new version of the patch anyway? If yes, can you make this change? If no I can submit a new patch... New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736454#action_12736454 ] Adriano Crestani commented on LUCENE-1567: -- {quote} Are you going to submit a new version of the patch anyway? If yes, can you make this change? If no I can submit a new patch... {quote} Sorry Michael, by one minute, I did not see your comment. I will submit another patch tomorrow with these changes. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736459#action_12736459 ] Michael Busch commented on LUCENE-1567: --- {quote} I will submit another patch tomorrow with these changes. {quote} Sounds good. Could you also please fix the javadocs? When I'm building the javadocs I'm getting a lot of warnings about not found references. Otherwise I think this is ready to commit soon. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735812#action_12735812 ] Michael Busch commented on LUCENE-1567: --- The build.xml seems to be missing in the latest patch. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735820#action_12735820 ] Adriano Crestani commented on LUCENE-1567: -- Hi Michael, I think I excluded by mistake every file (non folder) under contrib/queryparser/. So, the files contrib/queryparser/build.xml and pom.xml.template were not included in patch v8. You can copy them from patch v7, because they were not changed between v7 and v8. Or I can submit a new patch including these 2 files. Let me know what is the best option for you. Sorry for the mistake New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735835#action_12735835 ] Michael Busch commented on LUCENE-1567: --- Thanks, Adriano. I'll take the files from the v7 patch. I made some changes to make the patch compile with the recent changes on trunk (Attribute changes and constant score rewrite method changes). Now some tests are failing for me: {noformat} [junit] Testsuite: org.apache.lucene.messages.TestNLS [junit] Tests run: 6, Failures: 2, Errors: 0, Time elapsed: 0.073 sec [junit] - Standard Error - [junit] WARN: Message with key:Q0005E_MESSAGE_NOT_IN_BUNDLE and locale: en_US not found. [junit] - --- [junit] Testcase: testMessageLoading_ja(org.apache.lucene.messages.TestNLS):FAILED [junit] expected:[?]: XXX but was:[?]: XXX [junit] junit.framework.ComparisonFailure: expected:[?]: XXX but was:[?]: XXX [junit] at org.apache.lucene.messages.TestNLS.testMessageLoading_ja(TestNLS.java:36) [junit] Testcase: testNLSLoading_ja(org.apache.lucene.messages.TestNLS): FAILED [junit] expected:[?] but was:[?] [junit] junit.framework.ComparisonFailure: expected:[?] but was:[?] [junit] at org.apache.lucene.messages.TestNLS.testNLSLoading_ja(TestNLS.java:54) [junit] Test org.apache.lucene.messages.TestNLS FAILED {noformat} New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735838#action_12735838 ] Michael Busch commented on LUCENE-1567: --- Sorry, my fault. The encoding was not set to UTF-8 in my eclipse project when I applied the patch. I changed it to UTF-8, reapplied, and now it works fine. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735842#action_12735842 ] Michael Busch commented on LUCENE-1567: --- {quote} OK, I have submitted the IP Clearance to Incubator. It now has 3 days to percolate under a lazy consensus review. After that, assuming it passes, this can be committed. {quote} Did it pass the review, Grant? If yes then I think we can commit this soon to contrib. We can still make small improvements, which will be easier when it's committed. This patch is really large now, and it's hard to track changes. Also, even though in 3.0 we can't add new features, we can certainly improve documentation and add more testcases aftert 2.9 and before 3.0. I haven't reviewed everything yet, but if feels like it could use some more documentation... New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735850#action_12735850 ] Michael Busch commented on LUCENE-1567: --- Luis/Adriano: I find it a bit confusing now that the different main folders, such as builders, processors, etc. share the same root with 'original', which is an actual implementation. Could we change the packaging here? Maybe we could create contrib/queryparser/core and move builders, processors, etc. there? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735855#action_12735855 ] Luis Alves commented on LUCENE-1567: Hi Michael, Thanks for updating the patch to work with the latest code in svn. I like your suggestion about creating a core package, I'll re-factor the code to use the core package. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735862#action_12735862 ] Grant Ingersoll commented on LUCENE-1567: - bq. Did it pass the review, Grant? Yes. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735938#action_12735938 ] Adriano Crestani commented on LUCENE-1567: -- {quote} fixed TextParser.jj - I assumed Adriano's re-factored this class without generating the code using javacc, the generated classes where not correct. {quote} Hi Luis, What exactly did I forget to do? It's hard to find out differences between the patch versions while the code is not commited :( I'm not sure, I generated a new TextParser.java after I changed the .jj, it was working and tests passing. {quote} Could we change the packaging here? Maybe we could create contrib/queryparser/core and move builders, processors, etc. there? {quote} That's fine, I think this way the code will look clearer and less confusing. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733549#action_12733549 ] Michael McCandless commented on LUCENE-1567: How are we going to migrate the subclasses we have to the current queryParser (ComplexPhraseQueryParser and MultiFieldQueryParser) to the new queryParser? In the latest patch I see that these subclasses are deprecated, saying please use the new flexible queryParser instead, but I think that's not enough, ie don't we need to make corresponding subclasses of the new queryParser for these? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733591#action_12733591 ] Michael McCandless commented on LUCENE-1567: bq. How are we going to migrate the subclasses we have to the current queryParser (ComplexPhraseQueryParser and MultiFieldQueryParser) to the new queryParser? Here's the answer to 1/2 of my question: the current patch already contains a MultiFieldQueryParserWrapper. It too is deprecated, which means in the move to core that we'll do for 3.0, it'll be removed. But the migration path forward seems quite simple: use a MultiFieldQueryNodeProcessor to rewrite all queries so they are against multiple fields (I think?). If that's right, can you update the deprecated javadocs explaining that this is the way forward? In general when we deprecate the old, we need to point the way to the new. Also, can you fix all indentation to 2-space in the next patch iteration? I'm not sure exactly how, but it'd be great if this new QueryParser had better integration with NumericRangeQuery. But that's a nice-to-have for 2.9. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733603#action_12733603 ] Grant Ingersoll commented on LUCENE-1567: - OK, I have submitted the IP Clearance to Incubator. It now has 3 days to percolate under a lazy consensus review. After that, assuming it passes, this can be committed. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733925#action_12733925 ] Michael Busch commented on LUCENE-1567: --- {quote} Wow, I'm scaried now, do you know an easy and automated way to do that? It will take forever to do manually. {quote} Just change your eclipse settings (Java Code Style - Formatter) to 2-space indentation (only whitespaces) and then do Source - Correct Indentation. No reason to be scared! :) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12732723#action_12732723 ] Michael Busch commented on LUCENE-1567: --- {quote} I see the hash, but not the version number/platform of the tool used to create it. {quote} md5sum version is included in coreutils 6.10-6ubuntu1 The GNU core utilities OS: Ubuntu Jaunty 2.6.28-13-server i686 GNU/Linux New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731497#action_12731497 ] Luis Alves commented on LUCENE-1567: I upload the patch that undo my changes on the build files to skip the queryparser module if jdk 1.4 was found This latest patch also includes Adriano's changes. {quote} I agree with Luis, it's a good idea to have a package for each different query parser implementation. I also agree with Grant that it does not make sense to have an implementation tied to a number. So, as the lucene2 implementation contains the default/main Lucene query parser implementation, I would suggest to rename it to defaultLucene, default or main. I will give +1 for default. {quote} I'll add 2 more suggestions standard, standardSyntax. I will give +1 for standard. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730721#action_12730721 ] Michael Busch commented on LUCENE-1567: --- Mark, it seems like the best thing to do here is to add this as a 1.5 contrib for now and deprecate the core query parser. Then in 3.0 we would move the new one into core and remove the old one entirely. Since it will remain in the same package users won't have to change their code, just while they use 2.9 they have to put an extra jar in their classpath. Looking at the latest patch, that's what it does (new one to contrib while deprecating old one). New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730735#action_12730735 ] Luis Alves commented on LUCENE-1567: Adriano, I will rollback the build.xml changes tomorrow, and use the convention that the spatial and fast-vector-highlighter modules use. On the package name lucene2: I think during the Lucene 3.X development more parsers will be added to the QueryParser, and these parsers will also be lucene parsers and we will need different names. It is probably better to keep lucene2 on the package name, or use a name that makes a reference to the old queryparser. For example, in the future we could have: org.apache.lucene.queryParser.lucene2 - lucene 2.X syntax org.apache.lucene.queryParser.lucene3 - lucene 3.X syntax org.apache.lucene.queryParser.xml - some XML syntax org.apache.lucene.queryParser.luceneBoolean - boolean syntax org.apache.lucene.queryParser.explicit - explict query language syntax I'll also help on the when wiki the code is committed to the trunk. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730780#action_12730780 ] Grant Ingersoll commented on LUCENE-1567: - Names that tack a 2 or some other number on the end are pretty much meaningless. I'd suggest finding something better that actually describes what the package contains. After all what is the second query parser? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730812#action_12730812 ] Mark Miller commented on LUCENE-1567: - {quote}Mark, it seems like the best thing to do here is to add this as a 1.5 contrib for now and deprecate the core query parser. Then in 3.0 we would move the new one into core and remove the old one entirely. Since it will remain in the same package users won't have to change their code, just while they use 2.9 they have to put an extra jar in their classpath. Looking at the latest patch, that's what it does (new one to contrib while deprecating old one).{quote} Right, I think that does make sense, but can we actually go to 1.5 in 3.0? Thats what my main question is around. I know the 1.5 wiki says that we can, but Mike has indicated that 3.0 would just be a quick bug fix release with deprecations removed from 2.9. I thought I'd seen him say that 3.1 would actually be the first with 1.5? Mike M? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730815#action_12730815 ] Grant Ingersoll commented on LUCENE-1567: - 3.0 will be 1.5. See http://wiki.apache.org/lucene-java/Java_1.5_Migration New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. --
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730821#action_12730821 ] Michael McCandless commented on LUCENE-1567: Right, 3.0 is when we can first use 1.5 code. But, 3.0 will be a fast mechanical release after 2.9. This is just like the 1.9 - 2.0 fast turnaround, *except* because we begin accepting 1.5 code in 3.0 we may make certain changes (switch to generics in certain APIs; move the new QueryParser into core; etc.). However we don't plan on doing any new features, etc in 3.0; that will first happen in 3.1. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730823#action_12730823 ] Mark Miller commented on LUCENE-1567: - Yeah, I had seen that, I was just remembering an email or two from Mike that mentioned differently (waiting till 3.1) ... but I just found one of the threads discussing it and it looks like consensus shifted: http://www.lucidimagination.com/search/document/6d2b6488b4115/2_9_3_0_plan_java_1_5#6d2b6488b4115 New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731016#action_12731016 ] Michael Busch commented on LUCENE-1567: --- {quote} except because we begin accepting 1.5 code in 3.0 we may make certain changes (switch to generics in certain APIs; move the new QueryParser into core; etc.). {quote} OK sounds like a plan then! The new QP code will not change, but we'll move it into core in 3.0. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731220#action_12731220 ] Adriano Crestani commented on LUCENE-1567: -- {quote} except because we begin accepting 1.5 code in 3.0 we may make certain changes (switch to generics in certain APIs; move the new QueryParser into core; etc.). OK sounds like a plan then! The new QP code will not change, but we'll move it into core in 3.0. {quote} Thanks for the explanation! {quote} Names that tack a 2 or some other number on the end are pretty much meaningless. I'd suggest finding something better that actually describes what the package contains. After all what is the second query parser? {quote} I agree with Luis, it's a good idea to have a package for each different query parser implementation. I also agree with Grant that it does not make sense to have an implementation tied to a number. So, as the lucene2 implementation contains the default/main Lucene query parser implementation, I would suggest to rename it to defaultLucene, default or main. I will give +1 for default. Regards, Adriano Crestani Campos New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Fix For: 2.9 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730627#action_12730627 ] Adriano Crestani commented on LUCENE-1567: -- Ah, I also couldn't run ant build-contrib using Java 1.4, it fails, I even tried a clean trunk and it did not work. Were you able to run it using 1.4 Luis? I already opened a thread on the ML about this: http://markmail.org/thread/3fyldf7t423fhwbm New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730641#action_12730641 ] Adriano Crestani commented on LUCENE-1567: -- {quote} Ah, I also couldn't run ant build-contrib using Java 1.4, it fails, I even tried a clean trunk and it did not work. Were you able to run it using 1.4 Luis? I already opened a thread on the ML about this: http://markmail.org/thread/3fyldf7t423fhwbm {quote} Mark Miller just replied to the thread and based on his response there is no need for contrib projects to be able to compile using JDK 1.4. So, Luis, could you rollback your changes you did on the build files? Thanks, Adriano Crestani Campos New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730653#action_12730653 ] Mark Miller commented on LUCENE-1567: - Hang on a sec - it sounds like the target was 1.4 because this was going to replace a 1.4 core piece of functionality. I don't know that all of the details are fully straightened out though. 1. I'm not pro moving the QueryParser to contrib myself, unless we actually move forward on that 'modules' thread - if not, it doesn't appear very helpful to me. 2. If we move this to contrib, perhaps it can be 1.5? But then in 3.0, can we have 1.5 already? Or is that 3.1? If its 3.1, than if we remove the deprecated query parser in 3.0, you won't have a java 1.4 replacement to move to (if course we could keep the old QueryParser till 4.0 ... ). I'm not clear that we can't add new functionality to 3.0 though. I know Mike has mentioned it, but I can't find where it says that - I just see that we can remove deprecations, not that we can't also add new features. I may be missing something though? We should get things fully straightened out before you spend too much time switching between 1.4 and 1.5 though. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729551#action_12729551 ] Michael Busch commented on LUCENE-1567: --- Luis, I think you need to modify the main build.xml, because the query parser contrib uses java 1.5. For an example look into the build.xml from Lucene 2.2.x. It had a contrib called gdata, which used JRE 1.5. (This was removed after 2.2, so you won't find it in the current build.xml anymore). Currently the build will fail if the user runs JRE 1.4, but it should rather skip the new query parser contrib. You can use this property, which is definied in common-build.xml: {code} condition property=build-1-5-contrib equals arg1=1.5 arg2=${ant.java.version} / /condition {code} New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729815#action_12729815 ] Adriano Crestani commented on LUCENE-1567: -- Hi Luis, I have been improving the code documentation lately, I will merge my diff with your new patch and submit the changes soon. I also could merge with the trunk, it depends when last Luis' patch will be committed. {quote} Adriano when you have some time, can you write an interface for simple usage of the new QueryParser, and a simple implementation of the interface, that creates a textparser, creates a processor pipeline, and instantiates the Lucene builders? {quote} Good idea Luis! I was thinking about a class that would allow query parser implementors to bundle their processor, text parser and builder in it, so the user could simply use it, nobody needs to know how it's implemented. I think the class should contain a method parse(String defaultField, String queryString) that returns whatever that query parser creates from it, in Lucene's case, a Query object. Also, some sets and gets to access the internal processor, builder and text parser, if the user wishes to. I'm gonna work more on the design and submit a patch soon containing it. {quote} And please add a simple junit that demonstrates the usage of that interface and ideally some documentation into the package.html of the new contrib package that will help users who want to use the queryparser to get started. {quote} I was also thinking about a wiki page that would guide Lucene users to migrate to the new query parser using this new interface. More suggestions? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729935#action_12729935 ] Luis Alves commented on LUCENE-1567: Hi Michael, For an example look into the build.xml from Lucene 2.2.x. The ant file on this Lucene 2.2 module does not follow the lucene convention and it uses a complex implementation. So I fixed the problem in a different way: I renamed the contrib/queryparser/build.xml to build15.xml, and I fixed the contrib-crawl to include build15.xml when a jdk15 is present. I tested default, build-contrib, javadocs-contrib all work fine. I just uploaded the patch v5 with this fix. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729529#action_12729529 ] Luis Alves commented on LUCENE-1567: Since all legal work is finally finished now, it is time for an updated patch with the latest fixes and improvements. Below are the changes compared to the previous patch: • moved the new queryparser to contrib • deprecated old QueryParser classes in the core • the new queryparser in contrib uses jdk 1.5 • patch compiles against current trunk • rewrote the lucene testcases to use the new API's • created wrapper testcases that uses wrapper classes • created classes to overwrite the old QueryParser in the util folder, and make Lucene use the flexible query parser engine, without having to change your code. I verified that all testcases are working, and that all contrib modules still compile fine. Adriano when you have some time, can you write an interface for simple usage of the new QueryParser, and a simple implementation of the interface, that creates a textparser, creates a processor pipeline, and instantiates the lucene builders? And please add a simple junit that demonstrates the usage of that interface and ideally some documentation into the package.html of the new contrib package that will help users who want to use the queryparser to get started. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729532#action_12729532 ] David Sitsky commented on LUCENE-1567: -- I will be out of the office on Friday, 10th of July. -- Cheers, David Nuix Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, AustraliaPh: +61 2 9280 0699 Web: http://www.nuix.comFax: +61 2 9212 6902 New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728039#action_12728039 ] Grant Ingersoll commented on LUCENE-1567: - bq. I saw that at least one other Apache project Just because someone else does it wrong... It's pretty clear in this case that the Grant is necessary. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728042#action_12728042 ] Mark Miller commented on LUCENE-1567: - Someone else didnt do it wrong - they asked and got an answer from someone from Apache that seemed to know what they were talking about, and seemed to have the authority/knowledge to give the answer they gave. I'm not saying something one way or another - just throwing what I saw out there. I'm sure you have more info on the subject than I do. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728052#action_12728052 ] Mark Miller commented on LUCENE-1567: - Hmmm - if you look at the strict letter of the law in Intellectual Property Clearance, then LocalLucene and Trie and a lot of other stuff also needed this clearance ... I may have just missed the process on those though. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728085#action_12728085 ] Grant Ingersoll commented on LUCENE-1567: - LocalLucene did. Not sure about Trie. Anyway, this issue is not the place for this discussion. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728090#action_12728090 ] Mark Miller commented on LUCENE-1567: - bq. Anyway, this issue is not the place for this discussion. Seems like a couple comments about this here is appropriate to me. Your just being prickly man. I was pointing something out that has relevance to this issue and relevance to committers when dealing with future similar issues. My comment about LocalLucene and Trie were not an accusation, but an attempt to clarify what requires this and what doesn't. As a committer, its important that this information is clear to me. As the PMC head, I'd think youd be more helpful with the matter. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728102#action_12728102 ] Mark Miller commented on LUCENE-1567: - I'll just keep my response out of JIRA to avoid taking over that issue: According to http://incubator.apache.org/ip-clearance/index.html, it doesn't matter if it was your companys code or if you are a committer or if you have a CLA. If it was developed outside of Apache svn/mailing lists and was then donated, it says it needs the grant. - Mark -- - Mark http://www.lucidimagination.com New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much.
Re: [jira] Commented: (LUCENE-1567) New flexible query parser
Mark Miller (JIRA) wrote: Mark Miller commented on LUCENE-1567: - I'll just keep my response out of JIRA to avoid taking over that issue: Nevermind. Sorry. JIRA is too smart for me ;) -- - Mark - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728103#action_12728103 ] Grant Ingersoll commented on LUCENE-1567: - bq. And the code was already Apache 2.0 licensed, so there was no problem to donate it. This does not matter, nor does the license. I was unaware that it lived in public someplace else. If the code lives somewhere else in public, then it needs to go through Soft. Grant, AIUI. Having it licensed as ASL just makes the paperwork a formality. At any rate, as I said, the discussion of Trie, LocalLucene and when some generic piece of code needs a grant has nothing to do with this particular issue, so please, if you want to continue this conversation, then start one on java-dev. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12727531#action_12727531 ] Mark Miller commented on LUCENE-1567: - I wonder if all of this was really necessary. Months ago, while doing some searching, I saw that at least one other Apache project (might have been on the legal email list?), asked about a large code contribution from a company, and the response was that if a guy at the company had a CLA on file (was a committer), and was part of the process, he could commit the large code contribution without all of this paperwork mumbo jumbo. Of course, best to be thorough and complete, but I think we may not have to jump through these same hoops in the future. Not that that means much, as I say that with no authority or complete knowledge about it. But if someone wanted to research further ... New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720371#action_12720371 ] Michael Busch commented on LUCENE-1567: --- Adriano and Luis: I think a few things need to be changed/improved here in order to commit this for 2.9: - JDK 1.4 compatibility - Backwards-compatibility, which means the patch should deprecate the old QueryParser, not remove it - A class/interface that is as easy to use as the old QueryParser for people who simply want to parse a query string; those users shouldn't have to know about what a processor or builder chain is - Better documentation about how to easily switch over from the old to the new QueryParser We're talking on java-dev about releasing 2.9 fairly soon; how long would these changes take? I'd like to get this in 2.9 because it seems to be a popular issue. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720379#action_12720379 ] Grant Ingersoll commented on LUCENE-1567: - I need an MD5/SHA1 hash (http://incubator.apache.org/ip-clearance/ip-clearance-template.html) for the exact code listed in the software grant. Also include the version number of the software used to create the hash. See https://issues.apache.org/jira/browse/INCUBATOR-77 for example. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720385#action_12720385 ] Michael Busch commented on LUCENE-1567: --- But we still need to update the code before we can commit. From which patch do you need the MD5/SHA1 hash from? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720386#action_12720386 ] Grant Ingersoll commented on LUCENE-1567: - OK, only outstanding items for clearance are: 1. tarball and hash 2. Vote on Incubator for clearance. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720404#action_12720404 ] Michael Busch commented on LUCENE-1567: --- Ok GO! New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720403#action_12720403 ] Adriano Crestani commented on LUCENE-1567: -- Hi Michael, I expect it takes one week at max! New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720417#action_12720417 ] Grant Ingersoll commented on LUCENE-1567: - Commit is separate from IP Clearance and you can't commit until the clearance is accepted. I just need the tarball for the code that was referenced in the software grant along with a hash on it. In the grant, you have a file directory listing describing the code. Take that file listing, tar it up and run md5 on it. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720422#action_12720422 ] Michael Busch commented on LUCENE-1567: --- OK that should be easy. We'll do that asap. Thanks for explaining, Grant. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717588#action_12717588 ] Adriano Crestani commented on LUCENE-1567: -- We also have to change List change QueryNode getchildren(); and public MapCharSequence, Object getTags(); I already mentioned about getChildren(), I forgot about getTags(), thanks for reminding me : )...I looked into the patch attached, it does not contain that yet :P We also have change QueryNodeImpl, we will have to patch all QueryNode classes implementations and perform forced casts. and users implementing QueryNode's will also have to do that. Yes, these are internal changes, they shouldn't affect anything, even after released with 2.9. If some user extends these classes, they will need to do the type checking and follow the documentation as I said before. Also, there would be no impact when they migrate to 3.0 when we re-add the generics, they will just keep performing the type checking on the List objects and verifying if they are QueryNodes or not. Some extras: If we chose to release both parsers, we should deprecate the old one, allowing people to migrate to the new one with release 2.9. and drop the old queryparser classes on 3.0. (we can keep the wrappers in 2.9 throwing exceptions in all methods to remind people to move to the new framework we probably can also keep the wrapper in 3.0, if we think is still necessary). Completely agree :) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717486#action_12717486 ] Michael Busch commented on LUCENE-1567: --- Is it mostly internal stuff you need to change to compile with 1.4, or do also a lot of public APIs use generics? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717492#action_12717492 ] Adriano Crestani commented on LUCENE-1567: -- It's mostly internal stuffs, the only api that uses generics is QueryNode tha returns ListQueryNode and receives it as param, I actually don't think it's a big deal :p New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to
Re: [jira] Commented: (LUCENE-1567) New flexible query parser
I actually think we should give the parser to contrib on 2.9 using jdk 1.5 syntax and move it to main on 3.0 using jdk1.5 syntax. I don't think it's a small change, and will affect the interfaces, and all implementations of QueryNode Objects. I would see nothing wrong with having a jdk 1.4 version if we were 100% compatible with the old queryparser, but since that is not the case. (the wrapper we built does not support the case where users extend the old queryparser class and overwrite methods to add new functionality) Adriano Crestani (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717492#action_12717492 ] Adriano Crestani commented on LUCENE-1567: -- It's mostly internal stuffs, the only api that uses generics is QueryNode tha returns ListQueryNode and receives it as param, I actually don't think it's a big deal :p New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717535#action_12717535 ] Luis Alves commented on LUCENE-1567: I actually think we should give the parser to contrib on 2.9 using jdk 1.5 syntax and move it to main on 3.0 using jdk1.5 syntax. I don't think it's a small change and this change will affect the interfaces and future versions of the parser (to be 1.4 compatible). I would see nothing wrong with having a jdk 1.4 version if we were 100% compatible with the old queryparser, but since that is not the case, I don't think it is worth it. (the wrapper we built does not support the case where users extend the old queryparser class and overwrite methods to add new functionality) If everyone else thinks making the queryparser interfaces 1.4 compatible is a must, I will be OK with it. But only if we actually move the new queryparser to main on 2.9 and break the compatibility with the old lucene Queryparser class, for users that are extending this class. The new queryparser supports 100% on the syntax, and 100% of the lucene Junits. But does not support users that extended the QueryParser class and overwrote some methods. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717543#action_12717543 ] Adriano Crestani commented on LUCENE-1567: -- I went through the new QP and listed what exactly needs to be changed: QueryNode class has 2 methods: set(ListQueryNode), add(ListQueryNode) and ListQueryNode getChildren(). All the generics would be removed. I don't see any back compatibility problem if we add generics in future, we could hardcode the type checking if we release with 1.4 and any user impl of this class will need to do the same and follow the documentation. ModifierQueryNode has an enum called Modifier with values MOD_NOT, MOD_NONE and MOD_REQ. An enum can be almost completely reproduced on 1.4 using: ... final public static class Modifier implements Serializable { final public static Modifier MOD_NOT = new Modifier(); final public static Modifier MOD_NOT = new Modifier(); final public static Modifier MOD_NOT = new Modifier(); private Modifier() { // empty constructor } // we might add some Enum methods, like name(), etc... } ... The only back compatibility problem I see when we change the Modifier to enum again is if on the version 1.4 the user checks for Modifier.class.isEnum()...does anybody see any other back-compatibility issue? The last thing that will need to be changed is on the QueryBuilder and LuceneQueryBuilder. The QueryBuilder.build() returns an Object and when LuceneQueryBuilder implements it, it specializes the return to Query, which will start throwing Object instead if we change to 1.4. On this case I don't see any back-compatibility issue also. Regarding the new QP framework, I don't see any problem about back compatibility, because Lucene will only be Java 1.5 on version 3.0, and back compatibility may be broken. But... I would see nothing wrong with having a jdk 1.4 version if we were 100% compatible with the old queryparser, but since that is not the case, I don't think it is worth it. (the wrapper we built does not support the case where users extend the old queryparser class and overwrite methods to add new functionality) I agree with Luis, if we only release the new QP framework 2.9, we will definitely brake the back-compatiblity of the old QP, so, why not release the old and the new QP together on 2.9? Suggestions? :) Best Regards, Adriano Crestani Campos Adriano Crestani Campos New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717560#action_12717560 ] Luis Alves commented on LUCENE-1567: There will be a couple of more changes need: We also have to change List change QueryNode getchildren(); and public MapCharSequence, Object getTags(); We also have change QueryNodeImpl, we will have to patch all QueryNode classes implementations and perform forced casts. and users implementing QueryNode's will also have to do that. It's about 30 changes, not that a big change, I agree. But if we release both parsers I see no need to change it. I agree with Luis, if we only release the new QP framework 2.9, we will definitely brake the back-compatiblity of the old QP, so, why not release the old and the new QP together on 2.9? Some extras: If we chose to release both parsers, we should deprecate the old one, allowing people to migrate to the new one with release 2.9. and drop the old queryparser classes on 3.0. (we can keep the wrappers in 2.9 throwing exceptions in all methods to remind people to move to the new framework we probably can also keep the wrapper in 3.0, if we think is still necessary). New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716841#action_12716841 ] Adriano Crestani commented on LUCENE-1567: -- Hi Michael, I think the biggest work will be to refactor the code to not use generics anymore, which is used all over the place. It might take an afternoon. Let me know, if everybody agree on changing the code to be Java 1.4 compatible, so I can do the changes ASAP. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716825#action_12716825 ] Michael Busch commented on LUCENE-1567: --- Since the modularization discussion on java-dev hasn't really had any conclusions yet, we have the options to a) make the patch jre 1.4 compatible, or b) commit it as a contrib module and move to core with 3.0. Luis and Adriano: how much work would a) be? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. --
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716311#action_12716311 ] Grant Ingersoll commented on LUCENE-1567: - The software Grant has been received and filed. I will update the paperwork and work to finish this out next week, such that we can then work to commit it. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715653#action_12715653 ] Ali Oral commented on LUCENE-1567: -- The flexible query parser looks very promising. I can see that the ProximityQueryNode was implemented but not integrated completely. Can the ProximityQuerNode be used as a replacement for SrndQueryParser? If yes do you think will it be possible to mix proximity query with the other types of queries like the one below: field1:((term1* or term2* or term3*) WITHIN/5 phrase term1* term2~2 WITHIN/50 (term3 or term4~3 or term5*)) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703504#action_12703504 ] Bertrand Delacretaz commented on LUCENE-1567: - Grant, the ip-clearance document that you created under incubator-public in svn had not been added to the site-publish folder, I just did that in revision 769253. If that's not correct, please remove both xml and html versions of the lucene-query-parser file there. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697808#action_12697808 ] Grant Ingersoll commented on LUCENE-1567: - OK, I see the CLA is registered. Now we need the Software Grant. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail:
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697929#action_12697929 ] Michael Busch commented on LUCENE-1567: --- {quote} Now we need the Software Grant. {quote} Working on it. The terms paper work, IBM and quick don't usually appear in the same sentence ;) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Grant Ingersoll Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12697688#action_12697688 ] Mark Miller commented on LUCENE-1567: - Having gone over this a bit, I think its a great step forward over the current parser. I really think it should end up replacing it. It might work well with our possible plan of steps towards modularization. We could deprecate the core QueryParser and add this as part of a new module (the trick will be turning contrib into modules in practice as opposed to just saying they are modules now - this could be a start though) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12689680#action_12689680 ] Luis Alves commented on LUCENE-1567: HI Grant and Micheal I faxed the CLA today. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands,
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688663#action_12688663 ] Mark Miller commented on LUCENE-1567: - I have not had a chance to look too deeply yet (I've just jumped around the code for 15 or 20 minutes), but first impression is great. A really nice step forward for the queryparser. Thanks guys. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Attachments: lucene_trunk_FlexQueryParser_2009March24.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688721#action_12688721 ] Mark Miller commented on LUCENE-1567: - From org.apache.lucene.queryParser.lucene2.config.package.html {quote}This configuration handler reproduces almost everything that could be set on the old query parser.{quote} Does this mean there is something missing? Or that some settings are handled in another manner? New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Attachments: lucene_trunk_FlexQueryParser_2009March24.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688871#action_12688871 ] Adriano Crestani commented on LUCENE-1567: -- From org.apache.lucene.queryParser.lucene2.config.package.html This configuration handler reproduces almost everything that could be set on the old query parser. Does this mean there is something missing? Or that some settings are handled in another manner? The only QueryParser configuration that is not implemented on the new query parser configuration is QueryParser.setFuzzyMinSim(float) and QueryParser.setFuzzyPrefixLength(int). They are kind of hardcoded for now, always using the default values for these configs. But it doesn't mean they cannot be implemented on the new query parser...I think the main reason why they are not implemented yet is that Lucene TestQueryParser is not testing them, so I did not give much attention to these 2 settings. I will try to implement them for the next patch ; ) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Attachments: lucene_trunk_FlexQueryParser_2009March24.patch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683879#action_12683879 ] Grant Ingersoll commented on LUCENE-1567: - OK, I have started the IP Clearance in incubation. Please send in the software grant ASAP and make sure you CC me on it (gsing...@a.o) New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683355#action_12683355 ] Paul Elschot commented on LUCENE-1567: -- You may want to take a look here: http://wiki.apache.org/lucene-java/Java_1.5_Migration Iirc somewhere in the threads referenced there it was mentioned that contrib modules can go ahead to 1.5 already now. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683393#action_12683393 ] Michael Busch commented on LUCENE-1567: --- I think it would be good to attach the 1.5 implementation here for now as a patch so that we can review the code. We can then decide when/how/where to commit it. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683616#action_12683616 ] Grant Ingersoll commented on LUCENE-1567: - First step towards Software Grant is to have all people who worked on the code file a CLA. I would suggest IBM file a CCLA if they haven't already, especially b/c it sounds like one of the people who worked on it is no longer there. I will ask for clarification. In the meantime, if Luis and Andriano can file their CLA that would be great: http://www.apache.org/licenses/icla.txt New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683621#action_12683621 ] Grant Ingersoll commented on LUCENE-1567: - Never mind on the CCLA, as I think the software grant covers it: http://www.apache.org/licenses/software-grant.txt New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683308#action_12683308 ] Luis Alves commented on LUCENE-1567: Should the Flexible Query Parser patch be committed to the main, as a replacement for the old queryparser? The current implementation is using Java 1.5 syntax. Is that OK, if we commit it to the trunk. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683313#action_12683313 ] Adriano Crestani commented on LUCENE-1567: -- It's probably not ok, since lucene build script will probably fail because of that. We are working on a patch which we will upload to this JIRA soon, it will only be for the community to review the new query parser code and not to be committed against the trunk. I think somebody could create a sandbox and commit the code, it would be easier for other to review the new query parser. I think the right question is if we should include this new parser in the release 2.9, if yes, then we definitely need to change the code to be java 1.4 compatible. Anyway, before taking this decision, the code must be available for the community : ) Best Regards, New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is Apache committer and PMC member on the Tuscany project and getting familiar with Lucene now too. We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here, Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am. The goal was it to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch. The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b': AND / \ A B The three layers are: 1. QueryParser 2. QueryNodeProcessor 3. QueryBuilder 1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc. 2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processors can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms. 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects. Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow to attach resource bundles. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the