[ https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789374#action_12789374 ]
David Kaelbling commented on LUCENE-2039:
-----------------------------------------

> I would suggest to modify the ExtensionQuery ctor to take a QueryParser
> instance and add the corresponding getters to it. That way we can maintain a
> consistent view on the settings even if they are reset on the master parser
> without overriding all setters though. Would that make sense?

Simon, it sounds like the right direction! But relying on ExtensionParser implementors to manually copy all the parent settings to the child seems like a maintenance problem. We add new settings relatively often. Unfortunately there's nothing like the token Attribute stuff for QueryParser...

> Another way of enabling this is to pass the query parser instance into
> ExtensionQuery and simply add a getter so extension parsers can access the
> parser and its utilities too.

Umm, I don't quite follow. If the extension wraps an existing parser, the extension would have to subclass it, override all the getters/setters to delegate to the parent, and trust that everyone uses them? That's not currently true -- for example QueryParser.getPrefixQuery() directly accesses allowLeadingWildcard, without using the getter. There are probably other cases too, that's just the first one I checked.
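To make the getter-bypass concern concrete, here is a minimal, self-contained Java sketch. The classes `BaseParser` and `DelegatingParser` and the methods `acceptsViaField` / `acceptsViaGetter` are hypothetical stand-ins invented for illustration; they are not Lucene's QueryParser API. Only the setting name `allowLeadingWildcard` is borrowed from the comment above.

```java
// Hypothetical stand-in for a parser with one configurable setting (not Lucene's class).
class BaseParser {
    protected boolean allowLeadingWildcard = false;

    public void setAllowLeadingWildcard(boolean allow) {
        this.allowLeadingWildcard = allow;
    }

    public boolean getAllowLeadingWildcard() {
        return allowLeadingWildcard;
    }

    // Reads the field directly, so an overridden getter is never consulted.
    public boolean acceptsViaField(String term) {
        return allowLeadingWildcard || !term.startsWith("*");
    }

    // Goes through the getter, so a delegating override takes effect.
    public boolean acceptsViaGetter(String term) {
        return getAllowLeadingWildcard() || !term.startsWith("*");
    }
}

// Hypothetical extension parser that wraps a parent and delegates the accessors to it.
class DelegatingParser extends BaseParser {
    private final BaseParser parent;

    DelegatingParser(BaseParser parent) {
        this.parent = parent;
    }

    @Override
    public void setAllowLeadingWildcard(boolean allow) {
        parent.setAllowLeadingWildcard(allow);
    }

    @Override
    public boolean getAllowLeadingWildcard() {
        return parent.getAllowLeadingWildcard();
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        BaseParser parent = new BaseParser();
        DelegatingParser child = new DelegatingParser(parent);
        parent.setAllowLeadingWildcard(true);

        // The getter path sees the parent's setting.
        System.out.println(child.acceptsViaGetter("*foo")); // prints "true"
        // The field path reads the child's own, never-updated field.
        System.out.println(child.acceptsViaField("*foo"));  // prints "false"
    }
}
```

In this sketch the delegation only holds for code paths that go through the accessors; any method that touches the field directly silently falls back to the child's own copy of the setting.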
> Regex support and beyond in JavaCC QueryParser
> ----------------------------------------------
>
>                 Key: LUCENE-2039
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2039
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2039.patch, LUCENE-2039_field_ext.patch,
> LUCENE-2039_field_ext.patch, LUCENE-2039_field_ext.patch,
> LUCENE-2039_field_ext.patch, LUCENE-2039_field_ext.patch
>
>
> Since the early days the standard query parser has been limited to the
> queries living in core; adding other queries or extending the parser in any
> way always forced people to change the grammar file and regenerate. Even if
> you change the grammar you have to be extremely careful how you modify the
> parser so that other parts of the standard parser are not affected by
> customisation changes. Eventually you had to live with all the limitations
> the current parser has, like tokenizing on whitespace before a tokenizer /
> analyzer has the chance to look at the tokens.
> I was thinking about how to overcome the limitation and add regex support to
> the query parser without introducing any dependency to core. I added a new
> special character that basically prevents the parser from interpreting any of
> the characters enclosed in the new special characters. I chose the forward
> slash '/' as the delimiter so that everything in between two forward slashes
> is basically escaped and ignored by the parser. All chars embedded within
> forward slashes are treated as one token even if it contains other special
> chars like * []?{} or whitespace. This token is subsequently passed to a
> pluggable "parser extension" which builds a query from the embedded string. I
> do not interpret the embedded string in any way but leave all the subsequent
> work to the parser extension.
> Such an extension could be another full featured query parser itself or
> simply a ctor call for regex query. The interface remains quite simple but
> makes the parser extensible in an easy way compared to modifying the javaCC
> sources.
> The downside of this patch is clearly that I introduce a new special char
> into the syntax, but I guess that would not be that much of a deal as it is
> reflected in the escape method. It would truly be nice to have more than one
> extension and have this even more flexible, so treat this patch as a kickoff.
> Another way of solving the problem with RegexQuery would be to move the JDK
> version of regex into the core and simply have another method like:
> {code}
> protected Query newRegexQuery(Term t) {
>   ...
> }
> {code}
> which I would like better as it would be more consistent with the idea of the
> query parser to be a very strict and defined parser.
> I will upload a patch in a second which implements the extension based
> approach. I guess I will add a second patch with regex in core soon too.
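
As a rough illustration of the slash-delimited extension idea in the issue description, the following Java sketch treats everything between two forward slashes as one raw token and hands it to a pluggable callback. This is a hand-written scanner, not the JavaCC grammar from the patch; the class `SlashDelimitedSketch`, the `Function<String, String>` standing in for a parser extension, and the `term(...)` / `regex(...)` string outputs are all assumptions made for the example. It does not handle escaped slashes or field prefixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: not the patch's implementation and not Lucene API.
class SlashDelimitedSketch {

    // Splits the input on whitespace, except that text between two '/' characters
    // is kept as a single uninterpreted token and passed to the extension.
    static List<String> parse(String input, Function<String, String> extension) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '/') {
                int end = input.indexOf('/', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated '/' at index " + i);
                }
                flush(current, out);
                // The embedded string is handed over untouched, special chars included.
                out.add(extension.apply(input.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isWhitespace(c)) {
                flush(current, out);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        flush(current, out);
        return out;
    }

    // Emits the pending plain term, if any, and resets the buffer.
    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add("term(" + current + ")");
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Prints [term(foo), regex(a b*[c]), term(bar)]
        System.out.println(parse("foo /a b*[c]/ bar", raw -> "regex(" + raw + ")"));
    }
}
```

Swapping the callback is the whole extension point in this sketch: the scanner never looks inside the delimited text, which mirrors the description's statement that all subsequent work is left to the parser extension.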