[
https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789374#action_12789374
]
David Kaelbling commented on LUCENE-2039:
-----------------------------------------
> I would suggest modifying the ExtensionQuery ctor to take a QueryParser
> instance and adding the corresponding getters to it. That way we can maintain
> a consistent view on the settings even if they are reset on the master
> parser, without overriding all setters. Would that make sense?
Simon, it sounds like the right direction! But relying on ExtensionParser
implementors to manually copy all the parent settings to the child seems like a
maintenance problem. We add new settings relatively often. Unfortunately
there's nothing like the token Attribute stuff for QueryParser...
> Another way of enabling this is to pass the query parser instance into
> ExtensionQuery and simply add a getter so extension parsers can access the
> parser and its utilities too.
Umm, I don't quite follow. If the extension wraps an existing parser, the
extension would have to subclass it, override all the getters/setters to
delegate to the parent, and trust that everyone uses them? That's not
currently true -- for example QueryParser.getPrefixQuery() directly accesses
allowLeadingWildcard, without using the getter. There are probably other cases
too, that's just the first one I checked.
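To make the concern concrete, here is a minimal sketch of the pitfall. BaseParser
and DelegatingParser are hypothetical stand-ins, not Lucene classes, and the two
prefixAllowed* methods are invented purely for illustration. The sketch assumes a
wrapper that delegates its getters/setters to a master parser while the base
class reads one setting straight from its field:

```java
// Hypothetical stand-in for a parser; this is NOT Lucene's QueryParser.
class BaseParser {
    protected boolean allowLeadingWildcard = false;

    public boolean getAllowLeadingWildcard() { return allowLeadingWildcard; }
    public void setAllowLeadingWildcard(boolean allow) { this.allowLeadingWildcard = allow; }

    // Reads the field directly, so an overridden getter is never consulted.
    public boolean prefixAllowedViaField(String term) {
        return allowLeadingWildcard || !term.startsWith("*");
    }

    // Goes through the getter, so a delegating subclass is honoured.
    public boolean prefixAllowedViaGetter(String term) {
        return getAllowLeadingWildcard() || !term.startsWith("*");
    }
}

// Hypothetical wrapper that forwards its settings to a master parser.
class DelegatingParser extends BaseParser {
    private final BaseParser master;

    DelegatingParser(BaseParser master) { this.master = master; }

    @Override
    public boolean getAllowLeadingWildcard() { return master.getAllowLeadingWildcard(); }

    @Override
    public void setAllowLeadingWildcard(boolean allow) { master.setAllowLeadingWildcard(allow); }
}

class DelegationPitfallDemo {
    public static void main(String[] args) {
        BaseParser master = new BaseParser();
        master.setAllowLeadingWildcard(true);
        BaseParser child = new DelegatingParser(master);

        // The getter path sees the master's setting ...
        System.out.println(child.prefixAllowedViaGetter("*foo"));
        // ... but the field path still sees the child's own, stale field.
        System.out.println(child.prefixAllowedViaField("*foo"));
    }
}
```

In this sketch the two calls disagree for the same input, which is the kind of
inconsistency delegation alone cannot rule out as long as any code path reads
the field instead of the getter.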
> Regex support and beyond in JavaCC QueryParser
> ----------------------------------------------
>
> Key: LUCENE-2039
> URL: https://issues.apache.org/jira/browse/LUCENE-2039
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2039.patch, LUCENE-2039_field_ext.patch,
> LUCENE-2039_field_ext.patch, LUCENE-2039_field_ext.patch,
> LUCENE-2039_field_ext.patch, LUCENE-2039_field_ext.patch
>
>
> Since the early days the standard query parser has been limited to the
> queries living in core; adding other queries or extending the parser in any
> way always forced people to change the grammar file and regenerate. Even if
> you change the grammar you have to be extremely careful how you modify the
> parser so that other parts of the standard parser are not affected by
> customisation changes. In the end you have to live with all the limitations
> the current parser has, like tokenizing on whitespace before a tokenizer /
> analyzer has the chance to look at the tokens.
> I was thinking about how to overcome this limitation and add regex support to
> the query parser without introducing any dependency to core. I added a new
> special character that basically prevents the parser from interpreting any of
> the characters enclosed in the new special characters. I chose the forward
> slash '/' as the delimiter so that everything in between two forward slashes
> is basically escaped and ignored by the parser. All chars embedded within
> forward slashes are treated as one token even if it contains other special
> chars like * []?{} or whitespace. This token is subsequently passed to a
> pluggable "parser extension" which builds a query from the embedded string. I
> do not interpret the embedded string in any way but leave all the subsequent
> work to the parser extension. Such an extension could be another full
> featured query parser itself or simply a ctor call for regex query. The
> interface remains quite simple but makes the parser extensible in an easy way
> compared to modifying the javaCC sources.
> The downside of this patch is clearly that I introduce a new special char
> into the syntax, but I guess that would not be much of a deal as it is
> reflected in the escape method. It would truly be nice to have more than one
> extension and make this even more flexible, so treat this patch as a
> kickoff.
> Another way of solving the problem with RegexQuery would be to move the JDK
> version of regex into the core and simply have another method like:
> {code}
> protected Query newRegexQuery(Term t) {
> ...
> }
> {code}
> which I would like better as it would be more consistent with the idea of the
> query parser being a very strict and defined parser.
> I will upload a patch in a second which implements the extension-based
> approach. I guess I will add a second patch with regex in core soon too.
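For readers skimming the description above, the slash-delimited idea can be
sketched in plain Java without any Lucene dependency. Everything here is an
assumption made for illustration: ParserExtension, SlashAwareSplitter and the
string-based "clauses" are hypothetical and do not reflect the API in the
attached patches, and escaping of '/' inside the delimited text is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension hook; the real patch's interface may look different.
interface ParserExtension {
    String build(String field, String raw);
}

class SlashAwareSplitter {
    // Splits on whitespace, except that text between two '/' is kept as one
    // raw token and handed to the extension without being interpreted.
    static List<String> parse(String field, String input, ParserExtension ext) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inSlashes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '/') {
                if (inSlashes) {
                    // Closing slash: delegate the raw text to the extension.
                    clauses.add(ext.build(field, current.toString()));
                    current.setLength(0);
                } else {
                    // Opening slash: finish any plain term collected so far.
                    flushPlain(field, current, clauses);
                }
                inSlashes = !inSlashes;
            } else if (!inSlashes && Character.isWhitespace(c)) {
                flushPlain(field, current, clauses);
            } else {
                current.append(c);
            }
        }
        // An unterminated '/' is treated as plain text in this sketch;
        // a real grammar would more likely reject it.
        flushPlain(field, current, clauses);
        return clauses;
    }

    private static void flushPlain(String field, StringBuilder current, List<String> clauses) {
        if (current.length() > 0) {
            clauses.add("term(" + field + ":" + current + ")");
            current.setLength(0);
        }
    }
}
```

A caller could pass a lambda such as `(f, raw) -> "regex(" + f + ":" + raw + ")"`
as the extension; whether the delegate is a regex constructor or another full
parser is left entirely to the extension, as the description suggests.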