[jira] Commented: (LUCENE-2039) Regex support and beyond in JavaCC QueryParser

David Kaelbling (JIRA) Fri, 11 Dec 2009 08:31:41 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789374#action_12789374 ]


David Kaelbling commented on LUCENE-2039:
-----------------------------------------

> I would suggest modifying the ExtensionQuery ctor to take a QueryParser
> instance and adding the corresponding getters to it. That way we can
> maintain a consistent view on the settings, even if they are reset on the
> master parser, without overriding all the setters. Would that make sense?

Simon, that sounds like the right direction! But relying on ExtensionParser
implementors to manually copy all the parent settings to the child seems like a
maintenance problem: we add new settings relatively often. Unfortunately,
there's nothing like the token Attribute API for QueryParser...
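
To make the trade-off concrete, here is a minimal sketch contrasting the two options: an extension that copies the master parser's settings when it is constructed, versus one that keeps a reference to the parser and reads settings through its getters. `MasterParser`, `CopyingExtensionQuery` and `DelegatingExtensionQuery` are hypothetical stand-ins, not Lucene classes, and only a single setting is modeled.

```java
import java.util.Objects;

// Hypothetical stand-ins; not Lucene's real QueryParser or ExtensionQuery classes.
public class SettingsViewDemo {

  /** Minimal "master" parser holding one configurable setting. */
  static class MasterParser {
    private boolean allowLeadingWildcard = false;

    public boolean getAllowLeadingWildcard() { return allowLeadingWildcard; }

    public void setAllowLeadingWildcard(boolean allow) { this.allowLeadingWildcard = allow; }
  }

  /** Option A: the extension copies each setting once, at construction time. */
  static class CopyingExtensionQuery {
    final boolean allowLeadingWildcard; // goes stale once the master changes

    CopyingExtensionQuery(MasterParser parser) {
      // Every new setting added to MasterParser must also be copied here by hand.
      this.allowLeadingWildcard = parser.getAllowLeadingWildcard();
    }
  }

  /** Option B: the extension keeps a reference and reads settings on demand. */
  static class DelegatingExtensionQuery {
    private final MasterParser parser;

    DelegatingExtensionQuery(MasterParser parser) {
      this.parser = Objects.requireNonNull(parser);
    }

    boolean getAllowLeadingWildcard() { return parser.getAllowLeadingWildcard(); }
  }

  public static void main(String[] args) {
    MasterParser master = new MasterParser();
    CopyingExtensionQuery copied = new CopyingExtensionQuery(master);
    DelegatingExtensionQuery delegated = new DelegatingExtensionQuery(master);

    master.setAllowLeadingWildcard(true); // setting reset on the master parser

    System.out.println("copied:    " + copied.allowLeadingWildcard);         // false
    System.out.println("delegated: " + delegated.getAllowLeadingWildcard()); // true
  }
}
```

With copying, every setting added to the parser has to be copied by hand and later resets on the master are never seen; the reference-based variant avoids both, as long as the parser itself consistently goes through its getters.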


> Another way to enable this is to pass the query parser instance into
> ExtensionQuery and simply add a getter, so extension parsers can access the
> parser and its utilities too.

Umm, I don't quite follow. If the extension wraps an existing parser, the
extension would have to subclass it, override all the getters/setters to
delegate to the parent, and trust that everyone uses them? That's not
currently true -- for example, QueryParser.getPrefixQuery() reads the
allowLeadingWildcard field directly instead of calling the getter. There are
probably other cases too; that's just the first one I checked.
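
The pitfall can be reproduced in a few lines. This sketch assumes a heavily simplified parser: `BaseParser`, `DelegatingParser` and the string-returning `getPrefixQuery` are hypothetical, and the leading-wildcard check is only loosely modeled on the direct field access described above.

```java
// Hypothetical, simplified classes; the real QueryParser is far larger.
public class DirectFieldAccessDemo {

  /** Simplified base parser: one setting, its getter, and one factory method. */
  static class BaseParser {
    protected boolean allowLeadingWildcard = false;

    public boolean getAllowLeadingWildcard() { return allowLeadingWildcard; }

    /** Reads the field directly instead of calling getAllowLeadingWildcard(). */
    public String getPrefixQuery(String termStr) {
      if (!allowLeadingWildcard && termStr.startsWith("*")) {
        throw new IllegalArgumentException("leading wildcard not allowed: " + termStr);
      }
      return "PrefixQuery(" + termStr + ")";
    }
  }

  /** Extension parser that tries to inherit the master's settings via getter overrides. */
  static class DelegatingParser extends BaseParser {
    private final BaseParser master;

    DelegatingParser(BaseParser master) { this.master = master; }

    @Override
    public boolean getAllowLeadingWildcard() { return master.getAllowLeadingWildcard(); }
  }

  public static void main(String[] args) {
    BaseParser master = new BaseParser();
    master.allowLeadingWildcard = true;

    DelegatingParser child = new DelegatingParser(master);
    System.out.println(child.getAllowLeadingWildcard()); // true: the getter delegates
    try {
      // ...but getPrefixQuery reads the child's own field, which is still false.
      child.getPrefixQuery("*foo");
      System.out.println("accepted");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Overriding the getter changes what external callers see, but not what the base class does internally, so the child still rejects `*foo` even though its getter reports `true`.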


> Regex support and beyond in JavaCC QueryParser
> ----------------------------------------------
>
>                 Key: LUCENE-2039
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2039
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2039.patch, LUCENE-2039_field_ext.patch, 
> LUCENE-2039_field_ext.patch, LUCENE-2039_field_ext.patch, 
> LUCENE-2039_field_ext.patch, LUCENE-2039_field_ext.patch
>
>
> Since the early days, the standard query parser has been limited to the
> queries living in core; adding other queries or extending the parser in any
> way always forced people to change the grammar file and regenerate it. Even
> if you change the grammar, you have to be extremely careful how you modify
> the parser so that other parts of the standard parser are not affected by
> customisation changes. Ultimately you had to live with all the limitations
> the current parser has, like tokenizing on whitespace before a tokenizer /
> analyzer has the chance to look at the tokens.
> I was thinking about how to overcome this limitation and add regex support
> to the query parser without introducing any new dependency into core. I
> added a new special character that prevents the parser from interpreting any
> of the characters enclosed between two occurrences of it. I chose the
> forward slash '/' as the delimiter, so everything between two forward
> slashes is effectively escaped and ignored by the parser. All chars embedded
> within forward slashes are treated as one token, even if they contain other
> special chars like *, [], ?, {} or whitespace. This token is subsequently
> passed to a pluggable "parser extension" which builds a query from the
> embedded string. I do not interpret the embedded string in any way but leave
> all the subsequent work to the parser extension. Such an extension could be
> another full-featured query parser itself or simply a ctor call for a regex
> query. The interface remains quite simple but makes the parser extensible in
> an easy way compared to modifying the JavaCC sources.
> The downside of this patch is clearly that I introduce a new special char
> into the syntax, but I guess that is not much of a deal, as it is reflected
> in the escape method. It would be nice to support more than one extension
> and make this even more flexible, so treat this patch as a kickoff.
> Another way of solving the problem with RegexQuery would be to move the JDK 
> version of regex into the core and simply have another method like:
> {code}
> protected Query newRegexQuery(Term t) {
>   ... 
> }
> {code}
> which I would prefer, as it would be more consistent with the idea of the
> query parser being a very strict and well-defined parser.
> I will upload a patch in a second which implements the extension-based
> approach; I will probably add a second patch with regex in core soon too.
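
A minimal sketch of the slash-delimited idea from the description, written as a hand-rolled tokenizer rather than JavaCC grammar. `ParserExtension`, `tokenize` and `escape` are hypothetical names, not the patch's API. The example extension simply compiles the embedded text with `java.util.regex.Pattern`, standing in for the "ctor call for a regex query" case. The escape set is assumed to be the classic QueryParser special characters plus '/', and, for brevity, nothing inside the slashes is unescaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the idea only; not the LUCENE-2039 patch or the JavaCC grammar.
public class SlashExtensionDemo {

  /** Pluggable hook: receives the raw text between two slashes, uninterpreted. */
  interface ParserExtension {
    Object parse(String embedded);
  }

  /** Example extension: builds a JDK regex from the embedded string. */
  static final ParserExtension REGEX = embedded -> Pattern.compile(embedded);

  /**
   * Splits on whitespace, except that everything between two '/' characters
   * becomes one token handed to the extension, even if it contains whitespace
   * or chars such as * [ ] ? { }. Outside slashes, a backslash makes the next
   * char literal; inside slashes nothing is interpreted.
   */
  static List<Object> tokenize(String input, ParserExtension ext) {
    List<Object> tokens = new ArrayList<>();
    StringBuilder term = new StringBuilder();
    int i = 0;
    while (i < input.length()) {
      char c = input.charAt(i);
      if (c == '\\' && i + 1 < input.length()) {
        term.append(input.charAt(i + 1)); // escaped char, e.g. "\/" is a literal '/'
        i += 2;
      } else if (c == '/') {
        int end = input.indexOf('/', i + 1);
        if (end < 0) {
          throw new IllegalArgumentException("unterminated '/' at index " + i);
        }
        flush(term, tokens);
        tokens.add(ext.parse(input.substring(i + 1, end)));
        i = end + 1;
      } else if (Character.isWhitespace(c)) {
        flush(term, tokens);
        i++;
      } else {
        term.append(c);
        i++;
      }
    }
    flush(term, tokens);
    return tokens;
  }

  /** Emits the pending plain term, if any, and resets the buffer. */
  private static void flush(StringBuilder term, List<Object> tokens) {
    if (term.length() > 0) {
      tokens.add(term.toString());
      term.setLength(0);
    }
  }

  /** Escapes classic query syntax characters, with '/' added as the new one. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
      if ("\\+-!():^[]\"{}~*?|&/".indexOf(c) >= 0) {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Object> tokens = tokenize("foo /ba[rz] +x?/ bar", REGEX);
    System.out.println(tokens.size());                    // 3
    Pattern p = (Pattern) tokens.get(1);
    System.out.println(p.pattern());                      // ba[rz] +x?
    System.out.println(p.matcher("baz   x").matches());   // true
    System.out.println(escape("a/b*c"));                  // a\/b\*c
    System.out.println(tokenize(escape("a/b*c"), REGEX)); // [a/b*c]
  }
}
```

The last line shows why the escape method has to learn about '/': once escaped, a slash in user input becomes part of an ordinary term instead of opening an extension token.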

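The alternative proposal, a protected `newRegexQuery(Term t)` factory backed by the JDK's regex support, can be sketched in the same spirit. Only the method name and parameter type come from the description; `Term`, `Query`, `JdkRegexQuery` and `CaseInsensitiveParser` are simplified hypothetical types, not Lucene's.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified types; Lucene's real Term/Query classes are not used here.
public class RegexFactoryDemo {

  /** Simplified stand-in for a field/text pair. */
  static final class Term {
    final String field;
    final String text;

    Term(String field, String text) { this.field = field; this.text = text; }
  }

  /** Simplified query: decides whether a field value matches. */
  interface Query {
    boolean matches(String field, String value);
  }

  /** Query backed by java.util.regex, as in the "regex in core" variant. */
  static final class JdkRegexQuery implements Query {
    private final Term term;
    private final Pattern pattern;

    JdkRegexQuery(Term term) {
      this.term = term;
      this.pattern = Pattern.compile(term.text);
    }

    @Override
    public boolean matches(String field, String value) {
      return term.field.equals(field) && pattern.matcher(value).matches();
    }
  }

  /** Parser base class: the fixed grammar would call this factory for regex terms. */
  static class Parser {
    protected Query newRegexQuery(Term t) {
      return new JdkRegexQuery(t);
    }
  }

  /** A subclass customises regex handling without touching the grammar. */
  static class CaseInsensitiveParser extends Parser {
    @Override
    protected Query newRegexQuery(Term t) {
      Pattern p = Pattern.compile(t.text, Pattern.CASE_INSENSITIVE);
      return (field, value) -> t.field.equals(field) && p.matcher(value).matches();
    }
  }

  public static void main(String[] args) {
    Term t = new Term("title", "lucene.*");
    Query strict = new Parser().newRegexQuery(t);
    Query relaxed = new CaseInsensitiveParser().newRegexQuery(t);
    System.out.println(strict.matches("title", "Lucene in Action"));  // false
    System.out.println(relaxed.matches("title", "Lucene in Action")); // true
  }
}
```

Here the syntax stays fixed and strict, while subclasses can still change how regex terms become queries; that is the consistency argument made for this variant over a free-form extension hook.
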
-- 
This message is automatically generated by JIRA.

