[ https://issues.apache.org/jira/browse/SOLR-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826939#comment-15826939 ]

Georg Sorst commented on SOLR-9968:
-----------------------------------

I've implemented a fix ({{SOLR-9968-configurable-tokenizer.patch}}) for this
that covers my use case: it makes the tokenizer used for context filter queries
configurable. This makes it possible to use {{KeywordTokenizer}}, which keeps
special characters such as '#' intact.

The config setting is {{contextFilterQueryTokenizer}}; it defaults to
{{StandardTokenizer}}.

The patch also contains a test case.

The configuration uses the registered name (e.g. {{keyword}}, {{standard}}) of
the tokenizer instead of the class name (e.g. {{solr.KeywordTokenizerFactory}},
{{solr.StandardTokenizerFactory}}). I would have preferred the latter but
couldn't figure out how to do this properly.
I'll gladly change the behavior if it makes sense and someone can point me in 
the right direction.
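For illustration, here is a stdlib-only sketch of how the setting could accept both forms: a value like {{solr.KeywordTokenizerFactory}} is reduced to the registered name {{keyword}} by stripping the package prefix and the {{TokenizerFactory}} suffix (similar to the naming convention Lucene's SPI lookup uses), and an unset value falls back to {{standard}}. The class, the registry and both toy tokenizers are hypothetical stand-ins, not Lucene code; in Solr the actual lookup by name would go through {{TokenizerFactory.forName(name, args)}}, which this sketch does not call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical resolution of the contextFilterQueryTokenizer setting (toy tokenizers, not Lucene). */
final class ContextTokenizerConfig {

    static final String DEFAULT_NAME = "standard";
    static final String SUFFIX = "TokenizerFactory";

    /** Toy approximation of StandardTokenizer: splits on anything that is not a letter or digit. */
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (Character.isLetterOrDigit(ch)) {
                current.append(ch);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Toy approximation of KeywordTokenizer: the whole input is a single token. */
    static List<String> keywordLike(String text) {
        return List.of(text);
    }

    // Hypothetical registry keyed by registered (SPI-style) name.
    static final Map<String, Function<String, List<String>>> REGISTRY = Map.of(
            "standard", ContextTokenizerConfig::standardLike,
            "keyword", ContextTokenizerConfig::keywordLike);

    /** Reduces "keyword" or "solr.KeywordTokenizerFactory" to "keyword"; null/blank means default. */
    static String normalize(String configured) {
        if (configured == null || configured.isBlank()) {
            return DEFAULT_NAME;
        }
        String name = configured.trim();
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            name = name.substring(dot + 1); // drop a package or "solr." prefix
        }
        if (name.endsWith(SUFFIX)) {
            name = name.substring(0, name.length() - SUFFIX.length());
        }
        return name.toLowerCase(Locale.ROOT);
    }

    static Function<String, List<String>> resolve(String configured) {
        Function<String, List<String>> tokenizer = REGISTRY.get(normalize(configured));
        if (tokenizer == null) {
            throw new IllegalArgumentException("Unknown tokenizer: " + configured);
        }
        return tokenizer;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null).apply("c#x"));                           // [c, x]
        System.out.println(resolve("keyword").apply("c#x"));                      // [c#x]
        System.out.println(resolve("solr.KeywordTokenizerFactory").apply("c#x")); // [c#x]
    }
}
```

Normalizing both spellings to one canonical name would let existing configs keep using {{keyword}} while also allowing the class-name style.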

> Cannot use special characters in Suggester Context Query
> --------------------------------------------------------
>
>                 Key: SOLR-9968
>                 URL: https://issues.apache.org/jira/browse/SOLR-9968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Suggester
>    Affects Versions: 6.0, 6.3
>            Reporter: Georg Sorst
>         Attachments: SOLR-9968-configurable-tokenizer.patch, 
> test_context_query_with_special_characters.patch
>
>
> h4. Reproduce
> 1. Configure the Suggester to use a {{contextField}}, e.g. {{context}}
> 2. Add a document containing special characters in that field, e.g. '{{c#x}}'
> 3. Use a context query with the Suggester, e.g. 
> {noformat}suggest.cfq=context:c#x{noformat}
>   * Escaping the character makes no difference, e.g. 
> {noformat}suggest.cfq=context:c\#x{noformat}
> h4. What happens
> The suggestions are not properly filtered
> h4. What should happen
> The suggestions should be limited to documents where the field {{context}} is 
> '{{c#x}}'
> ----
> What happens is this:
> 1. {{SolrSuggester.contextFilterQueryAnalyzer}} is hardwired to use 
> {{StandardTokenizer}}
> 2. The context query is parsed like this:
> {code:title=SolrSuggester.parseContextFilterQuery}
> query = new StandardQueryParser(contextFilterQueryAnalyzer)
>     .parse(contextFilter, CONTEXTS_FIELD_NAME);
> {code}
> 3. The {{StandardQueryParser}} together with {{StandardTokenizer}} will turn 
> the context query into '{{context:c context:x}}'
> 4. This is used for filtering the suggestions
> 5. Thus, the suggestion where {{context}} is '{{c#x}}' is not returned
> Attached is an extension to {{SuggestComponentContextFilterQueryTest}} to 
> reproduce this behavior.
> So the question is: how can the parser and tokenizer be made to keep these special 
> characters verbatim? Two ways I can think of:
> * Make {{contextFilterQueryAnalyzer}} configurable so {{KeywordTokenizer}} 
> can be used
> * Use the analyzer defined for the context field in the schema
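Steps 3 to 5 of the analysis can be modeled without Lucene to see both outcomes side by side. This sketch assumes, as described there, that the parsed filter is a disjunction of {{context:<token>}} clauses and that each suggestion's contexts are compared as exact, untokenized strings; the class name and the sample suggestions are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of context filtering; not the actual suggester implementation. */
final class ContextFilterDemo {

    /** OR semantics: keep a suggestion if any query term equals one of its contexts exactly. */
    static List<String> filter(Map<String, Set<String>> suggestions, List<String> queryTerms) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : suggestions.entrySet()) {
            for (String term : queryTerms) {
                if (entry.getValue().contains(term)) {
                    kept.add(entry.getKey());
                    break; // one matching clause is enough for a disjunction
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical suggestions mapped to their (untokenized) context values.
        Map<String, Set<String>> suggestions = new LinkedHashMap<>();
        suggestions.put("foo in c#x", Set.of("c#x"));
        suggestions.put("foo in c", Set.of("c"));

        // Terms from "context:c context:x", i.e. StandardTokenizer output for "c#x".
        System.out.println(filter(suggestions, List.of("c", "x"))); // [foo in c]
        // Single term a KeywordTokenizer would produce: "context:c#x".
        System.out.println(filter(suggestions, List.of("c#x")));    // [foo in c#x]
    }
}
```

In this model the {{StandardTokenizer}} terms drop the '{{c#x}}' suggestion and let one tagged only '{{c}}' through, while the single {{KeywordTokenizer}}-style term selects exactly the intended suggestion.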



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
