[ 
https://issues.apache.org/jira/browse/SOLR-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Georg Sorst updated SOLR-9968:
------------------------------
    Description: 
h4. Reproduce

1. Configure the Suggester to use a {{contextField}}, e.g. {{context}}
2. Add a document containing special characters in that field, e.g. '{{c#x}}'
3. Use a context query with the Suggester, e.g. 
{noformat}suggest.cfq=context:c#x{noformat}
  * Escaping the character makes no difference, e.g. 
{noformat}suggest.cfq=context:c\#x{noformat}

h4. What happens

The suggestions are not properly filtered: the suggestion for the document whose {{context}} is '{{c#x}}' is not returned

h4. What should happen

The suggestions should be limited to documents where the field {{context}} is 
'{{c#x}}'

----

What happens is this:

1. {{SolrSuggester.contextFilterQueryAnalyzer}} is hardwired to use 
{{StandardTokenizer}}
2. The context query is parsed like this:
{code:title=SolrSuggester.parseContextFilterQuery}
query = new StandardQueryParser(contextFilterQueryAnalyzer)
    .parse(contextFilter, CONTEXTS_FIELD_NAME);
{code}
3. The {{StandardQueryParser}} together with {{StandardTokenizer}} splits the value at the '{{#}}' and turns the context query into '{{context:c context:x}}'
4. This query is used for filtering the suggestions
5. Thus, the suggestion where {{context}} is '{{c#x}}' is not returned, because neither {{c}} nor {{x}} matches that value
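
The steps above can be illustrated with a small, dependency-free sketch. It does not use Lucene: {{splitOnNonAlphanumerics}} is a hypothetical, simplified stand-in for what {{StandardTokenizer}} does to '{{c#x}}', and {{matchesAny}} assumes the context value is compared verbatim against each query term. All class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextQuerySplitSketch {

    // Simplified stand-in for word-boundary tokenization: every character
    // that is not a letter or digit ends the current token.
    // This is NOT Lucene's StandardTokenizer, only an approximation.
    public static List<String> splitOnNonAlphanumerics(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (Character.isLetterOrDigit(ch)) {
                current.append(ch);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Assumption: a suggestion passes the filter only if one of the query
    // terms equals its stored context value exactly.
    public static boolean matchesAny(List<String> queryTerms, String storedContext) {
        return queryTerms.contains(storedContext);
    }

    public static void main(String[] args) {
        List<String> terms = splitOnNonAlphanumerics("c#x");
        System.out.println(terms);                    // prints [c, x]
        System.out.println(matchesAny(terms, "c#x")); // prints false
    }
}
```

Under these assumptions the query terms are {{c}} and {{x}}, and neither equals the stored value '{{c#x}}', which matches the behavior described in step 5.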

Attached is an extension to {{SuggestComponentContextFilterQueryTest}} to 
reproduce this behavior.

So, the question is, how to get the parser and tokenizer to use these special 
characters verbatim? Two ways I can think of:

* Make {{contextFilterQueryAnalyzer}} configurable so {{KeywordTokenizer}} can 
be used
* Use the analyzer defined for the context field in the schema
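
As a rough sketch of the first option, the analysis step could be chosen by configuration. The code below is plain Java without Lucene or Solr; the {{Mode}} enum and {{analyzerFor}} are hypothetical names, not existing Solr parameters or APIs. The {{KEYWORD}} mode mimics what a {{KeywordTokenizer}} would do (emit the whole value as a single token), while {{STANDARD_LIKE}} only approximates {{StandardTokenizer}} by splitting on non-alphanumeric characters.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class ConfigurableContextAnalysisSketch {

    // Hypothetical configuration values, not actual Solr settings.
    public enum Mode { STANDARD_LIKE, KEYWORD }

    // Returns the analysis function for the configured mode.
    public static Function<String, List<String>> analyzerFor(Mode mode) {
        if (mode == Mode.KEYWORD) {
            // Keep the value verbatim as a single token.
            return value -> Collections.singletonList(value);
        }
        // Rough approximation of word-boundary tokenization:
        // split on runs of non-alphanumeric characters.
        return value -> Arrays.asList(value.split("[^\\p{Alnum}]+"));
    }

    public static void main(String[] args) {
        String stored = "c#x";
        for (Mode mode : Mode.values()) {
            List<String> terms = analyzerFor(mode).apply("c#x");
            // A term must equal the stored context exactly to match.
            System.out.println(mode + " -> " + terms
                    + " matches=" + terms.contains(stored));
        }
    }
}
```

With the keyword-style mode the single term '{{c#x}}' equals the stored context, so the filter would match. This only covers the analysis step; the query parser would presumably still need the special character escaped or quoted so that it is not interpreted as syntax.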



> Cannot use special characters in Suggester Context Query
> --------------------------------------------------------
>
>                 Key: SOLR-9968
>                 URL: https://issues.apache.org/jira/browse/SOLR-9968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Suggester
>    Affects Versions: 6.0, 6.3
>            Reporter: Georg Sorst
>         Attachments: test_context_query_with_special_characters.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
