Andy Webb created SOLR-14189:
--------------------------------

             Summary: Some whitespace characters bypass zero-length test in query parsers
                 Key: SOLR-14189
                 URL: https://issues.apache.org/jira/browse/SOLR-14189
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: query parsers
            Reporter: Andy Webb


The edismax parser and some other query parsers treat whitespace-only queries 
as empty queries, but they normalise the query string with Java's 
{{String.trim()}} method before testing its length. That method strips only 
leading and trailing characters with code points U+0000 to U+0020 (0-32). 
Other whitespace characters, such as {{U+3000 IDEOGRAPHIC SPACE}}, survive the 
trim, bypass the zero-length test and lead to {{400 Bad Request}} responses. 
Compare {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} with 
{{/solr/mycollection/select?q=%20&defType=edismax}}: the first fails with the 
exception:
{noformat}
org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at 
line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> 
... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... 
<REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ... 
<TERM> ...
{noformat}
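
To illustrate the difference outside Solr, here is a minimal standalone sketch. It is not the code from the PR: the class {{BlankQueryCheck}} and its {{isBlank}} helper are hypothetical names used only for this example, and the helper relies on {{Character.isWhitespace(int)}}, which is just one possible definition of "whitespace".

```java
final class BlankQueryCheck {

    // Hypothetical helper (not Solr code): returns true when the string is
    // empty or made up only of code points that Character.isWhitespace()
    // accepts, which includes U+3000 IDEOGRAPHIC SPACE.
    static boolean isBlank(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String ascii = " ";            // U+0020, as in q=%20
        String ideographic = "\u3000"; // U+3000, as in q=%E3%80%80

        // trim() only strips code points up to U+0020.
        System.out.println(ascii.trim().isEmpty());       // true
        System.out.println(ideographic.trim().isEmpty()); // false

        // A Unicode-aware check treats both as blank.
        System.out.println(isBlank(ascii));               // true
        System.out.println(isBlank(ideographic));         // true
    }
}
```

Note that {{Character.isWhitespace}} deliberately excludes the non-breaking spaces (U+00A0, U+2007, U+202F), so a real fix might need a broader or narrower definition than this sketch assumes.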

(PR incoming!)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
