[ https://issues.apache.org/jira/browse/SOLR-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson resolved SOLR-1196.
----------------------------------

    Resolution: Won't Fix

2013 Old JIRA cleanup

> Incorrect matches when using non alphanumeric search string !@#$%\^\&\*\(\)
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-1196
>                 URL: https://issues.apache.org/jira/browse/SOLR-1196
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.3
>         Environment: Solr 1.3/ Java 1.6/ Win XP/Eclipse 3.3
>            Reporter: Sam Michael
>
> When matching strings that do not include alphanumeric chars, all the data is
> returned as matches. (There is actually no match, so nothing should be
> returned.)
> When I run a query like - (activity_type:NAME) AND title:(\!@#$%\^&\*\(\))
> all the documents are returned even though there is not a single match. There
> is no title that matches the string (which has been escaped).
> My document structure is as follows
> <doc>
>   <str name="activity_type">NAME</str>
>   <str name="title">Bathing</str>
>   ....
> </doc>
> The title field is of type text_title which is described below.
> <fieldType name="text_title" class="solr.TextField"
>     positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"
>         splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"
>         splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> -----------------------------------------------------
> Yonik's analysis as follows.
> <str name="rawquerystring">-features:foo features:(\!@#$%\^&\*\(\))</str>
> <str name="querystring">-features:foo features:(\!@#$%\^&\*\(\))</str>
> <str name="parsedquery">-features:foo</str>
> <str name="parsedquery_toString">-features:foo</str>
> The text analysis is throwing away non alphanumeric chars (probably
> the WordDelimiterFilter). The Lucene (and Solr) query parser throws
> away term queries when the token is zero length (after analysis).
> Solr then interprets the left over "-features:foo" as "all documents
> not containing foo in the features field", so you get a bunch of
> matches.
> As per his suggestion, a bug is filed.
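
For readers who want to see the mechanism from Yonik's analysis in isolation, here is a rough Python sketch. It is a toy model and not Solr or Lucene code: analyze(), parse() and search() are hypothetical helpers, the analyzer is approximated as "keep only alphanumeric runs, lowercased", and the sample documents are made up. It only illustrates how a clause that analyzes to zero tokens disappears and leaves a purely negative query behind.

```python
import re


def analyze(text):
    """Toy stand-in for the analyzer chain (assumption, not Solr's code):
    split on whitespace, keep only alphanumeric runs, lowercase them."""
    tokens = []
    for chunk in text.split():
        tokens.extend(part.lower() for part in re.findall(r"[A-Za-z0-9]+", chunk))
    return tokens


def parse(clauses):
    """clauses: list of (occur, field, text), occur is "+", "-" or "" (optional).
    A clause whose text analyzes to no tokens is silently dropped."""
    parsed = []
    for occur, field, text in clauses:
        tokens = analyze(text)
        if tokens:
            parsed.append((occur, field, tokens))
    return parsed


def clause_matches(doc, field, tokens):
    doc_tokens = set(analyze(doc.get(field, "")))
    return any(token in doc_tokens for token in tokens)


def search(docs, parsed):
    """Evaluate the surviving clauses. A query with only prohibited clauses
    is treated as "every document except the prohibited ones"."""
    if not parsed:
        return []
    required = [(f, t) for occur, f, t in parsed if occur == "+"]
    optional = [(f, t) for occur, f, t in parsed if occur == ""]
    prohibited = [(f, t) for occur, f, t in parsed if occur == "-"]
    hits = []
    for doc in docs:
        if any(clause_matches(doc, f, t) for f, t in prohibited):
            continue
        if not all(clause_matches(doc, f, t) for f, t in required):
            continue
        if not required and optional and not any(
            clause_matches(doc, f, t) for f, t in optional
        ):
            continue
        hits.append(doc)
    return hits


if __name__ == "__main__":
    # Made-up documents, for illustration only.
    docs = [
        {"id": 1, "features": "foo bar"},
        {"id": 2, "features": "Bathing"},
        {"id": 3, "features": "walking"},
    ]
    # -features:foo features:(\!@#$%\^&\*\(\))
    query = [("-", "features", "foo"), ("", "features", r"\!@#$%\^&\*\(\)")]
    parsed = parse(query)
    print(parsed)  # only the -features:foo clause survives
    print([d["id"] for d in search(docs, parsed)])  # prints [2, 3]
```

In this model the punctuation-only clause never reaches the matching step, so the result is every document that does not contain "foo", which mirrors the parsedquery of -features:foo shown in the debug output above.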