I'm migrating from Solr 3.x to 4.x and I'm running some queries to verify
that everything works as before. However, I've found that the query
"galaxy s3" returns far fewer results: numFound=1628 in 3.x versus
numFound=70 in 4.x.

Here's the relevant schema part:

<fieldtype name="text_pt" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="false">
   <analyzer type="index">
       <charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="-" replacement="IIIHYPHENIII"/>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.PatternReplaceFilterFactory"
pattern="IIIHYPHENIII" replacement="-"/>
       <filter class="solr.ASCIIFoldingFilterFactory" />
       <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" preserveOriginal="1"
catenateWords="1" catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="false"
words="portugueseStopWords.txt"/>
       <filter class="solr.BrazilianStemFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
       <charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="-" replacement="IIIHYPHENIII"/>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.PatternReplaceFilterFactory"
pattern="IIIHYPHENIII" replacement="-"/>
       <filter class="solr.ASCIIFoldingFilterFactory" />
       <filter class="solr.SynonymFilterFactory" ignoreCase="true"
synonyms="portugueseSynonyms.txt" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
preserveOriginal="1" catenateNumbers="0" catenateAll="0"
protected="protwords.txt"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="false"
words="portugueseStopWords.txt"/>
       <filter class="solr.BrazilianStemFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer></fieldtype>

The synonyms involved in this query are:

siii, s3
galaxy, galax

My default search operator is AND (in both versions, even though it's
deprecated in 4.x), and the debug output is:

SOLR 3.x

<str name="parsedquery">+(title_search_pt:galaxy
title_search_pt:galax) +MultiPhraseQuery(title_search_pt:"(sii s3 s)
3")</str>

SOLR 4.x

<str name="parsedquery">+((title_search_pt:galaxy
title_search_pt:galax)/no_coord) +(+title_search_pt:sii
+title_search_pt:s3 +title_search_pt:s +title_search_pt:3)</str>

The weird thing is that 4.x does not return documents such as 'galaxy s3'.
This is the explain output from the debug query:

no match on required clause (+title_search_pt:sii +title_search_pt:s3
+title_search_pt:s +title_search_pt:3)
(NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s), *no
match on required clause (title_search_pt:sii)*
(NON-MATCH) no matching term
(MATCH) weight(title_search_pt:s3 in 1834535)
(MATCH) weight(title_search_pt:s in 1834535)
(MATCH) weight(title_search_pt:3 in 1834535)

How is it that sii is *required* when it should be OR'ed with s and s3?
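
To make the difference concrete, here is a rough Python sketch of how I
read the two parsed queries for the "s3" part only. It is a simplified
model, not Solr code: the function names are made up, and the indexed
token layout for the title "galaxy s3" is my assumption based on the
index analyzer above (WordDelimiterFilter with preserveOriginal=1).

```python
# Simplified model: a document is a list of token sets, one set per position.

def matches_3x(doc):
    """3.x: MultiPhraseQuery "(sii s3 s) 3".

    Any one of sii/s3/s must occur, immediately followed by 3.
    """
    alternatives = {"sii", "s3", "s"}
    return any(
        bool(doc[i] & alternatives) and "3" in doc[i + 1]
        for i in range(len(doc) - 1)
    )

def matches_4x(doc):
    """4.x: +sii +s3 +s +3.

    Every term is required somewhere in the field; positions are ignored.
    """
    all_terms = set().union(*doc)
    return {"sii", "s3", "s", "3"} <= all_terms

# Assumed indexed form of the title "galaxy s3" (hypothetical positions).
doc = [{"galaxy"}, {"s3", "s"}, {"3"}]

print(matches_3x(doc))  # True
print(matches_4x(doc))  # False
```

Under this model the document matches in 3.x because s (or s3) is followed
by 3, but fails in 4.x because sii is never indexed for it, which is what
the explain output above reports.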

The analysis output shows that sii has token position 2, like its
synonyms:

galaxy  sii 3
galax   s3
        s

Thanks,

Raúl Cardozo.
