Hi Andrea,

This is a long-standing issue: see https://issues.apache.org/jira/browse/LUCENE-4065 and https://issues.apache.org/jira/browse/LUCENE-8250 for discussion. I don’t think we’ve reached a consensus on how to fix it yet, but more examples are good.
Unfortunately I don’t think changing the StopFilter to ignore SYNONYM tokens will work, because then you’ll generate queries that always fail - they’ll search for ‘of’ in the middle of the phrase, but ‘of’ never gets indexed because it’s removed by the StopFilter at index time.

- Alan

> On 26 Jul 2018, at 08:04, Andrea Gazzarini <a.gazzar...@sease.io> wrote:
>
> Hi,
> I have the following field type definition:
>
> <fieldtype name="text" class="solr.TextField" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="false"/>
>   </analyzer>
> </fieldtype>
>
> Where synonyms and stopwords are defined as follows:
>
> synonyms = out of warranty,oow
> stopwords = of
>
> Running the following query:
>
> q=my tv went out of warranty something of
>
> I get wrong results, with the following explain:
>
> title:my title:tv title:went (title:oow PhraseQuery(title:"out ? warranty something"))
>
> That is, the synonym is correctly detected and the graph information is
> correctly reported in the positionLength, but it seems to be wrongly
> interpreted by the QueryParser.
> I guess the reason is the "of" removal performed by the StopFilter: it
> removes the "of" term within the phrase (which I wouldn't want) and so
> creates a "hole" in the span defined by the "oow" term, which has been marked
> as a synonym with positionLength = 3 and therefore ends up including the next
> available term (something).
>
> I tried to change the StopFilter in order to ignore stopwords that are marked
> as SYNONYM or that are part of a previous synonym span, and it works: it
> correctly produces the following query:
>
> title:my title:tv title:went (title:oow PhraseQuery(title:"out of warranty"))
> title:something
>
> So I'd like to ask your opinion about this. Am I missing something? Do you
> think it's better to open a JIRA issue? If the solution is a graph-aware stop
> filter, do you think it's better to change the existing filter or to subclass
> it?
>
> Best,
> Andrea
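
For anyone following the thread, the position arithmetic Andrea describes can be sketched without any Lucene dependency. The following is only a toy model, not Lucene's API: the `Token` class and the `queryTokens`, `removeStopwords` and `spans` helpers are hypothetical names made up for this illustration. It assumes that the synonym filter emits "oow" with positionLength = 3 ahead of the original tokens, that the stop filter folds the increment of a removed token into the next surviving token, and that the graph consumer treats every increment greater than 1 as 1. That last assumption is one reading that is consistent with the explain output quoted above; it is not taken from the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopwordHoleSketch {

    /** Simplified token (hypothetical model): term, position increment, position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Assumed output of the synonym step for "my tv went out of warranty something of". */
    static List<Token> queryTokens() {
        return Arrays.asList(
            new Token("my", 1, 1), new Token("tv", 1, 1), new Token("went", 1, 1),
            new Token("oow", 1, 3),  // synonym spanning "out of warranty"
            new Token("out", 0, 1),  // same start position as "oow"
            new Token("of", 1, 1), new Token("warranty", 1, 1),
            new Token("something", 1, 1), new Token("of", 1, 1));
    }

    /** Drops stopwords and adds their increments to the next surviving token. */
    static List<Token> removeStopwords(List<Token> in, Set<String> stopwords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (Token t : in) {
            if (stopwords.contains(t.term)) {
                skipped += t.posInc;
            } else {
                out.add(new Token(t.term, t.posInc + skipped, t.posLen));
                skipped = 0;
            }
        }
        return out;
    }

    /** Renders "term[start->end]", clamping each increment to at most 1 (assumption). */
    static List<String> spans(List<Token> tokens) {
        List<String> result = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += Math.min(1, t.posInc);
            result.add(t.term + "[" + pos + "->" + (pos + t.posLen) + "]");
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("of"));
        System.out.println("before: " + spans(queryTokens()));
        System.out.println("after:  " + spans(removeStopwords(queryTokens(), stopwords)));
    }
}
```

In this model "oow" covers 3->6 in both cases. Before the stopword is removed, the parallel path is out, of, warranty, and "something" starts at 6. Afterwards the path that ends at 6 is out, warranty, something, which matches the phrase shown in the explain.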