I have a field title in my solr schema:

    <field name="title" type="text_en" termVectors="true" indexed="true" required="true" stored="true" />

text_en is defined as follows:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100" docValues="false" multiValued="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true" />
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PorterStemFilterFactory" />
      </analyzer>
    </fieldType>

I'm encountering strange behaviour when using multi-word synonyms which contain stopwords. If the stopwords appear in the middle, it works fine. For example, if I have the following in my synonyms file (where i is a stopword):

    iphone, apple i phone

And if I query:

    /select?q=iphone&qf=title&defType=edismax

The parsed query is:

    +DisjunctionMaxQuery(((((+title:appl +title:phone) title:iphon))))

The same happens for the query:

    /select?q=apple i phone&qf=title&defType=edismax

But if stopwords appear at the start or end, the behaviour is unpredictable. In most cases, the entire synonym is dropped. For example, if I change my synonyms file to:

    iphone, i phone

and do the same query again (with iphone), I get:

    +DisjunctionMaxQuery(((title:iphon)))

I was expecting iphon and phone (as i would be dropped) in my dismax query.

In some cases, the behaviour is even weirder. For example, if my synonyms file is:

    between two ferns,netflix comedy,zach galifianakis show,netflix 2019 best

and I have ferns and best as my stopwords.
If I do the following query:

    /select?q=netflix comedy&qf=title&defType=edismax

I get this:

    +DisjunctionMaxQuery((((+title:between +title:two +title:galifianaki +title:show) (+title:netflix +title:2019 +title:comedi))))

which is a very weird combination. I'm not able to understand this behaviour and have not found anything related to it in the documentation or on the internet. Maybe I'm missing something. Any help or pointers would be highly appreciated.

Solr version: 8.4.1
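
For anyone trying to reproduce this, the parsed queries shown above can be obtained by adding `debugQuery=true` to the same requests. A minimal Python sketch of how those requests could be built follows. The base URL `http://localhost:8983/solr/mycore` and the helper names `build_debug_url` and `fetch_parsed_query` are placeholders for illustration only, not part of the setup described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder host and core name; adjust to the real Solr instance.
SOLR_BASE = "http://localhost:8983/solr/mycore"


def build_debug_url(query, base=SOLR_BASE):
    """Build the edismax select URL from the question, with debug output enabled."""
    params = {
        "q": query,
        "qf": "title",
        "defType": "edismax",
        "debugQuery": "true",  # asks Solr to include the parsed query in the response
        "wt": "json",
    }
    return base + "/select?" + urlencode(params)


def fetch_parsed_query(query, base=SOLR_BASE):
    """Return the 'parsedquery' string from the debug section (needs a running Solr)."""
    with urlopen(build_debug_url(query, base)) as response:
        body = json.load(response)
    return body["debug"]["parsedquery"]


if __name__ == "__main__":
    # Print the request URLs for the three queries discussed above (no network access).
    for q in ("iphone", "apple i phone", "netflix comedy"):
        print(build_debug_url(q))
```

Calling `fetch_parsed_query("netflix comedy")` against a running core with the schema above should return the same parsed query string quoted in the question, which makes it easy to compare the output after each change to the synonyms or stopwords file.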