SynonymGraphFilter followed by StopFilter

Andrea Gazzarini Thu, 26 Jul 2018 00:05:21 -0700

Hi,
I have the following field type definition:

<fieldtype name="text" class="solr.TextField" autoGeneratePhraseQueries="true">

    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="false"/>

    </analyzer>
</fieldtype>


Where synonyms and stopwords are defined as follows:

synonyms = out of warranty,oow
stopwords = of

Running the following query:

q=my tv went out *of* warranty something *of*

I get wrong results, with the following explain:

title:my title:tv title:went (title:oow *PhraseQuery(title:"out ?warranty something"))*

That is, the synonym is correctly detected and the graph information is correctly reported in the positionLength attribute, but it seems to be wrongly interpreted by the QueryParser.

I guess the reason is the removal of "of" by the StopFilter, which

 * removes the "of" term within the phrase (I wouldn't want that)
 * creates a "hole" in the span defined by the "oow" term, which has
   been marked as a synonym with a positionLength = 3, therefore
   including the next available term (something).
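
To make the "hole" concrete, here is a small Python simulation (not Lucene code). The Token type and the helper names are made up for illustration, and the collapsing of position gaps when the graph is read is an assumption, chosen because it reproduces the query shown above:

```python
from typing import List, NamedTuple, Set, Tuple

class Token(NamedTuple):
    term: str
    pos_inc: int  # position increment relative to the previous token
    pos_len: int  # number of positions the token spans

# Hypothetical output of the synonym filter for "out of warranty something"
GRAPH = [
    Token("oow", 1, 3),       # synonym spanning three positions
    Token("out", 0, 1),       # same start position as "oow"
    Token("of", 1, 1),
    Token("warranty", 1, 1),
    Token("something", 1, 1),
]

def stop_filter(tokens: List[Token], stopwords: Set[str]) -> List[Token]:
    """Drop stopwords and add their increment to the next kept token."""
    out, skipped = [], 0
    for t in tokens:
        if t.term in stopwords:
            skipped += t.pos_inc
        else:
            out.append(t._replace(pos_inc=t.pos_inc + skipped))
            skipped = 0
    return out

def spans(tokens: List[Token], collapse_holes: bool) -> List[Tuple[str, int, int]]:
    """Return (term, start, end) for each token.

    collapse_holes=True models a consumer that treats every increment
    greater than 1 as 1 (an assumption about how the graph is read).
    """
    pos, placed = -1, []
    for t in tokens:
        pos += min(t.pos_inc, 1) if collapse_holes else t.pos_inc
        placed.append((t.term, pos, pos + t.pos_len))
    return placed

def covered_by(tokens: List[Token], term: str, collapse_holes: bool) -> List[str]:
    """Terms whose span lies inside the span of the given term."""
    placed = spans(tokens, collapse_holes)
    start, end = next((s, e) for t, s, e in placed if t == term)
    return [t for t, s, e in placed if t != term and start <= s and e <= end]

filtered = stop_filter(GRAPH, {"of"})
print(covered_by(filtered, "oow", collapse_holes=False))  # ['out', 'warranty']
print(covered_by(filtered, "oow", collapse_holes=True))   # ['out', 'warranty', 'something']
```

With the hole preserved, "oow" still covers only "out" and "warranty"; once the hole is collapsed, its positionLength of 3 reaches "something" as well.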

I tried to change the StopFilter in order to ignore stopwords that are marked as SYNONYM or that are part of a previous synonym span, and it works: it correctly produces the following query:

title:my title:tv title:went *(title:oow PhraseQuery(title:"out of warranty"))* title:something
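
The idea behind that change can be sketched in Python as follows. This is only an illustration of the rule (keep a stopword if it is typed SYNONYM or falls inside the span of an earlier multi-position token), not the actual modification to the Lucene StopFilter; the Token type, the "word" default type and the function name are hypothetical:

```python
from typing import List, NamedTuple, Set

class Token(NamedTuple):
    term: str
    pos_inc: int         # position increment relative to the previous token
    pos_len: int         # number of positions the token spans
    type: str = "word"   # "SYNONYM" for tokens injected by the synonym filter

# Hypothetical token stream for "my tv went out of warranty something of"
TOKENS = [
    Token("my", 1, 1),
    Token("tv", 1, 1),
    Token("went", 1, 1),
    Token("oow", 1, 3, "SYNONYM"),
    Token("out", 0, 1),
    Token("of", 1, 1),
    Token("warranty", 1, 1),
    Token("something", 1, 1),
    Token("of", 1, 1),
]

def graph_aware_stop_filter(tokens: List[Token], stopwords: Set[str]) -> List[Token]:
    """Remove stopwords except those typed SYNONYM or inside a synonym span."""
    out = []
    pos = -1       # absolute position of the current token
    span_end = 0   # first position after the latest multi-position token
    skipped = 0    # increments of removed tokens, carried to the next kept one
    for t in tokens:
        pos += t.pos_inc
        if t.pos_len > 1:
            span_end = max(span_end, pos + t.pos_len)
        protected = t.type == "SYNONYM" or pos < span_end
        if t.term in stopwords and not protected:
            skipped += t.pos_inc
            continue
        out.append(t._replace(pos_inc=t.pos_inc + skipped))
        skipped = 0
    return out

kept = graph_aware_stop_filter(TOKENS, {"of"})
print([t.term for t in kept])
# ['my', 'tv', 'went', 'oow', 'out', 'of', 'warranty', 'something']
```

The "of" inside the "out of warranty" span survives, so no hole appears under "oow", while the trailing "of" is still removed as usual.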

So I'd like to ask your opinion about this. Am I missing something? Do you think it's better to open a JIRA issue? If the solution is a graph-aware stop filter, do you think it's better to change the existing filter or to subclass it?


Best,
Andrea
