Diogo Guilherme Leão Edelmuth created SOLR-10980:
----------------------------------------------------

             Summary: SynonymGraphFilterFactory proximity search error
                 Key: SOLR-10980
                 URL: https://issues.apache.org/jira/browse/SOLR-10980
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: 6.6
            Reporter: Diogo Guilherme Leão Edelmuth


There seems to be an issue with proximity searches that include terms that 
have multi-word synonyms.

Example:
consider that the following is configured in synonyms.txt:
(
grand mother, grandmother
grandfather, granddad
)
and that there is an indexed field containing: (My mother and my grandmother went...)

The proximity search ("mother grandmother"~8) does not return the file, while 
("father grandfather"~8) does return the analogous file.

I am not a Solr developer, so pardon me if I am wrong, but I ran the queries 
with debug=query and saw that when a proximity search involves multi-word 
synonyms, the query is parsed into a SpanNearQuery:
"parsedquery":"SpanNearQuery(spanNear([laudo:mother, spanOr([laudo:grand mother, laudo:grandmother])], 0, true))"

while proximity searches with single-word synonyms are parsed into:
"MultiPhraseQuery(laudo:\"father (grandfather granddad)\"~10)"
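A simplified model of why the two searches are parsed differently: with expand="true", "grandfather" only has single-word alternatives, which can be stacked at one position (the shape a MultiPhraseQuery expresses), whereas "grandmother" has the two-word alternative "grand mother", which spans two positions. The hypothetical Python sketch below assumes whitespace tokenization, looks up single query tokens only, and assumes a span-based query is chosen whenever any alternative is longer than one token, as the debug output above suggests. The function names are illustrative; this is not Lucene's query-building code.

```python
# Hypothetical model of query-side synonym expansion (expand="true").
# Not Lucene/Solr code; it only mirrors the two synonyms.txt lines above.
SYNONYM_RULES = [
    "grand mother, grandmother",
    "grandfather, granddad",
]


def parse_rules(rules):
    """Map every phrase in a rule (as a token tuple) to the rule's full list of alternatives."""
    groups = {}
    for line in rules:
        alternatives = [tuple(phrase.split()) for phrase in line.split(",")]
        for alt in alternatives:
            groups[alt] = alternatives
    return groups


def expand(query, groups):
    """For each query token, return its alternative token sequences.

    Simplification: only single-token lookups are done, so a multi-word
    input such as "grand mother" in the query itself is not matched.
    """
    return [groups.get((tok,), [(tok,)]) for tok in query.split()]


def choose_query_type(expanded):
    """Assumed rule: single-token alternatives can share one position;
    any longer alternative turns the query into a graph (span) query."""
    if all(len(alt) == 1 for alts in expanded for alt in alts):
        return "multi-phrase"
    return "span-graph"


rules = parse_rules(SYNONYM_RULES)
for q in ("father grandfather", "mother grandmother"):
    alts = expand(q, rules)
    print(q, "->", choose_query_type(alts), alts)
# father grandfather -> multi-phrase [[('father',)], [('grandfather',), ('granddad',)]]
# mother grandmother -> span-graph [[('mother',)], [('grand', 'mother'), ('grandmother',)]]
```

The order of alternatives follows the rule line, which happens to match the order inside spanOr([...]) in the debug output.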

Note that the SpanNearQuery is built with a slop of 0, regardless of the value 
given after the tilde. So if I search for the exact phrase, it does match.
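To illustrate why the slop matters, here is a simplified model of an ordered span-near match using the clauses from the parsed query above. It assumes the example field is tokenized on whitespace with no stopword removal or stemming, and it treats slop as the total number of positions allowed between consecutive clause matches. It is a hypothetical sketch, not Lucene's SpanNearQuery implementation.

```python
# Hypothetical, simplified ordered span-near matcher (not Lucene code).
# Each clause is a list of alternative token sequences, like a spanOr.


def spans(doc, alternatives):
    """Yield (start, end) for every occurrence of any alternative in doc."""
    for alt in alternatives:
        n = len(alt)
        for i in range(len(doc) - n + 1):
            if tuple(doc[i:i + n]) == tuple(alt):
                yield (i, i + n)


def ordered_near(doc, clauses, slop):
    """True if the clauses occur in order, non-overlapping, with total gap <= slop."""

    def search(idx, prev_end, gap):
        if idx == len(clauses):
            return True
        for start, end in spans(doc, clauses[idx]):
            if idx == 0:
                # The first clause can start anywhere.
                if search(1, end, 0):
                    return True
            elif start >= prev_end and gap + (start - prev_end) <= slop:
                if search(idx + 1, end, gap + (start - prev_end)):
                    return True
        return False

    return search(0, 0, 0)


doc = "my mother and my grandmother went".split()
clauses = [[("mother",)], [("grand", "mother"), ("grandmother",)]]
print(ordered_near(doc, clauses, 0))  # False: slop 0, as in the parsed query
print(ordered_near(doc, clauses, 8))  # True: slop 8, as requested by ~8
```

In this model "mother" and "grandmother" are two positions apart, so the match only succeeds once the requested slop is actually honored.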


Here is my field-type, just in case:
<fieldType name="text_pt_synonyms_ascii_minimal_lightStem" class="solr.TextField" positionIncrementGap="100">

    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt" ignoreCase="true"/>
        <filter class="solr.PortugueseLightStemFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt" ignoreCase="true"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms_radex.txt"/>
        <filter class="solr.PortugueseLightStemFilterFactory"/>
    </analyzer>

</fieldType>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
