[jira] [Commented] (SOLR-11968) Multi-words query time synonyms

Robert Muir (JIRA) Tue, 20 Feb 2018 20:04:25 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370916#comment-16370916
 ]


Robert Muir commented on SOLR-11968:
------------------------------------

I think the issue is still valid, its a little more complex now because of 
positionLength (means more buffering when you see posLength > 1, because you'll 
need to adjust if you remove something in its path), but the idea is the same: 
give the user a choice between "insert mode" and "replace mode". But this new 
"insert mode" would actually work correctly, correcting posLengths before and 
posIncs after as appropriate. similar to how your editor might have to 
recompute some line breaks/word wrapping and so on.

If you have baseball (length=2), base(length=1), ball(length=1), and you delete 
"base" in this case, you need to change baseball's length to 1 before you omit 
it, because you deleted base. Thats the "buffering before" that would be 
required for posLength. And you still need the same buffering described on the 
issue for posInc=0 that might occur after the fact, so you don't wrongly 
transfer synonyms to different words entirely.

It would be slower than "replace mode" that we have today, but only because of 
the buffering, and I think its pretty contained, but I haven't fully thought it 
thru or tried to write any code.

> Multi-words query time synonyms
> -------------------------------
>
>                 Key: SOLR-11968
>                 URL: https://issues.apache.org/jira/browse/SOLR-11968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers, Schema and Analysis
>    Affects Versions: master (8.0), 6.6.2
>         Environment: Centos 7.x
>            Reporter: Dominique Béjean
>            Priority: Major
>
> I am trying multi words query time synonyms with Solr 6.6.2 and 
> SynonymGraphFilterFactory filter as explain in this article
>  
> [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
>   
>  My field type is :
> {code:java}
> <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
>              articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" 
> ignoreCase="true"/>
>        <filter class="solr.FrenchMinimalStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
>              articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
>              ignoreCase="true" expand="true"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" 
> ignoreCase="true"/>
>        <filter class="solr.FrenchMinimalStemFilterFactory"/>
>      </analyzer>
>    </fieldType>{code}
>  
>  synonyms.txt contains the line :
> {code:java}
> om, olympique de marseille{code}
>  
>  stopwords.txt contains the word 
> {code:java}
> de{code}
>  
>  The order of words in my query has an impact on the generated query in 
> edismax
> {code:java}
> q={!edismax qf='name_text_gp' v=$qq}
>  &sow=false
>  &qq=...{code}
> with "qq=om maillot" or "qq=olympique de marseille maillot", I can see the 
> synonyms expansion. It is working as expected.
> {code:java}
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil 
> +name_text_gp:maillot) name_text_gp:om))",
>  "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu 
> +name_text_gp:marseil +name_text_gp:maillot)))",{code}
> with "qq=maillot om" or "qq=maillot olympique de marseille", I can see the 
> same generated query 
> {code:java}
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>  "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",{code}
> I don't understand these generated queries. The first one looks like the 
> synonym expansion is ignored, but the second one shows it is not ignored and 
> only the synonym term is used.
>   
>  When I test the analisys for the field type the synonyms are correctly 
> expanded for both expressions
> {code:java}
> om maillot  
>  maillot om
>  olympique de marseille maillot
>  maillot olympique de marseille{code}
> resulting outputs always include the following terms (obvioulsly not always 
> in the same order)
> {code:java}
> olympiqu om marseil maillot {code}
>  
>  So, i suspect an issue with edismax query parser.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-11968) Multi-words query time synonyms

Reply via email to