[ https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373554#comment-16373554 ]
Jim Ferenczi commented on SOLR-11968:
-------------------------------------

bq. I think you're wrong, [~jim.ferenczi].

Well, it depends on how you see the problem. I agree that the gap could be inferred when we build the graph; I have a patch that does that, but there are some cases where we just can't. For instance, the synonym rule `twd, the walking dead` creates a broken token stream if you set a stop word filter that removes "the" after the synonym filter:

|| ||twd||walking||dead||
|posinc|1|1|1|
|poslen|3|1|1|

The gap produced by "the" is not propagated to the posInc of "walking" because the stop word appears on a token with a posInc equal to 0. There are other cases where it is not possible to "fix" the graph produced by the token stream, which is why I said that a stop filter that would remove gaps is IMO the best solution.

bq. AFAICT Robert is suggesting a StopFilter *mode* that would *optionally* remove gaps. IOW its current behavior would remain (and be the default).

Yes, I know that it would be an optional mode, but at least it would allow removing stop words inside a multi-word synonym.

> Multi-words query time synonyms
> -------------------------------
>
> Key: SOLR-11968
> URL: https://issues.apache.org/jira/browse/SOLR-11968
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers, Schema and Analysis
> Affects Versions: master (8.0), 6.6.2
> Environment: Centos 7.x
> Reporter: Dominique Béjean
> Assignee: Steve Rowe
> Priority: Major
>
> I am trying multi-word query-time synonyms with Solr 6.6.2 and the
> SynonymGraphFilterFactory filter, as explained in this article:
> [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
>
> My field type is:
> {code:java}
> <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.FrenchMinimalStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.FrenchMinimalStemFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
>
> synonyms.txt contains the line:
> {code:java}
> om, olympique de marseille
> {code}
>
> stopwords.txt contains the word:
> {code:java}
> de
> {code}
>
> The order of words in my query has an impact on the query generated by
> edismax:
> {code:java}
> q={!edismax qf='name_text_gp' v=$qq}
> &sow=false
> &qq=...
> {code}
> With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
> synonym expansion. It is working as expected.
> {code:java}
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot) name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)))",
> {code}
> With "qq=maillot om" or "qq=maillot olympique de marseille", I get the
> same generated query for both:
> {code:java}
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> {code}
> I don't understand these generated queries. The first one looks like the
> synonym expansion is ignored, but the second one shows it is not ignored and
> only the synonym term is used.
>
> When I test the analysis for the field type, the synonyms are correctly
> expanded for all of these expressions:
> {code:java}
> om maillot
> maillot om
> olympique de marseille maillot
> maillot olympique de marseille
> {code}
> The resulting outputs always include the following terms (obviously not
> always in the same order):
> {code:java}
> olympiqu om marseil maillot
> {code}
>
> So, I suspect an issue with the edismax query parser.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
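
For readers who want to reproduce the `twd, the walking dead` case from the comment without a Solr instance, the following is a minimal sketch in plain Python. It is a simplified model, not Lucene code: `Token`, `stop_filter` and `unreachable_terms` are hypothetical names, the stop filter only mimics the usual "add the removed token's posInc to the next kept token" behavior, and the pre-filter stream (with "the" at posInc 0) is inferred from the table and explanation in the comment.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    term: str
    pos_inc: int  # position increment relative to the previous token
    pos_len: int  # number of positions the token spans


def stop_filter(tokens: List[Token], stop_words: Set[str]) -> List[Token]:
    """Drop stop words and carry their posInc over to the next kept token."""
    out = []
    skipped = 0
    for tok in tokens:
        if tok.term in stop_words:
            skipped += tok.pos_inc  # a removed token with posInc 0 adds nothing
            continue
        out.append(tok._replace(pos_inc=tok.pos_inc + skipped))
        skipped = 0
    return out


def unreachable_terms(tokens: List[Token]) -> List[str]:
    """Return terms whose start node is not reachable from the first token's
    start node by following earlier tokens (illustration only: a hole in the
    middle of an ordinary stream would be reported as well)."""
    orphans = []
    reachable = set()
    pos = -1
    for tok in tokens:
        pos += tok.pos_inc
        if not reachable:
            reachable.add(pos)  # the first token defines the entry node
        if pos in reachable:
            reachable.add(pos + tok.pos_len)
        else:
            orphans.append(tok.term)
    return orphans


if __name__ == "__main__":
    # Assumed output of the synonym filter for the input "twd".
    before = [
        Token("twd", 1, 3),
        Token("the", 0, 1),
        Token("walking", 1, 1),
        Token("dead", 1, 1),
    ]
    after = stop_filter(before, {"the"})
    print(after)                      # same posinc/poslen values as the table
    print(unreachable_terms(before))  # []
    print(unreachable_terms(after))   # ['walking', 'dead']
```

In this model "the" contributes a posInc of 0, so nothing is added to "walking", and the side path "walking dead" no longer connects to the start of the graph. That matches the table in the comment.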