An update on this: the problem occurs on edismax phrase queries where a term in the nested phrase query has a multi-word synonym. In the quoted examples, dog has the multi-word synonym "canis familiaris", and aspirin has "acetylsalicylic acid".
Creating a JIRA ticket.

Thank you,
Elizabeth

On Wed, Apr 18, 2018 at 12:38 PM, Elizabeth Haubert <ehaub...@opensourceconnections.com> wrote:

> I'm seeing pf and pf3 clauses fail to generate in long queries containing
> synonyms. Wondering if anyone else has run into this, or if it needs to be
> submitted as a bug in Jira. It is a showstopper problem for the current
> project, as the pf and pf3 were pretty heavily tuned.
>
> Using Solr 7.1; all fields are using the following type:
>
> With query-time synonyms:
> <fieldType name="my_text_general" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> <filter class="solr.FlattenGraphFilterFactory" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> <filter class="solr.SynonymGraphFilterFactory"
> managed="synonyms_all" />
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> </analyzer>
> <similarity class="solr.ClassicSimilarityFactory" />
> </fieldType>
>
> Without query-time synonyms (synonyms applied at index time instead):
> <fieldType name="my_text_general" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> <filter class="solr.SynonymGraphFilterFactory"
> managed="synonyms_all" />
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> <filter class="solr.FlattenGraphFilterFactory" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.TrimFilterFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.EnglishMinimalStemFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords_nostem.txt"/>
> <filter class="solr.KStemFilterFactory"/>
> </analyzer>
> <similarity class="solr.ClassicSimilarityFactory" />
> </fieldType>
>
> The synonyms file is pretty long, so I'll just include the relevant bits
> for an example:
>
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiaris, k 9
> rat, rattus
>
>
> The problem seems to occur when part of the query has a synonym, but the
> whole phrase does not. Whitespace has been added to piece out what is going
> on; I believe any parenthesis errors are due to my tinkering around. Beyond
> that, though, this is as output by Solr. The slop values have been adjusted
> to identify the PF/PF2/PF3 clauses: pf fields have a slop ending in 0, pf2
> ending in 1, and pf3 ending in 2, e.g. ~10, ~11, ~12, etc.
>
> =============
> Example 1: "aspirin dose in rats"
> ==============
>
> With query-time synonyms:
> ===============
> /// Q terms generate as expected ///
> +((((kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
> (species:\"acetylsalicylic acid\" species:aspirin) |
> (keywords_bm25_no_norms:\"acetylsalicylic acid\"
> keywords_bm25_no_norms:aspirin)^50.0
> | (description:\"acetylsalicylic acid\" description:aspirin) |
> (kw1ranked:\"acetylsalicylic acid\" kw1ranked:aspirin)^100.0 |
> (text:\"acetylsalicylic acid\" text:aspirin) | (title:\"acetylsalicylic
> acid\" title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:\"acetylsalicylic
> acid\" keywordsranked_bm25_no_norms:aspirin)^50.0 |
> (authors:\"acetylsalicylic acid\" authors:aspirin))~0.4
> ((Synonym(kw1:dosage kw1:dose kw1:dose kw1:dose))^100.0 |
> Synonym(species:dosage species:dose species:dose species:dose) |
> (Synonym(keywords_bm25_no_norms:dosage keywords_bm25_no_norms:dose
> keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose))^50.0 |
> Synonym(description:dosage description:dose description:dose
> description:dose) | (Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose
> kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose)
> | (Synonym(title:dosage title:dose title:dose title:dose))^100.0 |
> (Synonym(keywordsranked_bm25_no_norms:dosage keywordsranked_bm25_no_norms:dose
> keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose))^50.0
> | Synonym(authors:dosage authors:dose authors:dose authors:dose))~0.4
> ((Synonym(kw1:rat kw1:rattu))^100.0 | Synonym(species:rat species:rattu) |
> (Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 |
> Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat
> kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat
> title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat
> keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat
> authors:rattu))~0.4)~3)
>
> /// PF and PF2 are missing. ///
> () () () () ()
>
> /// This is actually PF3 with a missing ? where the stopword 'in'
> belonged. ///
> ((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 |
> (keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu
> rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu
> rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose)
> (rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu
> rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu
> rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(
> const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(
> int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",
>
> With index-time synonyms:
> ===============
>
> /// Q ///
> "boost(+((((kw1:aspirin)^100.0 | species:aspirin |
> (keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin |
> (kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 |
> (keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
> ((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 |
> description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0
> | (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
> ((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 |
> description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0
> | (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)
> /// PF ///
> ((title:\"aspirin dose ? rats\"~20)^5000.0 |
> (keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 |
> (keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 |
> (text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ?
> rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
> ((authors:\"aspirin dose ? rats\")^250.0 | description:\"aspirin dose ?
> rats\")~0.4
>
> /// PF2 ///
> ((text:\"aspirin dose ?
> rats\"~100)^500.0)~0.4 (authors:\"aspirin
> dose\"~11 | species:\"aspirin dose\"~11)~0.4
>
> /// PF3 ///
> (((title:\"aspirin dose\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"aspirin
> dose\"~22)^1000.0 | (text:\"aspirin dose\"~22)^100.0)~0.4 ((title:\"dose ?
> rats\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"dose ?
> rats\"~22)^1000.0 | (text:\"dose ? rats\"~22)^100.0)~0.4)
> (((keywords_bm25_no_norms:\"aspirin dose\"~12)^500.0 |
> (kw1ranked:\"aspirin dose\"~12)^100.0 | (kw1:\"aspirin
> dose\"~12)^100.0)~0.4 ((keywords_bm25_no_norms:\"dose ? rats\"~12)^500.0
> | (kw1ranked:\"dose ? rats\"~12)^100.0 | (kw1:\"dose ?
> rats\"~12)^100.0)~0.4),product(max(10.0/(3.16E-11*
> float(ms(const(1555545600000),date(dateint)))+6.0),int(
> documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5)
> ,null),0.5,2.0)))",
>
>
> ===============
> Example 2: "allergic reaction dogs"
> The underlying issue isn't specifically PF, PF2, PF3. The following
> example picks up PF2, but not PF or PF3
> ===============
>
> With Query-time synonyms:
> /// Q ///
> parsedquery_toString":"boost(
> +((((Synonym(kw1:allergic kw1:allergy kw1:hypersensitive
> kw1:hypersensitive))^100.0 | Synonym(species:allergic species:allergy
> species:hypersensitive species:hypersensitive) |
> (Synonym(keywords_bm25_no_norms:allergic
> keywords_bm25_no_norms:allergy keywords_bm25_no_norms:hypersensitive
> keywords_bm25_no_norms:hypersensitive))^50.0 |
> Synonym(description:allergic description:allergy description:hypersensitive
> description:hypersensitive) | (Synonym(kw1ranked:allergic kw1ranked:allergy
> kw1ranked:hypersensitive kw1ranked:hypersensitive))^100.0 |
> Synonym(text:allergic text:allergy text:hypersensitive text:hypersensitive)
> | (Synonym(title:allergic title:allergy title:hypersensitive
> title:hypersensitive))^100.0 | (Synonym(keywordsranked_bm25_no_norms:allergic
> keywordsranked_bm25_no_norms:allergy
> keywordsranked_bm25_no_norms:hypersensitive
> keywordsranked_bm25_no_norms:hypersensitive))^50.0 |
> Synonym(authors:allergic authors:allergy authors:hypersensitive
> authors:hypersensitive))~0.4 ((kw1:reaction)^100.0 | species:reaction |
> (keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
> (kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
> (keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
> ((kw1:\"cani familiari\" kw1:canine kw1:\"k 9\" kw1:\"cani lupu familiari\"
> kw1:dog)^100.0 | (species:\"cani familiari\" species:canine species:\"k 9\"
> species:\"cani lupu familiari\" species:dog) |
> (keywords_bm25_no_norms:\"cani familiari\" keywords_bm25_no_norms:canine
> keywords_bm25_no_norms:\"k 9\" keywords_bm25_no_norms:\"cani lupu
> familiari\" keywords_bm25_no_norms:dog)^50.0 | (description:\"cani
> familiari\" description:canine description:\"k 9\" description:\"cani lupu
> familiari\" description:dog) | (kw1ranked:\"cani familiari\"
> kw1ranked:canine kw1ranked:\"k 9\" kw1ranked:\"cani lupu familiari\"
> kw1ranked:dog)^100.0 | (text:\"cani familiari\" text:canine text:\"k 9\"
> text:\"cani lupu familiari\" text:dog) | (title:\"cani familiari\"
> title:canine title:\"k 9\" title:\"cani lupu familiari\" title:dog)^100.0 |
> (keywordsranked_bm25_no_norms:\"cani familiari\"
> keywordsranked_bm25_no_norms:canine keywordsranked_bm25_no_norms:\"k 9\"
> keywordsranked_bm25_no_norms:\"cani lupu familiari\"
> keywordsranked_bm25_no_norms:dog)^50.0 | (authors:\"cani familiari\"
> authors:canine authors:\"k 9\" authors:\"cani lupu familiari\"
> authors:dog))~0.4)~3)
>
> /// PF ///
> () () () ()
>
> /// PF2 ///
> (authors:\"(hypersensitive allergy hypersensitive allergic) reaction\"~11
> | species:\"(hypersensitive allergy hypersensitive allergic)
> reaction\"~11)~0.4
>
> /// PF3 ///
> () (),
> product(max(10.0/(3.16E-11*float(ms(const(1555545600000),
> date(dateint)))+6.0),int(documentdatefix)),scale(map(
> int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",
>
> With index-time synonyms:
> /// Q ///
> +((((kw1:allergic)^100.0 | species:allergic |
> (keywords_bm25_no_norms:allergic)^50.0
> | description:allergic | (kw1ranked:allergic)^100.0 | text:allergic |
> (title:allergic)^100.0 | (keywordsranked_bm25_no_norms:allergic)^50.0 |
> authors:allergic)~0.4 ((kw1:reaction)^100.0 | species:reaction |
> (keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
> (kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
> (keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
> ((kw1:dog)^100.0 | species:dog | (keywords_bm25_no_norms:dog)^50.0 |
> description:dog | (kw1ranked:dog)^100.0 | text:dog | (title:dog)^100.0 |
> (keywordsranked_bm25_no_norms:dog)^50.0 | authors:dog)~0.4)~3)
>
> /// PF ///
> ((title:\"allergic reaction dog\"~20)^5000.0 |
> (keywordsranked_bm25_no_norms:\"allergic reaction dog\"~20)^5000.0 |
> (keywords_bm25_no_norms:\"allergic reaction dog\"~20)^1500.0 |
> (text:\"allergic reaction dog\"~20)^1000.0)~0.4 ((kw1ranked:\"allergic
> reaction dog\"~10)^5000.0 | (kw1:\"allergic reaction dog\"~10)^500.0)~0.4
> ((authors:\"allergic reaction dog\")^250.0 | description:\"allergic
> reaction dog\")~0.4 ((text:\"allergic reaction dog\"~100)^500.0)~0.4
>
> /// PF2 ///
> ((authors:\"allergic reaction\"~11 | species:\"allergic reaction\"~11)~0.4
>
> /// PF3 ///
> (authors:\"reaction dog\"~11 | species:\"reaction dog\"~11)~0.4)
> ((title:\"allergic reaction dog\"~22)^1000.0 |
> (keywordsranked_bm25_no_norms:\"allergic reaction dog\"~22)^1000.0 |
> (text:\"allergic reaction dog\"~22)^100.0)~0.4
> ((keywords_bm25_no_norms:\"allergic
> reaction dog\"~12)^500.0 | (kw1ranked:\"allergic reaction dog\"~12)^100.0 |
> (kw1:\"allergic reaction dog\"~12)^100.0)~0.4,product(
> max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(
> documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5)
> ,null),0.5,2.0)))",
>
>
> Working on getting this rigged up in the debugger, but would
> appreciate any feedback.
>
> Thank you,
> Elizabeth
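
The quoted message identifies phrase clauses by their slop: pf slops end in 0, pf2 in 1, pf3 in 2. A minimal sketch of how that convention could be checked mechanically against a `parsedquery_toString` value is below. The function names, the regular expression, and the trimmed sample string are illustrative assumptions, not part of Solr or of the original setup, and clauses printed without a slop are simply skipped.

```python
import re
from collections import Counter

# Phrase clauses in the debug output look like  field:"some phrase"~12 .
# The quotes are backslash-escaped in the JSON response, so both forms are accepted.
PHRASE_CLAUSE = re.compile(r'(\w+):\\?"([^"\\]*)\\?"(?:~(\d+))?')

# Convention from the quoted message (an assumption of this sketch, not a Solr rule):
# last digit of the slop is 0 for pf, 1 for pf2, 2 for pf3.
SLOP_DIGIT_TO_PARAM = {"0": "pf", "1": "pf2", "2": "pf3"}


def classify_phrase_clauses(parsed_query):
    """Return (param, field, phrase, slop) for each phrase clause that has a slop."""
    found = []
    for match in PHRASE_CLAUSE.finditer(parsed_query):
        field, phrase, slop = match.groups()
        if slop is None:
            # No slop printed: the clause cannot be attributed by this convention.
            continue
        param = SLOP_DIGIT_TO_PARAM.get(slop[-1], "unknown")
        found.append((param, field, phrase, int(slop)))
    return found


def count_empty_clauses(parsed_query):
    """Count empty '()' groups, which is how the missing clauses show up above."""
    return len(re.findall(r"\(\s*\)", parsed_query))


# Trimmed from Example 1 (query-time synonyms); boosts and most fields dropped.
SAMPLE = (
    r'() () () () () '
    r'((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 | '
    r'(text:\"(dosage dose dose dose) (rattu rat)\"~22)^100.0)~0.4 '
    r'((kw1:\"(dosage dose dose dose) (rattu rat)\"~12)^100.0)~0.4'
)

if __name__ == "__main__":
    clauses = classify_phrase_clauses(SAMPLE)
    print(Counter(param for param, _field, _phrase, _slop in clauses))
    print("empty clauses:", count_empty_clauses(SAMPLE))
```

Running the same check on the index-time and query-time outputs for one query would show which of pf, pf2, and pf3 disappear once a multi-word synonym is involved, without reading the full dump by eye.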