[ https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir moved SOLR-3158 to LUCENE-3821:
-------------------------------------------

          Component/s:     (was: search)
        Lucene Fields: New
    Affects Version/s:     (was: 3.5)
                       4.0
                       3.5
                  Key: LUCENE-3821  (was: SOLR-3158)
              Project: Lucene - Java  (was: Solr)

> search slop problem introduced somewhere between Solr 1.4 and Solr 3.5
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-3821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3821
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.5, 4.0
>            Reporter: Naomi Dushay
>         Attachments: schema.xml, solrconfig-test.xml
>
>
> In upgrading from Solr 1.4 to Solr 3.5, the following phrase searches stopped
> working in dismax:
>   "The Beatles as musicians : Revolver through the Anthology"
>   "Color-blindness [print/digital]; its dangers and its detection"
> Both of these queries have a repeated word and many terms. It's not the
> number of terms or the colon surrounded by spaces, because the following
> phrase search works in Solr 3.5 (and Solr 1.4):
>   "International encyclopedia of revolution and protest : 1500 to the present"
> With Robert Muir's help, we have narrowed the problem down to slop
> (proximity in the Lucene QueryParser, query slop in dismax). I have included
> debugQuery details for the Beatles search; I confirmed the same behavior
> with the color-blindness search.
>
> Solr 3.5: it fails when the (query) slop setting isn't 0.
> ----
> lucene QueryParser with proximity set to 1 (or anything > 0): no match
>   URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~1
>   final query: all_search:"the beatl as musician revolv through the antholog"~1
> lucene QueryParser with proximity set to 0: result!
>   URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
>   final query: all_search:"the beatl as musician revolv through the antholog"
>   6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
>     <snip>
>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>     <snip>
>
> dismax QueryParser with qs=1: no match
>   ps=0
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=0
>     final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
>   ps=1
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=1
>     final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
> dismax QueryParser with qs=0: result!
>   ps=0
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=0
>     final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
>   ps=1
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=1
>     final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
>     8.564867 = (MATCH) sum of:
>       4.2824335 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
>         <snip>
>         48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>         <snip>
>
> Solr 1.4: it works regardless of slop settings
> ----
> lucene QueryParser with any proximity value: result!
>   ~0
>     URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
>     final query: all_search:"the beatl as musician revolv through the antholog"
>   ~1
>     URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"~1
>     final query: all_search:"the beatl as musician revolv through the antholog"~1
>     5.2672544 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>       <snip>
>       48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
>       <snip>
>
> dismax QueryParser with any qs: result!
>   qs=0, ps=0
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=0
>     final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
>   qs=0, ps=1
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=0&ps=1
>     final query: +(all_search:"the beatl as musician revolv through the antholog")~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
> dismax QueryParser with qs=1: result!
>   qs=1, ps=0
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=0
>     final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog")~0.01
>   qs=1, ps=1
>     URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"&qs=1&ps=1
>     final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~1)~0.01
>     7.4490223 = (MATCH) sum of:
>       3.7245111 = weight(all_search:"the beatl as musician revolv through the antholog"~1 in 3469163), product of:
>         <snip>
>         48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
>         <snip>
>
> More information:
>
> schema.xml:
>   <field name="all_search" type="text" indexed="true" stored="false" />
>
> solr 3.5:
>   <fieldtype name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>     <analyzer>
>       <tokenizer class="solr.WhitespaceTokenizerFactory" />
>       <filter class="solr.ICUFoldingFilterFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>           splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
>           splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
>           catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>     </analyzer>
>   </fieldtype>
>
> solr 1.4:
>   <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>       <tokenizer class="solr.WhitespaceTokenizerFactory" />
>       <filter class="schema.UnicodeNormalizationFilterFactory"
>           version="icu4j" composed="false" remove_diacritics="true"
>           remove_modifiers="true" fold="true" />
>       <filter class="solr.WordDelimiterFilterFactory"
>           splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
>           splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
>           catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
>       <filter class="solr.LowerCaseFilterFactory" />
>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>     </analyzer>
>   </fieldtype>
>
> And the analysis page shows the same results for Solr 3.5 and 1.4:
>
> Solr 3.5:
>   position     1      2      3      4         5       6        7      8
>   term text    the    beatl  as     musician  revolv  through  the    antholog
>   keyword      false  false  false  false     false   false    false  false
>   startOffset  0      4      12     15        27      36       44     48
>   endOffset    3      11     14     24        35      43       47     57
>   type         word   word   word   word      word    word     word   word
>
> Solr 1.4:
>   term position     1     2      3      4         5       6        7      8
>   term text         the   beatl  as     musician  revolv  through  the    antholog
>   term type         word  word   word   word      word    word     word   word
>   source start,end  0,3   4,11   12,14  15,24     27,35   36,43    44,47  48,57
>
> For debug purposes, we can consider the Solr document as:
>   <doc>
>     <str name="all_search">The Beatles as musicians : Revolver through the Anthology</str>
>   </doc>
> I can't attach the full SolrDoc, as all_search is indexed but not stored,
> and I use SolrJ to write to the index from java objects ...
> plus our objects have a zillion fields (I work in a library with very rich
> metadata and very exacting solr fields). I have attached the Solr 3.5 schema
> and solrconfig, but they are big and ugly for the same reasons.
>
> For more details, see the erroneously titled email thread "result present in
> Solr 1.4 but missing in Solr 3.5, dismax only" started on 2012-02-22 on
> solr-u...@lucene.apache.org.
>
> - Naomi
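The position and offset tables in the report can be reproduced with a toy model of the tokenization step. This is a minimal sketch and not Solr's analysis chain. It assumes whitespace tokenization, and it drops tokens that contain no letters or digits, which appears to be what WordDelimiterFilterFactory does with the lone ":" in this title. Lowercasing, ICU folding and Porter stemming are replaced by a hardcoded lookup table copied from the report's analysis output. The names `toy_analyze` and `STEMS` are hypothetical.

```python
import re

# Hypothetical stand-in for folding + lowercasing + Porter stemming:
# the stems are copied from the analysis output in the report, NOT computed.
STEMS = {
    "The": "the", "the": "the", "Beatles": "beatl", "as": "as",
    "musicians": "musician", "Revolver": "revolv", "through": "through",
    "Anthology": "antholog",
}


def toy_analyze(text):
    """Hypothetical helper: return (position, term, start, end) tuples, positions from 1."""
    tokens = []
    position = 0
    # \S+ mimics a whitespace tokenizer; m.start()/m.end() are character offsets.
    for m in re.finditer(r"\S+", text):
        raw = m.group()
        if not any(ch.isalnum() for ch in raw):
            # Delimiter-only token such as ":" yields no term and, matching
            # the analysis output in the report, leaves no position gap.
            continue
        position += 1
        tokens.append((position, STEMS.get(raw, raw.lower()), m.start(), m.end()))
    return tokens


title = "The Beatles as musicians : Revolver through the Anthology"
for pos, term, start, end in toy_analyze(title):
    print(pos, term, start, end)
```

The term `the` comes out at positions 1 and 7. That is the repeated term the report singles out, and the offsets line up with the startOffset/endOffset rows shown for Solr 3.5.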
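The dismax debug strings in the report follow a pattern:

- `qs` sets the slop of the quoted phrase in `q`, which becomes the required `+` clause.
- `ps` sets the slop of the phrase that `pf` adds as an optional boost clause.
- The trailing `~0.01` on each clause appears to be the DisjunctionMaxQuery tie-breaker.

The sketch below rebuilds the request parameters and "final query" strings for the four qs/ps combinations, so they can be compared with the output in the report. It only formats strings and never contacts Solr. It hardcodes the analyzed terms rather than running an analyzer, and it assumes the same `qf`/`pf` values as the report. `dismax_params`, `phrase` and `dismax_debug_query` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Analyzed terms as printed in the report's debug output (hardcoded assumption).
TERMS = "the beatl as musician revolv through the antholog"
USER_QUERY = '"The Beatles as musicians : Revolver through the Anthology"'


def dismax_params(qs, ps):
    """Hypothetical helper: request parameters matching the report's dismax test URLs."""
    return {"qf": "all_search", "pf": "all_search", "q": USER_QUERY,
            "qs": str(qs), "ps": str(ps)}


def phrase(field, terms, slop):
    # Lucene's phrase query string form only shows "~slop" when slop is non-zero.
    return f'{field}:"{terms}"' + (f"~{slop}" if slop else "")


def dismax_debug_query(qs, ps, tie="0.01"):
    """Hypothetical formatter reproducing the 'final query' strings in the report."""
    main = f"+({phrase('all_search', TERMS, qs)})~{tie}"   # phrase from q: slop = qs, required
    boost = f"({phrase('all_search', TERMS, ps)})~{tie}"   # phrase from pf: slop = ps, optional
    return f"{main} {boost}"


for qs in (0, 1):
    for ps in (0, 1):
        print(urlencode(dismax_params(qs, ps)))
        print("  final query:", dismax_debug_query(qs, ps))
```

`urlencode` percent-encodes the quotes, spaces and colon that the report shows unencoded. Every failing Solr 3.5 case has `qs=1`, meaning the required clause carries slop. The `ps` value only affects the optional boost clause.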
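The report's key observation is that raising the slop from 0 to 1 loses a document that the exact phrase matches. A larger slop should only ever widen the set of matches. The sketch below is a brute-force reference check under a simplified, assumed model of sloppy phrase distance:

- Each query term i is matched to a distinct document position p_i holding that term, so the two occurrences of `the` cannot share one position.
- The match distance is max(p_i - i) - min(p_i - i).

This illustrates the expected semantics only; it is not Lucene's SloppyPhraseScorer. It enumerates every assignment, so it is only practical for short token lists like this title. `sloppy_phrase_distance` and `phrase_matches` are hypothetical names.

```python
from itertools import product


def sloppy_phrase_distance(doc_terms, query_terms):
    """Hypothetical reference: smallest max(p_i - i) - min(p_i - i) over assignments
    of distinct document positions p_i to query terms; None if no assignment exists."""
    # Candidate document positions (0-based) for each query term.
    candidates = [[p for p, t in enumerate(doc_terms) if t == q] for q in query_terms]
    best = None
    for assignment in product(*candidates):
        if len(set(assignment)) != len(assignment):
            continue  # a repeated query term may not reuse the same document position
        shifted = [p - i for i, p in enumerate(assignment)]
        distance = max(shifted) - min(shifted)
        if best is None or distance < best:
            best = distance
    return best


def phrase_matches(doc_terms, query_terms, slop):
    distance = sloppy_phrase_distance(doc_terms, query_terms)
    return distance is not None and distance <= slop


doc = "the beatl as musician revolv through the antholog".split()
for slop in (0, 1, 2):
    print(slop, phrase_matches(doc, doc, slop))
```

Under this model the analyzed title matches itself at distance 0, so it matches for every slop value. That is the Solr 1.4 behavior in the report, whereas Solr 3.5 returns no match once the slop is greater than 0.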