[ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir moved SOLR-3158 to LUCENE-3821:
-------------------------------------------

          Component/s:     (was: search)
        Lucene Fields: New
    Affects Version/s:     (was: 3.5)
                       4.0
                       3.5
                  Key: LUCENE-3821  (was: SOLR-3158)
              Project: Lucene - Java  (was: Solr)
    
> search slop problem introduced somewhere between Solr 1.4 and Solr 3.5
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-3821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3821
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.5, 4.0
>            Reporter: Naomi Dushay
>         Attachments: schema.xml, solrconfig-test.xml
>
>
> In upgrading from Solr 1.4 to Solr 3.5, the following phrase searches stopped 
> working in dismax:
>   "The Beatles as musicians : Revolver through the Anthology"
>   "Color-blindness [print/digital]; its dangers and its detection"
> Both of these queries have a repeated word, and have many terms.  It's not 
> the number of terms or the colon surrounded by spaces, because the following 
> phrase search works in Solr 3.5 (and Solr 1.4):
>     "International encyclopedia of revolution and protest : 1500 to the 
> present"
> With Robert Muir's help, we have narrowed the problem down to slop  
> (proximity in lucene QueryParser, query slop in dismax).   I have included 
> debugQuery details for  the Beatles search;  I confirmed the same behavior 
> with the color-blindness search.
> Solr 3.5:   it fails when (query) slop setting isn't 0.
> ----
> lucene QueryParser with proximity set to 1 (or anything > 0) :  no match
>   URL: q=all_search:"The Beatles as musicians : Revolver through the 
> Anthology"~1
>   final query:  all_search:"the beatl as musician revolv through the 
> antholog"~1
> lucene QueryParser with proximity set to 0:    result!
>   URL:   q=all_search:"The Beatles as musicians : Revolver through the 
> Anthology"
>   final query:  all_search:"the beatl as musician revolv through the antholog"
>   6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through 
> the antholog" in 1064395), product of:
>      <snip>
>       48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>      <snip>
> dismax QueryParser with qs=1:  no match
>       ps=0
>   URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=1&ps=0
>   final query:   +(all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
> antholog")~0.01
>       ps=1
>   URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=1&ps=1
>   final query:   +(all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01
> dismax QueryParser with qs=0:    result!
>      ps=0
>   URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=0&ps=0
>   final query:  +(all_search:"the beatl as musician revolv through the 
> antholog")~0.01 (all_search:"the beatl as musician revolv through the 
> antholog")~0.01
>       ps=1
>   URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=0&ps=1
>   final query:  +(all_search:"the beatl as musician revolv through the 
> antholog")~0.01 (all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01
>   8.564867 = (MATCH) sum of:
>     4.2824335 = (MATCH) weight(all_search:"the beatl as musician revolv 
> through the antholog" in 1064395), product of:
>         <snip>
>         48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>         <snip>
> Solr 1.4:    it works regardless of slop settings
> ----
> lucene QueryParser with any proximity value:    result!
>       ~0
>   URL:   q=all_search:"The Beatles as musicians : Revolver through the 
> Anthology"
>   final query:  all_search:"the beatl as musician revolv through the antholog"
>       ~1
>   URL: q=all_search:"The Beatles as musicians : Revolver through the 
> Anthology"~1
>   final query:  all_search:"the beatl as musician revolv through the 
> antholog"~1
>   5.2672544 = fieldWeight(all_search:"the beatl as musician revolv through 
> the antholog" in 3469163), product of:
>      <snip>
>     48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 
> musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
>      <snip>
> dismax QueryParser with any qs:    result!
>       qs=0, ps=0
>    URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=0&ps=0
>    final query: +(all_search:"the beatl as musician revolv through the 
> antholog")~0.01 (all_search:"the beatl as musician revolv through the 
> antholog")~0.01
>       qs=0, ps=1
>    URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=0&ps=1
>    final query: +(all_search:"the beatl as musician revolv through the 
> antholog")~0.01 (all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01
> dismax QueryParser with qs=1:    result!
>       qs=1, ps=0
>    URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=1&ps=0
>    final query: +(all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
> antholog")~0.01
>       qs=1, ps=1
>    URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
> through the Anthology"&qs=1&ps=1
>    final query: +(all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
> antholog"~1)~0.01
>   7.4490223 = (MATCH) sum of:
>   3.7245111 = weight(all_search:"the beatl as musician revolv through the 
> antholog"~1 in 3469163), product of:
>         <snip>
>       48.157753 = idf(all_search: the=3549637 beatl=392 as=751093 
> musician=11992 revolv=822 through=88522 the=3549637 antholog=11246)
>         <snip>
> More information:
> schema.xml:
>   <field name="all_search" type="text" indexed="true" stored="false" />
> solr 3.5:
>       <fieldtype name="text" class="solr.TextField" 
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>         <filter class="solr.ICUFoldingFilterFactory"/>  
>         <filter class="solr.WordDelimiterFilterFactory"
>           splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
>           splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
>           catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
>         <filter class="solr.EnglishPorterFilterFactory" 
> protected="protwords.txt" />
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>       </analyzer>
>     </fieldtype>
> solr1.4:
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>         <filter class="schema.UnicodeNormalizationFilterFactory" 
> version="icu4j" composed="false" remove_diacritics="true" 
> remove_modifiers="true" fold="true" />
>         <filter class="solr.WordDelimiterFilterFactory" 
>           splitOnCaseChange="1" generateWordParts="1" catenateWords="1" 
>           splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" 
>           catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.EnglishPorterFilterFactory" 
> protected="protwords.txt" />
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>       </analyzer>
>     </fieldtype>
> And the analysis page shows the same results for Solr 3.5 and 1.4
> Solr 3.5:
> position      1         2         3         4         5         6         7         8
> term text     the       beatl     as        musician  revolv    through   the       antholog
> keyword       false     false     false     false     false     false     false     false
> startOffset   0         4         12        15        27        36        44        48
> endOffset     3         11        14        24        35        43        47        57
> type          word      word      word      word      word      word      word      word
> Solr 1.4:
> term position     1         2         3         4         5         6         7         8
> term text         the       beatl     as        musician  revolv    through   the       antholog
> term type         word      word      word      word      word      word      word      word
> source start,end  0,3       4,11      12,14     15,24     27,35     36,43     44,47     48,57
> For debug purposes, we can consider the Solr document as:
> <doc>
>   <str name="all_search">The Beatles as musicians : Revolver through the 
> Anthology</str>
> </doc>
> I can't attach the full SolrDoc because all_search is indexed but not stored, 
> and I use SolrJ to write to the index from java objects ... plus our objects 
> have a zillion fields (I work in a library with very rich metadata and very 
> exacting solr fields).  I have attached the Solr 3.5 schema and solrconfig, 
> but they are big and ugly for the same reasons.
> For more details, see the erroneously titled email thread "result present in 
> Solr 1.4 but missing in Solr 3.5, dismax only"  started on 2012-02-22 on 
> solr-u...@lucene.apache.org.
> - Naomi
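
The report observes that both failing phrases contain a repeated word while
the working phrase does not. A minimal sketch for checking that observation
against other titles follows. It assumes a crude lowercase, split-on-
non-alphanumerics tokenizer rather than the real analyzer chain from the
schema (no stemming, no WordDelimiterFilter), and the function name
repeated_terms is hypothetical.

```python
import re
from collections import Counter

# The three phrases quoted in the report.
FAILING_BEATLES = "The Beatles as musicians : Revolver through the Anthology"
FAILING_COLOR = "Color-blindness [print/digital]; its dangers and its detection"
WORKING_ENCYCLOPEDIA = (
    "International encyclopedia of revolution and protest : 1500 to the present"
)


def repeated_terms(phrase):
    """Return the sorted terms that occur more than once in the phrase.

    Assumption: tokens are lowercase runs of letters and digits. This only
    approximates the analyzer in the attached schema.
    """
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    counts = Counter(tokens)
    return sorted(term for term, n in counts.items() if n > 1)


if __name__ == "__main__":
    for phrase in (FAILING_BEATLES, FAILING_COLOR, WORKING_ENCYCLOPEDIA):
        print(repeated_terms(phrase), "<-", phrase)
```

A phrase with a non-empty result would be a candidate for the behavior
described above once slop is greater than 0; that link is the report's
hypothesis, not something this sketch verifies.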

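The slop combinations walked through in the description (lucene QueryParser
with proximity 0 and 1, dismax with qs and ps each set to 0 and 1) can be
generated rather than typed by hand. The sketch below builds those six
request parameter sets. The parameters q, qf, pf, qs and ps come from the
report; defType=dismax, debugQuery=true and the base URL are assumptions
about how the requests were issued, and the helper names are hypothetical.

```python
from itertools import product
from urllib.parse import urlencode

PHRASE = '"The Beatles as musicians : Revolver through the Anthology"'
FIELD = "all_search"


def lucene_params(slop):
    """Lucene QueryParser form: field:"phrase"~slop, with no suffix for slop 0."""
    suffix = f"~{slop}" if slop > 0 else ""
    return {"q": f"{FIELD}:{PHRASE}{suffix}", "debugQuery": "true"}


def dismax_params(qs, ps):
    """Dismax form: query slop (qs) and phrase slop (ps) passed as parameters."""
    return {
        "defType": "dismax",  # assumption: the report may use a handler instead
        "qf": FIELD,
        "pf": FIELD,
        "q": PHRASE,
        "qs": qs,
        "ps": ps,
        "debugQuery": "true",
    }


def all_variants():
    """Return (label, params) pairs for every combination in the report."""
    variants = [(f"lucene ~{s}", lucene_params(s)) for s in (0, 1)]
    variants += [
        (f"dismax qs={qs} ps={ps}", dismax_params(qs, ps))
        for qs, ps in product((0, 1), repeat=2)
    ]
    return variants


if __name__ == "__main__":
    base = "http://localhost:8983/solr/select"  # hypothetical Solr endpoint
    for label, params in all_variants():
        print(label, base + "?" + urlencode(params))
```

Running the same six requests against a 1.4 and a 3.5 instance and comparing
which ones return the document would reproduce the comparison laid out in
the description.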
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
