I am using Lucene 2.9.3 (via Solr 1.4.1) on Windows and am trying to
understand ShingleFilter. With the configuration below, I find that if
the query contains more words than the phrase indexed in a field, then
the search on that field fails (no score from it with debugQuery=true).

Here is an example to reproduce, with field names:
Id: 1
title_1: Nina Simone
title_2: I put a spell on you

Query (dismax) with:
- “Nina Simone I put” <- FAILS, i.e. no score shown from the title_1 search (using debugQuery)
- “Nina Simone” <- SUCCESS

However, when I used Solr’s Field Analysis with the ‘shingle’ field type
(given below) and tried “Nina Simone I put”, it reports a match. It’s only
during the actual query that no score is provided. I also checked
‘parsedquery’, and it shows a DisjunctionMaxQuery issuing the string
“Nina_Simone Simone_I I_put” to the title_1 field.

title_1 and title_2 fields are of type ‘shingle’, defined as:

   <fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="true">
       <analyzer type="index">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
       </analyzer>
   </fieldType>
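
To make the expected tokens concrete, here is a minimal sketch in plain Java of
what I understand this analyzer chain to emit. It does not use any Lucene or
Solr classes: ShingleSketch and shingles() are hypothetical names, the
tokenization is a rough stand-in for StandardTokenizer, and joining the two
words with a space is an assumption about the separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ShingleSketch {
    // Hypothetical helper, not a Lucene API: lowercases the text, splits it on
    // non-alphanumeric characters, and joins each adjacent pair of tokens.
    // This approximates maxShingleSize=2 with outputUnigrams=false.
    static List<String> shingles(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        // Only two-word shingles are emitted; a single token yields nothing here.
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("Nina Simone"));       // indexed title_1 value
        System.out.println(shingles("Nina Simone I put")); // the longer query
    }
}
```

Under these assumptions the indexed title_1 value yields a single shingle,
while the longer query yields three, of which only the first appears in
title_1. That mismatch may be related to what I am seeing.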

Note that I also have a catchall field which is text. I have qf set
to: 'id^2 catchall' and pf set to: 'title_1^1.5 title_2^1.2'

If I am missing something or doing something wrong, please let me know.

-Ethan
