Hi Radha,

On 4/16/2009 at 8:35 AM, Radhalakshmi Sredharan wrote:
> I have a question related to SpanNearQuery.
> 
> I need a hit even if there are 2/3 terms found with the span being
> applied for those 2 terms.
> 
> Is there any custom implementation in place for this? I checked
> SrndQuery but that also doesn't work.
> 
> This is my workaround currently:
> 
> 1)      For a list of terms ( ab,bc, cd,ef) , make a set like ( ab,bc)
> , ( bc,cd) ( ab,cd) (bc,ef) ( ab,bc,cd) ( ab,bc,cd,ef)..... and so on.
> 
> 2)      Create a spanNearQuery for  each of these terms
> 
> 3)      Add it to the booleanQuery with a  SHOULD clause.
> 
> However this approach gives me puzzling scores
>  eg If my document has  only ( ab,bc,cd) the penalty for the missing ef
> is very high and my score comes down quite a bit.

Have you seen the scoring documentation on the Lucene site: 
<http://lucene.apache.org/java/2_4_1/scoring.html>?  In particular, follow the 
link from there to the Searcher.explain() javadocs.  explain() shows how a 
document's score was computed for a given query, which should help you 
understand what's happening with your queries.

I suspect that the penalty is due to fewer sub-queries matching; that is, not 
only does (ab,bc,cd,ef) fail to match, but (ab,bc,ef), (ab,cd,ef), (ab,ef) etc. 
also fail to match.  Since every matching clause contributes to the final 
score, you will see a large drop-off whenever you don't get a full match.
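
To put rough numbers on that, here is a toy calculation in plain Java (no 
Lucene dependency).  It assumes your BooleanQuery holds every subset of two or 
more of the four terms, that every clause scores the same when it matches, and 
that the only other factor is the default coord(matched/total) multiplier.  
Real scores also depend on tf, idf, norms and the span matching itself, and 
the class and method names are made up for illustration, so treat this as a 
sketch only:

```java
public class CoordPenaltySketch {

    /** Number of subsets containing at least two of the available terms. */
    static int clausesWithAtLeastTwoTerms(int available) {
        // all subsets, minus the empty set, minus the single-term subsets
        return (1 << available) - 1 - available;
    }

    /** Toy score: coord(matched/total) times the sum of equal clause scores. */
    static double toyScore(int matched, int total, double perClause) {
        double coord = (double) matched / total;
        return coord * matched * perClause;
    }

    public static void main(String[] args) {
        int total = clausesWithAtLeastTwoTerms(4);   // clauses built from ab,bc,cd,ef: 11
        int partial = clausesWithAtLeastTwoTerms(3); // clauses a doc with only ab,bc,cd can match: 4

        double full = toyScore(total, total, 1.0);     // hypothetical per-clause score of 1.0
        double missingEf = toyScore(partial, total, 1.0);

        System.out.println("clauses in query:        " + total);
        System.out.println("clauses matched w/o ef:  " + partial);
        System.out.println("partial / full score:    " + (missingEf / full));
    }
}
```

Under those assumptions the document missing ef matches 4 of the 11 clauses 
and ends up with roughly 13% of the full-match score, which is the kind of 
drop you describe.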

Instead of putting all of the alternatives together in a single large 
disjunction, try packaging them so that the shorter alternatives don't 
influence the final score when longer ones match; that may get you something 
closer to what you want.  I think DisjunctionMaxQuery 
<http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DisjunctionMaxQuery.html>,
which (with a tie-breaker multiplier of 0) scores a document by its 
best-scoring sub-query rather than by the sum of all matching sub-queries, 
will do the trick along with judicious boosting, e.g.:

DMQ((ab,bc,cd,ef)^100,
    ((ab,bc,cd)^10 (ab,bc,ef)^10 (ab,cd,ef)^10 ...),
    ((ab,bc) (ab,cd) (ab,ef) ...))
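
To make the grouping concrete, here is a minimal sketch in plain Java (no 
Lucene dependency) that enumerates the term subsets by size and prints them in 
the notation used above.  The class and method names are made up for 
illustration, and the boost rule (a factor of 10 per extra term) is only an 
assumption extrapolated from the 100/10/1 example.  In a real query each 
innermost parenthesized group would presumably become a SpanNearQuery over the 
corresponding SpanTermQuery clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanSubsetGroups {

    /** All subsets of exactly `size` terms, keeping the original term order. */
    static List<List<String>> subsetsOfSize(List<String> terms, int size) {
        List<List<String>> out = new ArrayList<List<String>>();
        collect(terms, size, 0, new ArrayList<String>(), out);
        return out;
    }

    private static void collect(List<String> terms, int size, int start,
                                List<String> current, List<List<String>> out) {
        if (current.size() == size) {
            out.add(new ArrayList<String>(current));
            return;
        }
        for (int i = start; i < terms.size(); i++) {
            current.add(terms.get(i));
            collect(terms, size, i + 1, current, out);
            current.remove(current.size() - 1);
        }
    }

    /** Hypothetical boost rule: 1 for pairs, 10 for triples, 100 for four terms. */
    static int boostFor(int size) {
        int boost = 1;
        for (int i = 2; i < size; i++) {
            boost *= 10;
        }
        return boost;
    }

    /** Renders the groups, largest subsets first, in the DMQ(...) notation. */
    static String render(List<String> terms) {
        StringBuilder sb = new StringBuilder("DMQ(");
        for (int size = terms.size(); size >= 2; size--) {
            if (size < terms.size()) {
                sb.append(",\n    ");
            }
            List<List<String>> group = subsetsOfSize(terms, size);
            int boost = boostFor(size);
            if (group.size() > 1) {
                sb.append("(");
            }
            for (int g = 0; g < group.size(); g++) {
                if (g > 0) {
                    sb.append(" ");
                }
                sb.append("(").append(String.join(",", group.get(g))).append(")");
                if (boost > 1) {
                    sb.append("^").append(boost);
                }
            }
            if (group.size() > 1) {
                sb.append(")");
            }
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList("ab", "bc", "cd", "ef")));
    }
}
```

For four terms this yields one group of size four, four groups of size three 
and six pairs, so the number of span queries grows quickly with the number of 
terms; you may want to cap the term count or drop the smallest groups.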

Steve

