Re: 2/3 of terms matched + coverage filter

Donna L Gresh Wed, 31 Oct 2007 05:58:20 -0800

I am not an expert but I think you can solve problem 1 by overriding the 
coord function in the similarity class:


1.      coord(q,d) is a score factor based on how many of the query terms 
are found in the specified document. Typically, a document that contains 
more of the query's terms will receive a higher score than another 
document with fewer query terms. This is a search time factor computed in 
coord(q,d) by the Similarity in effect at search time. 

the default similarity class defines coord as thus:

coord
public float coord(int overlap,
                   int maxOverlap)
Implemented as overlap / maxOverlap. 



Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]


Tobias Hill <[EMAIL PROTECTED]> wrote on 10/31/2007 09:51:12 AM:

> My documents all hava a field with variables number of terms
> (but rather few):
> Doc1.field = "foo bar gro"
> Doc2.field = "foo bar gro mot slu"
> Now I would like to search using the terms "foo bar gro"
> 
> Problem 1:
> I like to express that at least any two of the three terms
> must match. Do I have to construct this clause myself - i.e.
> "(foo & bar) | (foo & gro) | (bar & gro)", or is there some
> clever way to do this?
> 
> Problem 2:
> I like to express that if the doc.field has too many terms
> that wasn't matched it should not be included at all in the
> result. In the example above Doc2 might have too many
> terms that was not matched to be included in the result.
> Is this kind of query possible, and how?
> 
> The general case:
> I want to find those docs that has X% of the search terms
> matched and that the acctual match covers at least Y% of
> the available terms on the document.
> 
> 
> I am very thankful for any feedback on this.
> Tobias
> 
> 
> 
>

Re: 2/3 of terms matched + coverage filter

Reply via email to