I am not an expert but I think you can solve problem 1 by overriding the
coord function in the similarity class:
1. coord(q,d) is a score factor based on how many of the query terms
are found in the specified document. Typically, a document that contains
more of the query's terms will receive a higher score than another
document with fewer query terms. This is a search time factor computed in
coord(q,d) by the Similarity in effect at search time.
the default similarity class defines coord as thus:
coord
public float coord(int overlap,
int maxOverlap)
Implemented as overlap / maxOverlap.
Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]
Tobias Hill <[EMAIL PROTECTED]> wrote on 10/31/2007 09:51:12 AM:
> My documents all hava a field with variables number of terms
> (but rather few):
> Doc1.field = "foo bar gro"
> Doc2.field = "foo bar gro mot slu"
> Now I would like to search using the terms "foo bar gro"
>
> Problem 1:
> I like to express that at least any two of the three terms
> must match. Do I have to construct this clause myself - i.e.
> "(foo & bar) | (foo & gro) | (bar & gro)", or is there some
> clever way to do this?
>
> Problem 2:
> I like to express that if the doc.field has too many terms
> that wasn't matched it should not be included at all in the
> result. In the example above Doc2 might have too many
> terms that was not matched to be included in the result.
> Is this kind of query possible, and how?
>
> The general case:
> I want to find those docs that has X% of the search terms
> matched and that the acctual match covers at least Y% of
> the available terms on the document.
>
>
> I am very thankful for any feedback on this.
> Tobias
>
>
>
>