Hi, thanks for the reply.

Yeah, I've read the Similarity class documentation several times, but I still
need some advice.

My queries are BooleanQueries, but they always have the same structure (the
same structure as the docs; they are actually docs from the collection): 3
fields.

What if I simplify the similarity score by removing the coord factor and just
leaving the cosine similarity, which is comparable?

I want to stress that my boolean queries are just a combination of
"field:term" items, and I always have the same 3 fields, with different
terms, obviously.
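
To make concrete what I mean by a comparable cosine score, here is a minimal sketch in plain Java. It is not Lucene's scoring and does not use the Similarity API: the class name CosineSketch and the field names (title, body, tags) are hypothetical, and the weights are raw term counts rather than tf-idf. It only shows that a cosine over "field:term" vectors is bounded and has no coord or query-norm factor in it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine similarity between two
// bags of "field:term" items, each stored as item -> raw count.
final class CosineSketch {

    // Builds a raw term-count vector from "field:term" items.
    static Map<String, Double> vector(String... items) {
        Map<String, Double> v = new HashMap<>();
        for (String item : items) {
            v.merge(item, 1.0, Double::sum);
        }
        return v;
    }

    // Euclidean length of a vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between a and b; 0.0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        // Assumed example data: two docs sharing two of three field:term items.
        Map<String, Double> q = vector("title:lucene", "body:score", "tags:search");
        Map<String, Double> d = vector("title:lucene", "body:index", "tags:search");
        System.out.println("q vs q: " + cosine(q, q));
        System.out.println("q vs d: " + cosine(q, d));
    }
}
```

With non-negative weights the value always falls between 0 and 1 (up to floating-point rounding), whichever document is used as the query, which is the property I am after.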

Thanks




On 28 March 2011 10:03, Uwe Schindler <[email protected]> wrote:

> No, scores are in general not comparable between different queries. The
> problem lies in many things:
> - Each query has a norm factor that makes it more comparable if they are
> sub clauses of a BooleanQuery. But you are right, this norm factor should be
> the same.
> - Some queries, like FuzzyQuery, rely on the terms in the index and on which
> of those match the query
> - Inside Boolean queries, there is also a coord-factor involved
>
> If you are always using the same simple type of query (e.g. simple
> TermQuery, only with different term) on the same index, you can compare the
> scores. As soon as you are using complex queries (e.g. several terms compared
> in a BooleanQuery, as QueryParser produces), the scores are no longer
> comparable.
>
> You can read more on all factors that are included in scoring:
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
> > -----Original Message-----
> > From: Patrick Diviacco [mailto:[email protected]]
> > Sent: Monday, March 28, 2011 9:44 AM
> > To: [email protected]
> > Subject: comparing lucene scores across queries
> >
> > Hi,
> >
> > Sorry, I already asked a few days ago, but I got no reply and I really need
> > some help on this.
> >
> > I'm running several queries against a doc collection. The queries are
> > documents of the collection itself, and I need to measure how similar each
> > document is to the rest of the collection.
> >
> > Now, Lucene returns a score per query, but I've been told such a score is
> > not comparable across queries. Is this correct?
> >
> > For example, aren't these scores comparable?
> > query1, score:8.324234
> > query2, score:3.324238
> >
> > If so, why not? Isn't it the cosine similarity between the query vector and
> > the collection doc vectors? I really need a comparable measure.
> >
> > thanks
>
>
