Hi, thanks for the reply.

Yeah, I've read the Similarity class documentation several times, but I
still need some tips.

My queries are BooleanQueries, but they always have the same structure (the
same structure as the docs, since they are actually docs from the
collection): 3 fields.

What if I simplify the similarity score by removing the coord factor and
leaving just the cosine similarity, which is comparable?

I want to underline the fact that my boolean queries are just a combination
of "field:term" items, and I always have the same 3 fields, with different
terms obviously.
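
To make concrete what I mean by "just the cosine similarity", here is a
minimal sketch in plain Java. It makes no Lucene calls: the class and method
names are made up for illustration, and raw term counts stand in for
whatever term weighting Lucene actually applies, so this is an assumption
about the measure I am after rather than Lucene's scoring formula.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSketch {

    // Build a term-count vector keyed by "field:term" strings.
    static Map<String, Integer> vector(String... fieldTerms) {
        Map<String, Integer> v = new HashMap<>();
        for (String ft : fieldTerms) {
            v.merge(ft, 1, Integer::sum);
        }
        return v;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        // Hypothetical field names and terms, one term per field.
        Map<String, Integer> query =
            vector("title:lucene", "body:score", "author:smith");
        Map<String, Integer> doc =
            vector("title:lucene", "body:index", "author:smith");
        System.out.println(cosine(query, doc));
    }
}
```

Because both vectors are length-normalized, the result for non-negative
counts always falls between 0 and 1 whatever the query is, which is the
property I need in order to compare values across queries.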

Thanks




On 28 March 2011 10:03, Uwe Schindler <u...@thetaphi.de> wrote:

> No, scores are in general not comparable between different queries. The
> problem lies in many things:
> - Each query has a norm factor that makes scores more comparable when the
> queries are sub-clauses of a BooleanQuery. But you are right, this norm
> factor should be the same.
> - Some queries, like FuzzyQuery, depend on the terms in the index that
> match the query
> - Inside Boolean queries, there is also a coord-factor involved
>
> If you are always using the same simple type of query (e.g. a simple
> TermQuery, only with a different term) on the same index, you can compare
> the scores. As soon as you are using complex queries (e.g. several terms
> combined in a BooleanQuery, as QueryParser produces), the scores are no
> longer comparable.
>
> You can read more on all factors that are included in scoring:
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> > Sent: Monday, March 28, 2011 9:44 AM
> > To: java-user@lucene.apache.org
> > Subject: comparing lucene scores across queries
> >
> > Hi,
> >
> > sorry, I already asked this a few days ago, but I got no reply and I
> > really need some help on this.
> >
> > I'm running several queries against a doc collection. The queries are
> > documents of the collection itself; I need to measure how similar each
> > document is to the rest of the collection.
> >
> > Now, Lucene returns a score per query, but I've been told that this
> > score is not comparable across queries. Is this correct?
> >
> > For example, aren't these scores comparable?
> > query1, score:8.324234
> > query2, score:3.324238
> >
> > If not, why not? Isn't it the cosine similarity between the query
> > vector and the collection doc vectors? I really need a comparable
> > measure.
> >
> > thanks
