I see. If you say the query norm isn't a problem in my case, I will just disable the coord factor by constructing my queries with new BooleanQuery(true), and I should be done.
If this is not correct, could anybody please let me know?

On 28 March 2011 11:44, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> As you seem to want to do very specific things, it might still be
> interesting to provide a modified Similarity (by subclassing
> DefaultSimilarity). You could then e.g. also return 1.0 to disable the
> queryNorm() which may also be a problem (but it isn't for your queries).
> Theoretically, you can change the Similarity to only have the cosine
> similarity left over - if you only want to use that one.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> > Sent: Monday, March 28, 2011 11:39 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: comparing lucene scores across queries
> >
> > ok thanks, I will pass well I dunno how to verify it. Even if I try then
> > I get some scores, but I dunno if comparing them is reliable.
> >
> > On 28 March 2011 11:36, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > > Hi,
> > >
> > > You don't need to extend BooleanQuery, you can just pass "true" in its
> > > ctor, see: http://s.apache.org/QvK
> > > Of course you can also subclass DefaultSimilarity and return 1 as
> > > coord, but that is more work than passing true to a ctor.
> > >
> > > For your type of queries, disabling coord should be enough, but I am
> > > not 100% sure! Why not simply try it out?
> > >
> > > Uwe
> > >
> > > > -----Original Message-----
> > > > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> > > > Sent: Monday, March 28, 2011 10:49 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: comparing lucene scores across queries
> > > >
> > > > One more thing, instead of extending the BooleanQuery class to
> > > > remove the coord factor, can I also extend the Similarity class to
> > > > do it ?
> > > >
> > > > Still the other question is open: just to be sure, if I disable the
> > > > coord factor I can finally compare my BooleanQuery results ?
> > > >
> > > > thanks
> > > >
> > > > On 28 March 2011 10:11, Uwe Schindler <u...@thetaphi.de> wrote:
> > > >
> > > > >> Hi Patrick,
> > > > >>
> > > > >> You can disable the coord factor in the constructor of
> > > > >> BooleanQuery.
> > > > >>
> > > > >> Uwe
> > > > >>
> > > > >> > -----Original Message-----
> > > > >> > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> > > > >> > Sent: Monday, March 28, 2011 10:09 AM
> > > > >> > To: java-user@lucene.apache.org
> > > > >> > Subject: Re: comparing lucene scores across queries
> > > > >> >
> > > > >> > Hi, thanks for reply.
> > > > >> >
> > > > >> > Yeah, I've read the Similarity class documentation several
> > > > >> > times, but I need some tip.
> > > > >> >
> > > > >> > My queries are BooleanQueries but they always have the same
> > > > >> > structure (the same structure of the docs, they are actually
> > > > >> > docs from collection): 3 fields.
> > > > >> >
> > > > >> > What if I simplify the similarity scores, by removing coord
> > > > >> > factor and just leaving the cosine similarity which is
> > > > >> > comparable ?
> > > > >> >
> > > > >> > I want to underline the fact that my boolean queries are just a
> > > > >> > combination of "field:term" items, and I always have the same 3
> > > > >> > fields with different terms obviously.
> > > > >> >
> > > > >> > Thanks
> > > > >> >
> > > > >> > On 28 March 2011 10:03, Uwe Schindler <u...@thetaphi.de> wrote:
> > > > >> >
> > > > >> > > No, scores are in general not comparable between different
> > > > >> > > queries. The problem lies in many things:
> > > > >> > > - Each query has a norm factor that makes it more comparable
> > > > >> > >   if they are sub clauses of a BooleanQuery. But you are
> > > > >> > >   right, this norm factor should be the same.
> > > > >> > > - Some queries like FuzzyQuery rely on the terms in index and
> > > > >> > >   those matches the query
> > > > >> > > - Inside Boolean queries, there is also a coord-factor
> > > > >> > >   involved
> > > > >> > >
> > > > >> > > If you are always using the same simple type of query (e.g.
> > > > >> > > simple TermQuery, only with different term) on the same index,
> > > > >> > > you can compare the scores. As soon as you are using complex
> > > > >> > > queries (e.g several terms compared in a BooleanQuery as
> > > > >> > > QueryParser produces), the scores are no longer comparable.
> > > > >> > >
> > > > >> > > You can read more on all factors that are included in scoring:
> > > > >> > >
> > > > >> > > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
> > > > >> > >
> > > > >> > > > -----Original Message-----
> > > > >> > > > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> > > > >> > > > Sent: Monday, March 28, 2011 9:44 AM
> > > > >> > > > To: java-user@lucene.apache.org
> > > > >> > > > Subject: comparing lucene scores across queries
> > > > >> > > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > >> > > > sorry I've already asked few days ago, but I got no reply
> > > > >> > > > and I really need some help on this..
> > > > >> > > >
> > > > >> > > > I'm running several queries against a doc collection. The
> > > > >> > > > queries are documents of the collection itself, I need to
> > > > >> > > > measure how similar is each document to the rest of the
> > > > >> > > > collection.
> > > > >> > > >
> > > > >> > > > Now, Lucene returns me a score per query, but I've been told
> > > > >> > > > such score is not comparable across queries. Is this correct ?
> > > > >> > > >
> > > > >> > > > For example, aren't these scores comparable ?
> > > > >> > > > query1, score:8.324234
> > > > >> > > > query2, score:3.324238
> > > > >> > > >
> > > > >> > > > If so, why not ? Isn't the cosine similarity between the
> > > > >> > > > query vector and collection docs vectors ? I really need a
> > > > >> > > > comparable measure.
> > > > >> > > >
> > > > >> > > > thanks
> > > > >> > >
> > > > >> > > ---------------------------------------------------------------------
> > > > >> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > >> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
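
For anyone reading this thread later, here is a minimal sketch of why the coord factor discussed above gets in the way of comparing scores. It is plain Java, not Lucene code: the class CoordSketch and its methods are hypothetical names made up for illustration, and the clause score of 3.0 is an arbitrary example value. The only thing taken from Lucene is the formula overlap / maxOverlap, which is what DefaultSimilarity returns from coord().

```java
// Illustrative sketch only; CoordSketch is a hypothetical class, not part of Lucene.
class CoordSketch {

    // Fraction of the query's clauses that matched the document
    // (the same ratio DefaultSimilarity uses for coord).
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Scales the summed clause scores by coord, or by 1.0 when coord is disabled.
    static float score(float sumOfClauseScores, int overlap, int maxOverlap,
                       boolean disableCoord) {
        float factor = disableCoord ? 1.0f : coord(overlap, maxOverlap);
        return sumOfClauseScores * factor;
    }

    public static void main(String[] args) {
        float sum = 3.0f; // arbitrary example value for the summed clause scores

        // All 3 clauses matched: coord leaves the sum unchanged.
        System.out.println(score(sum, 3, 3, false));
        // Only 2 of 3 clauses matched: the same sum is scaled down.
        System.out.println(score(sum, 2, 3, false));
        // Coord disabled: the sum is reported as-is regardless of overlap.
        System.out.println(score(sum, 2, 3, true));
    }
}
```

With coord enabled, the same summed clause score ends up different depending on how many clauses matched. Passing true to the BooleanQuery constructor, as Uwe suggests, corresponds to the disableCoord branch here. Whether that alone makes scores comparable for a given set of queries is something the thread leaves open.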