Re: Scoring for all the documents in the index relative to a query

Grant Ingersoll Mon, 19 Nov 2007 10:39:09 -0800

Lucene only scores those documents that have at least one matching term; it doesn't implement a pure vector space model whereby all documents are scored (it uses a combination of the Boolean Model and VSM). Thus, I am not sure you can do a pure comparison. I suppose you could simulate the relevance by using TermVectors and looping over all documents, but I think one could argue this isn't exactly what Lucene does, so it isn't comparable.
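
To make the "loop over all documents" idea concrete, here is a minimal sketch in plain Python. It is not Lucene code and does not reproduce Lucene's scoring formula: the `term_vectors` dictionary stands in for per-document term vectors, the three toy documents are invented, and raw term-frequency cosine similarity is an assumed stand-in for a pure VSM score. The names `cosine` and `score_all` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_all(term_vectors, query):
    """Score every document, including those sharing no term with the query."""
    query_vec = Counter(query.lower().split())
    scored = [(doc_id, cosine(query_vec, vec))
              for doc_id, vec in term_vectors.items()]
    # Highest score first; documents with no query term end up at 0.0.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for term vectors read from an index (invented data).
term_vectors = {
    0: Counter("search file in index".split()),
    1: Counter("open file".split()),
    2: Counter("unrelated text".split()),
}

ranking = score_all(term_vectors, "search file")
for doc_id, score in ranking:
    print(doc_id, round(score, 4))
```

With non-negative term weights, a cosine score of this kind cannot drop below 0, so a document that shares no term with the query simply scores 0 rather than receiving a negative value.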

http://lucene.apache.org/java/docs/scoring.html might help in understanding this stuff.


HTH,
Grant

On Nov 19, 2007, at 1:25 PM, HAIDUC SONIA wrote:

I am trying to order all the documents in the index according to their similarity to a given query. I am interested in having a complete list of *all* the documents in the index with their score. From what I understood by reading some documentation, Lucene internally assigns scores to all the documents in the index according to their similarity to the query, but when returning the hits, all the scores that are less than 0 are rounded to 0 and only the documents with the score > 0 are returned as hits. But what I would like to get is the list before this intermediate processing, so the list of all the documents with their raw score. I am trying to compare Lucene with LSI and for the comparison I want to do, I need the entire list of documents. Is there a way that I can get that with Lucene?
I hope I explained it clearly this time. If you need more details let me know.
Thank you,
Sonia

----- Original Message ----
From: Erick Erickson <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, November 19, 2007 11:55:00 AM
Subject: Re: Scoring for all the documents in the index relative to a query
Could you explain a bit more what problem you're trying to solve?
The reason I ask is that your question doesn't make sense to me,
since I have no idea what you expect by the term "negative score".

My simplistic view has been that all the docs returned via Hits
or HitCollector have scores > 0, and all the rest have scores of 0,
and this view is supported by the explanation of
HitCollector.collect:

"Called once for every non-zero scoring document, with the
document number and its score."
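
If that view holds, a complete per-document score list can be assembled by treating every document that was not collected as scoring 0. The following is a minimal sketch in plain Python, not Lucene code: `collected` stands for the (document number, score) pairs a collector would receive, `max_doc` for the number of document slots in the index, and all values are invented for illustration. The function name `full_score_list` is hypothetical.

```python
def full_score_list(collected, max_doc):
    """Return one score per document number, 0.0 where nothing was collected."""
    scores = [0.0] * max_doc
    for doc_id, score in collected:
        scores[doc_id] = score
    return scores


# Hypothetical collector output: only documents 1 and 3 matched the query.
collected = [(1, 0.42), (3, 0.17)]
all_scores = full_score_list(collected, max_doc=5)

for doc_id, score in enumerate(all_scores):
    print(doc_id, score)
```

The resulting list has an entry for every document number, so it can be sorted or compared against another ranking that covers the whole collection.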

You might also get value from this page:
http://lucene.apache.org/java/docs/scoring.html#Scoring

Best
Erick

On Nov 19, 2007 11:05 AM, HAIDUC SONIA <[EMAIL PROTECTED]> wrote:
Hi everyone,

I am trying to obtain the score for each document in the index
relative to a given query. For example, if I have the query
"search file", I am trying to get the list of all documents in the
index and their scores relative to the given query. I tried first
using Hits, which gave me the normalized score. I thought that I
wasn't seeing the whole list of documents and their scores because of
the normalization, so I tried using HitCollector. But even after
using HitCollector, I get the same number of matching documents, so
the normalization didn't exclude documents because of negative
scoring.

Does Lucene actually compute the score for all the documents in the
index or just for matching documents? I really need to have the
scores for all the documents in the index relative to the query (even
if negative), not just the ones that contain the query terms (this is
what Lucene considers "matching documents", right?). Is this possible
using Lucene?

I really appreciate your time and effort!
Thanks,
Sonia


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



