RE: Scoring a document (count?)

Russell M. Allen Mon, 31 Jul 2006 06:50:56 -0700

Thank you for the reply.

I am certainly open to different ways of organizing / indexing our
documents.  However, the example I provided was simplified for the sake
of the discussion.  In truth, what I was calling a category may be an
arbitrary set of movie ids (determined by a previous query).  This
precludes 'burning in' the associations as independently indexed fields.


I'd like to take a whack at the scorer approach.  I've read through most
of the Lucene web site and have reviewed the source code quite a bit
lately.  However, I admit I am still a little lost in how lucene works
under the covers.  Are there any design documents available to give me a
head start?  Is the Lucene in Action book the only source of
information?  Does it discuss how Lucene works under the covers?

Thanks!
-Russell

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 31, 2006 4:02 AM
To: Lucene Users
Subject: Re: Scoring a document (count?)


it would certainly be possible to get a score that was a simple count of
the number of matching clauses of a boolean query -- probably just with
a modified Similarity (no coord, 1/0 tf, no idf, no norms) but you
*might* need a slightly modified TermScorer to do that.

In general though, i think you are solving your problem the wrong way
...
don't just put the movie Ids in the movie-star docs ... also have one
indexed/stored field per category of movie (ie: "horror" would be an
indexed
field) that would only be set on actors which have appeared in a movie
of that type -- the value of the field would be the number of movies
they have appeared in of that type.

now you do your main query, with a simple filter on the "horror" field
to ensure it has a value and you've got the stored value of the "horror"
field to tell you how many movies they've been in.




: Date: Thu, 27 Jul 2006 12:02:46 -0400
: From: Russell M. Allen <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Scoring a document (count?)
:
: I am curious about the potential use of document scoring as a means to
: extract additional data from an index.  Specifically, I would like the
: score to be a count of how many times a particular field matched a set
: of terms.
:
: For example, I am indexing movie-stars (Each document is a
movie-star).
: A movie-star has a number of fields, such as name, movies they have
been
: in, etc.  I want to produce an 'index' of stars by name and show how
: many movies, which match a filter, that they have appeared in.
:
: In natural language my query might be:
:       "List all stars who have appeared in a 'horror' movie, where
: last name starts with A, and tell me how many horror movies they were
: in."
:
: My search will look something like this:
:       "+lastName:A* +movie:(1 7 21 58 92)"    //where movie is a
: previously computed list of 'horror' movie ids
:
: If my index contained the following documents:
:     doc1 = lastName:Anna   movie:{3 10}
:     doc2 = lastName:Aba    movie:{1 10 12}
:     doc3 = lastName:Addd   movie:{3 21 55 92}
:     doc4 = lastName:Baaa   movie:{7 56}
:
: I would like to get back:
:     doc2, score of 1  //score of 1 because only movie 1 matched
:     doc3, score of 2  //score of 2 because movies 21 and 92 matched
:
:
:
: Currently, we perform an initial query against our Star index to
: retrieve a list of stars.  Then we perform N queries against a
separate
: movie index to count the number of movies that match our sub filter
: 'horror'.  This is obviously very inefficient, and as I've shown
above,
: the information (count) is available during the primary query.
:
: Thoughts?
:
:
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Scoring a document (count?)

Reply via email to