Thank you for the reply. I am certainly open to different ways of organizing / indexing our documents. However, the example I provided was simplified for the sake of the discussion. In truth, what I was calling a category may be an arbitrary set of movie ids (determined by a previous query). This precludes 'burning in' the associations as independently indexed fields.
I'd like to take a whack at the scorer approach. I've read through most of the Lucene web site and have reviewed the source code quite a bit lately. However, I admit I am still a little lost in how lucene works under the covers. Are there any design documents available to give me a head start? Is the Lucene in Action book the only source of information? Does it discuss how Lucene works under the covers? Thanks! -Russell -----Original Message----- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, July 31, 2006 4:02 AM To: Lucene Users Subject: Re: Scoring a document (count?) it would certainly be possible to get a score that was a simple count of the number of matching clauses of a boolean query -- probably just with a modified Similarity (no coord, 1/0 tf, no idf, no norms) but you *might* need a slightly modified TermScorer to do that. In general though, i think you are solving your problem the wrong way ... don't just put the movie Ids in the movie-star docs ... also have one indexed/stored field per category of movie (ie: "horror" would be an indexed field) that would only be set on actors which have appeared in a movie of that type -- the value of the field would be the number of movies they have appeared in of that type. now you do your main query, with a simple filter on the "horror" field to ensure it has a value and you've got the stored value of the "horror" field to tell you how many movies they've been in. : Date: Thu, 27 Jul 2006 12:02:46 -0400 : From: Russell M. Allen <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Scoring a document (count?) : : I am curious about the potential use of document scoring as a means to : extract additional data from an index. Specifically, I would like the : score to be a count of how many times a particular field matched a set : of terms. : : For example, I am indexing movie-stars (Each document is a movie-star). : A movie-star has a number of fields, such as name, movies they have been : in, etc. I want to produce an 'index' of stars by name and show how : many movies, which match a filter, that they have appeared in. : : In natural language my query might be: : "List all stars who have appeared in a 'horror' movie, where : last name starts with A, and tell me how many horror movies they were : in." : : My search will look something like this: : "+lastName:A* +movie:(1 7 21 58 92)" //where movie is a : previously computed list of 'horror' movie ids : : If my index contained the following documents: : doc1 = lastName:Anna movie:{3 10} : doc2 = lastName:Aba movie:{1 10 12} : doc3 = lastName:Addd movie:{3 21 55 92} : doc4 = lastName:Baaa movie:{7 56} : : I would like to get back: : doc2, score of 1 //score of 1 because only movie 1 matched : doc3, score of 2 //score of 2 because movies 21 and 92 matched : : : : Currently, we perform an initial query against our Star index to : retrieve a list of stars. Then we perform N queries against a separate : movie index to count the number of movies that match our sub filter : 'horror'. This is obviously very inefficient, and as I've shown above, : the information (count) is available during the primary query. : : Thoughts? : : : : : --------------------------------------------------------------------- : To unsubscribe, e-mail: [EMAIL PROTECTED] : For additional commands, e-mail: [EMAIL PROTECTED] : -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]