Doron, thanks for the code offer. That would be great. I was able to get a partial implementation working myself, but I ran into some issues (most of which are rooted in a lack of understanding of Lucene internals on my part). I am sure I can learn a few things from your solution to this problem.
You may email me directly with the code, or if it's small enough, post it to the list for posterity.

Thanks again!
-Russell

-----Original Message-----
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 03, 2006 6:04 PM
To: java-user@lucene.apache.org
Subject: RE: Scoring a document (count?)

Hi Russell,

I am also interested in the internals of Lucene's ranking and how one can/should alter the scoring. So far I have just been learning from the existing code of Lucene's Scorers and Weights. Your question seemed interesting, so as an exercise I implemented a quick scorer that returns the raw tf as the score. It is not a product-level implementation of course, but if you think this will help you (?) I can share the code.

(I would have responded sooner, but my work computer was down for a few days with a fan error... :-)

Regards,
Doron

"Russell M. Allen" <[EMAIL PROTECTED]> wrote on 31/07/2006 07:35:50:

> Thank you for the reply Doron! You are exactly right about the SQL
> count(*). I need the equivalent of group by, and count().
>
> We have considered a 'joined' index where we would have a document for
> each permutation. We discarded it (possibly prematurely) based on the
> rapid explosion in the number of documents. In our domain, we have
> movies as the main document type, and 5 other satellite document types
> with their own indexes: Star, Studio, Director, Series, and Category
> (genre). With the exception of series, a movie has a many-to-many
> relationship with the other indexes. So, with 60k movies, 20k stars,
> 2k studios, ... the document count quickly shoots through the roof.
>
> Also, the majority of our searching is based on a single domain type,
> such as movie. It is only a small handful of corner cases where we
> want what amounts to a joined query. If we merged these indexes, we
> would constantly have to 'roll up' the results into distinct instances
> of a type. (The equivalent of an SQL 'group by')
>
> I find the parallels between the expressiveness of Lucene and SQL
> interesting. I'm glad to see you compared what I was looking for to
> an SQL count(*) as well. We have a handful of indexing issues that I
> am attempting to solve/optimize, of which performing a count(*) is
> only one. I also have the need to perform a JOIN across two indexes.
> I have 'ideas' about how I might go about this, but for now we are
> fortunate enough to have fairly static data, and half of the join is
> static. As a result we can cache a bitset filter for the results of
> half the join and apply it to the other (dynamic) half of the join query.
>
> Anyway, I digress...
>
> I saw your second post regarding creating a scorer. I'd like to
> continue down that path. My main issue now is simply understanding
> how Lucene works under the covers well enough to write the TermQuery
> variant.
>
> Thanks for the help,
> Russell.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
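
For readers of the archive: the cached-bitset join described in the quoted message (cache a bitset for the static half of the join, then apply it to the hits of the dynamic half) might look roughly like the sketch below. This is plain Java built on java.util.BitSet, not the Lucene API and not the code used by anyone in this thread. The class name CachedJoinSketch, the starsOfMovie array, and the methods moviesForStar, join, and count are all hypothetical names invented for illustration, and document ids are assumed to be simple array indexes.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only (assumed names, no Lucene classes):
 * the static half of a join is computed once per key and cached as a BitSet
 * over movie document ids, then ANDed with the hits of a dynamic query.
 */
final class CachedJoinSketch {
    // Hypothetical static relation: starsOfMovie[docId] lists the stars of that movie.
    private final String[][] starsOfMovie;
    // Cache of the static half of the join: star name -> movie doc ids.
    private final Map<String, BitSet> cache = new HashMap<String, BitSet>();

    CachedJoinSketch(String[][] starsOfMovie) {
        this.starsOfMovie = starsOfMovie;
    }

    /** Returns the cached bitset of movie doc ids featuring the star, building it on first use. */
    BitSet moviesForStar(String star) {
        BitSet bits = cache.get(star);
        if (bits == null) {
            bits = new BitSet(starsOfMovie.length);
            for (int doc = 0; doc < starsOfMovie.length; doc++) {
                for (String s : starsOfMovie[doc]) {
                    if (s.equals(star)) {
                        bits.set(doc);
                        break;
                    }
                }
            }
            cache.put(star, bits);
        }
        return bits;
    }

    /** Applies the cached filter to the dynamic hits; neither input bitset is modified. */
    BitSet join(BitSet dynamicHits, String star) {
        BitSet result = (BitSet) dynamicHits.clone();
        result.and(moviesForStar(star));
        return result;
    }

    /** The count(*) analogue: how many dynamic hits survive the join. */
    int count(BitSet dynamicHits, String star) {
        return join(dynamicHits, star).cardinality();
    }

    public static void main(String[] args) {
        CachedJoinSketch sketch = new CachedJoinSketch(new String[][] {
            {"A", "B"}, {"B"}, {"C"}, {"A"}
        });
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);
        hits.set(2);
        System.out.println(sketch.count(hits, "B"));
    }
}
```

The cache only pays off while the static side really is static, as the message notes; if that data changes, the cached bitsets would have to be discarded and rebuilt.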