This is generally referred to as "faceted" searching. You might find descriptions of how to generate the "counts" per facet by searching for that keyword in the archive; it also comes up now and then under the subject of "category counts".
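For what it's worth, the counting itself is the simple part. Here is a minimal sketch in plain Java, assuming you have already produced one facet label per matching document (how you pull those out of your index is up to you). Nothing in it is Lucene API, and the names FacetCounts and count are made up purely for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class FacetCounts {
    // Count how many matching documents fall under each facet label.
    // "labels" is a hypothetical input: one label per matching document.
    static Map<String, Integer> count(Iterable<String> labels) {
        // TreeMap keeps the facets sorted by label for display.
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up labels standing in for the hits of one search.
        Map<String, Integer> counts =
            count(Arrays.asList("fiction", "history", "fiction", "poetry"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue() + " Docs)");
        }
    }
}
```

Note that this walks every hit, so for very large result sets it is only a starting point.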
There is, however, a separate issue that it sounds like you need help with first: picking the facets, i.e. deciding that for one user's search it makes sense to use decade ranges, but for another person's search it makes sense to use individual years. This is really a problem of personal preference, and not something that can be solved purely with technology. Each field might need separate rules to determine what facets to provide (e.g. dates might make sense to narrow by decade until only one decade is matched by the search, then by year until only one year is valid, then by month, etc., while "author name" faceting might make sense to start with initials and then switch to two-letter name prefixes, etc.).

One thing to keep in mind when picking facets for dates or numeric values is that even though you might think you are helping your users by giving them facets with an "even distribution" of documents, you may actually confuse them if they are trying to get a sense of how the data is distributed ... like showing someone a line plot where the axis scale changes halfway along the line. If I give you these facets/counts for your search results...

   1901-1910  23
   1911-1920  25
   1921-1930  21
   1931-1940  26
   1941-1950  19
   1951-1960  22
   1961-1970  23
   1971-1990  20
   1991-2000  22
   2000-2006   7

...you might not notice that one of those ranges is 20 years.

: Date: Mon, 24 Jul 2006 09:17:55 +0200
: From: Martin Braun <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, [EMAIL PROTECTED]
: To: java-user@lucene.apache.org
: Subject: drill-down heuristics WAS: Where to find drill-down examples
:     (source code)
:
: hi miles,
:
: thanks for the response.
: I think I didn't explain my Problem good enough.
:
: The harder problem for me is how to get the proposals for the
: refinement? I have a date-range of 16xx to now, for about 4 bn. docs.
: So the number of found documents could be quite large.
: But the distribution of the dates could be very different from one
: query to another.
: I hope there is a better way than to collect all dates with HitIterator,
: and do statistics on the data?
:
: Is there something that could be done while indexing?
: What would be a high-performance heuristic?
:
: The same problem with other categories like the author: how to find good
: proposals for a given result set?
:
: >> I want to have something like:
: >>
: >> Refine by Date:
: >> * 1990-2000 (30 Docs)
: >> * 2001-2003 (200 Docs)
: >> * 2004-2006 (10 Docs)
: >>
: >> But not only DateRanges but also for other Categories.
: >>
: >> Does anybody know where to find some Source Code, to get an Idea how to
: >> implement this?
: >> I think that's a useful property for a search engine, so are there any
: >> contributions for Lucene for that?
: >>
: >
: > If you want to do a refined search I'd put the original query in a
: > QueryFilter, which filters on the new search.
: >
: > http://lucene.apache.org/java/docs/api/org/apache/lucene/search/QueryFilter.html
: > Alternatively, since you appear to only want to refine on dates and
: > categories, you might want to put them in filters so they don't affect
: > the score, and leave the query as is. In which case you can use a
: > RangeQuery for the dates, and wrap a TermQuery in a QueryFilter to
: > handle the categories.
: >
: > If you need multiple filters you can use the ChainedFilter class.
: >
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:

-Hoss
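
P.S. A rough sketch of the "narrow by decade until only one decade is matched, then by year" rule from earlier in this message, in plain Java. It assumes you already have the year of every matching document in an int array, which still means walking every hit. It also assumes decades run 1990-1999 rather than 1991-2000. The class and method names are hypothetical and nothing here is Lucene API.

```java
import java.util.Map;
import java.util.TreeMap;

class DateFacets {
    // Count documents in fixed-width buckets keyed by the bucket's first year.
    // width 10 gives decades (1990 means 1990-1999), width 1 gives single years.
    static Map<Integer, Integer> bucket(int[] years, int width) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int year : years) {
            int start = Math.floorDiv(year, width) * width;
            counts.merge(start, 1, Integer::sum);
        }
        return counts;
    }

    // Facet by decade while the hits span more than one decade,
    // otherwise drop down to individual years.
    static Map<Integer, Integer> facets(int[] years) {
        Map<Integer, Integer> decades = bucket(years, 10);
        return decades.size() > 1 ? decades : bucket(years, 1);
    }

    public static void main(String[] args) {
        // Made-up years standing in for the hits of two searches.
        System.out.println(facets(new int[] {1995, 2001, 2003})); // by decade
        System.out.println(facets(new int[] {2001, 2003, 2003})); // by year
    }
}
```

Because every bucket at a given level has the same width, the counts stay comparable, which avoids the hidden 20-year range in the list above. The same shape would work for months, or for author initials versus two-letter prefixes.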