I like this approach. This may be what I'm looking for. Thanks JP! -Jay
On 6/15/05, Robichaud, Jean-Philippe <[EMAIL PROTECTED]> wrote: > > It may be simpler and more effective to use the Hits object and keep the > number of time each host was actually "returned" to the user and skip it if > the limit has been reach. This way, if your users just look at the 10-20 > highest hits, you will save you a lot of processing time, especially if your > index is huge... > > Here is some pseudo code stripped from a class I once wrote > > > Hits hits = iSearcher.search(myQuery); > IntHash hostFreqCount = new IntHash(); > > int i=0; > int j=0; > > while(i < hist.length) { > j=0; > for(; (i<hits.length && j < 10); i++,j++) { > > Document doc = iSearcher.doc(hits.doc(i)); > String host_id = doc.get("host_id"); > hostFreqCount.inc(host_id); > > if(hostFreqCount.get(host_id) > 3) continue; > > /// show the hit to the use... > > } > } > > > Hope it helped ! > > Jp > > > -----Original Message----- > From: Jay Hill [mailto:[EMAIL PROTECTED] > Sent: Wednesday, June 15, 2005 2:01 PM > To: java-user@lucene.apache.org > Subject: Re: Need a way to set a result limit on a particular field > > Thanks Tony and Erik for the replies. The trick is we don't know the > hosts that will be returned in advance, we just don't want more than 3 > from any one host. It's not unlike searching on Google where you might > see a link that says "More results from foo.com". We essentially want > to discard any results > 3 for any one host. In some of our searches > we might get high scores on 20 or 30 documents, but we don't want to > show page after page from the same host, we'd rather limit it to 3 > from each for more diversity. > > I may have to use a brute force approach using HitCollector as Tony > suggests. I was hoping to avoid the HitCollector, but there may be no > other way right now. > > Many thanks, > -Jay > > > On 6/14/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > > On Jun 14, 2005, at 7:23 PM, Jay Hill wrote: > > > I have a need to limit my Hits returned based on one of the indexed > > > fields. This is a web application and we want to limit the number of > > > hits from any one host. We have a field named "host_id" and I'd like > > > to be able to limit my results to no more than three results for any > > > one host_id. > > > > I may not be fully understanding your question, but I'll go with my > > assumptions... wrap the users query into a BooleanQuery as a required > > clause and then add another clause with a TermQuery for the specific > > host_id. Then simply constrain the number of Hits shown to the first > > 3. Does that do what you're after? > > > > Erik > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]