Thanks Richard, I'll check it out. -Jay
On 6/16/05, Richard Krenek <[EMAIL PROTECTED]> wrote:
> To add to this option, you may want to use this patch:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=27743
> This way, instead of pulling the entire document back each time, you
> pull back just your host field. Then do your check and pull back the
> rest of the document only if you need it. This will help with speed if
> you are going through a lot of documents and each document is big.
>
> On 6/15/05, Jay Hill <[EMAIL PROTECTED]> wrote:
> > I like this approach. This may be what I'm looking for.
> >
> > Thanks JP!
> > -Jay
> >
> > On 6/15/05, Robichaud, Jean-Philippe
> > <[EMAIL PROTECTED]> wrote:
> > >
> > > It may be simpler and more effective to use the Hits object, keep
> > > a count of how many times each host was actually "returned" to the
> > > user, and skip a hit if its host has reached the limit. This way,
> > > if your users just look at the 10-20 highest hits, you will save a
> > > lot of processing time, especially if your index is huge...
> > >
> > > Here is some pseudocode stripped from a class I once wrote:
> > >
> > > Hits hits = iSearcher.search(myQuery);
> > > // IntHash is a counting-map helper, not a Lucene class
> > > IntHash hostFreqCount = new IntHash();
> > >
> > > int i = 0;
> > >
> > > while (i < hits.length()) {
> > >     // fill one page of up to 10 displayed hits
> > >     for (int j = 0; i < hits.length() && j < 10; i++) {
> > >
> > >         Document doc = hits.doc(i);
> > >         String host_id = doc.get("host_id");
> > >         hostFreqCount.inc(host_id);
> > >
> > >         // skip hosts that already have 3 hits shown
> > >         if (hostFreqCount.get(host_id) > 3) continue;
> > >
> > >         // show the hit to the user...
> > >         j++;  // only displayed hits count toward the page
> > >     }
> > > }
> > >
> > > Hope it helped!
> > >
> > > Jp
> > >
> > > -----Original Message-----
> > > From: Jay Hill [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, June 15, 2005 2:01 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Need a way to set a result limit on a particular field
> > >
> > > Thanks Tony and Erik for the replies.
> > > The trick is we don't know the hosts that will be returned in
> > > advance; we just don't want more than 3 from any one host. It's
> > > not unlike searching on Google, where you might see a link that
> > > says "More results from foo.com". We essentially want to discard
> > > any results beyond the third for any one host. In some of our
> > > searches we might get high scores on 20 or 30 documents, but we
> > > don't want to show page after page from the same host; we'd rather
> > > limit it to 3 from each for more diversity.
> > >
> > > I may have to use a brute-force approach using HitCollector, as
> > > Tony suggests. I was hoping to avoid the HitCollector, but there
> > > may be no other way right now.
> > >
> > > Many thanks,
> > > -Jay
> > >
> > > On 6/14/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On Jun 14, 2005, at 7:23 PM, Jay Hill wrote:
> > > > > I have a need to limit my Hits returned based on one of the
> > > > > indexed fields. This is a web application and we want to limit
> > > > > the number of hits from any one host. We have a field named
> > > > > "host_id" and I'd like to be able to limit my results to no
> > > > > more than three results for any one host_id.
> > > >
> > > > I may not be fully understanding your question, but I'll go with
> > > > my assumptions... wrap the user's query into a BooleanQuery as a
> > > > required clause and then add another clause with a TermQuery for
> > > > the specific host_id. Then simply constrain the number of Hits
> > > > shown to the first 3. Does that do what you're after?
> > > >
> > > >     Erik
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
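
For reference, the per-host cap discussed in this thread can be sketched in plain Java, independent of Lucene. This is only a minimal sketch, not code from anyone in the thread: the class name HostLimiter, the method limitPerHost, and its parameters are hypothetical, and it assumes the host_id of each hit has already been read into a list in rank order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Lucene or of the code quoted above.
final class HostLimiter {

    /**
     * Walks the hits in rank order and returns the indices of the hits to
     * display: at most maxPerHost per host, and at most pageSize in total.
     * Skipped hits do not count toward the page size.
     */
    static List<Integer> limitPerHost(List<String> hostIds, int maxPerHost, int pageSize) {
        Map<String, Integer> shownPerHost = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < hostIds.size() && kept.size() < pageSize; i++) {
            String host = hostIds.get(i);
            int shown = shownPerHost.getOrDefault(host, 0);
            if (shown >= maxPerHost) {
                continue; // this host has already reached its limit
            }
            shownPerHost.put(host, shown + 1);
            kept.add(i);
        }
        return kept;
    }

    public static void main(String[] args) {
        // host_id of each hit, best-scoring first (made-up sample data)
        List<String> hosts = Arrays.asList("a", "a", "a", "a", "b", "a", "b", "c");
        System.out.println(limitPerHost(hosts, 3, 10)); // prints [0, 1, 2, 4, 6, 7]
    }
}
```

Because the loop stops as soon as the page is full, only as many hits are inspected as are needed to fill it, which is the saving JP describes for users who look only at the top 10-20 results.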