To add to this option, you may want to use this patch:
http://issues.apache.org/bugzilla/show_bug.cgi?id=27743
This way, instead of pulling the entire document back each time, you just
pull back your host field. Then do your check and only pull back the
rest of the document if you need to. This will help with speed if you
are going through a lot of documents and each document is big.
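
A minimal sketch of that idea in plain Java, so it runs without Lucene. HostLimiter and the hostOf lookup are hypothetical names for illustration only; hostOf stands in for whatever host-field-only read you have available, and the patch's own API is not shown here. It walks the hits in rank order, caps each host at maxPerHost, and returns the ranks to show, so that the full document only needs to be loaded for those.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical helper for illustration; not part of Lucene. */
final class HostLimiter {

    /**
     * Returns the ranks of the hits to show on one page.
     *
     * hitCount   - total number of hits, in rank order
     * hostOf     - cheap lookup of the host field for a given rank
     *              (assumed stand-in for reading only that field)
     * maxPerHost - maximum hits to show from any one host
     * pageSize   - maximum hits to show in total
     */
    static List<Integer> select(int hitCount, IntFunction<String> hostOf,
                                int maxPerHost, int pageSize) {
        Map<String, Integer> shownPerHost = new HashMap<>();
        List<Integer> shown = new ArrayList<>();
        for (int i = 0; i < hitCount && shown.size() < pageSize; i++) {
            String host = hostOf.apply(i);  // cheap: host field only
            int seen = shownPerHost.getOrDefault(host, 0);
            if (seen >= maxPerHost) {
                continue;  // host is at its cap: skip without loading the document
            }
            shownPerHost.put(host, seen + 1);
            shown.add(i);  // caller loads the full document for these ranks only
        }
        return shown;
    }
}
```

With hosts a, a, a, a, b, a, c in rank order and a cap of 3, the fourth and sixth hits are skipped and the rest are shown. The loop stops as soon as the page is full, so lower-ranked hits are never touched.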

On 6/15/05, Jay Hill <[EMAIL PROTECTED]> wrote:
> I like this approach. This may be what I'm looking for.
> 
> Thanks JP!
> -Jay
> 
> On 6/15/05, Robichaud, Jean-Philippe
> <[EMAIL PROTECTED]> wrote:
> >
> > It may be simpler and more effective to use the Hits object, keep count of the
> > number of times each host was actually "returned" to the user, and skip a hit if
> > its host's limit has been reached.  This way, if your users just look at the 10-20
> > highest hits, you will save a lot of processing time, especially if your
> > index is huge...
> >
> > Here is some pseudo code stripped from a class I once wrote:
> >
> >
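> > Hits hits = iSearcher.search(myQuery);
> > // IntHash is a helper counting map from the original class: host_id -> hits seen
> > IntHash hostFreqCount = new IntHash();
> >
> > int i = 0;
> > int j = 0;
> >
> > while (i < hits.length()) {
> >  j = 0;  // number of hits shown on the current page
> >  for (; (i < hits.length() && j < 10); i++) {
> >
> >   Document doc = hits.doc(i);  // Hits.doc(int) already returns the Document
> >   String host_id = doc.get("host_id");
> >   hostFreqCount.inc(host_id);
> >
> >   // skip this hit once its host has already been shown 3 times
> >   if (hostFreqCount.get(host_id) > 3) continue;
> >
> >   ///  show the hit to the user...
> >   j++;  // only hits actually shown count toward the page of 10
> >  }
> > }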
> >
> >
> > Hope it helps!
> >
> > Jp
> >
> >
> > -----Original Message-----
> > From: Jay Hill [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, June 15, 2005 2:01 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Need a way to set a result limit on a particular field
> >
> > Thanks Tony and Erik for the replies. The trick is we don't know the
> > hosts that will be returned in advance, we just don't want more than 3
> > from any one host. It's not unlike searching on Google where you might
> > see a link that says "More results from foo.com". We essentially want
> > to discard any results > 3 for any one host. In some of our searches
> > we might get high scores on 20 or 30 documents, but we don't want to
> > show page after page from the same host, we'd rather limit it to 3
> > from each for more diversity.
> >
> > I may have to use a brute force approach using HitCollector as Tony
> > suggests. I was hoping to avoid the HitCollector, but there may be no
> > other way right now.
> >
> > Many thanks,
> > -Jay
> >
> >
> > On 6/14/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > >
> > > On Jun 14, 2005, at 7:23 PM, Jay Hill wrote:
> > > > I have a need to limit my Hits returned based on one of the indexed
> > > > fields. This is a web application and we want to limit the number of
> > > > hits from any one host. We have a field named "host_id" and I'd like
> > > > to be able to limit my results to no more than three results for any
> > > > one host_id.
> > >
> > > I may not be fully understanding your question, but I'll go with my
> > > assumptions... wrap the user's query into a BooleanQuery as a required
> > > clause and then add another required clause with a TermQuery for the
> > > specific host_id.  Then simply constrain the number of Hits shown to the
> > > first 3.  Does that do what you're after?
> > >
> > >      Erik
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> >
> >
> 
> 
>
