Re: Fine Tuning Lucene implementation

Askar Zaidi Wed, 25 Jul 2007 07:11:18 -0700

Hey Guys,

I need to know how I can use the HitCollector class ? I am using Hits and
looping over all the possible document hits (turns out its 92 times I am
looping; for 300 searches, its 300*92 !!). Can I avoid this using
HitCollector ? I can't seem to understand how its used.


thanks a lot,

Askar

On 7/25/07, Dmitry <[EMAIL PROTECTED]> wrote:
>
> Askar,
> why do you need to add +id:<idWeCareAbout>?
> thanks,
> dt,
> www.ejinz.com
> search engine news forms
> ----- Original Message -----
> From: "Askar Zaidi" <[EMAIL PROTECTED]>
> To: <[email protected]>; <[EMAIL PROTECTED]>
> Sent: Wednesday, July 25, 2007 12:39 AM
> Subject: Re: Fine Tuning Lucene implementation
>
>
> > Hey Hira ,
> >
> > Thanks so much for the reply. Much appreciate it.
> >
> > Quote:
> >
> > Would it be possible to just include a query clause?
> >   - i.e., instead of just contents:<userQuery>, also add
> > +id:<idWeCareAbout>
> >
> > How can I do that ?
> >
> > I see my query as :
> >
> > +contents:harvard +contents:business +contents:review
> >
> > where the search phrase was: harvard business review
> >
> > Now how can I add +id:<idWeCareAbout>  ??
> >
> > This would give me that one exact document I am looking for , for that
> id.
> > I
> > don't have to iterate through hits.
> >
> > thanks,
> >
> > Askar
> >
> >
> >
> > On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
> >>
> >> I'm no expert on this (so please accept the comments in that context)
> >> but 2 things seem weird to me:
> >>
> >> 1.  Iterating over each hit is an expensive proposition.  I've often
> >> seen people recommending a HitCollector.
> >>
> >> 2.  It seems that doBodySearch() is essentially saying, do this search
> >> and return the score pertinent to this ID (using an exhaustive loop).
> >> Would it be possible to just include a query clause?
> >>     - i.e., instead of just contents:<userQuery>, also add
> >> +id:<idWeCareAbout>
> >>
> >> In general though, I think your algorithm seems inefficient (if I
> >> understand it correctly):-- if I want to search for one term among 3 in
> >> a "collection" of 300 documents (as defined by some external
> attribute),
> >> I will wind up executing 300 x 3 searches, and for each search that is
> >> executed, I will iterate over every Hit, even if I've already found the
> >> one that I "care about".
> >>
> >> What would break if you:
> >> 1.  Included "creator" in the Lucene index (or, filtered out the Hits
> >> using a BitSet or something like it)
> >> 2.  Executed 1 search
> >> 3.  Collected the results of the first N Hits (where N is some
> >> reasonable limit, like 100 or 500)
> >>
> >> -h
> >>
> >>
> >> On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:
> >>
> >> > Sure.
> >> >
> >> >  public float doBodySearch(Searcher searcher,String query, int id){
> >> >
> >> >                  try{
> >> >                                 score = search(searcher, query,id);
> >> >                      }
> >> >                       catch(IOException io){}
> >> >                       catch(ParseException pe){}
> >> >
> >> >                       return score;
> >> >
> >> >                 }
> >> >
> >> >  private float search(Searcher searcher, String queryString, int id)
> >> > throws ParseException, IOException {
> >> >
> >> >         // Build a Query object
> >> >
> >> >         QueryParser queryParser = new QueryParser("contents", new
> >> > KeywordAnalyzer());
> >> >
> >> >         queryParser.setDefaultOperator(QueryParser.Operator.AND);
> >> >
> >> >         Query query = queryParser.parse(queryString);
> >> >
> >> >         // Search for the query
> >> >
> >> >         Hits hits = searcher.search(query);
> >> >         Document doc = null;
> >> >
> >> >         // Examine the Hits object to see if there were any matches
> >> >         int hitCount = hits.length();
> >> >
> >> >                 for(int i=0;i<hitCount;i++){
> >> >                 doc = hits.doc(i);
> >> >                 String str = doc.get("item");
> >> >                 int tmp = Integer.parseInt(str);
> >> >                 if(tmp==id)
> >> >                 score = hits.score(i);
> >> >                 }
> >> >
> >> >         return score;
> >> >     }
> >> >
> >> > I really need to optimize doBodySearch(...) as this takes the most
> >> > time.
> >> >
> >> > thanks guys,
> >> > Askar
> >> >
> >> >
> >> > On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
> >> >
> >> >         Could you show us the relevant source from doBodySearch()?
> >> >
> >> >         -h
> >> >
> >> >         On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
> >> >         > I ran some tests and it seems that the slowness is from
> >> >         Lucene calls when I
> >> >         > do "doBodySearch", if I remove that call, Lucene gives me
> >> >         results in 5
> >> >         > seconds. otherwise it takes about 50 seconds.
> >> >         >
> >> >         > But I need to do Body search and that field contains lots
> of
> >> >         text. The field
> >> >         > is <contents>. How can I optimize that ?
> >> >         >
> >> >         > thanks,
> >> >         > Askar
> >> >         >
> >> >         >
> >>
> >>
> >>
> >>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Re: Fine Tuning Lucene implementation

Reply via email to