Re: confused about an entry in the FAQ

Otis Gospodnetic Thu, 15 May 2008 21:01:10 -0700

pong.
Is that the best way to use FieldSelector?  What happens if you remove it
from that HitCollector.collect method?
It looks like you are creating a new FieldSelector object for each hit found
in each search thread.
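
A rough sketch of the reuse idea. It needs no Lucene on the classpath because it uses hypothetical stand-in types (`FieldFilter`, `Decision`, `IdOnlyFilter`, `StoredDoc`). These only loosely mirror Lucene 2.x's `FieldSelector` and `FieldSelectorResult`. In real code the equivalent step would be hoisting the anonymous `FieldSelector` into a single `static final` field. The filter holds no mutable state, so one instance can serve every hit in every thread. For the ID it also returns a load-and-stop decision modelled on `FieldSelectorResult.LOAD_AND_BREAK`, which, if your Lucene version provides it, stops reading the remaining stored fields once the ID is loaded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins that loosely mirror Lucene 2.x's FieldSelectorResult
// and FieldSelector; they are NOT the Lucene classes.
enum Decision { LOAD, NO_LOAD, LOAD_AND_BREAK }

interface FieldFilter {
    Decision accept(String fieldName);
}

// Stateless, so one shared instance can serve every hit in every search thread,
// instead of allocating a new anonymous selector per hit.
final class IdOnlyFilter implements FieldFilter {
    static final IdOnlyFilter INSTANCE = new IdOnlyFilter("id");

    private final String idField;

    private IdOnlyFilter(String idField) {
        this.idField = idField;
    }

    @Override
    public Decision accept(String fieldName) {
        // Load the ID, then stop; every other stored field is skipped.
        return idField.equals(fieldName) ? Decision.LOAD_AND_BREAK : Decision.NO_LOAD;
    }
}

// Hypothetical document that walks its stored fields in order, asking the filter.
final class StoredDoc {
    private final Map<String, String> fields = new LinkedHashMap<>();
    final List<String> visited = new ArrayList<>();   // fields the filter was asked about

    StoredDoc put(String name, String value) {
        fields.put(name, value);
        return this;
    }

    Map<String, String> load(FieldFilter filter) {
        Map<String, String> loaded = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            visited.add(field.getKey());
            Decision decision = filter.accept(field.getKey());
            if (decision != Decision.NO_LOAD) {
                loaded.put(field.getKey(), field.getValue());
            }
            if (decision == Decision.LOAD_AND_BREAK) {
                break;   // remaining stored fields are never read
            }
        }
        return loaded;
    }
}
```

If the fields happen to be stored in the order `class`, `id`, the loader still has to look at `class` first, so the early break only saves work on fields stored after the ID.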


If it's not that, is the index optimized?
If not, does optimizing it make a difference?

You are also examining each and every Document in the result set.  Do you
really need to do that?  That's expensive, and you may be witnessing the cost.
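
One common way to avoid touching stored documents on every hit is to map Lucene document numbers to IDs once per IndexReader and reuse that mapping until the reader is reopened. Lucene's FieldCache works on this principle, though it builds its arrays from indexed terms rather than from stored fields. Below is a minimal stand-alone sketch. It assumes the reader is not reopened (as in Stephane's load test) and that every document number below `maxDoc` has a parseable stored ID, so deleted documents are ignored. `DocIdToKeyCache`, `CachedIdCollector` and the `storedIdLoader` callback are hypothetical names, not Lucene API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntFunction;

// Hypothetical sketch: read each document's stored ID once per (unchanging)
// reader, then answer every later hit from an in-memory array.
final class DocIdToKeyCache {
    private final long[] keys;   // final field: safely visible to all threads once built

    DocIdToKeyCache(int maxDoc, IntFunction<String> storedIdLoader) {
        keys = new long[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            // One stored-field read per document, paid once rather than once per query.
            keys[doc] = Long.parseLong(storedIdLoader.apply(doc));
        }
    }

    long keyFor(int doc) {
        return keys[doc];
    }
}

// What the per-hit work in a collector shrinks to once the cache exists.
final class CachedIdCollector {
    private final DocIdToKeyCache cache;
    final Set<Long> result = new TreeSet<>();

    CachedIdCollector(DocIdToKeyCache cache) {
        this.cache = cache;
    }

    void collect(int doc, float score) {
        result.add(cache.keyFor(doc));   // array lookup, no document loading
    }
}
```

The trade-off is memory (one `long` per document) and the need to rebuild the array whenever the reader is reopened.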

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Stephane Nicoll <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, May 14, 2008 2:38:25 AM
> Subject: Re: confused about an entry in the FAQ
> 
> ping. Sorry for the long email but I prefer to provide all information first.
> 
> On Mon, May 12, 2008 at 12:13 PM, Stephane Nicoll wrote:
> > I tried all this and I am confused about the result. I am trying to
> >  implement a hybrid query handler where I fetch IDs from database
> >  criteria and IDs from a full-text Lucene query, then intersect them
> >  to return the result to the user. The database query and the
> >  intersection work fine even under high load. However, the Lucene
> >  query is much slower when the number of concurrent users rises.
> >
> >  Here is what I am doing on the lucene side
> >
> >         final QueryParser queryParser =
> >                 new QueryParser(criteria.getDefaultField(), analyzer);
> >         final Query q = queryParser.parse(criteria.getFullTextQuery());
> >         // The IndexSearcher is shared by all threads and is not
> >         // reopened during the load test
> >         final IndexSearcher indexSearcher = getIndexSearcher();
> >         final Set<Long> result = new TreeSet<Long>();
> >         indexSearcher.search(q, new HitCollector() {
> >             public void collect(int i, float v) {
> >                 try {
> >                     final Document d = indexSearcher.getIndexReader().document(i,
> >                             new FieldSelector() {
> >                         public FieldSelectorResult accept(String s) {
> >                             if (s.equals(CatalogItem.ATTR_ID)) {
> >                                 return FieldSelectorResult.LOAD;
> >                             } else {
> >                                 return FieldSelectorResult.NO_LOAD;
> >                             }
> >                         }
> >                     });
> >                     result.add(Long.parseLong(d.get(CatalogItem.ATTR_ID)));
> >                 } catch (IOException e) {
> >                     throw new RuntimeException("Could not collect lucene IDs", e);
> >                 }
> >             }
> >         });
> >         return result;
> >
> >
> >  When running with one thread, I have the following figures per test:
> >
> >  Database query is done in [125 msecs] (size=598)
> >  Lucene query is done in [80 msecs] (size=15204)
> >  Intersect is done in [4 msecs] (size=103)
> >  Hybrid query is done in [97 msecs]
> >
> >  -> 327 msec / user
> >
> >  When running with ten threads, I have the following figures per user per test:
> >
> >  Database query is done in [222 msecs] (size=94)
> >  Lucene query is done in [2364 msecs] (size=15367)
> >  Intersect is done in [0 msecs] (size=12)
> >  Hybrid query is done in [18 msecs]
> >
> >  -> 2.5 sec / user !!
> >
> >  I am just wondering how I can improve this. Clearly there is something
> >  wrong in my code, since it's much slower with multiple threads running
> >  concurrently on the same index. The index is 5 MB, and I only
> >  store:
> >
> >  * an "id" field (the primary key of the related object in the db)
> >  * a "class" field holding the class name of the related object
> >  (Hibernate Search does that for me)
> >
> >  The "keywords" field is indexed but not stored as it is a
> >  representation of other data stored in the db. The searches are
> >  performed on the keywords field only ("foo AND bar" is a typical
> >  query)
> >
> >  Any help is appreciated. If you also know of a Spring bean that could
> >  take care of opening/closing the index readers properly, let me know.
> >  Hibernate Search introduces deadlocks with multiple threads, and the
> >  Lucene integration in Spring Modules does not seem to do what I want.
> >
> >  Thanks,
> >  Stéphane
> >
> >
> >
> >
> >  On Sat, May 10, 2008 at 8:05 PM, Patrick Turcotte wrote:
> >  > Did you try the IndexSearcher.doc(int i, FieldSelector fieldSelector) method?
> >  >
> >  >  It could be faster because Lucene doesn't have to "prepare" the whole document.
> >  >
> >  >  Patrick
> >  >
> >  >
> >  >  On Sat, May 10, 2008 at 9:35 AM, Stephane Nicoll wrote:
> >  >
> >  >
> >  >  > From the FAQ:
> >  >  >
> >  >  > "Don't iterate over more hits than needed.
> >  >  > Iterating over all hits is slow for two reasons. Firstly, the search()
> >  >  > method that returns a Hits object re-executes the search internally
> >  >  > when you need more than 100 hits. Solution: use the search method that
> >  >  > takes a HitCollector instead."
> >  >  >
> >  >  > I had a look at HitCollector, but it returns the document ID, and the
> >  >  > javadoc recommends not fetching the original document there.
> >  >  >
> >  >  > I have to return *one* indexed field from the query result, and
> >  >  > currently I am iterating over all results, which is slow. Can you explain
> >  >  > a bit more how I could improve this?
> >  >  >
> >  >  > Thanks,
> >  >  > Stéphane
> >  >  >
> >  >  >
> >  >  > --
> >  >  > "Large Systems Suck: This rule is 100% transitive. If you build one,
> >  >  > you suck" -- S.Yegge
> >  >  >
> >  >
> >  > > ---------------------------------------------------------------------
> >  >  > To unsubscribe, e-mail: [EMAIL PROTECTED]
> >  >
> >  > > For additional commands, e-mail: [EMAIL PROTECTED]
> >  >  >
> >  >  >
> >
> 
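
Stephane's hybrid handler intersects the ID set from the database criteria with the ID set from the Lucene query. His timings show this step is already cheap (4 msecs), so the following is only a minimal sketch of one way to write it. It assumes both sides are already materialised as `Set<Long>`, and `HybridIds` and `intersect` are hypothetical names. Probing from the smaller set keeps the loop proportional to the smaller result (598 database IDs against 15,204 Lucene IDs in the single-thread run).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: keep only the IDs returned by both the database
// criteria and the full-text query, without modifying either input.
final class HybridIds {
    private HybridIds() {}

    static Set<Long> intersect(Set<Long> a, Set<Long> b) {
        // Iterate the smaller set and probe the larger one.
        Set<Long> small = a.size() <= b.size() ? a : b;
        Set<Long> large = (small == a) ? b : a;
        Set<Long> out = new TreeSet<>();
        for (Long id : small) {
            if (large.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }
}
```

`Set.retainAll` on a copy of one side gives the same result; the explicit loop just makes the choice of which side to iterate visible.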

