Our biggest bottleneck in searching is a custom scorer that calls AllTermDocs.next() very frequently. That class uses Lucene's own BitVector, which I think is already highly optimized. Farther down the list are DocSetHitCollector.collect() and FieldSortedQueue.insert(). For indexing, the main bottleneck is in the Analyzer/Filter, which is basically a WhitespaceAnalyzer with custom code to add payloads to tokens and change the position increments between tokens.
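For anyone curious what that hot loop looks like: AllTermDocs.next() essentially walks every doc ID in the segment and skips any doc whose bit is set in the deleted-docs bit vector. Here is a minimal stdlib-only sketch of that pattern (this is a simplification using java.util.BitSet, not Lucene's actual BitVector or class names):

```java
import java.util.BitSet;

// Simplified sketch of the AllTermDocs.next() pattern: advance a doc
// cursor over [0, maxDoc), skipping docs flagged in a deleted-docs
// bit vector. Lucene's real implementation uses its own BitVector.
class AllDocsIterator {
    private final int maxDoc;
    private final BitSet deleted; // null means no deletions
    private int doc = -1;

    AllDocsIterator(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    /** Advance to the next live doc; returns false when exhausted. */
    boolean next() {
        while (++doc < maxDoc) {
            if (deleted == null || !deleted.get(doc)) {
                return true;
            }
        }
        return false;
    }

    int doc() {
        return doc;
    }
}
```

Since the loop is a per-doc bit test with no cross-iteration dependency beyond the cursor, the work it feeds (scoring each surviving doc) is the part that might parallelize, not the iteration itself.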
Peter

On Tue, Jun 9, 2009 at 7:17 PM, Vico Marziale <vicod...@gmail.com> wrote:
> Hello all. I am new to Lucene as well as this list. I am a PhD student at
> the University of New Orleans. My current research is in leveraging
> highly-multicore processors to speed up computer forensics tools. For the
> moment I am trying to figure out what the most common performance bottleneck
> inside of Lucene itself is. I will then take a crack at porting some (small)
> portion of Lucene to CUDA (http://www.nvidia.com/object/cuda_what_is.html)
> and see what kind of speedups are achievable.
>
> The portion of code to be ported must be trivially parallelizable. After
> spending some time digging around the docs and source, StandardAnalyzer
> appears to be a likely candidate. I've run the demo code through a profiler,
> but it was less than helpful, especially in light of the fact that
> bottlenecks are going to be dependent on the way the Lucene API is used. In
> general, what is the most computationally expensive part of the process?
> Does the analyzer seem like a reasonable choice?
>
> Thanks,
> --
> Lodovico Marziale
> PhD Candidate
> Department of Computer Science
> University of New Orleans