Hello all. I am new to Lucene as well as to this list. I am a PhD student at the University of New Orleans. My current research is in leveraging highly multicore processors to speed up computer forensics tools. For the moment I am trying to figure out what the most common performance bottleneck inside Lucene itself is. I will then take a crack at porting some (small) portion of Lucene to CUDA (http://www.nvidia.com/object/cuda_what_is.html) and see what kind of speedups are achievable; the sketch below shows the kind of code path I have in mind.
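For concreteness, here is a minimal sketch of the call path I would profile and port (the class name, field name, and sample text are just placeholders of mine, and I'm assuming a recent Lucene release where StandardAnalyzer has a no-argument constructor; the TokenStream API has shifted between versions):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerProbe {
    public static void main(String[] args) throws IOException {
        // Older Lucene releases require a Version argument here.
        StandardAnalyzer analyzer = new StandardAnalyzer();
        String text = "Disk images in computer forensics can run to hundreds of gigabytes.";
        // tokenStream() builds the tokenizer/filter chain; the loop below is
        // the per-token work that indexing repeats for every document.
        try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
        }
        analyzer.close();
    }
}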
The portion of code to be ported must be trivially parallelizable. After some time digging through the docs and source, StandardAnalyzer looks like a likely candidate. I've run the demo code through a profiler, but the results were less than helpful, especially since the bottlenecks depend on how the Lucene API is used. In general, what is the most computationally expensive part of the indexing process? Does the analyzer seem like a reasonable choice?

Thanks,

--
Lodovico Marziale
PhD Candidate
Department of Computer Science
University of New Orleans