I'd ask for more details. You say that you've narrowed it down to Lucene
doing the searching... but which part of the search? Here are two places
people have run into problems before (sorry if you already know this...):

1> Iterating through the entire returned set with Hits.doc(#).
2> Opening and closing your IndexReader between queries (see the sketch
after this list).
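
If it's either of those, the cure is cheap. Here's a minimal sketch of what
I mean (Lucene 2.0-era API; the index path and "title" field are my
inventions, not anything from your setup):

// Open the searcher ONCE and share it across queries, and only pull
// the stored docs you actually display.
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchService {
    // opening an IndexReader/IndexSearcher is expensive; do it once
    private final IndexSearcher searcher;

    public SearchService(String indexDir) throws java.io.IOException {
        searcher = new IndexSearcher(indexDir);
    }

    public void showTopTen(Query query) throws java.io.IOException {
        Hits hits = searcher.search(query);
        // hits.doc(i) reads stored fields from disk, so don't walk
        // the whole result set -- just the page you're showing
        int n = Math.min(hits.length(), 10);
        for (int i = 0; i < n; i++) {
            Document d = hits.doc(i);
            System.out.println(d.get("title"));
        }
    }
}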

The first thing I'd do is insert some timing logging into your search code.
For instance, log the time after you've assembled your query and before you
execute the search. Log the time it takes to do the raw search. Log the time
you spend spinning through the returned hits preparing to return the
results. I'm not talking anything fancy here, just
System.currentTimeMillis().
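
Something like this is all I mean (a rough sketch; buildQuery and
showResults are stand-ins for whatever your code actually does):

long t0 = System.currentTimeMillis();
Query query = buildQuery(userInput);   // hypothetical: query assembly
long t1 = System.currentTimeMillis();
Hits hits = searcher.search(query);    // the raw search
long t2 = System.currentTimeMillis();
showResults(hits);                     // hypothetical: spinning through hits
long t3 = System.currentTimeMillis();
System.out.println("build=" + (t1 - t0) + "ms, search=" + (t2 - t1)
        + "ms, results=" + (t3 - t2) + "ms");

Once you have those three numbers for a few representative queries, you'll
know which bucket to attack.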

I can't emphasize strongly enough that you simply *cannot* jump to a
solution before you *know* where you're spending your time. I've spent
waaaaay more time than I want to admit to fixing code that I was *sure* was
slow, only to find out that the *real* problem was somewhere else.

Finally, what times are you seeing? And what was the index size before and
after? Without some numbers, nobody else can guess at any solutions.

Best
Erick

On 9/27/06, Rob Young <[EMAIL PROTECTED]> wrote:

Hi,

I'm using Lucene to search a product database (CDs, DVDs, games and now
books). Recently that index has grown to over a million items (we added
books). I have been performance testing our search server and the
throughput of requests has dropped significantly; profiling the server, it
all seems to be in the Lucene searching.

So, now that I've narrowed it down to the searching itself rather than the
rest of the application, what can I do about it? I am running a TermQuery,
falling back to a FuzzyQuery when no results are found (each combined in a
BooleanQuery with the product type restrictions).
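
Roughly like this, if it helps (a simplified sketch; the "title" and "type"
field names are made up):

// uses org.apache.lucene.index.Term and org.apache.lucene.search.*
Hits searchWithFallback(IndexSearcher searcher, String text, String type)
        throws java.io.IOException {
    BooleanQuery q = new BooleanQuery();
    q.add(new TermQuery(new Term("title", text)), BooleanClause.Occur.MUST);
    q.add(new TermQuery(new Term("type", type)), BooleanClause.Occur.MUST);
    Hits hits = searcher.search(q);
    if (hits.length() == 0) {  // no exact matches: retry fuzzily
        BooleanQuery fq = new BooleanQuery();
        fq.add(new FuzzyQuery(new Term("title", text)), BooleanClause.Occur.MUST);
        fq.add(new TermQuery(new Term("type", type)), BooleanClause.Occur.MUST);
        hits = searcher.search(fq);
    }
    return hits;
}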

One solution I had in mind was to split the index down into four (one per
product type). Would this provide any gains? It will require a lot of
refactoring, so I don't want to commit myself if there's no chance it will
help.
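
To be concrete, I was imagining something along these lines (index paths
invented; MultiSearcher for when we need to search across all types):

// one index per product type (org.apache.lucene.search.MultiSearcher)
IndexSearcher cds   = new IndexSearcher("/indexes/cds");
IndexSearcher dvds  = new IndexSearcher("/indexes/dvds");
IndexSearcher games = new IndexSearcher("/indexes/games");
IndexSearcher books = new IndexSearcher("/indexes/books");

// a type-specific search only touches the small index...
Hits bookHits = books.search(query);

// ...and an "everything" search goes through a MultiSearcher
MultiSearcher all = new MultiSearcher(
        new Searchable[] { cds, dvds, games, books });
Hits allHits = all.search(query);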

Another solution along the same train of thought was to use a caching
filter to cut the index into parts. How would this compare to the previous
idea?
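
i.e. something like this (the "type" field is made up; my understanding is
that QueryFilter caches its bits per reader, so the filter instance would
be created once and reused):

// restrict results to one product type via a reusable cached filter
QueryFilter booksOnly =
        new QueryFilter(new TermQuery(new Term("type", "book")));
Hits hits = searcher.search(query, booksOnly);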

Does anyone have any other ideas / suggestions?

Thanks
Rob
