Index Partitioning (was Re: Search deadlocking under load)

Paul Smith Fri, 08 Jul 2005 23:44:17 -0700

Nathan, first, apologies for somewhat hijacking your thread, but I believe my question is closely related.

Nathan's Scenario 1 is the one we're effectively employing (or are in the process of setting up). Rather than 1 Index To Rule Them All, I have decided to partition the index structure. Users tend to focus on one Project at a time, and within each Project they have Documents and Mail (and some other types we'll eventually index; we call them 'entities' to be generic). So I am creating an index for each Project-Entity pair. We should still be able to search across all entities for a given project (or even across all projects) by using MultiSearcher. My assumption was that separate indices would be faster, since each search runs against a much smaller index.
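
To make the layout concrete, here is a minimal sketch of the Project-Entity partitioning, assuming one index directory per pair. The class name PartitionLayout, its methods, and the directory scheme are all hypothetical and only for illustration; no Lucene calls are shown. Each returned directory would be opened by its own searcher, and the searchers combined with MultiSearcher.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps (project, entity type) to one index directory. */
final class PartitionLayout {
    private final Path root;

    PartitionLayout(Path root) {
        this.root = root;
    }

    /** Directory of the single partition for one project and one entity type. */
    Path indexDir(String project, String entityType) {
        return root.resolve(project).resolve(entityType);
    }

    /** All partitions to search for one project, one per entity type. */
    List<Path> partitionsFor(String project, List<String> entityTypes) {
        List<Path> dirs = new ArrayList<>();
        for (String entityType : entityTypes) {
            dirs.add(indexDir(project, entityType));
        }
        return dirs;
    }
}
```

A search scoped to one entity type touches a single directory; a project-wide search touches one directory per entity type.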

Otis (and anyone else), are you suggesting this design is not something we should employ?

Nathan's point about pooling Searchers is something that we have also addressed, with an LRU cache mechanism. In testing we also found that there is an upper limit on the number of IndexSearchers that can be open at one time, so I can see why he suffered OOM when creating temporary searchers for requests outside the current pool-set. However, his 2nd point is a worry: creating a new IndexSearcher each time eventually ran out of memory, even though he's closing them. Is this because an IndexSearcher can be closed, but the underlying IndexReader is not automatically closed?
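
For reference, the LRU idea can be sketched roughly as follows. This is not our actual code: ClosingLruCache is a hypothetical name, and a plain java.io.Closeable stands in for an open searcher so the sketch needs nothing beyond the JDK. The point it illustrates is that whatever falls out of the cache is closed at eviction time, which keeps the number of open searchers bounded.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache that closes whatever it evicts.
 * V stands in for an open searcher; any Closeable works here.
 */
final class ClosingLruCache<K, V extends Closeable> {
    private final LinkedHashMap<K, V> entries;

    ClosingLruCache(final int maxOpen) {
        // accessOrder = true: the eldest entry is the least recently used one.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                closeQuietly(eldest.getValue());
                return true;
            }
        };
    }

    /** Returns the cached value, or null, and marks it as recently used. */
    synchronized V get(K key) {
        return entries.get(key);
    }

    /** Adds a value; a replaced value for the same key is closed. */
    synchronized void put(K key, V value) {
        V previous = entries.put(key, value);
        if (previous != null && previous != value) {
            closeQuietly(previous);
        }
    }

    synchronized int size() {
        return entries.size();
    }

    private static void closeQuietly(Closeable resource) {
        try {
            resource.close();
        } catch (IOException e) {
            // Sketch only: real code should log this rather than ignore it.
        }
    }
}
```

One thing this sketch does not handle: an evicted entry may still be in use by another thread when it is closed, so a real version would need some form of reference counting or a check-out/check-in step.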

Appreciate any thoughts on this. I'd rather know now, while I have the opportunity to change the design, than later when in production.. :)


cheers,

Paul Smith

On 09/07/2005, at 5:39 AM, Otis Gospodnetic wrote:

Nathan,

3) is the recommended usage.
Your index is on an NFS share, which means you are searching it over
the network.  Make it local, and you should see performance
improvements.  Local or remote, it makes sense that searches take
longer to execute, and the load goes up.  Yes, it shouldn't deadlock.
You shouldn't need to synchronize access to IndexSearcher.
When your JVM locks up next time, kill it, get the thread dump, and
send it to the list, so we can try to remove the bottleneck, if that's
possible.

How many queries/second do you run, and what kinds of queries are
they, how big is your index and what kind of hardware (disks, RAM,
CPU) are you using?

Otis

--- Nathan Brackett <[EMAIL PROTECTED]> wrote:

Hey all,

We're looking to use Lucene as the back end to our website and we're
running into an unusual deadlocking problem.

For testing purposes, we're just running one web server (threaded
environment) against an index mounted on an NFS share. This machine
performs searches only against this index so it's not being touched.
We have tried a few different models so far:

1) Pooling IndexSearcher objects: Occasionally we would run into
OutOfMemory problems, as we would not block if a request came through
and all IndexSearchers were already checked out; we would just create
a temporary one and then dispose of it once it was returned to the
pool.

2) Create a new IndexSearcher each time: Every request to search would
create an IndexSearcher object. This quickly gave OutOfMemory errors,
even when we would close them out directly after.

3) Use a global IndexSearcher: This is the model we're working with
now. The model holds up fine under low-moderate load and is, in fact,
much faster at searching (probably due to some caching mechanism).
Under heavy load though, the CPU will spike up to 99% and never come
back down until we kill -9 the process. Also, as you ramp the load,
we've discovered that search times go up as well. Searches will
generally come back after 40ms, but as the load goes up the searches
don't come back for up to 20 seconds.
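
For model 1, one alternative to creating a temporary searcher when the pool is empty is to make the caller wait, so the number of open searchers never exceeds the pool size. Below is a minimal sketch of that idea, not a recommendation over model 3. BlockingPool and its method names are hypothetical, and the pooled type is left generic so that no Lucene API is assumed.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-size pool: callers wait for an idle instance. */
final class BlockingPool<T> {
    private final BlockingQueue<T> idle;

    /** The pool never grows beyond the (non-empty) set of instances given here. */
    BlockingPool(Collection<T> instances) {
        this.idle = new ArrayBlockingQueue<>(instances.size(), false, instances);
    }

    /**
     * Waits up to the timeout for an idle instance.
     * Returns null on timeout or if the waiting thread is interrupted.
     */
    T borrow(long timeout, TimeUnit unit) {
        try {
            return idle.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }

    /** Returns a borrowed instance; throws if more come back than went out. */
    void giveBack(T instance) {
        idle.add(instance);
    }

    int idleCount() {
        return idle.size();
    }
}
```

A caller would borrow in a try block and give the instance back in finally; a null result means the request should fail or be retried rather than open another searcher.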

We've been attempting to find where the problem is for the last week
with no luck. Our index is optimized, so there is only one file. Do we
need to synchronize access to the global IndexSearcher so that only
one search can run at a time? That poses a bit of a problem as if a
particular search takes a long time, all others will wait. This
problem does not look like an OutOfMemory error because the memory
usage when the spike occurs is usually in the range of 150meg used
with a ceiling of 650meg. Anyone else experiencing any problems like
this or have any idea where we should be looking? Thanks.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


