Re: Index Partitioning ( was Re: Search deadlocking under load)

Paul Smith Sun, 10 Jul 2005 18:03:36 -0700


On 11/07/2005, at 10:43 AM, Chris Hostetter wrote:


: > Generally speaking, you only ever need one active Searcher, which
: > all of your threads should be able to use.  (Of course, Nathan says
: > that in his code base, doing this causes his JVM to freeze up, but
: > I've never seen this myself).
: >
: Thanks for your response Chris.  Do you think we are going down a
: deadly path by having "many smaller" IndexSearchers open rather than
: "one very large one"?

I'm sorry ... i think i may have confused you, i forgot that this thread
was regarding partitioning the index. i meant one searcher *per index* ...

don't try to make a separate searcher per client, or have a pool of
searchers, or anything like that.  But if you have a need to partition
your data into multiple indexes, then have one searcher per index.

Actually I think I confused you first, and then you confused me back...
Let me... uhh, clarify 'ourselves'.. :)

My use of the word 'pool' was an error on my part (and a very silly
one). What I really meant was "LRU Cache".

We have recognized that there is a finite # of IndexSearchers that
can probably be open at one time. So we'll use an LRU cache to make
sure only the 'actively' in use Searchers are open. However, there
will only be one IndexSearcher for a given physical Index directory
open at a time; we're just making sure only the recently used ones
are kept open to keep memory limits sane.
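
The LRU idea above can be sketched roughly as follows. This is only a minimal illustration, not our actual code: the class name SearcherLruCache, the opener factory, and the maxOpen limit are all hypothetical, and the cache is written against java.io.Closeable so that it stands alone without Lucene on the classpath. It also assumes nobody is still searching with a searcher at the moment it is evicted; a real version would need reference counting or something similar before closing.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** LRU cache holding at most one open resource per key (e.g. per index directory). */
class SearcherLruCache<K, V extends Closeable> {
    // Hypothetical factory: opens a searcher for the given key.
    private final Function<K, V> opener;
    private final LinkedHashMap<K, V> open;

    SearcherLruCache(final int maxOpen, Function<K, V> opener) {
        this.opener = opener;
        // accessOrder=true keeps entries ordered least recently used first.
        this.open = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                closeQuietly(eldest.getValue()); // release the evicted searcher
                return true;
            }
        };
    }

    /** Returns the single open instance for this key, opening it on first use. */
    synchronized V get(K key) {
        V value = open.get(key); // get() also marks the entry as recently used
        if (value == null) {
            value = opener.apply(key);
            open.put(key, value); // may evict the least recently used entry
        }
        return value;
    }

    synchronized int size() {
        return open.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful to do if closing an evicted searcher fails.
        }
    }
}
```

The key would be the physical index directory, so repeated requests for the same directory share one instance, and only the least recently used directory is closed once the limit is exceeded.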

now assume you partition your data into two separate indexes, unless the
way you partition your data lets you do it cleanly, so that each of the
two indexes contains only half the number of terms as if you had one big
index, then sorting on a field in those two indexes will require more RAM
than sorting on the same data in a single index.
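
A rough way to see the point about terms is to count distinct sort-field values. The sketch below is a deliberately simplified model, not a description of Lucene internals: it assumes each open index caches one entry per distinct term in the sort field, and the class name and sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class SortCacheEstimate {
    /** Distinct terms one index would have to hold for a single sort field. */
    static int distinctTerms(List<String> fieldValues) {
        return new HashSet<>(fieldValues).size();
    }

    /** Total distinct terms held when every partition keeps its own cache. */
    static int partitionedTerms(List<List<String>> partitions) {
        int total = 0;
        for (List<String> partition : partitions) {
            total += distinctTerms(partition);
        }
        return total;
    }

    /** Distinct terms held when all partitions are merged into one index. */
    static int singleIndexTerms(List<List<String>> partitions) {
        List<String> all = new ArrayList<>();
        for (List<String> partition : partitions) {
            all.addAll(partition);
        }
        return distinctTerms(all);
    }

    public static void main(String[] args) {
        // Made-up sort-field values for two partitions that share some terms.
        List<String> docs = Arrays.asList("alice", "bob", "carol", "dave");
        List<String> mail = Arrays.asList("alice", "bob", "erin");
        List<List<String>> partitions = Arrays.asList(docs, mail);

        System.out.println("two indexes: " + partitionedTerms(partitions)); // 7
        System.out.println("one index:   " + singleIndexTerms(partitions)); // 5
    }
}
```

Terms that appear in both partitions are counted twice in the partitioned case, so the totals only match when the partitions share no terms.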

Our data is logically segmented into Projects. Each Project can
contain Documents and Mail. So we currently have 2 physical Indexes
per Project. 90% of the time our users work within one project at a
time, and only work in "document mode" or "mail mode". Every now and
then they may need to do a general search across all Entities and/or
Projects they are involved in (accomplished with MultiSearcher).
Perhaps we should just put Documents and Mail all in one Index for a
project (i.e. have 1 Index per project)??

Part of the reason to partition is to make the cost of rebuilding
a given project cheaper. It also reduces the risk of an Uber-Index being
corrupted and screwing all the users up. We can order the reindexing
of projects to make sure our more important customers get re-indexed
first if there is a serious issue.

I would have thought that partitioning indexes would have performance
benefits too: a lot less data to scan (most of the data is already
relevant).

Since this isn't in production yet, I'd rather be proven wrong now
rather than later! :)


Thanks for your input.

Paul
