Re: Improving Search Performance on Large Indexes

Sharad Agarwal Thu, 24 May 2007 22:28:26 -0700

Su.Cheng wrote:

Hi Scott,


I met the same situation as you (indexing 100M documents). If the computer
has only one CPU and one disk, ParallelMultiSearcher is slower than MultiSearcher.


I wrote an email "Who has sample code of remote multiple servers
multiple indexes searching" yesterday. If you have any suggestion,
please let me know.

We have been exposing individual index searchers as MBeans, as it reduces the amount of remoting code a lot (getting client-side proxies is also very easy).

On each index partition, run an MBean server and expose the following MBean:

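   public interface RemoteSearchableMBean extends Searchable {
       // add any extra methods you want to expose remotely
   }

   class RemoteSearchable extends IndexSearcher implements RemoteSearchableMBean {

       // initialize the super using appropriate constructors,
       // e.g. by opening the local index partition from its path
       RemoteSearchable(String indexPath) throws java.io.IOException {
           super(indexPath);
       }
   }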

Register an instance of the above MBean in the MBean server.

From the client side (the server doing the merging across partitions), get proxy handles for the RemoteSearchableMBean on each of your remote servers, using the MBeanServerInvocationHandler.newProxyInstance method and casting the result to RemoteSearchableMBean.

Create a MultiSearcher/ParallelMultiSearcher with all the RemoteSearchableMBean proxy handles:

- MultiSearcher(Searchable[] searchables)

The above approach has been working well for us and has been running in production for a while now.
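
The register-then-proxy steps above can be sketched with the JDK alone. This is only a minimal illustration, not our production code: ShardSearcherMBean, ShardSearcher and the ObjectName are hypothetical stand-ins (the real interface would extend Lucene's Searchable), and the sketch uses the in-process platform MBean server instead of a remote connection.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerInvocationHandler;
import javax.management.ObjectName;

class JmxProxySketch {

    /** Hypothetical stand-in for RemoteSearchableMBean (which would extend Searchable). */
    public interface ShardSearcherMBean {
        int maxDoc();
    }

    /** Hypothetical stand-in for RemoteSearchable (which would extend IndexSearcher). */
    public static class ShardSearcher implements ShardSearcherMBean {
        private final int maxDoc;

        public ShardSearcher(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        public int maxDoc() {
            return maxDoc;
        }
    }

    /** Registers one searcher MBean, then calls it through a typed proxy. */
    static int demo() {
        try {
            // Assumption: same-JVM platform server; a real client would hold an
            // MBeanServerConnection obtained from a JMX connector instead.
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example.search:type=ShardSearcher,shard=0");
            if (!server.isRegistered(name)) {
                server.registerMBean(new ShardSearcher(1000), name);
            }
            // The proxy forwards each interface call to the named MBean.
            ShardSearcherMBean proxy = MBeanServerInvocationHandler.newProxyInstance(
                    server, name, ShardSearcherMBean.class, false);
            return proxy.maxDoc();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("maxDoc via proxy: " + demo());
    }
}
```

With one such proxy per partition, the handles would go into the Searchable[] array passed to MultiSearcher or ParallelMultiSearcher.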


- sharad

Best regards.

On Thu, 2007-05-24 at 11:38 -0700, Otis Gospodnetic wrote:

Scott,

Yes, take your big index and split it into multiple smaller shards.  Put those 
shards in different servers and then query them remotely (using the provided 
RMI thing in Lucene or using something custom), take top N results from each 
searcher, merge those, and take top N from the merged result set.
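
The merge step described above can be sketched as follows. This is a rough illustration only: Hit, ShardMerge and mergeTopN are hypothetical names rather than Lucene classes, and it assumes each shard returns its hits already sorted by descending score. Note that raw scores from separately built indexes depend on per-index term statistics, so they may not be strictly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ShardMerge {

    /** A scored hit; shard and doc are illustrative identifiers, not Lucene types. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Takes at most n hits from each shard (each list assumed sorted by
     * descending score), pools them, and returns the best n overall.
     */
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        List<Hit> pooled = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            pooled.addAll(hits.subList(0, Math.min(n, hits.size())));
        }
        // Highest score first.
        pooled.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(pooled.subList(0, Math.min(n, pooled.size())));
    }
}
```

Taking only the top N from every shard is sufficient because no hit ranked below N within its own shard can appear in the global top N.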

You could also experiment with a memory mapped Directory implementation.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Scott Sellman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, May 24, 2007 1:31:49 PM
Subject: Improving Search Performance on Large Indexes

Hello,

Currently we are attempting to optimize the search time against an index
that is 26 GB in size (~35 million docs) and I was wondering what
experiences others have had in similar attempts.  Simple searches
against the index are still fast even at 26GB, but the problem is our
application allows the user a lot of options in searching, which can
generate complicated queries.  Based on previous posts we decided to try
splitting our index into multiple indexes and use ParallelMultiSearcher.
When we split our single index into 6 separate ones we recorded a 25%
decrease in response time on minimal load.  We haven't done any stress
testing on it yet.  Has anyone noticed problems with increased load when
using ParallelMultiSearcher?  Does using machines with more processors
in combination with ParallelMultiSearcher result in much response time
improvement?  Or is the slowdown primarily with disk access?

Any recommendations are welcome.

Thanks in advance,

Scott





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


