Keep in mind that, depending on your queries (lots of terms, wildcards, date ranges), you can spend quite a bit of time in the Weight calculation, and all of that happens before the search itself runs. During the Weight calculation you will be making remote calls to the rewrite() and docFreq() methods, and there will be on the order of (# of terms * # of remotes) of these remote calls for each of those methods.
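To make the shape of that cost concrete, here is a rough sketch of the pre-search call pattern. This is an illustration only, not the actual MultiSearcher/ParallelMultiSearcher source; the class and method names are invented, and the exact counts depend on the query.

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

// Illustration only: shows why the pre-search phase costs roughly
// (# of terms * # of remotes) network round trips, not how Lucene
// actually implements it internally.
public class WeightCostSketch {

    public static int countRemoteCalls(Query query, Searchable[] remotes)
            throws Exception {
        int remoteCalls = 0;

        // rewrite() against every remote: wildcards, prefixes and ranges
        // get expanded on each remote index.
        Query[] rewritten = new Query[remotes.length];
        for (int i = 0; i < remotes.length; i++) {
            rewritten[i] = remotes[i].rewrite(query);
            remoteCalls++;
        }

        // docFreq() once per (term, remote) pair, so that idf values are
        // consistent across the whole distributed index.
        Set terms = new HashSet();
        for (int i = 0; i < rewritten.length; i++) {
            rewritten[i].extractTerms(terms);
        }
        for (Iterator it = terms.iterator(); it.hasNext();) {
            Term term = (Term) it.next();
            for (int i = 0; i < remotes.length; i++) {
                remotes[i].docFreq(term);   // one remote round trip each
                remoteCalls++;
            }
        }
        return remoteCalls;
    }
}

Searchable also has a batched docFreqs(Term[]) method that can cut the docFreq traffic to one round trip per remote, but either way the work happens before any search thread starts.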
And I think ParallelMultiSearcher makes all of these calls serially before it starts to thread the actual search. I have found that this serial phase can account for quite a bit of the overall response time. I too am interested in learning more about a large-scale distributed Lucene setup.

-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 04, 2006 7:33 AM
To: java-user@lucene.apache.org
Subject: Re: Searching documents on big index by using ParallelMultiSearcher is slow...

OK, you're now officially beyond my competence, so I'll have to wait for people who actually know <G>....

Although if I read your stats right, you're getting approximately 1 sec response time over 10M documents on a 10G index. That's not bad at all. What kind of response time do you need?

On 10/3/06, Scott <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> > Well, the first question is always "are you opening/closing your
> > IndexSearchers for each request on your remote machines?". This is always a
> > no-no. This is also a question for your single-searcher version.
>
> Yes, I know. Each search slave (RMI server) has a single instance of
> IndexSearcher, and it is opened once when the RMI server starts.
>
> > What is your performance if you only go to one server? I'd start by finding
>
> Performance on one server with the FULL index (not divided by 10) is
> about 2500 ms.
> On one server with a split index (1/10 of the data) it is about 50 ms.
>
> With ParallelMultiSearcher over 10 remote searchables, each
> RemoteSearchable returns in about 50-100 ms, and because of the
> threading ParallelMultiSearcher also returns in about 50-100 ms,
> but Hits Searcher.search(Query, Sort) responds in about 500-1000 ms.
>
> I think Searcher.search with a Sort reads all of the sort field values
> from the IndexReader, and that is the bottleneck.
>
> Are there any examples of high-performance distributed Lucene with
> ParallelMultiSearcher, or do I need Hadoop?
>
> Erick Erickson wrote:
> > Well, the first question is always "are you opening/closing your
> > IndexSearchers for each request on your remote machines?". This is always a
> > no-no. This is also a question for your single-searcher version.
> >
> > What is your performance if you only go to one server? I'd start by finding
> > out what happens when you forget all the ParallelMultiSearcher stuff, all
> > the RMI stuff etc, and just see what your performance is on one of your
> > index parts locally. Once that is answered, extend to RMI, then the
> > Parallel..., at each step seeing if your performance degrades unacceptably.
> > That'll at least give you a clue what part of the process is the biggest
> > problem.
> >
> > And without knowing a LOT more about your searches, and your index, it's
> > kind of hard to come up with solutions <G>....
> >
> > Best
> > Erick
> >
> > On 10/3/06, Scott <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >>
> >> I have a question about ParallelMultiSearcher performance.
> >>
> >> I want to search documents in about 10 gigabytes of index.
> >> (The index has 10,000,000 documents.)
> >>
> >> I get very slow performance using IndexSearcher against the ONE full
> >> index, so I tried ParallelMultiSearcher with 10 remote searchable
> >> servers.
> >>
> >> Index:
> >> Each search slave has 1/10 of the index.
> >> (The ONE index is divided across 10 servers.)
> >>
> >> Search slave:
> >> Each search slave starts a remote searchable RMI server and waits for
> >> connections from the search master.
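[The slave side described above boils down to wrapping a single, long-lived IndexSearcher in a RemoteSearchable and binding it in an RMI registry. A minimal sketch; the index path, port and RMI name are invented for the example:

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RemoteSearchable;

public class SearchSlave {
    public static void main(String[] args) throws Exception {
        // Opened once and kept open for the life of the process.
        IndexSearcher local = new IndexSearcher("/data/index-part-01");

        // Export the searcher over RMI and register it under a well-known name.
        LocateRegistry.createRegistry(1099);
        RemoteSearchable remote = new RemoteSearchable(local);
        Naming.rebind("//localhost:1099/LuceneSearchable", remote);

        System.out.println("RemoteSearchable bound; waiting for the search master...");
        // The exported object keeps the JVM alive; RMI serves requests on its own threads.
    }
}
]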
> >>
> >> Search master:
> >> The search master uses Naming.lookup() to get each remote searchable,
> >> collects the 10 remote searchables from the search slaves, builds a
> >> ParallelMultiSearcher, and then searches.
> >>
> >> Any solution?
> >>
> >> --
> >> Scott
>
> --
> Scott
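And the master side that Scott describes, for completeness: look up the ten RemoteSearchables, wrap them in one ParallelMultiSearcher, and search. Again only a sketch; the host names, RMI name, field names and sort field are made up.

import java.rmi.Naming;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Sort;

public class SearchMaster {
    public static void main(String[] args) throws Exception {
        // Look up one RemoteSearchable per slave.
        Searchable[] slaves = new Searchable[10];
        for (int i = 0; i < slaves.length; i++) {
            slaves[i] = (Searchable) Naming.lookup(
                    "//slave" + (i + 1) + ":1099/LuceneSearchable");
        }

        // One ParallelMultiSearcher fans each search out to all slaves in parallel.
        ParallelMultiSearcher searcher = new ParallelMultiSearcher(slaves);

        QueryParser parser = new QueryParser("body", new StandardAnalyzer());
        Query query = parser.parse("lucene");

        // Sorting on a field means each slave must also return the field values
        // for its top hits so the master can merge them, which is one place the
        // extra 500-1000 ms reported above can come from.
        Hits hits = searcher.search(query, new Sort("date"));
        System.out.println("total hits: " + hits.length());
    }
}

In a real service the ParallelMultiSearcher would be built once and reused across requests, just like the IndexSearcher on each slave.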