Keep in mind that, depending on your queries (lots of terms, wildcards, date ranges), you can spend quite a bit of time in the Weight calculation, and all of that happens before the search itself runs. During the Weight calculation you will be making remote calls to the rewrite() and docFreq() methods, and there will be on the order of (# of terms * # of remotes) of these remote calls for each of those methods.
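To make the shape of that cost concrete, here is a rough sketch of the pre-search call pattern. This is an illustration only, not the actual MultiSearcher/ParallelMultiSearcher source; the class and method names are invented, and the exact counts depend on the query.

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

// Illustration only: shows why the pre-search phase costs roughly
// (# of terms * # of remotes) network round trips, not how Lucene
// actually implements it internally.
public class WeightCostSketch {

    public static int countRemoteCalls(Query query, Searchable[] remotes)
            throws Exception {
        int remoteCalls = 0;

        // rewrite() against every remote: wildcards, prefixes and ranges
        // get expanded on each remote index.
        Query[] rewritten = new Query[remotes.length];
        for (int i = 0; i < remotes.length; i++) {
            rewritten[i] = remotes[i].rewrite(query);
            remoteCalls++;
        }

        // docFreq() once per (term, remote) pair, so that idf values are
        // consistent across the whole distributed index.
        Set terms = new HashSet();
        for (int i = 0; i < rewritten.length; i++) {
            rewritten[i].extractTerms(terms);
        }
        for (Iterator it = terms.iterator(); it.hasNext();) {
            Term term = (Term) it.next();
            for (int i = 0; i < remotes.length; i++) {
                remotes[i].docFreq(term);   // one remote round trip each
                remoteCalls++;
            }
        }
        return remoteCalls;
    }
}

Searchable also has a batched docFreqs(Term[]) method that can cut the docFreq traffic to one round trip per remote, but either way the work happens before any search thread starts.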
And I think ParallelMultiSearcher makes all of these calls serially before it starts to thread the actual search. I have found that this serial phase can account for quite a bit of the overall response time. I too am interested in learning more about a large-scale distributed Lucene setup.

-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 04, 2006 7:33 AM
To: java-user@lucene.apache.org
Subject: Re: Searching documents on big index by using ParallelMultiSearcher is slow...

OK, you're now officially beyond my competence, so I'll have to wait for people who actually know <G>....

Although if I read your stats right, you're getting approximately 1 sec response time over 10M documents on a 10G index. That's not bad at all. What kind of response time do you need?

On 10/3/06, Scott <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> > Well, the first question is always "are you opening/closing your
> > IndexSearchers for each request on your remote machines?". This is always a
> > no-no. This is also a question for your single-searcher version.
>
> Yes, I know. Each search slave (RMI server) has a single instance of
> IndexSearcher, and it is opened once when the RMI server starts.
>
> > What is your performance if you only go to one server? I'd start by finding
>
> Performance on one server with the FULL index (not divided by 10) is
> about 2500 ms.
> On one server with a split index (1/10 of the data) it is about 50 ms.
>
> With ParallelMultiSearcher over 10 remote searchables, each
> RemoteSearchable returns in about 50-100 ms, and because of the
> threading ParallelMultiSearcher also returns in about 50-100 ms,
> but Hits Searcher.search(Query, Sort) responds in about 500-1000 ms.
>
> I think Searcher.search with a Sort reads all of the sort field values
> from the IndexReader, and that is the bottleneck.
>
> Are there any examples of high-performance distributed Lucene with
> ParallelMultiSearcher, or do I need Hadoop?
>
> Erick Erickson wrote:
> > Well, the first question is always "are you opening/closing your
> > IndexSearchers for each request on your remote machines?". This is always a
> > no-no. This is also a question for your single-searcher version.
> >
> > What is your performance if you only go to one server? I'd start by finding
> > out what happens when you forget all the ParallelMultiSearcher stuff, all
> > the RMI stuff etc, and just see what your performance is on one of your
> > index parts locally. Once that is answered, extend to RMI, then the
> > Parallel..., at each step seeing if your performance degrades unacceptably.
> > That'll at least give you a clue what part of the process is the biggest
> > problem.
> >
> > And without knowing a LOT more about your searches, and your index, it's
> > kind of hard to come up with solutions <G>....
> >
> > Best
> > Erick
> >
> > On 10/3/06, Scott <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >>
> >> I have a question about ParallelMultiSearcher performance.
> >>
> >> I want to search documents in about 10 gigabytes of index.
> >> (The index has 10,000,000 documents.)
> >>
> >> I get very slow performance using IndexSearcher against the ONE full
> >> index, so I tried ParallelMultiSearcher with 10 remote searchable
> >> servers.
> >>
> >> Index:
> >> Each search slave has 1/10 of the index.
> >> (The ONE index is divided across 10 servers.)
> >>
> >> Search slave:
> >> Each search slave starts a remote searchable RMI server and waits for
> >> connections from the search master.
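[The slave side described above boils down to wrapping a single, long-lived IndexSearcher in a RemoteSearchable and binding it in an RMI registry. A minimal sketch; the index path, port and RMI name are invented for the example:

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RemoteSearchable;

public class SearchSlave {
    public static void main(String[] args) throws Exception {
        // Opened once and kept open for the life of the process.
        IndexSearcher local = new IndexSearcher("/data/index-part-01");

        // Export the searcher over RMI and register it under a well-known name.
        LocateRegistry.createRegistry(1099);
        RemoteSearchable remote = new RemoteSearchable(local);
        Naming.rebind("//localhost:1099/LuceneSearchable", remote);

        System.out.println("RemoteSearchable bound; waiting for the search master...");
        // The exported object keeps the JVM alive; RMI serves requests on its own threads.
    }
}
]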
> >>
> >> Search master:
> >> The search master uses Naming.lookup() to get each remote searchable,
> >> collects the 10 remote searchables from the search slaves, builds a
> >> ParallelMultiSearcher, and then searches.
> >>
> >> Any solution?
> >>
> >> --
> >> Scott
>
> --
> Scott
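And the master side that Scott describes, for completeness: look up the ten RemoteSearchables, wrap them in one ParallelMultiSearcher, and search. Again only a sketch; the host names, RMI name, field names and sort field are made up.

import java.rmi.Naming;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Sort;

public class SearchMaster {
    public static void main(String[] args) throws Exception {
        // Look up one RemoteSearchable per slave.
        Searchable[] slaves = new Searchable[10];
        for (int i = 0; i < slaves.length; i++) {
            slaves[i] = (Searchable) Naming.lookup(
                    "//slave" + (i + 1) + ":1099/LuceneSearchable");
        }

        // One ParallelMultiSearcher fans each search out to all slaves in parallel.
        ParallelMultiSearcher searcher = new ParallelMultiSearcher(slaves);

        QueryParser parser = new QueryParser("body", new StandardAnalyzer());
        Query query = parser.parse("lucene");

        // Sorting on a field means each slave must also return the field values
        // for its top hits so the master can merge them, which is one place the
        // extra 500-1000 ms reported above can come from.
        Hits hits = searcher.search(query, new Sort("date"));
        System.out.println("total hits: " + hits.length());
    }
}

In a real service the ParallelMultiSearcher would be built once and reused across requests, just like the IndexSearcher on each slave.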