Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

Michael McCandless Wed, 13 Jan 2010 12:01:30 -0800

IndexWriter should show good concurrency, i.e., as you add threads you
should see indexing speed up, assuming you have no external
synchronization, your hardware has spare concurrency, you use a
large enough RAM buffer, and you don't commit too frequently.


But you should use a single IndexWriter, which the threads share.
Trying to open a separate IndexWriter per thread will lead to the lock
timeout exception.
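
As a rough illustration of the pattern (one writer, shared by all indexing
threads), here is a minimal Java sketch. It deliberately does not use Lucene
itself so that it runs on a bare JDK: StubDirectory and StubWriter are
hypothetical stand-ins, not Lucene classes, and the stub rejects a second
writer immediately where the real IndexWriter would wait for the write lock
and then time out. With real Lucene the shared object would be the single
IndexWriter, and each worker would call addDocument on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedWriterSketch {

    // Hypothetical stand-in for an index directory guarded by a write lock.
    static final class StubDirectory {
        final AtomicBoolean writeLock = new AtomicBoolean(false);
    }

    // Hypothetical stand-in for IndexWriter: only one live writer per
    // directory, but addDocument may be called from many threads at once.
    static final class StubWriter implements AutoCloseable {
        private final StubDirectory dir;
        private final AtomicInteger docCount = new AtomicInteger();

        StubWriter(StubDirectory dir) {
            // A second writer on the same directory cannot take the lock.
            if (!dir.writeLock.compareAndSet(false, true)) {
                throw new IllegalStateException("write lock already held");
            }
            this.dir = dir;
        }

        void addDocument(String doc) {
            docCount.incrementAndGet(); // thread-safe counter update
        }

        int numDocs() {
            return docCount.get();
        }

        @Override
        public void close() {
            dir.writeLock.set(false); // release the lock for the next writer
        }
    }

    // Index threads * docsPerThread documents through ONE shared writer.
    public static int indexWithSharedWriter(int threads, int docsPerThread)
            throws Exception {
        StubDirectory dir = new StubDirectory();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (StubWriter writer = new StubWriter(dir)) {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                pending.add(pool.submit(() -> {
                    for (int i = 0; i < docsPerThread; i++) {
                        writer.addDocument("doc-" + id + "-" + i);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each worker and rethrow any failure
            }
            return writer.numDocs();
        } finally {
            pool.shutdown();
        }
    }

    // Returns true if opening a second writer on a locked directory fails.
    public static boolean secondWriterFails() {
        StubDirectory dir = new StubDirectory();
        try (StubWriter first = new StubWriter(dir)) {
            try (StubWriter second = new StubWriter(dir)) {
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("docs indexed: " + indexWithSharedWriter(4, 250));
        System.out.println("second writer rejected: " + secondWriterFails());
    }
}
```

The point of the sketch is only the ownership model: the writer is created
once, handed to every worker, and closed once after all workers finish. How
much speedup extra threads give still depends on the conditions listed above.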

Mike

On Wed, Jan 13, 2010 at 2:08 PM, jchang <jchangkihat...@gmail.com> wrote:
>
> I don't specifically need a cluster of servers writing indexes.  Actually, at
> the moment, I only have one server, but multiple message consuming threads,
> so I still land back at the same problem of contention for the index lock.
> Why do I have multiple message consumers?  Speed...I wanted to dequeue my
> items to be indexed fast.  However, I'm getting the impression that may have
> been a foolish effort.  I find that only having one writer thread is not
> much slower than having 20, which makes sense if they are all waiting on one
> file.  If only one writer thread can be fast enough (which gets rid of
> the timeout exceptions that I asked about in a different thread), then that
> is good enough for me.
>
> Do you know what kind of index writes per second I can hope to hit with one
> writer thread?  I guess it depends on many factors.
>
> Also, I know 2.9.0 is faster than 2.4.0 (which I'm on), but I'm not sure I
> can move up to 2.9.0 really easily because all my Lucene usage is wrapped in
> Compass, which does not yet support 2.9.0.  I think I'd have to rewrite my
> service to use straight Lucene, which might be a good idea, but which I
> can't do quickly.  We don't use Solr.
>
> Thanks for your help thus far and thanks in advance for any more responses.
>
>
>
> Jake Mannix wrote:
>>
>> On Tue, Jan 12, 2010 at 8:15 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>>
>>> John, you should have a look at Zoie.  I just finished adding LinkedIn's
>>> case study about Zoie to Lucene in Action 2, so this is fresh in my mind.
>>
>> :)
>>>
>>
>> Yep, Zoie ( http://zoie.googlecode.com ) will handle the server restart
>> part, in that while yes, you lose what is in RAM, Zoie keeps track of an
>> "index version" on disk alongside the Lucene index which it uses to decide
>> where it must reindex from to "catch up" if there have been incoming
>> indexing events while the server was out of commission.
>>
>> Zoie does not support multiple servers using the same index, because each
>> zoie instance has IndexWriter instances, and you'll get locking problems
>> trying to do that.  You could have one Zoie instance effectively as the
>> "master/writer/realtime reader", and a bunch of raw Lucene "slaves" which
>> could read off of that index, but as you say, could not get access to the
>> RAMDirectory information until it was flushed to disk.
>>
>> Why do you need a "cluster" of servers hitting the same index?  Are they
>> different applications (with different search logic, so they need to be
>> different instances), or is it just to try and utilize your hardware
>> efficiently?  If it's for performance reasons, you might find you get better
>> use of your CPU cores by just sharding your one index into smaller ones,
>> each having their own Zoie instance, and putting a "broker" on top of them
>> searching across all and mergesorting the results.  Often even this isn't
>> necessary, because Zoie will be opening the disk-backed IndexReader in
>> readonly mode, and thus all the synchronized blocks are gone, and one single
>> Zoie instance will easily saturate your CPU cores by simple multi-threading
>> by your appserver.
>>
>> If you really needed to do many different kinds of writes (from different
>> applications) and also have applications not involved in the writing
>> seeing (in real-time) these writes, then you could still do it with Zoie,
>> but it would take some interesting architectural juggling (write your own
>> StreamDataProvider class which takes input from a variety of sources and
>> merges them together to feed to one Zoie instance, then a broker on top of
>> zoie which serves out IndexReaders to different applications living on top
>> which can wrap them up in their own business logic as they saw fit... as
>> long as it was ok to have all the applications in the same JVM, of course).
>>
>>   -jake
>>
>>
>>>
>>>  Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>>>
>>>
>>>
>>> ----- Original Message ----
>>> > From: jchang <jchangkihat...@gmail.com>
>>> > To: java-dev@lucene.apache.org
>>> > Sent: Tue, January 12, 2010 6:10:56 PM
>>> > Subject: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts
>>> >
>>> >
>>> > Lucene 2.9.0 has near real time indexing, writing to a RAMDir which gets
>>> > flushed to disk when you do a search.
>>> >
>>> > Does anybody know how this works out with service restarts (both orderly
>>> > shutdown and a crash)?  If the service goes down while indexed items are in
>>> > RAMDir but not on disk, are they lost?  Or is there some kind of log
>>> > recovery?
>>> >
>>> > Also, does anybody know the impact of this with clustered Lucene servers?
>>> > If you have numerous servers running off one index, I assume there is no way
>>> > for the other services to pick up the newly indexed items until they are
>>> > flushed to disk, correct?  I'd be happy if that is not so, but I suspect it
>>> > is so.
>>> >
>>> > Thanks,
>>> > John
>>> > --
>>> > View this message in context:
>>> > http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-Service-Crashes-restarts-tp27136539p27136539.html
>>> > Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>>
>>>
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-Service-Crashes-restarts-tp27136539p27148813.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
>
>
>

