IndexWriter should show good concurrency, i.e., as you add threads you should see indexing speed up, assuming you have no external synchronization, your hardware has spare concurrency, you use a large enough RAM buffer, and you don't commit too frequently.
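A minimal sketch of that pattern follows. It assumes all threads share one writer instance. It is written against a hypothetical `StandInWriter` so it runs with only the JDK, not against Lucene itself. Several consumer threads drain one queue into the single shared writer without any external locking, and the batch is committed once at the end rather than after every document. In real Lucene 2.4/2.9 code, the stand-in would be one `IndexWriter`, whose methods may be called from multiple threads concurrently, and the RAM buffer would be sized with `setRAMBufferSizeMB`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedWriterSketch {

    /** Hypothetical stand-in for one shared IndexWriter; its methods are safe to call from many threads. */
    static final class StandInWriter {
        final ConcurrentLinkedQueue<String> added = new ConcurrentLinkedQueue<>();
        final AtomicInteger commits = new AtomicInteger();

        void addDocument(String doc) { added.add(doc); }
        void commit() { commits.incrementAndGet(); }
    }

    /** nThreads consumers drain the queue into ONE writer; the whole batch is committed once. */
    static StandInWriter indexAll(List<String> docs, int nThreads) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(docs);
        // Shared by all threads. In real Lucene, a second IndexWriter on the same
        // directory would wait on write.lock and eventually time out.
        StandInWriter writer = new StandInWriter();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            pool.execute(() -> {
                String doc;
                // No external synchronized block around the writer calls.
                while ((doc = queue.poll()) != null) {
                    writer.addDocument(doc);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("indexing threads did not finish");
        }
        writer.commit(); // one commit for the batch rather than one per document
        return writer;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("doc-" + i);
        StandInWriter w = indexAll(docs, 8);
        System.out.println(w.added.size() + " docs, " + w.commits.get() + " commit(s)"); // 1000 docs, 1 commit(s)
    }
}
```

In this sketch, the thread count only affects how fast the queue drains. Whether more threads actually help against a real index depends on the conditions listed above.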
But you should use a single IndexWriter, which the threads share. Trying to open a separate IndexWriter per thread will lead to the lock timeout exception.

Mike

On Wed, Jan 13, 2010 at 2:08 PM, jchang <jchangkihat...@gmail.com> wrote:
>
> I don't specifically need a cluster of servers writing indexes. Actually, at
> the moment, I only have one server, but multiple message-consuming threads,
> so I still land back at the same problem of contention for the index lock.
> Why do I have multiple message consumers? Speed... I wanted to dequeue my
> items to be indexed fast. However, I'm getting the impression that may have
> been a foolish effort. I find that having only one writer thread is not
> much slower than having 20, which makes sense if they are all waiting on one
> file. If one writer thread can be fast enough (which gets rid of the
> timeout exceptions that I asked about in a different thread), then that is
> good enough for me.
>
> Do you know what kind of index writes per second I can hope to hit with one
> writer thread? I guess it depends on many factors.
>
> Also, I know 2.9.0 is faster than 2.4.0 (which I'm on), but I'm not sure I
> can move up to 2.9.0 easily, because all my Lucene usage is wrapped in
> Compass, which does not yet support 2.9.0. I think I'd have to rewrite my
> service to use straight Lucene, which might be a good idea, but I can't do it
> quickly. We don't use Solr.
>
> Thanks for your help thus far, and thanks in advance for any more responses.
>
> Jake Mannix wrote:
>>
>> On Tue, Jan 12, 2010 at 8:15 PM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>>
>>> John, you should have a look at Zoie. I just finished adding LinkedIn's
>>> case study about Zoie to Lucene in Action 2, so this is fresh in my mind.
>>
>> :)
>>>
>>
>> Yep, Zoie ( http://zoie.googlecode.com ) will handle the server restart
>> part: while, yes, you lose what is in RAM, Zoie keeps track of an
>> "index version" on disk alongside the Lucene index, which it uses to decide
>> where it must reindex from to "catch up" if there have been incoming
>> indexing events while the server was out of commission.
>>
>> Zoie does not support multiple servers using the same index, because each
>> Zoie instance has IndexWriter instances, and you'll get locking problems
>> trying to do that. You could have one Zoie instance effectively as the
>> "master/writer/realtime reader", and a bunch of raw Lucene "slaves" which
>> could read off of that index, but as you say, they could not get access to
>> the RAMDirectory information until it was flushed to disk.
>>
>> Why do you need a "cluster" of servers hitting the same index? Are they
>> different applications (with different search logic, so they need to be
>> different instances), or is it just to try and utilize your hardware
>> efficiently? If it's for performance reasons, you might find you get better
>> use of your CPU cores by just sharding your one index into smaller ones,
>> each having its own Zoie instance, and putting a "broker" on top of them
>> that searches across all of them and merge-sorts the results. Often even
>> this isn't necessary, because Zoie opens the disk-backed IndexReader in
>> read-only mode, so all the synchronized blocks are gone, and one single
>> Zoie instance will easily saturate your CPU cores through simple
>> multi-threading by your appserver.
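The broker step Jake mentions can be sketched as a k-way merge. This is a minimal sketch, assuming each shard already returns its own hits sorted by descending score. The `Hit` and `Cursor` classes and `mergeTopK` are hypothetical names for illustration, not Zoie's or Lucene's API. A priority queue holds one cursor per shard, and the broker repeatedly takes the best remaining hit until it has the global top k.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ShardBrokerSketch {

    /** One hit returned by a shard (hypothetical shape, not Zoie's or Lucene's API). */
    static final class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
        @Override public String toString() { return docId; }
    }

    /** Position inside one shard's result list, ordered by the score it currently points at. */
    static final class Cursor implements Comparable<Cursor> {
        final List<Hit> hits;
        int pos;
        Cursor(List<Hit> hits) { this.hits = hits; }
        Hit current() { return hits.get(pos); }
        @Override public int compareTo(Cursor o) {
            return Float.compare(o.current().score, current().score); // higher score first
        }
    }

    /** k-way merge of per-shard lists (each sorted by descending score) into the global top k. */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<Hit> hits : perShard) {
            if (!hits.isEmpty()) heap.add(new Cursor(hits));
        }
        List<Hit> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < k) {
            Cursor c = heap.poll();
            merged.add(c.current());
            c.pos++;
            if (c.pos < c.hits.size()) heap.add(c); // re-insert, now pointing at its next hit
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard0 = List.of(new Hit("a", 0.9f), new Hit("b", 0.5f));
        List<Hit> shard1 = List.of(new Hit("c", 0.8f), new Hit("d", 0.7f), new Hit("e", 0.1f));
        System.out.println(mergeTopK(List.of(shard0, shard1), 3)); // prints [a, c, d]
    }
}
```

In practice, raw Lucene scores from separate indexes are only roughly comparable, because term statistics differ from shard to shard. A real broker may need to account for that before merging.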
>>
>> If you really needed to do many different kinds of writes (from different
>> applications) and also have applications not involved in the writing see
>> these writes (in real time), then you could still do it with Zoie,
>> but it would take some interesting architectural juggling (write your own
>> StreamDataProvider class which takes input from a variety of sources and
>> merges them together to feed to one Zoie instance, then a broker on top of
>> Zoie which serves out IndexReaders to different applications living on top,
>> which can wrap them up in their own business logic as they see fit... as
>> long as it was OK to have all the applications in the same JVM, of
>> course).
>>
>> -jake
>>
>>
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>>>
>>>
>>>
>>> ----- Original Message ----
>>> > From: jchang <jchangkihat...@gmail.com>
>>> > To: java-dev@lucene.apache.org
>>> > Sent: Tue, January 12, 2010 6:10:56 PM
>>> > Subject: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts
>>> >
>>> >
>>> > Lucene 2.9.0 has near real time indexing, writing to a RAMDir which gets
>>> > flushed to disk when you do a search.
>>> >
>>> > Does anybody know how this works out with service restarts (both orderly
>>> > shutdown and a crash)? If the service goes down while indexed items are in
>>> > RAMDir but not on disk, are they lost? Or is there some kind of log
>>> > recovery?
>>> >
>>> > Also, does anybody know the impact of this with clustered Lucene servers?
>>> > If you have numerous servers running off one index, I assume there is no way
>>> > for the other services to pick up the newly indexed items until they are
>>> > flushed to disk, correct? I'd be happy if that is not so, but I suspect it
>>> > is so.
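The "index version" catch-up that Jake describes for Zoie answers the crash question in outline. The sketch below is a minimal illustration of that idea, not Zoie's actual implementation. It assumes a durable, replayable event source in which each event has an increasing version number. After each successful index commit, the highest committed version is stored next to the index. On restart, everything newer than that version is replayed, so items that were only in RAM are indexed again rather than lost. The `Event` class, the `index.version` file name and the helper methods are hypothetical. This sketch uses a plain file only to stay JDK-only; Lucene's commit user data (`IndexWriter.commit` with a user-data map) may be a better home, since it is written together with the commit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CatchUpSketch {

    /** An indexing event from a durable, replayable source (hypothetical, e.g. a message log). */
    static final class Event {
        final long version;
        final String doc;
        Event(long version, String doc) { this.version = version; this.doc = doc; }
    }

    /** Reads the last committed version stored next to the index; 0 if none was ever written. */
    static long readCommittedVersion(Path versionFile) throws IOException {
        if (!Files.exists(versionFile)) return 0L;
        return Long.parseLong(Files.readString(versionFile, StandardCharsets.UTF_8).trim());
    }

    /** Call only after the index commit succeeds, so the stored version never runs ahead of the index. */
    static void writeCommittedVersion(Path versionFile, long version) throws IOException {
        Files.writeString(versionFile, Long.toString(version), StandardCharsets.UTF_8);
    }

    /** On restart: the events the index has not yet durably absorbed. */
    static List<Event> eventsToReplay(List<Event> durableLog, long committedVersion) {
        List<Event> pending = new ArrayList<>();
        for (Event e : durableLog) {
            if (e.version > committedVersion) pending.add(e);
        }
        return pending;
    }

    public static void main(String[] args) throws IOException {
        Path versionFile = Files.createTempDirectory("index").resolve("index.version");
        List<Event> log = List.of(new Event(1, "a"), new Event(2, "b"), new Event(3, "c"));

        // Before the crash: events 1 and 2 were committed; event 3 was only in the RAM buffer.
        writeCommittedVersion(versionFile, 2);

        // After the restart: catch up from the stored version.
        long v = readCommittedVersion(versionFile);
        System.out.println("replay " + eventsToReplay(log, v).size() + " event(s) after version " + v);
    }
}
```

If the process dies between the index commit and the version write, some events are replayed twice. Re-indexing should therefore be idempotent, e.g. by replacing documents by a unique key with `IndexWriter.updateDocument` rather than blindly adding them.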
>>> >
>>> > Thanks,
>>> > John
>>> > --
>>> > View this message in context:
>>> > http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-Service-Crashes-restarts-tp27136539p27136539.html
>>> > Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: java-dev-h...@lucene.apache.org