Re: java gc with a frequently changing index?

Kay Roepke Mon, 30 Jul 2007 15:50:10 -0700

Hi Tim!

On Jul 25, 2007, at 8:41 PM, Tim Sturge wrote:

I am indexing a set of constantly changing documents. The change rate is moderate (about 10 docs/sec over a 10M document collection with a 6G total size) but I want to be right up to date (ideally within a second but within 5 seconds is acceptable) with the index.

We have a change rate between 2-3 and 60 docs/sec over a somewhat smaller index (but not too much smaller). We are actually reopening IndexSearchers every five seconds, or sooner if the number of index changes exceeds a certain threshold (100 changes IIRC). The latter is to guard against spikes in updates that we like to see reflected earlier. This is purely an implementation detail, though.
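A minimal sketch of such a reopen trigger, for illustration only. The class and method names (ReopenPolicy, recordChange, shouldReopen, markReopened) are hypothetical and not part of Lucene, and the 5 second / 100 change values are simply the ones mentioned above. It only decides when to reopen; the actual reopening of the IndexSearcher is left out.

```java
/**
 * Hypothetical helper: says "reopen" once the current searcher is older than
 * maxAgeMillis, or once at least maxPendingChanges updates have accumulated.
 */
class ReopenPolicy {
    private final long maxAgeMillis;
    private final int maxPendingChanges;
    private int pendingChanges;      // changes seen since the last reopen
    private long lastReopenMillis;   // time of the last reopen

    ReopenPolicy(long maxAgeMillis, int maxPendingChanges, long nowMillis) {
        this.maxAgeMillis = maxAgeMillis;
        this.maxPendingChanges = maxPendingChanges;
        this.lastReopenMillis = nowMillis;
    }

    /** Call once for every add/update/delete applied to the index. */
    synchronized void recordChange() {
        pendingChanges++;
    }

    /** True if the searcher is too old or an update spike crossed the threshold. */
    synchronized boolean shouldReopen(long nowMillis) {
        boolean tooOld = nowMillis - lastReopenMillis >= maxAgeMillis;
        boolean spike = pendingChanges >= maxPendingChanges;
        return tooOld || spike;
    }

    /** Call after a new searcher has been opened successfully. */
    synchronized void markReopened(long nowMillis) {
        pendingChanges = 0;
        lastReopenMillis = nowMillis;
    }

    public static void main(String[] args) {
        ReopenPolicy policy = new ReopenPolicy(5000, 100, System.currentTimeMillis());
        for (int i = 0; i < 100; i++) {
            policy.recordChange();
        }
        System.out.println(policy.shouldReopen(System.currentTimeMillis()));
    }
}
```

Passing the clock value in as a parameter is just a convenience for testing; a real implementation might call System.currentTimeMillis() internally.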

Right now I have code that adds new documents to the index and deletes old ones using updateDocument() in the 2.1 IndexWriter. In order to see the changes, I need to recreate the IndexReader/IndexSearcher every second or so. I am not calling optimize() on the index in the writer, and the mergeFactor is 10.

Is there a separation between the code that inserts/updates and the code that searches? We have that distinction and it's been working great. It might not be possible for your application (I simply don't know what your objectives are) but it might be worth considering. In other words, we have separate VMs doing the updates and searches, so we can set different heap sizes and GC strategies.

The problem I am facing is that java gc is terrible at collecting the IndexSearchers I am discarding. I usually have a 3msec query time, but I get gc pauses of 300msec to 3 sec (I assume it is collecting the "tenured" generation in these pauses, which is my old IndexSearcher)

We used to have that, too, until we switched GC algorithms. It was unbearable.

I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" and calling System.gc() right after I close the old index without much luck (I get the pauses down to 1sec, but get 3x as many. I want < 25 msec pauses). So my question is, should I be avoiding reloading my index in this way? Should I keep a separate IndexReader (which only deletes old documents) and one for new documents? Is there a standard technique for a quickly changing index?

So, these are the settings we use for the search application (this is Java 6, though, YMMV):

-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycleMin=0
-XX:CMSIncrementalDutyCycle=10

You might have to tweak the generation sizes for your application. That is rather tricky business, but

-verbosegc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps

might help you to figure out what the correct sizes are. Those settings should also tell you whether your tweaks are actually working for you.

System.gc() is just asking for trouble, really. I have yet to see a situation where it really helped me. The best way is to figure out the right settings for the GC itself, and then forget about it. It actually took some experimenting and load-testing to find the right mixture for us.
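As a complement to the logging flags, the cumulative GC counters can also be read from inside the VM through the standard java.lang.management API (available since Java 5). This is only a sketch of one way to watch the effect of a settings change; the GcStats class name is made up for illustration, and the numbers are totals per collector, not individual pause lengths, so the GC log remains the place to look for single long pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that summarizes the collectors of the running VM. */
class GcStats {
    /** Returns one line per collector: name, collection count, accumulated time. */
    static String summarize() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", totalMillis=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize());
    }
}
```

Sampling this periodically under load and comparing the deltas before and after a change to the GC flags gives a rough picture of whether the change helped.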

GC pauses aren't user-noticeable in our application (which is web-based). Given our architecture we have a certain amount of latency between a document change and the reflection of that in the index, but it is not limited by GC. The machines are 64bit P4 Xeons with 4GB RAM, so nothing out of the ordinary.

Java 6 made a noticeable difference for us, on the order of some 10% performance increase, both in load average and response time.

We have yet to encounter problems with it...

The updating part of the application runs with a simple -XX:+UseParallelGC and its max heap size is much smaller.

Also we are using a custom refcounted scheme for index searchers, so that new requests always get the latest IndexSearcher opened. We reopen searchers constantly, as I mentioned above. This pretty much ensures that we meet our 5 second max delay time. I cannot say that it actually takes that long to reopen, though we have made some modifications to the Lucene core which should make it even slower to reopen and write to disc. So I guess this is not your bottleneck, either.
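To illustrate the general idea of a refcounted searcher (this is not our actual code, just a minimal sketch): every request acquires the current searcher and releases it when done, and a swapped-out searcher is closed only once its last user has released it. RefCounted and SearcherHolder are hypothetical names, and the searcher is modelled as a plain java.io.Closeable so that no Lucene API is assumed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical wrapper: closes the resource when the last reference is released. */
class RefCounted<T extends Closeable> {
    private final T resource;
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = the holder's own reference

    RefCounted(T resource) { this.resource = resource; }

    T get() { return resource; }

    /** Takes a reference unless the count already dropped to zero (resource closed). */
    boolean tryAcquire() {
        for (;;) {
            int n = refs.get();
            if (n == 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Drops one reference; the last release closes the resource. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

/** Hypothetical holder that always hands out the most recently opened searcher. */
class SearcherHolder<T extends Closeable> {
    private final AtomicReference<RefCounted<T>> current;

    SearcherHolder(T initial) {
        current = new AtomicReference<>(new RefCounted<>(initial));
    }

    /** Callers must call release() on the returned wrapper when the request is done. */
    RefCounted<T> acquire() {
        for (;;) {
            RefCounted<T> rc = current.get();
            if (rc.tryAcquire()) return rc;
            // Lost a race with swap(); loop and pick up the newer searcher.
        }
    }

    /** Installs a freshly opened searcher and drops the holder's reference to the old one. */
    void swap(T fresh) {
        RefCounted<T> old = current.getAndSet(new RefCounted<>(fresh));
        old.release();
    }
}
```

A request would call acquire(), run its query against get(), and call release() in a finally block; the reopening thread calls swap() with the new searcher, so in-flight queries finish on the old one before it is closed.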


HTH,
-k
--
Kay Röpke
http://classdump.org/




