Hi,

I am indexing a set of constantly changing documents. The change rate is moderate (about 10 docs/sec over a 10M-document collection that is 6 GB in total), but I want the index to be right up to date: ideally within a second, though within 5 seconds is acceptable.
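For scale, here is a back-of-the-envelope reading of those numbers. It assumes "6G" means 6 x 10^9 bytes (with 2^30-byte gigabytes the average is closer to 644 bytes). At one refresh per second, each new searcher exists to expose only about ten changed documents out of ten million:

```latex
\begin{aligned}
\text{average document size} &\approx \frac{6\times10^{9}\ \text{bytes}}{10^{7}\ \text{docs}} = 600\ \text{bytes/doc}\\
\text{changes per 1 s refresh} &\approx 10\ \text{docs} = 10^{-6}\ \text{of the collection}\\
\text{changes per day} &\approx 10 \times 86\,400 = 864\,000\ \text{docs} \approx 8.6\%\ \text{of the collection}
\end{aligned}
```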

Right now I have code that adds new documents to the index and deletes old ones using updateDocument() in the 2.1 IndexWriter. In order to see the changes, I need to recreate the IndexReader/IndexSearcher every second or so. I am not calling optimize() on the index in the writer, and the mergeFactor is 10.
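A commonly used way to make a once-per-second swap safe is to reference-count the searcher. Queries acquire the current searcher, the refresher publishes a new one, and the old one is closed only when its last in-flight query releases it. The sketch below is plain Java with no Lucene dependency. FakeSearcher and SearcherHolder are hypothetical stand-ins, and where the comment says "close", real code would close the old IndexReader/IndexSearcher. This pattern makes closing deterministic, but on its own it does not shorten GC pauses.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class SearcherSwapDemo {

    /** Hypothetical stand-in for an IndexSearcher; not a Lucene class. */
    static final class FakeSearcher {
        final int generation;
        // Starts at 1: that reference belongs to the holder that publishes it.
        private final AtomicInteger refCount = new AtomicInteger(1);
        volatile boolean closed = false;

        FakeSearcher(int generation) { this.generation = generation; }

        /** Takes a reference unless the searcher has already been fully released. */
        boolean tryIncRef() {
            while (true) {
                int c = refCount.get();
                if (c <= 0) return false;
                if (refCount.compareAndSet(c, c + 1)) return true;
            }
        }

        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // real code would close the underlying reader here
            }
        }
    }

    /** Hypothetical holder that publishes the current searcher to query threads. */
    static final class SearcherHolder {
        private final AtomicReference<FakeSearcher> current;

        SearcherHolder(FakeSearcher initial) { current = new AtomicReference<>(initial); }

        FakeSearcher acquire() {
            while (true) {
                FakeSearcher s = current.get();
                if (s.tryIncRef()) return s;
                // s was swapped out and fully released in between; retry with the new one
            }
        }

        void release(FakeSearcher s) { s.decRef(); }

        /** Publishes a fresh searcher and drops the holder's reference to the old one. */
        void swap(FakeSearcher fresh) {
            FakeSearcher old = current.getAndSet(fresh);
            old.decRef(); // closes now, or when the last in-flight query releases it
        }
    }

    public static void main(String[] args) {
        SearcherHolder holder = new SearcherHolder(new FakeSearcher(0));
        FakeSearcher inFlight = holder.acquire();   // a query starts on generation 0
        holder.swap(new FakeSearcher(1));           // the periodic refresh
        System.out.println("gen0 closed during query: " + inFlight.closed);
        holder.release(inFlight);                   // the query finishes
        System.out.println("gen0 closed after query: " + inFlight.closed);
        FakeSearcher next = holder.acquire();
        System.out.println("new queries use generation " + next.generation);
        holder.release(next);
    }
}
```

Running main shows generation 0 staying open while the query holds it, being closed once that query releases it, and later queries landing on generation 1.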

The problem I am facing is that the Java GC handles the IndexSearchers I discard very badly. My queries usually take 3 ms, but I get GC pauses of 300 ms to 3 s. (I assume it is collecting the tenured generation during these pauses, which is where my old IndexSearcher ends up.)
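One way to test the tenured-generation assumption is the JVM's standard java.lang.management API, which reports accumulated collection time per collector. The rough sketch below snapshots those times before and after a workload and prints the difference for each collector. The allocation loop is only a placeholder for a refresh cycle. Collector names depend on the JVM and flags (with the CMS flags they are typically "ParNew" and "ConcurrentMarkSweep"), and the figure reported is total collection time, which for a concurrent collector is not the same as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcDeltaReport {

    /** Accumulated collection time in ms, keyed by collector name. */
    static Map<String, Long> snapshot() {
        Map<String, Long> times = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the JVM does not report it; clamp to 0
            times.put(gc.getName(), Math.max(0L, gc.getCollectionTime()));
        }
        return times;
    }

    /** Time each collector spent between two snapshots. */
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> d = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            d.put(e.getKey(), e.getValue() - before.getOrDefault(e.getKey(), 0L));
        }
        return d;
    }

    public static void main(String[] args) {
        Map<String, Long> before = snapshot();
        // Placeholder workload: allocate and drop 1 MB arrays, standing in for
        // the old searcher that becomes garbage on each refresh.
        long allocated = 0;
        for (int i = 0; i < 200; i++) {
            byte[] chunk = new byte[1 << 20];
            allocated += chunk.length;
        }
        Map<String, Long> after = snapshot();
        System.out.println("allocated bytes: " + allocated);
        for (Map.Entry<String, Long> e : delta(before, after).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " ms");
        }
    }
}
```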

I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC", and calling System.gc() right after I close the old index, without much luck: the pauses drop to about 1 s, but I get three times as many of them, and I want pauses under 25 ms. So my question is: should I avoid reloading my index this way? Should I keep a separate IndexReader (which only deletes old documents) and another one for new documents? Is there a standard technique for a rapidly changing index?

Thanks,

Tim

