Hi,
I am indexing a set of constantly changing documents. The change rate is
moderate (about 10 docs/sec over a 10M document collection with a 6 GB
total size), but I want the index to be right up to date: ideally within
a second, though within 5 seconds is acceptable.
Right now I have code that adds new documents to the index and deletes
old ones using updateDocument() in the 2.1 IndexWriter. In order to see
the changes, I need to recreate the IndexReader/IndexSearcher every
second or so. I am not calling optimize() on the index in the writer,
and the mergeFactor is 10.
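A minimal sketch of the reload pattern described above, for illustration
only. It is not the actual application code: SearcherHolder is a
hypothetical class, and a generic AutoCloseable stands in for the
IndexSearcher so that the example runs without Lucene on the classpath.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder that swaps in a fresh "searcher" and closes the old one. */
final class SearcherHolder<S extends AutoCloseable> {
    private final Supplier<S> opener;          // opens a new searcher over the current index
    private final AtomicReference<S> current;  // searcher handed out to query threads

    SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Returns the searcher that queries should use right now. */
    S get() {
        return current.get();
    }

    /** Opens a new searcher, publishes it, then closes the one it replaced. */
    void reload() {
        S fresh = opener.get();
        S old = current.getAndSet(fresh);
        try {
            // Assumes no in-flight query is still using the old searcher.
            old.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close old searcher", e);
        }
    }
}
```

Calling reload() once a second from a timer thread matches the behaviour
described: each call leaves one discarded searcher behind for the
garbage collector.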
The problem I am facing is that the Java GC is terrible at collecting
the IndexSearchers I am discarding. I usually have a 3 msec query time,
but I get GC pauses of 300 msec to 3 sec (I assume it is collecting the
"tenured" generation in these pauses, which holds my old IndexSearcher).
I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" and
calling System.gc() right after I close the old index, without much
luck: I get the pauses down to 1 sec, but get 3x as many, and I want
pauses under 25 msec. So my question is, should I be avoiding reloading
my index in this way? Should I keep a separate IndexReader (which only
deletes old documents) and one for new documents? Is there a standard
technique for a quickly changing index?
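One way to check which collector accounts for the pauses mentioned above
is to read the JVM's per-collector counters. The sketch below uses the
standard java.lang.management API; the GcReport class name is made up
for illustration. Note that the reported time is cumulative per
collector, not the length of any single pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that summarises the JVM's garbage collector counters. */
final class GcReport {
    /** Returns one line per collector: name, collection count, cumulative time in ms. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", totalMs=" + gc.getCollectionTime());
        }
        return lines;
    }
}
```

Taking a snapshot before and after a slow query and comparing the
counts shows whether an old-generation collection ran in between.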
Thanks,
Tim