Thanks Paul! Using your suggestion, I have changed the update check code to use only the indexReader:
try { localReader = IndexReader.open(path); while (keyIter.hasNext()) { key = (String) keyIter.next(); term = new Term("key", key); TermDocs tDocs = localReader.termDocs(term); if (tDocs != null) { try { while (tDocs.next()) { localReader.delete(tDocs.doc()); } } finally { tDocs.close(); } } } } finally { if (localReader != null) { localReader.close(); } } Unfortunately it didn't seem to make any dramatic difference. I also see the CPU is only 30-50% busy, so I am guessing it's spending a lot of time in IO. Anyway of making the CPU work harder? Is batch size of 500 too small for 1 million documents? Currently I am seeing a linear speed degredation of 0.3 milliseconds per document. Thanks -John On Wed, 24 Nov 2004 09:05:39 +0100, Paul Elschot <[EMAIL PROTECTED]> wrote: > On Wednesday 24 November 2004 00:37, John Wang wrote: > > > > Hi: > > > > I am trying to index 1M documents, with batches of 500 documents. > > > > Each document has an unique text key, which is added as a > > Field.KeyWord(name,value). > > > > For each batch of 500, I need to make sure I am not adding a > > document with a key that is already in the current index. > > > > To do this, I am calling IndexSearcher.docFreq for each document and > > delete the document currently in the index with the same key: > > > > while (keyIter.hasNext()) { > > String objectID = (String) keyIter.next(); > > term = new Term("key", objectID); > > int count = localSearcher.docFreq(term); > > To speed this up a bit make sure that the iterator gives > the terms in sorted order. I'd use an index reader instead > of a searcher, but that will probably not make a difference. > > Adding the documents can be done with multiple threads. > Last time I checked that, there was a moderate speed up > using three threads instead of one on a single CPU machine. > Tuning the values of minMergeDocs and maxMergeDocs > may also help to increase performance of adding documents. > > Regards, > Paul Elschot > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]