Hi,

in my application I have to write tons of small documents to the index, but with a twist. Many of the documents are actually aggregations of pieces of information that arrive in a data stream, usually close together, but interleaved with information belonging to other documents.

When information a1 for my document A arrives, I create my A-object, store it with index.addDocument() and forget about it. Later, when a2 arrives, I fetch A from the index, delete it from the index, update it, and store its updated version. To fetch it from the index, I use a reader retrieved with IndexReader.openIfChanged(). So for one piece of information I have roughly the following sequence:

  get a fresh searcher via IndexReader.openIfChanged()
  find the previously stored document, if any
  if (document already in the index) {
    update the document object
    writer.deleteDocuments(new Term(IDFIELD, id))
  } else {
    create a new document object
  }
  writer.addDocument(doc)

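In code, the sequence looks roughly like this (a sketch against the Lucene 3.x API; `reader`, `writer`, `IDFIELD`, and the helpers `mergeNewInfo()` / `createDocument()` are placeholders for my application's own setup and merge logic):

```java
// Refresh the reader only if the index has changed since it was opened.
IndexReader newReader = IndexReader.openIfChanged(reader);
if (newReader != null) {
    reader.close();
    reader = newReader;
}
IndexSearcher searcher = new IndexSearcher(reader);

// Look up the previously stored document for this id, if any.
TopDocs hits = searcher.search(new TermQuery(new Term(IDFIELD, id)), 1);
Document doc;
if (hits.totalHits > 0) {
    // Merge the new piece of information into the stored document,
    // then remove the stale copy from the index.
    doc = reader.document(hits.scoreDocs[0].doc);
    mergeNewInfo(doc, newInfo);                     // application-specific
    writer.deleteDocuments(new Term(IDFIELD, id));
} else {
    doc = createDocument(id, newInfo);              // application-specific
}
writer.addDocument(doc);
```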

The overall speed is not too bad, but I wonder if more is possible. I changed RAMBufferSizeMB from the default 16 to 200 but saw no improvement in speed.
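For reference, this is how I set the buffer size (a configuration fragment; in Lucene 3.x the setter lives on IndexWriterConfig, and `directory` / `analyzer` come from my application):

```java
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
config.setRAMBufferSizeMB(200.0);  // raised from the 16 MB default
IndexWriter writer = new IndexWriter(directory, config);
```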

I would think that keeping documents in RAM for some time, so that many updates happen in RAM rather than being written to disk, would improve the overall running time.

Any hints on how to configure and use Lucene to improve the speed, without layering my own caching on top of it?

Harald.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org