We've discussed realtime search before, and it looks like after the next
release we can get some form of realtime search working.  I was going to
open a new issue but decided it would be better to discuss realtime search
on the dev list first.

Realtime search in Lucene can be defined as the ability to add, update, or
delete documents with latency under 5 milliseconds.  A couple of different
implementation options are available.

1) Expose a rolling set of realtime readers over the memory index used by
IndexWriter.  This requires incrementally updating field caches and filters,
and it is somewhat unclear how IndexReader versioning would work (for
example, versions of the term dictionary).
2) Implement realtime search by incrementally creating and merging readers
in memory.  The system would use MemoryIndex or InstantiatedIndex to quickly
(more quickly than RAMDirectory) create indexes from added documents.  The
in-memory indexes would be periodically merged in the background and,
depending on the RAM used, written to disk.  Each update would generate a new
IndexReader or MultiSearcher that includes the new updates.  Field caches
and filters could be cached per IndexReader, as Lucene does today.  The
downside of this approach is that indexing will not be as fast as #1 because
of the in-memory merging, which is similar to how Lucene before 2.3 merged
in-memory segments using RAMDirectory.
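
To make option #2 more concrete, here is a rough sketch in plain Java.  It
does not use any Lucene classes: RamSegment, SnapshotReader and
RealtimeIndexSketch are hypothetical names, the RAM estimate is a crude two
bytes per character, and "flushing to disk" is simulated by moving the merged
segment to a second list.  It only illustrates the flow of one small segment
per update, a new point-in-time reader per update, and a merge that flushes
once the RAM budget is exceeded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Immutable in-memory segment; a stand-in for a MemoryIndex or InstantiatedIndex. */
final class RamSegment {
    final List<String> docs;

    RamSegment(List<String> docs) {
        this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    /** Crude RAM estimate (assumption): two bytes per character. */
    long ramBytes() {
        long bytes = 0;
        for (String d : docs) bytes += 2L * d.length();
        return bytes;
    }
}

/** Point-in-time view over a fixed set of segments; a stand-in for an IndexReader. */
final class SnapshotReader {
    private final List<RamSegment> segments;

    SnapshotReader(List<RamSegment> segments) {
        this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
    }

    int numDocs() {
        int n = 0;
        for (RamSegment s : segments) n += s.docs.size();
        return n;
    }

    /** Counts documents containing the given substring (no real analysis). */
    int countMatches(String term) {
        int hits = 0;
        for (RamSegment s : segments)
            for (String d : s.docs)
                if (d.contains(term)) hits++;
        return hits;
    }
}

final class RealtimeIndexSketch {
    private final long ramBudgetBytes;
    private final List<RamSegment> ram = new ArrayList<>();
    private final List<RamSegment> flushed = new ArrayList<>(); // simulated disk

    RealtimeIndexSketch(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Each update creates a one-document segment and returns a fresh reader. */
    synchronized SnapshotReader addDocument(String doc) {
        ram.add(new RamSegment(Collections.singletonList(doc)));
        return openReader();
    }

    /** Merges all in-memory segments; flushes the result if it exceeds the budget. */
    synchronized void mergeInMemory() {
        if (ram.isEmpty()) return;
        List<String> all = new ArrayList<>();
        for (RamSegment s : ram) all.addAll(s.docs);
        RamSegment merged = new RamSegment(all);
        ram.clear();
        if (merged.ramBytes() > ramBudgetBytes) flushed.add(merged);
        else ram.add(merged);
    }

    synchronized SnapshotReader openReader() {
        List<RamSegment> view = new ArrayList<>(flushed);
        view.addAll(ram);
        return new SnapshotReader(view);
    }

    synchronized int ramSegmentCount() { return ram.size(); }

    synchronized int flushedSegmentCount() { return flushed.size(); }
}
```

Readers handed out earlier keep seeing the segments they were opened on, so
per-reader caches stay valid.  In a real implementation mergeInMemory() would
run on a background thread rather than being called directly.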

Are there other implementation options?

A new patch would focus on providing in-memory indexing as part of the core
of Lucene.  The work of LUCENE-1483 and LUCENE-1314 would be used.  I am not
sure whether option #2 can become part of core if it relies on a contrib
module.
It makes sense to provide a new realtime-oriented merge policy that merges
segments based on the number of deletes rather than a merge factor.  The
realtime merge policy would keep the segments within a minimum and maximum
size in kilobytes to limit the time consumed by merging, which is assumed
to occur frequently.
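
As a rough illustration of how such a merge policy might choose segments,
here is a sketch in plain Java.  It is not written against Lucene's
MergePolicy API; SegmentStats and RealtimeMergeSelector are hypothetical
names, and the delete-ratio threshold and the rule that undersized segments
are also merge candidates are my assumptions about how the minimum and
maximum sizes would be enforced.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-segment statistics; not a Lucene class. */
final class SegmentStats {
    final String name;
    final int docCount;
    final int deletedDocs;
    final long sizeKB;

    SegmentStats(String name, int docCount, int deletedDocs, long sizeKB) {
        this.name = name;
        this.docCount = docCount;
        this.deletedDocs = deletedDocs;
        this.sizeKB = sizeKB;
    }

    double deleteRatio() {
        return docCount == 0 ? 0.0 : (double) deletedDocs / docCount;
    }
}

/** Picks merge candidates by delete ratio and size bounds, not by merge factor. */
final class RealtimeMergeSelector {
    private final long minSizeKB;
    private final long maxSizeKB;
    private final double minDeleteRatio;

    RealtimeMergeSelector(long minSizeKB, long maxSizeKB, double minDeleteRatio) {
        this.minSizeKB = minSizeKB;
        this.maxSizeKB = maxSizeKB;
        this.minDeleteRatio = minDeleteRatio;
    }

    List<SegmentStats> selectForMerge(List<SegmentStats> segments) {
        List<SegmentStats> picked = new ArrayList<>();
        long totalKB = 0;
        for (SegmentStats s : segments) {
            boolean tooSmall = s.sizeKB < minSizeKB;
            boolean manyDeletes = s.deleteRatio() >= minDeleteRatio;
            if (!tooSmall && !manyDeletes) continue;
            // Cap the input size so each merge stays short.
            if (totalKB + s.sizeKB > maxSizeKB) continue;
            picked.add(s);
            totalKB += s.sizeKB;
        }
        // Rewriting a lone segment only pays off if it reclaims deletes.
        if (picked.size() == 1 && picked.get(0).deletedDocs == 0) {
            picked.clear();
        }
        return picked;
    }
}
```

The cap is applied to the combined input size, which is only an upper bound
on the merged segment's size, since deleted documents are dropped during the
merge.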

LUCENE-1313, which includes a transaction log with rollback and was designed
with distributed search in mind, may be retired or have its components split
out.
