Is there something that I am missing? I see lots of references to using "memory mapped" files to "dramatically" improve performance.

I don't think this is the case at all. At the lowest levels, it is somewhat more efficient from a CPU standpoint, but with a decent OS cache the IO performance difference is going to be negligible.

The primary benefit of memory mapped files is simplicity in code (although in Java another layer is needed; think C): the file can be treated as a randomly accessible memory array.

From my OS design experience, the page at http://en.wikipedia.org/wiki/Memory-mapped_file is incorrect.

Even if the memory mapped file is mapped into the virtual memory space, unless you have specialized memory controllers and disk systems, when a page fault occurs the OS loads the page just like any other.

The difference with direct IO is that there is first a simple translation from file position to disk page, and then the OS disk page cache is checked. Almost exactly the same thing occurs with a memory mapped file.

The memory address is accessed; if the page is not in memory, a page fault occurs and the page is loaded from the file (it may be loaded from the OS disk cache in this process).
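To make the two access paths concrete, here is a minimal Java sketch that reads the same 8-byte value once through a positional read and once through a mapped buffer. The class name, method names, and file layout are made up for illustration; nothing here is taken from Lucene or Lucy/KS. It uses only standard java.nio calls.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
public class MappedVsPositionalRead {

    // Explicit IO: the offset is handed to the OS, which serves the bytes
    // from its page cache, or from disk on a cache miss.
    static long readLongPositional(FileChannel ch, long offset) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n < 0) {
                throw new EOFException("offset past end of file");
            }
        }
        buf.flip();
        return buf.getLong();
    }

    // Mapped IO: the file looks like an in-memory array. Touching a page
    // that is not resident triggers a page fault, and the OS loads it.
    static long readLongMapped(MappedByteBuffer map, int offset) {
        return map.getLong(offset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        // Deleting a file that is still mapped can fail on some platforms,
        // so cleanup is left to JVM exit.
        file.toFile().deleteOnExit();

        // Write 1024 longs; entry i holds i * i.
        ByteBuffer data = ByteBuffer.allocate(1024 * Long.BYTES);
        for (long i = 0; i < 1024; i++) {
            data.putLong(i * i);
        }
        Files.write(file, data.array());

        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int offset = 500 * Long.BYTES;
            long viaRead = readLongPositional(ch, offset);
            long viaMap = readLongMapped(map, offset);
            System.out.println(viaRead + " " + viaMap);
        }
    }
}
```

Both calls return the same value. The mapped version is shorter, which is the code-simplicity benefit, but in either case a page that is not resident still has to be fetched by the OS.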

The point being, if the page is not in the cache (which is probably the case with a large index), the time to load the page is far greater than the difference between the IO address translation and the memory address lookup.
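A rough way to write this argument down is a simple expected-cost model. The symbols are introduced here purely for illustration and are not measurements: h is the page-cache hit rate, t_miss is the time to load a page from disk, and c_io and c_map are the per-access overheads of an explicit read and of a mapped access respectively.

```latex
\begin{align*}
T_{\text{io}}  &= c_{\text{io}}  + (1 - h)\, t_{\text{miss}} \\
T_{\text{map}} &= c_{\text{map}} + (1 - h)\, t_{\text{miss}} \\
T_{\text{io}} - T_{\text{map}} &= c_{\text{io}} - c_{\text{map}} \\
\frac{T_{\text{io}} - T_{\text{map}}}{T_{\text{map}}}
  &= \frac{c_{\text{io}} - c_{\text{map}}}{c_{\text{map}} + (1 - h)\, t_{\text{miss}}}
\end{align*}
```

Under this model the disk term is identical on both paths, so the relative gain from mapping shrinks toward zero whenever (1 - h) t_miss is much larger than c_io, which is the large-index case described above.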

If all of the pages of the index can fit in memory, a properly configured system is going to have them in the page cache anyway....



On Dec 23, 2008, at 8:22 PM, Marvin Humphrey wrote:

On Tue, Dec 23, 2008 at 05:51:43PM -0800, Jason Rutherglen wrote:

Are there other implementation options?

Here's the plan for Lucy/KS:

1) Design index formats that can be memory mapped rather than slurped,
   bringing the cost of opening/reopening an IndexReader down to a
   negligible level.
2) Enable segment-centric sorted search. (LUCENE-1483)
3) Implement tombstone-based deletions, so that the cost of deleting
   documents scales with the number of deletions rather than the size of the
   index.
4) Allow 2 concurrent writers: one for small, fast updates, and one for
   big background merges.

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



