Re: Realtime Search

robert engels Tue, 23 Dec 2008 18:36:58 -0800

Is there something that I am missing? I see lots of references to using "memory mapped" files to "dramatically" improve performance.

I don't think this is the case at all. At the lowest levels, it is somewhat more efficient from a CPU standpoint, but with a decent OS cache the IO performance difference is going to be negligible.

The primary benefit of memory mapped files is simplicity in code (although in Java there is another layer needed - think C), and the file can be treated as a randomly accessible memory array.
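To make the "memory array" point concrete, here is a minimal Java sketch, assuming a small file that fits in a single mapping. The class and method names (MappedReadSketch, mappedByteAt) are made up for illustration and are not from Lucene. The MappedByteBuffer is the extra layer mentioned above: access goes through get(index) rather than a raw pointer.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of any library.
final class MappedReadSketch {

    // Maps the whole file read-only and reads one byte by absolute index.
    // Assumes the file is smaller than 2 GB, the limit of a single mapping.
    static byte mappedByteAt(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Absolute get: reads like array indexing. A page fault may
            // occur here if the page is not resident.
            return buf.get(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});
        System.out.println(mappedByteAt(tmp, 2));
    }
}
```

The code is shorter than a seek-and-read loop, which is the simplicity benefit; it says nothing by itself about IO speed.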

From my OS design experience, the page at http://en.wikipedia.org/wiki/Memory-mapped_file is incorrect.

Even if the memory mapped file is mapped into the virtual memory space, unless you have specialized memory controllers and disk systems, when a page fault occurs, the OS loads the page just as any other.

The difference with direct IO is that there is first a simple translation from position to disk page, and the OS disk page cache is checked. Almost exactly the same thing occurs with a memory mapped file.

The memory address is accessed; if the page is not in memory, a page fault occurs, and the page is loaded from the file (it may be loaded from the OS disk cache in this process).
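For comparison, the ordinary read path can be sketched in Java as a positional read. This is only an illustration under stated assumptions: the names (PositionalReadSketch, pageIndex, positionalByteAt) are hypothetical, and the 4096-byte page size is an assumed value that differs between systems. The position-to-page translation is shown explicitly only to mirror the description; in practice the OS does it inside the read call.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of any library.
final class PositionalReadSketch {

    // Assumed page size; real systems may use a different value.
    static final int ASSUMED_PAGE_SIZE = 4096;

    // The "simple translation from position to disk page".
    static long pageIndex(long position) {
        return position / ASSUMED_PAGE_SIZE;
    }

    // Reads one byte at an absolute file position without moving the
    // channel's own position. The OS serves it from its page cache if the
    // page is resident, otherwise it loads the page from disk first.
    static byte positionalByteAt(Path file, long position) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer one = ByteBuffer.allocate(1);
            if (ch.read(one, position) != 1) {
                throw new EOFException("no byte at position " + position);
            }
            return one.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[2 * ASSUMED_PAGE_SIZE];
        data[5000] = 42;
        Files.write(tmp, data);
        System.out.println("page " + pageIndex(5000) + ", byte " + positionalByteAt(tmp, 5000));
    }
}
```

Either way the byte comes out of the same OS page cache; only the lookup in front of it differs.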

The point being, if the page is not in the cache (which is probably the case with a large index), the time to load the page is far greater than the difference between the IO address translation and the memory address lookup.

If all of the pages of the index can fit in memory, a properly configured system is going to have them in the page cache anyway...




On Dec 23, 2008, at 8:22 PM, Marvin Humphrey wrote:

On Tue, Dec 23, 2008 at 05:51:43PM -0800, Jason Rutherglen wrote:

Are there other implementation options?


Here's the plan for Lucy/KS:

  1) Design index formats that can be memory mapped rather than slurped,
     bringing the cost of opening/reopening an IndexReader down to a
     negligible level.
  2) Enable segment-centric sorted search. (LUCENE-1483)
  3) Implement tombstone-based deletions, so that the cost of deleting
     documents scales with the number of deletions rather than the size of the
     index.
  4) Allow 2 concurrent writers: one for small, fast updates, and one for
     big background merges.
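
As a rough illustration of item 3, here is a minimal Java sketch of a tombstone set. It is an assumption about what such a structure could look like, not the Lucy/KS design; the class and method names are hypothetical. The idea it shows is that recording a deletion stores only the deleted document id, so the storage and the work grow with the number of deletions rather than with the number of documents in the index.

```java
import java.util.TreeSet;

// Hypothetical illustration class, not the Lucy/KS or Lucene implementation.
final class TombstoneSketch {

    // One entry per deleted document; nothing is allocated per live document.
    private final TreeSet<Integer> tombstones = new TreeSet<>();

    // Records a deletion by adding a tombstone for the document id.
    void delete(int docId) {
        tombstones.add(docId);
    }

    // A document is live unless a tombstone exists for it.
    boolean isLive(int docId) {
        return !tombstones.contains(docId);
    }

    // Number of tombstones currently stored.
    int tombstoneCount() {
        return tombstones.size();
    }

    public static void main(String[] args) {
        TombstoneSketch deletions = new TombstoneSketch();
        deletions.delete(7);
        System.out.println(deletions.isLive(7) + " " + deletions.isLive(8));
    }
}
```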

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org


