Thanks for the clarification, Uwe. If the whole idea is a new, more efficient
RAMDirectory implementation, then that's fine. I think the ideas you describe
are interesting.

> Have you tried MMapDir for read access in comparison to RAMDirectory for a
> larger index?
>

I have, and I support the decision not to use RAMDirectory for such cases.
But MMapDir is not recommended on all platforms/JDKs. Second, it cannot be
used on, e.g., HDFS. So sometimes RAMDirectory is the best you can do.

Again, if the whole idea is improving RAMDirectory's implementation, then I
totally agree, and it makes sense. My point was that we should not lose the
ability to load indexes into RAM.

Shai

On Tue, Dec 20, 2011 at 3:36 PM, Uwe Schindler <[email protected]> wrote:

> Hi,
>
> You misunderstood the whole thing. The idea was to possibly replace
> RAMDirectory with a "clone" of MMapDirectory that uses large
> DirectByteBuffers outside the JVM heap. The current RAMDirectory is very
> limited: the buffer size is hardcoded to 8 KB, so if you have a 50 GB index
> in a RAMDirectory, your GC simply goes crazy (we investigated this several
> times for customers; RAMDirectory was in fact several times slower than a
> simple disk-based MMapDir). Also, the locking in the RAMFile class is
> horrible: for large indexes you have to switch buffers many times while
> seeking/reading/..., and every switch takes a lock. In contrast, MMapDir is
> completely lock-free!
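To put the 8 KB figure in perspective, here is a back-of-the-envelope count of chunk objects. This is our own illustration, not from the thread. It assumes "8 KB" means 8192 bytes and that each chunk is a separate object the GC has to track. `ChunkCount` is a hypothetical name.

```java
public class ChunkCount {
    // Number of fixed-size chunks needed to hold totalBytes.
    // Ceiling division: a partially filled last chunk still costs a whole chunk.
    static long chunksNeeded(long totalBytes, long chunkBytes) {
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }

    public static void main(String[] args) {
        long index = 50L << 30;                              // 50 GB index (binary units)
        System.out.println(chunksNeeded(index, 8L << 10));   // 8 KB chunks -> 6553600
        System.out.println(chunksNeeded(index, 1L << 30));   // 1 GB chunks -> 50
    }
}
```

Millions of small heap objects mean far more work for the GC than a few dozen large buffers. Moving the buffers off-heap removes even that remaining work.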
>
> Until there is a replacement, we will not remove it, but the current
> RAMDirectory is not usable for large indexes. That's a limitation, and the
> design of this class does not support anything else. It's currently
> unfixable, and instead of putting work into fixing it, the time should be
> spent on a new ByteBuffer-based RAMDir with larger blocks (or blocks that
> merge), or with an IOContext that helps calculate the file size before
> writing it. For example, when triggering a merge you know the approximate
> size of the file beforehand, so you can allocate a buffer better suited
> than 8 KB. DirectByteBuffers also keep the GC happy, because the RAMDir
> data then lives outside the JVM heap.
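Here is a minimal sketch of the "larger blocks sized from a hint" idea. It is not Lucene code, and every detail is an assumption made for illustration:

* `ByteBufferFile` is a hypothetical class.
* Its chunk-size rule takes the next power of two at least as large as the hint, clamped to the range 8 KB to 1 GB.
* The `expectedSize` parameter stands in for the file-size information an IOContext could supply during a merge.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ByteBufferFileDemo {
    // Hypothetical in-memory file (not a Lucene class): data lives in
    // power-of-two sized ByteBuffer chunks whose size comes from a size hint.
    static final class ByteBufferFile {
        static final int MIN_CHUNK_POWER = 13; // 8 KB, the old fixed size
        static final int MAX_CHUNK_POWER = 30; // 1 GB; a ByteBuffer is int-indexed

        final int chunkPower;
        final boolean direct;                  // true: allocate outside the JVM heap
        final List<ByteBuffer> chunks = new ArrayList<>();
        long length;

        ByteBufferFile(long expectedSize, boolean direct) {
            // smallest power of two >= expectedSize, clamped to [8 KB, 1 GB]
            int power = 64 - Long.numberOfLeadingZeros(Math.max(1L, expectedSize - 1));
            this.chunkPower = Math.max(MIN_CHUNK_POWER, Math.min(MAX_CHUNK_POWER, power));
            this.direct = direct;
        }

        int chunkSize() {
            return 1 << chunkPower;
        }

        void writeBytes(byte[] src, int off, int len) {
            while (len > 0) {
                int index = (int) (length >>> chunkPower);        // which chunk
                int inChunk = (int) (length & (chunkSize() - 1)); // offset inside it
                if (index == chunks.size()) {                     // grow by one chunk
                    chunks.add(direct ? ByteBuffer.allocateDirect(chunkSize())
                                      : ByteBuffer.allocate(chunkSize()));
                }
                int n = Math.min(len, chunkSize() - inChunk);
                ByteBuffer dst = chunks.get(index).duplicate();   // private position
                dst.position(inChunk);
                dst.put(src, off, n);
                length += n;
                off += n;
                len -= n;
            }
        }

        byte readByte(long pos) {
            if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos " + pos);
            return chunks.get((int) (pos >>> chunkPower)).get((int) (pos & (chunkSize() - 1)));
        }
    }

    public static void main(String[] args) {
        ByteBufferFile small = new ByteBufferFile(100, false);      // tiny file: 8 KB chunks
        ByteBufferFile merged = new ByteBufferFile(3L << 20, true); // ~3 MB merge: 4 MB chunks
        System.out.println(small.chunkSize());    // 8192
        System.out.println(merged.chunkSize());   // 4194304

        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        small.writeBytes(data, 0, data.length);
        System.out.println(small.chunks.size());  // 3
        System.out.println(small.readByte(19999)); // 31
    }
}
```

Power-of-two chunks let the code split a file position into a chunk index and an offset with a shift and a mask, with no division needed. The `direct` flag switches between heap and off-heap buffers, which mirrors the "outside or inside JVM heap" option in this thread.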
>
> > Also, RAMDirectory is still more efficient than MMapDirectory, if
> > you want to index (and then search) on a small (sometimes even transient)
> > amount of data
>
> That's not true, as RAMDir spends more time switching buffers than
> reading the data. The problem is that MMapDir does not support **writing**,
> and that's why we plan to improve this. Have you tried MMapDir for read
> access in comparison to RAMDirectory for a larger index? It outperforms
> RAMDirectory several times over (depending on the OS and on whether the
> file data is already in the FS cache). The new directory will simply mimic
> MMapIndexInput and add an MMapIndexOutput. It will not use an mmapped
> buffer, though. It will use an in-memory (Direct)ByteBuffer instead,
> either outside or inside the JVM heap (both will be supported). This
> simplifies the code a lot.
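The lock-free reading idea can be illustrated with plain `java.nio`. This is a hypothetical sketch, not Lucene's MMapIndexInput. `ByteBufferInput`, its fields, and its `clone()` behaviour are invented for illustration.

Each instance, including each clone, reads through its own `duplicate()` views of the shared chunks. Switching chunks or seeking therefore changes only per-instance state and needs no lock, unlike the RAMFile locking described earlier in this thread.

```java
import java.nio.ByteBuffer;

public class LockFreeInputDemo {
    // Hypothetical reader: every instance owns duplicate() views of the
    // shared chunks, so reads and seeks need no synchronized blocks.
    static final class ByteBufferInput {
        private final int chunkPower;
        private final long length;
        private final ByteBuffer[] views; // this instance's private views
        private int current;              // chunk currently being read

        ByteBufferInput(ByteBuffer[] shared, int chunkPower, long length) {
            this.chunkPower = chunkPower;
            this.length = length;
            int count = (int) Math.max(1L, (length + (1L << chunkPower) - 1) >>> chunkPower);
            this.views = new ByteBuffer[count];
            for (int i = 0; i < count; i++) {
                ByteBuffer v = shared[i].duplicate(); // shares bytes, not position/limit
                v.limit((int) Math.min(1L << chunkPower, length - ((long) i << chunkPower)));
                v.position(0);
                views[i] = v;
            }
            seek(0);
        }

        void seek(long pos) {
            if (pos < 0 || pos > length) throw new IndexOutOfBoundsException("pos " + pos);
            int index = (int) (pos >>> chunkPower);
            int offset = (int) (pos & ((1L << chunkPower) - 1));
            if (index == views.length) {      // pos == length exactly on a chunk boundary
                index = views.length - 1;
                offset = views[index].limit();
            }
            current = index;
            views[index].position(offset);
        }

        byte readByte() {
            ByteBuffer b = views[current];
            if (!b.hasRemaining()) {          // end of chunk: move to the next one
                if (current + 1 == views.length) throw new IndexOutOfBoundsException("read past EOF");
                b = views[++current];
                b.position(0);
            }
            return b.get();
        }

        long getFilePointer() {
            return ((long) current << chunkPower) + views[current].position();
        }

        @Override
        public ByteBufferInput clone() {
            // fresh duplicate() views: the clone seeks independently, no bytes copied
            ByteBufferInput c = new ByteBufferInput(views, chunkPower, length);
            c.seek(getFilePointer());
            return c;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int chunkPower = 4;                   // tiny 16-byte chunks to cross boundaries
        ByteBuffer[] shared = new ByteBuffer[3];
        for (int i = 0; i < shared.length; i++) {
            shared[i] = ByteBuffer.allocateDirect(1 << chunkPower); // off-heap
            for (int j = 0; j < (1 << chunkPower); j++) shared[i].put(j, (byte) (i * 16 + j));
        }
        ByteBufferInput in = new ByteBufferInput(shared, chunkPower, 40); // last chunk half used
        in.seek(15);
        ByteBufferInput other = in.clone();
        Thread t = new Thread(() -> { other.seek(39); other.readByte(); }); // own views only
        t.start();
        t.join();
        System.out.println(in.readByte()); // 15
        System.out.println(in.readByte()); // 16 (crossed into chunk 1)
    }
}
```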
>
> The limitations of the crappy RAMDirectory were discussed at conferences,
> sorry. We did **not** decide to remove it (without a patch/replacement).
> The whole "message" on the issue was that RAMDirectory is a bad idea. The
> recommended approach at the moment for handling large in-RAM directories
> is to use a tmpfs on Linux/Solaris with MMapDir on top (for larger
> indexes). The mmap then maps the RAM of the underlying tmpfs directly.
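For the tmpfs route, the JDK primitive that MMapDirectory builds on is `FileChannel.map`. The sketch below is plain JDK code, not Lucene. It assumes `/dev/shm` is a tmpfs mount, which is common on Linux, but check your system. If `/dev/shm` is absent, it falls back to the default temp directory.

A single mapping is limited to `Integer.MAX_VALUE` bytes, so a real directory implementation has to map large files in several chunks.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TmpfsMapDemo {
    // Maps a whole file read-only. On tmpfs the file's pages already live in
    // RAM, so the mapping exposes them directly, outside the JVM heap.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // the mapping stays valid after the channel is closed;
            // sizes above Integer.MAX_VALUE are rejected by map()
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // /dev/shm as a tmpfs mount is an assumption about the host system
        Path dir = Paths.get("/dev/shm");
        if (!Files.isDirectory(dir)) dir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path file = Files.createTempFile(dir, "segment", ".bin");
        file.toFile().deleteOnExit();             // cleanup at JVM exit
        Files.write(file, new byte[] {1, 2, 3, 4});
        MappedByteBuffer buf = mapReadOnly(file); // no copy into the Java heap
        System.out.println(buf.get(2));           // 3
    }
}
```

In Lucene terms, this corresponds to opening an MMapDirectory on a directory that lives on the tmpfs mount, as suggested above.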
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> *From:* Shai Erera [mailto:[email protected]]
> *Sent:* Tuesday, December 20, 2011 2:13 PM
> *To:* [email protected]
> *Subject:* Plans to remove RAMDirectory?
>
> Hi
>
> Uwe mentioned on LUCENE-3653 that there are plans to remove RAMDirectory
> from Trunk and move to tests only: "RAMDirectory is written for tests, not
> for production use. There are already plans to remove it from Lucene trunk
> and move to tests only." (see the full comment at
> https://issues.apache.org/jira/browse/LUCENE-3653?focusedCommentId=13172338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13172338)
>
> I wasn't aware of such plans. Were there emails about it, or has it been
> discussed on IRC?
>
> I disagree that RAMDirectory is useful only for tests. For example, when
> someone wants to index on Hadoop, RAMDirectory can be very useful (even
> though it's not the only solution). Also, RAMDirectory is still more
> efficient than MMapDirectory, if you want to index (and then search) on a
> small (sometimes even transient) amount of data. We use it in several cases
> for such purposes.
>
> If RAMDirectory needs to improve (for instance, allocate bigger byte[]
> chunks), then IMO we should do that, rather than drop it entirely from
> core. I think it's a very valuable Directory implementation that Lucene
> offers, and I'd hate to see it disappear.
>
> Shai
>
