Done: https://issues.apache.org/jira/browse/LUCENE-3659

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: DM Smith [mailto:[email protected]]
> Sent: Tuesday, December 20, 2011 4:08 PM
> To: [email protected]
> Subject: Re: Plans to remove RAMDirectory?
> 
> How about an issue to track this? I'd be glad to do it, but I'm not really the
> "reporter" for it.
> 
> -- DM
> 
> On 12/20/2011 09:51 AM, Shai Erera wrote:
> > Thanks for the clarification Uwe. If the whole idea is a new
> > RAMDirectory implementation, that is more efficient, then it's ok. I
> > think that the ideas you write are interesting.
> >
> >> Have you tried MMapDir for read access in comparison to RAMDirectory
> >> for a larger index
> >>
> > I have, and I support the decision not to use RAMDirectory for such
> > cases.
> > BUT, MMapDir is not recommended for use on all platforms / JDKs.
> > Second, it cannot be used on e.g. HDFS. So sometimes RAMDirectory is
> > the best you can do.
> >
> > Again, if the whole idea is improving RAMDirectory's implementation,
> > then that I totally agree with and it makes sense. My point was that
> > we should not lose the ability to load indexes into RAM.
> >
> > Shai
> >
> > On Tue, Dec 20, 2011 at 3:36 PM, Uwe Schindler<[email protected]>  wrote:
> >
> >> Hi,
> >>
> >> You misunderstood the whole thing. The idea was to maybe replace
> >> RAMDirectory with a “clone” of MMapDirectory that uses large
> >> DirectByteBuffers outside the JVM heap. The current RAMDirectory is
> >> very limited: the buffer size is hardcoded to 8 KB, so if you have a
> >> 50 gigabyte index in this RAMDirectory, your GC simply goes crazy. We
> >> investigated this several times for customers, and RAMDirectory was in
> >> fact several times slower than a simple disk-based MMapDir. Also, the
> >> locking on the RAMFile class is horrible: for large indexes you
> >> have to switch buffers many times when seeking/reading/…, which
> >> involves heavy locking. In contrast, MMapDir is completely
> >> lock-free!
> >>
> >> Until there is a replacement we will not remove it, but the current
> >> RAMDirectory is not usable for large indexes. That’s a limitation,
> >> and the design of this class does not support anything else. It’s
> >> currently unfixable, and instead of putting work into fixing it, the
> >> time should be spent on a new ByteBuffer-based RAMDir with
> >> larger blocks/blocks that merge, or IOContext helping to calculate the
> >> file size before writing it (e.g. when triggering a merge you know
> >> the approximate size of the file beforehand, so you can allocate a
> >> buffer larger than 8 kilobytes). Also, a direct ByteBuffer helps to keep
> >> the GC happy, as the RAMDir is outside the JVM heap.
> >>
> >> > Also, RAMDirectory is still more efficient than MMapDirectory,
> >> > if you want to index (and then search) on a small (sometimes even
> >> > transient) amount of data
> >>
> >> That’s not true, as RAMDir spends more time switching buffers than
> >> reading the data. The problem is that MMapDir does not support
> >> *writing*, and that is why we plan to improve this. Have you tried
> >> MMapDir for read access in comparison to RAMDirectory for a larger
> >> index? It is several times faster (depending on the OS and whether the
> >> file data is already in the FS cache).
> >> The new directory will simply mimic the MMapIndexInput and add an
> >> MMapIndexOutput, but based not on an mmapped buffer but on an
> >> in-memory (Direct)ByteBuffer (outside or inside the JVM heap; both will
> >> be supported). This simplifies the code a lot.
> >>
> >> The limitations of the crappy RAMDirectory were
> >> discussed at conferences, sorry. We did *not* decide to remove it
> >> (without a patch/replacement). The whole “message” on the issue was
> >> that RAMDirectory is a bad idea. The recommended approach at the
> >> moment for handling large in-RAM directories is to use a tmpfs on
> >> Linux/Solaris with MMapDir on top (for larger indexes). The mmap
> >> would then directly map the RAM of the underlying tmpfs.
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: [email protected]
> >>
> >> From: Shai Erera [mailto:[email protected]]
> >> Sent: Tuesday, December 20, 2011 2:13 PM
> >> To: [email protected]
> >> Subject: Plans to remove RAMDirectory?
> >>
> >> Hi
> >>
> >> Uwe mentioned on LUCENE-3653 that there are plans to remove
> >> RAMDirectory from Trunk and move to tests only: "RAMDirectory is
> >> written for tests, not for production use. There are already plans to
> >> remove it from Lucene trunk and move to tests only." (see full comment:
> >> <https://issues.apache.org/jira/browse/LUCENE-3653?focusedCommentId=13172338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13172338>)
> >>
> >> I wasn't aware of such plans - were there emails about it, or has it
> >> been discussed on IRC?
> >>
> >> I disagree that RAMDirectory is useful only for tests. For example,
> >> when someone wants to index on Hadoop, RAMDirectory can be very
> >> useful (even though it's not the only solution). Also, RAMDirectory
> >> is still more efficient than MMapDirectory, if you want to index (and
> >> then search) on a small (sometimes even transient) amount of data. We
> >> use it in several cases for such purposes.
> >>
> >> If RAMDirectory needs to improve (for instance, allocate bigger
> >> byte[] chunks), then IMO we should do that, rather than drop it
> >> entirely from core. I think it's a very valuable Directory
> >> implementation that Lucene offers, and I'd hate to see it disappear.
> >>
> >> Shai
> >>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
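
The ByteBuffer-based replacement described in the thread (larger blocks, allocated outside the JVM heap, read without locking) can be illustrated with a minimal sketch. This is not Lucene code and not the design that was eventually committed: the class name DirectBlockStore, its methods, and the caller-chosen block size are all hypothetical, and only java.nio.ByteBuffer from the JDK is used. It assumes all writes finish, and the store is safely published, before any concurrent reads begin.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a Lucene class): an append-only byte store
 * backed by fixed-size direct ByteBuffers, so the data lives outside
 * the JVM heap and the block size is chosen by the caller instead of
 * being hardcoded.
 */
final class DirectBlockStore {
    private final int blockSize;
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private long length = 0;

    DirectBlockStore(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Appends len bytes from src, allocating off-heap blocks on demand. */
    void append(byte[] src, int offset, int len) {
        while (len > 0) {
            int blockIndex = (int) (length / blockSize);
            int posInBlock = (int) (length % blockSize);
            if (blockIndex == blocks.size()) {
                blocks.add(ByteBuffer.allocateDirect(blockSize));
            }
            int chunk = Math.min(len, blockSize - posInBlock);
            ByteBuffer block = blocks.get(blockIndex);
            block.position(posInBlock);
            block.put(src, offset, chunk);
            length += chunk;
            offset += chunk;
            len -= chunk;
        }
    }

    /** Copies len bytes starting at absolute position pos into dst. */
    void read(long pos, byte[] dst, int offset, int len) {
        if (pos < 0 || len < 0 || pos + len > length) {
            throw new IndexOutOfBoundsException("pos=" + pos + ", len=" + len);
        }
        while (len > 0) {
            int blockIndex = (int) (pos / blockSize);
            int posInBlock = (int) (pos % blockSize);
            int chunk = Math.min(len, blockSize - posInBlock);
            // duplicate() shares the bytes but has its own position, so a
            // read never changes state that other readers depend on.
            ByteBuffer view = blocks.get(blockIndex).duplicate();
            view.position(posInBlock);
            view.get(dst, offset, chunk);
            pos += chunk;
            offset += chunk;
            len -= chunk;
        }
    }

    long length() {
        return length;
    }

    int blockCount() {
        return blocks.size();
    }
}
```

With a block size in the megabyte range rather than 8 KB, a large file crosses far fewer block boundaries per read, and the garbage collector only tracks a handful of small buffer objects instead of millions of heap arrays. Swapping allocateDirect for ByteBuffer.allocate would give the on-heap variant mentioned in the thread.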

