[ https://issues.apache.org/jira/browse/LUCENE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546695#comment-16546695 ]
Michael Braun commented on LUCENE-8406:
---------------------------------------

Would love to have the performance of RAMDirectory improved if possible. When experimenting with Luwak (https://github.com/flaxsearch/luwak), CC [~romseygeek], there was noticeable performance degradation due to contention at the RAMFile level. Would this change eliminate the need for RAMFile entirely?

> Make ByteBufferIndexInput public
> --------------------------------
>
>                 Key: LUCENE-8406
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8406
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.7
>
>
> The logic of handling byte buffer splits, their proper closing (cleaner) and
> all the trickery involved in slicing, cloning and proper exception handling
> is quite daunting.
> While ByteBufferIndexInput.newInstance(..) is public, the parent class
> ByteBufferIndexInput is not. I think we should make the parent class public
> to allow advanced users to make use of this (complex) piece of code to create
> IndexInput based on a sequence of ByteBuffers.
> One particular example here is RAMDirectory, which currently uses a custom
> IndexInput implementation, which in turn reaches to RAMFile's synchronized
> methods. This is the cause of quite dramatic congestion on multithreaded
> systems. While we clearly discourage RAMDirectory from being used in
> production environments, there really is no need for it to be slow. If
> modified only slightly (to use ByteBuffer-based input), the performance is on
> par with FSDirectory. Here's a sample log comparing FSDirectory with
> RAMDirectory and the "modified" RAMDirectory making use of the ByteBuffer
> input:
> {code}
> 14:26:40 INFO  console: FSDirectory index.
> 14:26:41 INFO  console: Opened with 299943 documents.
> 14:26:50 INFO  console: Finished: 8.820 s, 240000 matches.
> 14:26:50 INFO  console: RAMDirectory index.
> 14:26:50 INFO  console: Opened with 299943 documents.
> 14:28:50 INFO  console: Finished: 2.012 min, 240000 matches.
> 14:28:50 INFO  console: RAMDirectory2 index (wrapped byte[] buffers).
> 14:28:50 INFO  console: Opened with 299943 documents.
> 14:29:00 INFO  console: Finished: 9.215 s, 240000 matches.
> 14:29:00 INFO  console: RAMDirectory2 index (direct memory buffers).
> 14:29:00 INFO  console: Opened with 299943 documents.
> 14:29:08 INFO  console: Finished: 8.817 s, 240000 matches.
> {code}
> Note the performance difference is an order of magnitude on this 32-CPU
> system (2 minutes vs. 9 seconds). The tiny performance difference between the
> implementation based on direct memory buffers vs. those acquired via
> ByteBuffer.wrap(byte[]) is due to the fact that direct buffers access their
> data via unsafe and the wrapped counterpart uses regular java array access
> (my best guess).
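
To make the contention point in the quoted issue concrete, here is a minimal, self-contained sketch. It does not use Lucene classes and is not the RAMDirectory or ByteBufferIndexInput implementation: `SynchronizedStore`, `BufferReader`, and `split` are hypothetical names invented for illustration. The first class routes every block lookup through one lock, which is the pattern the issue attributes to RAMFile. The second gives each reader its own `duplicate()` views over immutable buffers, either wrapped `byte[]` or direct memory, so reads need no lock.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public final class BufferReadSketch {

    /** Hypothetical stand-in for a store whose block lookup is synchronized. */
    public static final class SynchronizedStore {
        private final List<byte[]> blocks = new ArrayList<>();
        private final int blockSize;

        public SynchronizedStore(byte[] data, int blockSize) {
            this.blockSize = blockSize;
            for (int off = 0; off < data.length; off += blockSize) {
                int len = Math.min(blockSize, data.length - off);
                byte[] block = new byte[len];
                System.arraycopy(data, off, block, 0, len);
                blocks.add(block);
            }
        }

        // All reader threads serialize on this monitor for every lookup.
        public synchronized byte[] getBlock(int index) {
            return blocks.get(index);
        }

        public byte readByte(long pos) {
            return getBlock((int) (pos / blockSize))[(int) (pos % blockSize)];
        }
    }

    /** Hypothetical lock-free reader over a sequence of read-only ByteBuffers. */
    public static final class BufferReader {
        private final ByteBuffer[] buffers;
        private final int blockSize;

        public BufferReader(ByteBuffer[] shared, int blockSize) {
            this.blockSize = blockSize;
            this.buffers = new ByteBuffer[shared.length];
            for (int i = 0; i < shared.length; i++) {
                // duplicate() shares the content but has its own position and limit.
                this.buffers[i] = shared[i].duplicate();
            }
        }

        public byte readByte(long pos) {
            // Absolute get: no lock and no shared mutable state is touched.
            return buffers[(int) (pos / blockSize)].get((int) (pos % blockSize));
        }
    }

    /** Splits data into read-only buffers, backed by the array or by direct memory. */
    public static ByteBuffer[] split(byte[] data, int blockSize, boolean direct) {
        int count = (data.length + blockSize - 1) / blockSize;
        ByteBuffer[] out = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            int off = i * blockSize;
            int len = Math.min(blockSize, data.length - off);
            ByteBuffer b;
            if (direct) {
                b = ByteBuffer.allocateDirect(len);
                b.put(data, off, len);
                b.flip();
            } else {
                b = ByteBuffer.wrap(data, off, len).slice();
            }
            out[i] = b.asReadOnlyBuffer();
        }
        return out;
    }
}
```

Both variants return the same bytes; the difference is only whether concurrent readers have to queue on a shared monitor. This sketch says nothing about actual Lucene timings, which are the ones shown in the quoted log.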