[jira] [Commented] (LUCENE-8406) Make ByteBufferIndexInput public

Michael Braun (JIRA) Tue, 17 Jul 2018 07:30:12 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546695#comment-16546695
 ]


Michael Braun commented on LUCENE-8406:
---------------------------------------

Would love to have the performance of RAMDirectory improved if possible - when 
experimenting with Luwak (https://github.com/flaxsearch/luwak) CC 
[~romseygeek], there was noticeable performance degradation due to contention 
at the RAMFile level. Would the need for RAMFile be eliminated entirely with 
this?

> Make ByteBufferIndexInput public
> --------------------------------
>
>                 Key: LUCENE-8406
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8406
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.7
>
>
> The logic of handling byte buffers splits, their proper closing (cleaner) and 
> all the trickery involved in slicing, cloning and proper exception handling 
> is quite daunting. 
> While ByteBufferIndexInput.newInstance(..) is public, the parent class 
> ByteBufferIndexInput is not. I think we should make the parent class public 
> to allow advanced users to make use of this (complex) piece of code to create 
> IndexInput based on a sequence of ByteBuffers.
> One particular example here is RAMDirectory, which currently uses a custom 
> IndexInput implementation, which in turn reaches to RAMFile's synchronized 
> methods. This is the cause of quite dramatic congestions on multithreaded 
> systems. While we clearly discourage RAMDirectory from being used in 
> production environments, there really is no need for it to be slow. If 
> modified only slightly (to use ByteBuffer-based input), the performance is on 
> par with FSDirectory. Here's a sample log comparing FSDirectory with 
> RAMDirectory and the "modified" RAMDirectory making use of the ByteBuffer 
> input:
> {code}
> 14:26:40 INFO  console: FSDirectory index.
> 14:26:41 INFO  console: Opened with 299943 documents.
> 14:26:50 INFO  console: Finished: 8.820 s, 240000 matches.
> 14:26:50 INFO  console: RAMDirectory index.
> 14:26:50 INFO  console: Opened with 299943 documents.
> 14:28:50 INFO  console: Finished: 2.012 min, 240000 matches.
> 14:28:50 INFO  console: RAMDirectory2 index (wrapped byte[] buffers).
> 14:28:50 INFO  console: Opened with 299943 documents.
> 14:29:00 INFO  console: Finished: 9.215 s, 240000 matches.
> 14:29:00 INFO  console: RAMDirectory2 index (direct memory buffers).
> 14:29:00 INFO  console: Opened with 299943 documents.
> 14:29:08 INFO  console: Finished: 8.817 s, 240000 matches.
> {code}
> Note the performance difference is an order of magnitude on this 32-CPU 
> system (2 minutes vs. 9 seconds). The tiny performance difference between the 
> implementation based on direct memory buffers vs. those acquired via 
> ByteBuffer.wrap(byte[]) is due to the fact that direct buffers access their 
> data via unsafe and the wrapped counterpart uses regular java array access 
> (my best guess).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-8406) Make ByteBufferIndexInput public

Reply via email to