[jira] [Commented] (LUCENE-8438) RAMDirectory speed improvements and cleanup

Robert Muir (JIRA) Mon, 06 Aug 2018 04:09:15 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570052#comment-16570052 ]


Robert Muir commented on LUCENE-8438:
-------------------------------------

I do think the new code looks pretty clean and I like the additional checks in 
the code (which were desperately needed), but I have some concerns about the 
timing. How do we reduce the risks here release-wise? 

I'm pretty much against committing this to trunk and then immediately trying to 
start spinning up Lucene 8.0. The problem is this code has its tentacles in 
everything: a bug in this thing will impact far more than just Windows users 
who can't use mmap over tmpfs :) Core codecs etc. are using little 
RAMOutputStreams here and there for various crap. 

We need a strategy to reduce the risks here for so many changes to o.a.l.store 
code. And we should honestly discuss whether the tradeoffs are the right ones. 
For Lucene 8, which Adrien wants to work on soon, I would rather we just tell 
users to use mmap over tmpfs and not have corruption. 


> RAMDirectory speed improvements and cleanup
> -------------------------------------------
>
>                 Key: LUCENE-8438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8438
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: capture-4.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places and 
> even if we discourage its use in favor of native (mmapped) buffers, there 
> seem to be benefits of keeping RAMDirectory available (quick throw-away 
> indexes without the need to set up an external tmpfs, for example).
> Currently RAMDirectory performs very poorly under concurrent loads. The 
> implementation is also open to all sorts of abuse – the streams can be 
> reset and are used all over the place as temporary buffers, even without 
> the presence of RAMDirectory itself. This complicates the implementation and 
> is pretty confusing.
> As an example of how dramatically slow RAMDirectory is under concurrent load, 
> consider this PoC pseudo-benchmark. It creates a single monolithic segment 
> with 500K very short documents (single field, with norms). The index is ~60MB 
> once created. We then run semi-complex Boolean queries on top of that index 
> from N concurrent threads. The attached capture-4 shows the result (queries 
> per second over 5-second spans) for a varying number of concurrent threads on 
> an AWS machine with 32 CPUs available (of which 16 seem to be real and 
> 16 hyper-threaded). That red line at the bottom (which drops compared to 
> single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an 
> alternative implementation I wrote that uses ByteBuffers. Yes, it's slower 
> than the native mmapped implementation, but a *lot* faster than the current 
> RAMDirectory (and more GC-friendly because it uses dynamic progressive block 
> scaling internally).
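
For readers unfamiliar with the idea, the following is a minimal sketch of what
"progressive block scaling" over ByteBuffers could look like. It is not the
RAMDirectory2 code from the issue: the class name ProgressiveBlockBuffer, its
methods, and the doubling policy are all assumptions made for illustration. The
point is only that small payloads get small blocks, while large ones end up in
a few large blocks instead of many fixed-size ones. The sketch is
single-threaded and append-only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only byte store; NOT Lucene's RAMDirectory2. */
final class ProgressiveBlockBuffer {
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private final int maxBlockSize;
    private int nextBlockSize;
    private long size;

    ProgressiveBlockBuffer(int firstBlockSize, int maxBlockSize) {
        if (firstBlockSize <= 0 || maxBlockSize < firstBlockSize) {
            throw new IllegalArgumentException("invalid block sizes");
        }
        this.nextBlockSize = firstBlockSize;
        this.maxBlockSize = maxBlockSize;
    }

    /** Appends one byte, allocating a new (larger) block when the last one is full. */
    void writeByte(byte b) {
        if (blocks.isEmpty() || !blocks.get(blocks.size() - 1).hasRemaining()) {
            blocks.add(ByteBuffer.allocate(nextBlockSize));
            // Assumed policy: double the next block's size, capped at maxBlockSize.
            nextBlockSize = (int) Math.min((long) nextBlockSize * 2, maxBlockSize);
        }
        blocks.get(blocks.size() - 1).put(b);
        size++;
    }

    /** Reads the byte at an absolute position; a linear scan over blocks keeps the sketch short. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        for (ByteBuffer block : blocks) {
            int capacity = block.capacity();
            if (pos < capacity) {
                return block.get((int) pos); // absolute get, does not move the position
            }
            pos -= capacity;
        }
        throw new AssertionError("unreachable: pos was checked against size");
    }

    long size() {
        return size;
    }

    int blockCount() {
        return blocks.size();
    }
}
```

With a first block of 4 bytes and a cap of 16, writing 40 bytes allocates blocks
of 4, 8, 16 and 16 bytes. A real implementation would also need random-access
lookup that avoids the linear scan, and safe concurrent readers.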



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
