[jira] [Commented] (LUCENE-8438) RAMDirectory speed improvements and cleanup

Dawid Weiss (JIRA) Mon, 06 Aug 2018 05:45:20 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570164#comment-16570164
 ]


Dawid Weiss commented on LUCENE-8438:
-------------------------------------

Jeez, Robert, you're fast. I was still writing a follow-up introduction to the
patch (because it is quite large).
{quote}I'm pretty much against committing this to trunk and then
immediately trying to start spinning up Lucene 8.0. The problem is this code
has its tentacles in everything: a bug in this thing will impact far more than
just Windows users who can't use mmap over tmpfs :)
{quote}
I absolutely agree with you. I'm not under any time pressure: this PR is for
review and discussion. I'll be happy to keep cleaning things up (as issues pop
up along the way), but I didn't want to dig myself into a hole with a change
people would reject.

If we do decide to deprecate RAMDirectory (instead of changing its
implementation), then I'd still add deprecation markers in Lucene 8.0 (without
introducing this patch in full).

> RAMDirectory speed improvements and cleanup
> -------------------------------------------
>
>                 Key: LUCENE-8438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8438
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: capture-4.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places and 
> even if we discourage its use in favor of native (mmapped) buffers, there 
> seem to be benefits to keeping RAMDirectory available (quick throw-away
> indexes without the need to set up an external tmpfs, for example).
>
> Currently RAMDirectory performs very poorly under concurrent loads. The
> implementation is also open to all sorts of abuse: the streams can be
> reset and are used all over the place as temporary buffers, even when
> RAMDirectory itself is not involved. This complicates the implementation and
> is pretty confusing.
>
> As an example of how dramatically slow RAMDirectory is under concurrent load,
> consider this PoC pseudo-benchmark. It creates a single monolithic segment
> with 500K very short documents (single field, with norms). The index is ~60MB
> once created. We then run semi-complex Boolean queries on top of that index
> from N concurrent threads. The attached capture-4 shows the result (queries
> per second over 5-second spans) for a varying number of concurrent threads on
> an AWS machine with 32 CPUs available (of which 16 seem to be physical cores
> and 16 hyper-threaded). The red line at the bottom (which drops compared to
> single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an
> alternative implementation I wrote that uses ByteBuffers. Yes, it's slower
> than the native mmapped implementation, but a *lot* faster than the current
> RAMDirectory (and more GC-friendly because it uses dynamic progressive block
> scaling internally).
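The measurement described in the issue (N threads issuing queries, throughput reported as queries per second over fixed time spans) can be sketched as a small, generic Java harness. This is a minimal sketch under stated assumptions, not the PoC benchmark itself: `QpsHarness`, `measure` and the `Runnable` workload are hypothetical names, and in a real run the workload would be a Lucene search against the index (for example a Boolean query executed through an `IndexSearcher`).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical harness: runs a workload from N threads and reports QPS per time span. */
final class QpsHarness {
  private QpsHarness() {}

  /** Returns completed queries per second for each of `spans` windows of `spanMillis` ms. */
  static double[] measure(Runnable query, int threads, int spans, long spanMillis)
      throws InterruptedException {
    final LongAdder completed = new LongAdder(); // low-contention counter shared by workers
    final AtomicBoolean stop = new AtomicBoolean(false);
    final CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        try {
          start.await(); // release all workers at the same moment
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        while (!stop.get()) {
          query.run();
          completed.increment();
        }
      });
      workers[t].start();
    }

    double[] qps = new double[spans];
    start.countDown();
    try {
      long prevCount = 0;
      long prevNanos = System.nanoTime();
      for (int s = 0; s < spans; s++) {
        Thread.sleep(spanMillis);
        long now = System.nanoTime();
        long count = completed.sum();
        // Divide by the measured elapsed time rather than spanMillis; sleep() may overshoot.
        qps[s] = (count - prevCount) / ((now - prevNanos) / 1e9);
        prevCount = count;
        prevNanos = now;
      }
    } finally {
      stop.set(true); // let workers exit even if sampling is interrupted
    }
    for (Thread w : workers) {
      w.join();
    }
    return qps;
  }
}
```

For example, `QpsHarness.measure(query, 16, 12, 5_000)` would sample twelve 5-second spans with 16 threads; repeating it for 1, 2, 4, ... threads and for each Directory implementation yields data of the kind plotted in capture-4.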
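The issue does not show RAMDirectory2's internals. As an illustration only, "dynamic progressive block scaling" can be read as storing file contents in a list of ByteBuffer blocks whose capacity starts small and doubles up to a cap, so tiny files waste little memory while large files need comparatively few blocks (and so fewer objects for the GC to track). The sketch below assumes exactly that doubling policy; `ProgressiveBlockBuffer`, its methods and the growth rule are hypothetical and are not the patch's code. It is single-writer and not thread-safe.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical append-only buffer made of heap ByteBuffer blocks. Block capacity
 * starts at minBlockSize and doubles for each new block, capped at maxBlockSize.
 */
final class ProgressiveBlockBuffer {
  private final int maxBlockSize;
  private final List<ByteBuffer> blocks = new ArrayList<>();
  private int nextBlockSize;
  private long size;

  ProgressiveBlockBuffer(int minBlockSize, int maxBlockSize) {
    if (minBlockSize <= 0 || maxBlockSize < minBlockSize) {
      throw new IllegalArgumentException("require 0 < minBlockSize <= maxBlockSize");
    }
    this.nextBlockSize = minBlockSize;
    this.maxBlockSize = maxBlockSize;
  }

  /** Returns the block to write into, allocating the next (larger) block if the last is full. */
  private ByteBuffer current() {
    ByteBuffer last = blocks.isEmpty() ? null : blocks.get(blocks.size() - 1);
    if (last == null || !last.hasRemaining()) {
      last = ByteBuffer.allocate(nextBlockSize);
      blocks.add(last);
      // Assumed scaling rule: double the size of the following block, up to the cap.
      nextBlockSize = (int) Math.min(2L * nextBlockSize, maxBlockSize);
    }
    return last;
  }

  void writeByte(byte b) {
    current().put(b);
    size++;
  }

  /** Appends a byte range, filling the current block before allocating the next one. */
  void writeBytes(byte[] src, int off, int len) {
    while (len > 0) {
      ByteBuffer block = current();
      int chunk = Math.min(len, block.remaining());
      block.put(src, off, chunk);
      off += chunk;
      len -= chunk;
      size += chunk;
    }
  }

  long size() {
    return size;
  }

  int blockCount() {
    return blocks.size();
  }

  /** Reads the byte at a logical position; every block except the last is completely full. */
  byte readByte(long pos) {
    if (pos < 0 || pos >= size) {
      throw new IndexOutOfBoundsException("pos=" + pos + ", size=" + size);
    }
    for (ByteBuffer block : blocks) {
      if (pos < block.capacity()) {
        return block.get((int) pos); // absolute get: does not move the write position
      }
      pos -= block.capacity();
    }
    throw new AssertionError("unreachable");
  }

  /** Copies all written bytes into a single array. */
  byte[] toByteArray() {
    byte[] out = new byte[Math.toIntExact(size)];
    int off = 0;
    for (ByteBuffer block : blocks) {
      ByteBuffer view = block.duplicate(); // shares content, independent position/limit
      view.flip(); // limit = bytes written, position = 0
      int n = view.remaining();
      view.get(out, off, n);
      off += n;
    }
    return out;
  }
}
```

With `new ProgressiveBlockBuffer(4, 16)`, writing 100 bytes allocates blocks of 4, 8, 16, 16, 16, 16, 16 and 16 bytes: eight blocks with 108 bytes of total capacity.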



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
