[ https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss resolved LUCENE-8438.
---------------------------------
       Resolution: Fixed
    Fix Version/s: master (9.0)

> RAMDirectory speed improvements and cleanup
> -------------------------------------------
>
>                 Key: LUCENE-8438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8438
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: master (9.0)
>
>         Attachments: capture-1.png, capture-4.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places, and
> even if we discourage its use in favor of native (mmapped) buffers, there
> seem to be benefits to keeping RAMDirectory available (quick throw-away
> indexes without the need to set up an external tmpfs, for example).
>
> Currently RAMDirectory performs very poorly under concurrent load. The
> implementation is also open to all sorts of abuse: the streams can be
> reset and are used all over the place as temporary buffers, even without
> the presence of RAMDirectory itself. This complicates the implementation and
> is pretty confusing.
>
> As an example of how dramatically slow RAMDirectory is under concurrent load,
> consider this PoC pseudo-benchmark. It creates a single monolithic segment
> with 500K very short documents (single field, with norms). The index is ~60MB
> once created. We then run semi-complex Boolean queries on top of that index
> from N concurrent threads. The attached capture-4 shows the result (queries
> per second over 5-second spans) for a varying number of concurrent threads on
> an AWS machine with 32 CPUs available (of which 16 seem to be real cores and
> 16 hyper-threaded). The red line at the bottom (which drops compared to
> single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an
> alternative implementation I wrote that uses ByteBuffers.
> Yes, it's slower than the native mmapped implementation, but a *lot* faster
> than the current RAMDirectory (and more GC-friendly because it uses dynamic
> progressive block scaling internally).
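
For readers who want to try a measurement of the kind described above (queries per second over fixed spans, for a varying number of threads), here is a minimal sketch of such a harness in plain Java. It is not the benchmark code from the issue: the class name ThroughputProbe, the measure method, and the stand-in task in main are all hypothetical, and the task does no Lucene work at all. To approximate the scenario in the description one would, presumably, replace the stand-in with a Runnable that runs a query against a shared searcher.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical throughput harness; an illustration, not the issue's benchmark. */
final class ThroughputProbe {

  /**
   * Runs {@code task} in a tight loop on {@code threads} threads and samples the
   * number of completed calls over {@code spans} consecutive windows of roughly
   * {@code spanMillis} each. Returns operations per second for every window.
   */
  static double[] measure(Runnable task, int threads, int spans, long spanMillis) {
    if (threads <= 0 || spans <= 0 || spanMillis <= 0) {
      throw new IllegalArgumentException("threads, spans and spanMillis must be positive");
    }
    LongAdder completed = new LongAdder();
    AtomicBoolean running = new AtomicBoolean(true);
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers at the same moment
        } catch (InterruptedException e) {
          return;
        }
        while (running.get()) {
          task.run();
          completed.increment();
        }
      });
    }

    double[] opsPerSecond = new double[spans];
    try {
      start.countDown();
      long previous = System.nanoTime();
      for (int s = 0; s < spans; s++) {
        Thread.sleep(spanMillis);
        long now = System.nanoTime();
        long count = completed.sumThenReset();
        // Divide by the measured elapsed time, since sleep() is not exact.
        opsPerSecond[s] = count / ((now - previous) / 1e9);
        previous = now;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("measurement interrupted", e);
    } finally {
      running.set(false);
      pool.shutdown();
      try {
        pool.awaitTermination(10, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return opsPerSecond;
  }

  public static void main(String[] args) {
    // Stand-in workload (an assumption): sum a shared read-only array.
    byte[] data = new byte[1 << 16];
    LongAdder sink = new LongAdder(); // keeps the JIT from discarding the loop
    Runnable standIn = () -> {
      long sum = 0;
      for (byte b : data) {
        sum += b;
      }
      sink.add(sum);
    };
    int maxThreads = Runtime.getRuntime().availableProcessors();
    for (int threads = 1; threads <= maxThreads; threads *= 2) {
      double[] qps = measure(standIn, threads, 3, 1000);
      System.out.println(threads + " thread(s): " + Arrays.toString(qps));
    }
  }
}
```

A directory implementation that scales well should show the per-span numbers rising as threads are added up to the number of physical cores; numbers that fall below the single-threaded figure would match the behavior reported for the current RAMDirectory.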