sanastas commented on issue #5698: Oak: New Concurrent Key-Value Map 
URL: 
https://github.com/apache/incubator-druid/issues/5698#issuecomment-503995370
 
 
   @jihoonson please take a look at the following proposal draft!
   
   PROPOSAL
   
   ### Motivation
   
   The following are some imperfections of the current Druid IncrementalIndex:
   1. No possibility to take the IncrementalIndex data off-heap. Off-heap memory allows working with bigger chunks of RAM without additional performance degradation.
   2. Big on-heap memory footprint: many small objects instead of serialized data.
   3. Frequent and slow segment merges.
   4. Low multi-threaded scalability of queries.
   
   ### Proposed changes
   
   Oak (Off-heap Allocated Keys) is a scalable concurrent KV-map for real-time 
analytics, which has all its keys and values kept in the off-heap memory. Oak 
is faster and scales better with the number of CPU cores than popular 
NavigableMap implementations, e.g., Doug Lea’s ConcurrentSkipListMap (Java’s 
default).
   We suggest integrating an Oak-based Incremental Index as an alternative to Druid's currently existing Incremental Index. Oak is naturally built for off-heap memory allocation, allows the use of much more RAM, has greater concurrency support, and should show better performance results.
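   To illustrate the core idea (not Oak's actual API — class and method names below are hypothetical), the sketch keeps values serialized in off-heap direct `ByteBuffer`s behind a concurrent ordered index, rather than as on-heap objects. This is the storage discipline OakMap applies internally to both keys and values:

   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.util.concurrent.ConcurrentSkipListMap;

   // Minimal sketch of off-heap value storage, assuming a String key/value pair.
   // OakMap itself also moves the keys and the index metadata off-heap; here only
   // the values are, to keep the example short.
   public class OffHeapValueStore {
       // Ordered concurrent index; each value is a direct (off-heap) buffer.
       private final ConcurrentSkipListMap<String, ByteBuffer> index =
           new ConcurrentSkipListMap<>();

       public void put(String key, String value) {
           byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
           ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length); // off-heap allocation
           buf.put(bytes).flip();
           index.put(key, buf);
       }

       public String get(String key) {
           ByteBuffer buf = index.get(key);
           if (buf == null) return null;
           byte[] bytes = new byte[buf.remaining()];
           buf.duplicate().get(bytes); // duplicate() so readers don't disturb the shared position
           return new String(bytes, StandardCharsets.UTF_8);
       }

       // Because every allocation is explicit, off-heap usage can be tracked precisely.
       public long offHeapBytes() {
           long total = 0;
           for (ByteBuffer buf : index.values()) {
               total += buf.capacity();
           }
           return total;
       }
   }
   ```

   The serialized buffers are invisible to the JVM garbage collector, which is exactly why GC pressure stays low even with large data volumes.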
   
   ### Rationale
   
   
   Let’s follow the initial motivation points:
   
   1. Taking data off-heap allows:
   - Working with bigger data without paying the cost of JVM GC slowdowns
   - Oak writes the data off-heap (even with the additional copy to off-heap) while significantly increasing ingestion speed
   
   2. When running the Druid IncrementalIndex benchmarks and ingesting (for example) 2GB of data, we found that we need to provide 10GB more (!) of heap in order to see proper results. Please see the IncrementalIngestionBenchmark results below. OakMap serializes the keys and the values (no object overheads). OakMap's metadata is modest and mostly consists of arrays (again, no object overheads). Additionally, OakMap can estimate its off-heap usage precisely.
   3. Segment merging in Druid is quite expensive. Segments are merged after (not particularly large) IncrementalIndexes are flushed to disk and then need to be combined. Oak allows keeping more data in memory with the same (and sometimes much better) performance, so fewer flushes to disk (and fewer files requiring merges) are needed. As future work, OakMaps could be merged while still in memory.
   4. Experiments with IncrementalIndexRead are still a work in progress. However, comparing the multi-threaded scalability of OakMap with ConcurrentSkipListMap, we see that OakMap scales better, so we expect to see the same scalability in the Oak-based IncrementalIndex.
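   The kind of multi-threaded scalability comparison described in point 4 can be sketched roughly as follows: several writer threads hammer a shared `NavigableMap` with no external locking, and aggregate throughput is compared across thread counts and map implementations. This harness is illustrative only (the class and method names are hypothetical, and a real benchmark would use JMH with warmup):

   ```java
   import java.util.concurrent.ConcurrentNavigableMap;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;

   // Toy probe of concurrent-write scalability: run with 1, 2, 4, ... threads
   // against different ConcurrentNavigableMap implementations and compare elapsed time.
   public class ScalabilityProbe {
       // Returns elapsed nanoseconds for threads * perThread concurrent puts.
       public static long parallelPuts(ConcurrentNavigableMap<Integer, Integer> map,
                                       int threads, int perThread) {
           ExecutorService pool = Executors.newFixedThreadPool(threads);
           long start = System.nanoTime();
           for (int t = 0; t < threads; t++) {
               final int base = t * perThread; // disjoint key ranges per thread
               pool.execute(() -> {
                   for (int i = 0; i < perThread; i++) {
                       map.put(base + i, i); // concurrent writers, no external locking
                   }
               });
           }
           pool.shutdown();
           try {
               pool.awaitTermination(1, TimeUnit.MINUTES);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
           }
           return System.nanoTime() - start;
       }
   }
   ```

   With ConcurrentSkipListMap, per-thread throughput typically degrades as contention grows; the claim above is that OakMap's curve stays flatter as cores are added.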
   
   ### Operational impact
   
   The suggested OakIncrementalIndex should live alongside the current implementation, giving the ability to switch to OakIncrementalIndex when preferable. As OakIncrementalIndex doesn't affect the on-disk layout, we do not foresee any specific operational impact.
   
   ### Test plan (optional)
   
   The plan is to add specific unit tests for OakIncrementalIndex and to pass all existing tests on a cluster based on OakIncrementalIndex.
   
   ### Future work (optional)
   
   As the StringIndexer also accounts for a significant part of the IncrementalIndex footprint, we might take this map off-heap as well.
   
   ### IncrementalIngestionBenchmark results
   
   The attached files show the results of IncrementalIngestionBenchmark when 2 million vs. 3 million rows are inserted, using Druid's native incremental index and the newly suggested OakIncrementalIndex (data taken off-heap). Druid's incremental index always gets 12GB of on-heap memory; OakIncrementalIndex always gets 8GB on-heap and 4GB off-heap (12GB of RAM in total). Please note that the rows are prepared before the benchmark measurement starts and already occupy a big chunk of on-heap memory. The graphs show throughput (operations per second), so bigger is better. In all results OakIncrementalIndex shows better performance. When moving from 2M rows to 3M rows (with the same memory limits), Oak shows about 10% performance degradation while IncrementalIndex shows about 60%.
   
   [Ingestion Throughput              2M rows (2.5GB data) 
ingested.pdf](https://github.com/apache/incubator-druid/files/3310244/Ingestion.Throughput.2M.rows.2.5GB.data.ingested.pdf)
   [Ingestion Throughput      3M rows (3.8GB data) 
ingested.pdf](https://github.com/apache/incubator-druid/files/3310245/Ingestion.Throughput.3M.rows.3.8GB.data.ingested.pdf)
   
