[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-08-26 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-52475
 
 
   Hey, we are going to update this PR soon


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-06-24 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-504968841
 
 
   @b-slim , sure I can provide more details.
   
   As I have mentioned we have run Druid's IncrementalIngestionBenchmark 
(incubator-druid/benchmarks/src/main/java/org/apache/druid/benchmark/indexing/IndexIngestionBenchmark.java)
 with the data/key generator used there. In order to be more sensitive to data 
allocation we have increased the number of rows to 3 million 
(rowsPerSegment=300).
   
   Experiments compare OakIncrementalIndex implemented with two different 
memory allocators: OakNativeAllocator and NettyAllocator. Those are the only 
differences between two Oak implementations. Both allocators allocate from 
off-heap. The results are presented in the two columns compared in the table I 
pasted above. The rows in the table represent how many off-heap memory was 
given to each experiment, for example (-XX:MaxDirectMemorySize=4g). In each 
experiment 3.8GB are written off-heap.
   
   What other details would you like to know?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-06-19 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-503631910
 
 
   Hi Slim (@b-slim ),
   
   We have taken your comment about Netty Allocator seriously and implemented 
and measured OakIncrementalIndex performance based on Netty. The comparison was 
running IncrementalIngestionBenchmark and inserting 3 million rows which is 
about 3.8GGB of off-heap data. We have tried giving different off-heap size 
limits, the result can be seen in the following table.
   
   [Oak Native Allocator vs 
Netty.pdf](https://github.com/apache/incubator-druid/files/3306983/Oak.Native.Allocator.vs.Netty.pdf)
   
   To summarize, Netty Allocator requires up to twice memory and when its 
memory request is satisfied the latency is a bit slower. Looking closer on 
Netty implementation it is understandable, because Netty Allocator Pools hold 
buffers in different (exponentially increasing) sizes. This is reasonable for 
network protocol buffers but less reasonable for data storage.
   
   Bottom line, based on empirical results we would like to continue with the 
primitive OakNativeAllocator which satisfies the Druid needs and gives better 
performance. The experiments with Netty Allocator will continue. Thank you for 
you valuable input!
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-05-22 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-494687467
 
 
   Thank you for your input!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-05-21 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-494413285
 
 
   Here we have some results for 
org.apache.druid.benchmark.indexing.IndexIngestionBenchmark 
(IndexIngestionBenchmark)
   
   We try to insert 3M rows, that should be about 4GB data. We try to give the 
same amount of data for Oak case and for native Druid IncrementalIndex. Pay 
attention that as IndexIngestionBenchmark is written the rows are generated 
prior to the benchmark and hold almost 4GB of on-heap memory anyway. So native 
Druid IncrementalIndex has some advantage, as it just needs to reference 
something already on-heap, but Oak needs to copy and needs additional memory. 
Also many on-heap space goes to StringIndexer and other structures.
   
   
[IngestionOakvsIncrIdx.pdf](https://github.com/apache/incubator-druid/files/3203010/IngestionOakvsIncrIdx.pdf)
   
   Finally, this is single threaded. We see that we can give much more 
advantage in a multithreaded case, which we will describe shortly.
   
   The command lines for reference:
   ```
   java -Xmx15g -XX:MaxDirectMemorySize=0g -jar 
benchmarks/target/benchmarks.jar IndexIngestionBenchmark -p 
rowsPerSegment=300 -p indexType=onheap
   java -Xmx9g -XX:MaxDirectMemorySize=6g -jar benchmarks/target/benchmarks.jar 
IndexIngestionBenchmark -p rowsPerSegment=300 -p indexType=oak
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-05-21 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-494300561
 
 
   Hi Slim (@b-slim ),
   
   I am impressed with such a thorough review! Thanks! It looks like the issue 
of lock-free programming and lock-free memory management is as close to your 
heart as it is close to mine! This is the reason I did PhD in this field :) I 
am glad to find an ally! What was your PhD thesis about? I have some answers 
below. If I understand something wrongly, I would be happy to hear and learn.
   
   > that is simple not true, some of those locks are part of allocate calls 
and some are part of iterator loop to do some stuff that i doubt it is static 
...
   
   I would be happy to speak with you about internal Oak implementation. As it 
is time consuming to discuss it via chat, I suggest us to arrange us an 
off-line discussion via Hangouts for example. Let me know what time-slots suit 
you.
   
   > Looking at the allocator briefly i see bunch of red flags, like possible 
internal fragmentations up to factor 2 and not clear way to deal with external 
fragmentations … Memory Management is a very hard thing in general and make it 
lock free is not going to be simple task  
   
   We're very well aware that the full lock-free Memory Management is a hard 
thing and actually not available yet. However, the Druid's usage of 
IncrementalIndex is a restricted one, IMHO. For example, there are no 
concurrent deletions, thus dealing with internal fragmentation should not be an 
issue. Dealing with external fragmentation might be an issue of tuning per 
specific Block size and common workloads. Bottom line, we had no intent to 
write a perfect general-purpose full lock-free Memory Management, but a 
specialized memory management. Not to say that it is not final and can be 
fixed/replaced if some objective issues are found. 
   
   > Writing one form scratch is a common source of bugs, In fact most of the 
big data projects avoid this task and use the Netty allocator.
   
   As you have seen our memory manager and memory allocator is quite 
restricted. So we believe it can perform faster and it can be easier to find 
bugs there. Thanks for referring to the Netty Allocator. I have taken a brief 
look on Netty Allocator, it appears to be based on reference-counting and their 
underlying data structure is their own ByteBuf (not ByteBuffer). It is assumed 
that epoch-based GC is more efficient in performance than reference-counting. 
But mostly important, I am not sure we can convert ByteBuf into ByteBuffer 
without any complications or performance issues. Nor can we require Druid to 
start working with Netty ByteBufs. However, we will continue investigation and 
if Netty Allocator can be used and gives a better performance, no problems to 
switch to it.
   
   > The take our of this, current state is not lock free thus lets avoid 
confusion, (FYI even on the oak github page i do not see it called lock-free so 
not sure why you are calling it lock free).
   
   Generally speaking lock-free is about the data structure not about the 
underlying memory management. As a full lock-free memory management 
(GC+allocation) isn’t available yet, not any of currently existing lock-free 
data structures can be called lock-free and neither their performance results 
are valid… Huge amount of scientific papers wasted :) .  But I will remove the 
"lock-free” term from anywhere it appears here.
   
   As I said, I am a big fan of lock-free programming, and I will be the first 
one to fight to eliminate any lock anywhere. But in reality what does matter is 
the final performance results and in order to see gains from lock-free 
programming we need big contention of multi-threading, is it the case in 
Druid’s IncrementalIndex? All the benchmarks are single threaded…
   
   I am OK with lock not on the hot path and with lock that doesn't impact the 
performance at all.
   
   > But the most important thing to add here is how OKA memory management will 
fit with how Druid manages Direct memory.
   
   Exactly! This is the mostly important! And here we need your input! How does 
Druid manages Direct memory? Is there some policy? Documentation to read? 
Pointers to code?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-05-21 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-494267528
 
 
   @fjy , thanks for being interested, we are going to publish some Druid 
benchmarks soon


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] sanastas commented on issue #7676: Add OakIncrementalIndex to Druid

2019-05-18 Thread GitBox
sanastas commented on issue #7676: Add OakIncrementalIndex to Druid
URL: https://github.com/apache/incubator-druid/pull/7676#issuecomment-493675946
 
 
   Hi @b-slim ,
   
   First, it is great you are taking a look :) , thanks!
   
   Second, we are trying now different Memory Allocators and Memory Managers. 
The implementation is not final, so we allow ourselves to make it easier and 
just to check the performance. The locks you see are for singleton 
initialization just upon a start. Most-likely, Memory Allocator/Pool will not 
continue to be singleton.
   Anyway, there are currently no locks on the hot path, and no locks are going 
to be there. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org