[GitHub] [incubator-druid] Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches.
Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches. URL: https://github.com/apache/incubator-druid/issues/8126#issuecomment-518356531
Thanks, that was our impression: the off-heap incremental index is not operational (although it does exist in the code). So, indeed, there is no way to compare against it. I also agree that doing the Oak-sketches-Druid integration in one step might be too complicated. We already have an open issue #5698 and a PR #7676 for getting the Oak incremental index into Druid, and we hope to make progress there soon. Oak is not based on Memory, yet :). The context of my suggestion is how to support growable sketches off-heap while ingesting data in Druid, namely in the context of Oak. So the order should be: first integrate Oak, then have Oak support growable sketches. If I now understand correctly, the purpose of the current proposal is to handle the query aggregation problem. Oak might also be a solution for this problem, but this is something we haven't looked at yet.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [incubator-druid] Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches.
Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches. URL: https://github.com/apache/incubator-druid/issues/8126#issuecomment-517983639
> [off-heap incremental index] It's not used more widely, including during the data ingestion specifically because of the unsolved problem with growable complex aggregations - this is what this proposal is all about.
Then presenting a solution to this problem that uses Oak to manage the off-heap index, handles growable sketches using Memory, and shows that it performs as well as or better than the existing implementation would be a win-win-win solution, correct? But I understand that it may not cover the entire scope of this issue, namely query aggregation; for that part, I think the best thing to do would be to open a new issue. BTW, if the current off-heap solution is operational, we can compare against it in the system-level (cluster) benchmarks we are currently running and compare performance (without sketches at this point). From what I know, the last time we tried to evaluate the off-heap incremental index through a component-level test, it crashed, and we were told it is not properly maintained. Nevertheless, we will try running it in cluster mode. Is there any documentation on how we should configure the cluster to run it in off-heap mode?
[GitHub] [incubator-druid] Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches.
Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches. URL: https://github.com/apache/incubator-druid/issues/8126#issuecomment-517273629
@himanshug, what is the context of the current proposal? Does it refer only to aggregators used in the context of queries, when querying immutable segments that are cached off-heap? Or does it also cover roll-up aggregation in the off-heap incremental index? If it is the former, then it is reasonable to define an API for an external memory allocator; if it is the latter, then what I am suggesting is more relevant. Note that Oak is considered a contribution to Druid core rather than an extension, and it is proposed as an efficient alternative to the existing off-heap incremental index. This brings me back to my previous question: is the current off-heap incremental index considered operational, with reasonable performance? I think that by introducing a new writable memory aggregator we avoid backward compatibility issues, do we not?
[GitHub] [incubator-druid] Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches.
Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches. URL: https://github.com/apache/incubator-druid/issues/8126#issuecomment-516737638
Two naive questions:
1) Are buffer aggregators used by the on-heap incremental index, or only by the off-heap incremental index?
2) If the answer to (1) is "only by the off-heap incremental index", is the off-heap incremental index being used in production anywhere? Does it perform well enough?
[GitHub] [incubator-druid] Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches.
Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches. URL: https://github.com/apache/incubator-druid/issues/8126#issuecomment-516736029
Thanks, Roman. #3892 is a very long issue that branches into multiple discussions covering many different topics, so I am not sure what the bottom line is. Also, it has not been discussed over the last year. Can you summarize the progress made with respect to a Memory aggregator, if any? If it is blocked, what is the reason: community rejection due to backward compatibility concerns, fear of performance degradation, or simply a lack of working hands?
[GitHub] [incubator-druid] Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches.
Eshcar commented on issue #8126: [Proposal] BufferAggregator support for growable sketches. URL: https://github.com/apache/incubator-druid/issues/8126#issuecomment-516411997
As part of our work on integrating Oak (an off-heap incremental index) into Druid (#5698), we have given some thought to how to bridge the gap between three components: Oak, with its internal memory management; off-heap sketches, which are based on WritableMemory; and Druid aggregators. I can share our thoughts. They aim to address the same problems raised in this issue, but the solution is different, so it might be better to introduce it in a separate issue.
In a nutshell:
(1) Oak manages its own memory and requires all buffer allocations to go through its internal memory manager and to be exposed only through Oak's API.
(2) Off-heap sketches are based on WritableMemory and can work with an external memory manager that allocates new WritableMemory when the sketch needs to grow.
(3) Druid aggregators access sketches through the aggregator API (init, aggregate, get) and a mapping from (ByteBuffer, position) -> sketch.
What we suggest is to have Oak manage the memory, including reallocating space when needed. Oak will implement its own WritableMemory and MemoryRequestServer, which are needed for the sketches to behave correctly with respect to the Oak index. Finally, we suggest a new aggregator type, WritableMemoryAggregator, that maps WritableMemory to a sketch and works the same way buffer aggregators do, without having to worry about the growing size of sketches. There might be other alternatives for closing this loop; let's discuss them. Does all this make sense, @leventov @himanshug?
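To make the proposal in the last comment more concrete, here is a minimal, self-contained Java sketch of how the pieces might fit together, under stated assumptions. Every type in it is hypothetical and is NOT the real DataSketches, Oak, or Druid API: `MemoryServer` stands in for a MemoryRequestServer, `ToyMemoryManager` for Oak's internal memory manager, `ManagedMemory` for Oak's own WritableMemory, and the `WritableMemoryAggregator` method signatures are guesses, since the comment does not specify them. The "sketch" is a toy exact distinct-value set, used only because its state grows. The point it tries to show is that the aggregator addresses its state through a handle that stays valid when the memory manager moves the data into a larger buffer, so the aggregator itself never has to track sketch growth.

```java
import java.nio.ByteBuffer;

public class WritableMemoryAggregatorSketch {

  /** Hypothetical stand-in for a MemoryRequestServer: the only component allowed to allocate. */
  interface MemoryServer {
    ByteBuffer allocate(int capacityBytes);
  }

  /** Hypothetical stand-in for Oak's internal memory manager; counts allocations for illustration. */
  static final class ToyMemoryManager implements MemoryServer {
    int allocations = 0;

    @Override
    public ByteBuffer allocate(int capacityBytes) {
      allocations++;
      return ByteBuffer.allocateDirect(capacityBytes); // off-heap buffer
    }
  }

  /** Hypothetical stand-in for WritableMemory: a handle whose backing buffer may be replaced on growth. */
  static final class ManagedMemory {
    private final MemoryServer server;
    private ByteBuffer buf;

    ManagedMemory(MemoryServer server, int initialBytes) {
      this.server = server;
      this.buf = server.allocate(initialBytes);
    }

    ByteBuffer buffer() {
      return buf;
    }

    /** Grows the backing storage through the server (at least doubling) and copies the old contents. */
    void ensureCapacity(int requiredBytes) {
      if (requiredBytes <= buf.capacity()) {
        return;
      }
      ByteBuffer bigger = server.allocate(Math.max(requiredBytes, buf.capacity() * 2));
      ByteBuffer old = buf.duplicate();
      old.clear(); // copy the whole old buffer, from 0 to capacity
      bigger.put(old);
      bigger.clear();
      buf = bigger;
    }
  }

  /** Hypothetical shape of the proposed WritableMemoryAggregator: state is addressed by a memory handle. */
  interface WritableMemoryAggregator {
    void init(ManagedMemory mem);

    void aggregate(ManagedMemory mem, long value);

    long get(ManagedMemory mem);
  }

  /** Toy growable "sketch": an exact distinct-value set laid out as [int count][long values...]. */
  static final class DistinctCountAggregator implements WritableMemoryAggregator {
    @Override
    public void init(ManagedMemory mem) {
      mem.buffer().putInt(0, 0);
    }

    @Override
    public void aggregate(ManagedMemory mem, long value) {
      ByteBuffer b = mem.buffer();
      int count = b.getInt(0);
      for (int i = 0; i < count; i++) {
        if (b.getLong(4 + 8 * i) == value) {
          return; // value already present
        }
      }
      mem.ensureCapacity(4 + 8 * (count + 1)); // may move the data into a bigger buffer
      b = mem.buffer(); // re-read through the handle after a possible move
      b.putLong(4 + 8 * count, value);
      b.putInt(0, count + 1);
    }

    @Override
    public long get(ManagedMemory mem) {
      return mem.buffer().getInt(0);
    }
  }

  public static void main(String[] args) {
    ToyMemoryManager manager = new ToyMemoryManager();
    ManagedMemory mem = new ManagedMemory(manager, 12); // room for the count plus one long
    WritableMemoryAggregator agg = new DistinctCountAggregator();
    agg.init(mem);
    for (long v : new long[] {7, 3, 7, 9, 3, 11}) {
      agg.aggregate(mem, v);
    }
    System.out.println("distinct=" + agg.get(mem) + " allocations=" + manager.allocations);
  }
}
```

Running `main` prints `distinct=4 allocations=3`: one initial allocation plus two growth steps, all routed through the manager rather than made by the aggregator. A real design would also have to return the old region to the manager and deal with concurrent access; this sketch ignores both.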