jon-wei commented on issue #5698: Oak: New Concurrent Key-Value Map 
URL: 
https://github.com/apache/incubator-druid/issues/5698#issuecomment-506881097
 
 
   @sanastas 
   
   Thanks for contributing https://github.com/apache/incubator-druid/pull/7676!
   
   I've been thinking about what the path to potentially merging #7676 would 
look like.
   
   In Druid, there are currently two categories of code contributions, and the 
requirements for merging differ between them:
     - Core Druid and core extensions
     - Contrib extensions
   
   The requirements for contrib extensions are looser and could roughly be 
described as "reasonable implementation and potentially useful for some use 
cases or experimentation". The contrib extensions aren't actively maintained by 
the Druid committers, and are generally less extensively tested.
   
   It can make sense for a new feature to start out as a contrib extension, and 
potentially migrate to core as it evolves. Examples of this include the Google 
Cloud Storage extension and the ORC format extension, which started out as 
contrib extensions and were recently adopted as core extensions.
   
   For the Oak-based incremental index, this path could make sense as well, but 
Druid does not currently provide an extension point for incremental index 
implementations. To open that as an extension point would first involve 
discussion/consensus on whether it's a good idea to have that extension point, 
and there would also be significant design thought/implementation work required.
   
   Given those difficulties, I think it makes sense to think about the path to 
merging the Oak-based incremental index as a core feature. For merging a 
contribution into core, the requirement is essentially: "Convince Druid 
committers such that they are willing to take responsibility for and maintain 
the contribution going forward."
   
   At the highest level, setting aside implementation details, I think it'd be 
helpful to see a comparison of performance metrics between the Oak incremental 
index and the existing implementation on a real cluster.
   
   I would try to set up realistic workloads for native batch ingestion and 
Kafka indexing service ingestion, and gather metrics for the following:
   - Ingestion throughput
   - Query performance (realtime tasks like Kafka indexing service tasks can 
answer queries)
   - Index persist performance
   
   -------------------------
   
   Separately from the incremental index topic, I wonder if OakMap could be 
used as part of Druid's GroupBy V2 query. There is a class called 
`ConcurrentGrouper` which is responsible for grouping/aggregating rows 
off-heap, with concurrent writes. This sounds like an area where OakMap could 
potentially be beneficial. If you're interested, that could be another 
worthwhile avenue for investigation.
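   
   To make the access pattern concrete, here is a rough sketch of concurrent 
grouping/aggregation: several threads merge rows into one shared sorted map. 
This is not Druid's `ConcurrentGrouper` code and it does not use Oak's API. 
The `ConcurrentGroupingSketch` class, the `Row` type, and the `aggregate` 
method are hypothetical names, and the JDK's on-heap `ConcurrentSkipListMap` 
only stands in for the concurrent map that OakMap might replace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentGroupingSketch {

    // Hypothetical row type: a grouping key plus one metric value to sum.
    public static final class Row {
        final String key;
        final long value;

        public Row(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sums row values per key, with `threads` writers sharing one map.
    // ConcurrentSkipListMap is an on-heap stand-in, not OakMap.
    public static Map<String, Long> aggregate(List<Row> rows, int threads)
            throws InterruptedException {
        ConcurrentSkipListMap<String, Long> sums = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                // Each writer handles an interleaved slice of the input.
                for (int i = offset; i < rows.size(); i += threads) {
                    Row r = rows.get(i);
                    // merge() applies the per-key update atomically.
                    sums.merge(r.key, r.value, Long::sum);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("aggregation timed out");
        }
        return sums;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Row> rows = Arrays.asList(
                new Row("a", 1), new Row("b", 2), new Row("a", 2));
        System.out.println(aggregate(rows, 2));
    }
}
```

   An experiment along these lines would swap the stand-in map for OakMap and 
compare throughput and memory use under the same concurrent write load.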
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
