a2l007 opened a new issue #11133:
URL: https://github.com/apache/druid/issues/11133


   ### Affected Version
   
   0.17 and higher
   
   ### Description
   
   For Druid clusters with parallel merge enabled on the brokers, query 
performance can degrade for sketch based queries. These queries can exhibit 30% 
- 100% degradation in performance, depending on the value set for 
`druid.processing.merge.pool.parallelism` and the number of sequences that 
needs to be merged and combined . With a smaller parallelism setting however, 
the performance is slightly better than running it single-threaded. 
   Setting `druid.processing.merge.pool.parallelism` to 3 gave the best 
performance (~10% faster than the serial merge). However, increasing the 
parallelism beyond 3 is when we start to see degradation mentioned earlier.
   
   I compared the flame graphs for a query on the broker toggling the merge 
parallelism and it looks like the parallel merge does a lot more combine 
operations on [Theta 
Union](https://datasketches.apache.org/api/java/snapshot/apidocs/org/apache/datasketches/theta/Union.html)
 objects compared to its serial counterpart. 
   
   <img width="1660" alt="Screen Shot 2021-04-19 at 3 21 59 PM" 
src="https://user-images.githubusercontent.com/4603202/115302880-8827b700-a128-11eb-80be-e11e8075710d.png";>
   
   Combining two union objects involves [compacting one of them into a sketch 
before updating it to the second 
union](https://github.com/apache/druid/blob/fdc3c2f36281059c020df1d4e74eb940d4c922e8/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/SketchHolder.java#L149-L150).
 This process is relatively slower as Theta union is designed to optimize merge 
efficiency for large number of sketch merges. 
   
   The first layer of the parallel merge process combines partitions of 
sequences and for sketch aggregators this generates Union objects  which are 
pushed into the intermediate blocking queues for the second layer.
   The `MergeCombineAction` in the second layer runs on a single thread and 
this has to do a bunch of union merge operations based on the number of 
intermediate blocking queues it has to read from. Therefore, higher the 
parallelism, more the union merge operations that the `MergeCombineAction` has 
to perform and worse the performance. 
   
   I don’t have a fix for this atm, but documenting this here for reference in 
case someone else runs into the same problem. If datasketches had a way to 
merge uncompacted unions, it could help alleviate this problem.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to