yuanlihan commented on issue #11256:
URL: https://github.com/apache/druid/issues/11256#issuecomment-841112760


   > > Adding a new Reservoir Sampling method to sample K elements each time instead of only one element each time.
   > 
   > I'm not sure how this can improve the performance. Does the new sampling method need to loop to sample K segments anyway? I guess I'm probably missing something. Would you please add more details on the proposed changes?
   
   Thanks for having a look at this. The default implementation samples only one segment per pass over all segments. Let's assume that:
   
   - the list of server holders contains 1 million segments
   - 1000 segments need to be picked from these server holders
   
   Then the current implementation has to call the sampling method 1000 times, and each call iterates over all 1 million segments (1 billion segment visits in total).
   I found that Reservoir Sampling can actually sample K elements in a single pass over the items. So in this case, the new method can sample all 1000 segments in one method invocation, which means it iterates over the 1 million segments only once.
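A minimal Java sketch of the single-pass idea, assuming the classic "Algorithm R" variant of reservoir sampling. The class and method names (`ReservoirSamplingSketch`, `sampleK`) are hypothetical and are not taken from Druid's codebase. The reservoir is filled with the first K items; after that, the n-th item seen replaces a random slot with probability K/n, so after one pass every item has had an equal K/N chance of ending up in the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of Algorithm R; not Druid's actual API.
public class ReservoirSamplingSketch {

    /** Samples up to k items uniformly at random in a single pass over items. */
    public static <T> List<T> sampleK(Iterable<T> items, int k, Random random) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // j is uniform in [0, seen), so j < k with probability k / seen.
                int j = random.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Stand-in for 1 million segments: just their indices.
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            segments.add(i);
        }
        List<Integer> picked = sampleK(segments, 1000, new Random(42));
        System.out.println(picked.size()); // prints 1000
    }
}
```

For the 1 million / 1000 example above, this visits each segment once (1 million visits) instead of making 1000 full passes (1 billion visits). Because slots are overwritten rather than appended, each input position can occupy at most one slot, so the sample never contains the same position twice.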
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


