Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19565
  
    I'm curious about the performance comparison. If "filter before sample" 
triggers a filter over the whole dataset on every `submitMiniBatch` call, 
there will be some performance impact.
    
    Also, if "filter before sample" is used, IMO `miniBatchFraction` should be 
adjusted proportionally.
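
To make the proportional adjustment concrete, here is a minimal sketch of one possible reading: keep the expected mini-batch size equal to `miniBatchFraction` times the full corpus size, even though sampling now runs over the filtered (non-empty) documents only. The function `adjusted_fraction` and its parameters `corpus_size` and `non_empty_count` are hypothetical names for illustration, not part of the Spark API.

```python
def adjusted_fraction(mini_batch_fraction, corpus_size, non_empty_count):
    """Rescale the sampling fraction for a filtered corpus.

    Assumption: the goal is to keep the expected batch size at
    mini_batch_fraction * corpus_size while sampling only from the
    non_empty_count documents that survive the filter.
    """
    if corpus_size <= 0 or non_empty_count <= 0:
        raise ValueError("corpus_size and non_empty_count must be positive")
    # Share of the corpus that survives the "non-empty" filter.
    keep_ratio = non_empty_count / corpus_size
    # Divide by the kept share; a sampling fraction cannot exceed 1.0.
    return min(1.0, mini_batch_fraction / keep_ratio)


if __name__ == "__main__":
    # 1000 documents, 800 of them non-empty, original fraction 0.05.
    print(adjusted_fraction(0.05, 1000, 800))
```

Under the other reading, where the fraction is meant to apply to the non-empty documents directly, no rescaling would be needed.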


