GitHub user sethah commented on the issue:

https://github.com/apache/spark/pull/19106

Ok, I guess I'm surprised that someone even noticed this. Basically, we are changing the behavior of a private function for a specific case that is actually impossible to run into, so I don't see the need. I know @jkbradley mentioned that it might happen for linear models, but this method is only used for RandomForest and DecisionTree. @srowen mentioned that finding the root cause would be better, and it seems to me that there is no root cause.

Anyway, if we are analyzing the behavior of this method without considering the context it's called from, then we need to test not only for all-zero probabilities but for negative ones as well. Otherwise, we're shoring up one edge case without attending to the others. If it is not currently possible to call this method with all-zero counts, and we never want it to be possible in the future, why don't we just throw an error?

I'm also fine with not changing this at all.
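To make the "just throw an error" option concrete, here is a rough sketch of what I mean. It is written in Python rather than Spark's Scala, and the name `normalize_to_probabilities` and its signature are hypothetical, chosen for illustration only; it is not the private method under discussion.

```python
from typing import List


def normalize_to_probabilities(counts: List[float]) -> List[float]:
    """Hypothetical helper: scale non-negative counts so they sum to 1.

    Illustrative only; this is not Spark's actual implementation.
    """
    # Negative entries are just as invalid as an all-zero vector,
    # so reject them up front instead of guarding only one edge case.
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")

    total = sum(counts)

    # An all-zero (or empty) vector has no meaningful normalization.
    # Fail fast rather than silently returning some default.
    if total == 0:
        raise ValueError("counts must contain at least one positive value")

    return [c / total for c in counts]
```

With this shape, `normalize_to_probabilities([1.0, 3.0])` gives `[0.25, 0.75]`, while all-zero or negative input raises instead of yielding a made-up distribution.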