GitHub user smurching commented on the issue:

    https://github.com/apache/spark/pull/19433

The failing tests (in `DecisionTreeSuite`) fail because we've historically handled two cases differently:

a) splits that have 0 gain, and
b) splits that fail to achieve the user-specified minimum gain (`metadata.minInfoGain`) or don't meet the minimum instance count per node (`metadata.minInstancesPerNode`).

Previously we'd create a leaf node with valid impurity stats in case a) and invalid impurity stats in case b). This PR creates a leaf node with invalid impurity stats in both cases.

As a fix I'd suggest creating a `LeafNode` with correct impurity stats in case a), but with the `stats.valid` member set to `false` to indicate that the node should not be split.
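To make the suggestion concrete, here is a minimal stand-alone sketch of the proposed behavior. It is a simplified Python model, not Spark's actual Scala code: the `ImpurityStats` dataclass, the `evaluate_split` function, and the `-inf` / `-1.0` placeholders used for "invalid" stats are all hypothetical names and values chosen for illustration. It also assumes that the minimum-gain and minimum-instance checks (case b) run before the zero-gain check (case a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpurityStats:
    """Hypothetical, simplified stand-in for a node's impurity stats."""
    gain: float
    impurity: float
    valid: bool  # False means "do not split this node"


def evaluate_split(gain: float, impurity: float,
                   left_count: int, right_count: int,
                   min_info_gain: float,
                   min_instances_per_node: int) -> ImpurityStats:
    """Classify a candidate split the way the comment proposes."""
    # Case b): the split violates a user-specified constraint.
    # The stats are discarded; -inf and -1.0 are illustrative placeholders.
    too_few = (left_count < min_instances_per_node
               or right_count < min_instances_per_node)
    if too_few or gain < min_info_gain:
        return ImpurityStats(gain=float("-inf"), impurity=-1.0, valid=False)

    # Case a): the split has zero gain. Keep the correct impurity,
    # but mark the stats as not valid so the node becomes a leaf.
    if gain == 0.0:
        return ImpurityStats(gain=0.0, impurity=impurity, valid=False)

    # Otherwise the split is usable.
    return ImpurityStats(gain=gain, impurity=impurity, valid=True)


if __name__ == "__main__":
    # Zero gain: real impurity is kept, but the node is not split.
    print(evaluate_split(0.0, 0.5, 5, 5, 0.0, 1))
    # Gain below the minimum: stats are the invalid placeholder.
    print(evaluate_split(0.05, 0.5, 5, 5, 0.1, 1))
```

Under these assumptions, both cases yield a leaf (`valid` is `False`), but only case a) retains a meaningful `impurity` value, which is what tests that inspect leaf impurity would rely on.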