GitHub user smurching commented on the issue:

    https://github.com/apache/spark/pull/19433
  
    The failing tests (in `DecisionTreeSuite`) fail because we've historically handled
    
    a) splits that have 0 gain 
    
    differently from
    
    b) splits that fail to achieve the user-specified minimum gain (`metadata.minInfoGain`) or don't meet the minimum instance count per node (`metadata.minInstancesPerNode`).
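    To make the two cases concrete, here is a minimal Scala sketch that classifies a candidate split. It assumes simplified, hypothetical types (`Candidate`, `SplitOutcome`, `SplitCheck.classify`) that are not Spark's internal classes; the check order (instance counts, then minimum gain, then zero gain) is also an assumption for illustration.

    ```scala
    // Hypothetical, simplified types for illustration; not Spark's internal API.
    sealed trait SplitOutcome
    case object ZeroGain extends SplitOutcome         // case a): constraints met, but 0 gain
    case object FailsConstraints extends SplitOutcome // case b): minInfoGain / minInstancesPerNode not met
    case object Splittable extends SplitOutcome       // positive gain, all constraints met

    final case class Candidate(gain: Double, leftCount: Long, rightCount: Long)

    object SplitCheck {
      def classify(c: Candidate, minInfoGain: Double, minInstancesPerNode: Int): SplitOutcome =
        if (c.leftCount < minInstancesPerNode || c.rightCount < minInstancesPerNode) {
          FailsConstraints // too few instances in a child node
        } else if (c.gain < minInfoGain) {
          FailsConstraints // gain below the user-specified minimum
        } else if (c.gain <= 0.0) {
          ZeroGain // constraints met, but the split gains nothing
        } else {
          Splittable
        }
    }
    ```

    With the default `minInfoGain = 0.0`, a split with exactly 0 gain passes the minimum-gain check and lands in case a); with a positive `minInfoGain` it would fall into case b) instead.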
    
    Previously, we'd create a leaf node with valid impurity stats in case a) and with invalid impurity stats in case b). This PR creates a leaf node with invalid impurity stats in both cases, which is why those tests now fail.
    
    As a fix, I'd suggest creating a `LeafNode` with correct impurity stats in case a), but with the `stats.valid` member set to `false` to indicate that the node should not be split.
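    A minimal Scala sketch of the suggested fix, assuming a hypothetical, simplified `Stats` case class and `LeafStats` helper (not Spark's actual `ImpurityStats` or `LeafNode` API). The sentinel values used for case b) are placeholders chosen for the sketch.

    ```scala
    // Hypothetical, simplified stand-in for a node's impurity statistics.
    final case class Stats(gain: Double, impurity: Double, valid: Boolean)

    object LeafStats {
      // Case a), per the suggested fix: keep the node's real impurity so the
      // leaf reports correct stats, but set valid = false so it is not split.
      def forZeroGain(nodeImpurity: Double): Stats =
        Stats(gain = 0.0, impurity = nodeImpurity, valid = false)

      // Case b): invalid stats. NaN and Double.MinValue are placeholder
      // sentinels chosen for this sketch, not values taken from Spark.
      def forFailedConstraints(): Stats =
        Stats(gain = Double.MinValue, impurity = Double.NaN, valid = false)
    }
    ```

    The design choice is that `valid` alone carries the "do not split" signal, so the impurity reported for a zero-gain leaf stays meaningful.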
    