Joseph K. Bradley created SPARK-3934:
----------------------------------------

             Summary: RandomForest bug in sanity check in DTStatsAggregator
                 Key: SPARK-3934
                 URL: https://issues.apache.org/jira/browse/SPARK-3934
             Project: Spark
          Issue Type: Bug
          Components: MLlib
            Reporter: Joseph K. Bradley


RandomForest fails on multiclass classification when the training data mixes
unordered categorical and continuous features.  The bug is in the sanity checks
in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices
when checking whether features are unordered.

Proposal: Remove the sanity checks.  They are not really needed, and making
them correct would require DTStatsAggregator to keep track of an extra set of
indices (for the feature subset).
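
To illustrate the kind of index mismatch described above, here is a minimal
sketch in Python.  It is not the Spark MLlib code: all names are hypothetical,
and it assumes the "wrong indices" are positions within a node's feature subset
being treated as global feature indices.

```python
# Hypothetical illustration only; not the actual DTStatsAggregator code.

# Global feature indices: feature 0 is unordered categorical,
# features 1 and 2 are continuous (assumed layout).
unordered_features = {0}

# Features sampled for one node, listed by global index (assumed subset).
feature_subset = [2, 0]


def is_unordered(global_index):
    """Return True if the feature with this global index is unordered."""
    return global_index in unordered_features


def buggy_check(subset_index):
    """Mistake: treats a position in the subset as a global feature index."""
    return is_unordered(subset_index)


def correct_check(subset_index):
    """Maps the subset position back to the global index before checking.

    This needs the subset-to-global mapping, i.e. the extra set of
    indices the aggregator would have to track.
    """
    return is_unordered(feature_subset[subset_index])


for position in range(len(feature_subset)):
    print(position, buggy_check(position), correct_check(position))
```

In this sketch the two checks disagree at both subset positions, so a sanity
check built on the first one would reject valid calls whenever unordered and
continuous features are mixed.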



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

