[ https://issues.apache.org/jira/browse/SPARK-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336132#comment-15336132 ]
Taylor Baldwin commented on SPARK-15995:
----------------------------------------

Will be closing this issue. Found everything we need in Boosting Strategy. Was unaware of the separate contracts for Gradient Boosted Trees and Random Forest / Decision Trees.

> Gradient Boosted Trees - handling of Categorical Inputs
> -------------------------------------------------------
>
>                 Key: SPARK-15995
>                 URL: https://issues.apache.org/jira/browse/SPARK-15995
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: Taylor Baldwin
>
> Gradient Boosted trees appear to handle all inputs as continuous, or at least
> ordered, values. The trees returned in the Gradient Boosted model have nodes
> for categorical values containing a split that operates on the threshold, not
> the category values. This treats categorical values as if the ordering of
> the values is significant, which is not reasonable to assume.
> Both Random Forest and Decision Trees accept the map for categorical features
> info, while Gradient Boosted trees do not. Random Forest and Decision trees
> provide nodes for categorical values that have splits with the categories
> populated.
> According to the website documentation, Gradient Boosted trees should handle
> categorical features, yet there is no perceivable way to provide the
> categorical information to enable handling them as categories, not continuous
> values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
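
To make the concern in the quoted report concrete, here is a minimal plain-Python sketch of why a threshold split on category indices differs from a split on a set of categories. It is not Spark's implementation and uses no Spark API; the function names and the three-colour encoding are hypothetical, chosen only for illustration. In MLlib itself, per the comment above, the categorical feature map is supplied through the boosting strategy rather than as a direct training argument.

```python
from typing import Callable, FrozenSet, Set

# Hypothetical encoding of a categorical feature: index -> label.
# The indices are arbitrary and carry no meaningful order.
COLORS = {0: "red", 1: "green", 2: "blue"}


def threshold_split(threshold: float) -> Callable[[float], bool]:
    """Continuous-style split: a value goes left when value <= threshold."""
    return lambda value: value <= threshold


def category_split(left: FrozenSet[int]) -> Callable[[int], bool]:
    """Categorical split: a value goes left when it is in the given set."""
    return lambda value: value in left


def threshold_left_groups(num_categories: int) -> Set[FrozenSet[int]]:
    """Every left-hand group a threshold on the raw index can produce.

    A threshold can only cut the index range at one point, so the left
    group is always a prefix {0, ..., k} of the indices.
    """
    return {frozenset(range(k + 1)) for k in range(num_categories - 1)}


if __name__ == "__main__":
    wanted = frozenset({0, 2})  # group red with blue, green on the other side
    groups = threshold_left_groups(len(COLORS))
    print("reachable by threshold:", wanted in groups)
    print("blue goes left under category split:", category_split(wanted)(2))
```

The grouping "red and blue versus green" is expressible as a category set but cannot be produced by any single threshold on the indices, which is the behaviour the report describes when categorical information is not provided.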