[ https://issues.apache.org/jira/browse/SPARK-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697974#comment-16697974 ]
Pablo J. Villacorta commented on SPARK-24710:
---------------------------------------------

It seems this is not considered a very interesting feature by the community... have you worked on it, Aleksey?

> Information Gain Ratio for decision trees
> -----------------------------------------
>
>                 Key: SPARK-24710
>                 URL: https://issues.apache.org/jira/browse/SPARK-24710
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.1
>            Reporter: Pablo J. Villacorta
>            Priority: Minor
>              Labels: features
>
> Spark currently uses Information Gain (IG) to decide the next feature to
> branch on when building a decision tree. In case of categorical features, IG
> is known to be biased towards features with a large number of categories.
> [Information Gain Ratio|https://en.wikipedia.org/wiki/Information_gain_ratio]
> solves this problem by dividing the IG by a number that characterizes the
> intrinsic information of a feature.
> As far as I know, Spark has IG but not IGR. It would be nice to have the
> possibility to choose IGR instead of IG.
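
To make the difference between IG and IGR described in the issue concrete, here is a minimal sketch in plain Python. It is not Spark code and does not reflect any Spark API: the function names (`entropy`, `information_gain`, `gain_ratio`) and the toy data are hypothetical and chosen only for illustration. It assumes entropy-based IG and takes the "intrinsic information" of a feature to be the entropy of the feature's own value distribution.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    on every distinct value of a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """IG divided by the intrinsic information of the feature, taken here
    as the entropy of the feature's own value distribution."""
    split_info = entropy(feature)
    if split_info == 0.0:  # feature has a single value: no usable split
        return 0.0
    return information_gain(feature, labels) / split_info


# Toy data (made up): a unique row id versus a two-category feature.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
row_id = list(range(8))                            # 8 distinct categories
binary = ["a", "a", "b", "b", "a", "b", "a", "a"]  # 2 categories

for name, feature in [("row_id", row_id), ("binary", binary)]:
    print(name, information_gain(feature, labels), gain_ratio(feature, labels))
```

On this toy data the unique `row_id` feature gets the higher IG, because each of its categories contains a single row and is therefore trivially pure, while the two-category feature gets the higher gain ratio. That is the bias towards many-category features that the issue describes, and the correction IGR is meant to provide.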