[ https://issues.apache.org/jira/browse/MAHOUT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-216:
-----------------------------

    Fix Version/s: 0.3
    Affects Version/s: 0.2

> Improve the results of MAHOUT-145 by uniformly distributing the classes in the partitioned data
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-216
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-216
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 0.3
>
>
> The poor results of the partial decision forest implementation may be explained by the particular distribution of the partitioned data. For example, if a partition does not contain any instance of a given class, the decision trees built using this partition won't be able to classify this class.
> According to [CHAN, 95]:
> {quote}
> Random Selection of the partitioned data sets with a uniform distribution of classes is perhaps the most sensible solution. Here we may attempt to maintain the same frequency distribution over the "class attribute" so that each partition represents a good but a smaller model of the entire training set
> {quote}
> [CHAN, 95]: Philip K. Chan, "On the Accuracy of Meta-learning for Scalable Data Mining"
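
The partitioning idea quoted from [CHAN, 95] can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Mahout's code and not the change made for this issue: the function name `stratified_partitions` and its parameters are hypothetical, and the data is assumed to fit in memory as a list of class labels. Each class is shuffled and dealt round-robin across the partitions, so every partition keeps approximately the class frequencies of the full training set.

```python
import random
from collections import defaultdict


def stratified_partitions(labels, num_partitions, seed=0):
    """Split instance indices into partitions with near-equal class frequencies.

    Hypothetical helper for illustration only; not part of Mahout.
    """
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    partitions = [[] for _ in range(num_partitions)]
    offset = 0
    for indices in by_class.values():
        # Random selection within each class.
        rng.shuffle(indices)
        # Deal the class round-robin, continuing where the previous class
        # stopped so that partition sizes also stay balanced.
        for position, index in enumerate(indices):
            partitions[(offset + position) % num_partitions].append(index)
        offset += len(indices)
    return partitions


labels = ["a"] * 6 + ["b"] * 3
parts = stratified_partitions(labels, 3)
for part in parts:
    print(sorted(labels[i] for i in part))
```

With six instances of one class and three of another split three ways, each partition receives two of the first class and one of the second. A class with fewer instances than there are partitions will still be missing from some partitions, so this sketch reduces the problem described above without fully removing it.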