[ https://issues.apache.org/jira/browse/MAHOUT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-216:
-----------------------------

        Fix Version/s: 0.3
    Affects Version/s: 0.2

> Improve the results of MAHOUT-145 by uniformly distributing the classes in 
> the partitioned data
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-216
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-216
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 0.3
>
>
> The poor results of the partial decision forest implementation may be 
> explained by how the classes are distributed across the partitioned data. 
> For example, if a partition does not contain any instance of a given class, 
> the decision trees built from that partition cannot predict that class.
> According to [CHAN, 95]:
> {quote}
> Random Selection of the partitioned data sets with a uniform distribution of 
> classes is perhaps the most sensible solution. Here we may attempt to 
> maintain the same frequency distribution over the "class attribute" so that 
> each partition represents a good but a smaller model of the entire training 
> set
> {quote}
> [CHAN, 95]: Philip K. Chan, "On the Accuracy of Meta-learning for Scalable 
> Data Mining" 
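
To make the idea in the quoted description concrete, here is a minimal sketch of class-stratified partitioning. It is not the Mahout implementation and not necessarily the procedure from [CHAN, 95]; the function name `stratified_partitions`, its parameters, and the round-robin dealing strategy are assumptions chosen for illustration, and the class labels are assumed to be sortable (for example, strings).

```python
import random
from collections import defaultdict


def stratified_partitions(labels, num_partitions, seed=0):
    """Hypothetical helper: split instance indices into partitions that
    each keep roughly the class frequencies of the full data set."""
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    partitions = [[] for _ in range(num_partitions)]
    next_slot = 0  # keeps running across classes so partition sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random selection within each class
        # Deal this class's instances round-robin over the partitions.
        for index in indices:
            partitions[next_slot % num_partitions].append(index)
            next_slot += 1
    return partitions


if __name__ == "__main__":
    labels = ["a"] * 6 + ["b"] * 3
    for part in stratified_partitions(labels, 3):
        print(sorted(labels[i] for i in part))
```

Under this scheme the per-class counts in any two partitions differ by at most one. A class with fewer instances than there are partitions will still be missing from some partitions, so the problem described above is reduced rather than eliminated for very rare classes.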

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
