[ 
https://issues.apache.org/jira/browse/MAHOUT-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved MAHOUT-747.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.6
         Assignee: Sean Owen

OK, I submitted this with moderate changes. Most of them streamline the code 
and reuse common code that simplifies it. Much of the change was moving all of 
this into its own package. Some are small style changes. I also had to fix it 
to ignore the _SUCCESS files found in Hadoop 0.22+ distros.

> Entropy implementation in Map/Reduce
> ------------------------------------
>
>                 Key: MAHOUT-747
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-747
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Christoph Nagel
>            Assignee: Sean Owen
>             Fix For: 0.6
>
>         Attachments: MAHOUT-747.patch
>
>
> Hi again,
> because I work a lot with entropy and information gain ratio, I want to 
> implement the following distributed algorithms:
> * Entropy 
> (https://secure.wikimedia.org/wikipedia/en/wiki/Entropy_%28information_theory%29)
> * Conditional Entropy 
> (https://secure.wikimedia.org/wikipedia/en/wiki/Conditional_entropy)
> * Information Gain
> * Information Gain Ratio 
> (https://secure.wikimedia.org/wikipedia/en/wiki/Information_gain_ratio)
> This issue initially covers only entropy.
> Some questions:
> * In which package do the classes belong? I put them in 
> 'org.apache.mahout.math.stats' for now, but I don't know if this is right, 
> because they are components of information retrieval.
> * Entropy only reads a set of elements. As input I took a sequence file with 
> keys of type Text and values of any type, because I only work with the keys. 
> Is this the best practice?
> * Is there a generic solution, so that the type of keys can be anything 
> inherited from Writable?
> Hadoop has a TokenCounterMapper, which emits each value with an 
> IntWritable(1). I added a KeyCounterMapper to 
> 'org.apache.mahout.common.mapreduce' which does the same with the keys.
> I will attach my patch soon.
> Regards, Christoph.
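
For readers following the description above, here is a minimal single-process 
sketch of the idea: count how often each key occurs, then derive the entropy 
from the counts alone. It is not the code from MAHOUT-747.patch and uses no 
Hadoop or Mahout classes; the class name EntropySketch and its methods are 
hypothetical. It assumes entropy in bits, written as 
H = log2(N) - (1/N) * sum(c * log2(c)), where c is a per-key count and N is the 
total number of elements. That form only needs two sums over the counts, which 
is why it suits a distributed job.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout or of the attached patch.
final class EntropySketch {

    // Emulates the key-counting step: conceptually emit (key, 1) per record
    // and sum the ones per key, as a mapper plus summing reducer would.
    static <K> Map<K, Long> countKeys(Iterable<K> keys) {
        Map<K, Long> counts = new HashMap<>();
        for (K key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Entropy in bits from per-key counts:
    // H = log2(N) - (1/N) * sum(c * log2(c)), with N the sum of all counts.
    static double entropy(Iterable<Long> counts) {
        long total = 0;
        double sumCLogC = 0.0;
        for (long c : counts) {
            if (c <= 0) {
                continue; // non-positive counts carry no information
            }
            total += c;
            sumCLogC += c * log2(c);
        }
        if (total == 0) {
            return 0.0; // empty input: define entropy as 0
        }
        return log2(total) - sumCLogC / total;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Two equally frequent keys, so the entropy is about one bit.
        List<String> keys = Arrays.asList("a", "b", "a", "b");
        System.out.println(entropy(countKeys(keys).values()));
    }
}
```

The generic type parameter K stands in for the "any Writable key" question 
raised above: the counting step only needs the keys to support equality and 
hashing, not any particular type.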

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
