[ https://issues.apache.org/jira/browse/MAHOUT-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-747.
------------------------------
Resolution: Fixed
Fix Version/s: 0.6
Assignee: Sean Owen
OK, I submitted this with moderate changes. Most of them streamline the code
and reuse some common code that simplifies it. Much of the rest was moving
everything into its own package, and some are small style changes. I also had
to fix it to ignore the _SUCCESS files found in Hadoop 0.22+ distros.
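
For readers unfamiliar with the _SUCCESS issue: a job that reads another job's
output directory has to skip marker and hidden files and keep only the part
files. The following is a minimal sketch of that idea in plain Java, not the
code from the commit. The class and method names (OutputFileFilterSketch,
IS_DATA_FILE, dataFiles) are hypothetical, and it assumes the common convention
that names starting with "_" or "." are not data files. In Hadoop itself the
same test would typically live in a PathFilter.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class OutputFileFilterSketch {

    // Assumption: names starting with "_" (e.g. _SUCCESS, _logs) or "."
    // (e.g. checksum files) are markers or hidden files, not data.
    static final Predicate<String> IS_DATA_FILE =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // Keep only the file names that should be read as job input.
    static List<String> dataFiles(List<String> fileNames) {
        return fileNames.stream()
            .filter(IS_DATA_FILE)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing =
            List.of("part-r-00000", "_SUCCESS", "_logs", ".part-r-00000.crc");
        System.out.println(dataFiles(listing)); // prints [part-r-00000]
    }
}
```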
> Entropy implementation in Map/Reduce
> ------------------------------------
>
> Key: MAHOUT-747
> URL: https://issues.apache.org/jira/browse/MAHOUT-747
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.6
> Reporter: Christoph Nagel
> Assignee: Sean Owen
> Fix For: 0.6
>
> Attachments: MAHOUT-747.patch
>
>
> Hi again,
> because I work a lot with entropy and information gain ratio, I want to
> implement the following distributed algorithms:
> * Entropy
> (https://secure.wikimedia.org/wikipedia/en/wiki/Entropy_%28information_theory%29)
> * Conditional Entropy
> (https://secure.wikimedia.org/wikipedia/en/wiki/Conditional_entropy)
> * Information Gain
> * Information Gain Ratio
> (https://secure.wikimedia.org/wikipedia/en/wiki/Information_gain_ratio)
> For now, this issue covers only entropy.
> Some questions:
> * Which package do the classes belong in? I put them in
> 'org.apache.mahout.math.stats' for now, but I don't know if this is right,
> because they are components of information retrieval.
> * Entropy only reads a set of elements. As input I took a sequence file with
> keys of type Text and values of any type, because I only work with the keys.
> Is this the best practice?
> * Is there a generic solution, so that the keys can be of any type that
> implements Writable?
> Hadoop has a TokenCounterMapper, which emits each token of the input value
> with an IntWritable(1). I added a KeyCounterMapper to
> 'org.apache.mahout.common.mapreduce' which does the same with the keys.
> I will attach my patch soon.
> Regards, Christoph.
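
To illustrate the approach described in the issue (count each key with a 1,
sum the counts, then compute the entropy from the counts), here is a minimal
single-machine sketch in plain Java. It is not the attached patch and uses no
Hadoop or Mahout classes. The names EntropySketch, countKeys and entropy are
hypothetical. It relies on the identity H = log2(N) - (1/N) * sum(c * log2(c)),
where c is the count of each distinct key and N is the total number of keys.
This is algebraically the same as -sum(p * log2(p)) with p = c / N, and it only
needs sums, which is convenient for a map/reduce formulation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntropySketch {

    // Counting step: each key contributes 1, and counts are summed per key
    // (the role a key-counting mapper plus a summing reducer would play).
    static <K> Map<K, Long> countKeys(Iterable<K> keys) {
        Map<K, Long> counts = new HashMap<>();
        for (K key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Entropy step: H = log2(N) - (1/N) * sum(c * log2(c)), in bits.
    static double entropy(Map<?, Long> counts) {
        long total = 0;
        double sumCLogC = 0.0;
        for (long c : counts.values()) {
            total += c;
            sumCLogC += c * log2(c);
        }
        if (total == 0) {
            return 0.0; // convention chosen here for empty input
        }
        return log2(total) - sumCLogC / total;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "a", "b");
        System.out.println(entropy(countKeys(keys)));
    }
}
```

Because countKeys is generic in the key type, the sketch also hints at the
third question above: nothing in the computation depends on the keys being
text, only on them being usable as map keys.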