I created a repository on GitHub with the source code, the jar file, and an example: https://github.com/Nophiq/Mahout
I can give you two use cases. In bioinformatics, feature selection is used to detect which genes mainly affect the result, and in fraud detection it is used to build better models. These are some of the reasons the library was built. In the beginning it was just mRMR in MapReduce (running on top of Hadoop); it was later rewritten for Mahout.

2013/4/2 Ted Dunning <ted.dunn...@gmail.com>

> Can you suggest detailed use cases?
>
> In my experience explicit variable selection is not a common strategy in
> machine learning at scale. If anything, the use of regularizers has driven
> things in another direction entirely.
>
>
> On Tue, Apr 2, 2013 at 10:34 AM, Claudio Reggiani <nop...@gmail.com> wrote:
>
> > After one month I'd like to know if this new feature is interesting for
> > Mahout, or I didn't get any reply because nobody noticed it. If it is not
> > good enough I could first publish it on github on my account.
> >
> >
> > 2013/3/6 Claudio Reggiani (JIRA) <j...@apache.org>
> >
> > > Claudio Reggiani created MAHOUT-1152:
> > > ----------------------------------------
> > >
> > >              Summary: mRMR feature selection algorithm
> > >                  Key: MAHOUT-1152
> > >                  URL: https://issues.apache.org/jira/browse/MAHOUT-1152
> > >              Project: Mahout
> > >           Issue Type: Improvement
> > >           Components: Integration
> > >     Affects Versions: 0.7
> > >             Reporter: Claudio Reggiani
> > >             Priority: Minor
> > >              Fix For: 0.8
> > >
> > >
> > > Proposal Title: mRMR Feature Selection Algorithm on Map-Reduce.
> > >
> > > Student Name: Claudio Reggiani
> > >
> > > Student E-mail: nop...@gmail.com
> > >
> > > Proposal Abstract:
> > >
> > > The mRMR algorithm, described in [1], is a feature selection algorithm
> > > that leverages mutual information evaluation to select features. At each
> > > iteration, mRMR selects a new feature based on both how much it's strongly
> > > correlated to the target output and how much it's less correlated to the
> > > features already selected. The correlation is measured by means of mutual
> > > information. The project proposes to provide the mRMR algorithm in
> > > MapReduce programming framework.
> > >
> > > Additional information:
> > >
> > > 1. *The code is already available* with some tests, because I'm working on
> > > my master thesis an initial milestone of my research was to implement mRMR
> > > algorithm in MapReduce.
> > > 2. I'm figuring out if it's possible for me to apply at Google Summer of
> > > Code 2013.
> > >
> > > References:
> > >
> > > [1] Hanchuan Peng, Fuhui Long, and Chris Ding
> > > IEEE Transactions on Pattern Analysis and Machine Intelligence,
> > > Vol. 27, No. 8, pp.1226-1238, 2005.
> > > Link: http://penglab.janelia.org/papersall/docpdf/2005_TPAMI_FeaSel.pdf
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > If you think it was sent incorrectly, please contact your JIRA
> > > administrators
> > > For more information on JIRA, see: http://www.atlassian.com/software/jira
> > >
> >
>
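
To make the selection step described in the quoted proposal concrete, here is a minimal single-machine sketch in Python. It is not the MapReduce or Mahout code from the repository. The function names (`mutual_information`, `mrmr`), the toy data, and the choice to score a candidate as relevance minus mean redundancy (the difference form of the criterion) are all assumptions made for illustration, and it only handles discrete features.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def mrmr(features, target, k):
    """Greedily pick up to k feature names (hypothetical helper, not a Mahout API).

    features: dict mapping a feature name to its list of discrete values.
    target:   list of discrete class labels, same length as each feature.
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean mutual information with the features already selected
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the target is fully determined by two independent bits a and b;
# a_copy duplicates a, so it is relevant but redundant.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * x + y for x, y in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

print(mrmr(features, target, 2))  # ['a', 'b']
```

All three toy features are equally relevant to the target on their own, so a ranking by relevance alone could return `a` and `a_copy`. The redundancy term is what makes the second pick `b`.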