Dear Mahout developers,

I am a Belgian student in Computer Science and I would be very
interested in working on Mahout this summer. I will soon start a PhD
in large-scale machine learning. To be honest, I have not yet had the
chance to use Mahout, but I am very eager to try it out and to
contribute. I think this would be a very good way to dive into the
subject of my PhD thesis.

I am quite tempted by some of the GSoC projects you proposed, in
particular MAHOUT-327 and MAHOUT-342.

For MAHOUT-327, I propose to implement Extra-trees [1], a tree-based
ensemble method for supervised classification and regression. It is
quite close to Random Forests. The main difference lies in how a node
is split: for each of the K randomly selected attributes, a single
cut-point is drawn at random, and the best of these K candidate splits
is used. (In Random Forests, the best cut-point is computed for each
of the K random attributes rather than drawn at random.) I know that
an RF module is already integrated in Mahout, and I believe Extra-trees
would complement it well in the algorithm toolbox. However, I wonder
whether the task would be large enough to constitute a GSoC project on
its own, since the implementation could be quite close to the RF one.
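
To make the difference concrete, here is a minimal Python sketch of
the two split-selection rules. It is not Mahout code: the function
names, the variance-reduction score, and the list-of-rows data layout
are assumptions made for illustration, and the bootstrap sampling used
by Random Forests is left out.

```python
import random


def variance(ys):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def score(rows, ys, attr, cut):
    """Variance reduction obtained by splitting on rows[i][attr] < cut."""
    left = [y for r, y in zip(rows, ys) if r[attr] < cut]
    right = [y for r, y in zip(rows, ys) if r[attr] >= cut]
    if not left or not right:
        return 0.0  # degenerate split: one side is empty
    n = len(ys)
    return (variance(ys)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))


def extra_trees_split(rows, ys, k, rng):
    """Extra-trees rule: one random cut-point per attribute, keep the best."""
    candidates = []
    for attr in rng.sample(range(len(rows[0])), k):
        values = [r[attr] for r in rows]
        cut = rng.uniform(min(values), max(values))  # drawn, not optimized
        candidates.append((score(rows, ys, attr, cut), attr, cut))
    return max(candidates, default=None)


def random_forest_split(rows, ys, k, rng):
    """Random Forests rule: search every midpoint of each attribute."""
    candidates = []
    for attr in rng.sample(range(len(rows[0])), k):
        values = sorted(set(r[attr] for r in rows))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            candidates.append((score(rows, ys, attr, cut), attr, cut))
    return max(candidates, default=None)
```

Both functions return a (score, attribute index, cut-point) tuple.
The only part that differs is how the cut-points are generated, which
is why I expect much of the existing RF code to be reusable.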

In any case, I am also very interested in implementing Neural Networks
over Map/Reduce (MAHOUT-342). If I understand correctly [2], this is
still lacking in Mahout.

[1]: http://www.montefiore.ulg.ac.be/~ernst/extremely-randomized-trees.pdf
[2]: http://cwiki.apache.org/MAHOUT/algorithms.html

From a more legal point of view, I was also wondering whether I would
be allowed to reuse what I implement during this GSoC in my own
research. Since Mahout is open source, I guess this is perfectly fine,
but what about writing a publication related to that work?

Best regards,

Gilles Louppe
MSc student in Computer Sciences
Université de Liège (Belgium)
