Hi Federico,
Sean just sent out an excellent email, and I'd like to add a few
things to what he said:
I think if you're looking at JIRA issues and constructing a good proposal
around that, you have a good start.
JIRA tickets are the place to start. That is where you will find
everything we are currently working toward regarding Mahout, and it is a
good place to officially post your proposals (obviously in addition to
the GSoC app page). That way, as we're perusing the open tickets, we can
leave you feedback there, and you can adjust specific aspects of the
proposal as necessary, just as you would any other open ticket.
However, I feel that one of the best things you can do in a proposal is
convince us that you know how much work it is, you know what the steps are, and
you know you can finish it even accounting for unexpected difficulty.
+1
In glancing over your proposal, I think the ideas are certainly good from
a theoretical standpoint, but I would also love to see some more
implementation specifics, as well as plans for how you intend to test,
and what the timelines would be. Have you looked over how k-means is
implemented in a map-reduce fashion? Do you understand how map-reduce
works? Can you envision how you would build the map-reduce paradigm into
your kernel smoother and LSH? Can you divide this project up into
phases, and assess how long each phase would take?
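To make the k-means question concrete, here is a rough sketch of how a
single k-means iteration splits into a map step and a reduce step. This
is plain Python for illustration only, not Mahout's actual code, and the
function names (nearest, map_phase, reduce_phase, kmeans_iteration) are
made up for this example. It also assumes everything fits in memory,
which a real map-reduce job would not.

```python
from collections import defaultdict


def nearest(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    # Mapper: emit (cluster id, (point, 1)) for every input point.
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    # Reducer: sum the points and counts per cluster id, then average.
    sums = {}
    counts = defaultdict(int)
    for cluster_id, (point, n) in pairs:
        if cluster_id in sums:
            sums[cluster_id] = [s + p for s, p in zip(sums[cluster_id], point)]
        else:
            sums[cluster_id] = list(point)
        counts[cluster_id] += n
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans_iteration(points, centroids):
    # One full pass: assign points (map), then recompute centroids (reduce).
    return reduce_phase(map_phase(points, centroids), centroids)
```

The mapper only needs the current centroids and one point at a time,
which is what lets the assignment step run in parallel. A proposal for
the kernel smoother or LSH could describe its own map and reduce steps
in similar terms.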
I will also say personally that I would prefer to see GSoC projects that
focus on architecture, refactoring, performance tuning and measurement,
tests, etc., rather than implementing another algorithm. Mahout needs the
former more, I think. But I speak for myself and I am not mentoring.
Another excellent point, which I also agree with (though I, too, speak only
for myself). At the very least, both new algorithms and tuning existing
code are very important and worthy of GSoC projects. Something to keep
in mind.
Shannon