Inline

On Jul 5, 2009, at 3:37 PM, Isabel Drost wrote:


People seem to be slowly becoming aware that there is something named Hadoop that implements a framework for parallel programming once developed at Google. However, the basic assumptions and implications (e.g. data locality) are known only by a few groups/people, at least in the IR and data mining domains.

This is always the case with new things. It is impossible to keep up with all the things happening. It's why it is important to keep trying to raise visibility like we are doing.

FWIW, I see the same here, although Hadoop has a lot of buzz right now.



Whenever I asked people using Apache software whether they are subscribed to the corresponding user mailing list, the answer was a questioning face and a no. I tried to make clear why participation is important - I guess we will see in the near future whether I was successful ;)

Participation takes a whole other level of commitment. People need to be able to quickly see the benefit or be willing to be on the cutting edge. It's hard to join a project in the early stages because it may very well be the case that the project doesn't make it. I think the ASF raises the chances of success, but it doesn't guarantee it.



I was surprised to see people only vaguely aware of the GSoC program. They knew that it exists, but the general setup was not as widely known as I would have expected. After all, in our GSoC proposals there seemed to be quite a few students co-supervised by their university.

GSoC is relatively small, so I don't find it that surprising. And they cut back this year, too.



Concerning Mahout I got varying feedback: a few people who had a look at it last autumn found it difficult to find the source code and documentation. Some students had a look shortly after ApacheCon EU this year and found it hard to set up a demo application. I think having some JavaDoc, tutorial, and setup documentation for each release version on our website might help people get started more easily?

I've been working on this a lot lately and agree it is important for us for 0.2. Some rework of the landing web page to include quicker links to source, etc. would be helpful.

Having some sites in production will also be useful, once we get there. All in good time. The key right now is for us committers to make sure we are reviewing patches, improving the code, helping new contributors feel welcome, and helping them become committers when appropriate.



Other than that, general feedback seemed to be that we are doing "surprisingly well" both in terms of emerging community and in terms of implementation progress over the first year.

+1


Last but not least: From DIMA at TU Berlin I received the offer to do a "Mahout seminar". It would consist of two parts: a theoretical one where students read scientific publications, prepare a survey and give a talk by the end of the semester. The other part would be a project where they could work, for instance, on some algorithm implementation or integrate already existing implementations in a project. The goal would be to strengthen their programming and project management skills and, along the way, have them contribute back to the community.

cool.


My first thought was to prepare a task with the goal of building a new blog "search engine". They could build a system that identifies clusters of blogs on a common topic, work on the link graph in the blogosphere, detect newly emerging topics and the like. Before preparing the final seminar proposal, I would like to ask you whether there is anything you might want those students to work on during their winter term.


That sounds pretty involved to get done in a semester, but maybe it depends on the level of the students. I could also see things like benchmarking, setting up clusters and running/tuning, creating demos, etc. In other words, let them do a couple of projects.
