I'd be very interested in benchmark data for, and/or performance improvements to, RecommenderJob (as well as ItemSimilarityJob and RowSimilarityJob, which are used internally), if you feel like working on that.

A good starting point for getting familiar with the functionality might be Sean's talk from Berlin Buzzwords ( http://berlinbuzzwords.blip.tv/file/3811036/ ) and my slides from Berlin's last Hadoop Get Together ( http://www.slideshare.net/sscdotopen/mahoutcf ).

--sebastian

On 20.01.2011 09:08, Sean Owen wrote:
I think it's far from complete or done.

I think it would be interesting to take any of the MapReduce-based jobs, set
it up, run it, and benchmark/profile it to locate some bottlenecks, then
propose optimizations. It is a good way to get familiar with the packages.

You might also investigate suggested settings for Hadoop when running these
jobs.

This is just one way you could contribute. Looking into open
issues in JIRA, or adding unit tests, would be fine too.

On Thu, Jan 20, 2011 at 3:36 AM, Kasun Lakpriya <kasun.lakpriy...@gmail.com> wrote:

Hi Sean,
Thanks for the immediate reply and sorry for my late response.

Our above-mentioned project is in progress.

BTW, I have realized that Mahout is quite an interesting and very active
project, and I am interested in contributing to it. As understanding the
complete code base is not an easy task, I would like to start from some
basic point. After getting familiar with the code base, I can look into
your suggestion about "improving its speed or reducing its memory/disk
usage".

So what would be a good starting point?

Thank you,
Kasun

On Thu, Dec 30, 2010 at 5:56 PM, Sean Owen <sro...@gmail.com> wrote:

Hi Kasun,

If you want to get involved, you are free to discuss and propose your own
changes and algorithms. You can review the list of open issues here:
https://issues.apache.org/jira/browse/MAHOUT This contains some ideas
about work that needs to be done.

One interesting project would be to benchmark the existing distributed
item-based recommender and find ways to improve its speed or reduce its
memory/disk usage. That's a fairly simple starter project and quite
useful.

Sean

On Wed, Dec 29, 2010 at 10:51 AM, Kasun Lakpriya <kasun.lakpriy...@gmail.com> wrote:

Hi all,
I am Kasun Lakpriya from the University of Moratuwa, Sri Lanka. I am
pursuing a BSc degree in Computer Science and Engineering and am now in
my final year.

To complete our degree program, we need to carry out a research project
approved by the university. The project I am working on is about "Web
Personalization". The task is to develop a personalization module that
can (theoretically) be plugged into any web application. After a
literature survey, we found that there are some existing open source
tools we can use to implement this module. Specifically, we are focusing
on Collaborative Filtering. I have already checked out the Mahout trunk,
built it successfully, and tried an example I found on the web [1]. I
also went through the wiki page on Algorithms, found a nice presentation
about "Distributed item based collaborative filtering" by Sebastian
Schelter, and looked at some of the similarity measure implementations
in Mahout.

What I would like from you all is some guidance and a helping hand to
start improving an algorithm that already exists in Mahout, or pointers
to other areas related to Collaborative Filtering that we could
integrate into Mahout. I couldn't find such a discussion in the recent
mail archives. Any further reading or references would be really
appreciated.


Thanks and Regards,
Kasun

[1] http://philippeadjiman.com/blog/2009/11/11/flexible-collaborative-filtering-in-java-with-mahout-taste/

