Re: Regarding Collaborative Filtering.

Sebastian Schelter Thu, 20 Jan 2011 00:16:36 -0800

I'd be very interested in benchmark data for and/or performance increases of RecommenderJob (as well as ItemSimilarityJob and RowSimilarityJob, which are used internally), if you feel like working on that.

A good starting point to get familiar with the functionality might be Sean's talk from Berlin Buzzwords (http://berlinbuzzwords.blip.tv/file/3811036/) and my slides from Berlin's last Hadoop Get Together (http://www.slideshare.net/sscdotopen/mahoutcf).


--sebastian

On 20.01.2011 09:08, Sean Owen wrote:

I think it's far from complete or done.

I think it would be interesting to take any of the MapReduce-based jobs, set
it up, run it, and benchmark/profile it to locate some bottlenecks, then
propose optimizations. It is a good way to get familiar with the packages.

You might also investigate suggested settings for Hadoop when running these
jobs.

These are just some of the ways you could contribute. Looking into open
issues in JIRA, or adding unit tests, would be fine too.

On Thu, Jan 20, 2011 at 3:36 AM, Kasun Lakpriya <[email protected]> wrote:

Hi Sean,
Thanks for the immediate reply and sorry for my late response.

Our above-mentioned project is in progress.

BTW, I realized that Mahout is a quite interesting and very active project,
and I am interested in contributing to it. As understanding the complete
code base is not an easy task, I would like to start from some basic point.
After getting familiar with the code base I can think about your suggestion
of "improving its speed or reducing its memory/disk usage".

So what would be a good starting point?

Thank you,
Kasun

On Thu, Dec 30, 2010 at 5:56 PM, Sean Owen <[email protected]> wrote:

Hi Kasun,

If you want to get involved, you are free to discuss and propose your own
changes and algorithms. You can review the list of open issues here:
https://issues.apache.org/jira/browse/MAHOUT This contains some ideas about
work that needs to be done.

One interesting project would be to benchmark the existing distributed
item-based recommender and find ways to improve its speed or reduce its
memory/disk usage. That's a fairly simple starter project and quite useful.

Sean

On Wed, Dec 29, 2010 at 10:51 AM, Kasun Lakpriya <[email protected]> wrote:

Hi all,
I am Kasun Lakpriya from the University of Moratuwa, Sri Lanka. I am
pursuing a BSc in Computer Science and Engineering and am now in my final
year.

In our degree program, in order to complete the degree we need to do some
kind of research project approved by the university. The project I am
working on is about "Web Personalization". The task is to develop a
personalization module which is pluggable into any (theoretically) web
application. After a literature survey we found that there are some
existing open source tools we can use to implement this module. What we are
especially focusing on is Collaborative Filtering. I have already checked
out the Mahout trunk, built it successfully, and tried an example I found
on the web [1]. I also went through the wiki page on Algorithms and found a
nice presentation about "Distributed item based collaborative filtering" by
Sebastian Schelter, and I went through some of the similarity measure
implementations in Mahout.

What I want from you all is some guidance and a helping hand to start
working on improving an algorithm already in Mahout, or to learn what other
areas related to Collaborative Filtering we could integrate into Mahout. I
couldn't find such a discussion in the recent mail archives. Any further
reading or references would be really appreciated.


Thanks and Regards,
Kasun

[1] - http://philippeadjiman.com/blog/2009/11/11/flexible-collaborative-filtering-in-java-with-mahout-taste/
