Another great contribution would be small or mid-sized datasets and "gold master" output sets for some of the standard computations. This problem requires both gold masters and evaluation algorithms for numerical variations against the masters.
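To make the "evaluation algorithms" part concrete, here is a minimal sketch of what a tolerance-based comparison against a gold master could look like. Everything in it is an assumption for illustration: the class name GoldMasterCheck, the method names, and the choice of a combined absolute/relative tolerance are hypothetical and not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for comparing a run's output to a stored gold master. Not a Mahout class. */
final class GoldMasterCheck {

    /**
     * True when actual matches expected within an absolute tolerance,
     * or within a relative tolerance scaled by the larger magnitude.
     */
    static boolean closeEnough(double expected, double actual, double absTol, double relTol) {
        // NaN only matches NaN; anything else against NaN is a mismatch.
        if (Double.isNaN(expected) || Double.isNaN(actual)) {
            return Double.isNaN(expected) && Double.isNaN(actual);
        }
        // Exact equality also covers two infinities of the same sign.
        if (expected == actual) {
            return true;
        }
        // An infinity that is not exactly matched can never be "close".
        if (Double.isInfinite(expected) || Double.isInfinite(actual)) {
            return false;
        }
        double diff = Math.abs(expected - actual);
        double scale = Math.max(Math.abs(expected), Math.abs(actual));
        return diff <= absTol || diff <= relTol * scale;
    }

    /** Returns the indices where the run deviates from the gold master beyond tolerance. */
    static List<Integer> mismatches(double[] gold, double[] actual, double absTol, double relTol) {
        if (gold.length != actual.length) {
            throw new IllegalArgumentException(
                "length mismatch: " + gold.length + " vs " + actual.length);
        }
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < gold.length; i++) {
            if (!closeEnough(gold[i], actual[i], absTol, relTol)) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up numbers, standing in for e.g. a vector of similarity scores.
        double[] gold = {0.5, 1.0, 4.25};
        double[] run = {0.5000000001, 1.0, 4.30};
        // Lists the indices that differ from the gold master beyond tolerance.
        System.out.println(mismatches(gold, run, 1e-9, 1e-6));
    }
}
```

The absolute tolerance handles values near zero, where a relative bound is meaningless, and the relative tolerance handles large values. Suitable tolerances would probably have to be chosen per computation.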
This would be >very< educational about how Recommenders, Matrix arithmetic, Classifiers etc. work. Hell, I should do it.

On Thu, Jan 20, 2011 at 9:58 AM, Kasun Lakpriya <[email protected]> wrote:

> Thanks Sean and Sebastian.
>
> Yes, it's still far away, just finished documentation stuff.
>
> I will go though these stuff (Thanks for the links Sebastian) and try to get familiar with Mahout. After that I can go in to your suggestions one by one.
>
> On Thu, Jan 20, 2011 at 1:46 PM, Sebastian Schelter <[email protected]> wrote:
>
>> I'd be very interested in benchmark data for and/or performance increases of RecommenderJob (as well as ItemSimilarityJob and RowSimilarityJob which are used internally), if you feel like working on that.
>>
>> A good starting point to get familiar with the functionality might be Sean's talk from Berlin Buzzwords ( http://berlinbuzzwords.blip.tv/file/3811036/ ) and my slides from Berlin's last Hadoop Get Together ( http://www.slideshare.net/sscdotopen/mahoutcf )
>>
>> --sebastian
>>
>> On 20.01.2011 09:08, Sean Owen wrote:
>>
>>> I think it's far from complete or done.
>>>
>>> I think it would be interesting to take any of the MapReduce-based jobs, set it up, run it, and benchmark/profile it to locate some bottlenecks, then propose optimizations. It is a good way to get familiar with the packages.
>>>
>>> You might also investigate suggested settings for Hadoop when running these jobs.
>>>
>>> These are just one type of way you could contribute. Looking into open issues in JIRA, or adding unit tests, would be fine too.
>>>
>>> On Thu, Jan 20, 2011 at 3:36 AM, Kasun Lakpriya <[email protected]> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> Thanks for the immediate reply and sorry for my late response.
>>>>
>>>> Our above mentioned project is in progress.
>>>>
>>>> BTW I realized that Mahout is quite interesting and very active project. I am just interested about contributing to Mahout. As understanding the complete code base is not an easy task I would like to start from some basic point. After getting familiar with the code base I can think of your suggestion about "improving its speed or reducing its memory/disk usage".
>>>>
>>>> So that what would be a good starting point?
>>>>
>>>> Thank you,
>>>> Kasun
>>>>
>>>> On Thu, Dec 30, 2010 at 5:56 PM, Sean Owen <[email protected]> wrote:
>>>>
>>>>> Hi Kasun,
>>>>>
>>>>> If you want to get involved, you are free to discuss and propose your own changes and algorithms. You can review the list of open issues here: https://issues.apache.org/jira/browse/MAHOUT This contains some ideas about work that needs to be done.
>>>>>
>>>>> One interesting project would be to benchmark the existing distributed item-based recommender and find ways to improve its speed or reduce its memory/disk usage. That's a fairly simple starter project and quite useful.
>>>>>
>>>>> Sean
>>>>>
>>>>> On Wed, Dec 29, 2010 at 10:51 AM, Kasun Lakpriya <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am Kasun Lakpriya from University of Moratuwa, Sri Lanka. I am following a BSc in Computer Science and Engineering degree and now I am in my final year.
>>>>>>
>>>>>> In our degree program in order to complete the degree we need to do some kind of a research project approved by the university. The project I am working on is about "Web Personalization". The task is to develop a personalization module which is pluggable to any (theoretically) web application. After some literature survey we found out that there are some existing open source tools we can use to implement this module (personalization module). Specially what we are focusing on is Collaborative Filtering. I have already checked out the mahout trunk and built successfully and tried this example I found on the web [1]. And I went through the wiki page related to Algorithms and found some nice presentation about "Distributed item based collaborative filtering" by Sebastian Schelter. And I went through some similarity measure implementations in Mahout.
>>>>>>
>>>>>> What I want from you all is some guidance and helping hand to start implementation on improving an algorithm already there in the Mahout or what are the other areas we can integrated to Mahout regarding to Collaborative Filtering. In the recent mail archives I couldn't find such a discussion regarding this thing. Any further reading or references would be really appreciated.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Kasun
>>>>>>
>>>>>> [1] - http://philippeadjiman.com/blog/2009/11/11/flexible-collaborative-filtering-in-java-with-mahout-taste/

--
Lance Norskog
[email protected]
