Hi Saikat,

Glad you're excited. Paritosh offered one suggestion below. You could look at TestKmeansClustering for patterns you could use to test the ClusterClassificationMapper and Driver in MR mode. That should be straightforward, but please coordinate with Paritosh so you don't duplicate efforts.

Another place you might look into would be the KMeansDriver and MAHOUT-930. You could work on refactoring KMeansDriver to use the new ClusterClassificationDriver in MAHOUT-929. That would exercise both its sequential and MR options. It will be interesting to see how much code can be removed.

Finally, you could see if you can wrap your mind around the ClusterIterator and how it could be used for further refactoring of the KMeansDriver. See TestClusterClassifier for insight.

That enough reading and doing for now?
Jeff

On 2/22/12 10:06 AM, Saikat Kanjilal wrote:
Jeff,
I'm pretty excited to help out with this. As a starter, can you point me
to where I should begin reading the code? I haven't looked too closely,
but are there certain classes in the clustering area that this refactoring
effort is centered on?
Regards

Date: Wed, 22 Feb 2012 08:56:23 -0700
From: [email protected]
To: [email protected]
Subject: Re: Helping out with the .7 release

Hi Saikat,

I agree with Paritosh that a great place to begin would be to write
some unit tests. This will familiarize you with the code base and help
us a lot with our 0.7 housekeeping release. The new clustering
classification components are going to unify many - but not all - of the
existing clustering algorithms to reduce their complexity by factoring
out duplication and streamlining their integration into semi-supervised
classification engines.

Please feel free to post any questions you may have in reading through
this code. This is a major refactoring effort and we will need all the
help we can get. Thanks for the offer,

Jeff

On 2/21/12 10:46 PM, Saikat Kanjilal wrote:
Hi Paritosh,
Yes, creating the test case would be a great start. However, are there
other tasks you need help with that I can do before creating the tests?
I will sync trunk and start reading through the code in the meantime.
Regards

Date: Wed, 22 Feb 2012 10:57:51 +0530
From: [email protected]
To: [email protected]
Subject: Re: Helping out with the .7 release

We are creating clustering-as-classification components which will help
in moving clustering out. Once the components are ready, the
clustering algorithms will need refactoring.
The clustering-as-classification component and the outlier removal
component have been created.

Most of it is committed, and the rest is available as a patch. See
https://issues.apache.org/jira/browse/MAHOUT-929
If you apply the latest patch available on MAHOUT-929, you can see
everything that is available now.

If you want, you can help with the test case of
ClusterClassificationMapper available in the patch.

On 22-02-2012 10:27, Saikat Kanjilal wrote:
Hi Guys,
I am interested in helping out with the clustering component of Mahout.
I looked through the JIRA items below and was wondering if there is a
specific one that would be good to start with:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+MAHOUT+AND+resolution+%3D+Unresolved+AND+component+%3D+Clustering+ORDER+BY+priority+DESC&mode=hide

I was initially thinking of working on MAHOUT-930 or MAHOUT-931, but I
could work on others if needed.
Best Regards
                                        
                                        
