Thank you, I'll get started on this over the weekend.
> Date: Thu, 23 Feb 2012 13:33:42 +0530
> From: [email protected]
> To: [email protected]
> Subject: Re: Helping out with the .7 release
>
> Saikat,
>
> I have created https://issues.apache.org/jira/browse/MAHOUT-981 for
> refactoring KMeansDriver to use the new ClusterClassificationDriver.
>
> You can provide your patches on this issue. See this page to learn how
> to provide a patch:
> https://cwiki.apache.org/MAHOUT/how-to-contribute.html#HowToContribute-Generatingapatch.
>
> Before the KMeans refactoring, we are expecting the
> ClusterClassificationMapperTest from you (for MAHOUT-929). That test
> case would complete the development of ClusterClassificationDriver, and
> the refactoring can then start.
>
> Paritosh
>
> On 23-02-2012 04:55, Jeff Eastman wrote:
> > Hi Saikat,
> >
> > Glad you're excited. Paritosh offered one suggestion below. You could
> > look at TestKmeansClustering for patterns you could use to test the
> > ClusterClassificationMapper and Driver in MR mode. That should be
> > straightforward, but please coordinate with Paritosh so you don't
> > duplicate efforts.
> >
> > Another place you might look into would be the KMeansDriver and
> > MAHOUT-930. You could work on refactoring KMeansDriver to use the new
> > ClusterClassificationDriver in MAHOUT-929. That would exercise both
> > its sequential and MR options. It will be interesting to see how much
> > code can be removed.
> >
> > Finally, you could see if you can wrap your mind around the
> > ClusterIterator and how it could be used for further refactoring of
> > the KMeansDriver. See TestClusterClassifier for insight.
> >
> > That enough reading and doing for now?
> > Jeff
> >
> > On 2/22/12 10:06 AM, Saikat Kanjilal wrote:
> >> Jeff, I'm pretty excited to help out with this. As a starter, can
> >> you point me to where I should begin reading the code? I haven't
> >> looked too closely, but are there certain classes in the clustering
> >> area that this refactoring effort is centered around?
> >> Regards
> >>
> >>> Date: Wed, 22 Feb 2012 08:56:23 -0700
> >>> From: [email protected]
> >>> To: [email protected]
> >>> Subject: Re: Helping out with the .7 release
> >>>
> >>> Hi Saikat,
> >>>
> >>> I agree with Paritosh that a great place to begin would be to write
> >>> some unit tests. This will familiarize you with the code base and help
> >>> us a lot with our 0.7 housekeeping release. The new clustering
> >>> classification components are going to unify many - but not all - of
> >>> the existing clustering algorithms to reduce their complexity by
> >>> factoring out duplication and streamlining their integration into
> >>> semi-supervised classification engines.
> >>>
> >>> Please feel free to post any questions you may have in reading through
> >>> this code. This is a major refactoring effort and we will need all the
> >>> help we can get. Thanks for the offer,
> >>>
> >>> Jeff
> >>>
> >>> On 2/21/12 10:46 PM, Saikat Kanjilal wrote:
> >>>> Hi Paritosh, yes, creating the test case would be a great first
> >>>> start. However, are there other tasks you guys need help with that
> >>>> I can do before the test creation? I will sync trunk and start
> >>>> reading through the code in the meantime. Regards
> >>>>
> >>>>> Date: Wed, 22 Feb 2012 10:57:51 +0530
> >>>>> From: [email protected]
> >>>>> To: [email protected]
> >>>>> Subject: Re: Helping out with the .7 release
> >>>>>
> >>>>> We are creating clustering-as-classification components, which will
> >>>>> help in moving clustering out. Once the component is ready, the
> >>>>> clustering algorithms will need refactoring.
> >>>>> The clustering-as-classification component and the outlier removal
> >>>>> component have been created.
> >>>>>
> >>>>> Most of it is committed, and the rest is available as a patch. See
> >>>>> https://issues.apache.org/jira/browse/MAHOUT-929
> >>>>> If you apply the latest patch available on MAHOUT-929, you can see
> >>>>> all that is available now.
> >>>>>
> >>>>> If you want, you can help with the test case of
> >>>>> ClusterClassificationMapper available in the patch.
> >>>>>
> >>>>> On 22-02-2012 10:27, Saikat Kanjilal wrote:
> >>>>>> Hi Guys, I was interested in helping out with the clustering
> >>>>>> component of Mahout. I looked through the JIRA items below and
> >>>>>> was wondering if there is a specific one that would be good to
> >>>>>> start with:
> >>>>>>
> >>>>>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+MAHOUT+AND+resolution+%3D+Unresolved+AND+component+%3D+Clustering+ORDER+BY+priority+DESC&mode=hide
> >>>>>>
> >>>>>> I was initially thinking of working on MAHOUT-930 or MAHOUT-931 but
> >>>>>> could work on others if needed.
> >>>>>> Best Regards
