Which environment are you using for development, Windows or Linux? Not all
tests pass on Windows with Cygwin (at least for me). I even set up a
separate Linux machine just to run the test cases (VirtualBox would also
have worked; it is just a matter of choice).
How you submit the patch is up to you.
ClusterClassification is used after the buildClusters phase of the
clustering algorithms (the ones being refactored). It
will replace the clusterData phase.
ClusterClassificationDriver can classify vectors either sequentially or
via MapReduce. There is already a test case for the sequential path. The
logic for classification via MapReduce is in the mapper, so it should be
tested there.
Try writing a simple test first, one that checks whether the
vectors were classified correctly. Look at
ClusterClassificationDriverTest for example assertions.
Later on, we can add more scenarios to it.
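As a rough illustration of what such a first test could check, here is a
minimal sketch. It is not Mahout code: plain double[] arrays stand in for
vectors, and classify is a hypothetical nearest-center helper standing in
for the real classification step, so every name in it is an assumption.

```java
// Sketch only: none of these names come from Mahout.
class ClassificationCheckSketch {

  // Hypothetical stand-in for the classification step:
  // returns the index of the center closest to the point
  // (squared Euclidean distance).
  static int classify(double[] point, double[][] centers) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int i = 0; i < centers.length; i++) {
      double distance = 0.0;
      for (int j = 0; j < point.length; j++) {
        double diff = point[j] - centers[i][j];
        distance += diff * diff;
      }
      if (distance < bestDistance) {
        bestDistance = distance;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Two well-separated centers and points whose cluster is obvious.
    double[][] centers = {{1, 1}, {8, 8}};
    double[][] points = {{1, 1}, {2, 1}, {1, 2}, {8, 8}, {9, 8}, {8, 9}};
    int[] expected = {0, 0, 0, 1, 1, 1};

    // The assertion a first test needs: each vector ends up in
    // the cluster we expect.
    for (int i = 0; i < points.length; i++) {
      int actual = classify(points[i], centers);
      if (actual != expected[i]) {
        throw new IllegalStateException(
            "point " + i + " classified into " + actual
                + ", expected " + expected[i]);
      }
    }
    System.out.println("all vectors classified as expected");
  }
}
```

In a real test the same idea would be expressed with the assertions used
in ClusterClassificationDriverTest, with the mapper producing the
classification instead of the helper above.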
Issue-centric discussions can also take place on JIRA. You can create an
account there and add comments on the issues, if you wish.
On 24-02-2012 19:03, Saikat Kanjilal wrote:
Paritosh/Jeff,
Before I begin writing the ClusterClassificationMapperTest I have a few
questions; pardon my newbieness here:
1) I synced trunk and started building, and noticed that we have some
errors in the tests. Is this OK? Let me know if I am missing something in
getting the build going; I believe my build environment is set up correctly
(Maven 2.2.1 with Java 6).
2) I was wondering if I should create a GitHub branch for the code and work
off of that. I could then sync my changes into trunk when I'm done and go
through the patching process. Do you see any issues with that?
3) For the ClusterClassificationMapperTest, can I get some more context?
Should we call it ClusterClassificationDriverTest instead? Also, should the
unit tests cover all of the bulleted points in MAHOUT-929, or just pass
parameters into the run method and test that by itself?
Regards
Date: Thu, 23 Feb 2012 13:33:42 +0530
From: [email protected]
To: [email protected]
Subject: Re: Helping out with the .7 release
Saikat,
I have created https://issues.apache.org/jira/browse/MAHOUT-981 for
refactoring KMeansDriver to use the new ClusterClassificationDriver.
You can provide your patches on this issue. For how to provide a patch, see
https://cwiki.apache.org/MAHOUT/how-to-contribute.html#HowToContribute-Generatingapatch
Before the KMeans refactoring, we are expecting the
ClusterClassificationMapperTest from you (for MAHOUT-929). That test
case would complete the development of ClusterClassificationDriver, and
the refactoring can then start.
Paritosh
On 23-02-2012 04:55, Jeff Eastman wrote:
Hi Saikat,
Glad you're excited. Paritosh offered one suggestion below. You could
look at TestKmeansClustering for patterns you could use to test the
ClusterClassificationMapper and Driver in MR mode. That should be
straightforward, but please coordinate with Paritosh so you don't
duplicate efforts.
Another place you might look into would be the KMeansDriver and
MAHOUT-930. You could work on refactoring KMeansDriver to use the new
ClusterClassificationDriver in MAHOUT-929. That would exercise both
its sequential and MR options. It will be interesting to see how much
code can be removed.
Finally, you could see if you can wrap your mind around the
ClusterIterator and how it could be used for further refactoring of
the KMeansDriver. See TestClusterClassifier for insight.
That enough reading and doing for now?
Jeff
On 2/22/12 10:06 AM, Saikat Kanjilal wrote:
Jeff,
I'm pretty excited to help out with this. As a starter, can
you point me to where I should begin reading the code? I
haven't looked too closely, but are there certain classes in the
clustering area that this refactoring effort is centered on?
Regards
Date: Wed, 22 Feb 2012 08:56:23 -0700
From: [email protected]
To: [email protected]
Subject: Re: Helping out with the .7 release
Hi Saikat,
I agree with Paritosh, that a great place to begin would be to write
some unit tests. This will familiarize you with the code base and help
us a lot with our 0.7 housekeeping release. The new clustering
classification components are going to unify many - but not all - of the
existing clustering algorithms to reduce their complexity by factoring
out duplication and streamlining their integration into semi-supervised
classification engines.
Please feel free to post any questions you may have in reading through
this code. This is a major refactoring effort and we will need all the
help we can get. Thanks for the offer,
Jeff
On 2/21/12 10:46 PM, Saikat Kanjilal wrote:
Hi Paritosh,
Yes, creating the test case would be a great start. However, are there
other tasks you need help with that I can do before the test creation?
I will sync trunk and start reading through the code in the meantime.
Regards
Date: Wed, 22 Feb 2012 10:57:51 +0530
From: [email protected]
To: [email protected]
Subject: Re: Helping out with the .7 release
We are creating clustering-as-classification components, which will help
in moving clustering out. Once the components are ready, the
clustering algorithms will need refactoring.
The clustering-as-classification component and the outlier removal
component have been created.
Most of it is committed, and the rest is available as a patch. See
https://issues.apache.org/jira/browse/MAHOUT-929
If you apply the latest patch available on MAHOUT-929, you can see
all that is available now.
If you want, you can help with the test case for
ClusterClassificationMapper, which is available in the patch.
On 22-02-2012 10:27, Saikat Kanjilal wrote:
Hi guys,
I am interested in helping out with the clustering
component of Mahout. I looked through the JIRA items below and
was wondering if there is a specific one that would be good to
start with:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+MAHOUT+AND+resolution+%3D+Unresolved+AND+component+%3D+Clustering+ORDER+BY+priority+DESC&mode=hide
I was initially thinking of working on MAHOUT-930 or MAHOUT-931 but
could work on others if needed.
Best Regards