How many iterations of FuzzyKMeans are you running? Here is my observation: when I ran only a few iterations, the cluster centroids were far apart. However, when I ran more than 50 iterations or so, the centroids were still distinct but so close together that they looked almost the same. By the way, I am using m=3 in the membership function.
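To illustrate how the choice of m affects the memberships, here is a minimal Python sketch of the textbook fuzzy c-means membership formula. It is an assumption that this matches the membership function used in the FuzzyKMeans code under discussion; the function name `memberships` and the sample distances are hypothetical and only for illustration.

```python
def memberships(distances, m):
    """Textbook fuzzy c-means membership of one point in each cluster.

    distances: distance from the point to every cluster center (all > 0).
    m: fuzziness exponent, must be greater than 1.
    Assumption: standard formula, not taken from the Mahout source.
    """
    exponent = 2.0 / (m - 1.0)
    result = []
    for d_j in distances:
        # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
        denom = sum((d_j / d_k) ** exponent for d_k in distances)
        result.append(1.0 / denom)
    return result


# A point that is twice as far from the second center as from the first.
for m in (1.5, 2.0, 3.0, 10.0):
    print(m, [round(u, 3) for u in memberships([1.0, 2.0], m)])
```

In this formula the memberships move toward 1/k for every cluster as m grows, so each centroid is updated with increasingly similar weights and drifts toward the overall mean of the data. That may be part of why centroids end up close together with m=3, though it is not a confirmed diagnosis.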
Thanks
Pallavi

-----Original Message-----
From: Robin Anil [mailto:robin.a...@gmail.com]
Sent: Wednesday, February 17, 2010 8:10 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Fuzzy K Means

Tests are passing fine. But Not when testing reuters.

On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:

> If we just need to verify with some sample dataset, we already have
> the data in TestFuzzyKMeansClustering code. won't that suffice?
> Otherwise, I need to manually generate some sample dataset as I don't
> have this small dataset with me. I am actually running on movielens
> data using movie ratings as vector (movie as dimension, rating as
> coefficient) and user as point.
>
> Thanks
> Pallavi
>
> Robin Anil wrote:
>
>> I tracked the versions back to before the change to Writables were done.
>> There is nothing significant change in the code.
>>
>> Can you give me a small dataset 10 points maybe 5 dimensions. I can
>> verify the trunk in Case?
>>
>> Robin
>>
>> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
>>
>>> I have a local version which I have submitted long back and I am
>>> using it on real data and is not giving same point for all clusters.
>>> However, I haven't tried with latest mahout code. I have kept my
>>> code to output data as text so that it is easy for me to verify.
>>> However, current mahout code outputs it as binary data (as
>>> sequencefile). So, it is difficult to verify.
>>>
>>> Thanks
>>> Pallavi
>>>
>>> Robin Anil wrote:
>>>
>>>> Have you verified the trunk code on some real data. I am getting
>>>> same point for all clusters regardless of the distnce measure
>>>>
>>>> Robin
>>>>
>>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
>>>>
>>>>> Yes. It shouldn't be a problem. My point was that we are extending
>>>>> numpoints as part of ClusterBase, though we are not using it in
>>>>> SoftCluster.
>>>>> Other that that, I don't see any issue w.r.t. functionality.
>>>>>
>>>>> Thanks
>>>>> Pallavi
>>>>>
>>>>> Robin Anil wrote:
>>>>>
>>>>>> In the impl of SoftClusters on writeOut it calculates the
>>>>>> centroid and writes it and when read(in) it reads the centroid
>>>>>> in to the center.
>>>>>>
>>>>>> In ClusterDumper it reads into the ClusterBase and does
>>>>>> value.getCenter(); It should work normally right
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
>>>>>>
>>>>>>> Yes. But not the total number of points. So, the numpoints from
>>>>>>> ClusterBase will not be used in SoftCluster. numpoints is
>>>>>>> specific to Kmeans similar to weightedpoint total for fuzzy
>>>>>>> kmeans.
>>>>>>>
>>>>>>> Robin Anil wrote:
>>>>>>>
>>>>>>>> the center is still the averaged out centroid right?
>>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>
>>>>>>>>> I haven't yet gone thru ClusterDumper. However, ClusterBase
>>>>>>>>> would be having number of points to average out
>>>>>>>>> (pointTotal/numPoints as per kmeans) where as SoftCluster
>>>>>>>>> will have weighted point total. So, I am
>>>>>>>>> wondering how can we reuse ClusterBase here?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Pallavi
>>>>>>>>>
>>>>>>>>> Robin Anil wrote:
>>>>>>>>>
>>>>>>>>>> yes. So that cluster dumper can print it out.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Robin,
>>>>>>>>>>>
>>>>>>>>>>> when you meant by reusing ClusterBase, are you planning to
>>>>>>>>>>> extend ClusterBase in SoftCluster? For example, SoftCluster
>>>>>>>>>>> extends ClusterBase?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pallavi
>>>>>>>>>>>
>>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have been trying to convert FuzzyKMeans SoftCluster(which
>>>>>>>>>>>> should be ideally be named FuzzyKmeansCluster) to use the
>>>>>>>>>>>> ClusterBase.
>>>>>>>>>>>>
>>>>>>>>>>>> I am getting *the same center* for all the clusters. To aid
>>>>>>>>>>>> the conversion all i did was remove the center vector from
>>>>>>>>>>>> the SoftCluster class and reuse the same from the
>>>>>>>>>>>> ClusterBase. These are essentially making no change in the
>>>>>>>>>>>> tests which passes correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> So I am questioning whether the implementation keeps the
>>>>>>>>>>>> average center at all ? Anyone who has used FuzzyKMeans
>>>>>>>>>>>> experiencing this?
>>>>>>>>>>>>
>>>>>>>>>>>> Robin
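The two center definitions compared in the thread (pointTotal/numPoints for k-means versus weightedtotalvector/totalprobWeight for fuzzy k-means) can be sketched as follows. This is a rough Python illustration under assumptions, not the Mahout implementation: the function names are hypothetical, and raising each membership to the power m follows the textbook fuzzy c-means update.

```python
def kmeans_center(points):
    """K-means style center: pointTotal / numPoints."""
    num_points = len(points)
    dims = len(points[0])
    return [sum(p[i] for p in points) / num_points for i in range(dims)]


def fuzzy_center(points, weights, m):
    """Fuzzy style center: weighted point total / total weight.

    weights: membership of each point in this one cluster.
    Assumption: each membership is raised to the power m, as in the
    textbook update; the code discussed in the thread may differ.
    """
    dims = len(points[0])
    weighted_total = [0.0] * dims
    total_weight = 0.0
    for point, u in zip(points, weights):
        w = u ** m
        total_weight += w
        for i in range(dims):
            weighted_total[i] += w * point[i]
    return [t / total_weight for t in weighted_total]


points = [[0.0, 0.0], [2.0, 2.0]]
print(kmeans_center(points))
print(fuzzy_center(points, [0.5, 0.5], 3.0))
print(fuzzy_center(points, [0.9, 0.1], 3.0))
```

The fuzzy center never needs the number of points, only the two running totals, which is the concern raised about inheriting numpoints from ClusterBase. When every cluster sees the same memberships for every point, the weighted average is identical for all clusters; that is one way identical centers could arise, not a confirmed explanation of the behaviour reported here.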