Yes, I am seeing the same behaviour with m=2, but the convergence is faster.

On Wed, Feb 17, 2010 at 11:21 PM, Palleti, Pallavi <pallavi.pall...@corp.aol.com> wrote:
> How many iterations of FuzzyKMeans are you running? Here is my
> observation: when I ran for a few iterations, the cluster centroids were
> far off. However, when I ran for more than 50 iterations or so, the cluster
> points were still different, but so close together that they were almost
> the same. By the way, I am using m=3 in the membership function.
>
> Thanks
> Pallavi
>
> -----Original Message-----
> From: Robin Anil [mailto:robin.a...@gmail.com]
> Sent: Wednesday, February 17, 2010 8:10 PM
> To: mahout-dev@lucene.apache.org
> Subject: Re: Fuzzy K Means
>
> Tests are passing fine, but not when testing on Reuters.
>
> On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
>
> > If we just need to verify with some sample dataset, we already have
> > the data in the TestFuzzyKMeansClustering code. Won't that suffice?
> > Otherwise, I need to manually generate some sample dataset, as I don't
> > have this small dataset with me. I am actually running on movielens
> > data using movie ratings as the vector (movie as dimension, rating as
> > coefficient) and user as the point.
> >
> > Thanks
> > Pallavi
> >
> > Robin Anil wrote:
> >
> >> I tracked the versions back to before the change to Writables was done.
> >> There is no significant change in the code.
> >>
> >> Can you give me a small dataset, maybe 10 points in 5 dimensions? I can
> >> verify the trunk in that case.
> >>
> >> Robin
> >>
> >> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
> >>
> >>> I have a local version which I submitted long back. I am using it
> >>> on real data and it is not giving the same point for all clusters.
> >>> However, I haven't tried with the latest mahout code. I have kept my
> >>> code to output data as text so that it is easy for me to verify.
> >>> However, the current mahout code outputs it as binary data (as a
> >>> sequencefile). So, it is difficult to verify.
> >>>
> >>> Thanks
> >>> Pallavi
> >>>
> >>> Robin Anil wrote:
> >>>
> >>>> Have you verified the trunk code on some real data? I am getting the
> >>>> same point for all clusters regardless of the distance measure.
> >>>>
> >>>> Robin
> >>>>
> >>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
> >>>>
> >>>>> Yes. It shouldn't be a problem. My point was that we are extending
> >>>>> numpoints as part of ClusterBase, though we are not using it in
> >>>>> SoftCluster.
> >>>>> Other than that, I don't see any issue w.r.t. functionality.
> >>>>>
> >>>>> Thanks
> >>>>> Pallavi
> >>>>>
> >>>>> Robin Anil wrote:
> >>>>>
> >>>>>> In the impl of SoftClusters, on writeOut it calculates the
> >>>>>> centroid and writes it, and on read(in) it reads the centroid
> >>>>>> into the center.
> >>>>>>
> >>>>>> In ClusterDumper it reads into the ClusterBase and does
> >>>>>> value.getCenter(); it should work normally, right?
> >>>>>>
> >>>>>> Robin
> >>>>>>
> >>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
> >>>>>>
> >>>>>>> Yes. But not the total number of points. So, the numpoints from
> >>>>>>> ClusterBase will not be used in SoftCluster. numpoints is
> >>>>>>> specific to Kmeans, similar to the weighted point total for fuzzy
> >>>>>>> kmeans.
> >>>>>>>
> >>>>>>> Robin Anil wrote:
> >>>>>>>
> >>>>>>>> The center is still the averaged-out centroid, right?
> >>>>>>>> weightedtotalvector/totalprobWeight
> >>>>>>>>
> >>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
> >>>>>>>>
> >>>>>>>>> I haven't yet gone through ClusterDumper. However, ClusterBase
> >>>>>>>>> would have the number of points to average out
> >>>>>>>>> (pointTotal/numPoints as per kmeans), whereas SoftCluster
> >>>>>>>>> will have a weighted point total. So, I am
> >>>>>>>>> wondering how we can reuse ClusterBase here?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Pallavi
> >>>>>>>>>
> >>>>>>>>> Robin Anil wrote:
> >>>>>>>>>
> >>>>>>>>>> Yes. So that cluster dumper can print it out.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Robin,
> >>>>>>>>>>>
> >>>>>>>>>>> When you say reusing ClusterBase, are you planning to
> >>>>>>>>>>> extend ClusterBase in SoftCluster? For example, SoftCluster
> >>>>>>>>>>> extends ClusterBase?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> Pallavi
> >>>>>>>>>>>
> >>>>>>>>>>> Robin Anil wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I have been trying to convert FuzzyKMeans SoftCluster (which
> >>>>>>>>>>>> should ideally be named FuzzyKmeansCluster) to use the
> >>>>>>>>>>>> ClusterBase.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am getting *the same center* for all the clusters.
> >>>>>>>>>>>> To aid the conversion, all I did was remove the center vector
> >>>>>>>>>>>> from the SoftCluster class and reuse the one from the
> >>>>>>>>>>>> ClusterBase. These changes make essentially no difference to
> >>>>>>>>>>>> the tests, which pass correctly.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So I am questioning whether the implementation keeps the
> >>>>>>>>>>>> average center at all? Is anyone who has used FuzzyKMeans
> >>>>>>>>>>>> experiencing this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Robin
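
For reference, the centroid update discussed in this thread (weightedtotalvector/totalprobWeight) and the role of m in the membership function can be sketched as below. This is only a minimal sketch of the textbook fuzzy c-means update in plain Python, assuming Euclidean distance; it is not Mahout's SoftCluster code, and the names memberships and update_centers are hypothetical.

```python
import math


def memberships(point, centers, m):
    """Textbook fuzzy c-means membership of one point in each cluster:
    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), d = Euclidean distance."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among them only.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def update_centers(points, centers, m):
    """One iteration: new center = weighted point total / total weight,
    where a point's weight for a cluster is its membership raised to m."""
    dim = len(points[0])
    totals = [[0.0] * dim for _ in centers]  # weighted point total per cluster
    weights = [0.0] * len(centers)           # total weight per cluster
    for p in points:
        for j, u in enumerate(memberships(p, centers, m)):
            w = u ** m
            weights[j] += w
            for i, x in enumerate(p):
                totals[j][i] += w * x
    # Keep the old center if a cluster received no weight at all.
    return [
        [t / weights[j] for t in totals[j]] if weights[j] > 0.0 else list(centers[j])
        for j in range(len(centers))
    ]


if __name__ == "__main__":
    centers = [[0.0], [4.0]]
    # The same point gets flatter memberships as m grows.
    print(memberships([1.0], centers, 2.0))
    print(memberships([1.0], centers, 3.0))
    print(update_centers([[0.0], [1.0], [9.0], [10.0]], [[0.0], [10.0]], 2.0))
```

In this sketch a larger m flattens the memberships (a point at distances 1 and 3 from two centers gets about 0.9/0.1 with m=2 but 0.75/0.25 with m=3), so every point pulls on every center more evenly. That might be one reason centroids end up close together on some data, but it is a guess and not a diagnosis of the behaviour reported above.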