Apologies. My observations were with m=2, where the points were all nearby. However, when I tried m=3, I found the clusters much better than what we see with m=2. Also, I am using the cluster initialization patch to initialize the clusters.

Thanks
Pallavi

Robin Anil wrote:
Yes, I am seeing the same behaviour with m=2, but the convergence is faster.

On Wed, Feb 17, 2010 at 11:21 PM, Palleti, Pallavi <
pallavi.pall...@corp.aol.com> wrote:

How many iterations of FuzzyKMeans are you running? Here is my
observation: when I ran it for a few iterations, the cluster centroids were
far off. However, when I ran it for more than 50 iterations or so, the cluster
points were still different, but so close together that they were nearly the
same. By the way, I am using m=3 in the membership function.

Thanks
Pallavi
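For readers following the m=2 versus m=3 discussion, here is a minimal sketch of the textbook fuzzy c-means membership function. It assumes Euclidean distance and the standard formula u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)); the function name `memberships` is hypothetical, and whether Mahout's FuzzyKMeans matches this formula exactly is not confirmed by this thread.

```python
import math

def memberships(point, centers, m):
    """Textbook fuzzy membership of `point` in each center, for fuzziness m > 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among those centers.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

point = (1.0, 0.0)
centers = [(0.0, 0.0), (4.0, 0.0)]   # distances from the point are 1 and 3
u_m2 = memberships(point, centers, m=2.0)
u_m3 = memberships(point, centers, m=3.0)
print(u_m2, u_m3)
```

Under this formula, a larger m spreads each point's membership more evenly across the clusters, so the value of m can change how strongly each point pulls on each center.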

-----Original Message-----
From: Robin Anil [mailto:robin.a...@gmail.com]
Sent: Wednesday, February 17, 2010 8:10 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Fuzzy K Means

The tests are passing fine, but not when testing on Reuters.

On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

If we just need to verify with some sample dataset, we already have
the data in the TestFuzzyKMeansClustering code. Won't that suffice?
Otherwise, I need to generate a sample dataset manually, as I don't
have such a small dataset with me. I am actually running on MovieLens
data, using movie ratings as the vector (movie as dimension, rating as
coefficient) and the user as the point.
Thanks
Pallavi

Robin Anil wrote:

I tracked the versions back to before the change to Writables was
made.
There is no significant change in the code.

Can you give me a small dataset, maybe 10 points in 5 dimensions? I can
verify the trunk in that case.

Robin
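A check of the kind requested above can be sketched outside Mahout. The following is a plain-Python toy version of fuzzy k-means, not the Mahout code: the 10 points in 5 dimensions are made up, and the names `memberships`, `fuzzy_kmeans` and `min_center_gap` are hypothetical. It reports the smallest distance between any two centers, which is 0 when all centers have collapsed to the same point.

```python
import math

def memberships(point, centers, m):
    """Textbook fuzzy membership of `point` in each center (assumed formula)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** e for dk in dists) for dj in dists]

def fuzzy_kmeans(points, centers, m=2.0, iterations=50):
    """Recompute each center as its weighted point total divided by its total weight."""
    dims = len(points[0])
    for _ in range(iterations):
        totals = [[0.0] * dims for _ in centers]
        weights = [0.0] * len(centers)
        for p in points:
            for j, u in enumerate(memberships(p, centers, m)):
                w = u ** m
                weights[j] += w
                for d in range(dims):
                    totals[j][d] += w * p[d]
        centers = [[t / weights[j] for t in totals[j]] for j in range(len(centers))]
    return centers

def min_center_gap(centers):
    """Smallest pairwise distance between centers; 0.0 means duplicate centers."""
    return min(math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])

# Made-up data: five points near the origin, five near (10, 10, 10, 10, 10).
points = [
    [0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0],
    [10.0, 10.0, 10.0, 10.0, 10.0], [11.0, 10.0, 10.0, 10.0, 10.0],
    [10.0, 11.0, 10.0, 10.0, 10.0], [10.0, 10.0, 11.0, 10.0, 10.0],
    [10.0, 10.0, 10.0, 11.0, 10.0],
]
gap_distinct_start = min_center_gap(fuzzy_kmeans(points, [points[0], points[5]]))
gap_same_start = min_center_gap(fuzzy_kmeans(points, [points[0], points[0]]))
print(gap_distinct_start, gap_same_start)
```

In this toy version, starting every center from the same point keeps them identical forever, because each center then receives exactly the same weights. That is only one way identical centers can arise, and it is not a diagnosis of the behaviour reported in this thread.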

On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

I have a local version, which I submitted long ago, and I am
using it on real data; it does not give the same point for all clusters.
However, I haven't tried the latest Mahout code. My code
outputs data as text so that it is easy for me to verify,
whereas the current Mahout code outputs binary data (as a
SequenceFile), which makes it difficult to verify.


Thanks
Pallavi

Robin Anil wrote:

Have you verified the trunk code on some real data? I am getting the
same point for all clusters regardless of the distance measure.

Robin



On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

Yes. It shouldn't be a problem. My point was that we are inheriting
numpoints as part of ClusterBase, though we are not using it in
SoftCluster.
Other than that, I don't see any issue w.r.t. functionality.


Thanks
Pallavi

Robin Anil wrote:

In the SoftCluster implementation, writeOut calculates the
centroid and writes it, and read(in) reads the centroid into
the center.
ClusterDumper reads into the ClusterBase and calls
value.getCenter(), so it should work normally, right?

Robin



On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

Yes, but not the total number of points. So the numpoints from
ClusterBase will not be used in SoftCluster. numpoints is
specific to k-means, just as the weighted point total is specific
to fuzzy k-means.


Robin Anil wrote:

The center is still the averaged-out centroid, right? That is,
weightedtotalvector/totalprobWeight



On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

I haven't gone through ClusterDumper yet. However, ClusterBase
keeps the number of points to average over
(pointTotal/numPoints, as in k-means), whereas SoftCluster
has a weighted point total. So I am
wondering how we can reuse ClusterBase here.


Thanks
Pallavi
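The two averages contrasted above can be sketched side by side. This is an illustrative plain-Python sketch, not the ClusterBase or SoftCluster code; the function names and the sample weights are hypothetical, and the variable names only loosely mirror the terms used in this thread.

```python
def kmeans_center(points):
    """Hard k-means center: point total divided by the number of points."""
    num_points = len(points)
    dims = len(points[0])
    point_total = [sum(p[d] for p in points) for d in range(dims)]
    return [t / num_points for t in point_total]

def fuzzy_center(points, weights):
    """Fuzzy k-means center: weighted point total divided by the total weight."""
    dims = len(points[0])
    weighted_total = [sum(w * p[d] for p, w in zip(points, weights)) for d in range(dims)]
    total_weight = sum(weights)
    return [t / total_weight for t in weighted_total]

points = [(0.0, 0.0), (2.0, 0.0), (10.0, 4.0)]
weights = [0.5, 0.5, 0.0]   # hypothetical per-point weights for one cluster

hard = kmeans_center(points)
soft = fuzzy_center(points, weights)
uniform = fuzzy_center(points, [1.0, 1.0, 1.0])
print(hard, soft, uniform)
```

The point count never enters the fuzzy formula; the total weight plays that role instead. With equal weights the two centers coincide, so the hard k-means average is the special case of the weighted one.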

Robin Anil wrote:

Yes, so that ClusterDumper can print it out.

On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

Hi Robin,

When you say reusing ClusterBase, are you planning to
extend ClusterBase in SoftCluster? For example, SoftCluster
extends ClusterBase?

Thanks
Pallavi


Robin Anil wrote:

I have been trying to convert the FuzzyKMeans SoftCluster (which
should ideally be named FuzzyKmeansCluster) to use
ClusterBase.

I am getting *the same center* for all the clusters. To aid
the conversion, all I did was remove the center vector from
the SoftCluster class and reuse the one from
ClusterBase. This essentially makes no change to the
tests, which pass correctly.

So I am questioning whether the implementation keeps the
average center at all. Has anyone who has used FuzzyKMeans
experienced this?


Robin













