How many iterations of FuzzyKMeans are you running? Here is my
observation: when I ran only a few iterations, the cluster centroids were far
apart. However, when I ran more than 50 iterations or so, the cluster
centers were still distinct, but so close to each other that they looked the
same. By the way, I am using m=3 in the membership function.
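To make the role of m concrete, here is a minimal sketch of the textbook fuzzy c-means update, assuming Euclidean distance. It is not Mahout's FuzzyKMeans code. The names `memberships`, `update_centers` and `run` are hypothetical, and the data is made up. Each point gets the membership u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)), and each new center is sum(u^m * x) / sum(u^m).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(point, centers, m):
    """Textbook fuzzy c-means membership of one point in each cluster
    (requires m > 1): u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))."""
    dists = [euclidean(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to it
    # (ties go to the first such center in this sketch).
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def update_centers(points, centers, m):
    """One iteration: new center_j = sum(u_j^m * x) / sum(u_j^m)."""
    dim = len(points[0])
    weighted_totals = [[0.0] * dim for _ in centers]
    total_weights = [0.0] * len(centers)
    for p in points:
        for j, u in enumerate(memberships(p, centers, m)):
            w = u ** m
            total_weights[j] += w
            for i in range(dim):
                weighted_totals[j][i] += w * p[i]
    return [[t / w for t in tot] for tot, w in zip(weighted_totals, total_weights)]


def run(points, centers, m, iterations):
    """Apply the update a fixed number of times."""
    for _ in range(iterations):
        centers = update_centers(points, centers, m)
    return centers


if __name__ == "__main__":
    # Made-up 2-D data with two obvious groups, for illustration only.
    pts = [[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]]
    for m in (2.0, 3.0):
        c = run(pts, [[2.0, 0.0], [8.0, 0.0]], m, 50)
        print(m, c, euclidean(c[0], c[1]))
```

Consider a point at distance 1 from one center and 3 from another. Its memberships are 0.9/0.1 with m=2 and 0.75/0.25 with m=3, so raising m makes the assignment fuzzier. The sketch also shows one property relevant to the "same center for all clusters" symptom. If all centers are ever identical, every point gets membership 1/k, and each update returns the same weighted mean for every cluster. The centers then stay identical, as long as no point sits exactly on that shared center.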

Thanks
Pallavi

-----Original Message-----
From: Robin Anil [mailto:robin.a...@gmail.com] 
Sent: Wednesday, February 17, 2010 8:10 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Fuzzy K Means

The tests pass fine, but not when testing on Reuters.

On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

> If we just need to verify with a sample dataset, we already have
> the data in the TestFuzzyKMeansClustering code. Won't that suffice?
> Otherwise, I will need to generate a sample dataset manually, as I don't
> have a small dataset like that with me. I am actually running on MovieLens
> data, using movie ratings as vectors (movie as dimension, rating as
> coefficient) and each user as a point.
>
>
> Thanks
> Pallavi
>
> Robin Anil wrote:
>
>> I tracked the versions back to before the change to Writables was done.
>> There is no significant change in the code.
>>
>> Can you give me a small dataset, maybe 10 points in 5 dimensions? Then I
>> can verify the trunk against it.
>>
>> Robin
>>
>> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti < 
>> pallavi.pall...@corp.aol.com> wrote:
>>
>>
>>
>>> I have a local version, which I submitted a long time back, and I am
>>> using it on real data; it does not give the same point for all clusters.
>>> However, I haven't tried the latest Mahout code. I kept my
>>> code outputting data as text so that it is easy for me to verify,
>>> whereas the current Mahout code outputs binary data (as a
>>> SequenceFile), which makes it difficult to verify.
>>>
>>>
>>> Thanks
>>> Pallavi
>>>
>>> Robin Anil wrote:
>>>
>>>
>>>
>>>> Have you verified the trunk code on some real data? I am getting the
>>>> same point for all clusters regardless of the distance measure.
>>>>
>>>> Robin
>>>>
>>>>
>>>>
>>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti < 
>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>
>>>>> Yes, it shouldn't be a problem. My point was that we would be inheriting
>>>>> numPoints from ClusterBase, though we are not using it in
>>>>> SoftCluster.
>>>>> Other than that, I don't see any issue w.r.t. functionality.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Pallavi
>>>>>
>>>>> Robin Anil wrote:
>>>>>
>>>>>> In the SoftCluster implementation, on writeOut it calculates the
>>>>>> centroid and writes it, and on read(in) it reads the centroid into
>>>>>> the center.
>>>>>>
>>>>>> ClusterDumper reads it into a ClusterBase and calls
>>>>>> value.getCenter(). It should work normally, right?
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti < 
>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>
>>>>>>> Yes, but not the total number of points. So the numPoints from
>>>>>>> ClusterBase will not be used in SoftCluster. numPoints is
>>>>>>> specific to k-means, just as the weighted point total is specific
>>>>>>> to fuzzy k-means.
>>>>>>>
>>>>>>>
>>>>>>> Robin Anil wrote:
>>>>>>>
>>>>>>>> The center is still the averaged-out centroid, right?
>>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti < 
>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>
>>>>>>>>> I haven't gone through ClusterDumper yet. However, ClusterBase
>>>>>>>>> keeps the number of points to average over
>>>>>>>>> (pointTotal/numPoints, as in k-means),
>>>>>>>>> whereas SoftCluster keeps a weighted point total. So I am
>>>>>>>>> wondering how we can reuse ClusterBase here.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Pallavi
>>>>>>>>>
>>>>>>>>> Robin Anil wrote:
>>>>>>>>>
>>>>>>>>>> Yes, so that ClusterDumper can print it out.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti < 
>>>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Robin,
>>>>>>>>>>>
>>>>>>>>>>> When you say reusing ClusterBase, are you planning to
>>>>>>>>>>> extend ClusterBase in SoftCluster, i.e., have SoftCluster
>>>>>>>>>>> extend ClusterBase?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pallavi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have been trying to convert the FuzzyKMeans SoftCluster (which
>>>>>>>>>>>> should ideally be named FuzzyKMeansCluster) to use
>>>>>>>>>>>> ClusterBase.
>>>>>>>>>>>>
>>>>>>>>>>>> I am getting *the same center* for all the clusters. To aid
>>>>>>>>>>>> the conversion, all I did was remove the center vector from
>>>>>>>>>>>> the SoftCluster class and reuse the one from
>>>>>>>>>>>> ClusterBase. These changes make essentially no difference to the
>>>>>>>>>>>> tests, which pass correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> So I am questioning whether the implementation keeps the
>>>>>>>>>>>> average center at all. Is anyone who has used FuzzyKMeans
>>>>>>>>>>>> experiencing this?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>
>>>
>>
>>
>
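The two centroid calculations discussed in the messages above can be contrasted in a short sketch. It assumes nothing about Mahout's actual ClusterBase or SoftCluster code. `HardClusterSketch` and `SoftClusterSketch` are hypothetical names, and their fields only mirror pointTotal/numPoints and weightedtotalvector/totalprobWeight as the thread describes them.

```python
class HardClusterSketch:
    """k-means style accumulator: center = point_total / num_points.
    Hypothetical class; field names only mirror pointTotal/numPoints."""

    def __init__(self, dim):
        self.point_total = [0.0] * dim
        self.num_points = 0

    def observe(self, point):
        # Every assigned point counts exactly once.
        self.num_points += 1
        for i, x in enumerate(point):
            self.point_total[i] += x

    def centroid(self):
        return [t / self.num_points for t in self.point_total]


class SoftClusterSketch:
    """Fuzzy k-means style accumulator:
    center = weighted_point_total / total_prob_weight.
    Hypothetical class; note that no point count is kept."""

    def __init__(self, dim):
        self.weighted_point_total = [0.0] * dim
        self.total_prob_weight = 0.0

    def observe(self, point, prob_weight):
        # prob_weight would be u^m for this point and this cluster.
        self.total_prob_weight += prob_weight
        for i, x in enumerate(point):
            self.weighted_point_total[i] += prob_weight * x

    def centroid(self):
        return [t / self.total_prob_weight for t in self.weighted_point_total]


if __name__ == "__main__":
    hard = HardClusterSketch(2)
    for p in ([0.0, 0.0], [2.0, 4.0]):
        hard.observe(p)
    soft = SoftClusterSketch(2)
    soft.observe([0.0, 0.0], 0.75)
    soft.observe([4.0, 8.0], 0.25)
    print(hard.centroid(), soft.centroid())
```

When every weight is equal, the soft version reduces to the plain average. This fits Pallavi's point that an inherited numPoints field would simply go unused in SoftCluster.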
