[ 
https://issues.apache.org/jira/browse/MAHOUT-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman resolved MAHOUT-766.
---------------------------------

       Resolution: Not A Problem
    Fix Version/s: 0.6
         Assignee: Jeff Eastman

I think the problem here is using the default distance measure 
(squared Euclidean) with fuzzy k-means. I added

-dm org.apache.mahout.common.distance.CosineDistanceMeasure \

to the script and it produced clusters that differ somewhat from each other but 
still have a high degree of similarity in their terms and weights. Then I 
decreased m to 1.1 and, predictably, the clusters diverged to be more like the 
kmeans results.

The results do seem very sensitive to the value of m: changes within the 
range 1 < m <= 2 have a large impact on the clusters.
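
To illustrate why m matters so much, here is a minimal sketch of the textbook 
fuzzy c-means membership formula, u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)). 
It is not taken from the Mahout source, and the function name and the three 
distances are hypothetical values chosen only for illustration. When the 
distances from a point to the centroids are close to each other, m = 2 gives 
nearly uniform memberships, so every centroid is pulled toward the same mean; 
as m approaches 1 the memberships sharpen toward a hard, kmeans-like 
assignment.

```python
def memberships(distances, m):
    """Textbook fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), valid for m > 1 and d > 0.
    This is an illustrative sketch, not Mahout's implementation.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical distances from one document to three centroids.
distances = [0.90, 1.00, 1.10]

for m in (2.0, 1.5, 1.1):
    u = memberships(distances, m)
    # Memberships always sum to 1; lower m concentrates them on the
    # nearest centroid.
    print(m, [round(x, 3) for x in u])
```

With these made-up distances the nearest centroid receives well under half of 
the membership at m = 2 and most of it at m = 1.1, which matches the 
behaviour described above.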

I'm going to resolve this as not a problem.
                
>  fuzzy kmeans - all cluster with the same top terms
> ---------------------------------------------------
>
>                 Key: MAHOUT-766
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-766
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering, Examples
>    Affects Versions: 0.6
>         Environment: tested in OSX and linux
>            Reporter: Paulo Magalhaes
>            Assignee: Jeff Eastman
>             Fix For: 0.6
>
>
> I believe there is something wrong with fkmeans in trunk. 
> I am using code from trunk (last checkout 6/30/11). Reproducing it is very 
> simple:
> 1) change examples/bin/build-reuters.sh to use fkmeans and set -m 2
> 2) run build-reuters.sh
> 3) Dump the clusters. I'm running: ../../bin/mahout clusterdump -dt sequencefile 
> -s ./mahout-work/reuters-kmeans/clusters-6 -b 100 -o 
> ./reuters-clusterdump.txt  -d 
> ./mahout-work/reuters-out-seqdir-sparse-kmeans/dictionary.file-0
> Here is what the clusters look like:
> SV-15898{n=34 c=[0:0.020, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.7254762602900604
>               mln                                     =>  1.2510936664951733
>               dlrs                                    =>  1.1340145215097008
>               3                                       =>  1.0643797240793276
>               pct                                     =>  1.0422760712239152
>               reuter                                  =>  1.0202689935247569
>               its                                     =>  0.9997771992646881
>               from                                    =>  0.9903731234557381
>               year                                    =>  0.8855389859684145
>               vs                                      =>  0.8291746545786391
> :SV-14766{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6406710289350412
>               mln                                     =>  1.2174993414858022
>               dlrs                                    =>  1.0937941570322955
>               3                                       =>  1.0334420773050856
>               pct                                     =>   0.991539915235039
>               reuter                                  =>   0.990042452019326
>               its                                     =>  0.9508638527143669
>               from                                    =>  0.9403885495991262
>               vs                                      =>   0.865437130369746
>               year                                    =>  0.8463503194752994
> :SV-14854{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.641260962665307
>               mln                                     =>   1.217806578134094
>               dlrs                                    =>  1.0941157210136143
>               3                                       =>  1.0336934328877394
>               pct                                     =>   0.991895013999163
>               reuter                                  =>  0.9902889592990656
>               its                                     =>  0.9512076670014483
>               from                                    =>  0.9407384847445094
>               vs                                      =>  0.8653426311034671
>               year                                    =>  0.8466407590692175
> :SV-14890{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6410352907185948
>               mln                                     =>    1.21769021136256
>               dlrs                                    =>  1.0939933408434481
>               3                                       =>  1.0335977297579235
>               pct                                     =>   0.991759193577722
>               reuter                                  =>  0.9901951250301172
>               its                                     =>  0.9510761761632947
>               from                                    =>  0.9406047832581563
>               vs                                      =>  0.8653814488835572
>               year                                    =>  0.8465301083353372
> :SV-14972{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.640981249652196
>               mln                                     =>  1.2176595452829564
>               dlrs                                    =>   1.093962519439548
>               3                                       =>  1.0335737897463568
>               pct                                     =>  0.9917266257955816
>               reuter                                  =>  0.9901715950801396
>               its                                     =>  0.9510446208123859
>               from                                    =>  0.9405723357372776
>               vs                                      =>  0.8653843699725567
>               year                                    =>   0.846502466267153
> :SV-15023{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6399319888551425
>               mln                                     =>   1.217099157115808
>               dlrs                                    =>  1.0933830369192543
>               3                                       =>   1.033121271434882
>               pct                                     =>   0.991094828319561
>               reuter                                  =>  0.9897275313905611
>               its                                     =>  0.9504327303592046
>               from                                    =>  0.9399480272494183
>               vs                                      =>  0.8655203514280634
>               year                                    =>  0.8459804922897428
> :SV-15330{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6411480082558068
>               mln                                     =>   1.217746071140758
>               dlrs                                    =>  1.0940532425506244
>               3                                       =>  1.0336447143638317
>               pct                                     =>  0.9918269975797083
>               reuter                                  =>   0.990241145450359
>               its                                     =>  0.9511417993006985
>               from                                    =>  0.9406712099799636
>               vs                                      =>  0.8653569180999117
>               year                                    =>  0.8465844425179013
> :SV-15403{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6493270418577013
>               mln                                     =>   1.221708475489808
>               dlrs                                    =>  1.0983489300320377
>               3                                       =>  1.0370024996153944
>               pct                                     =>  0.9967446058994232
>               reuter                                  =>   0.993528974793619
>               its                                     =>  0.9558988111209523
>               from                                    =>  0.9454911460774864
>               vs                                      =>  0.8633642497287671
>               year                                    =>  0.8505083085439775
> :SV-15514{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6414524586689534
>               mln                                     =>  1.2179029815366167
>               dlrs                                    =>   1.094218299808865
>               3                                       =>   1.033773769117182
>               pct                                     =>  0.9920102286561391
>               reuter                                  =>  0.9903676795676004
>               its                                     =>  0.9513191861395162
>               from                                    =>  0.9408515920762511
>               vs                                      =>   0.865304353452142
>               year                                    =>  0.8467337135094862
> :SV-15549{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.640632892454694
>               mln                                     =>  1.2174764812983898
>               dlrs                                    =>  1.0937717467869699
>               3                                       =>   1.033424727632325
>               pct                                     =>    0.99151691360307
>               reuter                                  =>  0.9900253758026865
>               its                                     =>  0.9508415534060888
>               from                                    =>  0.9403654699584985
>               vs                                      =>   0.865436402399392
>               year                                    =>  0.8463303217162843
> :SV-15616{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6402745961421197
>               mln                                     =>   1.217287104215781
>               dlrs                                    =>  1.0935749393200054
>               3                                       =>  1.0332709291683844
>               pct                                     =>  0.9913012005612369
>               reuter                                  =>  0.9898744911012118
>               its                                     =>  0.9506326562835085
>               from                                    =>  0.9401525895225771
>               vs                                      =>  0.8654873596392523
>               year                                    =>  0.8461528918952358
> :SV-15674{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6402335213893247
>               mln                                     =>  1.2172651791725515
>               dlrs                                    =>  1.0935522610806727
>               3                                       =>  1.0332532137000938
>               pct                                     =>   0.991276468108388
>               reuter                                  =>  0.9898571070574692
>               its                                     =>  0.9506087026962596
>               from                                    =>  0.9401281555632803
>               vs                                      =>  0.8654927058873914
>               year                                    =>  0.8461324681573653
> :SV-15720{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.641454220566282
>               mln                                     =>  1.2179063418879368
>               dlrs                                    =>  1.0942205822099829
>               3                                       =>  1.0337754035575257
>               pct                                     =>  0.9920113271819195
>               reuter                                  =>  0.9903693325123661
>               its                                     =>  0.9513202705619623
>               from                                    =>  0.9408530174807668
>               vs                                      =>  0.8653096216062077
>               year                                    =>  0.8467355860669477
> :SV-15732{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6418679366988789
>               mln                                     =>   1.218118262616823
>               dlrs                                    =>  1.0944441677361394
>               3                                       =>  1.0339502052648608
>               pct                                     =>  0.9922602967957669
>               reuter                                  =>  0.9905406967751569
>               its                                     =>  0.9515612774046113
>               from                                    =>   0.941098001639954
>               vs                                      =>   0.865235154416334
>               year                                    =>  0.8469379811534101
> :SV-15825{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6403540331112847
>               mln                                     =>  1.2173302824011656
>               dlrs                                    =>  1.0936192179118565
>               3                                       =>  1.0333054698476525
>               pct                                     =>  0.9913490440255205
>               reuter                                  =>  0.9899084014354236
>               its                                     =>  0.9506790000021428
>               from                                    =>  0.9401999656754023
>               vs                                      =>  0.8654787849286104
>               year                                    =>  0.8461927112339609
> :SV-15888{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.641852069569193
>               mln                                     =>   1.218106579705691
>               dlrs                                    =>  1.0944336674208315
>               3                                       =>  1.0339422184421034
>               pct                                     =>  0.9922506923700831
>               reuter                                  =>  0.9905327937543529
>               its                                     =>   0.951551949990525
>               from                                    =>  0.9410880514065464
>               vs                                      =>  0.8652299423273659
>               year                                    =>  0.8469287549740471
> :SV-15944{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6406094746503062
>               mln                                     =>  1.2174640910103491
>               dlrs                                    =>  1.0937588768380255
>               3                                       =>  1.0334146735611798
>               pct                                     =>  0.9915028147402405
>               reuter                                  =>  0.9900155118531778
>               its                                     =>  0.9508279001565995
>               from                                    =>  0.9403515526055797
>               vs                                      =>   0.865439705916966
>               year                                    =>   0.846318717539638
> :SV-15952{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.641608350634413
>               mln                                     =>  1.2179827157677379
>               dlrs                                    =>   1.094302484756082
>               3                                       =>   1.033839606583586
>               pct                                     =>  0.9921040410110572
>               reuter                                  =>   0.990432219413613
>               its                                     =>  0.9514099986904929
>               from                                    =>  0.9409438763575203
>               vs                                      =>  0.8652760331837802
>               year                                    =>  0.8468099163160301
> :SV-15954{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6429205353451672
>               mln                                     =>  1.2186434984636658
>               dlrs                                    =>  1.0950054459143779
>               3                                       =>  1.0343894404834142
>               pct                                     =>   0.992893505149969
>               reuter                                  =>  0.9909710261706427
>               its                                     =>  0.9521740690117075
>               from                                    =>  0.9417194634871013
>               vs                                      =>  0.8650137662755684
>               year                                    =>  0.8474476266423354
> :SV-16007{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>  1.6401767760282457
>               mln                                     =>  1.2172339691485916
>               dlrs                                    =>   1.093520432998812
>               3                                       =>  1.0332284013507513
>               pct                                     =>  0.9912422858233993
>               reuter                                  =>  0.9898327402827573
>               its                                     =>  0.9505755879363272
>               from                                    =>  0.9400942591120444
>               vs                                      =>  0.8654979916098049
>               year                                    =>  0.8461038772989482
> :SV-16037{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004, 
> 0.02:0.002, 0.03:0.001, 0.046:0.0
>       Top Terms: 
>               said                                    =>   1.640610618380475
>               mln                                     =>  1.2174645746382695
>               dlrs                                    =>  1.0937594396319776
>               3                                       =>  1.0334151203058977
>               pct                                     =>  0.9915035014016228
>               reuter                                  =>  0.9900159476830741
>               its                                     =>  0.9508285640147016
>               from                                    =>  0.9403522136131415
>               vs                                      =>  0.8654392679742507
>               year                                    =>   0.846319234572972

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
