2015-01-06 15:03 GMT+08:00 Lee S <sle...@gmail.com>:
> But the parameters and distance measure are the same. The only difference:
> Mahout kmeans convergence is based on whether every cluster has converged;
> scikit-learn is based on a within-cluster sum-of-squares criterion.
>
> 2015-01-06 14:15 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
>
>> I don't think that data is sufficiently clusterable to expect a unique
>> solution.
>>
>> Mean squared error would be a better measure of quality.
>>
>> On Mon, Jan 5, 2015 at 10:07 PM, Lee S <sle...@gmail.com> wrote:
>>
>> > Data in this link:
>> > http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
>> > I converted it to a sequencefile with InputDriver.
>> >
>> > 2015-01-06 14:04 GMT+08:00 Ted Dunning <ted.dunn...@gmail.com>:
>> >
>> > > What kind of synthetic data did you use?
>> > >
>> > > On Mon, Jan 5, 2015 at 8:29 PM, Lee S <sle...@gmail.com> wrote:
>> > >
>> > > > Hi, I used the synthetic data to test the kmeans method.
>> > > > I wrote my own code to convert the center points to sequencefiles.
>> > > > Then I ran kmeans with the parameters (-i input -o output -c center
>> > > > -x 3 -cd 1 -cl).
>> > > > I compared the dumped clusteredPoints with the result of
>> > > > scikit-learn kmeans, and it's totally different. I'm very confused.
>> > > >
>> > > > Does anybody ever run kmeans with center points provided and
>> > > > compare the result with another ml-library?
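For what it's worth, here is a minimal sketch of the scikit-learn side of such a comparison: fixed starting centers (analogous to Mahout's `-c` directory), a capped iteration count mirroring `-x 3`, and mean squared error computed from inertia, as Ted suggests. The toy two-blob data and the specific centers are illustrative assumptions, not the synthetic_control dataset from the thread.

```python
# Sketch of a reproducible scikit-learn kmeans run with provided centers.
# Assumptions: toy 2-D data, 2 clusters; not the original synthetic_control run.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Two well-separated blobs so the clustering outcome is unambiguous.
data = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + [10.0, 10.0]])

# Fixed starting centers, analogous to Mahout's `-c center` input.
init_centers = np.array([[0.0, 0.0], [10.0, 10.0]])

# n_init=1 because the init is explicit; max_iter=3 mirrors `-x 3`.
km = KMeans(n_clusters=2, init=init_centers, n_init=1, max_iter=3).fit(data)

# Mean squared error: within-cluster sum of squares (inertia) per point.
mse = km.inertia_ / len(data)
print("MSE:", mse)
```

Comparing this MSE against the squared distances in Mahout's dumped clusteredPoints is more meaningful than comparing cluster assignments directly, since the two libraries stop on different criteria and may disagree on weakly clusterable data.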