KMean clustering resulting Skewed Issue

Reth RM Fri, 24 Mar 2017 22:57:03 -0700

Hi,

  I am using Spark k-means to cluster records that consist of news
documents; the vectors are created by applying tf-idf. The dataset I am
using for testing right now is the gold-standard (ground-truth) classified
20 Newsgroups set: http://qwone.com/~jason/20Newsgroups/


The issue is that all the documents are getting assigned to the same
cluster, and each of the other clusters contains only the vector (doc)
that was picked as its cluster center (skewed clustering). What could be
the possible reasons for this? Any suggestions? Should I be retuning the
epsilon (the convergence threshold)?
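
To make the symptom concrete, here is a minimal plain-Python sketch (not
Spark code) of how the skew could be quantified from the per-document
cluster assignments. The function name cluster_size_report and the sample
assignments are hypothetical, made up for illustration; they are not
results from the run described above.

```python
from collections import Counter


def cluster_size_report(assignments, k):
    """Summarize how evenly documents are spread over k clusters.

    assignments: one cluster index (0..k-1) per document.
    Returns (sizes, largest_share, singletons).
    """
    counts = Counter(assignments)
    # Size of every cluster, including clusters that received no documents.
    sizes = [counts.get(c, 0) for c in range(k)]
    total = len(assignments)
    # Fraction of all documents that landed in the biggest cluster.
    largest_share = max(sizes) / total if total else 0.0
    # Clusters holding exactly one document (e.g. only their own center).
    singletons = sum(1 for s in sizes if s == 1)
    return sizes, largest_share, singletons


# Hypothetical assignments mimicking the skew described above:
# one huge cluster plus clusters that contain a single document each.
example_assignments = [0] * 97 + [1, 2, 3]
sizes, largest_share, singletons = cluster_size_report(example_assignments, k=4)
print(sizes, largest_share, singletons)
```

A largest_share close to 1.0 together with many singleton clusters is the
pattern I am describing.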
