Hi,

  I am using Spark k-means to cluster news documents, with each document
represented as a tf-idf vector. The dataset I am testing with right now is
the gold-truth classified 20 Newsgroups set:
http://qwone.com/~jason/20Newsgroups/

The issue is that all the documents are getting assigned to the same cluster,
and each of the other clusters contains only the single document vector that
was picked as its center (skewed clustering). What could be the possible
reasons for this? Any suggestions? Should I be retuning the epsilon?
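
For concreteness, here is a minimal sketch of the kind of pipeline and check
described above, written in plain Python rather than Spark so that it runs on
its own. It is not the actual job: the toy documents, the function names
(tf_idf, l2_normalize, kmeans), the naive "first k documents" initialization,
and the L2 normalization step are all assumptions made for illustration. The
epsilon parameter here is a convergence tolerance on how far the centers move,
which is how I understand the setting in question. The last lines count how
many documents land in each cluster, which is the quickest way to see the skew.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one sparse tf-idf dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf: log((n + 1) / (df + 1)); terms in every doc get weight 0.
    return [
        {t: c * math.log((n + 1) / (df[t] + 1)) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def l2_normalize(vec):
    """Scale a sparse vector to unit length (assumption: not part of the original setup)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def nearest(v, centers):
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))


def kmeans(vectors, k, max_iter=20, epsilon=1e-4):
    """Plain Lloyd's k-means; stops once no center moves more than epsilon."""
    centers = [dict(v) for v in vectors[:k]]  # naive init, for illustration only
    for _ in range(max_iter):
        assign = [nearest(v, centers) for v in vectors]
        new_centers = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[c])
        moved = max(sq_dist(o, n) for o, n in zip(centers, new_centers))
        centers = new_centers
        if moved <= epsilon ** 2:
            break
    return [nearest(v, centers) for v in vectors], centers


# Toy stand-in for the news documents (hypothetical, not from 20 Newsgroups).
raw_docs = [
    "space shuttle launch orbit nasa",
    "hockey game team season goal",
    "nasa orbit satellite space mission",
    "team player hockey goal playoff",
    "launch rocket space nasa",
    "season team game hockey",
]
vectors = [l2_normalize(v) for v in tf_idf([d.split() for d in raw_docs])]
assign, centers = kmeans(vectors, k=2)

# Cluster-size histogram: one huge cluster plus singletons would indicate skew.
sizes = Counter(assign)
print(dict(sizes))
```

On this toy input the two clusters come out balanced, three documents each.
In my real run the same kind of histogram shows one cluster holding all the
documents and the remaining clusters holding one document each.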
