Hi, I am using Spark k-means to cluster records that consist of news documents; the vectors are created by applying tf-idf. The dataset I am using for testing right now is 20 Newsgroups, which comes with gold-standard class labels: http://qwone.com/~jason/20Newsgroups/
The issue is that all the documents are being assigned to the same cluster, while each of the other clusters contains only the vector (document) that was picked as its center, i.e. the clustering is heavily skewed. What could be the possible reasons for this? Any suggestions? Should I be re-tuning epsilon?
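
To make the skew concrete, here is a minimal sketch of how the cluster sizes could be summarized. It is plain Python and assumes the per-document cluster ids have already been collected into a list (for example from the model's `predict` output). The function name `cluster_size_report` and the sample assignments are hypothetical, made up only to illustrate the pattern described above; they are not real results from 20 Newsgroups.

```python
from collections import Counter


def cluster_size_report(assignments):
    """Summarize how many documents landed in each cluster.

    `assignments` is a list of cluster ids, one per document
    (hypothetical input; not tied to any particular library).
    """
    if not assignments:
        raise ValueError("assignments must not be empty")

    sizes = Counter(assignments)
    total = len(assignments)
    largest_id, largest_size = sizes.most_common(1)[0]

    # Clusters holding exactly one document, i.e. probably just their own center.
    singletons = sorted(c for c, n in sizes.items() if n == 1)

    return {
        "sizes": dict(sizes),
        "largest_cluster": largest_id,
        "largest_share": largest_size / total,
        "singleton_clusters": singletons,
    }


# Made-up assignments that mimic the skew: one big cluster, three singletons.
example = [0] * 97 + [1, 2, 3]
report = cluster_size_report(example)
print(report)
```

A `largest_share` close to 1.0 together with several entries in `singleton_clusters` is the pattern I am seeing.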