On Jan 2, 2010, at 1:27 PM, Bogdan Vatkov wrote:

> Thanks for the Luke hint, I will try it out but now I noticed something else
> which is very very strange - I ran k-means on 23K+ docs and with 50 clusters
> which all seem to be very very strange as top term collection - I would say
> for 90% of the top terms I get some words which I barely recognize.
> I did a short check and for one particular term, which anyway sounded
> strange and which appeared in top terms for 9 of the 50 clusters, I found
> that it has "doc freq" = 2 in the Solr dictionary.
> How is this even possible - for 23,000 docs and for a term which is
> mentioned only 2 times I have it as a top term in 9 clusters? I definitely
> did something wrong, do you have an idea what that could be?

What commands are you running? Can you share more about your setup, or try to reproduce it in a much smaller environment?
