On Jan 2, 2010, at 1:27 PM, Bogdan Vatkov wrote:

> Thanks for the Luke hint, I will try it out. In the meantime I noticed
> something else that is very strange: I ran k-means on 23K+ docs with 50
> clusters, and the top-term lists for all of them look odd. For about 90% of
> the top terms I get words which I barely recognize.
> I did a quick check on one particular term, which sounded strange and
> appeared in the top terms for 9 of the 50 clusters, and found that it has
> "doc freq" = 2 in the Solr dictionary.
> How is this even possible? With 23,000 docs, how can a term that is
> mentioned only 2 times be a top term in 9 clusters? I definitely
> did something wrong; do you have an idea what that could be?

What commands are you running?

Can you share more about your setup, or try to reproduce this in a much
smaller environment?
