Andrey Davydov created MAHOUT-1130:
--------------------------------------

             Summary: Wrong logic in org.apache.mahout.clustering.kmeans.RandomSeedGenerator
                 Key: MAHOUT-1130
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1130
             Project: Mahout
          Issue Type: Bug
         Environment: Mahout 0.7 from Maven Central
            Reporter: Andrey Davydov


The following code is at line 101:

              } else if (random.nextInt(currentSize + 1) != 0) { // with chance 1/(currentSize+1) pick new element

but the condition is actually true with probability currentSize/(currentSize+1),
so a new element is picked almost every time and the generator ends up taking
the initial centers from the end of the source data file.
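
To make the arithmetic concrete, here is a small sketch that enumerates the equally likely results of nextInt(currentSize + 1) and counts how many pass the "!= 0" test quoted above. The class and method names are made up for illustration and are not part of Mahout.

```java
final class ReplaceChance {
    // Counts how many of the currentSize + 1 equally likely values returned by
    // nextInt(currentSize + 1) satisfy the quoted condition (value != 0).
    static int outcomesThatReplace(int currentSize) {
        int hits = 0;
        for (int value = 0; value <= currentSize; value++) {
            if (value != 0) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int currentSize = 10; // illustrative value only
        System.out.println(outcomesThatReplace(currentSize) + " of "
                + (currentSize + 1) + " outcomes replace an element");
    }
}
```

Only the single outcome 0 leaves the chosen set untouched, so the comment and the condition describe opposite probabilities.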

It seems that the chance of replacing a vector in the output set should
decrease with the number of input vectors processed.
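
One way to get that behaviour is standard reservoir sampling (Algorithm R), where the n-th input item is kept with probability k/n. The following is only a sketch of that technique, not the Mahout code or a proposed patch; the class name, method name, and the "seen" counter are assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class ReservoirSample {
    // Picks k items uniformly at random from an input of unknown length.
    static <T> List<T> sample(Iterable<T> input, int k, Random random) {
        List<T> chosen = new ArrayList<>(k);
        int seen = 0; // number of input items processed so far
        for (T item : input) {
            seen++;
            if (chosen.size() < k) {
                chosen.add(item); // fill the reservoir first
            } else {
                // Keep the new item with probability k / seen,
                // which shrinks as more input is processed.
                int slot = random.nextInt(seen);
                if (slot < k) {
                    chosen.set(slot, item); // overwrite a uniformly chosen slot
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        System.out.println(sample(data, 5, new Random(42)));
    }
}
```

The important difference from the quoted line is that the bound passed to nextInt grows with the number of items seen, not with the size of the chosen set, which stays at k once it is full.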

