[ 
https://issues.apache.org/jira/browse/MAHOUT-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Anil resolved MAHOUT-1117.
--------------------------------

    Resolution: Won't Fix
    
> Vectors are not hashable
> ------------------------
>
>                 Key: MAHOUT-1117
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1117
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Dan Filimon
>            Priority: Minor
>
> No *Vector classes (DenseVector, WeightedVector, etc.) implement hashCode().
> As part of the work to improve clustering in Mahout, Ted Dunning wrote
> prototype code for Streaming KMeans and Ball KMeans, which I'm working on
> with him. These need to be used together in the MapReduce version.
> In Ball KMeans, however, we initialize the clusters using a probabilistic
> approach similar to k-means++. This requires a Multinomial<WeightedVector>
> distribution over the points we want to cluster, from which the centroids
> are picked.
> Internally, the Multinomial<T> uses a HashMap to keep track of the values it 
> can sample from.
> Since Vectors don't override Object's hashCode(), two vectors with identical
> contents can be stored as separate keys in the map (as long as the
> references differ).
> This is less of an issue because of how we're adding the vectors to the
> multinomial (we can guarantee that the references will be unique), and once
> MAHOUT-1116 is resolved the hashing will work okay for our needs.
> Still, it seems that it would be useful to have hashable vectors.
> What do you think? And what would a hash function look like?
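
To make the question above concrete, here is a minimal sketch of the behaviour described in the report and of one possible content-based hash. It does not use Mahout's Vector API: IdentityVector, ContentVector and VectorHashSketch are hypothetical stand-ins that wrap a plain double[], and hashing with Arrays.hashCode is only one option, not a proposal taken from the issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class VectorHashSketch {
    /** Hypothetical vector that inherits Object's identity-based equals()/hashCode(). */
    static final class IdentityVector {
        final double[] values;

        IdentityVector(double... values) {
            this.values = values.clone();
        }
    }

    /** Hypothetical vector whose equals()/hashCode() depend only on its contents. */
    static final class ContentVector {
        private final double[] values;

        ContentVector(double... values) {
            this.values = values.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ContentVector
                && Arrays.equals(values, ((ContentVector) o).values);
        }

        @Override
        public int hashCode() {
            // Combines the bit patterns of all elements, consistent with equals().
            return Arrays.hashCode(values);
        }
    }

    /** Inserts two equal-content vectors and returns the number of keys stored. */
    static int distinctIdentityKeys() {
        Map<IdentityVector, Double> weights = new HashMap<>();
        weights.put(new IdentityVector(1.0, 2.0), 0.5);
        weights.put(new IdentityVector(1.0, 2.0), 0.5);
        return weights.size();
    }

    /** Same insertion, but with content-based hashing the second put overwrites the first. */
    static int distinctContentKeys() {
        Map<ContentVector, Double> weights = new HashMap<>();
        weights.put(new ContentVector(1.0, 2.0), 0.5);
        weights.put(new ContentVector(1.0, 2.0), 0.5);
        return weights.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctIdentityKeys()); // prints 2
        System.out.println(distinctContentKeys());  // prints 1
    }
}
```

One caveat with any content-based hash: if a vector is mutated after being used as a HashMap key, its hash changes and the entry can no longer be found, so such keys would need to be treated as immutable while they are in the map.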

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
