[ https://issues.apache.org/jira/browse/MAHOUT-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil resolved MAHOUT-1117.
--------------------------------

    Resolution: Won't Fix

> Vectors are not hashable
> ------------------------
>
>                 Key: MAHOUT-1117
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1117
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Dan Filimon
>            Priority: Minor
>
> No *Vector classes (DenseVector, WeightedVector, etc.) implement hashCode().
> In working on improving clustering in Mahout, Ted Dunning wrote prototype
> code for Streaming KMeans and Ball KMeans, that I'm working with him on.
> These need to be used together in the MapReduce version.
> However, in Ball KMeans, we initialize the clusters using a probabilistic
> approach similar to k-means++. This however requires a
> Multinomial<WeightedVector> distribution of the points we want to cluster to
> pick the centroids.
> Internally, the Multinomial<T> uses a HashMap to keep track of the values it
> can sample from.
> Since Vectors don't override Object's hashCode(), it is possible to get the
> same value multiple times in the map (as long as the references differ).
> This is less of an issue because of how we're adding the vectors to the
> multinomial (we can guarantee that the references will be unique) and once
> MAHOUT-1116 is resolved the hashing will work okay for our needs.
> It still seems that it would be useful to have hashable vectors.
> What do you think? And what would a hash function look like?
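For illustration only, here is one possible answer to the closing question, sketched in Java. It is not Mahout's code: `SimpleVector`, `IdentityVector` and `distinctKeys` are hypothetical names, and the dense `double[]` storage is an assumption. The sketch first shows the reported problem (two vectors with identical values become two separate `HashMap` keys when `hashCode()`/`equals()` are inherited from `Object`). It then shows a value-based hash that combines `(index, value)` pairs for non-zero entries only, so that a sparse representation holding the same non-zeros could in principle hash identically.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these classes are illustrative stand-ins, not Mahout's Vector API.
class HashableVectorDemo {

  /** Dense vector whose equals()/hashCode() depend on the stored values, not the reference. */
  static final class SimpleVector {
    private final double[] values;

    SimpleVector(double... values) {
      this.values = values.clone(); // defensive copy: later changes to the caller's array can't alter the hash
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof SimpleVector)) {
        return false;
      }
      double[] other = ((SimpleVector) o).values;
      if (other.length != values.length) {
        return false;
      }
      for (int i = 0; i < values.length; i++) {
        double a = values[i];
        double b = other[i];
        // Like ==, treat 0.0 and -0.0 as equal; unlike ==, treat NaN as equal to NaN.
        if (a != b && !(Double.isNaN(a) && Double.isNaN(b))) {
          return false;
        }
      }
      return true;
    }

    @Override
    public int hashCode() {
      // Sum a mix of (index, value) over non-zero entries. Summing makes the result
      // independent of iteration order, which a sparse vector may not guarantee.
      int sum = 0;
      for (int i = 0; i < values.length; i++) {
        double v = values[i];
        if (v != 0.0) { // skips both 0.0 and -0.0, which equals() treats as equal
          sum += (31 * i + 17) ^ Double.hashCode(v);
        }
      }
      return 31 * values.length + sum;
    }
  }

  /** Same data, but inheriting Object's identity-based equals()/hashCode(). */
  static final class IdentityVector {
    final double[] values;

    IdentityVector(double... values) {
      this.values = values.clone();
    }
  }

  /** Counts the distinct keys a HashMap ends up with, as a HashMap-backed sampler would. */
  static int distinctKeys(Object... keys) {
    Map<Object, Integer> counts = new HashMap<>();
    for (Object k : keys) {
      counts.merge(k, 1, Integer::sum);
    }
    return counts.size();
  }

  public static void main(String[] args) {
    // Identity hashing: equal values, different references -> 2 keys.
    System.out.println(distinctKeys(new IdentityVector(1, 0, 2), new IdentityVector(1, 0, 2)));
    // Value hashing: equal values -> 1 key.
    System.out.println(distinctKeys(new SimpleVector(1, 0, 2), new SimpleVector(1, 0, 2)));
  }
}
```

Running `main` prints 2 and then 1. One caveat applies to any value-based hash: if a vector is mutated after being inserted as a `HashMap` key, its hash changes and the map can no longer find it. Value hashing is therefore only safe for vectors that are not modified while they serve as keys. Equality here is also exact, so vectors that differ only by floating-point rounding remain distinct keys.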