Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

Jeff Eastman Sat, 17 Apr 2010 09:05:58 -0700

Looking at KMeansClusterer.outputPointWithClusterInfo, it seems this code will have to change in the patch, but I haven't looked yet:


    String name = point.getName();
    String key = (name != null) && (name.length() != 0) ? name : point.asFormatString();
    output.collect(new Text(key), new Text(String.valueOf(nearestCluster.getId())));

Seems to me we need to rethink this step anyway if we are going to implement the CDbw cluster evaluation algorithm. For that we need a job step that outputs [clusterId:Vector_as_Writable] sequence files so that we can iterate over them to find representative points. Is anybody using the current format who would be impacted by such a change?
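
To make the proposed output shape concrete, here is a rough sketch in plain Java with no Hadoop or Mahout dependencies. Everything in it is an assumption for illustration: the class name ClusterPointsSketch, the use of double[] in place of a Writable vector, the array index standing in for the cluster id, and squared Euclidean distance as the measure. It only shows points being emitted keyed by cluster id rather than cluster ids being emitted keyed by point name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group points under the id of their nearest cluster,
// i.e. the [clusterId : vector] shape, instead of [pointName : clusterId].
public class ClusterPointsSketch {

    // Squared Euclidean distance between two dense vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the nearest center; the index stands in for the cluster id.
    // Ties go to the lower index.
    static int nearestCluster(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Collect every point under its nearest cluster id, preserving input order.
    static Map<Integer, List<double[]>> assign(double[][] points, double[][] centers) {
        Map<Integer, List<double[]>> out = new LinkedHashMap<>();
        for (double[] p : points) {
            out.computeIfAbsent(nearestCluster(p, centers), k -> new ArrayList<>()).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = {{1.0, 1.0}, {9.0, 9.0}, {0.0, 2.0}};
        Map<Integer, List<double[]>> byCluster = assign(points, centers);
        for (Map.Entry<Integer, List<double[]>> e : byCluster.entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " point(s)");
        }
    }
}
```

With the points keyed this way, a later step can iterate over one cluster's vectors at a time, which is what picking representative points would need.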


Jeff

On 4/17/10 8:14 AM, Robin Anil (JIRA) wrote:

     [ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858153#action_12858153 ]

Robin Anil commented on MAHOUT-379:
-----------------------------------

If the id is removed from the vector, I believe it will affect all clustering 
algorithms. The final stage generates the (vector_id, cluster_id) pair. I will 
have to verify whether this affects that step.
