[ 
https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463610#comment-13463610
 ] 

Smita Wadhwa commented on MAHOUT-1080:
--------------------------------------

I have created a WeightedTextVectorWritable(vector, distance-from-the-centre, 
vectorId) to hold the output, with the vectorId stored as Text. I chose Text 
so that, whether the id is an int, a double, or text, it can be output as text.
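
As a rough illustration of the idea only (this is not the code in the attached 
patch), the sketch below shows a record that carries the distance, a text 
vectorId, and the vector values, and serializes them in the write/readFields 
style used by Hadoop Writables. The class name WeightedIdVectorSketch, its 
fields, and the roundTrip helper are hypothetical, and it uses plain java.io 
instead of the Hadoop and Mahout types so that it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for the idea behind WeightedTextVectorWritable.
// It is NOT the Mahout class and does not implement Hadoop's Writable.
final class WeightedIdVectorSketch {
    double distance;   // distance from the cluster centre
    String vectorId;   // id kept as text, whatever its original type was
    double[] vector;   // dense vector values (assumption: dense only)

    WeightedIdVectorSketch() {}

    WeightedIdVectorSketch(double distance, String vectorId, double[] vector) {
        this.distance = distance;
        this.vectorId = vectorId;
        this.vector = vector;
    }

    // Mirrors the shape of Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeDouble(distance);
        out.writeUTF(vectorId);
        out.writeInt(vector.length);
        for (double v : vector) {
            out.writeDouble(v);
        }
    }

    // Mirrors the shape of Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        distance = in.readDouble();
        vectorId = in.readUTF();
        vector = new double[in.readInt()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = in.readDouble();
        }
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    static WeightedIdVectorSketch roundTrip(WeightedIdVectorSketch point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            point.write(new DataOutputStream(bytes));
            WeightedIdVectorSketch copy = new WeightedIdVectorSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An int id is converted to text before it is stored.
        WeightedIdVectorSketch copy = roundTrip(new WeightedIdVectorSketch(
                0.25, String.valueOf(42), new double[] {1.0, 2.0}));
        System.out.println(copy.vectorId + " " + copy.distance + " "
                + Arrays.toString(copy.vector));
    }
}
```

Keeping the id as text means an int key from the input can be converted with 
String.valueOf and still survive the round trip unchanged.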

Please find attached the patch for this fix, which carries the vectorId given 
in the input through to the output. I have updated the test cases and verified 
the output with the unit tests as well as on a Hadoop cluster.

The changes cover both the sequential and the MapReduce job.
                
> Kmeans clustered output losses vectorId given in the input
> ----------------------------------------------------------
>
>                 Key: MAHOUT-1080
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1080
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>            Reporter: Smita Wadhwa
>         Attachments: kMeansClusterVectorId.diff
>
>
> The input to Kmeans is IntWritable and VectorWritable, 
> and the output of clustered points is clusterId and 
> WeightedVectorWritable(vector, distance-from-the-centre).
> The id of the vector is lost in this processing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
