If the Vector->MSCanopy pre-job outputs all of its canopies, then each
of those canopies would contain the generated canopyId, and its canopy
center would contain the original vector with its docId. It seems one
could use that data set to recover the membership information in a
separate post-processing step. That post-processing job should
certainly wait until later, after the List<Vector> -> List<canopyId>
optimization.
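
To make the idea concrete, here is a rough in-memory sketch of what such a
post-processing step might do. It is not Mahout code: the class, method, and
parameter names are all hypothetical, and it assumes that after the
List<canopyId> optimization each final canopy carries the ids of the initial
canopies merged into it. A real version would be an M/R join over sequence
files rather than a pair of maps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Mahout.
final class MembershipSketch {

  /**
   * Joins the pre-job output with the final canopies.
   *
   * @param canopyToDoc  pre-job output: generated canopyId -> docId of the
   *                     original vector held in that canopy's center
   * @param finalToBound final canopyId -> ids of the initial canopies merged
   *                     into it (assumed to be available after the
   *                     List<canopyId> optimization)
   * @return docId -> id of the final canopy it ended up in
   */
  static Map<String, Integer> resolve(Map<Integer, String> canopyToDoc,
                                      Map<Integer, List<Integer>> finalToBound) {
    Map<String, Integer> docToCanopy = new HashMap<>();
    for (Map.Entry<Integer, List<Integer>> entry : finalToBound.entrySet()) {
      for (Integer initialId : entry.getValue()) {
        String docId = canopyToDoc.get(initialId);
        if (docId != null) { // skip ids the pre-job never emitted
          docToCanopy.put(docId, entry.getKey());
        }
      }
    }
    return docToCanopy;
  }
}
```

The point is only that membership falls out of a simple join on the generated
canopyId, provided the pre-job output is kept around.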
Robin Anil (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-304:
------------------------------
Attachment: MAHOUT-304.patch
Jeff, MeanShift uses only ids generated by the mapper to keep vector membership.
I don't yet see how you can get the membership information, i.e. Vector docid =>
Canopy Id. Isn't that job missing? Maybe for later (0.4)?
MeanShift doesn't read from VectorWritable
------------------------------------------
Key: MAHOUT-304
URL: https://issues.apache.org/jira/browse/MAHOUT-304
Project: Mahout
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.3
Reporter: Robin Anil
Assignee: Robin Anil
Fix For: 0.3
Attachments: MAHOUT-304.patch, MAHOUT-304.patch, MAHOUT-304.patch
Need an M/R job to convert a sequence file containing VectorWritable to MeanShiftCanopy before the MeanShift M/R