[
https://issues.apache.org/jira/browse/MAHOUT-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018122#comment-13018122
]
Jeff Eastman commented on MAHOUT-552:
-------------------------------------
The initial processing step (createCanopyFromVectors) was using the default
MeanShiftCanopy constructor, which inherits the AbstractCanopy default
constructor, which in turn converts all centers to RandomAccessSparseVectors.
Since the clustering (classification) step is done from these initial canopies
rather than from the original input vectors, the type of the incoming vectors
was lost. This is especially problematic when the input vector is a
NamedVector, because the name is lost along with the type.
I've created a static method, initialCanopy(), for this initial step; it
retains the type of the original input vector as the center. I've added a unit
test and verified that the type is retained. Committing shortly.
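
For readers unfamiliar with the failure mode, here is a minimal,
self-contained sketch of it. It does not use the Mahout classes: Vec,
PlainVec, NamedVec, convertingCenter() and retainingCenter() are all
hypothetical stand-ins, assumed only for illustration. The first factory
copies every center into a plain vector, as the default constructor path
described above did, and the second keeps the input vector as-is, which is the
behaviour initialCanopy() is meant to provide.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Mahout classes.
class CenterTypeSketch {

    /** Minimal vector abstraction (assumed for this sketch). */
    interface Vec {
        double[] values();
    }

    /** Plain vector that carries values only, standing in for a sparse vector. */
    static final class PlainVec implements Vec {
        private final double[] values;

        PlainVec(double[] values) {
            this.values = values.clone();
        }

        @Override
        public double[] values() {
            return values.clone();
        }
    }

    /** Wrapper that attaches a name to another vector, standing in for a named vector. */
    static final class NamedVec implements Vec {
        private final String name;
        private final Vec delegate;

        NamedVec(String name, Vec delegate) {
            this.name = name;
            this.delegate = delegate;
        }

        String name() {
            return name;
        }

        @Override
        public double[] values() {
            return delegate.values();
        }
    }

    /** Mimics the reported behaviour: the center is always copied into a plain vector. */
    static Vec convertingCenter(Vec input) {
        return new PlainVec(input.values());
    }

    /** Mimics the fix: the center keeps the type (and name) of the input vector. */
    static Vec retainingCenter(Vec input) {
        return input;
    }

    /** Reports whether a center still carries a name. */
    static String describe(Vec center) {
        if (center instanceof NamedVec) {
            return "named:" + ((NamedVec) center).name();
        }
        return "unnamed";
    }

    public static void main(String[] args) {
        Vec input = new NamedVec("doc-42", new PlainVec(new double[] {1.0, 2.0}));
        System.out.println(describe(convertingCenter(input))); // name dropped
        System.out.println(describe(retainingCenter(input)));  // name kept
    }
}
```

The values survive in both cases; only the wrapper type, and with it the name,
differs. That is why the problem shows up downstream, where clustering works
from the initial canopies rather than from the original input vectors.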
> AbstractCluster eliminates NamedVectors by replacing them with
> RandomAccessSparseVector always
> ----------------------------------------------------------------------------------------------
>
> Key: MAHOUT-552
> URL: https://issues.apache.org/jira/browse/MAHOUT-552
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.5
> Reporter: Pere Ferrera Bertran
> Assignee: Jeff Eastman
> Priority: Minor
> Fix For: 0.5
>
> Attachments: MAHOUT-552.patch
>
>
> When clustering using NamedVectors as input - after running seq2sparse with
> patch https://issues.apache.org/jira/browse/MAHOUT-401 - names are lost
> because AbstractCluster replaces vectors coming in the constructor with
> RandomAccessSparseVector.