[ https://issues.apache.org/jira/browse/MAHOUT-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935923#action_12935923 ]
Ted Dunning commented on MAHOUT-552:
------------------------------------
{quote}
May I propose instead that the Name feature be promoted to the Vector interface?
{quote}
We started there and decided it was a bad thing (tm).
The rationale was that we wanted to allow existing non-Mahout vector
implementations to implement a simpler interface that was purely numerically
oriented. It was also desirable to have a very simple semantic for matrix
multiplication and pairwise operations while still having really high
performance. This is hard to do coherently with labels other than a dense
collection of integers.
There was also some controversy over whether string labels were sufficient.
Ultimately, the solution was a named vector and a named matrix, each of which
wraps an ordinary vector or matrix. This avoids forcing a tax on all
implementations, but still gives the flexibility to use named objects.
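
To make the wrapper idea concrete, here is a minimal sketch of that shape. The names NumericVector, DenseNumericVector and NamedNumericVector are hypothetical stand-ins invented for illustration, not Mahout's actual classes or signatures. The point is only that the numeric interface never mentions labels, and the name lives in a thin wrapper that delegates everything else.

```java
import java.util.Arrays;

/** Hypothetical, purely numeric interface (a stand-in, not Mahout's Vector). */
interface NumericVector {
    int size();

    double get(int index);

    /** Pairwise operations only need dense integer indices, never labels. */
    default double dot(NumericVector other) {
        if (size() != other.size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < size(); i++) {
            sum += get(i) * other.get(i);
        }
        return sum;
    }
}

/** Plain implementation that knows nothing about names. */
final class DenseNumericVector implements NumericVector {
    private final double[] values;

    DenseNumericVector(double... values) {
        this.values = Arrays.copyOf(values, values.length);
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

/** Wrapper that adds a name and delegates all numeric work to the wrapped vector. */
final class NamedNumericVector implements NumericVector {
    private final String name;
    private final NumericVector delegate;

    NamedNumericVector(String name, NumericVector delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    String getName() {
        return name;
    }

    NumericVector getDelegate() {
        return delegate;
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public double get(int index) {
        return delegate.get(index);
    }
}
```

Code that only does arithmetic can accept any NumericVector and never pays for the name; only callers that care about labels need to know about the wrapper type.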
For reference, I was originally one of the ones pushing for labels all up and
down. My current position is the opposite.
> AbstractCluster eliminates NamedVectors by replacing them with
> RandomAccessSparseVector always
> ----------------------------------------------------------------------------------------------
>
> Key: MAHOUT-552
> URL: https://issues.apache.org/jira/browse/MAHOUT-552
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.5
> Reporter: Pere Ferrera Bertran
> Assignee: Jeff Eastman
> Fix For: 0.5
>
> Attachments: MAHOUT-552.patch
>
>
> When clustering using NamedVectors as input - after running seq2sparse with
> patch https://issues.apache.org/jira/browse/MAHOUT-401 - names are lost
> because AbstractCluster replaces the vectors passed to its constructor with
> RandomAccessSparseVector.
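
The failure mode described in the issue can be sketched as follows. This is not the MAHOUT-552 patch and does not use Mahout's classes; Vec, PlainVec, NamedVec and ClusterSketch are hypothetical types assumed for illustration. It shows why copying an incoming vector into a fixed concrete type drops the name, along with one possible way to keep it.

```java
import java.util.Arrays;

/** Hypothetical numeric interface, assumed for this sketch. */
interface Vec {
    int size();

    double get(int index);
}

/** Plain vector; stands in for a concrete type such as a sparse vector. */
final class PlainVec implements Vec {
    private final double[] values;

    PlainVec(double... values) {
        this.values = Arrays.copyOf(values, values.length);
    }

    /** Copy constructor: copies the numbers only, so wrapper metadata is lost. */
    PlainVec(Vec other) {
        this.values = new double[other.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = other.get(i);
        }
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

/** Wrapper that attaches a name to any Vec. */
final class NamedVec implements Vec {
    private final String name;
    private final Vec delegate;

    NamedVec(String name, Vec delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    String getName() {
        return name;
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public double get(int index) {
        return delegate.get(index);
    }
}

final class ClusterSketch {
    private ClusterSketch() {
    }

    /** Mirrors the reported behaviour: always copy into the plain type. */
    static Vec lossyCenter(Vec point) {
        return new PlainVec(point);
    }

    /** One possible remedy (an assumption): re-wrap the copy if the input was named. */
    static Vec namePreservingCenter(Vec point) {
        Vec copy = new PlainVec(point);
        if (point instanceof NamedVec) {
            return new NamedVec(((NamedVec) point).getName(), copy);
        }
        return copy;
    }
}
```

With an input such as new NamedVec("doc-7", new PlainVec(1.0, 2.0)), lossyCenter returns a bare PlainVec, while namePreservingCenter returns a NamedVec that still reports "doc-7".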
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.