I will file an issue and work on a patch for SSVD to propagate named vectors
(if present), which is trivial, plus USigma output. I will publish it on
my GitHub shortly.

On Wed, Sep 12, 2012 at 9:24 AM, Pat Ferrel <[email protected]> wrote:
> There appears to be a gap in the SSVD-->Clustering pipeline. It can be
> patched in a couple of ways, so could the devs please advise before we make
> a patch?
>
> The Issues:
>   * There is currently no output from clustering that maps input vectors to 
> clusters, unless you input NamedVectors to clustering.
>   * SSVD does not output NamedVectors even if they are input.
>
> Solutions:
>   1. We could modify clustering to output ID-Vector pairs in the file
> clusteredPoints/part-xxxx, where the IDs are the keys of the original input
> vectors and the Vector is the original input VectorWritable. This might be
> done by replacing the WeightedVectorWritable with a
> WeightedPropertyVectorWritable and putting the ID in its properties. This
> would require a change in the clustering classifier but no change to SSVD or
> the rest of clustering. It would affect anyone using clusteredPoints, since
> they would have to deal with a new output vector type. (Actually, wasn't this
> file using WeightedPropertyVectorWritable before the Mahout 0.7 refactoring?)
>   2. We could alter SSVD to output NamedVectors, and Clustering would simply
> pass them through without modification, as it does today. This would require
> a change to SSVD but not to Clustering. Since NamedVectors seem to be the only
> way to perform this mapping now, there would be very little impact on current
> users.
>
> AFAICT, one of these has to be done, and they are not mutually exclusive. Any
> advice?
>
