I will file and work on a patch for SSVD to propagate NamedVectors (if present); this is trivial. I'll add USigma output as well. Will publish it on my GitHub in a few.
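The following is a minimal sketch of why names can ride along with USigma output. It uses plain Java, not Mahout's API. For an exact SVD A = UΣVᵀ, multiplying by the first k right singular vectors gives AV_k = U_kΣ_k. So row i of USigma is just row i of the input times V_k. Each output row therefore maps one-to-one to an input row and can keep its key. `NamedRow` and `projectKeepingNames` are hypothetical stand-ins for a NamedVector and the SSVD output step, and V_k is assumed to be already computed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a named input row (e.g. a NamedVector keyed by document ID). */
final class NamedRow {
    final String name;
    final double[] values;

    NamedRow(String name, double[] values) {
        this.name = name;
        this.values = values;
    }
}

final class USigmaProjection {
    /**
     * Multiplies each input row a_i (length n) by V_k (n x k), which yields row i of
     * U_k * Sigma_k for an exact SVD, and keeps the row's original name on the result.
     */
    static List<NamedRow> projectKeepingNames(List<NamedRow> rows, double[][] vk) {
        if (vk.length == 0) {
            throw new IllegalArgumentException("V_k must have at least one row");
        }
        int n = vk.length;
        int k = vk[0].length;
        List<NamedRow> out = new ArrayList<>(rows.size());
        for (NamedRow row : rows) {
            if (row.values.length != n) {
                throw new IllegalArgumentException("row " + row.name + " has width "
                        + row.values.length + ", expected " + n);
            }
            double[] projected = new double[k];
            for (int j = 0; j < k; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += row.values[i] * vk[i][j];
                }
                projected[j] = sum;
            }
            // The name travels with the projected row, so downstream clustering
            // output can still be mapped back to the original input key.
            out.add(new NamedRow(row.name, projected));
        }
        return out;
    }
}
```

For example, with rows doc-a = (3, 0) and doc-b = (0, 2) and V_1 = (1, 0)ᵀ, the output rows keep the names doc-a and doc-b, with projections (3.0) and (0.0).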
On Wed, Sep 12, 2012 at 9:24 AM, Pat Ferrel wrote:
> There appears to be a gap in the SSVD --> clustering pipeline. It can be
> patched in a couple of ways, so can the devs please advise before we make a patch?
>
> The issues:
> * There is currently no output from clustering that maps input vectors to
> clusters, unless you input NamedVectors to clustering.
> * SSVD does not output NamedVectors even if they are input.
>
> Solutions:
> 1. We could modify clustering to write ID-Vector pairs to the file
> clusteredPoints/part-xxxx, where the IDs are the keys of the original
> input vectors and each Vector is the original input VectorWritable. This
> might be done by replacing WeightedVectorWritable with
> WeightedPropertyVectorWritable and putting the ID in its properties. This would
> require a change to the clustering classifier but no change to SSVD or the
> rest of clustering. It would affect anyone using clusteredPoints, since they
> would have to deal with a new output vector type. (Actually, wasn't this file
> using WeightedPropertyVectorWritable before the Mahout 0.7 refactoring?)
> 2. We could alter SSVD to output NamedVectors, and clustering would simply
> pass them through unmodified, as it does today. This would require a
> change to SSVD but not to clustering. Since NamedVectors seem to be the only
> way to perform this mapping now, there would be very little impact on current
> users.
>
> AFAICT one of these has to be done, and they are not mutually exclusive. Any
> advice?
>
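Option 1 amounts to carrying each input key alongside its classified point. The sketch below is in plain Java, not Mahout's classes. It assigns each keyed vector to its nearest centroid by squared Euclidean distance and stores the original key in a string property map. This loosely mirrors the idea of putting the ID in WeightedPropertyVectorWritable's properties. `ClusteredPoint`, `IdPreservingClassifier`, the `"id"` property name, and the fixed weight of 1.0 (hard assignment) are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical clustered-point record: assigned cluster, weight, vector, and string properties. */
final class ClusteredPoint {
    final int clusterId;
    final double weight;
    final double[] vector;
    final Map<String, String> properties;

    ClusteredPoint(int clusterId, double weight, double[] vector, Map<String, String> properties) {
        this.clusterId = clusterId;
        this.weight = weight;
        this.vector = vector;
        this.properties = properties;
    }
}

final class IdPreservingClassifier {
    /**
     * Assigns each keyed input vector to its nearest centroid (squared Euclidean
     * distance) and records the input key under the hypothetical "id" property,
     * so the output can be joined back to the original inputs.
     */
    static List<ClusteredPoint> classify(Map<String, double[]> input, double[][] centroids) {
        List<ClusteredPoint> out = new ArrayList<>(input.size());
        for (Map.Entry<String, double[]> e : input.entrySet()) {
            double[] x = e.getValue();
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0.0;
                for (int i = 0; i < x.length; i++) {
                    double diff = x[i] - centroids[c][i];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            Map<String, String> props = new HashMap<>();
            props.put("id", e.getKey()); // original input key survives clustering
            out.add(new ClusteredPoint(best, 1.0, x, props)); // weight 1.0: hard assignment
        }
        return out;
    }
}
```

A consumer of this output can read `properties.get("id")` for each point instead of requiring NamedVectors on the input side. The cost is the new output type that option 1 already anticipates.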
