+1 my distinct favorite too

On 9/12/12 12:55 PM, Pat Ferrel wrote:
This is my personally favored solution. I wish NamedVectors were used in 
RowSimilarity too, and may submit a patch for it. If you output NamedVectors 
then they would enable the RowSimilarity patch too.

If you want someone to do some ad hoc testing with real world data, I'm in. 
I'll follow your github.

On Sep 12, 2012, at 9:42 AM, Dmitriy Lyubimov <[email protected]> wrote:

I will file and work on a patch for SSVD to propagate named vectors
(if present). This is trivial. The patch will also add USigma output.
I will publish it in a few on my GitHub.

On Wed, Sep 12, 2012 at 9:24 AM, Pat Ferrel <[email protected]> wrote:
There appears to be a gap in the pipeline SSVD-->Clustering. It can be patched 
in a couple of ways, so can the devs please advise before we make a patch?

The Issues:
  * There is currently no output from clustering that maps input vectors to 
clusters, unless you input NamedVectors to clustering.
  * SSVD does not output NamedVectors even if they are input.

Solutions:
  1. We could modify clustering to write ID-Vector pairs to the file 
clusteredPoints/part-xxxx, where the IDs are the keys of the original input 
vectors and the Vector is the original input VectorWritable. This might be 
done by replacing the WeightedVectorWritable with a 
WeightedPropertyVectorWritable and putting the ID in its properties. This 
would require a change in the clustering classifier but no change to SSVD or 
the rest of clustering. It would impact anyone using clusteredPoints, since 
they would have to deal with a new output vector type. (Wasn't this file 
actually using WeightedPropertyVectorWritable before the Mahout 0.7 
refactoring?)
  2. We could alter SSVD to output NamedVectors, and Clustering would simply 
pass them through without modification, as it does today. This would require a 
change to SSVD but not to Clustering. Since NamedVectors seem to be the only 
way to perform this mapping now, there would be very little impact on current 
users.

Afaict one of these has to be done and they are not mutually exclusive. Any 
advice?
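
To make solution 2 concrete, here is a rough sketch in plain Java of what "propagate the name if present" could look like. It does not use Mahout: `Row` and `NamePropagationSketch` are hypothetical stand-ins invented for illustration, and `project` is only a toy row-times-matrix projection, not the actual SSVD computation. The only point is that the optional name of the input row is copied to the output row unchanged.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical stand-in for a row vector that may carry a name (not Mahout's NamedVector). */
final class Row {
    final Optional<String> name;
    final double[] values;

    Row(Optional<String> name, double[] values) {
        this.name = name;
        this.values = values;
    }
}

/** Hypothetical sketch: a transformation that keeps the input row's name, if any. */
final class NamePropagationSketch {

    /**
     * Multiplies the row by a (values.length x k) matrix and returns a row of
     * length k that carries the same name as the input.
     */
    static Row project(Row in, double[][] basis) {
        int k = basis[0].length;
        double[] out = new double[k];
        for (int i = 0; i < in.values.length; i++) {
            for (int j = 0; j < k; j++) {
                out[j] += in.values[i] * basis[i][j];
            }
        }
        // The name is passed through untouched; unnamed rows stay unnamed.
        return new Row(in.name, out);
    }

    public static void main(String[] args) {
        double[][] basis = {{1, 0}, {0, 1}, {1, 1}};
        Row named = new Row(Optional.of("doc-42"), new double[] {1, 2, 3});
        Row projected = project(named, basis);
        System.out.println(projected.name.orElse("<unnamed>")
                + " -> " + Arrays.toString(projected.values));
    }
}
```

With names preserved this way, a downstream step that already passes names through (as Clustering is described as doing above) can map each output vector back to its input without any new vector type.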



