I appreciate it and believe that this will help others too. I also agree that 
we should think this one through to see if it is the correct approach.

I need to figure out why the row ids/keys of the input matrix are not getting 
through clustering. Row ids/keys do get through rowsimilarity when it is applied 
to U*S, so why not clustering? In other words, with rowsimilarity I can map the 
results back to the original input rows (documents in my case).

As to the U*S option, I agree. I modified the code to take a new --uSigma 
option, but it is mutually exclusive with --uHalfSigma, which suggests that 
neither option should be a boolean. Both also imply calculation of U. I assume 
this is what you meant below.
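
One way to read the "neither should be a boolean" point is a single option that 
takes a value. What follows is only an illustration using Python's argparse, not 
the actual Mahout command line: the option name --uScale, its values, and the 
--computeU flag are hypothetical names made up for the example.

```python
import argparse


def build_parser():
    """Build a parser where the scaling of U is one valued option, not two booleans."""
    parser = argparse.ArgumentParser(description="hypothetical SVD driver options")
    # Hypothetical flag: explicitly request the plain U output.
    parser.add_argument("--computeU", action="store_true",
                        help="output U")
    # Hypothetical option replacing the two mutually exclusive booleans.
    parser.add_argument("--uScale",
                        choices=["none", "halfSigma", "sigma"],
                        default="none",
                        help="scale U on output: U, U*S^0.5, or U*S")
    return parser


def must_compute_u(args):
    """Any scaled output implies that U itself has to be calculated."""
    return args.computeU or args.uScale != "none"


if __name__ == "__main__":
    args = build_parser().parse_args(["--uScale", "sigma"])
    print(args.uScale, must_compute_u(args))
```

With a single valued option the parser rejects anything outside the listed 
choices, so the two scalings cannot be requested together by accident.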

Another thing to consider, and here my ignorance shows through… The U*S 
(equivalent to A*V) transform to V-space must be reversible so that humans can 
see results in terms of the original term-space. Weights based on the new 
basis are not really human-understandable. But setting me straight here may be 
another conversation.
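
To make the round trip concrete, here is a minimal sketch in plain Python. It 
assumes a hand-picked orthonormal V standing in for the right singular vectors 
that a real SVD run would produce; the matrices and helper names are made up for 
the example. Since the columns of V are orthonormal, (A*V)*V^T maps V-space rows 
back into term space. The result equals the original row only when that row lies 
in the span of V; otherwise it is the projection of the row onto that span.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# Hypothetical input: 4 documents (rows) by 3 terms (columns).
A = [[2.0, 0.0, 0.0],
     [0.0, 3.0, 4.0],
     [1.0, 0.6, 0.8],
     [0.0, 4.0, -3.0]]   # this row is orthogonal to both columns of V

# Assumed orthonormal basis (3 terms by k=2), standing in for V from the SVD.
V = [[1.0, 0.0],
     [0.0, 0.6],
     [0.0, 0.8]]

projected = matmul(A, V)                    # rows in V-space, the role of U*S
restored = matmul(projected, transpose(V))  # rows mapped back to term space

if __name__ == "__main__":
    for original, back in zip(A, restored):
        print(original, "->", [round(x, 6) for x in back])
```

The first three rows come back unchanged (up to floating-point rounding), while 
the last one collapses to zeros because nothing of it survives the projection.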

On Sep 7, 2012, at 4:52 PM, Dmitriy Lyubimov <[email protected]> wrote:

I can do a patch to propagate names of named vectors from A to U too
if that's a requirement for what you do. But we need to make sure it
solves your problem. I am still not sure what IDs are in your
definition and what is required for k-means.

Thinking of that, it's probably a worthy patch anyway. I'll write
something up along with API changes for A*Sigma outputs. I think since
there are so many output options, they should be redesigned not to be
mutually exclusive.

On Fri, Sep 7, 2012 at 4:37 PM, Pat Ferrel <[email protected]> wrote:
> Yes, I would love to use namedvectors. But no matter, doing a key-to-row 
> lookup is easy enough.
> 
> I'm not getting any id at all in the cluster data, not even a key for a row.
> 
> I'm beginning to think this is a clustering problem since rowsimilarity at 
> least gives me row keys to identify objects associated with an object.
> 
> On Sep 7, 2012, at 2:59 PM, Dmitriy Lyubimov <[email protected]> wrote:
> 
> yeah seq2sparse seems to have -nv option now to churn out named
> vectors too. It doesn't seem to be listed in the MIA book though.
> 
> On Fri, Sep 7, 2012 at 2:55 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> On Fri, Sep 7, 2012 at 2:27 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Sequence file keys, on the other hand, are
>>> what is populated by seq2sparse output, so they are useful for mapping
>>> results to original documents.
>> 
>> Although honestly I am not so sure about seq2sparse anymore. It has
>> been some time since I last looked at this.
> 
