On Mon, Sep 10, 2012 at 8:00 AM, Dmitriy Lyubimov <[email protected]> wrote:
> On Sep 10, 2012 7:26 AM, "Pat Ferrel" <[email protected]> wrote:
>> The problem with the options is that when you set --uHalfSigma OR
>> --vHalfSigma it sets the sVectors to sqrt and that will cause it to be
>> applied to U and V since UJob and VJob only check to see if the sVectors
>> exist and then they both apply them. In other words, either --uHalfSigma is
>> set OR --vHalfSigma will apply sqrt Sigma to BOTH U and V.
>
> I dont think this statement is correct. Ill check for it but i am pretty
> sure this is not how it works. Left and right singular vectors scaled to
> similar space individually on an explicit option to do so.
>

Yes i checked it and it is what i said.

>>
>> To do U*Sigma*V^t  the SSVD code would have to be changed or the U * Sigma
>> would have to be calculated outside SSVD (an ugly alternative).
>>
>> But please correct me where I'm wrong.
>>
>>
>> On Sep 8, 2012, at 10:52 AM, Pat Ferrel <[email protected]> wrote:
>>
>> I appreciate it and believe that this will help others too. I also agree
>> that we should think this one through to see if it is the correct approach.
>>
>> I need to figure out why the row ids/Keys of the input matrix are not
>> getting through clustering. Key/row ids are getting through rowsimilarity
>> when applied to U*S so why not clustering? In other words with rowsimilarity
>> I can map the results back to the original input rows (documents in my
>> case).
>>
>> As to the U*S option, I agree. I modified the code to take a new --uSigma
>> but it is mutually exclusive of --uHalfSigma and that indicates that neither
>> option should be a boolean. They both also imply calculation of U. I assume
>> this is what you meant below.
>>
>> Another thing to consider and here my ignorance shows through… The U*S
>> (equivalent to A*V) transform to V-space must be reversible so that humans
>> can see results in terms of of the original term-space. Weights base on the
>> new basis are not human understandable really. But setting me straight here
>> may be another conversation.
>>
>> On Sep 7, 2012, at 4:52 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> I can do a patch to propagate names of named vectors from A to U too
>> if that's a requirement for what you do. But we need to make sure it
>> solves your problem. i am still not sure what are IDs in your
>> definition and what is required for k-means.
>>
>> Thinking of that, it's probably a worthy patch anyway. I'll write
>> something up along with API changes for A*Sigma outputs. I think since
>> there are so many output options, they should be redesigned not to be
>> mutually exclusive.
>>
>> On Fri, Sep 7, 2012 at 4:37 PM, Pat Ferrel <[email protected]> wrote:
>> > Yes, I would love to use namedvectors. But no matter doing a key to row
>> > lookup is easy enough.
>> >
>> > I'm not getting any id at all in the cluster data, not even a key for a
>> > row.
>> >
>> > I'm beginning to think this is a clustering problem since rowsimilarity
>> > at least gives me row keys to identify objects associated with an object.
>> >
>> > On Sep 7, 2012, at 2:59 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> >
>> > yeah seq2sparse seems to have -nv option now to churn out named
>> > vectors too. It doesn't seem to be listed in the MIA book though.
>> >
>> > On Fri, Sep 7, 2012 at 2:55 PM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >> On Fri, Sep 7, 2012 at 2:27 PM, Dmitriy Lyubimov <[email protected]>
>> >> wrote:
>> >> Sequence file keys, on the other hand, is
>> >>> what populated by seq2sparse output, so they are useful for mapping
>> >>> results to original documents.
>> >>
>> >> Although honestly i am not so sure about seq2sparse anymore. There has
>> >> been some time since i looked at this for the last time.
>> >
>>
>>

Reply via email to