I don't understand this. What is AHat, and how is it different from the input
with the mean subtracted?

SSVD PCA doesn't form A-M, because that matrix is dense, and avoiding it is the
whole point of the PCA option: to avoid the big, ugly intermediate dense product
and an SVD operation on it. The space required for A-M will be several orders
of magnitude bigger than the already sufficiently large (sparse) input, and
running an SVD on it would take forever, because dense SVD flop counts grow
cubically with the matrix dimensions. Bottom line: SSVD PCA cannot produce A-M,
simply because it never computes it.
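To make the point concrete: if M = 1*mu^t (every row of M is the vector mu of column means of A), then (A - M)x = Ax - (mu.x)1, so a product with the centered matrix needs only the sparse nonzeros of A plus one dense dot product. Below is a minimal pure-Python sketch of that identity, assuming rows stored as {column: value} dicts; the helper names are hypothetical and are not part of the SSVD code or any Mahout API.

```python
from typing import Dict, List

SparseRow = Dict[int, float]  # column index -> nonzero value


def column_means(rows: List[SparseRow], n_cols: int) -> List[float]:
    """Column means of A, accumulated from the nonzeros only."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]


def centered_times_vector(rows: List[SparseRow], mu: List[float],
                          x: List[float]) -> List[float]:
    """Compute (A - M) x without ever forming the dense matrix A - M.

    Uses (A - 1 mu^t) x = A x - (mu . x) 1: each entry is the sparse row's
    dot product with x, minus one shared scalar.
    """
    mu_dot_x = sum(m * xi for m, xi in zip(mu, x))
    return [sum(v * x[j] for j, v in row.items()) - mu_dot_x for row in rows]


# Usage: a 3 x 3 sparse A, multiplied by x after implicit centering.
A = [{0: 1.0, 2: 3.0}, {1: 2.0}, {0: 4.0}]
mu = column_means(A, 3)
y = centered_times_vector(A, mu, [1.0, 1.0, 1.0])
print(y)
```

The same identity works for the transpose, (A - M)^t y = A^t y - mu(1^t y), so in principle both directions of multiplication can stay on the sparse input.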

If you need A-M, I guess you could form it yourself, but in any sound
architecture it would be much better to form rows of A-M on the fly instead of
computing and storing them.
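Forming rows of A-M on the fly can be sketched as a generator that densifies one row at a time, so peak memory is a single dense row rather than the whole centered matrix. This is a pure-Python illustration using the same hypothetical {column: value} row layout; a real job would stream rows from its input rather than from a list.

```python
from typing import Dict, Iterable, Iterator, List


def centered_rows(rows: Iterable[Dict[int, float]],
                  mu: List[float]) -> Iterator[List[float]]:
    """Yield dense rows of A - M one at a time; nothing else is stored."""
    for row in rows:
        dense = [-m for m in mu]  # the corresponding row of -M
        for j, v in row.items():
            dense[j] += v         # add back A's nonzeros
        yield dense


# Usage: each centered row can be consumed (e.g. written out) and discarded.
for centered in centered_rows([{0: 1.0, 2: 3.0}, {1: 2.0}], [0.5, 1.0, 1.5]):
    print(centered)
```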

Still, I haven't yet figured out what you are trying to accomplish.

On Sep 10, 2012 7:26 AM, "Pat Ferrel" <[email protected]> wrote:
>
> Another issue with the SSVD job options is that if you want to use SSVD
> but keep the input in the original dimension space (term space in many
> cases), you would do the following:
> * create input matrix A based on the input dimensions (terms)
> * calculate the full transform, which retains the output in "term space":
>   AHat = U*Sigma*V^t
> * at the end, AHat should be in term space but transformed by DR
>   (dimensionality reduction) and Sigma-weighted for PCA, right?
> * then AHat can be substituted for A wherever analysis or examination
>   needs the original dimension definitions (terms).
>
> The problem with the options is that when you set --uHalfSigma OR
> --vHalfSigma, it sets the sVectors to sqrt(Sigma), and that causes them to be
> applied to both U and V, since UJob and VJob only check whether the sVectors
> exist and then both apply them. In other words, setting either --uHalfSigma
> OR --vHalfSigma will apply sqrt(Sigma) to BOTH U and V.

I don't think this statement is correct. I'll check, but I am pretty sure
this is not how it works. The left and right singular vectors are scaled into
the similar space individually, each only on its own explicit option to do so.
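For reference, here is the arithmetic behind the half-sigma options, assuming each option scales only its own factor as described above: --uHalfSigma corresponds to U*Sigma^(1/2), --vHalfSigma to V*Sigma^(1/2), and (U*Sigma^(1/2))(V*Sigma^(1/2))^t = U*Sigma*V^t, i.e. AHat. The pure-Python sketch below models this with two independent, hypothetical flags (scale_u, scale_v); it is not the UJob/VJob code.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def scale_columns(m: Matrix, s: List[float]) -> Matrix:
    """Return m * diag(s): column k of m multiplied by s[k]."""
    return [[v * s[k] for k, v in enumerate(row)] for row in m]


def times_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Return a * b^t."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def half_sigma(u: Matrix, sigma: List[float], v: Matrix,
               scale_u: bool, scale_v: bool) -> Tuple[Matrix, Matrix]:
    """Apply sqrt(Sigma) to U and/or V, each controlled by its own flag."""
    root = [math.sqrt(s) for s in sigma]
    u_out = scale_columns(u, root) if scale_u else u
    v_out = scale_columns(v, root) if scale_v else v
    return u_out, v_out


# Usage: a rank-2 example where U and V have orthonormal columns.
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
V = [[0.6, 0.8], [0.8, -0.6]]
sigma = [4.0, 1.0]

a_hat = times_transpose(scale_columns(U, sigma), V)  # U * Sigma * V^t
u_h, v_h = half_sigma(U, sigma, V, scale_u=True, scale_v=True)
a_hat_from_halves = times_transpose(u_h, v_h)        # same matrix
```

Keeping the two flags independent is what lets a caller get U*Sigma^(1/2) alone without touching V.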

>
> To do U*Sigma*V^t, either the SSVD code would have to be changed or U*Sigma
> would have to be calculated outside SSVD (an ugly alternative).
>
> But please correct me where I'm wrong.
>
>
> On Sep 8, 2012, at 10:52 AM, Pat Ferrel <[email protected]> wrote:
>
> I appreciate it and believe that this will help others too. I also agree
> that we should think this one through to see if it is the correct approach.
>
> I need to figure out why the row IDs/keys of the input matrix are not
> getting through clustering. Keys/row IDs are getting through rowsimilarity
> when applied to U*S, so why not through clustering? In other words, with
> rowsimilarity I can map the results back to the original input rows
> (documents in my case).
>
> As to the U*S option, I agree. I modified the code to take a new --uSigma
> option, but it is mutually exclusive with --uHalfSigma, which indicates that
> neither option should be a boolean. Both also imply calculation of U.
> I assume this is what you meant below.
>
> Another thing to consider, and here my ignorance shows through: the U*S
> (equivalent to A*V) transform to V-space must be reversible so that humans
> can see results in terms of the original term space. Weights based on the
> new basis are not really human-understandable. But setting me straight here
> may be another conversation.
>
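On the reversibility question: with a rank-k SSVD, A*V_k = U_k*Sigma_k holds exactly, and multiplying U*Sigma back by V_k^t does map it into term space, but what comes back is AHat, the projection of A onto the top-k right singular vectors, not A itself (unless k is at least the rank of A). A short derivation, assuming the thin SVD A = U Sigma V^t with orthonormal columns in V; with the PCA option, read A as the centered A - M:

```latex
A = U \Sigma V^\top,\qquad V^\top V = I
\;\Rightarrow\; A V_k = U \Sigma V^\top V_k = U_k \Sigma_k
\\[4pt]
(U_k \Sigma_k)\, V_k^\top = A V_k V_k^\top = \hat{A}_k,
\qquad \hat{A}_k = A \text{ only if } k \ge \operatorname{rank}(A)
```

Each row of \hat{A}_k is a term-space vector, so its entries can be read as (smoothed) term weights for the corresponding document.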
> On Sep 7, 2012, at 4:52 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> I can do a patch to propagate the names of named vectors from A to U too
> if that's a requirement for what you do. But we need to make sure it
> solves your problem. I am still not sure what the IDs are in your
> definition and what is required for k-means.
>
> Come to think of it, it's probably a worthwhile patch anyway. I'll write
> something up along with API changes for A*Sigma outputs. I think that since
> there are so many output options, they should be redesigned not to be
> mutually exclusive.
>
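Why carrying names from A to U is cheap in principle: U_k = A*V_k*Sigma_k^(-1) row by row, so row i of U depends only on row i of A and can keep that row's key or name. The patch itself is not shown in this thread; the following is only a hypothetical pure-Python sketch of the idea, with string keys standing in for sequence-file keys or vector names, and V stored as one list of k coordinates per term.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def keyed_u_rows(a_rows: Iterable[Tuple[str, Dict[int, float]]],
                 v: List[List[float]],
                 sigma: List[float]) -> Iterator[Tuple[str, List[float]]]:
    """Yield (key, u_i) with u_i = a_i * V * Sigma^-1, reusing a_i's key.

    Assumes every singular value in sigma is nonzero.
    """
    k = len(sigma)
    for key, row in a_rows:
        proj = [0.0] * k
        for j, val in row.items():  # sparse row times V
            vj = v[j]
            for c in range(k):
                proj[c] += val * vj[c]
        yield key, [p / s for p, s in zip(proj, sigma)]


# Usage: the key of each input row survives into the matching row of U.
V = [[0.6, 0.8], [0.8, -0.6]]
rows = [("doc-1", {0: 2.4, 1: 3.2}), ("doc-2", {0: 0.8, 1: -0.6})]
for key, u_row in keyed_u_rows(rows, V, [4.0, 1.0]):
    print(key, u_row)
```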
> On Fri, Sep 7, 2012 at 4:37 PM, Pat Ferrel <[email protected]> wrote:
> > Yes, I would love to use named vectors. But no matter; doing a key-to-row
> > lookup is easy enough.
> >
> > I'm not getting any ID at all in the cluster data, not even a key for a
> > row.
> >
> > I'm beginning to think this is a clustering problem, since rowsimilarity
> > at least gives me row keys to identify objects associated with an object.
> >
> > On Sep 7, 2012, at 2:59 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > Yeah, seq2sparse seems to have a -nv option now to churn out named
> > vectors too. It doesn't seem to be listed in the Mahout in Action (MIA)
> > book, though.
> >
> > On Fri, Sep 7, 2012 at 2:55 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> On Fri, Sep 7, 2012 at 2:27 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>> Sequence file keys, on the other hand, are
> >>> what seq2sparse output populates, so they are useful for mapping
> >>> results to original documents.
> >>
> >> Although honestly, I am not so sure about seq2sparse anymore. It has
> >> been some time since I last looked at this.
> >
>
>
