Re: problem Interpreting SVD values

prasenjit mukherjee Sun, 18 Oct 2009 20:20:05 -0700

Thanks for the quick response.  Your results make much more sense. Maybe
I am doing something wrong with the lingpipe package, and that is why I
found the lingpipe output difficult to interpret:


And my U,V matrices are:
>> U:
        0        1
0   -0.69     0.65
1  0.0539   -0.205
2  -0.721    0.728


>> V:
        0        1
0   -0.74   -0.141
1    0.35   -0.613
2  -0.494    0.776
3   0.271   0.0207
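
One quick check on factors like these: in an exact SVD the columns of U are
orthonormal, so two distinct columns should have a cosine of about 0. A
minimal Python sketch, assuming the rounded U values above and the first two
columns of the document x term $u from the quoted R output below; the helper
names `column` and `cosine` are made up for illustration:

```python
import math

def column(matrix, j):
    """Return column j of a row-major list-of-lists matrix."""
    return [row[j] for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between two vectors (0 means orthogonal)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Rounded U values from the lingpipe run (rows are d0, d1, d2).
lingpipe_u = [[-0.69, 0.65],
              [0.0539, -0.205],
              [-0.721, 0.728]]

# First two columns of $u from the R svd of the document x term matrix.
r_u = [[0.7071068, 0.0],
       [0.7071068, 0.0],
       [0.0, 1.0]]

lingpipe_cos = cosine(column(lingpipe_u, 0), column(lingpipe_u, 1))
r_cos = cosine(column(r_u, 0), column(r_u, 1))

print("lingpipe U, cosine(col 0, col 1):", round(lingpipe_cos, 3))
print("R        U, cosine(col 0, col 1):", round(r_cos, 3))
```

With these numbers the two lingpipe columns come out close to antiparallel
rather than orthogonal, whereas the R columns are exactly orthogonal.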

-Prasen

On Sun, Oct 18, 2009 at 10:58 PM, Ted Dunning <[email protected]> wrote:
> I have not worked with lingpipe, but ...
>
> When I follow the steps you are taking using R, I get this:
>
>> docs=data.frame(d0=c(2,2,0,0), d1=c(2,2,0,0), d2=c(0,0,2,2),
> row.names=c("t0","t1","t2","t3"))
>> docs
>   d0 d1 d2
> t0  2  2  0
> t1  2  2  0
> t2  0  0  2
> t3  0  0  2
>> svd(docs)
> $d
> [1] 4.000000 2.828427 0.000000
>
> $u
>           [,1]       [,2]       [,3]
> [1,] -0.7071068  0.0000000 -0.7071068
> [2,] -0.7071068  0.0000000  0.7071068
> [3,]  0.0000000 -0.7071068  0.0000000
> [4,]  0.0000000 -0.7071068  0.0000000
>
> $v
>           [,1] [,2]       [,3]
> [1,] -0.7071068    0 -0.7071068
> [2,] -0.7071068    0  0.7071068
> [3,]  0.0000000   -1  0.0000000
>
> Note how my document matrix differs substantially from yours, but that is
> simply because we are using different representations.  You have lines that
> have triples containing document number, term number and count, I have the
> resulting matrix.
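
The two representations are easy to convert between. A minimal Python sketch,
assuming each triple is (document number, term number, count) as described
above; the function name `to_dense` is hypothetical:

```python
# The six input triples from the original message: (doc, term, count).
triples = [(0, 0, 2), (0, 1, 2),
           (1, 0, 2), (1, 1, 2),
           (2, 2, 2), (2, 3, 2)]

def to_dense(triples, n_rows, n_cols):
    """Build a dense row-major matrix; cells without a triple stay 0."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        matrix[row][col] = value
    return matrix

# 3 documents x 4 terms.
doc_by_term = to_dense(triples, 3, 4)

# Transposing gives the 4 terms x 3 documents layout.
term_by_doc = [list(col) for col in zip(*doc_by_term)]

for row in term_by_doc:
    print(row)
```

The printed rows match the term x document `docs` data frame in the R session.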
>
> As far as my results are concerned, the diagonal component of the svd
> (labeled $d above) clearly shows that there are only 2 non-zero singular
> values.  This means that the first two columns of u and v are the only ones
> necessary for reconstructing my docs matrix.  The third vector in each
> corresponds to the zero singular value, that is, to the null space of the
> document matrix.
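
The reconstruction claim can be checked by hand. A minimal Python sketch,
assuming the values printed by R above, truncated to the first two singular
values and the first two columns of $u and $v; the names `u2`, `d2`, `v2` and
`reconstruct` are illustrative:

```python
# First two singular values and the matching columns of $u and $v
# from the term x document svd above.
d2 = [4.000000, 2.828427]
u2 = [[-0.7071068, 0.0],
      [-0.7071068, 0.0],
      [0.0, -0.7071068],
      [0.0, -0.7071068]]
v2 = [[-0.7071068, 0.0],
      [-0.7071068, 0.0],
      [0.0, -1.0]]

# The original term x document matrix.
docs = [[2, 2, 0],
        [2, 2, 0],
        [0, 0, 2],
        [0, 0, 2]]

def reconstruct(u, d, v):
    """Compute u * diag(d) * t(v) entry by entry."""
    return [[sum(u[i][k] * d[k] * v[j][k] for k in range(len(d)))
             for j in range(len(v))]
            for i in range(len(u))]

recon = reconstruct(u2, d2, v2)

# Largest absolute difference from the original matrix.
max_error = max(abs(recon[i][j] - docs[i][j])
                for i in range(len(docs))
                for j in range(len(docs[0])))
print("max reconstruction error:", max_error)
```

The error is only at the level of the rounding in the printed values, so the
third column of each factor contributes nothing to the product.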
>
> Moreover, if you look at the first two columns of my u matrix, you see a
> representation that shows that documents tend to contain t0 and t1 in equal
> number or they contain t2 and t3 in equal number, but they don't tend to
> contain any other pattern.  Singular vectors are not normally so easy to
> interpret.
>
> For reference, I normally prefer document x term matrices.  Here is that
> form of the computation:
>
>> docs=data.frame(t0=c(2,2,0), t1=c(2,2,0), t2=c(0,0,2), t3=c(0,0,2),
> row.names=c("d0","d1","d2"))
>> docs
>   t0 t1 t2 t3
> d0  2  2  0  0
> d1  2  2  0  0
> d2  0  0  2  2
>> svd(docs)
> $d
> [1] 4.000000 2.828427 0.000000
>
> $u
>          [,1] [,2]       [,3]
> [1,] 0.7071068    0 -0.7071068
> [2,] 0.7071068    0  0.7071068
> [3,] 0.0000000    1  0.0000000
>
> $v
>          [,1]      [,2]       [,3]
> [1,] 0.7071068 0.0000000 -0.7071068
> [2,] 0.7071068 0.0000000  0.7071068
> [3,] 0.0000000 0.7071068  0.0000000
> [4,] 0.0000000 0.7071068  0.0000000
>
> The results are the same, of course, with u and v trading places and the
> signs flipped.
>
> On Sun, Oct 18, 2009 at 12:46 AM, prasenjit mukherjee
> <[email protected]> wrote:
>
>> Apologies, as I know the question is really for lingpipe, but I was
>> hoping to get some response from mahout users as well (who have
>> probably worked with lingpipe).
>>
>>
>> ---------- Forwarded message ----------
>> From: prasenjit mukherjee <[email protected]>
>> Date: Sun, Oct 18, 2009 at 12:39 PM
>> Subject: problem Interpreting SVD values
>> To: lingpipe <[email protected]>
>>
>>
>> I am trying to evaluate partialSvd() on a small matrix, and these are
>> my findings. Below is my input matrix, assuming 4 terms and 3
>> docs.
>>
>> doc0 => (2,t0) (2,t1)
>> doc1 => (2,t0) (2,t1)
>> doc2 => (2,t2) (2,t3)
>>
>> As one can see, docs d0 and d1 are identical, each containing 4 term
>> occurrences: 2 each of t0 and t1. The third doc is different, containing
>> 4 term occurrences: 2 each of t2 and t3. Below is their matrix
>> representation (in TXD form):
>>
>> 0,0,2
>> 0,1,2
>> 1,0,2
>> 1,1,2
>> 2,2,2
>> 2,3,2
>>
>> I ran with maxOrder = 2 and the following input params:
>>        double featureInit = 0.01;
>>        double initialLearningRate = 0.005;
>>        int annealingRate = 1000;
>>        double regularization = 0.00;
>>        double minImprovement = 0.0001;
>>        int minEpochs = 2;
>>        int maxEpochs = 100;//50000;
>> and was expecting to get d0 and d1 in one cluster and d2 in another.
>> Contrary to my expectation, I am getting the following output (see the
>> U,V values):
>>
>>     [java]       :00 Start
>>     [java]       :00   Factor=0
>>     [java]       :00     epoch=0 rmse=1.9999848100360043
>>     [java]       :00     epoch=1 rmse=1.9999835637692873
>>     [java]       :00     epoch=2 rmse=1.999982296871324
>>     [java]       :00 Converged in epoch=2 rmse=1.999982296871324
>> relDiff=3.167271940722782E-7
>>     [java]       :00 Order=0 RMSE=1.9999835637692873
>>     [java]       :00   Factor=1
>>     [java]       :00     epoch=0 rmse=1.9999522133829444
>>     [java]       :00     epoch=1 rmse=1.9999506819096369
>>     [java]       :00     epoch=2 rmse=1.99994912138043
>>     [java]       :00 Converged in epoch=2 rmse=1.99994912138043
>> relDiff=3.901420744799641E-7
>>     [java]       :00 Order=1 RMSE=1.9999506819096369
>>     [java] SVD Computation Done. Singular Values:
>>     [java]     2.796903874825226E-4  2.536844759290206E-4
>>     [java] Output U_Matrix: ./rundir/U_out.matrix
>>     [java] Output V_Matrix: ./rundir/V_out.matrix
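
For scale: every observed count in the input is 2, so a model that predicts 0
everywhere has an RMSE of exactly 2 on those six entries. The sketch below is
plain arithmetic in Python, not lingpipe code; it assumes the RMSE is taken
over the six listed entries only, and the helper name `rmse` is illustrative:

```python
import math

# The six observed counts from the input triples; all of them are 2.
observed = [2.0] * 6

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Baseline: a model that predicts 0 for every observed entry.
baseline = rmse(observed, [0.0] * 6)
print("RMSE of an all-zero prediction:", baseline)
```

The logged values of about 1.99998 sit right at that baseline, which suggests
(though the log alone does not prove it) that the factors had barely moved
when training stopped after minEpochs = 2.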
>>
>>
>> And my U,V matrices are:
>> U:
>> 0,0,-0.690807182791581
>> 0,1,0.6535363126818338
>> 1,0,0.053924014251416
>> 1,1,-0.2055548955329534
>> 2,0,-0.7210254065499858
>> 2,1,0.7284486755624372
>>
>> Shouldn't the coefficients for rows 0 and 1 be the same in U, since
>> they refer to d0 and d1?
>>
>> V:
>> 0,0,-0.7473523845369358
>> 0,1,-0.14168050325102471
>> 1,0,0.35114591804331297
>> 1,1,0.6137947267695599
>> 2,0,-0.4945242525093567
>> 2,1,0.776371576839163
>> 3,0,0.27130558646761577
>> 3,1,0.02073265696164584
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
