Thanks for the quick response. Your results make much more sense. Maybe
I am doing something wrong with the lingpipe package, and that is why I
found it difficult to interpret the lingpipe output.
My U and V matrices are:
>> U:
         0        1
0   -0.69     0.65
1    0.0539  -0.205
2   -0.721    0.728
>> V:
         0        1
0   -0.74    -0.141
1    0.35    -0.613
2   -0.494    0.776
3    0.271    0.0207
-Prasen
On Sun, Oct 18, 2009 at 10:58 PM, Ted Dunning <[email protected]> wrote:
> I have not worked with lingpipe, but ...
>
> When I follow the steps you are taking using R, I get this:
>
> > docs=data.frame(d0=c(2,2,0,0), d1=c(2,2,0,0), d2=c(0,0,2,2),
>       row.names=c("t0","t1","t2","t3"))
> > docs
>    d0 d1 d2
> t0  2  2  0
> t1  2  2  0
> t2  0  0  2
> t3  0  0  2
> > svd(docs)
> $d
> [1] 4.000000 2.828427 0.000000
>
> $u
>            [,1]       [,2]       [,3]
> [1,] -0.7071068  0.0000000 -0.7071068
> [2,] -0.7071068  0.0000000  0.7071068
> [3,]  0.0000000 -0.7071068  0.0000000
> [4,]  0.0000000 -0.7071068  0.0000000
>
> $v
>            [,1] [,2]       [,3]
> [1,] -0.7071068    0 -0.7071068
> [2,] -0.7071068    0  0.7071068
> [3,]  0.0000000   -1  0.0000000
>
> Note how my document matrix differs substantially from yours, but that is
> simply because we are using different representations. You have lines that
> have triples containing document number, term number and count, I have the
> resulting matrix.
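
A minimal sketch of going from the triple form to the matrix form, assuming each triple is (document number, term number, count) with 0-based ids as described above. The names `triples` and `m` are illustrative only and do not come from either package.

```r
# Triples from the original post: (document id, term id, count), 0-based ids.
triples <- data.frame(doc   = c(0, 0, 1, 1, 2, 2),
                      term  = c(0, 1, 0, 1, 2, 3),
                      count = c(2, 2, 2, 2, 2, 2))

# Empty term x document matrix (4 terms, 3 documents).
m <- matrix(0, nrow = 4, ncol = 3,
            dimnames = list(paste0("t", 0:3), paste0("d", 0:2)))

# Fill the cells named by the triples; R indexes from 1, hence the +1.
m[cbind(triples$term + 1, triples$doc + 1)] <- triples$count
m
```

Printing `m` should show the same term x document layout as the `docs` data frame in the R session.
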
>
> As far as my results are concerned, the diagonal component of the svd
> (labeled $d above) clearly shows that there are only 2 singular values.
> This means that the first two columns of u and v are the only ones necessary
> for reconstructing my docs matrix. The third vector in each represents the
> null space of the document matrix.
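
The reconstruction claim can be checked directly. This is only a sketch, assuming the same term x document matrix as in the R session; the `1e-10` cutoff for "non-zero" and the names `k` and `rank_k` are my own choices.

```r
# Term x document matrix from the session above (rows t0..t3, columns d0..d2).
docs <- matrix(c(2, 2, 0, 0,
                 2, 2, 0, 0,
                 0, 0, 2, 2),
               nrow = 4,
               dimnames = list(c("t0", "t1", "t2", "t3"), c("d0", "d1", "d2")))

s <- svd(docs)

# Number of singular values that are not numerically zero.
k <- sum(s$d > 1e-10)

# Product of the first k columns of u, the first k singular values, and
# the first k columns of v (transposed).
rank_k <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k, k) %*%
  t(s$v[, 1:k, drop = FALSE])

# TRUE if the truncated product reproduces the original matrix.
isTRUE(all.equal(rank_k, unname(docs)))
```
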
>
> Moreover, if you look at the first two columns of my u vector, you see a
> representation that show that documents tend to contain t0 and t1 in equal
> number or they contain t2 and t3 in equal number but they don't tend to
> contain any other pattern. Singular vectors are not normally so easy to
> interpret.
>
> For reference, I normally prefer document x term matrices. Here is that
> form of the computation:
>
> > docs=data.frame(t0=c(2,2,0), t1=c(2,2,0), t2=c(0,0,2), t3=c(0,0,2),
>       row.names=c("d0","d1","d2"))
> > docs
>    t0 t1 t2 t3
> d0  2  2  0  0
> d1  2  2  0  0
> d2  0  0  2  2
> > svd(docs)
> $d
> [1] 4.000000 2.828427 0.000000
>
> $u
>           [,1] [,2]       [,3]
> [1,] 0.7071068    0 -0.7071068
> [2,] 0.7071068    0  0.7071068
> [3,] 0.0000000    1  0.0000000
>
> $v
>           [,1]      [,2]       [,3]
> [1,] 0.7071068 0.0000000 -0.7071068
> [2,] 0.7071068 0.0000000  0.7071068
> [3,] 0.0000000 0.7071068  0.0000000
> [4,] 0.0000000 0.7071068  0.0000000
>
> The results are the same, of course with some names changed.
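
One way to see that the two forms agree is to compare the SVD of a matrix with the SVD of its transpose. This is a sketch under my own assumptions: only the two components with non-zero singular values are compared, and signs are ignored because singular vectors are only defined up to sign.

```r
# Term x document matrix, and its transpose (document x term).
term_doc <- matrix(c(2, 2, 0, 0,
                     2, 2, 0, 0,
                     0, 0, 2, 2), nrow = 4)
doc_term <- t(term_doc)

s_td <- svd(term_doc)
s_dt <- svd(doc_term)

# The singular values are the same in both forms.
same_d <- isTRUE(all.equal(s_td$d, s_dt$d))

# u of one form matches v of the other (first two columns, up to sign).
same_uv <- isTRUE(all.equal(abs(s_td$u[, 1:2]), abs(s_dt$v[, 1:2])))
same_vu <- isTRUE(all.equal(abs(s_td$v[, 1:2]), abs(s_dt$u[, 1:2])))

c(same_d, same_uv, same_vu)
```
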
>
> On Sun, Oct 18, 2009 at 12:46 AM, prasenjit mukherjee
> <[email protected]> wrote:
>
>> Apologies, as I know the question is actually for lingpipe, but was
>> hoping if I could get some response from mahout users as well ( who
>> has probably worked with lingpipe )
>>
>>
>> ---------- Forwarded message ----------
>> From: prasenjit mukherjee <[email protected]>
>> Date: Sun, Oct 18, 2009 at 12:39 PM
>> Subject: problem Interpreting SVD values
>> To: lingpipe <[email protected]>
>>
>>
>> I am trying to evaluate partialSvd() on a smaller matrix and this is
>> what my findings are. Below is my input matrix, assuming 4 terms and 3
>> docs.
>>
>> doc0 => (2,t0) (2,t1)
>> doc1 => (2,t0) (2,t1)
>> doc2 => (2,t2) (2,t3)
>>
>> As one can see docs d0,d1 are exactly same containing 4 terms with 2
>> from t0,t1 each. 3rd doc is different containing 4 terms with 2 from
>> t2,t3 each. Below is their matrix representation ( in TXD form ) :
>>
>> 0,0,2
>> 0,1,2
>> 1,0,2
>> 1,1,2
>> 2,2,2
>> 2,3,2
>>
>> I ran with maxOrder =2 and following input params :
>> double featureInit = 0.01;
>> double initialLearningRate = 0.005;
>> int annealingRate = 1000;
>> double regularization = 0.00;
>> double minImprovement = 0.0001;
>> int minEpochs = 2;
>> int maxEpochs = 100;//50000;
>> and was expecting to get d0,d1 in 1 cluster and d2 in another.
>> Contrary to my expectation I am getting the following output ( See U,V
>> values) :
>>
>> [java] :00 Start
>> [java] :00 Factor=0
>> [java] :00 epoch=0 rmse=1.9999848100360043
>> [java] :00 epoch=1 rmse=1.9999835637692873
>> [java] :00 epoch=2 rmse=1.999982296871324
>> [java] :00 Converged in epoch=2 rmse=1.999982296871324
>> relDiff=3.167271940722782E-7
>> [java] :00 Order=0 RMSE=1.9999835637692873
>> [java] :00 Factor=1
>> [java] :00 epoch=0 rmse=1.9999522133829444
>> [java] :00 epoch=1 rmse=1.9999506819096369
>> [java] :00 epoch=2 rmse=1.99994912138043
>> [java] :00 Converged in epoch=2 rmse=1.99994912138043
>> relDiff=3.901420744799641E-7
>> [java] :00 Order=1 RMSE=1.9999506819096369
>> [java] SVD Computation Done. Singular Values:
>> [java] 2.796903874825226E-4 2.536844759290206E-4
>> [java] Output U_Matrix: ./rundir/U_out.matrix
>> [java] Output V_Matrix: ./rundir/V_out.matrix
>>
>>
>> And my U,V matrices are :
>> U:
>> 0,0,-0.690807182791581
>> 0,1,0.6535363126818338
>> 1,0,0.053924014251416
>> 1,1,-0.2055548955329534
>> 2,0,-0.7210254065499858
>> 2,1,0.7284486755624372
>>
>> Shouldn't the coeffs of 0 and 1s be the same in U, because they refer
>> to d0 and d1 ?
>>
>> V:
>> 0,0,-0.7473523845369358
>> 0,1,-0.14168050325102471
>> 1,0,0.35114591804331297
>> 1,1,0.6137947267695599
>> 2,0,-0.4945242525093567
>> 2,1,0.776371576839163
>> 3,0,0.27130558646761577
>> 3,1,0.02073265696164584
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>