Apologies, as I know the question is actually for lingpipe, but I was hoping to get some response from mahout users as well (who have probably worked with lingpipe).

---------- Forwarded message ----------
From: prasenjit mukherjee <[email protected]>
Date: Sun, Oct 18, 2009 at 12:39 PM
Subject: problem Interpreting SVD values
To: lingpipe <[email protected]>

I am trying to evaluate partialSvd() on a smaller matrix and this is what my findings are.

Below is my input matrix, assuming 4 terms and 3 docs.

  doc0 => (2,t0) (2,t1)
  doc1 => (2,t0) (2,t1)
  doc2 => (2,t2) (2,t3)

As one can see docs d0,d1 are exactly same containing 4 terms with 2 from t0,t1 each. 3rd doc is different containing 4 terms with 2 from t2,t3 each. Below is their matrix representation ( in TXD form ) :

  0,0,2
  0,1,2
  1,0,2
  1,1,2
  2,2,2
  2,3,2

I ran with maxOrder =2 and following input params :

  double featureInit = 0.01;
  double initialLearningRate = 0.005;
  int annealingRate = 1000;
  double regularization = 0.00;
  double minImprovement = 0.0001;
  int minEpochs = 2;
  int maxEpochs = 100;//50000;

and was expecting to get d0,d1 in 1 cluster and d2 in another. Contrary to my expectation I am getting the following output ( See U,V values) :

  [java] :00 Start
  [java] :00 Factor=0
  [java] :00 epoch=0 rmse=1.9999848100360043
  [java] :00 epoch=1 rmse=1.9999835637692873
  [java] :00 epoch=2 rmse=1.999982296871324
  [java] :00 Converged in epoch=2 rmse=1.999982296871324 relDiff=3.167271940722782E-7
  [java] :00 Order=0 RMSE=1.9999835637692873
  [java] :00 Factor=1
  [java] :00 epoch=0 rmse=1.9999522133829444
  [java] :00 epoch=1 rmse=1.9999506819096369
  [java] :00 epoch=2 rmse=1.99994912138043
  [java] :00 Converged in epoch=2 rmse=1.99994912138043 relDiff=3.901420744799641E-7
  [java] :00 Order=1 RMSE=1.9999506819096369
  [java] SVD Computation Done.
  Singular Values:
  [java] 2.796903874825226E-4 2.536844759290206E-4
  [java] Output U_Matrix: ./rundir/U_out.matrix
  [java] Output V_Matrix: ./rundir/V_out.matrix

And my U,V matrices are :

U:
  0,0,-0.690807182791581
  0,1,0.6535363126818338
  1,0,0.053924014251416
  1,1,-0.2055548955329534
  2,0,-0.7210254065499858
  2,1,0.7284486755624372

Shouldn't the coeffs of 0 and 1s be the same in U, because they refer to d0 and d1 ?

V:
  0,0,-0.7473523845369358
  0,1,-0.14168050325102471
  1,0,0.35114591804331297
  1,1,0.6137947267695599
  2,0,-0.4945242525093567
  2,1,0.776371576839163
  3,0,0.27130558646761577
  3,1,0.02073265696164584
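
For comparison, the exact factorization of this small matrix can be worked out without lingpipe. The sketch below is plain Java and makes no lingpipe calls. It reads each input triple as doc,term,count (an assumption, based on U having 3 rows and V having 4). The class ExactSvdCheck and its methods are hypothetical helpers written for this example, and it uses power iteration with deflation, which is not the training procedure that the log above reports. It only shows what U, V and the singular values would look like for an exact rank-2 SVD.

```java
import java.util.Arrays;

public class ExactSvdCheck {

    /** One singular triplet: singular value, left vector (docs), right vector (terms). */
    public static final class Triplet {
        public final double sigma;
        public final double[] u;
        public final double[] v;

        Triplet(double sigma, double[] u, double[] v) {
            this.sigma = sigma;
            this.u = u;
            this.v = v;
        }
    }

    static double norm(double[] x) {
        double sum = 0.0;
        for (double xi : x) sum += xi * xi;
        return Math.sqrt(sum);
    }

    /** Power iteration for the largest singular triplet of a. */
    static Triplet topTriplet(double[][] a, int iterations) {
        int rows = a.length, cols = a[0].length;
        double[] u = new double[rows];
        double[] v = new double[cols];
        for (int i = 0; i < rows; i++) u[i] = 1.0 / (i + 1); // fixed, non-uniform start
        double sigma = 0.0;
        for (int it = 0; it < iterations; it++) {
            // v = A^T u, scaled to unit length
            for (int j = 0; j < cols; j++) {
                v[j] = 0.0;
                for (int i = 0; i < rows; i++) v[j] += a[i][j] * u[i];
            }
            double nv = norm(v);
            if (nv == 0.0) break; // nothing left to extract
            for (int j = 0; j < cols; j++) v[j] /= nv;
            // u = A v; its length is the current singular value estimate
            for (int i = 0; i < rows; i++) {
                u[i] = 0.0;
                for (int j = 0; j < cols; j++) u[i] += a[i][j] * v[j];
            }
            sigma = norm(u);
            if (sigma == 0.0) break;
            for (int i = 0; i < rows; i++) u[i] /= sigma;
        }
        return new Triplet(sigma, u, v);
    }

    /** Top-k triplets by repeated power iteration with deflation. */
    public static Triplet[] svd(double[][] input, int k, int iterations) {
        double[][] a = new double[input.length][];
        for (int i = 0; i < input.length; i++) a[i] = input[i].clone();
        Triplet[] result = new Triplet[k];
        for (int f = 0; f < k; f++) {
            Triplet t = topTriplet(a, iterations);
            result[f] = t;
            // remove the factor just found: A <- A - sigma * u * v^T
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < a[i].length; j++)
                    a[i][j] -= t.sigma * t.u[i] * t.v[j];
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] docsByTerms = {
            {2, 2, 0, 0}, // doc0
            {2, 2, 0, 0}, // doc1
            {0, 0, 2, 2}  // doc2
        };
        for (Triplet t : svd(docsByTerms, 2, 200)) {
            System.out.println("sigma=" + t.sigma
                + " u=" + Arrays.toString(t.u)
                + " v=" + Arrays.toString(t.v));
        }
    }
}
```

For this input the exact singular values are 4 and 2*sqrt(2) (about 2.83). The first left singular vector gives doc0 and doc1 identical coefficients and doc2 a coefficient of zero, and the second puts all its weight on doc2. The singular values reported in the log above are on the order of 1e-4, with rmse still close to 2 when each factor stops at epoch=2.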
