Hi, Deb.

Thanks for your suggestion to use ALS for PLSA training. I discussed it with our
engineers, and it seems EM is the better choice. Our reasoning is as follows:

1. We have doubts about whether ALS is applicable to the problem at all. By
definition, PLSA is a matrix decomposition with respect to Kullback–Leibler
divergence. Unlike decomposition with respect to the Frobenius distance, this
problem lacks convexity; in fact, PLSA has multiple solutions, and the EM
algorithm has been shown experimentally to find "good" local maxima. Note also
that Kullback–Leibler divergence is not symmetric, does not satisfy the
triangle inequality, and the objects it is defined on do not form a linear
space. Wouldn't that be a problem for ALS? For instance, doesn't ALS rely on
the fact that L_2 is defined on objects that form a self-dual space?
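
Just to spell out the objective we have in mind (standard PLSA in the usual
notation: n_{dw} is the count of word w in document d, \phi_{wt} = p(w|t),
\theta_{td} = p(t|d)):

    \sum_{d} \sum_{w} n_{dw} \ln \sum_{t} \phi_{wt} \theta_{td}  ->  max over \Phi, \Theta
    subject to  \phi_{wt} >= 0,  \sum_{w} \phi_{wt} = 1,
                \theta_{td} >= 0, \sum_{t} \theta_{td} = 1.

Maximizing this log-likelihood is equivalent to minimizing a weighted KL
divergence between the empirical distributions \hat{p}(w|d) and the low-rank
model \sum_t \phi_{wt} \theta_{td}, which is what we mean above by
"decomposition with respect to KL divergence".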

2. Using the EM algorithm is the standard way to train PLSA and is what most
well-known papers describe. The "Contributing to Spark" page says that "a new
algorithm" should be "widely known" and "used and accepted". We have found no
publications describing PLSA training via ALS, so we are not sure it would
give results of quality comparable to the EM algorithm. Settling that would be
a research question in its own right.

3. The PLSA objective function is non-differentiable at some points (e.g.
\phi_{wt} = 0 \forall t), and the EM algorithm is theoretically proven to
avoid such points. We are afraid ALS is likely to fall into this trap.
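
For reference, the EM updates we rely on (standard PLSA EM, same notation as
above):

    E-step:  p(t|d,w) = \phi_{wt} \theta_{td} / \sum_{s} \phi_{ws} \theta_{sd}
    M-step:  \phi_{wt}  = n_{wt} / \sum_{w'} n_{w't},  n_{wt} = \sum_{d} n_{dw} p(t|d,w)
             \theta_{td} = n_{td} / \sum_{t'} n_{t'd},  n_{td} = \sum_{w} n_{dw} p(t|d,w)

The point about avoiding the bad set is simply that the log-likelihood is
minus infinity wherever \sum_t \phi_{wt} \theta_{td} = 0 for some observed
(d, w); since every EM iteration never decreases the likelihood, EM cannot
move from a finite starting point into such a region.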

4. We also believe EM is faster for this problem. PLSA has D*T + W*T
parameters (where D stands for the number of documents, T for the number of
topics, and W for the size of the vocabulary). Thus the Hessian matrix has
size (D*T + W*T)^2, which is a lot. Note that the EM algorithm has O(D*T*W)
complexity per iteration. It is probably possible to use a diagonalized
Hessian, but that would increase the number of iterations needed for
convergence, and we think the EM algorithm would still win, because we have a
very simple analytical solution for the E-step. (There is a story from the
80's about EM algorithms and second-order methods: second-order methods were
believed to outperform EM until it was shown that precise optimization during
the E-step is not necessary; it is enough to merely increase the objective
function. This observation made the E-step much cheaper, and EM became
superior to second-order methods. Once again, our analytical solution for the
E-step is very fast.)
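
To make the per-iteration cost concrete, here is a minimal single-machine
sketch of one EM iteration in plain Scala (no Spark; all names are
illustrative and not taken from our patch). The loop over (d, w) cells with an
inner loop over T is the O(D*T*W) work per iteration mentioned above (less if
the counts are sparse), and the E-step for each cell is the closed-form ratio
from point 3:

object PlsaEmSketch {
  // n(d)(w): word counts; phi(w)(t) = p(w|t); theta(d)(t) = p(t|d)
  def emIteration(
      n: Array[Array[Double]],
      phi: Array[Array[Double]],
      theta: Array[Array[Double]]): (Array[Array[Double]], Array[Array[Double]]) = {
    val D = n.length
    val W = phi.length
    val T = phi(0).length
    val nwt = Array.fill(W, T)(0.0)  // expected counts for phi
    val ntd = Array.fill(D, T)(0.0)  // expected counts for theta
    for (d <- 0 until D; w <- 0 until W if n(d)(w) > 0) {
      // E-step for one (d, w) cell: p(t|d,w) is proportional to phi(w)(t) * theta(d)(t)
      val unnorm = Array.tabulate(T)(t => phi(w)(t) * theta(d)(t))
      val z = unnorm.sum
      if (z > 0) {
        var t = 0
        while (t < T) {
          val c = n(d)(w) * unnorm(t) / z
          nwt(w)(t) += c
          ntd(d)(t) += c
          t += 1
        }
      }
    }
    // M-step: renormalize columns of phi and rows of theta
    val nt = Array.tabulate(T)(t => (0 until W).map(w => nwt(w)(t)).sum)
    val newPhi = Array.tabulate(W, T)((w, t) => if (nt(t) > 0) nwt(w)(t) / nt(t) else 0.0)
    val newTheta = ntd.map { row =>
      val s = row.sum
      if (s > 0) row.map(_ / s) else row
    }
    (newPhi, newTheta)
  }
}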

5. As far as we understand, RALS handles only quadratic regularizers (maybe it
is possible to substitute a quadratic approximation at every iteration, but we
have no idea why that should work). In PLSA we want to allow the user to
define any regularizer she wants.
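
To illustrate why EM is flexible here: for a smooth regularizer R(\Phi, \Theta)
added to the log-likelihood, the E-step stays the same and, roughly, only the
M-step changes to a regularized count update of the form

    \phi_{wt}  \propto ( n_{wt} + \phi_{wt}  \partial R / \partial \phi_{wt} )_+
    \theta_{td} \propto ( n_{td} + \theta_{td} \partial R / \partial \theta_{td} )_+

(with negative values truncated to zero), so no quadratic structure is
required from R.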

6. There are only a few broadcasts in our implementation. Only the \phi matrix
is broadcast (yes, we have to call the broadcast(...) method in three
different places, but we broadcast only once per iteration).
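
Schematically, the pattern is the following (a simplified sketch; all names,
the toy corpus, and the toy "E-step" are illustrative and not our actual
code):

import org.apache.spark.{SparkConf, SparkContext}

object PhiBroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("plsa-broadcast-sketch").setMaster("local[*]"))
    // toy corpus: each document is a dense vector of word counts (W = 3)
    val docs = sc.parallelize(Seq(Array(1.0, 0.0, 2.0), Array(0.0, 3.0, 1.0)))
    var phi: Array[Double] = Array(0.3, 0.3, 0.4)  // stand-in for the W x T matrix

    for (iter <- 1 to 5) {
      val phiBC = sc.broadcast(phi)  // the single broadcast of the iteration
      // workers read phiBC.value; here the "local E-step" is just a weighted sum
      val stats = docs
        .map(doc => doc.zip(phiBC.value).map { case (cnt, p) => cnt * p })
        .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      val z = stats.sum
      phi = stats.map(_ / z)         // stand-in for the M-step renormalization
      phiBC.unpersist()
    }
    sc.stop()
  }
}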


