Hi Deb, thanks for your suggestion to use ALS for PLSA training. I discussed it with our engineers, and we believe EM is the better choice. We have the following points:
1. We have some doubts that ALS is applicable to this problem. By definition, PLSA is a matrix decomposition with respect to Kullback-Leibler divergence. Unlike decomposition with respect to the Frobenius norm, this problem is not convex: PLSA has multiple local solutions, and the EM algorithm has been shown experimentally to find "good" local maxima. Note also that Kullback-Leibler divergence is not symmetric, does not satisfy the triangle inequality, and the objects it is defined on do not form a linear space -- wouldn't that be a problem for ALS? For instance, doesn't ALS rely on the fact that L_2 is defined on objects that form a self-dual space? (The objective and its KL form are spelled out below, after this list.)

2. The EM algorithm is the standard way to train PLSA, described in most of the well-known papers. The "Contributing to Spark" page says a new algorithm should be "widely known" and "used and accepted". We have found no publications describing PLSA training via ALS, so we are not sure it would produce results of quality comparable to the EM algorithm; establishing that would itself be a research question.

3. The PLSA objective function is non-differentiable at some points (e.g. where \phi_{wt} = 0 for all t), and the EM algorithm provably avoids such points. We are afraid ALS is likely to fall into this trap.

4. We also expect EM to be faster for this problem. PLSA has D*T + W*T parameters (where D is the number of documents, T the number of topics, and W the vocabulary size), so the Hessian matrix has (D*T + W*T)^2 entries -- that is a lot (a concrete size estimate is given below). Note that the EM algorithm costs O(D*T*W) per iteration. One could probably use a diagonalized Hessian, but that would increase the number of iterations needed for convergence, and we think EM would still win because we have a very simple analytical solution for the E-step (the update formulas are given below). There is a story from the '80s about EM and second-order methods: second-order methods were believed to outperform EM until it was shown that a precise optimization during the E-step is unnecessary -- it is enough to merely increase the objective. That idea made the E-step cheap, and EM became superior to second-order methods. Once again, our E-step has a very simple analytical solution, so it is very fast.

5. As far as we understand, RALS handles only quadratic regularizers (perhaps one could substitute a quadratic approximation at every iteration, but we see no reason why that must work). In PLSA we want to let the user define any regularizer she wants.

6. There are only a few broadcasts in our implementation. Only the \phi matrix is broadcast (yes, we call the broadcast(...) method in three different places, but we broadcast only once per iteration). A minimal sketch of this pattern is given at the end.
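To make point 1 concrete, here is the textbook PLSA objective (nothing specific to our patch; n_{dw} is the count of word w in document d, and n_d = \sum_w n_{dw}):

    \sum_{d,w} n_{dw} \ln \sum_t \phi_{wt} \theta_{td} \to \max_{\Phi,\Theta},
    \quad \phi_{wt} \ge 0, \; \sum_w \phi_{wt} = 1, \quad \theta_{td} \ge 0, \; \sum_t \theta_{td} = 1.

Up to terms that do not depend on \Phi and \Theta, maximizing this is equivalent to minimizing \sum_d n_d \, KL(\hat p(w|d) \,\|\, \sum_t \phi_{wt} \theta_{td}) with \hat p(w|d) = n_{dw}/n_d -- i.e. a low-rank decomposition of the empirical frequencies with respect to KL divergence rather than the Frobenius norm.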
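And these are the standard EM updates behind points 3 and 4; the E-step is the analytical solution we keep referring to:

    E-step:  p(t|d,w) = \phi_{wt} \theta_{td} / \sum_s \phi_{ws} \theta_{sd}
    M-step:  \phi_{wt} \propto \sum_d n_{dw} p(t|d,w),
             \theta_{td} \propto \sum_w n_{dw} p(t|d,w)

Each iteration touches every (d,w) pair once per topic, which gives the O(D*T*W) bound from point 4 (in fact only O(n*T), with n the number of nonzero counts).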
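For the size estimate promised in point 4 (purely illustrative numbers, not a benchmark): with D = 10^6 documents, W = 10^5 words, and T = 100 topics, the model has (10^6 + 10^5) * 100 ≈ 1.1 * 10^8 parameters, so a dense Hessian would have about 1.2 * 10^16 entries -- far too large to form, let alone invert, while one EM iteration just streams over the word counts.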
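Finally, the broadcast pattern from point 6. Below is a minimal self-contained sketch, not our actual patch: the object name PlsaSketch, the inner theta refit, and the fixed iteration counts are all made up for the example; only broadcast(...), value and unpersist() are the real Spark API. The point is that \phi is shipped to the executors exactly once per iteration and freed afterwards, while the \theta rows never leave the workers.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    // Illustrative sketch: one broadcast of phi per EM iteration.
    object PlsaSketch {

      // corpus: one Map(wordId -> count) per document; returns the W x T phi matrix
      def run(sc: SparkContext, corpus: RDD[Map[Int, Double]],
              nWords: Int, nTopics: Int, nIter: Int): Array[Array[Double]] = {
        val rng = new scala.util.Random(0)
        var phi = Array.fill(nWords, nTopics)(rng.nextDouble())
        normalize(phi)

        for (_ <- 1 to nIter) {
          val phiBc = sc.broadcast(phi)          // the single broadcast per iteration

          val stats = corpus.flatMap { counts =>
            val p = phiBc.value                  // every task reads the same copy
            // refit theta_d locally given the current phi (a few inner EM steps);
            // theta stays on the executor and is never broadcast
            var theta = Array.fill(nTopics)(1.0 / nTopics)
            for (_ <- 1 to 5) {
              val next = new Array[Double](nTopics)
              counts.foreach { case (w, n) =>
                val z = Array.tabulate(nTopics)(t => p(w)(t) * theta(t))
                val s = z.sum
                for (t <- 0 until nTopics) next(t) += n * z(t) / s
              }
              val tot = next.sum
              theta = next.map(_ / tot)
            }
            // emit the sufficient statistics n_{dw} * p(t|d,w) for the phi update
            counts.toSeq.flatMap { case (w, n) =>
              val z = Array.tabulate(nTopics)(t => p(w)(t) * theta(t))
              val s = z.sum
              (0 until nTopics).map(t => ((w, t), n * z(t) / s))
            }
          }.reduceByKey(_ + _).collect()         // action: the job runs here

          val next = Array.fill(nWords, nTopics)(0.0)
          stats.foreach { case ((w, t), v) => next(w)(t) = v }
          normalize(next)                        // M-step for phi on the driver
          phi = next
          phiBc.unpersist()                      // drop the executors' stale copy
        }
        phi
      }

      // make each topic column a distribution over words: sum_w phi(w)(t) = 1
      private def normalize(m: Array[Array[Double]]): Unit =
        for (t <- m(0).indices) {
          val s = m.indices.map(w => m(w)(t)).sum
          if (s > 0) for (w <- m.indices) m(w)(t) /= s
        }
    }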