Hi, I have implemented T. Hofmann's PLSI, based on the EM algorithm, in Pig. The E/M logic takes roughly 30-35 lines of Pig Latin statements. The implementation is available in Mahout as part of the following patch: https://issues.apache.org/jira/browse/MAHOUT-106.
The code works, but I would appreciate any feedback on the scalability of the Pig implementation, since it uses several joins/cogroups to compute the estimated probabilities p(s|z) and p(z|u).
-Prasen
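For reference, here is a minimal in-memory Python sketch of the PLSI EM loop. It assumes the input is a table of (u, s) co-occurrence counts n(u, s) and that p(s|u) = sum_z p(s|z) p(z|u). It is not the Pig code from the MAHOUT-106 patch, and the function and variable names are hypothetical. The E-step shows where the joins come from: every (u, s) observation has to be matched with p(s|z) (keyed on s) and with p(z|u) (keyed on u), and the M-step then aggregates the weighted posteriors by (z, s) and by (u, z).

```python
import math
import random
from collections import defaultdict


def _normalized(weights):
    """Scale a dict of non-negative weights so the values sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def plsi_em(counts, num_topics, iterations=50, seed=0):
    """Sketch of PLSI EM. counts: dict mapping (u, s) -> n(u, s) > 0 (hypothetical input)."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in counts})
    items = sorted({s for _, s in counts})
    topics = range(num_topics)

    # Random strictly positive initialization of p(z|u) and p(s|z).
    p_z_u = {u: _normalized({z: rng.random() + 0.1 for z in topics}) for u in users}
    p_s_z = {z: _normalized({s: rng.random() + 0.1 for s in items}) for z in topics}

    for _ in range(iterations):
        s_z_acc = {z: defaultdict(float) for z in topics}  # expected counts per (z, s)
        z_u_acc = {u: defaultdict(float) for u in users}   # expected counts per (u, z)
        for (u, s), n in counts.items():
            # E-step: posterior q(z|u,s) proportional to p(s|z) * p(z|u).
            # Looking up p_s_z by s and p_z_u by u is the step that needs joins in Pig.
            joint = {z: p_s_z[z][s] * p_z_u[u][z] for z in topics}
            norm = sum(joint.values())
            for z in topics:
                weight = n * joint[z] / norm
                s_z_acc[z][s] += weight
                z_u_acc[u][z] += weight
        # M-step: renormalize the expected counts (a group-by plus a sum in Pig terms).
        p_s_z = {z: _normalized({s: s_z_acc[z][s] for s in items}) for z in topics}
        p_z_u = {u: _normalized({z: z_u_acc[u][z] for z in topics}) for u in users}
    return p_s_z, p_z_u


def log_likelihood(counts, p_s_z, p_z_u):
    """sum over (u, s) of n(u, s) * log(sum_z p(s|z) p(z|u))."""
    return sum(
        n * math.log(sum(p_s_z[z][s] * p_z_u[u][z] for z in p_s_z))
        for (u, s), n in counts.items()
    )


if __name__ == "__main__":
    # Toy data, purely illustrative.
    toy = {("u1", "a"): 3, ("u1", "b"): 2, ("u2", "c"): 4,
           ("u2", "d"): 1, ("u3", "a"): 1, ("u3", "c"): 2}
    p_s_z, p_z_u = plsi_em(toy, num_topics=2)
    print(log_likelihood(toy, p_s_z, p_z_u))
```

Because EM never decreases the likelihood, running more iterations from the same seed should give a log-likelihood at least as high as fewer iterations, which is a cheap sanity check for a distributed version too.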
