GitHub user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-69236610

@akopich I had hoped to get this into MLlib, but after more consideration, I believe it is too complex. Currently, what would be ideal is a simple implementation of LDA since that is all that most users need. While generalizations like robust PLSA may outperform LDA with proper tuning, it's somewhat of a research area, and it may be better to go with LDA since it has been very widely tested and used. However, I am sure some users would want to use your implementation of Robust PLSA, so it would be valuable for you to make it available as a package for Spark.

The best path right now, I believe, will be to create a simple PR with a minimal public API, where that API should be extensible with (a) extra parameters/features and (b) alternate optimization/learning algorithms. I've posted a public design doc on the LDA JIRA [here](https://issues.apache.org/jira/browse/SPARK-1405), and I'm going to submit such a PR. I would of course appreciate your feedback on it.

Thanks very much for your understanding. When we merge the initial LDA PR, @mengxr will be sure to include all of those who have participated as authors of Spark LDA PRs: @akopich @witgo @yinxusen @dlwh @EntilZha @jegonzal

CC: @mengxr