Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/1269#issuecomment-67427124

I've been looking at the various topic modeling PRs (3 currently) to get a sense of how they compare in accuracy and speed. By "scaling," I really meant speed: comparing running times across implementations to see which is fastest and why. I'm envisioning running the comparison on at least a small cluster, and I'm hoping to run some such tests myself. Computing scaling curves as the number of machines increases would be awesome but should probably come later.

For the test failures, I'll wait a little bit and then re-run the tests.