Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-66563244 @tgaloppo Thanks very much for the PR, and sincere apologies for the slow response about it! @manishamde was right about people being too preoccupied with the 1.2 release. It will be great to get GMMs into MLlib though! Iâve added some inline comments, and have put a few general comments below. Weâre moving away from some of the old API conventions, and it would be nice to try to fit this to match newer APIs (especially the experimental spark.ml branch). In particular, the static train() methods are part of the old API: Itâs becoming hard to maintain train() methods with explicit lists of parameters (as we add more parameters to existing algorithms). Weâd prefer to stick with the builder pattern you have implemented in GMMExpectationMaximization, where you can call setter methods (setK, etc.) to set parameters before calling run(). * I would recommend eliminating object GMMExpectationMaximization and keeping the class API basically as is. * Could you please add getter methods (getK, etc.) to the class GMMExpectationMaximization? Tests: Iâd recommend adding more tests. Itâs good to test a single cluster, as you did. It might also be good to test multiple clusters, where you take precautions to make sure the test will almost certainly succeed (e.g., use pre-selected random seeds or enough trials). Iâll think more about possible tests. Scaling: It will be important to get a sense of how efficient/scalable the implementation is. Would you be able to run tests on a small cluster? If not, the community might be able to help. Side note: I noticed this is a PR from your master branch. Itâs generally easier to create a separate branch for each PR you plan to contribute.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org