Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/3022#issuecomment-66563244
  
    @tgaloppo  Thanks very much for the PR, and sincere apologies for the slow 
response about it!  @manishamde was right about people being too preoccupied 
with the 1.2 release.  It will be great to get GMMs into MLlib though!
    
    I’ve added some inline comments, and have put a few general comments 
below.
    
    We’re moving away from some of the old API conventions, and it would be 
nice to try to fit this to match newer APIs (especially the experimental 
spark.ml branch).  In particular, the static train() methods are part of the 
old API: It’s becoming hard to maintain train() methods with explicit lists 
of parameters (as we add more parameters to existing algorithms).  We’d 
prefer to stick with the builder pattern you have implemented in 
GMMExpectationMaximization, where you can call setter methods (setK, etc.) to 
set parameters before calling run().
      * I would recommend eliminating object GMMExpectationMaximization and 
keeping the class API basically as is.
      * Could you please add getter methods (getK, etc.) to the class 
GMMExpectationMaximization?
    
    Tests: I’d recommend adding more tests.  It’s good to test a single 
cluster, as you did.  It might also be good to test multiple clusters, where 
you take precautions to make sure the test will almost certainly succeed (e.g., 
use pre-selected random seeds or enough trials).  I’ll think more about 
possible tests.
    
    Scaling: It will be important to get a sense of how efficient/scalable the 
implementation is.  Would you be able to run tests on a small cluster?  If not, 
the community might be able to help.
    
    Side note: I noticed this is a PR from your master branch.  It’s 
generally easier to create a separate branch for each PR you plan to contribute.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to