zhengruifeng opened a new pull request #27784: [SPARK-31032][ML] GMM compute 
summary and update distributions in one pass
URL: https://github.com/apache/spark/pull/27784
 
 
   ### What changes were proposed in this pull request?
   1. Compute the summary and update the distributions in one pass.
   2. Remove the logic that checks `shouldDistributeGaussians`.
   
   ### Why are the changes needed?
   In the current implementation, GMM needs to trigger two jobs per iteration:
   1. one to compute the summary;
   2. if `shouldDistributeGaussians = ((k - 1.0) / k) * numFeatures > 25.0` is true, another to update the distributions.
   
   `shouldDistributeGaussians` is almost always true in practice, since `numFeatures` is likely to be greater than 25.
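
For reference, the threshold quoted above can be evaluated for a few settings. The following is a minimal sketch in plain Python, not the Spark code itself; the function name `should_distribute_gaussians` and the sample values of `k` and `num_features` are chosen here purely for illustration.

```python
def should_distribute_gaussians(k: int, num_features: int) -> bool:
    """Evaluate the heuristic quoted in this PR description.

    k            -- number of mixture components
    num_features -- dimensionality of the feature vectors
    """
    return ((k - 1.0) / k) * num_features > 25.0


if __name__ == "__main__":
    # Hypothetical (k, num_features) pairs, picked only to probe the threshold.
    for k, num_features in [(2, 50), (2, 51), (10, 27), (10, 28), (1, 1000)]:
        print(k, num_features, should_distribute_gaussians(k, num_features))
```

Under this formula the effective cutoff depends on `k`: with `k = 2` the check only passes once `numFeatures` exceeds 50, and with `k = 1` it never passes.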
   
   A single job is enough to perform both computations.
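
To illustrate what "compute summary" and "update distributions" mean in an EM iteration, here is a hedged, single-machine sketch for a one-dimensional mixture in plain Python. It is not the code in this PR and does not use Spark; the function names (`gaussian_pdf`, `em_step`) and the 1-D simplification are assumptions made for the example. It only shows that the per-component statistics and the parameter update can be produced in one traversal of the data followed by a cheap per-component step.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture.

    The loop accumulates the per-component summary (sufficient statistics);
    the updated distributions are then derived from that summary without
    touching the data again.
    """
    k = len(weights)
    resp_sum = [0.0] * k  # sum of responsibilities per component
    x_sum = [0.0] * k     # responsibility-weighted sum of x
    xx_sum = [0.0] * k    # responsibility-weighted sum of x^2
    log_likelihood = 0.0

    for x in data:
        dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        log_likelihood += math.log(total)
        for j in range(k):
            r = dens[j] / total
            resp_sum[j] += r
            x_sum[j] += r * x
            xx_sum[j] += r * x * x

    # Update step: new weights, means and variances from the summary.
    n = sum(resp_sum)
    new_weights = [resp_sum[j] / n for j in range(k)]
    new_means = [x_sum[j] / resp_sum[j] for j in range(k)]
    new_vars = [xx_sum[j] / resp_sum[j] - new_means[j] ** 2 for j in range(k)]
    return new_weights, new_means, new_vars, log_likelihood
```

In a distributed setting the concern in this PR is the number of jobs launched per iteration rather than the number of loops, so the sketch should be read only as an outline of the two computations being merged.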
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing test suites.
   

