luhenry edited a comment on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832285023


   Some runs with various values of `max_iter`:
   __100:__
   
![image](https://user-images.githubusercontent.com/660779/117075564-4fe1c480-ad35-11eb-889a-6df72eecacbc.png)
   
   __300:__
   
![image](https://user-images.githubusercontent.com/660779/117075724-96cfba00-ad35-11eb-9831-6c0640c735ea.png)
   
   __1,000:__
   
![image](https://user-images.githubusercontent.com/660779/117075621-6be56600-ad35-11eb-81e2-a537936593cd.png)
   
   However, varying the parallelism provided by Spark is where I can easily 
reproduce the huge variation in `summary.logLikelihood`.
   
   __`pyspark.SparkContext('local[1]')`:__
   
![image](https://user-images.githubusercontent.com/660779/117076248-5f154200-ad36-11eb-8c47-38676dabede2.png)
   
   __`pyspark.SparkContext('local[2]')`:__
   
![image](https://user-images.githubusercontent.com/660779/117076284-705e4e80-ad36-11eb-8bd1-291b7bd2a21d.png)
   
   __`pyspark.SparkContext('local[3]')`:__
   
![image](https://user-images.githubusercontent.com/660779/117076424-a6033780-ad36-11eb-88c3-be61700a45fd.png)
   
   __`pyspark.SparkContext('local[4]')`:__
   
![image](https://user-images.githubusercontent.com/660779/117076684-15792700-ad37-11eb-88c2-5b6c70f4e9b4.png)
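
   For anyone wanting to repeat this kind of experiment, here is a minimal sketch of how such runs could be scripted. It is not the script behind the plots above. It assumes the estimator is `pyspark.ml.clustering.GaussianMixture` (the comment only names `summary.logLikelihood`), and the toy data, `k`, `seed`, and the helper names `fit_log_likelihood` and `spread` are all made up for illustration.

```python
# Hypothetical reproduction sketch; the estimator, data, and helper names
# are assumptions and are not taken from the PR.

# Made-up toy data: six 2-D points in roughly three clusters.
TOY_ROWS = [
    [-0.1, -0.05], [-0.01, -0.1],
    [0.9, 0.8], [0.75, 0.935],
    [-0.83, -0.68], [-0.91, -0.76],
]


def fit_log_likelihood(master, max_iter, rows=TOY_ROWS, k=3, seed=10):
    """Fit one GaussianMixture under the given master and return
    summary.logLikelihood. pyspark is imported lazily so the rest of this
    module can be loaded without a Spark install."""
    from pyspark import SparkContext
    from pyspark.sql import SparkSession
    from pyspark.ml.clustering import GaussianMixture
    from pyspark.ml.linalg import Vectors

    spark = SparkSession(SparkContext(master))
    try:
        df = spark.createDataFrame(
            [(Vectors.dense(r),) for r in rows], ["features"]
        )
        model = GaussianMixture(k=k, maxIter=max_iter, seed=seed).fit(df)
        return model.summary.logLikelihood
    finally:
        # Stops the underlying SparkContext too, so the next call can
        # start a fresh one with a different master.
        spark.stop()


def spread(values):
    """Difference between the largest and smallest observed value."""
    return max(values) - min(values)


# Example use (needs a local Spark install), e.g. 20 fits per setting:
#   for n in (1, 2, 3, 4):
#       vals = [fit_log_likelihood(f"local[{n}]", max_iter=100)
#               for _ in range(20)]
#       print(n, spread(vals))
```

   Holding `seed` fixed and changing only the master string (or only `max_iter`) keeps each comparison to a single variable, which is the point of the two sets of plots.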
   
   That is all with Spark 3.1.1, and so unrelated to my changes. I would still 
say that the flakiness observed in this PR is explained by this change: if 
there is a potential race in the implementation, changing the performance 
profile of the underlying operations can make the race more likely to happen.




