luhenry edited a comment on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832285023
Some runs with various values of `max_iter`:

__100:__ ![image](https://user-images.githubusercontent.com/660779/117075564-4fe1c480-ad35-11eb-889a-6df72eecacbc.png)

__300:__ ![image](https://user-images.githubusercontent.com/660779/117075724-96cfba00-ad35-11eb-9831-6c0640c735ea.png)

__1,000:__ ![image](https://user-images.githubusercontent.com/660779/117075621-6be56600-ad35-11eb-81e2-a537936593cd.png)

However, it is when playing with the parallelism provided by Spark that I can easily reproduce the large variation in `summary.logLikelihood`:

__`pyspark.SparkContext('local[1]')`:__ ![image](https://user-images.githubusercontent.com/660779/117076248-5f154200-ad36-11eb-8c47-38676dabede2.png)

__`pyspark.SparkContext('local[2]')`:__ ![image](https://user-images.githubusercontent.com/660779/117076284-705e4e80-ad36-11eb-8bd1-291b7bd2a21d.png)

__`pyspark.SparkContext('local[3]')`:__ ![image](https://user-images.githubusercontent.com/660779/117076424-a6033780-ad36-11eb-88c3-be61700a45fd.png)

__`pyspark.SparkContext('local[4]')`:__ ![image](https://user-images.githubusercontent.com/660779/117076684-15792700-ad37-11eb-88c2-5b6c70f4e9b4.png)

All of the above is with Spark 3.1.1, so the variation is unrelated to my changes. I would say the flakiness observed in this PR is explained by that behavior: if there is a potential race in the implementation, then changing the performance profile of the underlying operations can make the race more likely to manifest.