[ https://issues.apache.org/jira/browse/SPARK-26387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-26387.
-------------------------------
    Resolution: Not A Problem

It shouldn't have any effect. However, you may get different results on
different runs if you don't fix a seed for k-fold cross-validation. Reopen if
that's not the cause, ideally with a reproducer against 2.4 or master.
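The seed point can be illustrated with a small, stdlib-only Python sketch. This is not Spark code: the toy dataset, the toy threshold "model", and the names `make_data`, `assign_folds`, `evaluate_fold` and `cross_validate` are all hypothetical. It shows that with a fixed seed the fold assignment is reproducible, so evaluating folds concurrently gives the same averaged metric as evaluating them one at a time, while a different seed yields different folds. The analogous knobs on Spark's `CrossValidator` are `setSeed` and `setParallelism`.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_data(n, seed=0):
    """Hypothetical toy data: feature x ~ N(y, 1) for a random 0/1 label y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(float(y), 1.0), y))
    return data


def assign_folds(n_rows, k, seed):
    """Assign each row index to one of k folds using a seeded RNG.

    With a fixed seed the assignment is reproducible; with seed=None,
    Python seeds from system entropy, so the folds change on every run.
    """
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n_rows)]


def evaluate_fold(data, folds, fold_id):
    """Fit a toy threshold model on the other folds; score the held-out fold."""
    train = [(x, y) for (x, y), f in zip(data, folds) if f != fold_id]
    test = [(x, y) for (x, y), f in zip(data, folds) if f == fold_id]
    if not train or not test:
        return 0.0
    # Toy "model": predict label 1 when x exceeds the mean training feature.
    threshold = sum(x for x, _ in train) / len(train)
    correct = sum(1 for x, y in test if (x > threshold) == (y == 1))
    return correct / len(test)


def cross_validate(data, k, seed, parallelism):
    """Average held-out accuracy over k folds, evaluating folds concurrently."""
    folds = assign_folds(len(data), k, seed)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in fold order no matter which worker
        # finishes first, so the sum below is computed in a stable order.
        scores = list(pool.map(lambda f: evaluate_fold(data, folds, f), range(k)))
    return sum(scores) / k


if __name__ == "__main__":
    data = make_data(200)
    serial = cross_validate(data, k=3, seed=42, parallelism=1)
    parallel = cross_validate(data, k=3, seed=42, parallelism=4)
    print(serial == parallel)  # True: same seed, same folds, same metric
    # A different seed gives different folds, so this value may differ.
    print(cross_validate(data, k=3, seed=7, parallelism=4))
```

In this sketch, varying `parallelism` alone never changes the result; only the fold assignment (controlled by `seed`) does. A reproducer for the reported issue would therefore need to hold the seed fixed while toggling parallelism.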

> Parallelism seems to cause difference in CrossValidation model metrics
> ----------------------------------------------------------------------
>
>                 Key: SPARK-26387
>                 URL: https://issues.apache.org/jira/browse/SPARK-26387
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.3.1, 2.3.2
>            Reporter: Evan Zamir
>            Priority: Major
>
> I can only reproduce this issue when running Spark on different Amazon EMR
> versions, but it seems that between Spark 2.3.1 and 2.3.2 (corresponding to
> EMR versions 5.17/5.18) the presence of the parallelism parameter causes the
> AUC metric to increase. I run exactly the same code with and without
> parallelism, and the AUC of my models (logistic regression) changes
> significantly. I can't find a previous bug report relating to this, so I'm
> posting this as new.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
