[ https://issues.apache.org/jira/browse/SPARK-40704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616168#comment-17616168 ]
Hyukjin Kwon commented on SPARK-40704:
--------------------------------------

[~ggbaker] Please go ahead and upgrade cloudpickle. See also https://github.com/apache/spark/pull/34705

> Pyspark incompatible with Pypy
> ------------------------------
>
>                 Key: SPARK-40704
>                 URL: https://issues.apache.org/jira/browse/SPARK-40704
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Greg Baker
>            Priority: Major
>
> Starting Spark with a recent Pypy (>3.6) fails because of an incompatibility
> between their pickle implementation and cloudpickle:
> {quote}{{% PYSPARK_PYTHON=pypy3 ./bin/pyspark}}
> {{...}}
> {{ModuleNotFoundError: No module named '_pickle'}}
> {quote}
>
> It seems to be related to [this cloudpickle
> issue|https://github.com/cloudpipe/cloudpickle/issues/455], which has been
> fixed upstream. I was able to work around by replacing the Spark-provided
> cloudpickle (python/pyspark/cloudpickle) with the code from their git repo
> (and deleting pyspark.zip to purge that copy).
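
For anyone trying to confirm whether they are hitting this, a minimal sketch along the following lines may help. It only reports which Python implementation is running and whether the `_pickle` accelerator module named in the traceback can be found. The function name `pickle_backend_report` is made up for illustration and is not part of PySpark or cloudpickle.

```python
import importlib.util
import platform


def pickle_backend_report():
    """Describe the interpreter and whether the `_pickle` module can be located.

    Hypothetical helper for diagnosis only; it does not touch Spark or cloudpickle.
    """
    return {
        # e.g. "CPython" or "PyPy"
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        # find_spec returns None when the module cannot be found
        "has__pickle": importlib.util.find_spec("_pickle") is not None,
    }


if __name__ == "__main__":
    for key, value in pickle_backend_report().items():
        print(f"{key}: {value}")
```

Running it with the same interpreter that `PYSPARK_PYTHON` points to should show whether `_pickle` is missing there, which would be consistent with the `ModuleNotFoundError` quoted in the issue.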