[ 
https://issues.apache.org/jira/browse/SPARK-32414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cyrille cazenave updated SPARK-32414:
-------------------------------------
    Attachment: spark.py

> pyspark crashes in cluster mode with kafka structured streaming
> ---------------------------------------------------------------
>
>                 Key: SPARK-32414
>                 URL: https://issues.apache.org/jira/browse/SPARK-32414
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>         Environment: * spark version 3.0.0 from mac brew
>  * kubernetes Kind 18+
>  * kafka cluster: strimzi/kafka:0.18.0-kafka-2.5.0
>  * kafka package: org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0
>            Reporter: cyrille cazenave
>            Priority: Major
>         Attachments: fulllogs.txt, spark.py
>
>
> Hello,
> I have been trying to run a pyspark script on Spark on Kubernetes, and I 
> get this error, which crashes the application:
> {{java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD)}}
>  
> I followed these steps:
>  * for spark on kubernetes: 
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (which 
> includes building the image using docker-image-tool.sh on mac with the -p flag)
>  * I also tried the image provided by the developers of 
> GoogleCloudPlatform/spark-on-k8s-operator 
> (gcr.io/spark-operator/spark-py:v3.0.0) and hit the same issue
>  * for kafka streaming: 
> [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying]
>  * When running the script manually in a jupyter notebook 
> (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with 
> PYSPARK_SUBMIT_ARGS=--packages 
> org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell), it ran 
> without issue
>  * the command run from the laptop is:
> spark-submit --master 
> k8s://https://127.0.0.1:53979 --name spark-pi 
> --deploy-mode cluster --packages 
> org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf 
> spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.kubernetes.executor.request.cores=1 --conf 
> spark.kubernetes.driver.request.cores=1 --conf 
> spark.kubernetes.container.image.pullPolicy=Always local:///usr/bin/spark.py
>  
> The full logs of the error are in the attachments.
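
The spark-submit invocation quoted above is wrapped across several lines by the mail formatter, which makes the individual options hard to pick out. As a rough sketch only, the same options can be assembled as an argument list in Python. The helper name build_spark_submit and its parameter names are hypothetical and are not part of Spark or of the reporter's spark.py; the option values are copied from the report.

```python
import shlex


def build_spark_submit(master, image, app_path, packages,
                       name="spark-pi", service_account="spark"):
    """Hypothetical helper: return the spark-submit argv from the report."""
    # --conf settings exactly as listed in the quoted command.
    conf = {
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        "spark.kubernetes.executor.request.cores": "1",
        "spark.kubernetes.driver.request.cores": "1",
        "spark.kubernetes.container.image.pullPolicy": "Always",
    }
    args = [
        "spark-submit",
        "--master", master,
        "--name", name,
        "--deploy-mode", "cluster",
        "--packages", ",".join(packages),
    ]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # The application file comes last; local:// refers to a path inside the image.
    args.append(app_path)
    return args


cmd = build_spark_submit(
    master="k8s://https://127.0.0.1:53979",
    image="fifoosab/pytest:3.0.0.dev0",
    app_path="local:///usr/bin/spark.py",
    packages=["org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0"],
)
# Print a copy-pastable single-line command (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell-quoting mistakes if the command is later passed to subprocess.run(cmd).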



