[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788582#comment-17788582 ]
Bobby Wang commented on SPARK-46032:
------------------------------------

h1. Submit the Spark application via pyspark

I installed the PySpark Connect dependencies with `pip install pyspark[connect]`.

h2. pyspark location

```
(pyspark-connect) xxx@spark-bobby:~/Desktop/spark-connect$ which pyspark
/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/bin/pyspark
```

h2. Session and failure

```
(pyspark-connect) xxx@spark-bobby:~/Desktop/spark-connect$ pyspark --remote sc://localhost
Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> spark.range(100).filter("id > 2").collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bobwang/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", line 1645, in collect
    table, schema = self._session.client.to_table(query)
  File "/home/bobwang/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 858, in to_table
    table, schema, _, _, _ = self._execute_and_fetch(req)
  File "/home/bobwang/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1282, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File "/home/bobwang/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1263, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/home/bobwang/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1502, in _handle_error
    self._handle_rpc_error(error)
  File "/home/bobwang/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1538, in _handle_rpc_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 33) (192.168.31.236 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.base/java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2096)
    at java.base/java.io.ObjectStreamClass$FieldReflector.checkObjectFieldValueTypes(ObjectStreamClass.java:2060)
    at java.base/java.io.ObjectStreamClass.checkObjFieldValueTypes(ObjectStreamClass.java:1347)
    at java.base/java.io.ObjectInputStream$FieldValues.defaultCheckFieldValues(ObjectInputStream.java:2679)
    at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2486)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
    at java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2606)
    at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2457)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:509)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:467)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErr...
>>>
```
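For completeness, the same steps in script form rather than the interactive shell. This is a minimal editor's sketch, not part of the original report: it assumes `pip install pyspark[connect]` has been run and that the Connect server described in the quoted issue below is listening at `sc://localhost`.

```python
# Minimal script-form reproduction of the shell session above (a sketch,
# not from the original report). Assumes `pip install pyspark[connect]`
# and a Spark Connect server reachable at sc://localhost.
from pyspark.sql import SparkSession

# Build a client session against the Connect server instead of a local driver.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# The same trivial query used in the session above; on the affected setup it
# raises SparkConnectGrpcException wrapping the executor-side
# ClassCastException (SerializedLambda -> scala.Function3).
print(spark.range(100).filter("id > 2").collect())

spark.stop()
```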
> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-46032
>                 URL: https://issues.apache.org/jira/browse/SPARK-46032
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Bobby Wang
>            Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website and started a Spark standalone cluster in which the master and the only worker run on the same node.
>
> Then I started the Connect server with:
>
> {code:java}
> start-connect-server.sh \
>   --master spark://10.19.183.93:7077 \
>   --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>   --conf spark.executor.cores=12 \
>   --conf spark.task.cpus=1 \
>   --executor-memory 30G \
>   --conf spark.executor.resource.gpu.amount=1 \
>   --conf spark.task.resource.gpu.amount=0.08 \
>   --driver-memory 1G{code}
>
> I verified in the web UI that the standalone cluster, the Connect server, and the Spark driver had all started.
>
> Finally, I tried to run a very simple Spark job (spark.range(100).filter("id>2").collect()) from the Spark Connect client using pyspark, but I got the error below.
>
> {code}
> pyspark --remote sc://localhost
> Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>
> Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)
> Client connected to the Spark Connect server at localhost
> SparkSession available as 'spark'.
> >>> spark.range(100).filter("id > 3").collect()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", line 1645, in collect
>     table, schema = self._session.client.to_table(query)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 858, in to_table
>     table, schema, _, _, _ = self._execute_and_fetch(req)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1282, in _execute_and_fetch
>     for response in self._execute_and_fetch_as_iterator(req):
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1263, in _execute_and_fetch_as_iterator
>     self._handle_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1502, in _handle_error
>     self._handle_rpc_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1538, in _handle_rpc_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
>     at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
>     at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)
>     at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>     at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
>     at org.apache.spark.executor.Executor$TaskRunner...
> {code}
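One client-side check that may be worth running here: a SerializedLambda ClassCastException during task deserialization is a typical symptom of executors seeing different classes than the driver (for example, a jar pulled in via --packages that is visible to the driver but not to the workers, or diverging client and server versions). The following is an editor's sketch under the assumption that the server above is reachable at sc://localhost; it is not a confirmed diagnosis for this ticket.

```python
# Editor's sanity-check sketch (not from the original report): compare the
# pip-installed client version with the version the Connect server reports.
# A mismatch here, or a connect jar missing from the worker classpath, is a
# common source of ClassCastException during task deserialization.
import pyspark
from pyspark.sql import SparkSession

print("client pyspark version:", pyspark.__version__)

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
print("server Spark version:", spark.version)  # fetched from the Connect server
spark.stop()
```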