[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788581#comment-17788581 ]

Bobby Wang commented on SPARK-46032:
------------------------------------

h1. Start spark-connect-server, which starts a Spark driver that connects to the 
Spark Standalone cluster
{code:java}
start-connect-server.sh \
    --master spark://192.168.31.236:7077 \
    --packages org.apache.spark:spark-connect_2.12:3.5.0 \
    --conf spark.executor.cores=12 \
    --conf spark.task.cpus=1 \
    --executor-memory 30G \
    --conf spark.executor.resource.gpu.amount=1 \
    --conf spark.task.resource.gpu.amount=0.08 \
    --driver-memory 1G {code}
h2. Log 
{code:java}
Spark Command: /usr/lib/jvm/java-17-openjdk-amd64/bin/java -cp 
/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/conf/:/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/jars/*:/etc/hadoop
 -Xmx1G -XX:+IgnoreUnrecognizedVMOptions 
--add-opens=java.base/java.lang=ALL-UNNAMED 
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
--add-opens=java.base/java.io=ALL-UNNAMED 
--add-opens=java.base/java.net=ALL-UNNAMED 
--add-opens=java.base/java.nio=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
--add-opens=java.base/sun.security.action=ALL-UNNAMED 
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED 
-Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit 
--master spark://192.168.31.236:7077 --conf 
spark.executor.resource.gpu.amount=1 --conf spark.driver.memory=1G --conf 
spark.task.cpus=1 --conf spark.executor.cores=12 --conf 
spark.task.resource.gpu.amount=0.08 --class 
org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect 
server --packages org.apache.spark:spark-connect_2.12:3.5.0 --executor-memory 
30G spark-internal
========================================
23/11/22 08:41:59 WARN Utils: Your hostname, spark-xxx resolves to a loopback 
address: 127.0.1.1; using 192.168.31.236 instead (on interface wlp82s0)
23/11/22 08:41:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
:: loading settings :: url = 
jar:file:/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/xxx/.ivy2/cache
The jars for the packages stored in: /home/xxx/.ivy2/jars
org.apache.spark#spark-connect_2.12 added as a dependency
:: resolving dependencies :: 
org.apache.spark#spark-submit-parent-e418a548-30e7-4001-8807-db0a39f1de7b;1.0
    confs: [default]
    found org.apache.spark#spark-connect_2.12;3.5.0 in central
    found org.spark-project.spark#unused;1.0.0 in local-m2-cache
:: resolution report :: resolve 153ms :: artifacts dl 3ms
    :: modules in use:
    org.apache.spark#spark-connect_2.12;3.5.0 from central in [default]
    org.spark-project.spark#unused;1.0.0 from local-m2-cache in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: 
org.apache.spark#spark-submit-parent-e418a548-30e7-4001-8807-db0a39f1de7b
    confs: [default]
    0 artifacts copied, 2 already retrieved (0kB/3ms)
23/11/22 08:41:59 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/11/22 08:41:59 INFO SparkConnectServer: Starting Spark session.
23/11/22 08:41:59 INFO SparkContext: Running Spark version 3.5.0
23/11/22 08:41:59 INFO SparkContext: OS info Linux, 6.2.0-36-generic, amd64
23/11/22 08:41:59 INFO SparkContext: Java version 17.0.8.1
23/11/22 08:41:59 INFO ResourceUtils: 
==============================================================
23/11/22 08:41:59 INFO ResourceUtils: No custom resources configured for 
spark.driver.
23/11/22 08:41:59 INFO ResourceUtils: 
==============================================================
23/11/22 08:41:59 INFO SparkContext: Submitted application: Spark Connect server
23/11/22 08:41:59 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 12, script: , vendor: , 
memory -> name: memory, amount: 30720, script: , vendor: , offHeap -> name: 
offHeap, amount: 0, script: , vendor: , gpu -> name: gpu, amount: 1, script: , 
vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0, gpu -> name: 
gpu, amount: 0.08)
23/11/22 08:41:59 INFO ResourceProfile: Limiting resource is cpus at 12 tasks 
per executor
23/11/22 08:41:59 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/11/22 08:41:59 INFO SecurityManager: Changing view acls to: xxx
23/11/22 08:41:59 INFO SecurityManager: Changing modify acls to: xxx
23/11/22 08:41:59 INFO SecurityManager: Changing view acls groups to: 
23/11/22 08:41:59 INFO SecurityManager: Changing modify acls groups to: 
23/11/22 08:41:59 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: xxx; groups with view 
permissions: EMPTY; users with modify permissions: xxx; groups with modify 
permissions: EMPTY
23/11/22 08:42:00 INFO Utils: Successfully started service 'sparkDriver' on 
port 41331.
23/11/22 08:42:00 INFO SparkEnv: Registering MapOutputTracker
23/11/22 08:42:00 INFO SparkEnv: Registering BlockManagerMaster
23/11/22 08:42:00 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/11/22 08:42:00 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/11/22 08:42:00 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/11/22 08:42:00 INFO DiskBlockManager: Created local directory at 
/tmp/blockmgr-b608325b-7761-4ce4-b37e-28a7318d22c8
23/11/22 08:42:00 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
23/11/22 08:42:00 INFO SparkEnv: Registering OutputCommitCoordinator
23/11/22 08:42:00 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
23/11/22 08:42:00 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
23/11/22 08:42:00 INFO SparkContext: Added JAR 
file:///home/xxx/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.5.0.jar at 
spark://192.168.31.236:41331/jars/org.apache.spark_spark-connect_2.12-3.5.0.jar 
with timestamp 1700613719780
23/11/22 08:42:00 INFO SparkContext: Added JAR 
file:///home/xxx/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at 
spark://192.168.31.236:41331/jars/org.spark-project.spark_unused-1.0.0.jar with 
timestamp 1700613719780
23/11/22 08:42:00 INFO SparkContext: Added file 
file:///home/xxx/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.5.0.jar at 
spark://192.168.31.236:41331/files/org.apache.spark_spark-connect_2.12-3.5.0.jar
 with timestamp 1700613719780
23/11/22 08:42:00 INFO Utils: Copying 
/home/xxx/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.5.0.jar to 
/tmp/spark-1e350d33-d3b0-49ae-b4d4-17f672ce35e6/userFiles-ad0049a8-550a-4c7c-ae08-5b4be23ae221/org.apache.spark_spark-connect_2.12-3.5.0.jar
23/11/22 08:42:00 INFO SparkContext: Added file 
file:///home/xxx/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at 
spark://192.168.31.236:41331/files/org.spark-project.spark_unused-1.0.0.jar 
with timestamp 1700613719780
23/11/22 08:42:00 INFO Utils: Copying 
/home/xxx/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to 
/tmp/spark-1e350d33-d3b0-49ae-b4d4-17f672ce35e6/userFiles-ad0049a8-550a-4c7c-ae08-5b4be23ae221/org.spark-project.spark_unused-1.0.0.jar
23/11/22 08:42:00 INFO StandaloneAppClient$ClientEndpoint: Connecting to master 
spark://192.168.31.236:7077...
23/11/22 08:42:00 INFO TransportClientFactory: Successfully created connection 
to /192.168.31.236:7077 after 15 ms (0 ms spent in bootstraps)
23/11/22 08:42:00 INFO StandaloneSchedulerBackend: Connected to Spark cluster 
with app ID app-20231122084200-0000
23/11/22 08:42:00 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 44929.
23/11/22 08:42:00 INFO NettyBlockTransferService: Server created on 
192.168.31.236:44929
23/11/22 08:42:00 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
23/11/22 08:42:00 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, 192.168.31.236, 44929, None)
23/11/22 08:42:00 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.31.236:44929 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.31.236, 
44929, None)
23/11/22 08:42:00 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 192.168.31.236, 44929, None)
23/11/22 08:42:00 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, 192.168.31.236, 44929, None)
23/11/22 08:42:00 INFO StandaloneAppClient$ClientEndpoint: Executor added: 
app-20231122084200-0000/0 on worker-20231122083708-192.168.31.236-44911 
(192.168.31.236:44911) with 12 core(s)
23/11/22 08:42:00 INFO StandaloneSchedulerBackend: Granted executor ID 
app-20231122084200-0000/0 on hostPort 192.168.31.236:44911 with 12 core(s), 
30.0 GiB RAM
23/11/22 08:42:00 INFO SingleEventLogFileWriter: Logging events to 
file:/home/xxx/github/mytools/spark.home/spark-events/app-20231122084200-0000.inprogress
23/11/22 08:42:00 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
app-20231122084200-0000/0 is now RUNNING
23/11/22 08:42:00 INFO StandaloneSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/11/22 08:42:00 INFO SparkConnectServer: Spark Connect server started at: 
0:0:0:0:0:0:0:0%0:15002
23/11/22 08:42:02 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.31.236:51630) with ID 0,  ResourceProfileId 0
23/11/22 08:42:02 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.31.236:43955 with 17.8 GiB RAM, BlockManagerId(0, 192.168.31.236, 
43955, None)
 {code}
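h2. Client side (reproduction sketch)
For reference, the client-side step that triggers the failure (taken from the issue description quoted below) can be written as a minimal PySpark Connect script. This is only a sketch: it assumes the Connect server started above is reachable at localhost on the default port 15002, as shown in the log.
{code:python}
# Minimal client-side reproduction sketch (assumes the server above listens on localhost:15002).
from pyspark.sql import SparkSession

# Create a Spark Connect session against the running spark-connect-server.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# The simple job from the report; on the executor it fails with
# "cannot assign instance of java.lang.invoke.SerializedLambda to field
#  org.apache.spark.rdd.MapPartitionsRDD.f".
spark.range(100).filter("id > 3").collect()
{code}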

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-46032
>                 URL: https://issues.apache.org/jira/browse/SPARK-46032
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Bobby Wang
>            Priority: Major
>
> I downloaded Spark 3.5.0 from the official Spark website, and then I started a 
> Spark Standalone cluster in which both the master and the only worker run on 
> the same node. 
>  
> Then I started the connect server by 
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can confirm from the Web UI that the Spark Standalone cluster, the Connect 
> server, and the Spark driver are all started.
>  
> Finally, I tried to run a very simple Spark job 
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client using 
> pyspark, but I got the error below.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "<stdin>", line 1, in <module>_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)_
> _at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)_
> _at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)_
> _at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)_
> _at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)_
> _at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)_
> _at org.apache.spark.scheduler.Task.run(Task.scala:141)_
> _at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)_
> _at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)_
> _at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)_
> _at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)_
> _at org.apache.spark.executor.Executor$TaskRunner..._
>  
>  
>  


