[jira] [Created] (SPARK-47917) Accounting the impact of failures in spark jobs
Faiz Halde created SPARK-47917:

             Summary: Accounting the impact of failures in spark jobs
                 Key: SPARK-47917
                 URL: https://issues.apache.org/jira/browse/SPARK-47917
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 3.5.1
            Reporter: Faiz Halde

Hello,

In my organization, we have an accounting system for Spark jobs that uses task execution time to determine how long a job occupies the executors, and we use it to apportion cost. We sum all task times per job and apply proportions. Our clusters follow a one-task-per-core model, and this works well.

A job goes through several failures during its run, due to executor failures and node failures (spot interruptions), and Spark retries tasks and sometimes entire stages. We now want to account for these failures and determine what percentage of a job's total task time is due to retries. Basically, if a job with failures and retries has a total task time of X, there is an X' representing the goodput of this job, i.e. a hypothetical run of the job with zero failures and retries. In this case, (X - X') / X quantifies the cost of failures.

This form of accounting requires tracking the execution history of each task, i.e. of tasks that compute the same logical partition of some RDD. This was quite easy with AQE disabled, since stage ids never changed, but with AQE enabled that is no longer the case. Do you have any suggestions on how I can use the Spark event system?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
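The accounting described above can be prototyped independently of Spark's listener API. The sketch below is a toy model, not Spark code: the Attempt record, the logical ids, and the durations are all hypothetical. Each task attempt is keyed by a stable logical id so retries of the same partition share a key; X sums all attempt durations, X' (goodput) counts only each logical task's successful attempt, and (X - X') / X is the failure cost.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailureCost {
    // One task attempt for a logical partition (e.g. an rddId + partition index key).
    record Attempt(String logicalId, double seconds, boolean succeeded) {}

    // Returns {total X, goodput X', failure fraction (X - X') / X}.
    static double[] failureCost(List<Attempt> attempts) {
        double total = 0.0;
        Map<String, Double> goodput = new HashMap<>();
        for (Attempt a : attempts) {
            total += a.seconds();
            if (a.succeeded()) {
                // A zero-failure run would execute each logical task exactly once,
                // so only the successful attempt counts toward goodput.
                goodput.put(a.logicalId(), a.seconds());
            }
        }
        double good = goodput.values().stream().mapToDouble(Double::doubleValue).sum();
        double frac = total == 0.0 ? 0.0 : (total - good) / total;
        return new double[] {total, good, frac};
    }

    public static void main(String[] args) {
        List<Attempt> events = List.of(
            new Attempt("rdd1#0", 10.0, true),   // succeeds on the first try
            new Attempt("rdd1#1", 8.0, false),   // lost to a spot interruption
            new Attempt("rdd1#1", 8.0, false),   // retry fails again
            new Attempt("rdd1#1", 9.0, true));   // retry succeeds
        double[] r = failureCost(events);
        System.out.printf("total=%s goodput=%s failureFraction=%s%n", r[0], r[1], r[2]);
    }
}
```

Mapping real Spark task-end events onto a stable logical id (rather than the AQE-mutable stage id) is exactly the open question in the ticket; the model only shows what the accounting needs from such a key.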
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794376#comment-17794376 ]

Faiz Halde commented on SPARK-46032:

So we had a dire need to fix this, and the following patch worked for us:
https://github.com/apache/spark/pull/44240
Not sure if it's the right root cause though.

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f
>
>                 Key: SPARK-46032
>                 URL: https://issues.apache.org/jira/browse/SPARK-46032
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Bobby Wang
>            Priority: Major
>
> I downloaded spark 3.5 from the spark official website, and then I started a
> Spark Standalone cluster in which both the master and the only worker are on
> the same node.
>
> Then I started the connect server by
> {code:java}
> start-connect-server.sh \
>   --master spark://10.19.183.93:7077 \
>   --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>   --conf spark.executor.cores=12 \
>   --conf spark.task.cpus=1 \
>   --executor-memory 30G \
>   --conf spark.executor.resource.gpu.amount=1 \
>   --conf spark.task.resource.gpu.amount=0.08 \
>   --driver-memory 1G{code}
>
> I can 100% ensure that the Spark standalone cluster, the connect server, and
> the Spark driver started, as observed from the web UI.
>
> Finally, I tried to run a very simple spark job
> (spark.range(100).filter("id>2").collect()) from spark-connect-client using
> pyspark, but I got the below error.
>
> pyspark --remote sc://localhost
> Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> [Spark ASCII-art welcome banner, version 3.5.0]
> Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)
> Client connected to the Spark Connect server at localhost
> SparkSession available as 'spark'.
> >>> spark.range(100).filter("id > 3").collect()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", line 1645, in collect
>     table, schema = self._session.client.to_table(query)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 858, in to_table
>     table, schema, _, _, _ = self._execute_and_fetch(req)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1282, in _execute_and_fetch
>     for response in self._execute_and_fetch_as_iterator(req):
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1263, in _execute_and_fetch_as_iterator
>     self._handle_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1502, in _handle_error
>     self._handle_rpc_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1538, in _handle_rpc_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot
> assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance
> of org.apache.spark.rdd.MapPartitionsRDD
>   at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
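As background on the ClassCastException above, here is a hedged sketch of the serialization mechanism involved (plain JDK code, not Spark's): a serializable lambda is written out as a java.lang.invoke.SerializedLambda and is resolved back into a working lambda at read time through its capturing class. When the reading side cannot perform that resolution, for instance because the capturing class is missing from the executor's classpath or was loaded by a different classloader, a raw SerializedLambda can be left where a function object was expected, which is consistent with the "cannot assign instance of ... SerializedLambda to field ... of type scala.Function3" failure. The round trip below succeeds only because writer and reader share one JVM:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class LambdaRoundTrip {
    // Serialize an object to bytes and read it back in the same JVM.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj); // the lambda is written as a SerializedLambda
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            // readObject resolves the SerializedLambda back into a lambda via the
            // capturing class (this class), which must be loadable on this side.
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // The intersection cast makes the lambda implement Serializable.
        Function<Long, Boolean> filter =
            (Function<Long, Boolean> & Serializable) id -> id > 2;

        @SuppressWarnings("unchecked")
        Function<Long, Boolean> restored = (Function<Long, Boolean>) roundTrip(filter);
        System.out.println(restored.apply(5L)); // prints true in the same JVM
    }
}
```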
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791996#comment-17791996 ]

Faiz Halde edited comment on SPARK-45255 at 12/1/23 10:40 AM:

it seems to be failing on another issue now

a fresh project with build.sbt

{code:java}
resolvers += "snaps" at "https://repository.apache.org/snapshots/"
libraryDependencies += "org.apache.spark" %% "spark-sql-api" % "3.5.1-SNAPSHOT"
libraryDependencies += "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.1-SNAPSHOT"
{code}

{code:java}
scala> val spark = SparkSession.builder().remote("sc://localhost").build()
warning: one deprecation (since 3.5.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, char, java.lang.Object)'
  at org.sparkproject.io.grpc.Metadata$Key.validateName(Metadata.java:754)
  at org.sparkproject.io.grpc.Metadata$Key.<init>(Metadata.java:762)
  at org.sparkproject.io.grpc.Metadata$Key.<init>(Metadata.java:671)
  at org.sparkproject.io.grpc.Metadata$AsciiKey.<init>(Metadata.java:971)
  at org.sparkproject.io.grpc.Metadata$AsciiKey.<init>(Metadata.java:966)
  at org.sparkproject.io.grpc.Metadata$Key.of(Metadata.java:708)
  at org.sparkproject.io.grpc.Metadata$Key.of(Metadata.java:704)
  at org.apache.spark.sql.connect.client.SparkConnectClient$.<init>(SparkConnectClient.scala:329)
  at org.apache.spark.sql.connect.client.SparkConnectClient$.<clinit>(SparkConnectClient.scala)
  at org.apache.spark.sql.SparkSession$Builder.<init>(SparkSession.scala:789)
  at org.apache.spark.sql.SparkSession$.builder(SparkSession.scala:777)
  ... 55 elided
{code}
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791996#comment-17791996 ]

Faiz Halde commented on SPARK-45255:

it seems to be failing on another issue now

a fresh project with build.sbt

{code:java}
resolvers += "snaps" at "https://repository.apache.org/snapshots/"
libraryDependencies += "org.apache.spark" %% "spark-sql-api" % "3.5.1-SNAPSHOT"
libraryDependencies += "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0-SNAPSHOT"
{code}

{code:java}
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> val spark = SparkSession.builder().remote("sc://localhost").build()
warning: one deprecation (since 3.5.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
java.lang.NoClassDefFoundError: io/netty/buffer/PooledByteBufAllocator
  at java.base/java.lang.ClassLoader.defineClass1(Native Method)
  at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
  at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
  at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
  at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
  at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
  at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
  at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)
  at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26)
  at java.base/java.lang.Class.forName0(Native Method)
  at java.base/java.lang.Class.forName(Class.java:375)
  at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108)
  at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98)
  at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:772)
  at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24)
  at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83)
  at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47)
  at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24)
  at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485)
  at org.apache.arrow.memory.BaseAllocator.<clinit>(BaseAllocator.java:61)
  at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:75)
  at org.apache.spark.sql.SparkSession$.create(SparkSession.scala:737)
  at org.apache.spark.sql.SparkSession$Builder.$anonfun$create$1(SparkSession.scala:805)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession$Builder.create(SparkSession.scala:805)
  at org.apache.spark.sql.SparkSession$Builder.build(SparkSession.scala:795)
  ... 55 elided
Caused by: java.lang.ClassNotFoundException: io.netty.buffer.PooledByteBufAllocator
  at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  ... 85 more
{code}

> Spark connect client failing with java.lang.NoClassDefFoundError
>
>                 Key: SPARK-45255
>                 URL: https://issues.apache.org/jira/browse/SPARK-45255
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Faiz Halde
>            Assignee: Herman van Hövell
>            Priority: Major
>             Fix For: 4.0.0, 3.5.1
>
> java 1.8, sbt 1.9, scala 2.12
>
> I have a very simple repo with the following dependency in build.sbt
> {code:java}
> libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")
> {code}
> A simple application
> {code:java}
> object Main extends App {
>   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()
>   s.read.json("/tmp/input.json").repartition(10).show(false)
> }
> {code}
> But when I run it, I get the following error
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader
>   at Main$.delayedEndpoint$Main$1(Main.scala:4)
>   at Main$delayedInit$body.apply(Main.scala:3)
>   at scala.Function0.apply$mcV$sp(Function0.scala:39)
>   at
[jira] [Updated] (SPARK-46046) Isolated classloader per spark session
[ https://issues.apache.org/jira/browse/SPARK-46046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Faiz Halde updated SPARK-46046:

Description:

Hello,

We use Spark 3.5.0 and were wondering whether the following is achievable using spark-core.

Our use case involves spinning up a Spark cluster wherein the driver application loads user jars on the fly (the user jar is not the Spark driver/application, but merely a catalog of transformations). A single Spark application can load multiple jars in its lifetime, with the potential for classpath conflicts if the framework is not careful.

The driver needs to load the jar, add the jar to the executors, and call a predefined class.method to trigger the transformation. Each transformation runs in its own Spark session inside the same Spark application.

AFAIK, on the executor side, an isolated classloader per session is only possible when using the Spark Connect facilities. Is it possible to do this without using Spark Connect? Spark Connect is the only facility that adds the jar into a sessionUUID directory on the executor; when an executor runs tasks of a job from that session, it sets a ChildFirstClassLoader pointing to the sessionUUID directory.

Thank you

> Isolated classloader per spark session
>
>                 Key: SPARK-46046
>                 URL: https://issues.apache.org/jira/browse/SPARK-46046
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Faiz Halde
>            Priority: Major
[jira] [Created] (SPARK-46046) Isolated classloader per spark session
Faiz Halde created SPARK-46046: -- Summary: Isolated classloader per spark session Key: SPARK-46046 URL: https://issues.apache.org/jira/browse/SPARK-46046 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 3.5.0 Reporter: Faiz Halde Hello, we use Spark 3.5.0 and were wondering whether the following is achievable using spark-core. Our use case involves spinning up a Spark cluster wherein the driver application loads user jars on the fly (a user jar is not a Spark driver/application but merely a catalog of transformations). A single Spark application can load multiple jars in its lifetime, with potential classpath conflicts if care is not taken by the framework. The driver needs to load the jar, add the jar to the executors, and call a predefined class.method to trigger the transformation. Each transformation runs in its own Spark session inside the same Spark application. AFAIK, on the executor side, an isolated classloader per session is only possible when using the spark-connect facilities. Is it possible to do this without using Spark Connect? Spark Connect is the only facility that adds the jar into a sessionUUID directory on the executor; when an executor runs a job from that session, it sets a ChildFirstClassLoader pointing to the sessionUUID directory. Thank you
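The isolation the question describes boils down to a child-first classloader rooted at a per-session jar directory: the loader tries the session's jars before delegating to the executor's classpath. A minimal sketch of that pattern in plain Java (the class name and directory layout here are illustrative, not Spark's actual internals):

```java
import java.net.URL;
import java.net.URLClassLoader;

// A URLClassLoader that tries its own URLs before delegating to the parent.
// JDK classes (java.*) must always come from the parent loader.
class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name); // child first: look in the session's jars
                } catch (ClassNotFoundException ignored) {
                    // not in the session jars; fall through to parent delegation
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // parent (executor) classpath
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

In the pattern described, each session would get its own instance pointed at its sessionUUID directory, and the executor would set it as the thread context classloader while running that session's tasks, so two sessions can load conflicting versions of the same class without interfering.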
[jira] [Comment Edited] (SPARK-42471) Distributed ML <> spark connect
[ https://issues.apache.org/jira/browse/SPARK-42471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786292#comment-17786292 ] Faiz Halde edited comment on SPARK-42471 at 11/15/23 10:22 AM: --- Hello, our use-case requires us to use spark-connect and we have some of our jobs that used spark ML ( scala ). Is this Umbrella tracking the work required to make spark ml compatible with spark-connect? Because so far we've been struggling with this. May I know if this is already done and if there are docs on how to make this work? Thanks! was (Author: JIRAUSER300204): Hello, our use-case requires us to use spark-connect and we have some of our jobs that used spark ML. Is this Umbrella tracking the work required to make spark ml compatible with spark-connect? Because so far we've been struggling with this. May I know if this is already done and if there are docs on how to make this work? Thanks! > Distributed ML <> spark connect > --- > > Key: SPARK-42471 > URL: https://issues.apache.org/jira/browse/SPARK-42471 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Weichen Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45598: --- Description: Spark version 3.5.0 Spark Connect version 3.5.0 Delta table 3.0.0 Spark connect server was started using *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0}}* *{{--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" --conf 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* {{Connect client depends on}} *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0"* *and the connect libraries* When trying to run a simple job that writes to a delta table {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{val data = spark.read.json("profiles.json")}} {{data.write.format("delta").save("/tmp/delta")}} {{Error log in connect client}} {{Exception in thread "main" org.apache.spark.SparkException: io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} {{ at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} {{ at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} {{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} {{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} {{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} {{ 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} {{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} {{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} {{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} {{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} {{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} {{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} {{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} {{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} {{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} {{...}} {{ at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} {{ at org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} {{ at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} {{ at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}} {{ at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}} {{ at 
scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)}} {{ at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)}} {{ at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)}} {{ at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)}} {{ at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)}} {{ at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.to(GrpcExceptionConverter.scala:46)}} {{ at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)}} {{ at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)}} {{ at org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.toBuffer(GrpcExceptionConverter.scala:46)}} {{ at
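The `cannot assign instance of java.lang.invoke.SerializedLambda` failure is the classic symptom of a serializable lambda being deserialized in a classloader context that cannot properly resolve the lambda's capturing class: `SerializedLambda.readResolve` then fails to reconstitute the real lambda object before the field assignment. The underlying mechanics can be seen with plain Java serialization; this is a self-contained sketch of the round trip, not Spark's or Delta's actual code path:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

class LambdaRoundTrip {
    // A serializable lambda: on serialization the JVM writes a SerializedLambda
    // that points back at this capturing class by name.
    @SuppressWarnings("unchecked")
    static Function<Integer, Integer> roundTrip() {
        Function<Integer, Integer> inc =
            (Function<Integer, Integer> & Serializable) x -> x + 1;
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(inc);
            }
            // On deserialization, SerializedLambda.readResolve must locate the
            // capturing class (LambdaRoundTrip) via the stream's classloader.
            // Here driver and "executor" share one classloader, so it succeeds;
            // when the deserializing side cannot see the capturing class (e.g.
            // an executor that never loaded the client's code), the substitution
            // fails and the raw SerializedLambda ends up assigned to the field,
            // producing the ClassCastException reported in this issue.
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Function<Integer, Integer>) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

This is why the failure shows up only in the Connect setup: the lambda's defining classes live on the client, and the executor-side deserializer has to be able to find them.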
[jira] [Commented] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779843#comment-17779843 ] Faiz Halde commented on SPARK-45598: Hi, do we have any updates here? Happy to help > Delta table 3.0.0 not working with Spark Connect 3.5.0 > -- > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > 
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > 
org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)}} > {{ at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)}} > {{ at >
[jira] [Comment Edited] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1246#comment-1246 ] Faiz Halde edited comment on SPARK-45598 at 10/19/23 4:04 PM: -- Hi [~sdaberdaku] , corrected the title. I tested it with 3.0.0 delta. What I meant was, delta table does not work with {*}spark connect{*}. It does work with vanilla spark 3.5.0 otherwise was (Author: JIRAUSER300204): Hi [~sdaberdaku] , corrected the title. I tested it with 3.0.0 delta. What I meant was, delta table does not work with {*}spark connect{*}. It does work with spark 3.5.0 otherwise > Delta table 3.0.0 not working with Spark Connect 3.5.0 > -- > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, 
most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at >
[jira] [Commented] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1246#comment-1246 ] Faiz Halde commented on SPARK-45598: Hi [~sdaberdaku], corrected the title. I tested it with 3.0.0 delta. What I meant was, delta table does not work with {*}spark connect{*}. It does work with spark 3.5.0 otherwise. (The quoted issue body and stack trace are identical to those in the comment-edit notification above.)
[jira] [Updated] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45598: --- Summary: Delta table 3.0.0 not working with Spark Connect 3.5.0 (was: Delta table 3.0-rc2 not working with Spark Connect 3.5.0)
[jira] [Updated] (SPARK-45598) Delta table 3.0-rc2 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45598: --- Description: updated to include the connect-server start command and the client's dependencies. (The full description, repro snippet, and java.io.ObjectInputStream stack trace are identical to those quoted in the comment notifications above.)
[jira] [Updated] (SPARK-44526) Porting k8s PVC reuse logic to spark standalone
[ https://issues.apache.org/jira/browse/SPARK-44526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-44526: --- Affects Version/s: 3.4.1 (was: 3.5.0) > Porting k8s PVC reuse logic to spark standalone > --- > > Key: SPARK-44526 > URL: https://issues.apache.org/jira/browse/SPARK-44526 > Project: Spark > Issue Type: New Feature > Components: Shuffle, Spark Core >Affects Versions: 3.4.1 >Reporter: Faiz Halde >Priority: Major > > Hi, > This ticket is meant to understand the work that would be involved in porting > the k8s PVC reuse feature onto the spark standalone cluster manager, which > reuses the shuffle files already present locally on disk. > We are heavy users of spot instances and we suffer from spot terminations > impacting our long-running jobs. > The logic in `KubernetesLocalDiskShuffleExecutorComponents` itself is not > that much. However, when I tried this with > `LocalDiskShuffleExecutorComponents` it was not a successful experiment, which > suggests there is more to recovering shuffle files. > I'd like to understand what work would be involved. We'd be more > than happy to contribute.
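For context, the Kubernetes-side feature the ticket asks to port is driven by a small set of configuration properties. A minimal sketch of that configuration (property names as of Spark 3.4; verify them against the Spark-on-Kubernetes docs for your version):

```properties
# Driver creates and owns its executors' PVCs, and re-attaches them
# to replacement executors after a loss (e.g. a spot interruption)
spark.kubernetes.driver.ownPersistentVolumeClaim=true
spark.kubernetes.driver.reusePersistentVolumeClaim=true
# Recover shuffle files found on the reused volume instead of recomputing them
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO
```

Porting the feature would mean providing a standalone-mode equivalent of that last plugin, which scans the local disk for surviving shuffle files on executor startup.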
[jira] [Updated] (SPARK-44526) Porting k8s PVC reuse logic to spark standalone
[ https://issues.apache.org/jira/browse/SPARK-44526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-44526: --- Affects Version/s: 3.5.0 (was: 3.4.1)
[jira] [Created] (SPARK-45598) Delta table 3.0-rc2 not working with Spark Connect 3.5.0
Faiz Halde created SPARK-45598: -- Summary: Delta table 3.0-rc2 not working with Spark Connect 3.5.0 Key: SPARK-45598 URL: https://issues.apache.org/jira/browse/SPARK-45598 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Faiz Halde Spark version 3.5.0 Spark Connect version 3.5.0 Delta table 3.0-rc2 When trying to run a simple job that writes to a delta table {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{val data = spark.read.json("profiles.json")}} {{data.write.format("delta").save("/tmp/delta")}} {{Error log in connect client}} {{Exception in thread "main" org.apache.spark.SparkException: io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} (the java.io.ObjectInputStream deserialization frames are identical to those quoted in the notifications above, followed on the client side by the GrpcExceptionConverter frames through {{org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:210)}}, {{Main$.main(Main.scala:11)}} and {{Main.main(Main.scala)}}) {{Error log in spark connect
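The ClassCastException reported here is characteristic of Java lambda serialization: a lambda travels as a java.lang.invoke.SerializedLambda placeholder, and resolving it back into a scala.Function1 requires the capturing class to be visible to the deserializing classloader — which a Connect server that never received the client's classes does not have. A minimal round-trip sketch in a single JVM, where resolution succeeds (assumes Scala 2.12/2.13, whose lambdas are serializable by default; the object and method names are illustrative, not Spark APIs):

```scala
import java.io._

object LambdaRoundTrip {
  // Serialize and deserialize a function within one JVM, where the
  // capturing class is guaranteed to be on the classpath.
  def roundTrip(f: Int => Int): Int => Int = {
    val bytes = new ByteArrayOutputStream()
    val out   = new ObjectOutputStream(bytes)
    out.writeObject(f) // written as a java.lang.invoke.SerializedLambda placeholder
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    // Resolving the placeholder back into a Function1 needs the capturing class;
    // on a Connect server without the client's jar, this is where it fails.
    in.readObject().asInstanceOf[Int => Int]
  }

  def main(args: Array[String]): Unit =
    println(roundTrip(x => x + 1)(41))
}
```

In the reported setup the lambda backing the ScalaUDF was compiled into the client application, so the executor-side readObject produced a placeholder it could not resolve into a scala.Function1.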
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768040#comment-17768040 ] Faiz Halde edited comment on SPARK-45255 at 9/22/23 2:31 PM: - To get past the error org/sparkproject/connect/client/com/google/common/cache/CacheLoader even after adding the guava library, you need to copy their shading rules:

```
(assembly / assemblyShadeRules) := Seq(
  ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
  ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
  ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
  ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
  ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,
  ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll,
  ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll,
  ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll
),
```

was (Author: JIRAUSER300204): (same comment, with the typo "to get pas the error")

> Spark connect client failing with java.lang.NoClassDefFoundError
>
> Key: SPARK-45255
> URL: https://issues.apache.org/jira/browse/SPARK-45255
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Faiz Halde
> Priority: Major
>
> java 1.8, sbt 1.9, scala 2.12
>
> I have a very simple repo with the following dependency in `build.sbt`
> {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}}
> A simple application
> {{object Main extends App {}}
> {{  val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{  s.read.json("/tmp/input.json").repartition(10).show(false)}}
> {{}}}
> But when I run it, I get the following error
> {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
> {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
> {{ at Main$delayedInit$body.apply(Main.scala:3)}}
> {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
> {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
> {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
> {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
> {{ at scala.collection.immutable.List.foreach(List.scala:431)}}
> {{ at scala.App.main(App.scala:80)}}
> {{ at scala.App.main$(App.scala:78)}}
> {{ at Main$.main(Main.scala:3)}}
> {{ at Main.main(Main.scala)}}
> {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
> {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
> {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
> {{ 
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > EDIT > Not sure if it's the right mitigation but explicitly adding guava worked but > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM >
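Whether shading rules like these actually took effect can be checked by probing the assembled jar's classpath for the relocated class. A small, hypothetical helper (ShadeCheck is illustrative, not part of Spark or sbt-assembly):

```scala
// Hypothetical helper: report whether a fully-qualified class name is loadable
// from the current classpath.
object ShadeCheck {
  def present(fqcn: String): Boolean =
    try { Class.forName(fqcn); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    // Sanity checks: a JDK class is always loadable, a made-up one never is.
    println(present("java.util.ArrayList"))
    println(present("com.example.DoesNotExist"))
    // Run this against the assembled fat jar to probe the relocated guava class:
    // present("org.sparkproject.connect.client.com.google.common.cache.CacheLoader")
  }
}
```

If the probe for the relocated class returns false after `sbt assembly`, the shading rules were not applied to the dependency that provides it.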
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768040#comment-17768040 ] Faiz Halde edited comment on SPARK-45255 at 9/22/23 2:31 PM: - (Same comment as in the previous notification, with the class name org/sparkproject/connect/client/com/google/common/cache/CacheLoader now formatted as code; the shading rules and the quoted issue body are unchanged.)
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768040#comment-17768040 ] Faiz Halde commented on SPARK-45255: To get past the org/sparkproject/connect/client/com/google/common/cache/CacheLoader error even after adding the guava library, you need to copy the client's shading rules:
```
(assembly / assemblyShadeRules) := Seq(
  ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
  ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
  ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
  ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
  ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,
  ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll,
  ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll,
  ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll
),
```
> Spark connect client failing with java.lang.NoClassDefFoundError > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect > Affects Versions: 3.5.0 > Reporter: Faiz Halde > Priority: Major
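Putting the comment above together, a `build.sbt` for an sbt-assembly client project might look like the sketch below. This is an untested assumption assembled from the shading rules quoted in this thread, not an official recipe; the guava version is an illustrative placeholder, and sbt-assembly is assumed to be enabled in `project/plugins.sbt`.

```scala
// build.sbt sketch (assumes the sbt-assembly plugin is enabled).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0",
  // Explicitly adding guava was reported above as a partial mitigation;
  // the version here is a placeholder, not one confirmed in this thread.
  "com.google.guava" % "guava" % "32.1.2-jre"
)

// Shade rules copied from the comment above, so references like
// org.sparkproject.connect.client.com.google.common.cache.CacheLoader resolve
// in the assembled jar.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
  ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
  ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
  ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
  ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,
  ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll,
  ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll,
  ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll
)
```

Note that shading at the application level only relocates classes inside the assembled jar; running the app directly from sbt or an IDE classpath would not apply these renames.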
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768037#comment-17768037 ] Faiz Halde commented on SPARK-45255: For now, I unblocked myself by manually building spark connect:
```
build/mvn -Pconnect -DskipTests clean package
mkdir connect-jars
./bin/spark-connect-scala-client-classpath | tr ':' '\n' | xargs -I{} cp {} connect-jars
```
Then, in your client application, have the connect-jars directory in your classpath. Not sure if this is the right way though.
> Spark connect client failing with java.lang.NoClassDefFoundError > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect > Affects Versions: 3.5.0 > Reporter: Faiz Halde > Priority: Major
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768037#comment-17768037 ] Faiz Halde edited comment on SPARK-45255 at 9/22/23 2:29 PM: - For now, I unblocked myself by manually building spark connect:
```
build/mvn -Pconnect -DskipTests clean package
mkdir connect-jars
./bin/spark-connect-scala-client-classpath | tr ':' '\n' | xargs -I{} cp {} connect-jars
```
Then, when starting your client application, have the connect-jars directory in your classpath. Not sure if this is the right way though.
was (Author: JIRAUSER300204): the same workaround, phrased "Then, in your client application" instead of "Then, when starting your client application".
> Spark connect client failing with java.lang.NoClassDefFoundError > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect > Affects Versions: 3.5.0 > Reporter: Faiz Halde > Priority: Major
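The manual workaround above boils down to splitting a ':'-separated classpath and copying each jar into a single directory. A small shell sketch of that step, using stand-in jar files in a temp directory instead of the real `./bin/spark-connect-scala-client-classpath` output (the `connect-jars` target name follows the comment above):

```shell
#!/bin/sh
set -e
# Stand-ins for the jars the real classpath script would print; in the
# actual workaround the list comes from:
#   ./bin/spark-connect-scala-client-classpath
workdir=$(mktemp -d)
touch "$workdir/spark-connect-client.jar" "$workdir/guava.jar"
classpath="$workdir/spark-connect-client.jar:$workdir/guava.jar"

# Split the classpath on ':' and copy every entry into connect-jars/
mkdir -p "$workdir/connect-jars"
echo "$classpath" | tr ':' '\n' | xargs -I{} cp {} "$workdir/connect-jars"
```

The `tr ':' '\n'` turns the single classpath line into one path per line, which `xargs -I{}` then feeds to `cp` one entry at a time.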
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: java 1.8, sbt 1.9, scala 2.12 I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. 
It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) EDIT Not sure if it's the right mitigation but explicitly adding guava worked but now I am in the 2nd territory of error {{Sep 21, 2023 8:21:59 PM org.sparkproject.connect.client.io.grpc.NameResolverRegistry getDefaultRegistry}} {{WARNING: No NameResolverProviders found via ServiceLoader, including for DNS. This is probably due to a broken build. If using ProGuard, check your configuration}} {{Exception in thread "main" org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}} {{ at org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}} {{ at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}} {{ at scala.Option.getOrElse(Option.scala:189)}} {{ at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} 
{{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} {{ at org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry.newChannelBuilder(ManagedChannelRegistry.java:179)}} {{ at org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry.newChannelBuilder(ManagedChannelRegistry.java:155)}} {{ at org.sparkproject.connect.client.io.grpc.Grpc.newChannelBuilder(Grpc.java:101)}} {{ at org.sparkproject.connect.client.io.grpc.Grpc.newChannelBuilderForAddress(Grpc.java:111)}} {{ at org.apache.spark.sql.connect.client.SparkConnectClient$Configuration.createChannel(SparkConnectClient.scala:633)}} {{ at
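The `ManagedChannelRegistry$ProviderNotFoundException` above is gRPC's own hint that no transport implementation was found on the classpath. A hedged guess, suggested by the error text itself rather than confirmed anywhere in this thread, is to add one of the listed transport artifacts to the build; the version below is a placeholder, and under the shading rules discussed in this thread the artifact's `io.grpc` packages would need the same rename applied to match the shaded client:

```scala
// Hypothetical sbt dependency; grpc-netty version is a placeholder.
libraryDependencies += "io.grpc" % "grpc-netty" % "1.56.0"
```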
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: unchanged except for adding the environment line "java 1.8, sbt 1.9, scala 2.12" at the top; full text quoted earlier in this thread.
> Spark connect client failing with java.lang.NoClassDefFoundError > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect > Affects Versions: 3.5.0 > Reporter: Faiz Halde > Priority: Major
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Issue Type: Bug (was: New Feature)
> Spark connect client failing with java.lang.NoClassDefFoundError > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect > Affects Versions: 3.5.0 > Reporter: Faiz Halde > Priority: Major
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: unchanged except for dropping the closing question "BTW it did work if I copied the exact shading rules in my project but I wonder if that's the right thing to do?"; full text quoted earlier in this thread.
> Spark connect client failing with java.lang.NoClassDefFoundError > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: New Feature > Components: Connect > Affects Versions: 3.5.0 > Reporter: Faiz Halde > Priority: Major
[jira] [Updated] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45255: --- Description: I have a very simple repo with the following dependency in `build.sbt` ``` {{libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")}} ``` A simple application ``` {{object Main extends App {}} {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} {{}}} ``` But when I run it, I get the following error ``` {{Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} {{ at Main$delayedInit$body.apply(Main.scala:3)}} {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} {{ at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} {{ at scala.collection.immutable.List.foreach(List.scala:431)}} {{ at scala.App.main(App.scala:80)}} {{ at scala.App.main$(App.scala:78)}} {{ at Main$.main(Main.scala:3)}} {{ at Main.main(Main.scala)}} {{Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} {{ ... 11 more}} ``` I know the connect does a bunch of shading during assembly so it could be related to that. This application is not started via spark-submit or anything. It's not run neither under a `SPARK_HOME` ( I guess that's the whole point of connect client ) I followed the doc exactly as described. Can somebody help? 
BTW, it did work when I copied the exact shading rules into my project, but I wonder whether that's the right thing to do.
[jira] [Created] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
Faiz Halde created SPARK-45255: -- Summary: Spark connect client failing with java.lang.NoClassDefFoundError Key: SPARK-45255 URL: https://issues.apache.org/jira/browse/SPARK-45255 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Faiz Halde
I have a very simple repo with the following dependency in `build.sbt`:
```
libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")
```
A simple application:
```
object Main extends App {
  val s = SparkSession.builder().remote("sc://localhost").getOrCreate()
  s.read.json("/tmp/input.json").repartition(10).show(false)
}
```
But when I run it, I get the following error:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader
  at Main$.delayedEndpoint$Main$1(Main.scala:4)
  at Main$delayedInit$body.apply(Main.scala:3)
  at scala.Function0.apply$mcV$sp(Function0.scala:39)
  at scala.Function0.apply$mcV$sp$(Function0.scala:39)
  at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
  at scala.App.$anonfun$main$1$adapted(App.scala:80)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at scala.App.main(App.scala:80)
  at scala.App.main$(App.scala:78)
  at Main$.main(Main.scala:3)
  at Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader
  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 11 more
```
I know the Spark Connect client does a bunch of shading during assembly, so it could be related to that. This application is not started via spark-submit, nor is it run under a `SPARK_HOME` (I guess that's the whole point of the connect client). I followed the doc exactly as described. Can somebody help?
BTW, it did work when I copied the exact shading rules into my project, but I wonder whether that's the right thing to do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44526) Porting k8s PVC reuse logic to spark standalone
[ https://issues.apache.org/jira/browse/SPARK-44526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-44526: --- Description:
Hi, this ticket is meant to understand the work that would be involved in porting the k8s PVC reuse feature, which reuses the shuffle files already present on local disk, to the Spark standalone cluster manager.
We are heavy users of spot instances and suffer from spot terminations impacting our long-running jobs.
The logic in `KubernetesLocalDiskShuffleDataIO` itself is not that extensive. However, when I tried the same on `LocalDiskShuffleExecutorComponents`, the experiment was not successful, which suggests there is more to it.
I'd like to understand what work would be involved for this.
We'll be more than happy to contribute.
[jira] [Created] (SPARK-44526) Porting k8s PVC reuse logic to spark standalone
Faiz Halde created SPARK-44526: -- Summary: Porting k8s PVC reuse logic to spark standalone Key: SPARK-44526 URL: https://issues.apache.org/jira/browse/SPARK-44526 Project: Spark Issue Type: New Feature Components: Shuffle, Spark Core Affects Versions: 3.4.1 Reporter: Faiz Halde
Hi, this ticket is meant to understand the work that would be involved in porting the PVC reuse feature to the Spark standalone cluster manager.
The logic in `KubernetesLocalDiskShuffleDataIO` itself is not that extensive. However, when I tried the same on `LocalDiskShuffleExecutorComponents`, the experiment was not successful, which suggests there is more to it.
I'd like to understand what work would be involved for this. We'll be more than happy to contribute.
[jira] [Created] (SPARK-43426) Introducing a new SparkListenerEvent for jobs running in AQE mode
Faiz Halde created SPARK-43426: -- Summary: Introducing a new SparkListenerEvent for jobs running in AQE mode Key: SPARK-43426 URL: https://issues.apache.org/jira/browse/SPARK-43426 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.4.0 Reporter: Faiz Halde
Prior to AQE, we used to rely on SparkListenerJobEnd to figure out when a job had completed from the point of view of Spark's scheduler. Unfortunately, with AQE a single action is split into multiple jobs, and there is no way for us to figure out when the overall job is complete.
We have a listener that tracks some Spark metrics. To establish the right relations among stages, jobs, and tasks, we had to keep a mapping around. We used the SparkListenerJobEnd event to clear out these entries, but with AQE we cannot know the right time to remove the entries we track.
What would you suggest for this? Would a SparkListenerJobGroupEnd be a rational addition? Because our jobs have a single action, we can probably rely on stage metrics: if a completed stage had output metrics > 0, that is a useful indicator that the next job end was likely the last job in that job group.
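The heuristic described above can be sketched as a small state machine over listener-style events. This is a hypothetical illustration, not Spark's API: events are modeled as plain method calls carrying a job-group id, and the "output metrics > 0" signal stands in for the stage-level output metrics a real SparkListener would read. It only holds for job groups with a single action, as the ticket notes.

```python
class JobGroupTracker:
    """Clears per-job-group state using the heuristic from the ticket:
    once a completed stage in a group has written output, the next job end
    in that group is treated as the group's final job (single-action only)."""

    def __init__(self):
        self.jobs_by_group = {}       # group id -> set of active job ids
        self.group_wrote_output = {}  # group id -> saw a stage with output

    def on_job_start(self, group, job_id):
        self.jobs_by_group.setdefault(group, set()).add(job_id)

    def on_stage_completed(self, group, output_bytes):
        # A stage with output metrics > 0 marks the group as "finishing".
        if output_bytes > 0:
            self.group_wrote_output[group] = True

    def on_job_end(self, group, job_id):
        self.jobs_by_group.get(group, set()).discard(job_id)
        if self.group_wrote_output.get(group):
            # Likely the last AQE-generated job: safe to drop tracked state.
            self.jobs_by_group.pop(group, None)
            self.group_wrote_output.pop(group, None)
            return True
        return False
```

With AQE enabled, intermediate jobs end without the group having written output, so `on_job_end` returns False and the mappings are retained until the output-producing stage completes.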
[jira] [Updated] (SPARK-43408) Spark caching in the context of a single job
[ https://issues.apache.org/jira/browse/SPARK-43408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-43408: --- Description:
Does caching benefit a Spark job with only a single action in it? Spark, IIRC, already optimizes shuffles by persisting them to disk.
I am unable to find a counter-example where caching would benefit a job with a single action. In every case I can think of, the shuffle checkpoint acts as a good-enough caching mechanism in itself.
FWIW, I am talking specifically in the context of the Dataframe API. The StorageLevel allowed in my case is DISK_ONLY, i.e. I am not looking to speed things up by caching data in memory.
To rephrase: in the context of a single action, is DISK_ONLY caching better than, or the same as, shuffle checkpointing?
[jira] [Created] (SPARK-43408) Spark caching in the context of a single job
Faiz Halde created SPARK-43408: -- Summary: Spark caching in the context of a single job Key: SPARK-43408 URL: https://issues.apache.org/jira/browse/SPARK-43408 Project: Spark Issue Type: Question Components: Shuffle Affects Versions: 3.3.1 Reporter: Faiz Halde
Does caching benefit a Spark job with only a single action in it? Spark, IIRC, already optimizes shuffles by persisting them to disk.
I am unable to find a counter-example where caching would benefit a job with a single action. In every case I can think of, the shuffle checkpoint acts as a good-enough caching mechanism in itself.
[jira] [Created] (SPARK-43407) Can executors recover/reuse shuffle files upon failure?
Faiz Halde created SPARK-43407: -- Summary: Can executors recover/reuse shuffle files upon failure? Key: SPARK-43407 URL: https://issues.apache.org/jira/browse/SPARK-43407 Project: Spark Issue Type: Question Components: Spark Core Affects Versions: 3.3.1 Reporter: Faiz Halde
Hello, we've been in touch with a few Spark specialists who suggested a potential solution to improve the reliability of our shuffle-heavy jobs. Here is what our setup looks like:
* Spark version: 3.3.1
* Java version: 1.8
* We do not use the external shuffle service
* We use spot instances
We run Spark jobs on clusters that use Amazon EBS volumes, and spark.local.dir is mounted on the EBS volume. One of the offerings from the service we use is EBS migration: if a host is about to get evicted, a new host is created and the EBS volume is attached to it.
The claim is that when Spark assigns a new executor to the newly created instance, it can recover all the shuffle files already persisted on the migrated EBS volume. Is this how it works? Do executors recover / re-register the shuffle files that they find? So far I have not come across any recovery mechanism. I can only see `KubernetesLocalDiskShuffleDataIO`, which has a pre-init step where it tries to register the available shuffle files to itself.
A natural follow-up: if what they claim is true, then ideally we should expect that when an executor is killed/OOM'd and a new executor is spawned on the same host, the new executor registers the shuffle files to itself. Is that so?
Thanks -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
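Conceptually, the pre-init recovery step mentioned above amounts to scanning the executor's local directories for surviving shuffle files and rebuilding an index from their names. Here is a minimal, hypothetical sketch in Python; the `shuffle_<shuffleId>_<mapId>_<reduceId>.data/.index` naming mirrors Spark's usual on-disk layout, but treat the exact pattern and the whole function as an illustration, not Spark's actual recovery code:

```python
import os
import re

# Hypothetical recovery scan, NOT Spark's implementation: walk the local
# dirs (spark.local.dir) and index any shuffle files that survived the
# executor restart, keyed by shuffle id, so they could be re-registered.
SHUFFLE_FILE = re.compile(r"shuffle_(\d+)_(\d+)_\d+\.(?:data|index)$")

def scan_local_dirs(local_dirs):
    """Return {shuffle_id: [paths]} for shuffle files found under local_dirs."""
    recovered = {}
    for d in local_dirs:
        for root, _dirs, files in os.walk(d):
            for name in files:
                m = SHUFFLE_FILE.match(name)
                if m:
                    recovered.setdefault(int(m.group(1)), []).append(
                        os.path.join(root, name))
    return recovered
```

In Spark itself this role is played by `KubernetesLocalDiskShuffleDataIO`, which performs a comparable scan-and-register step before the executor starts accepting tasks; a standalone-mode port would need an equivalent hook.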