[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768618#comment-17768618 ]

Aleksandr Aleksandrov commented on SPARK-45255:
-----------------------------------------------

Yes, thanks. After adding shading rules the error is gone. Still, it seems like the wrong approach that I have to add extra libraries with shading rules just to use the Spark Connect client...

> Spark connect client failing with java.lang.NoClassDefFoundError
> ----------------------------------------------------------------
>
>                 Key: SPARK-45255
>                 URL: https://issues.apache.org/jira/browse/SPARK-45255
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Faiz Halde
>            Priority: Major
>
> java 1.8, sbt 1.9, scala 2.12
>
> I have a very simple repo with the following dependency in build.sbt:
> {code:scala}
> libraryDependencies ++= Seq("org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0")
> {code}
> A simple application:
> {code:scala}
> object Main extends App {
>   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()
>   s.read.json("/tmp/input.json").repartition(10).show(false)
> }
> {code}
> But when I run it, I get the following error:
> {code}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader
> 	at Main$.delayedEndpoint$Main$1(Main.scala:4)
> 	at Main$delayedInit$body.apply(Main.scala:3)
> 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
> 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
> 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
> 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
> 	at scala.collection.immutable.List.foreach(List.scala:431)
> 	at scala.App.main(App.scala:80)
> 	at scala.App.main$(App.scala:78)
> 	at Main$.main(Main.scala:3)
> 	at Main.main(Main.scala)
> Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> 	... 11 more
> {code}
> I know the connect client does a bunch of shading during assembly, so it could be related to that. This application is not started via spark-submit, nor is it run under a SPARK_HOME (I guess that's the whole point of the connect client).
>
> EDIT
> Not sure if it's the right mitigation, but explicitly adding guava worked. Now I am in the second territory of errors:
> {code}
> Sep 21, 2023 8:21:59 PM org.sparkproject.connect.client.io.grpc.NameResolverRegistry getDefaultRegistry
> WARNING: No NameResolverProviders found via ServiceLoader, including for DNS. This is probably due to a broken build. If using ProGuard, check your configuration
> Exception in thread "main" org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
> 	at org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)
> 	at org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)
> 	at org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)
> 	at org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
> 	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)
> 	at scala.Option.getOrElse(Option.scala:189)
> 	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)
> 	at Main$.delayedEndpoint$Main$1(Main.scala:4)
> 	at Main$delayedInit$body.apply(Main.scala:3)
> 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
> 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
> 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
> 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
> 	at scala.collection.immutable.List.foreach(List.scala:431)
> 	at scala.App.main(App.scala:80)
> 	at scala.App.main$(App.scala:78)
> 	at Main$.main(Main.scala:3)
> 	at Main.main(Main.scala)
> Caused by:
> {code}
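For reference, below is a hedged sketch of the kind of build.sbt changes the reporter and the commenter describe: explicitly adding guava and a gRPC transport, and shading them into the org.sparkproject.connect.client prefix that the client's relocated references point at. The sbt-assembly usage, the artifact versions, and the exact rename patterns are assumptions for illustration, not a confirmed fix.

{code:scala}
// build.sbt -- hypothetical sketch, assuming the sbt-assembly plugin is enabled
// and the app is packaged with `sbt assembly`. Versions are illustrative.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0",
  // Classes the shaded client references but does not bundle (per this report):
  "com.google.guava" %  "guava"      % "32.1.2-jre",
  // The ProviderNotFoundException above names grpc-okhttp, grpc-netty,
  // or grpc-netty-shaded as candidates; grpc-netty is used here:
  "io.grpc"          %  "grpc-netty" % "1.56.0"
)

// Relocate the added jars to match the client's shaded package prefix.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.com.google.common.@1").inAll,
  ShadeRule.rename("io.grpc.**"           -> "org.sparkproject.connect.client.io.grpc.@1").inAll
)

// The NameResolver warning points at ServiceLoader metadata being dropped;
// concatenating META-INF/services entries keeps the gRPC providers visible.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
{code}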
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768005#comment-17768005 ]

Aleksandr Aleksandrov commented on SPARK-45255:
-----------------------------------------------

I have the same issue, but adding the guava dependency didn't help me:
{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader
	at ...
Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	... 2 more
{code}
[jira] [Updated] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Aleksandrov updated SPARK-44040:
------------------------------------------
    Description:

When I call count after distinct on a nullable Decimal field, Spark returns an incorrect result starting from Spark 3.4.0. A minimal example to reproduce:

{code:scala}
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(
  Array(
    StructField("money", DecimalType(38,6), true),
    StructField("reference_id", StringType, true)
  ))

val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)
val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2"))
val unionDF: DataFrame = aggDf.union(aggDf1)

unionDF.select("money").distinct.show       // returns the correct result
unionDF.select("money").distinct.count      // returns 2 instead of 1
unionDF.select("money").distinct.count == 1 // returns false
{code}

This block of code also prints an assertion error, after which the count is incorrect (on Spark 3.2.1 everything works fine and I get the correct result, 1):

{code}
scala> unionDF.select("money").distinct.show // return correct result
java.lang.AssertionError: assertion failed:
  Decimal$DecimalIsFractional
     while compiling: <console>
        during phase: globalPhase=terminal, enteringPhase=jvm
     library version: version 2.12.17
    compiler version: version 2.12.17
  reconstructed args: -classpath /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar -Yrepl-class-based -Yrepl-outdir /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1

  last tree to typer: TypeTree(class Byte)
       tree position: line 6 of <console>
            tree tpe: Byte
              symbol: (final abstract) class Byte in package scala
   symbol definition: final abstract class Byte extends (a ClassSymbol)
      symbol package: scala
       symbol owners: class Byte
           call site: constructor $eval in object $eval in package $line19

== Source file context for tree position ==

     3
     4 object $eval {
     5   lazy val $result = $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
     6   lazy val $print: _root_.java.lang.String = {
     7     $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
     8
     9     ""
	at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
	at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
	at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
	at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
	at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
	at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
	at scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
	at scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
	at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
	at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
	at scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
	at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
	at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
	at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
	at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
	at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
	at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
	at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96)
	at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88)
	at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47)
	at scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1173)
	at scala.tools.nsc.symtab.classfile.ClassfileParser.parseClass(ClassfileParser.scala:467)
	at scala.tools.nsc.symtab.classfile.ClassfileParser.$anonfun$parse$2(ClassfileParser.scala:160)
	at ...
{code}
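Since the snippet above was pasted from spark-shell (it relies on the implicit spark and sc), here is a self-contained version of the same repro as a standalone app. The object name, the local[*] master, and the explicit imports are additions for illustration; the DataFrame logic is unchanged from the description.

{code:scala}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{lit, sum}
import org.apache.spark.sql.types.{DecimalType, StringType, StructField, StructType}

// Hypothetical standalone wrapper around the repro from this issue.
object Spark44040Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-44040").getOrCreate()

    val schema = StructType(Array(
      StructField("money", DecimalType(38, 6), nullable = true),
      StructField("reference_id", StringType, nullable = true)
    ))

    // Empty input, so each sum("money") aggregate yields a single null Decimal row.
    val payDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    val aggDf  = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
    val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2"))
    val unionDF: DataFrame = aggDf.union(aggDf1)

    unionDF.select("money").distinct.show()            // shows a single null row
    println(unionDF.select("money").distinct.count())  // expected 1; the report sees 2 on 3.4.0

    spark.stop()
  }
}
{code}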
[jira] [Created] (SPARK-44040) Incorrect result after count distinct
Aleksandr Aleksandrov created SPARK-44040:
---------------------------------------------

             Summary: Incorrect result after count distinct
                 Key: SPARK-44040
                 URL: https://issues.apache.org/jira/browse/SPARK-44040
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Aleksandr Aleksandrov