[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-25 Thread Aleksandr Aleksandrov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768618#comment-17768618
 ] 

Aleksandr Aleksandrov commented on SPARK-45255:
---

Yes, thanks. After adding shading rules the error is gone. Still, it seems like the wrong 
approach that I have to pull in extra libraries and shading rules just to use 
the Spark Connect client...
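
For reference, a minimal build.sbt sketch of the kind of shading setup the comment above refers to. It assumes the sbt-assembly plugin is enabled; the Guava/grpc artifacts, their versions, and the exact relocation list are assumptions for illustration, not the rules actually used here.

{code:scala}
// build.sbt (sketch) — relocate Guava (and grpc) into the prefix that
// spark-connect-client-jvm 3.5.0 was compiled against, so that references to
// org.sparkproject.connect.client.* resolve in the assembled jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0",
  "com.google.guava"  % "guava"                    % "32.0.1-jre", // assumed version
  "io.grpc"           % "grpc-netty"               % "1.56.0"      // assumed version
)

assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" ->
    "org.sparkproject.connect.client.com.google.common.@1").inAll,
  // grpc relocation may also need its META-INF/services entries carried over;
  // whether this rule is required depends on the build.
  ShadeRule.rename("io.grpc.**" ->
    "org.sparkproject.connect.client.io.grpc.@1").inAll
)
{code}

Note that the relocation only takes effect in the jar produced by sbt assembly; running the application with sbt run does not apply the shade rules.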

> Spark connect client failing with java.lang.NoClassDefFoundError
> 
>
> Key: SPARK-45255
> URL: https://issues.apache.org/jira/browse/SPARK-45255
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Faiz Halde
>Priority: Major
>
> java 1.8, sbt 1.9, scala 2.12
>  
> I have a very simple repo with the following dependency in `build.sbt`
> ```
> {{libraryDependencies ++= Seq("org.apache.spark" %% 
> "spark-connect-client-jvm" % "3.5.0")}}
> ```
> A simple application
> ```
> {{object Main extends App {}}
> {{   val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{   s.read.json("/tmp/input.json").repartition(10).show(false)}}
> {{}}}
> ```
> But when I run it, I get the following error
>  
> ```
> {{Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/connect/client/com/google/common/cache/CacheLoader}}
> {{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
> {{    at Main$delayedInit$body.apply(Main.scala:3)}}
> {{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
> {{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
> {{    at 
> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
> {{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
> {{    at scala.collection.immutable.List.foreach(List.scala:431)}}
> {{    at scala.App.main(App.scala:80)}}
> {{    at scala.App.main$(App.scala:78)}}
> {{    at Main$.main(Main.scala:3)}}
> {{    at Main.main(Main.scala)}}
> {{Caused by: java.lang.ClassNotFoundException: 
> org.sparkproject.connect.client.com.google.common.cache.CacheLoader}}
> {{    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}}
> {{    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}}
> {{    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}}
> {{    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}}
> {{    ... 11 more}}
> ```
> I know Spark Connect does a bunch of shading during assembly, so it could be 
> related to that. This application is not started via spark-submit or 
> anything, nor is it run under a `SPARK_HOME` (I guess that's the whole point 
> of the Connect client).
>  
> EDIT
> Not sure if it's the right mitigation, but explicitly adding guava worked; 
> now I am in a second class of error:
> {{Sep 21, 2023 8:21:59 PM 
> org.sparkproject.connect.client.io.grpc.NameResolverRegistry 
> getDefaultRegistry}}
> {{WARNING: No NameResolverProviders found via ServiceLoader, including for 
> DNS. This is probably due to a broken build. If using ProGuard, check your 
> configuration}}
> {{Exception in thread "main" 
> org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException:
>  No functional channel service provider found. Try adding a dependency on the 
> grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}}
> {{    at 
> org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}}
> {{    at 
> org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}}
> {{    at 
> org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}}
> {{    at 
> org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}}
> {{    at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}}
> {{    at scala.Option.getOrElse(Option.scala:189)}}
> {{    at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}}
> {{    at Main$.delayedEndpoint$Main$1(Main.scala:4)}}
> {{    at Main$delayedInit$body.apply(Main.scala:3)}}
> {{    at scala.Function0.apply$mcV$sp(Function0.scala:39)}}
> {{    at scala.Function0.apply$mcV$sp$(Function0.scala:39)}}
> {{    at 
> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}}
> {{    at scala.App.$anonfun$main$1$adapted(App.scala:80)}}
> {{    at scala.collection.immutable.List.foreach(List.scala:431)}}
> {{    at scala.App.main(App.scala:80)}}
> {{    at scala.App.main$(App.scala:78)}}
> {{    at Main$.main(Main.scala:3)}}
> {{    at Main.main(Main.scala)}}
> {{Caused by: 
> 

[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError

2023-09-22 Thread Aleksandr Aleksandrov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768005#comment-17768005
 ] 

Aleksandr Aleksandrov commented on SPARK-45255:
---

I have the same issue, but adding a guava dependency didn't help me:


 
{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader
    at ...
Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 2 more{code}
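
One quick way to narrow this down (a sketch, not taken from this ticket) is to check whether the relocated Guava class the client expects is visible on the runtime classpath at all; the object name below is made up, the class name is the one from the stack trace:

{code:scala}
// Classpath probe (sketch): prints whether the shaded CacheLoader class that
// spark-connect-client-jvm references can be loaded by the application classloader.
object ShadedClassCheck extends App {
  val cls = "org.sparkproject.connect.client.com.google.common.cache.CacheLoader"
  try {
    Class.forName(cls)
    println(s"found: $cls")
  } catch {
    case _: ClassNotFoundException => println(s"missing from classpath: $cls")
  }
}
{code}

If it prints "missing", the shaded classes are not present in any jar on the classpath, which points at packaging rather than at the application code.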
 


[jira] [Updated] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Aleksandr Aleksandrov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Aleksandrov updated SPARK-44040:
--
Description: 
When I call count after distinct on a Decimal column that contains only null values, 
Spark returns an incorrect result starting from Spark 3.4.0.
A minimal example to reproduce:

{code:scala}
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.functions.{lit, sum}  // needed outside spark-shell

val schema = StructType(Array(
  StructField("money", DecimalType(38, 6), true),
  StructField("reference_id", StringType, true)
))

val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema)

val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1"))
val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", lit("df2"))
val unionDF: DataFrame = aggDf.union(aggDf1)

unionDF.select("money").distinct.show       // returns the correct result
unionDF.select("money").distinct.count      // returns 2 instead of 1
unionDF.select("money").distinct.count == 1 // returns false
{code}
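
A small follow-up check (a sketch in the same session) that makes the discrepancy visible without relying on show(); both numbers should be 1, since every money value produced above is NULL:

{code:scala}
// Both rows in unionDF have money = NULL (sum over an empty DataFrame),
// so there is exactly one distinct value.
val distinctMoney = unionDF.select("money").distinct()
println(distinctMoney.collect().length) // expected: 1
println(distinctMoney.count())          // expected: 1, but reported as 2 on Spark 3.4.0
{code}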


Running the repro also prints an assertion error, and after that the count is 
incorrect (on Spark 3.2.1 everything works fine and I get the correct result, 1):

*scala> unionDF.select("money").distinct.show // return correct result*
java.lang.AssertionError: assertion failed:
Decimal$DecimalIsFractional
while compiling: 
during phase: globalPhase=terminal, enteringPhase=jvm
library version: version 2.12.17
compiler version: version 2.12.17
reconstructed args: -classpath 
/Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar
 -Yrepl-class-based -Yrepl-outdir 
/private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1

last tree to typer: TypeTree(class Byte)
tree position: line 6 of 
tree tpe: Byte
symbol: (final abstract) class Byte in package scala
symbol definition: final abstract class Byte extends (a ClassSymbol)
symbol package: scala
symbol owners: class Byte
call site: constructor $eval in object $eval in package $line19

== Source file context for tree position ==

     3
     4  object $eval {
     5    lazy val $result = $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
     6    lazy val $print: _root_.java.lang.String = {
     7      $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
     8
     9      ""
at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
at scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
at scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
at scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96)
at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88)
at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47)
at scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1173)
at scala.tools.nsc.symtab.classfile.ClassfileParser.parseClass(ClassfileParser.scala:467)
at scala.tools.nsc.symtab.classfile.ClassfileParser.$anonfun$parse$2(ClassfileParser.scala:160)
at 

[jira] [Created] (SPARK-44040) Incorrect result after count distinct

2023-06-13 Thread Aleksandr Aleksandrov (Jira)
Aleksandr Aleksandrov created SPARK-44040:
-

 Summary: Incorrect result after count distinct
 Key: SPARK-44040
 URL: https://issues.apache.org/jira/browse/SPARK-44040
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Aleksandr Aleksandrov

