[ https://issues.apache.org/jira/browse/SPARK-34583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kondziolka9ld updated SPARK-34583:
----------------------------------
Description:

Please consider the following scenario:
{code:scala}
scala> abstract class SomeAbstractClass {
     |   type SomeTypeMember
     | }
defined class SomeAbstractClass

scala> class SomeSpecificClass extends SomeAbstractClass {
     |   override type SomeTypeMember = Int
     | }
defined class SomeSpecificClass

scala> def someFunction(someInstance: SomeAbstractClass): Any = {
     |   udf { _: someInstance.SomeTypeMember => 42 }
     | }
someFunction: (someInstance: SomeAbstractClass)Any

scala> someFunction(new SomeSpecificClass)
java.lang.NoClassDefFoundError: no Java class corresponding to someInstance.SomeTypeMember found
  at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToJavaClass(JavaMirrors.scala:1354)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:227)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:68)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56)
  at org.apache.spark.sql.functions$.$anonfun$udf$1(functions.scala:4509)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.sql.functions$.udf(functions.scala:4509)
  at someFunction(<console>:25)
  ... 47 elided
{code}
On *spark-2.4.7* the same code works. I suspect it is related to this note from the migration guide:
{quote}
In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using it. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and return 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
{quote}
[https://spark.apache.org/docs/latest/sql-migration-guide.html#udfs-and-built-in-functions]

Does Spark try to do some type inference here? At the point where `udf` is called, `someInstance.SomeTypeMember` has no corresponding Java class, since the type member is only bound to a concrete type in the subclass.

One workaround could be a runtime cast, something like (see also the two sketches appended below):
{code:scala}
udf { param: Any =>
  // ...
  param.asInstanceOf[someInstance.SomeTypeMember]
  // ...
}
{code}
----
I classified this as a bug since it worked on previous versions of Spark; however, it may be working as designed.
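As a cross-check, the legacy untyped API mentioned in the quoted migration note should sidestep the problem, since it takes the closure as {{AnyRef}} plus an output {{DataType}} and never derives an encoder for the path-dependent input type. A minimal, untested sketch, assuming a running {{SparkSession}} named {{spark}}:
{code:scala}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Opt back in to the Spark 2.4-style untyped Scala UDF (off by default in 3.0).
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")

def someFunction(someInstance: SomeAbstractClass) =
  // The untyped variant does not ask the reflection mirror for a Java class
  // of someInstance.SomeTypeMember, so the NoClassDefFoundError path is avoided.
  udf((_: Any) => 42, IntegerType)
{code}
Note the usual caveat from the migration guide: with this variant a null input to a primitive-typed closure behaves differently in 3.0 than in 2.4.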
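Alternatively, the error can be avoided without casts by surfacing the type member as a type parameter, so that a concrete class ({{Int}} here) is in scope when the typed {{udf}} derives its encoder. This is my own restructuring of the reproduction above, not an existing Spark API, and an untested sketch:
{code:scala}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import scala.reflect.runtime.universe.TypeTag

// The refinement ties the abstract type member to the parameter A, so at the
// call site A is inferred as Int and a TypeTag[Int] is available to udf.
def someFunction[A: TypeTag](
    someInstance: SomeAbstractClass { type SomeTypeMember = A }): UserDefinedFunction =
  udf { (_: A) => 42 }

someFunction(new SomeSpecificClass) // A = Int, encoder resolves
{code}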
> typed udf fails when it refers to type member in abstract class
> ---------------------------------------------------------------
>
>                 Key: SPARK-34583
>                 URL: https://issues.apache.org/jira/browse/SPARK-34583
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: kondziolka9ld
>            Priority: Minor