kondziolka9ld created SPARK-34583:
-------------------------------------

             Summary: typed udf fails when it refers to a type member in an abstract class
                 Key: SPARK-34583
                 URL: https://issues.apache.org/jira/browse/SPARK-34583
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: kondziolka9ld


Please consider the following scenario:
{code:java}
scala> abstract class SomeAbstractClass {
     |   type SomeTypeMember
     | }
defined class SomeAbstractClass

scala> class SomeSpecificClass extends SomeAbstractClass {
     |   override type SomeTypeMember = Int
     | }
defined class SomeSpecificClass

scala> def someFunction(someInstance: SomeAbstractClass): Any = {
     |   udf { _: someInstance.SomeTypeMember => 42 }
     | }
someFunction: (someInstance: SomeAbstractClass)Any

scala> someFunction(new SomeSpecificClass)
java.lang.NoClassDefFoundError: no Java class corresponding to someInstance.SomeTypeMember found
  at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToJavaClass(JavaMirrors.scala:1354)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:227)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:68)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56)
  at org.apache.spark.sql.functions$.$anonfun$udf$1(functions.scala:4509)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.sql.functions$.udf(functions.scala:4509)
  at someFunction(<console>:25)
  ... 47 elided
{code}
On *spark-2.4.7* it works. I guess it is related to this note from the migration guide:
{code:java}
In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using it. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and return 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
{code}
[https://spark.apache.org/docs/latest/sql-migration-guide.html#udfs-and-built-in-functions]
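
If that note is the cause, the legacy untyped route would look roughly like this (an untested sketch on my side; `spark` is the active `SparkSession`):
{code:java}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Opt back in to the Spark 2.x behaviour; the untyped overload takes an
// explicit DataType, so no TypeTag is needed for the argument type.
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")
val f = udf((_: Any) => 42, IntegerType)
{code}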

Does Spark try to do some type inference here? When it refers to
`SomeAbstractClass.SomeTypeMember`, the concrete type really does not exist yet: it is only defined in the subclass.
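
If I read the stack trace right, the typed `udf` materializes a `TypeTag` for the argument and then asks the runtime mirror for the corresponding Java class. A minimal sketch outside Spark (my own illustration, reusing the classes defined above) that should hit the same `typeToJavaClass` path:
{code:java}
import scala.reflect.runtime.universe._

// Same lookup that ExpressionEncoder performs: map the Scala type of the
// udf argument to a runtime Java class via the runtime mirror.
def javaClassOf(someInstance: SomeAbstractClass): Class[_] =
  runtimeMirror(getClass.getClassLoader)
    .runtimeClass(typeOf[someInstance.SomeTypeMember])

javaClassOf(new SomeSpecificClass)
// expected: java.lang.NoClassDefFoundError: no Java class corresponding to
// someInstance.SomeTypeMember found
{code}
If so, the failure is in scala-reflect's class mapping rather than in Spark itself: statically, the argument type is the abstract member, which corresponds to no runtime class.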

A workaround could be runtime type casting, something like:

{code:java}
udf { param: Any =>
  // ...
  param.asInstanceOf[someInstance.SomeTypeMember]
  // ...
}
{code}
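Another possible workaround (a sketch on my side, not verified beyond this shape): carry the concrete type as a type parameter with a `TypeTag`, so that the encoder sees `Int` instead of the abstract member:
{code:java}
import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.functions.udf

// T is fixed to a concrete type (Int) at the call site, so the implicit
// TypeTag[T] resolves to a real runtime class for the udf argument.
def someFunctionTyped[T: TypeTag](someInstance: SomeAbstractClass): Any =
  udf { (_: T) => 42 }

someFunctionTyped[Int](new SomeSpecificClass)
{code}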
 
----
 

I classified it as a bug since it worked on previous versions of Spark. However, I believe it could be working as designed.


