[ https://issues.apache.org/jira/browse/SPARK-34583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kondziolka9ld updated SPARK-34583:
----------------------------------
Description:

Please consider the following scenario:
{code:scala}
scala> abstract class SomeAbstractClass {
     |   type SomeTypeMember
     | }
defined class SomeAbstractClass

scala> class SomeSpecificClass extends SomeAbstractClass {
     |   override type SomeTypeMember = Int
     | }
defined class SomeSpecificClass

scala> def someFunction(someInstance: SomeAbstractClass): Any = {
     |   udf { _: someInstance.SomeTypeMember => 42 }
     | }
someFunction: (someInstance: SomeAbstractClass)Any

scala> someFunction(new SomeSpecificClass)
java.lang.NoClassDefFoundError: no Java class corresponding to someInstance.SomeTypeMember found
  at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToJavaClass(JavaMirrors.scala:1354)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:227)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:68)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56)
  at org.apache.spark.sql.functions$.$anonfun$udf$1(functions.scala:4509)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.sql.functions$.udf(functions.scala:4509)
  at someFunction(<console>:25)
  ... 47 elided
{code}
On *spark-2.4.7* the same code works. I suspect it is related to this note from the migration guide:
{quote}
In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using it. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and return 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
{quote}
[https://spark.apache.org/docs/latest/sql-migration-guide.html#udfs-and-built-in-functions]

Does Spark try to do some type inference here? At the point where `udf` is called, `someInstance.SomeTypeMember` has no corresponding Java class, since the type member is only bound to a concrete type in the subclass.

One workaround could be a runtime cast, something like (see also the two sketches appended below):
{code:scala}
udf { param: Any =>
  // ...
  param.asInstanceOf[someInstance.SomeTypeMember]
  // ...
}
{code}
----
I classified this as a bug since it worked on previous versions of Spark; however, it may be working as designed.
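As a cross-check, the legacy untyped API mentioned in the quoted migration note should sidestep the problem, since it takes the closure as {{AnyRef}} plus an output {{DataType}} and never derives an encoder for the path-dependent input type. A minimal, untested sketch, assuming a running {{SparkSession}} named {{spark}}:
{code:scala}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Opt back in to the Spark 2.4-style untyped Scala UDF (off by default in 3.0).
spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")

def someFunction(someInstance: SomeAbstractClass) =
  // The untyped variant does not ask the reflection mirror for a Java class
  // of someInstance.SomeTypeMember, so the NoClassDefFoundError path is avoided.
  udf((_: Any) => 42, IntegerType)
{code}
Note the usual caveat from the migration guide: with this variant a null input to a primitive-typed closure behaves differently in 3.0 than in 2.4.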
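Alternatively, the error can be avoided without casts by surfacing the type member as a type parameter, so that a concrete class ({{Int}} here) is in scope when the typed {{udf}} derives its encoder. This is my own restructuring of the reproduction above, not an existing Spark API, and an untested sketch:
{code:scala}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import scala.reflect.runtime.universe.TypeTag

// The refinement ties the abstract type member to the parameter A, so at the
// call site A is inferred as Int and a TypeTag[Int] is available to udf.
def someFunction[A: TypeTag](
    someInstance: SomeAbstractClass { type SomeTypeMember = A }): UserDefinedFunction =
  udf { (_: A) => 42 }

someFunction(new SomeSpecificClass) // A = Int, encoder resolves
{code}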
> typed udf fails when it refers to type member in abstract class
> ---------------------------------------------------------------
>
>                 Key: SPARK-34583
>                 URL: https://issues.apache.org/jira/browse/SPARK-34583
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: kondziolka9ld
>            Priority: Minor