Maryann Xue created SPARK-25690:
-----------------------------------

             Summary: Analyzer rule "HandleNullInputsForUDF" does not stabilize and can be applied infinitely
                 Key: SPARK-25690
                 URL: https://issues.apache.org/jira/browse/SPARK-25690
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core, SQL
    Affects Versions: 2.4.0
            Reporter: Maryann Xue
            Assignee: Sean Owen
             Fix For: 2.4.0
A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix HandleNullInputsForUDF rule":
{code:java}
- SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
  Results do not match for query:
  ...
  == Results ==

  == Results ==
  !== Correct Answer - 3 ==   == Spark Answer - 3 ==
  !struct<>                   struct<a:bigint,b:int,c:int>
  ![0,10,null]                [0,10,0]
  ![1,12,null]                [1,12,1]
  ![2,14,null]                [2,14,2] (QueryTest.scala:163){code}
Reading the test gives a rough idea of what is going on:
{code:java}
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  // assume(!ClosureCleanerSuite2.supportsLMFs)
  // This test won't test what it intends to in 2.12, as lambda metafactory closures
  // have arg types that are not primitive, but Object
  val udf1 = udf({(x: Int, y: Int) => x + y})
  val df = spark.range(0, 3).toDF("a")
    .withColumn("b", udf1($"a", udf1($"a", lit(10))))
    .withColumn("c", udf1($"a", lit(null)))
  val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed

  comparePlans(df.logicalPlan, plan)
  checkAnswer(
    df,
    Seq(
      Row(0, 10, null),
      Row(1, 12, null),
      Row(2, 14, null)))
}{code}
It seems that the closure passed in as a UDF behaves differently in Scala 2.12, in that primitive-type arguments are handled differently. For example, an Int argument, when fed 'null', acts like 0. I'm sure it's a difference in the LMF closure and how its types are understood, but I'm not exactly sure of the cause yet.
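The "null acts like 0" symptom can be reproduced outside Spark. The following is a minimal sketch, not Spark's actual code path: it assumes only plain Scala, and the names NullUnboxingSketch, erased and nullSafe are hypothetical, introduced for illustration. It shows that a null passed through the erased (Object-typed) view of an (Int, Int) => Int function is unboxed to 0, and that a caller therefore needs to know which arguments are primitive in order to short-circuit to null itself.

```scala
object NullUnboxingSketch {
  // Same shape as the UDF body in the test.
  val add: (Int, Int) => Int = (x, y) => x + y

  // View the function through its erased interface, roughly how a caller
  // holding only boxed values would see it. The cast is for illustration only.
  val erased: (Any, Any) => Any = add.asInstanceOf[(Any, Any) => Any]

  // Hypothetical guard (not Spark's implementation): return null when an
  // argument flagged as primitive is null, instead of calling the function.
  def nullSafe(f: (Any, Any) => Any, isPrimitive: Seq[Boolean]): (Any, Any) => Any =
    (a, b) => {
      val hasNullPrimitive =
        Seq(a, b).zip(isPrimitive).exists { case (v, prim) => prim && v == null }
      if (hasNullPrimitive) null else f(a, b)
    }

  def main(args: Array[String]): Unit = {
    // Scala maps null to the default value of a primitive type.
    println(null.asInstanceOf[Int])                   // 0
    // Without a guard, the null argument is silently treated as 0.
    println(erased(1, null))                          // 1
    // With the guard and correct primitive flags, the result is null.
    println(nullSafe(erased, Seq(true, true))(1, null)) // null
    // If the flags wrongly say "not primitive", the guard does nothing.
    println(nullSafe(erased, Seq(false, false))(1, null)) // 1
  }
}
```

The last call mirrors the failure above under the assumption that the argument types of the closure are seen as Object rather than Int: the guard is skipped and column c comes out as a + 0 instead of null.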