[spark] branch branch-3.5 updated: [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail

wenchen Mon, 14 Aug 2023 19:56:43 -0700

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 4fd5e770209 [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail
4fd5e770209 is described below

commit 4fd5e770209ee4eb1c3eb5c0210588362cb53966
Author: Wenchen Fan <wenc...@databricks.com>
AuthorDate: Tue Aug 15 10:55:57 2023 +0800

    [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail
    
    ### What changes were proposed in this pull request?
    
    This is a followup of https://github.com/apache/spark/pull/41448 . A plan produced by an optimizer rule should be resolved, and resolved expressions should be able to report their data type. The `Instruction` expression fails to report its data type and may break external optimizer rules. This PR makes it return a dummy `NullType`.
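
    The failure mode can be sketched with a toy model. This is an illustration only, not Spark code: `ThrowingInstruction`, `SafeInstruction`, and `ExternalRuleSketch.collectTypes` are hypothetical names, and the `Expression` and `DataType` definitions below are simplified stand-ins for the Catalyst classes. It assumes an external rule that reads `dataType` on every expression it visits.

    ```scala
    // Toy stand-ins for Catalyst types; NOT the real Spark classes.
    sealed trait DataType
    case object IntegerType extends DataType
    case object NullType extends DataType

    trait Expression {
      def dataType: DataType
      def children: Seq[Expression]
    }

    case class Literal(value: Int) extends Expression {
      def dataType: DataType = IntegerType
      def children: Seq[Expression] = Nil
    }

    // Hypothetical model of the old behavior: asking for the data type throws.
    case class ThrowingInstruction(condition: Expression) extends Expression {
      def dataType: DataType = throw new UnsupportedOperationException("dataType")
      def children: Seq[Expression] = Seq(condition)
    }

    // Hypothetical model of the new behavior: a dummy NullType is reported.
    case class SafeInstruction(condition: Expression) extends Expression {
      def dataType: DataType = NullType
      def children: Seq[Expression] = Seq(condition)
    }

    object ExternalRuleSketch {
      // Hypothetical external rule: walks the tree and reads every dataType,
      // assuming that resolved expressions can always report one.
      def collectTypes(e: Expression): Seq[DataType] =
        e.dataType +: e.children.flatMap(collectTypes)
    }
    ```

    In this sketch, `collectTypes` completes on a tree rooted at `SafeInstruction` but throws on one rooted at `ThrowingInstruction`, which is the kind of breakage the change is meant to avoid.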
    
    ### Why are the changes needed?
    
    To avoid breaking external optimizer rules.
    
    ### Does this PR introduce _any_ user-facing change?
    
    no
    
    ### How was this patch tested?
    
    existing tests
    
    Closes #42482 from cloud-fan/merge.
    
    Authored-by: Wenchen Fan <wenc...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
    (cherry picked from commit c9ff70253999bf396b07eec84a3a86ca3191efa3)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala   | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala
index e8888cca758..9b1c8bc733a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala
@@ -21,7 +21,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expre
 import org.apache.spark.sql.catalyst.plans.logical.MergeRows.{Instruction, ROW_ID}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.truncatedString
-import org.apache.spark.sql.types.DataType
+import org.apache.spark.sql.types.{DataType, NullType}
 
 case class MergeRows(
     isSourceRowPresent: Expression,
@@ -74,7 +74,11 @@ object MergeRows {
     def condition: Expression
     def outputs: Seq[Seq[Expression]]
     override def nullable: Boolean = false
-    override def dataType: DataType = throw new UnsupportedOperationException("dataType")
+    // We return NullType here as only the `MergeRows` operator can contain `Instruction`
+    // expressions and it doesn't care about the data type. Some external optimizer rules may
+    // assume optimized plan is always resolved and Expression#dataType is always available, so
+    // we can't just fail here.
+    override def dataType: DataType = NullType
   }
 
   case class Keep(condition: Expression, output: Seq[Expression]) extends Instruction {


