This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 4fd5e770209 [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail
4fd5e770209 is described below

commit 4fd5e770209ee4eb1c3eb5c0210588362cb53966
Author: Wenchen Fan <wenc...@databricks.com>
AuthorDate: Tue Aug 15 10:55:57 2023 +0800

    [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail

    ### What changes were proposed in this pull request?

    This is a followup of https://github.com/apache/spark/pull/41448 . The plan
    produced by an optimizer rule should be resolved, and resolved expressions
    should be able to report their data type. The `Instruction` expression fails
    to report its data type and may break external optimizer rules. This PR makes
    it return a dummy `NullType` instead.

    ### Why are the changes needed?

    To avoid breaking external optimizer rules.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #42482 from cloud-fan/merge.

    Authored-by: Wenchen Fan <wenc...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
    (cherry picked from commit c9ff70253999bf396b07eec84a3a86ca3191efa3)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala
index e8888cca758..9b1c8bc733a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/MergeRows.scala
@@ -21,7 +21,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expre
 import org.apache.spark.sql.catalyst.plans.logical.MergeRows.{Instruction, ROW_ID}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.truncatedString
-import org.apache.spark.sql.types.DataType
+import org.apache.spark.sql.types.{DataType, NullType}
 
 case class MergeRows(
     isSourceRowPresent: Expression,
@@ -74,7 +74,11 @@ object MergeRows {
     def condition: Expression
     def outputs: Seq[Seq[Expression]]
     override def nullable: Boolean = false
-    override def dataType: DataType = throw new UnsupportedOperationException("dataType")
+    // We return NullType here as only the `MergeRows` operator can contain `Instruction`
+    // expressions and it doesn't care about the data type. Some external optimizer rules may
+    // assume optimized plan is always resolved and Expression#dataType is always available, so
+    // we can't just fail here.
+    override def dataType: DataType = NullType
   }
 
   case class Keep(condition: Expression, output: Seq[Expression]) extends Instruction {


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
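
The failure mode described in the commit message can be illustrated with a minimal, Spark-free sketch. Everything below is a simplified stand-in rather than the real Catalyst API: `Literal`, `ThrowingInstruction`, `NullTypedInstruction`, and `collectTypes` are hypothetical names introduced only for illustration. The sketch assumes an external rule that reads `dataType` from every expression in a tree, which is the kind of assumption the commit message says external optimizer rules may make.

```scala
// Simplified stand-ins for Catalyst types; these are NOT the real Spark classes.
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType

trait Expression {
  def dataType: DataType
  def children: Seq[Expression]
}

// Hypothetical leaf expression with a well-defined type.
case class Literal(value: Int) extends Expression {
  override def dataType: DataType = IntegerType
  override def children: Seq[Expression] = Nil
}

// Mirrors the behaviour before the patch: asking for the data type throws.
case class ThrowingInstruction(condition: Expression) extends Expression {
  override def dataType: DataType =
    throw new UnsupportedOperationException("dataType")
  override def children: Seq[Expression] = Seq(condition)
}

// Mirrors the behaviour after the patch: a dummy NullType is reported.
case class NullTypedInstruction(condition: Expression) extends Expression {
  override def dataType: DataType = NullType
  override def children: Seq[Expression] = Seq(condition)
}

object ExternalRuleSketch {
  // Hypothetical external rule: walks the tree top-down and reads every
  // expression's data type, assuming it is always available.
  def collectTypes(e: Expression): Seq[DataType] =
    e.dataType +: e.children.flatMap(collectTypes)

  def main(args: Array[String]): Unit = {
    val before = scala.util.Try(collectTypes(ThrowingInstruction(Literal(1))))
    val after = scala.util.Try(collectTypes(NullTypedInstruction(Literal(1))))
    println(s"throwing variant fails the rule: ${before.isFailure}")
    println(s"NullType variant result: $after")
  }
}
```

In this sketch the traversal aborts on the throwing variant and completes on the `NullType` variant. The dummy type carries no meaning of its own; per the code comment in the diff, only the `MergeRows` operator contains `Instruction` expressions, and it does not care about their data type.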