[
https://issues.apache.org/jira/browse/SPARK-49515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912324#comment-17912324
]
Jungtaek Lim edited comment on SPARK-49515 at 1/13/25 2:46 AM:
---------------------------------------------------------------
> I guess problem still exists when falling back to "old way"
My fix is actually performing "double" fallback. If the old way does not work
after falling back, it just picks up the result of new way, because the reason
fallback is triggered might be due to valid pruning, like the example in this
Jira ticket.
Unfortunately, this requires changes to the constructor of catalyst classes,
which is a sort of breaking change (catalyst is private API but people has been
using it), hence we can't port the fix back.
was (Author: kabhwan):
> I guess problem still exists when falling back to "old way"
My fix is actually performing "double" fallback. If the old way does not work
after falling back, it just picks up the result of new way, because the reason
fallback is triggered is due to valid pruning, like the example in this Jira
ticket.
Unfortunately, this requires changes to the constructor of catalyst classes,
which is a sort of breaking change (catalyst is private API but people has been
using it), hence we can't port the fix back.
> numInputRows is 0 when one of input relation is empty
> -----------------------------------------------------
>
> Key: SPARK-49515
> URL: https://issues.apache.org/jira/browse/SPARK-49515
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.4.0
> Environment: Reproducible in Spark 3.4 , probably all versions after
> introducing org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation
>
> Reporter: Eunjin Song
> Priority: Trivial
> Attachments: Query 0.png, Query 1.png, Query 2.png
>
>
> [https://github.com/apache/spark/blob/2ed6c3e511f322c5fd01953736c376a85ff2c687/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L481]
>
> In case # relation of logical plan != # relation of physical plan,
> numInputRows captured as 0 for all sources. However it includes the case that
> empty relations in union is removed from physical plan by
> "org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation".
>
> Repro code:
> import org.apache.spark.sql.streaming.Trigger
> val df1 = spark.readStream.schema(sch).parquet("/streaming/parquet1")
> val df2 = spark.readStream.schema(sch).parquet("/streaming/emptyparquet")
> df1.union(df2).writeStream.option("checkpointLocation",
> "/streaming/checkpoint8").format("parquet").trigger(Trigger.Once()).option("path",
> "/streaming/result9").start
>
> workaround:
> spark.conf.set("spark.sql.optimizer.excludedRules","org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation")
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]