This is an automated email from the ASF dual-hosted git repository.
chengpan pushed a commit to branch branch-1.10
in repository https://gitbox.apache.org/repos/asf/kyuubi.git
The following commit(s) were added to refs/heads/branch-1.10 by this push:
new 6744258dfc [KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2
6744258dfc is described below
commit 6744258dfc61e65110eb37f52144ad75d521646b
Author: zhaohehuhu <[email protected]>
AuthorDate: Thu May 29 13:25:55 2025 +0800
[KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2
### Why are the changes needed?
To enhance MaxScanStrategy for Spark DSv2 so that it applies only to
relations that support statistics reporting. This prevents Spark from returning
the default value of Long.MaxValue, which leads to some queries failing or
behaving unexpectedly.
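The guard added by this patch is a type test in a pattern match. A minimal self-contained Scala sketch (the trait and helper names below are illustrative stand-ins, not the actual Spark/Kyuubi API) of why matching only on the statistics-reporting capability matters:

```scala
// Illustrative model: a scan either reports row statistics or it does not.
trait Scan
trait SupportsReportStatistics extends Scan { def estimatedRows: Long }

case object PlainScan extends Scan // no statistics available
case class StatsScan(estimatedRows: Long) extends Scan with SupportsReportStatistics

// Hypothetical stand-in for MaxScanStrategy's size check: without the type
// guard, a scan with no statistics would appear to have Long.MaxValue rows
// and spuriously trip the limit; with the guard, unreported stats skip the check.
def exceedsMaxScan(scan: Scan, maxRows: Long): Boolean = scan match {
  case s: SupportsReportStatistics => s.estimatedRows > maxRows // real estimate
  case _ => false // no reported statistics: do not fail the query
}
```

The same idea appears in the patch below as the `_: SupportsReportStatistics` binding inside the `DataSourceV2ScanRelation` extractor.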
### How was this patch tested?
Tested locally.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #7077 from zhaohehuhu/dev-0527.
Closes #7077
64001c94e [zhaohehuhu] fix MaxScanStrategy for datasource v2
Authored-by: zhaohehuhu <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit bcaff5a3f1232945cf7d029b0db13feca51b3d9b)
Signed-off-by: Cheng Pan <[email protected]>
---
.../main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala b/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
index e647ad3250..e8144f25ae 100644
--- a/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.SQLConfHelper
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
import org.apache.spark.sql.catalyst.planning.ScanOperation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.read.SupportsReportStatistics
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation
@@ -237,7 +238,7 @@ case class MaxScanStrategy(session: SparkSession)
_,
_,
_,
- relation @ DataSourceV2ScanRelation(_, _, _, _, _)) =>
+ relation @ DataSourceV2ScanRelation(_, _: SupportsReportStatistics, _, _, _)) =>
val table = relation.relation.table
if (table.partitioning().nonEmpty) {
val partitionColumnNames = table.partitioning().map(_.describe())