yihua opened a new issue, #18002:
URL: https://github.com/apache/hudi/issues/18002

   ### Bug Description
   
   **What happened:**
   In certain cases, an incremental query with full scan mode on a MOR table fails on Databricks Runtime with the exception below. Such cases include: (1) the start instant time is in the archived timeline; (2) the start instant time is in the active timeline, but some files in the incremental commit range are no longer available due to cleaning or compaction. If both the start and end instant times are in the active timeline and all files in the commit range are available, the incremental query succeeds.
   
   ```
   spark.read.format("org.apache.hudi")
     .option("hoodie.datasource.query.type", "incremental")
     .option("hoodie.datasource.read.begin.instanttime", "20260120221843052")
     .option("hoodie.datasource.read.end.instanttime", "20260120221843053")
     .option("hoodie.metadata.enable", "false")
     .option("hoodie.datasource.read.incr.fallback.fulltablescan.enable", 
"true")
     .option("hoodie.datasource.read.incr.path.glob", "san_francisco/*")
     .load("s3a://dbr-test/hudi_mor_v6").show(false)
   ```
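   
   For context on what the fallback option is meant to do: when `hoodie.datasource.read.incr.fallback.fulltablescan.enable` is set and files in the incremental range have been cleaned or archived, Hudi falls back to scanning the latest file slices and filtering rows by commit time. A rough, hypothetical sketch of how to approximate the same result manually (same table path and instant range as above; a workaround sketch, not a substitute for the fix, and it assumes the snapshot read itself works on DBR):
   
   ```scala
   import org.apache.spark.sql.functions.col
   
   // Hypothetical workaround sketch: emulate the full-table-scan fallback with a
   // snapshot read plus a commit-time filter. Hudi incremental ranges are
   // (begin, end], hence the strict ">" on begin and "<=" on end.
   val incremental = spark.read.format("org.apache.hudi")
     .option("hoodie.metadata.enable", "false")
     .load("s3a://dbr-test/hudi_mor_v6")
     .where(col("_hoodie_commit_time") > "20260120221843052" &&
       col("_hoodie_commit_time") <= "20260120221843053")
   incremental.show(false)
   ```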
   
   ```
   NoSuchMethodError: 'org.apache.hadoop.fs.FileStatus org.apache.spark.sql.execution.datasources.FileStatusWithMetadata.fileStatus()'
        at org.apache.spark.sql.execution.datasources.HoodieSpark35PartitionedFileUtils$.$anonfun$toFileStatuses$2(HoodieSpark35PartitionedFileUtils.scala:48)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.sql.execution.datasources.HoodieSpark35PartitionedFileUtils$.toFileStatuses(HoodieSpark35PartitionedFileUtils.scala:48)
        at org.apache.hudi.HoodieBaseRelation.listLatestFileSlices(HoodieBaseRelation.scala:428)
        at org.apache.hudi.MergeOnReadIncrementalRelationV1.listFileSplits(MergeOnReadIncrementalRelationV1.scala:131)
        at org.apache.hudi.HoodieIncrementalFileIndex.listFiles(HoodieIncrementalFileIndex.scala:47)
        at org.apache.spark.sql.execution.datasources.FileIndex.listPartitionDirectoriesAndFiles(FileIndex.scala:234)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.listFiles(DataSourceScanExec.scala:972)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.listFiles$(DataSourceScanExec.scala:954)
        at org.apache.spark.sql.execution.FileSourceScanExec.listFiles(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.$anonfun$_selectedPartitions$2(DataSourceScanExec.scala:1046)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike._selectedPartitions(DataSourceScanExec.scala:1038)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike._selectedPartitions$(DataSourceScanExec.scala:1037)
        at org.apache.spark.sql.execution.FileSourceScanExec._selectedPartitions$lzycompute(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.FileSourceScanExec._selectedPartitions(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.setDriverMetricsForSelectedPartitions(DataSourceScanExec.scala:1064)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.selectedPartitions(DataSourceScanExec.scala:1069)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.selectedPartitions$(DataSourceScanExec.scala:1068)
        at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.$anonfun$_dynamicallySelectedPartitions$1(DataSourceScanExec.scala:1157)
        at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike._dynamicallySelectedPartitions(DataSourceScanExec.scala:1078)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike._dynamicallySelectedPartitions$(DataSourceScanExec.scala:1076)
        at org.apache.spark.sql.execution.FileSourceScanExec._dynamicallySelectedPartitions$lzycompute(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.FileSourceScanExec._dynamicallySelectedPartitions(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.$anonfun$dynamicallySelectedPartitions$2(DataSourceScanExec.scala:1192)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.dynamicallySelectedPartitions(DataSourceScanExec.scala:1191)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.dynamicallySelectedPartitions$(DataSourceScanExec.scala:1190)
        at org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.finalSelectedPartitions(DataSourceScanExec.scala:1230)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.finalSelectedPartitions$(DataSourceScanExec.scala:1230)
        at org.apache.spark.sql.execution.FileSourceScanExec.finalSelectedPartitions(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.totalFinalSelectedPartitionFileSize(DataSourceScanExec.scala:1219)
        at org.apache.spark.sql.execution.SparkOrAetherFileSourceScanLike.totalFinalSelectedPartitionFileSize$(DataSourceScanExec.scala:1219)
        at org.apache.spark.sql.execution.FileSourceScanExec.totalFinalSelectedPartitionFileSize$lzycompute(DataSourceScanExec.scala:3187)
        at org.apache.spark.sql.execution.FileSourceScanExec.totalFinalSelectedPartitionFileSize(DataSourceScanExec.scala:3187)
        at com.databricks.sql.transaction.tahoe.metering.DeltaMetering$.$anonfun$reportUsage$3(DeltaMetering.scala:656)
        at com.databricks.sql.transaction.tahoe.metering.DeltaMetering$.$anonfun$reportUsage$3$adapted(DeltaMetering.scala:251)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at com.databricks.sql.transaction.tahoe.metering.DeltaMetering$.reportUsage(DeltaMetering.scala:251)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$10(SQLExecution.scala:651)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:810)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:352)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1481)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:217)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:747)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:5032)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:3772)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:4007)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:460)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:496)
        at org.apache.spark.sql.Dataset.show(Dataset.scala:1113)
        at org.apache.spark.sql.Dataset.show(Dataset.scala:1090)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-7721031122083688:11)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-7721031122083688:56)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$$iw$$iw$$iw$$iw.<init>(command-7721031122083688:58)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$$iw$$iw$$iw.<init>(command-7721031122083688:60)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$$iw$$iw.<init>(command-7721031122083688:62)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$$iw.<init>(command-7721031122083688:64)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read.<init>(command-7721031122083688:66)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$.<init>(command-7721031122083688:70)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$read$.<clinit>(command-7721031122083688)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$eval$.$print$lzycompute(<notebook>:7)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$eval$.$print(<notebook>:6)
        at $lineb8d76de79b95437d99b85019c4eadcc525.$eval.$print(<notebook>)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
        at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
        at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:201)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$3(ScalaDriverLocal.scala:296)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.threadSafeTrapExit(DriverLocal.scala:1811)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:1769)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:1660)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.executeCommand$1(ScalaDriverLocal.scala:296)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$2(ScalaDriverLocal.scala:265)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at scala.Console$.withErr(Console.scala:196)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:262)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at scala.Console$.withOut(Console.scala:167)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:262)
        at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$36(DriverLocal.scala:1321)
        at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:133)
        at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$30(DriverLocal.scala:1312)
        at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
        at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
        at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
        at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:130)
        at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:96)
        at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:77)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:130)
        at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$1(DriverLocal.scala:1236)
        at com.databricks.backend.daemon.driver.DriverLocal$.$anonfun$maybeSynchronizeExecution$4(DriverLocal.scala:1721)
        at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:879)
        at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$2(DriverWrapper.scala:1054)
        at scala.util.Try$.apply(Try.scala:213)
        at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:1043)
        at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$3(DriverWrapper.scala:1089)
        at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:616)
        at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:643)
        at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
        at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
        at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
        at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
        at com.databricks.backend.daemon.driver.DriverWrapper.withAttributionContext(DriverWrapper.scala:81)
        at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:96)
        at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:77)
        at com.databricks.backend.daemon.driver.DriverWrapper.withAttributionTags(DriverWrapper.scala:81)
        at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:611)
        at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:519)
        at com.databricks.backend.daemon.driver.DriverWrapper.recordOperationWithResultTags(DriverWrapper.scala:81)
        at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:1089)
        at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:766)
        at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:859)
        at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$runInnerLoop$1(DriverWrapper.scala:630)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
        at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
        at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
        at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
        at com.databricks.backend.daemon.driver.DriverWrapper.withAttributionContext(DriverWrapper.scala:81)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:625)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:548)
        at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:373)
        at java.base/java.lang.Thread.run(Thread.java:840)
   ```
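   
   The `NoSuchMethodError` points at a binary-compatibility gap: the Hudi Spark 3.5 adapter is compiled against OSS Spark, where `FileStatusWithMetadata.fileStatus()` returns `org.apache.hadoop.fs.FileStatus`, while the Databricks Runtime fork apparently declares a different signature for that member, so the call fails to link at runtime. A hypothetical diagnostic (illustrative only, runnable in a notebook cell) to confirm which variant of the class is actually loaded:
   
   ```scala
   // Hypothetical diagnostic (illustrative only): inspect the
   // FileStatusWithMetadata variant loaded at runtime. On OSS Spark 3.5,
   // fileStatus() returns org.apache.hadoop.fs.FileStatus; a different
   // return type here would explain the linkage failure.
   val cls = Class.forName(
     "org.apache.spark.sql.execution.datasources.FileStatusWithMetadata")
   cls.getMethods.filter(_.getName == "fileStatus")
     .foreach(m => println(s"${m.getReturnType.getName} ${m.getName}()"))
   // Code source can be null for bootstrap classes, hence the Option guard.
   println(Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation))
   ```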
   
   **What you expected:**
   The incremental query should work on Databricks Runtime.
   
   **Steps to reproduce:**
   1. Create a Databricks compute cluster with Databricks Runtime 16.4 LTS (Spark 3.5, Scala 2.12), the Spark configs below, and hudi_spark3_5_bundle_2_12_1_1_1.jar
   2. Create a Hudi MOR table with a few delta commits (a minimal setup sketch follows this list)
   3. Run the incremental query above with a start instant time that is before the start of the active timeline
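   
   For step 2, a minimal, hypothetical setup sketch; the table path matches the query above, but the schema, record keys, and row values are illustrative. It assumes a notebook or shell where `spark` is in scope. The first write creates the table; the repeated upserts of the same key on a MOR table produce delta commits in the active timeline:
   
   ```scala
   import org.apache.spark.sql.SaveMode
   import spark.implicits._
   
   // Illustrative MOR table setup: one initial write, then upserts of the
   // same record key so the timeline accumulates a few delta commits.
   val opts = Map(
     "hoodie.table.name" -> "hudi_mor_v6",
     "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
     "hoodie.datasource.write.recordkey.field" -> "id",
     "hoodie.datasource.write.partitionpath.field" -> "city",
     "hoodie.datasource.write.precombine.field" -> "ts"
   )
   
   (1 to 3).foreach { i =>
     Seq((1, s"rider-$i", "san_francisco", i.toLong))
       .toDF("id", "rider", "city", "ts")
       .write.format("org.apache.hudi")
       .options(opts)
       .mode(if (i == 1) SaveMode.Overwrite else SaveMode.Append)
       .save("s3a://dbr-test/hudi_mor_v6")
   }
   ```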
   
   
   ### Environment
   
   **Hudi version:** 1.1.1, master
   **Query engine:** Spark on Databricks Runtime 16.4 LTS (Spark 3.5, Scala 2.12/2.13) and 17.3 LTS (Spark 4.0, Scala 2.13)
   **Relevant configs:**
   Spark configs for compute cluster:
   ```
   spark.serializer org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
   spark.kryo.registrator org.apache.spark.HoodieSparkKryoRegistrar
   spark.jars dbfs:/FileStore/jars/hudi_spark3_5_bundle_2_12_1_1_1.jar,dbfs:/FileStore/jars/aws_java_sdk_bundle_1_12_48.jar,dbfs:/FileStore/jars/hadoop_aws_3_3_1.jar
   spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
   spark.executor.userClassPathFirst true
   spark.driver.userClassPathFirst true
   ```
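   
   Note on the configs: `spark.driver.userClassPathFirst`/`spark.executor.userClassPathFirst` make the Hudi bundle's own classes win over cluster jars, but Spark's classes (including `FileStatusWithMetadata`) still come from the DBR fork, which is presumably where the signature drift surfaces. One possible mitigation pattern, shown purely as a sketch under the assumption that only the declared return type drifted (this is not Hudi's actual code or planned fix), is to resolve the accessor reflectively so the JVM never hard-links against one declared signature:
   
   ```scala
   import org.apache.hadoop.fs.FileStatus
   import org.apache.spark.sql.execution.datasources.FileStatusWithMetadata
   
   // Hypothetical sketch, not Hudi's actual code: look up fileStatus() via
   // reflection, deferring linkage so a drifted declared return type on the
   // runtime's class does not trigger NoSuchMethodError at call time.
   def fileStatusOf(fsm: FileStatusWithMetadata): FileStatus =
     fsm.getClass.getMethod("fileStatus").invoke(fsm).asInstanceOf[FileStatus]
   ```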
   
   
   ### Logs and Stack Trace
   
   _No response_

