huangxiaopingRD opened a new pull request, #8352:
URL: https://github.com/apache/hudi/pull/8352

   ### Change Logs
   
   Spark caches some of a table's metadata. After 
RollbackToInstantTimeProcedure is executed on a table, that metadata changes 
and the table needs to be refreshed. Otherwise, querying the data again fails 
with the following error:
   
   ```
   Caused by: java.io.FileNotFoundException: File does not exist: 
hdfs://xxxxx/user/hive/warehouse/hudi_cow_nonpcf_tbl2/7a19abfb-35ab-40bb-9580-6b1af681506a-0_0-23-20_20230402002001284.parquet
   It is possible the underlying files have been updated. You can explicitly 
invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in 
SQL or by recreating the Dataset/DataFrame involved.
        at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
        at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:187)
        at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
        at 
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
   ```
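   
   For illustration only, here is a minimal sketch of the manual workaround 
   the error message suggests: run the rollback, then refresh the table before 
   querying it again. It is not the code in this PR. It assumes a Spark session 
   with the Hudi SQL extensions enabled and that the procedure is exposed as 
   `rollback_to_instant`. The table name is inferred from the path in the stack 
   trace, and the instant time is a placeholder.
   
   ```scala
   import org.apache.spark.sql.SparkSession

   object RollbackThenRefresh {
     def main(args: Array[String]): Unit = {
       // Assumption: the session is launched with the Hudi bundle on the classpath and
       // spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension,
       // so that CALL statements are recognized.
       val spark = SparkSession.builder()
         .appName("rollback-then-refresh")
         .getOrCreate()

       // Illustrative values: table name inferred from the path in the stack trace,
       // instant time is a placeholder to be replaced with a real instant.
       val tableName = "hudi_cow_nonpcf_tbl2"
       val instantTime = "20230402002001284"

       // Run the rollback procedure (procedure name assumed to be rollback_to_instant).
       spark.sql(
         s"CALL rollback_to_instant(table => '$tableName', instant_time => '$instantTime')"
       ).show()

       // Invalidate Spark's cached metadata for the table so that the next query
       // re-lists files instead of reading paths removed by the rollback.
       spark.catalog.refreshTable(tableName)

       // Querying again should no longer hit the stale file listing.
       spark.sql(s"SELECT * FROM $tableName").show()
     }
   }
   ```
   
   Running `REFRESH TABLE hudi_cow_nonpcf_tbl2` in SQL, as the error message 
   suggests, does the same thing as the `refreshTable` call.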
   
   ### Impact
   
   No
   ### Risk level (write none, low, medium or high below)
   
   none
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
