LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689993828
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
##########
@@ -154,11 +155,18 @@ class OrcFileFormat
     (file: PartitionedFile) => {
       val conf = broadcastedConf.value.value
+      val metaCacheEnabled =
+        conf.getBoolean(SQLConf.FILE_META_CACHE_ORC_ENABLED.key, false)
       val filePath = new Path(new URI(file.filePath))
       val fs = filePath.getFileSystem(conf)
-      val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
+      val readerOptions = if (metaCacheEnabled) {
+        val tail = OrcFileMeta.readTailFromCache(filePath, conf)
+        OrcFile.readerOptions(conf).filesystem(fs).orcTail(tail)

Review comment:
   This is a very good question! Handling it well would require a more complex design, such as adding a command that triggers the relevant cache cleanup on all executors when a file changes. However, if a file change is not perceived by Spark, I think stale (wrong) data would be read here.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
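One common way to mitigate the staleness concern raised above is to key the cached metadata on the file's modification time and length, so a rewritten file naturally misses the cache instead of returning a stale tail. The following is a minimal, hypothetical sketch of that idea; `FileMetaCache` and `MetaCacheKey` are illustrative names, not classes from this PR, and real code would obtain `modTime`/`length` from `FileStatus`:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical cache key: a file rewrite changes modTime and/or length,
// so the stale entry is simply never looked up again.
final case class MetaCacheKey(path: String, modTime: Long, length: Long)

// Illustrative metadata cache; V would be something like an OrcTail.
final class FileMetaCache[V] {
  private val cache = new ConcurrentHashMap[MetaCacheKey, V]()

  // Load-on-miss keyed by the file's current status; `load` is evaluated
  // only when the key is absent (computeIfAbsent semantics).
  def getOrLoad(path: String, modTime: Long, length: Long)(load: => V): V =
    cache.computeIfAbsent(MetaCacheKey(path, modTime, length), _ => load)

  def size: Int = cache.size()
}

object Demo {
  def main(args: Array[String]): Unit = {
    val cache = new FileMetaCache[String]
    var loads = 0
    def loadMeta(): String = { loads += 1; s"tail-v$loads" }

    val a = cache.getOrLoad("/data/f.orc", modTime = 1L, length = 100L)(loadMeta())
    // Same status: cache hit, loader not invoked again.
    val b = cache.getOrLoad("/data/f.orc", modTime = 1L, length = 100L)(loadMeta())
    // File rewritten (new modTime/length): cache miss, fresh metadata loaded.
    val c = cache.getOrLoad("/data/f.orc", modTime = 2L, length = 120L)(loadMeta())
    println(s"$a $b $c loads=$loads")
  }
}
```

Note that this only avoids serving stale entries; it does not evict the old ones, so a real design would still need a size bound or explicit cleanup, which is the harder executor-wide problem the comment points at.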