n3nash commented on issue #2098: URL: https://github.com/apache/hudi/issues/2098#issuecomment-696209149
@RajasekarSribalan A FileNotFound error indicates that you are reading a version of a parquet file that has been deleted or no longer exists. This can happen for the following reason:

Say your job is writing to the Hudi table every 15 minutes and you have chosen to keep only the latest version of each parquet file. Your snapshot job runs every hour and takes around an hour to finish. The snapshot job can then end up reading an older version of a parquet file while the ingestion job creates a new version and the cleaner deletes the older one.

Since your job had been running for many days, it seems that one of the following is true:

a) The frequency of the ingestion job to Hudi or of the snapshot job to Hive changed.
b) The snapshot job now runs for a longer period of time, causing the file-not-found error.
c) It was just working by chance.

To fix this issue, please make sure you keep enough file versions so that a long-running job (like the snapshot job) can still find the file it started to read in the first place. Please take a look at your configurations for the cleaner policy and then tune them using this config -> https://hudi.apache.org/docs/configurations.html#withCleanerPolicy
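As a rough illustration of the sizing advice above, here is a minimal Python sketch that estimates how many commits the cleaner should retain so that a reader started before a write can still find its files. The helper names (`min_commits_to_retain`, `cleaner_options`) and the one-commit safety margin are assumptions made for this example, not part of Hudi; the 15-minute and 60-minute figures come from the scenario described above. The option keys `hoodie.cleaner.policy` and `hoodie.cleaner.commits.retained` are Hudi write configs, but please check the right values against the configuration docs for your Hudi version.

```python
import math


def min_commits_to_retain(ingest_interval_min, longest_read_min, safety_commits=1):
    """Estimate how many commits to retain (hypothetical helper).

    A reader that runs for `longest_read_min` minutes can be overtaken by
    roughly ceil(longest_read_min / ingest_interval_min) new commits, so the
    cleaner should keep at least that many, plus a small safety margin.
    """
    if ingest_interval_min <= 0:
        raise ValueError("ingest_interval_min must be positive")
    if longest_read_min < 0:
        raise ValueError("longest_read_min must not be negative")
    commits_during_read = math.ceil(longest_read_min / ingest_interval_min)
    return commits_during_read + safety_commits


def cleaner_options(commits_retained):
    """Build Hudi writer options for a commit-based cleaner policy."""
    return {
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": str(commits_retained),
    }


# Scenario from the comment: ingestion every 15 minutes, snapshot takes ~60 minutes.
retained = min_commits_to_retain(ingest_interval_min=15, longest_read_min=60)
options = cleaner_options(retained)
print(options)
```

With PySpark, a dictionary like this could be passed to the writer along with your other Hudi options, for example `df.write.format("hudi").options(**options)`. If the snapshot job's runtime varies a lot, a larger safety margin is probably the safer choice, at the cost of keeping more old file versions on storage.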