n3nash commented on issue #2098:
URL: https://github.com/apache/hudi/issues/2098#issuecomment-696209149


   @RajasekarSribalan A FileNotFound error indicates that you are reading a 
version of the parquet file that has been deleted or no longer exists. This can 
happen for the following reason: say your job is writing to the Hudi table 
every 15 mins and you have chosen to keep only the latest version of the 
parquet file. Now, your snapshot job runs every 1 hr and takes around 1 hr to 
finish. What can happen is that the snapshot job starts reading an older 
version of the parquet file, the ingestion job then creates a new version, 
and the cleaner deletes the older version while the snapshot job still needs it.
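
   The race described above can be illustrated with a toy model. This is only a sketch: the function name and the "keep the newest N versions after every commit" rule are simplifying assumptions made here, not Hudi's actual cleaner implementation.

```python
def read_survives(retained_versions, ingest_interval_min, read_duration_min):
    """Toy model (hypothetical, not Hudi code).

    A reader opens file version 0 at t=0 and needs it until
    t=read_duration_min. Every ingest_interval_min minutes the writer
    creates a new version, and the cleaner then keeps only the newest
    `retained_versions` versions. Returns True if version 0 is still
    present when the read finishes.
    """
    pinned = 0   # version the reader opened at t=0
    latest = 0   # newest version written so far
    t = ingest_interval_min
    # A commit landing exactly at the end of the read is counted as
    # overlapping it, which errs on the conservative side.
    while t <= read_duration_min:
        latest += 1                                   # ingestion writes a new version
        oldest_kept = latest - retained_versions + 1  # cleaner drops anything older
        if pinned < oldest_kept:
            return False                              # reader's file was deleted mid-read
        t += ingest_interval_min
    return True


# Scenario from the comment: ingest every 15 min, read takes about 1 hr.
print(read_survives(1, 15, 60))  # only the latest version kept
print(read_survives(5, 15, 60))  # pinned version plus the 4 written during the read
```

   In this model, keeping a single version fails at the first commit after the read starts, while the read survives once the retained count covers every commit that can land during it.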
   Since your job was running for many days, it seems like either a) the 
frequency of the ingestion job to Hudi or of the snapshot job to Hive changed, 
b) the snapshot job now runs for a longer period of time, causing the file not 
found error, or c) it was just working by chance.
   To fix this issue, please make sure you keep enough file versions so that 
a long-running job (like the snapshot job) can still find the file it started 
to read in the first place. 
   Please take a look at your configurations for the cleaner policy and then 
tune them using this config -> 
https://hudi.apache.org/docs/configurations.html#withCleanerPolicy
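
   As a rough way to size the retention, one can count how many ingestion commits may land while the longest reader is running and retain at least that many plus some headroom. The sketch below is an assumption-laden illustration: the helper name and the safety margin are hypothetical, and the option keys (`hoodie.cleaner.policy`, `hoodie.cleaner.commits.retained`) should be checked against the configuration page for your Hudi version.

```python
import math


def min_retained_commits(ingest_interval_min, longest_read_min, safety_margin=2):
    """Hypothetical helper: commits to retain so a long read keeps its files.

    Counts the commits that can complete during the longest read, adds one
    for the commit the reader started from, plus an arbitrary safety margin
    (an assumption, tune it for your workload).
    """
    commits_during_read = math.ceil(longest_read_min / ingest_interval_min)
    return commits_during_read + 1 + safety_margin


# Scenario from the comment: ingest every 15 min, snapshot job takes about 1 hr.
retained = min_retained_commits(ingest_interval_min=15, longest_read_min=60)

# Writer options as understood from the Hudi configuration docs; verify the
# keys and allowed values for your release. If you use the
# KEEP_LATEST_FILE_VERSIONS policy instead, the corresponding knob is
# hoodie.cleaner.fileversions.retained.
hudi_cleaner_options = {
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
    "hoodie.cleaner.commits.retained": str(retained),
}

print(hudi_cleaner_options)
```

   With a Spark writer, a dictionary like this would typically be passed alongside the other Hudi write options, for example via `.options(**hudi_cleaner_options)`.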



