[ https://issues.apache.org/jira/browse/HUDI-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-7209:
---------------------------------
    Labels: pull-request-available  (was: )

> Add configuration to skip not exists file in streaming read
> -----------------------------------------------------------
>
>                 Key: HUDI-7209
>                 URL: https://issues.apache.org/jira/browse/HUDI-7209
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: Ruguo Yu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>         Attachments: 289447957-f25cda8d-e75c-4380-b660-8ad347c4a6ca.png
>
>
> In streaming reads, if the metadata references a large number of files,
> especially very old archived files, then checking whether each file exists
> during file traversal is IO-intensive. In extreme cases, the Flink
> checkpoint may fail to complete.
> !289447957-f25cda8d-e75c-4380-b660-8ad347c4a6ca.png|width=697,height=562!
> Another potential concern: if deleted files are skipped by default, could
> data be missed without the user being aware of it?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)