[ 
https://issues.apache.org/jira/browse/HUDI-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7209:
---------------------------------
    Labels: pull-request-available  (was: )

> Add configuration to skip not exists file in streaming read
> -----------------------------------------------------------
>
>                 Key: HUDI-7209
>                 URL: https://issues.apache.org/jira/browse/HUDI-7209
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: Ruguo Yu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>         Attachments: 289447957-f25cda8d-e75c-4380-b660-8ad347c4a6ca.png
>
>
> In streaming read mode, if the metadata references a large number of files, 
> especially very old archive files, then checking whether each file exists 
> during file traversal is IO-intensive. In extreme cases, the Flink 
> checkpoint may fail to complete.
> !289447957-f25cda8d-e75c-4380-b660-8ad347c4a6ca.png|width=697,height=562!
> Another potential problem: if deleted files are skipped by default, could 
> data go missing without the user being aware of it?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
