[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1723:
---------------------------------
    Labels: pull-request-available sev:critical user-support-issues  (was: sev:critical user-support-issues)

> DFSPathSelector skips files with the same modify date when read up to source limit
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-1723
>                 URL: https://issues.apache.org/jira/browse/HUDI-1723
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.9.0
>
>         Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles
> filters the input files based on the last saved checkpoint, which is the
> modification date of the last file read. However, several files can share
> that modification date. When a read stops at the source limit partway
> through such a group, the remaining files with the same date are skipped
> on the next read. An illustration is shown in the attached picture.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
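
The behaviour described in the issue can be reproduced with a small, simplified model. The sketch below is not Hudi's code: the class CheckpointSkipSketch, the FileInfo type, and the methods selectBatch, readAll and demo are hypothetical names made up for illustration. It also assumes that the filter keeps only files strictly newer than the checkpoint and that the source limit is a byte budget per read.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CheckpointSkipSketch {

    // Hypothetical stand-in for a file listing entry (not Hudi's or Hadoop's type).
    static final class FileInfo {
        final String name;
        final long modTime;
        final long size;

        FileInfo(String name, long modTime, long size) {
            this.name = name;
            this.modTime = modTime;
            this.size = size;
        }
    }

    // Assumed behaviour: keep files strictly newer than the checkpoint, oldest
    // first, and stop before the batch would exceed sourceLimit bytes.
    static List<FileInfo> selectBatch(List<FileInfo> files, long checkpoint, long sourceLimit) {
        List<FileInfo> eligible = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.modTime > checkpoint) {
                eligible.add(f);
            }
        }
        eligible.sort(Comparator.comparingLong((FileInfo f) -> f.modTime));

        List<FileInfo> batch = new ArrayList<>();
        long bytes = 0;
        for (FileInfo f : eligible) {
            if (bytes + f.size > sourceLimit) {
                break;
            }
            batch.add(f);
            bytes += f.size;
        }
        return batch;
    }

    // Read repeatedly, saving the last read file's modification time as the
    // next checkpoint, and return the names of all files that were read.
    static List<String> readAll(List<FileInfo> files, long sourceLimit) {
        List<String> read = new ArrayList<>();
        long checkpoint = Long.MIN_VALUE;
        while (true) {
            List<FileInfo> batch = selectBatch(files, checkpoint, sourceLimit);
            if (batch.isEmpty()) {
                return read;
            }
            for (FileInfo f : batch) {
                read.add(f.name);
            }
            checkpoint = batch.get(batch.size() - 1).modTime;
        }
    }

    static List<String> demo() {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("a", 100L, 10L));
        files.add(new FileInfo("b", 200L, 10L));
        files.add(new FileInfo("c", 200L, 10L)); // same modification time as "b"
        files.add(new FileInfo("d", 300L, 10L));
        return readAll(files, 20L); // the limit fits two files per read
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, d]: "c" is never read
    }
}
```

In this model the first read stops after "b" because of the limit, the checkpoint becomes 200, and the next read keeps only files newer than 200, so "c" is dropped even though it was never read.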