[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342178#comment-17342178 ]
Vinoth Chandar commented on HUDI-1723:
--------------------------------------

Yes. [~xushiyan], can we file an umbrella issue, plus one issue each for S3 and GCS? [https://cloud.google.com/storage/docs/object-change-notification]

> DFSPathSelector skips files with the same modify date when read up to source limit
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-1723
>                 URL: https://issues.apache.org/jira/browse/HUDI-1723
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.9.0
>
>         Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles
> filters the input files based on the last saved checkpoint, which is the
> modification date of the last file read. However, several files can share
> that modification date. When a read stops at the source limit, the files
> that share the last read file's modification date but were not yet read
> are skipped on the next run. An illustration is shown in the attached
> picture.
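
To make the skipping behavior described in the issue concrete, here is a minimal Java sketch. It is not Hudi's code: `CheckpointSkipDemo`, `FileInfo`, `Batch`, `select`, and the `keepTies` flag are hypothetical names invented for illustration, and the strict "modification time greater than checkpoint" filter is an assumption based on the description above. With `keepTies` set to false, the sketch reproduces the skip. With it set to true, it shows one possible mitigation (keep reading files that share the last read file's modification time even past the limit), which is not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not Hudi's DFSPathSelector.
class CheckpointSkipDemo {

    static final class FileInfo {
        final String name;
        final long modTime; // modification time, e.g. epoch millis
        final long size;    // size in bytes

        FileInfo(String name, long modTime, long size) {
            this.name = name;
            this.modTime = modTime;
            this.size = size;
        }
    }

    static final class Batch {
        final List<String> files; // names selected for this read
        final long checkpoint;    // modTime of the last file read

        Batch(List<String> files, long checkpoint) {
            this.files = files;
            this.checkpoint = checkpoint;
        }
    }

    // Select files newer than lastCheckpoint, oldest first, up to sourceLimit bytes.
    // keepTies=false reproduces the skip; keepTies=true is one possible mitigation.
    static Batch select(List<FileInfo> all, long lastCheckpoint, long sourceLimit, boolean keepTies) {
        List<FileInfo> eligible = new ArrayList<>();
        for (FileInfo f : all) {
            if (f.modTime > lastCheckpoint) { // assumed strict comparison
                eligible.add(f);
            }
        }
        eligible.sort(Comparator.comparingLong((FileInfo f) -> f.modTime));

        List<String> picked = new ArrayList<>();
        long bytes = 0;
        long checkpoint = lastCheckpoint;
        for (FileInfo f : eligible) {
            boolean overLimit = bytes + f.size > sourceLimit;
            // A "tie" is a file sharing the modTime of the file just read.
            boolean tie = keepTies && !picked.isEmpty() && f.modTime == checkpoint;
            if (overLimit && !tie) {
                break;
            }
            picked.add(f.name);
            bytes += f.size;
            checkpoint = f.modTime;
        }
        return new Batch(picked, checkpoint);
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
            new FileInfo("a", 100, 10), new FileInfo("b", 200, 10),
            new FileInfo("c", 200, 10), new FileInfo("d", 300, 10));

        // Naive selection: "c" shares modTime 200 with "b" and is never read.
        Batch first = select(files, 0, 20, false);
        Batch second = select(files, first.checkpoint, 20, false);
        System.out.println("naive: " + first.files + " then " + second.files);

        // Tie-aware selection: "c" is read together with "b".
        Batch tieAware = select(files, 0, 20, true);
        System.out.println("tie-aware: " + tieAware.files);
    }
}
```

In the naive run the checkpoint lands on 200 after reading "b", so the next run only considers files modified after 200 and "c" is lost. The tie-aware variant trades a slightly oversized batch for not losing files.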