[ https://issues.apache.org/jira/browse/HUDI-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sagar Sumit closed HUDI-2772. ----------------------------- Resolution: Fixed > Deltastreamer fails to read checkpoint from previous commit metadata by spark > writer on continuous mode where there is no data in source > ---------------------------------------------------------------------------------------------------------------------------------------- > > Key: HUDI-2772 > URL: https://issues.apache.org/jira/browse/HUDI-2772 > Project: Apache Hudi > Issue Type: Task > Components: multi-writer > Reporter: sivabalan narayanan > Assignee: Harshal Patil > Priority: Major > Fix For: 0.14.0 > > > Even after setting the right config to copy over deltastreamer checkpoint, > deltastreamer fails to read the checkpoint from previous commit metadata. > This is not something that happens in general. In this case, in continuous > mode, there is no data in source (parquet dfs) folder and so deltatastreamer > continuously checks source folder and also loads last checkpoint from > timeline metadata. So, with this set up, when a write from spark-datasource > is triggered, deltastreamer immediately fails to read the checkpoint from the > completed spark-writer commit. But if deltastreamer is restarted, the > exception is not seen and picks up the checkpoint. > I induced a 1 sec delay in continuous mode and things were fine too. > > Setup: > Deltastreamer in continuous mode. source folder did not have any data, and so > deltastreamer was checking source folder and fetching latest checkpoint from > commit metadata in quick succession. > And triggered a concurrent write from spark-datasource. > > I inspected the last commit.completed instant(that was reported by > deltastreamer) made by spark writer and it looks ok to me. > {code:java} > grep "checkpoint" > /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074129737.deltacommit > "deltastreamer.checkpoint.key" : "1637066483000" {code} > But after the below exception, if I restart deltastreamer, it just runs fine. > Very strange? I was able to reprod this 2 times out of 5. > here is the checkpoint from last delta commit by deltastreamer (which matches > the entry found by delta commit by spark writer above) > {code:java} > grep "checkpoint" > /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074123384.deltacommit > "deltastreamer.checkpoint.key" : "1637066483000" {code} > > I also check detlastreamer code and we do look at only completed instants and > the completed commit metadata. So, not sure why is this happening. > stacktrace: > {code:java} > 21/11/16 10:51:15 WARN HoodieDeltaStreamer: Next round > 21/11/16 10:51:15 WARN DeltaSync: Extra metadata :: 20211116105105578, > 20211116105105578.deltacommit, = [schema, deltastreamer.checkpoint.key] > 21/11/16 10:51:15 WARN HoodieDeltaStreamer: Next round > 21/11/16 10:51:15 WARN DeltaSync: Extra metadata :: 20211116105105578, > 20211116105105578.deltacommit, = [schema, deltastreamer.checkpoint.key] > 21/11/16 10:51:15 WARN HoodieDeltaStreamer: Next round > 21/11/16 10:51:15 WARN DeltaSync: Extra metadata :: 20211116105112814, > 20211116105112814.deltacommit, = [] > 21/11/16 10:51:15 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to > exception > org.apache.hudi.utilities.exception.HoodieDeltaStreamerException: Unable to > find previous checkpoint. Please double check if this table was indeed built > via delta streamer. Last Commit > :Option{val=[20211116105112814__deltacommit__COMPLETED]}, Instants > :[[20211116104228269__deltacommit__COMPLETED], > [20211116104553080__deltacommit__COMPLETED], > [20211116104759622__deltacommit__COMPLETED], > [20211116105105578__deltacommit__COMPLETED], > [20211116105112814__deltacommit__COMPLETED]], CommitMetadata={ > "partitionToWriteStats" : { }, > "compacted" : false, > "extraMetadata" : { }, > "operationType" : "UNKNOWN", > "fileIdAndRelativePaths" : { }, > "totalRecordsDeleted" : 0, > "totalLogRecordsCompacted" : 0, > "totalLogFilesCompacted" : 0, > "totalCompactedRecordsUpdated" : 0, > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)