[ https://issues.apache.org/jira/browse/HUDI-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar updated HUDI-2780:
---------------------------------
    Parent: HUDI-2749
    Issue Type: Sub-task  (was: Bug)

> MOR reads the log file and skips a complete block as a bad block, resulting
> in data loss
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2780
>                 URL: https://issues.apache.org/jira/browse/HUDI-2780
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: jing
>            Assignee: jing
>            Priority: Blocker
>              Labels: core-flow-ds, pull-request-available, sev:critical
>             Fix For: 0.11.0
>
>         Attachments: image-2021-11-17-15-45-33-031.png,
> image-2021-11-17-15-46-04-313.png, image-2021-11-17-15-46-14-694.png
>
>
> Debugging the contents of the "bad block" shows that the lost records sit
> inside that block's offset range. Because the reader hits an EOF and skips
> the block, the compaction merge never writes those records to Parquet, even
> though the deltacommit for that instant succeeded. In the middle of the bad
> block there are two consecutive HUDI magic markers: after the first magic,
> the reader interprets the bytes of the second "#HUDI#" as the blocksize,
> yielding 1227030528, which exceeds the file size and triggers the EOF
> exception.
> !image-2021-11-17-15-45-33-031.png!
> When scanning for the position of the next block in order to skip the bad
> block, the search should not start from the position after the blocksize was
> read, but from the position before reading the blocksize.
> !image-2021-11-17-15-46-04-313.png!
> !image-2021-11-17-15-46-14-694.png!

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
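A minimal sketch of the rescan behavior the ticket describes, not Hudi's actual `HoodieLogFileReader` code: `findMagic`, the buffer layout, and the 4-byte blocksize field are simplified assumptions for illustration. Two back-to-back "#HUDI#" markers are laid out; a rescan that starts after the consumed blocksize bytes misses the second marker (the whole block is lost), while a rescan that starts from the position before the blocksize read recovers it.

```java
import java.nio.charset.StandardCharsets;

public class BadBlockRescan {
    static final byte[] MAGIC = "#HUDI#".getBytes(StandardCharsets.UTF_8);

    // Return the index of the next MAGIC at or after 'from', or -1 if none.
    static int findMagic(byte[] log, int from) {
        outer:
        for (int i = Math.max(from, 0); i + MAGIC.length <= log.length; i++) {
            for (int j = 0; j < MAGIC.length; j++) {
                if (log[i + j] != MAGIC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two consecutive magic markers, as in the ticket; the rest is payload.
        byte[] log = new byte[16];
        System.arraycopy(MAGIC, 0, log, 0, MAGIC.length);
        System.arraycopy(MAGIC, 0, log, MAGIC.length, MAGIC.length);

        int magicEnd = MAGIC.length; // position right after the first magic
        int blockSizeBytes = 4;      // simplified: pretend blocksize is a 4-byte int

        // Buggy rescan: starts AFTER the bytes consumed by the blocksize read,
        // so the second magic (which overlaps those bytes) is never found.
        int buggy = findMagic(log, magicEnd + blockSizeBytes);
        // Fixed rescan: starts BEFORE the blocksize read and finds the marker.
        int fixed = findMagic(log, magicEnd);

        System.out.println(buggy); // -1: next block missed, data lost
        System.out.println(fixed); // 6:  next block recovered
    }
}
```

Under these assumptions the fix is purely a matter of where the corrupt-block scan resumes; no bytes need to be re-read beyond rewinding past the blocksize field.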