[ 
https://issues.apache.org/jira/browse/HUDI-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2780:
---------------------------------
    Fix Version/s: 0.11.0

> MOR log reader skips a complete block as a bad block, resulting in data loss
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2780
>                 URL: https://issues.apache.org/jira/browse/HUDI-2780
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: jing
>            Assignee: jing
>            Priority: Major
>              Labels: core-flow-ds, pull-request-available, sev:critical
>             Fix For: 0.11.0
>
>         Attachments: image-2021-11-17-15-45-33-031.png, 
> image-2021-11-17-15-46-04-313.png, image-2021-11-17-15-46-14-694.png
>
>
> Inspecting the contents of the "bad block" in a debugger shows that the 
> lost data lies within the offset range of the skipped block. Because an EOF 
> exception is raised while reading, the whole block is skipped, so compaction 
> cannot merge its records into the parquet file, even though the deltacommit 
> for that instant had succeeded. The bad block contains two consecutive HUDI 
> magic headers: when the reader goes on to read the blocksize field, it 
> actually reads bytes belonging to the second magic, decoding "#HUDI#" to 
> 1227030528. Seeking by that bogus size runs past the end of the file, which 
> is what raises the EOF exception.
> !image-2021-11-17-15-45-33-031.png!
> When locating the start of the next block in order to skip the bad block, 
> the scan should resume from the position before the blocksize field was 
> read, not from the position after it.
> !image-2021-11-17-15-46-04-313.png!
> !image-2021-11-17-15-46-14-694.png!
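The resync behavior described above can be sketched as follows. This is a hedged illustration, not the actual HoodieLogFileReader code: the class, method names, and the simplified layout ([magic][4-byte blocksize][payload]) are assumptions made for the sketch. It models the reported corruption pattern (two back-to-back magics, so the "blocksize" bytes are really part of the next magic) and contrasts the buggy resync position (after the blocksize field) with the proposed one (before it):

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only -- not the real Hudi reader. A log is modeled as
// a sequence of [MAGIC][4-byte blocksize][payload] entries. In the reported
// corruption, two magics appear back-to-back, so the 4 bytes read as the
// "blocksize" are actually the start of the second magic and decode to a
// huge bogus value, causing the seek to run past EOF.
public class BadBlockResync {
    static final byte[] MAGIC = "#HUDI#".getBytes(StandardCharsets.US_ASCII);
    static final int BLOCKSIZE_FIELD = 4; // bytes of the length field

    // Offset of the next magic at or after 'from', or -1 if none.
    static int nextMagic(byte[] log, int from) {
        outer:
        for (int i = Math.max(from, 0); i + MAGIC.length <= log.length; i++) {
            for (int j = 0; j < MAGIC.length; j++) {
                if (log[i + j] != MAGIC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Given a corrupt block whose magic starts at 'magicPos', compute where
    // the scan for the next block should resume.
    //   fixed == false: resume after the blocksize field was consumed -- this
    //                   skips a magic that overlaps those 4 bytes (data loss).
    //   fixed == true:  resume right after the corrupt block's magic, i.e.
    //                   before the blocksize field, as this issue proposes.
    static int resyncFrom(int magicPos, boolean fixed) {
        int afterMagic = magicPos + MAGIC.length;
        return fixed ? afterMagic : afterMagic + BLOCKSIZE_FIELD;
    }
}
```

With a log beginning `#HUDI##HUDI#`, the second magic starts at offset 6. Resuming the scan before the blocksize field finds it; resuming after the field starts at offset 10 and misses it entirely, which is the skipped-block data loss this issue reports.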



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
