[ https://issues.apache.org/jira/browse/HDFS-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721199#comment-17721199 ]
farmmamba commented on HDFS-17002:
----------------------------------

[~ayushtkn], Yes, sir. This is not a bug; the type of this Jira is just an improvement. Yes, the client won't read parity blocks when all data blocks are healthy. When the DirectoryScanner is not running, we know nothing about the parity blocks even if they have been corrupted. So, I am thinking about whether we should sample the parity blocks with some probability when reading EC files to check their correctness, or use some other method to prevent the parity blocks from breaking down silently.

> Erasure coding: Generate parity blocks in time to prevent file corruption
> -------------------------------------------------------------------------
>
>                 Key: HDFS-17002
>                 URL: https://issues.apache.org/jira/browse/HDFS-17002
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>    Affects Versions: 3.4.0
>            Reporter: farmmamba
>            Priority: Major
>
> In the current EC implementation, a corrupted parity block is not
> regenerated in time.
> Consider the following scenario when using the RS-6-3-1024k EC policy:
> if all three parity blocks p1, p2, p3 are corrupted or deleted, we are not
> aware of it.
> If, unfortunately, a data block is also corrupted in this time period, the
> file becomes corrupted and cannot be read even by decoding.
>
> So we should always regenerate a parity block in time when it becomes
> unhealthy.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
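The sampling idea discussed in the comment could be sketched roughly as follows. This is only an illustrative standalone sketch, not HDFS code: the class name `ParityVerificationSampler`, the method `shouldVerifyParity`, and the probability parameter are all hypothetical; a real implementation would hook such a decision into the EC read path and run a checksum verification of the parity blocks when the sampler fires.

```java
import java.util.Random;

/**
 * Hypothetical sketch: decide, per EC file read, whether this read should
 * additionally checksum-verify the parity blocks, so that silent parity
 * corruption is eventually noticed without verifying on every read.
 */
public class ParityVerificationSampler {

    private final double sampleProbability; // e.g. 0.01 => verify on ~1% of reads
    private final Random random;            // seeded here for reproducibility

    public ParityVerificationSampler(double sampleProbability, long seed) {
        if (sampleProbability < 0.0 || sampleProbability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.sampleProbability = sampleProbability;
        this.random = new Random(seed);
    }

    /** Returns true when this read should also verify the parity blocks. */
    public boolean shouldVerifyParity() {
        return random.nextDouble() < sampleProbability;
    }

    public static void main(String[] args) {
        ParityVerificationSampler sampler = new ParityVerificationSampler(0.01, 42L);
        int sampled = 0;
        int reads = 100_000;
        for (int i = 0; i < reads; i++) {
            if (sampler.shouldVerifyParity()) {
                sampled++;
            }
        }
        // With p = 0.01 over 100k reads, roughly 1000 reads trigger verification.
        System.out.println("sampled=" + sampled);
    }
}
```

The design trade-off is the usual one: a higher probability shortens the window in which all parity blocks can be broken simultaneously, at the cost of extra read-path I/O; a background scanner (like the existing DirectoryScanner) covers cold files that are never read.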