[ https://issues.apache.org/jira/browse/HDFS-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jing Zhao updated HDFS-8602:
----------------------------
    Attachment: HDFS-8602.000.patch

Thanks very much for reporting the issue and working on this, [~kaisasak]! I also did some debugging on the issue. It looks like the cause is a deadlock: after hitting the exception while reading the corrupted block, {{readToBuffer}} tries to print a warning message, during which {{getCurrentBlock}} is called. {{getCurrentBlock}} needs to acquire the input stream's lock, which is currently held by the main thread, and the main thread is waiting for the response from the reading threads.

The patch includes a simple fix and also a unit test that can reproduce the issue ({{testReadCorruptedData2}}).

> Erasure Coding: Client can't read(decode) the EC files which have corrupt blocks.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-8602
>                 URL: https://issues.apache.org/jira/browse/HDFS-8602
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Takanobu Asanuma
>            Assignee: Kai Sasaki
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8602.000.patch
>
>
> Before the DataNode(s) report the bad block(s), when the Client reads an EC file
> that has bad blocks, the Client hangs, and there are no error messages.
> (When the Client reads a replicated file that has bad blocks, the bad blocks
> are reconstructed at the same time, and the Client can read it.)
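
For readers who want to see the shape of the deadlock described in the comment above, here is a minimal, self-contained Java sketch. It is not the HDFS client code and not the contents of HDFS-8602.000.patch: the class {{StripedReadDeadlockSketch}}, the nested {{SketchStream}}, and the methods {{readWithLockedGetter}}, {{readWithSnapshot}} and {{run}} are all hypothetical names invented for illustration. Only {{getCurrentBlock}} mirrors a name from the comment, and here it is just a synchronized getter. The "fix" shown (reading the value before handing work to the reader thread) is one plausible way to avoid the cycle, assumed for the sketch; the actual patch may do something different. A timeout is used so the demo reports the stall instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StripedReadDeadlockSketch {

    // Hypothetical stand-in for the client input stream (not the real HDFS class).
    static class SketchStream {
        private final String currentBlock = "blk_0001";

        // Like the getter in the comment, this needs the stream's own lock.
        synchronized String getCurrentBlock() {
            return currentBlock;
        }

        // Deadlock-prone shape: the caller holds the stream lock and waits for
        // the reader thread, while the reader thread's warning message calls a
        // method that needs the same lock.
        synchronized boolean readWithLockedGetter(ExecutorService pool, long timeoutMs)
                throws Exception {
            Future<String> warning = pool.submit(
                () -> "Exception while reading from " + getCurrentBlock());
            return finishedInTime(warning, timeoutMs);
        }

        // Assumed fix for this sketch: read the value while the caller already
        // holds the lock, so the reader thread never needs the lock.
        synchronized boolean readWithSnapshot(ExecutorService pool, long timeoutMs)
                throws Exception {
            final String block = currentBlock;
            Future<String> warning = pool.submit(
                () -> "Exception while reading from " + block);
            return finishedInTime(warning, timeoutMs);
        }
    }

    // Returns true if the reader task completed before the timeout expired.
    static boolean finishedInTime(Future<String> task, long timeoutMs) throws Exception {
        try {
            task.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // reader is still blocked waiting for the stream lock
        }
    }

    // Runs one variant with a fresh single-thread pool standing in for the readers.
    static boolean run(boolean useLockedGetter) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            SketchStream stream = new SketchStream();
            return useLockedGetter
                ? stream.readWithLockedGetter(pool, 500)
                : stream.readWithSnapshot(pool, 500);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("locked getter finished in time: " + run(true));
        System.out.println("snapshot finished in time: " + run(false));
    }
}
```

In the first variant the reader task cannot finish until the caller gives up and releases the lock, so without the timeout the two threads would wait on each other indefinitely, which matches the silent hang reported in the issue. The second variant completes because the reader thread never touches the stream's lock.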