[ 
https://issues.apache.org/jira/browse/HDFS-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8602:
----------------------------
    Attachment: HDFS-8602.000.patch

Thanks very much for reporting the issue and working on this, [~kaisasak]!

I also did some debugging on the issue. It looks like the cause is a deadlock: 
after hitting the exception while reading the corrupted block, {{readToBuffer}} 
tries to log a warning message, and in doing so calls {{getCurrentBlock}}. 
{{getCurrentBlock}} needs to acquire the input stream's lock, which is currently 
held by the main thread, while the main thread is itself waiting for the 
response from the reading threads.
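
To make the lock cycle concrete, here is a minimal standalone sketch of the 
pattern described above. It is not the HDFS code and not necessarily what the 
attached patch does: the class {{LockedReadSketch}}, the {{Stream}} stand-in, 
the {{snapshot}} flag and the 500 ms timeout are all assumptions made for 
illustration. The timeout is there only so the problem case terminates instead 
of hanging forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LockedReadSketch {

    /** Hypothetical stand-in for the input stream; not the real HDFS class. */
    static class Stream {
        private final String currentBlock = "blk_1";

        /** Plays the role of getCurrentBlock: needs the stream's lock. */
        synchronized String getCurrentBlockLocked() {
            return currentBlock;
        }

        /**
         * Plays the role of the main thread's read: it holds the stream's lock
         * while waiting for a reader thread to report back.
         */
        synchronized String read(ExecutorService pool, boolean snapshot)
                throws Exception {
            Callable<String> readerTask;
            if (snapshot) {
                // Possible fix (an assumption): capture the block while the lock
                // is already held, so the reader's error path never needs it.
                final String block = currentBlock;
                readerTask = () -> "WARN failed to read " + block;
            } else {
                // Problem case: the reader's error path calls the locked getter
                // and blocks, because the caller still holds the lock.
                readerTask = () -> "WARN failed to read " + getCurrentBlockLocked();
            }
            Future<String> result = pool.submit(readerTask);
            try {
                // The timeout exists only so this sketch terminates; without it
                // the problem case would wait forever.
                return result.get(500, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return "TIMEOUT";
            }
        }
    }

    /** Runs one read on a fresh stream and a fresh single reader thread. */
    static String demo(boolean snapshot) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return new Stream().read(pool, snapshot);
        } finally {
            // The blocked reader finishes once read() releases the lock.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(false)); // prints "TIMEOUT"
        System.out.println(demo(true));  // prints "WARN failed to read blk_1"
    }
}
```

In the problem case the caller waits on the reader while the reader waits on 
the caller's lock, so neither makes progress until the timeout fires. Capturing 
the value before handing work to the reader is only one way to break the cycle; 
logging without the locked getter would work as well.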

The patch includes a simple fix and also a unit test that can reproduce the 
issue ({{testReadCorruptedData2}}).

> Erasure Coding: Client can't read(decode) the EC files which have corrupt 
> blocks.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-8602
>                 URL: https://issues.apache.org/jira/browse/HDFS-8602
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Takanobu Asanuma
>            Assignee: Kai Sasaki
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8602.000.patch
>
>
> Before the DataNode(s) report the bad block(s), when a client reads an EC file 
> that has bad blocks, the client hangs, and there are no error messages.
> (When a client reads a replicated file that has bad blocks, the bad blocks 
> are reconstructed at the same time, and the client can read it.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
