[ 
https://issues.apache.org/jira/browse/HDFS-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202063#comment-16202063
 ] 

Daryn Sharp commented on HDFS-12645:
------------------------------------

I understand the lifeline protocol was designed to keep the node from being 
declared dead, but it's just hiding the consequences of a poor locking design.  
Preventing a dead node via a lifeline is of dubious value when the node is 
effectively dead due to blocked IO in the dataset lock.  The node can't process 
replications, which may lead to data loss when another node could have serviced 
the replication request.  For example, the lifeline will keep a node "alive" 
even though it's having severe hardware issues and ultimately crashes.

> FSDatasetImpl lock will stall BP service actors and may cause missing blocks
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-12645
>                 URL: https://issues.apache.org/jira/browse/HDFS-12645
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>
> The DN is extremely susceptible to a slow volume due to bad locking 
> practices.  DN operations require a fs dataset lock.  IO in the dataset lock 
> should not be permissible as it leads to severe performance degradation and 
> possibly (temporarily) missing blocks.
> A slow disk will cause pipelines to experience significant latency and 
> timeouts, increasing lock/io contention while cleaning up, leading to more 
> timeouts, etc.  Meanwhile, the actor service thread is interleaving multiple 
> lock acquire/releases with xceivers.  If many commands are issued, the node 
> may be incorrectly declared as dead.
> HDFS-12639 documents that both actors synchronize on the offer service lock 
> while processing commands.  A backlogged active actor will block the standby 
> actor and cause it to go dead too.
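
To illustrate the locking concern in the description above, here is a minimal 
sketch.  It is not the FSDatasetImpl code: the class DatasetLockSketch, its 
methods, and the sleep that stands in for a slow volume are all hypothetical.  
It only contrasts holding a single dataset lock across IO with holding it for 
the in-memory lookup alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a dataset guarded by one lock; not the real DN code. */
class DatasetLockSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, String> blockPaths = new HashMap<>();
    private final long simulatedIoMillis; // assumed latency of a slow volume

    DatasetLockSketch(long simulatedIoMillis) {
        this.simulatedIoMillis = simulatedIoMillis;
    }

    void addBlock(long blockId, String path) {
        datasetLock.lock();
        try {
            blockPaths.put(blockId, path);
        } finally {
            datasetLock.unlock();
        }
    }

    /** Stand-in for disk IO; a real volume read would go here. */
    private String slowRead(String path) {
        try {
            Thread.sleep(simulatedIoMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "data@" + path;
    }

    /** Anti-pattern: the lock is held for the full duration of the IO. */
    String readHoldingLock(long blockId) {
        datasetLock.lock();
        try {
            String path = blockPaths.get(blockId);
            return path == null ? null : slowRead(path);
        } finally {
            datasetLock.unlock();
        }
    }

    /** Alternative: only the map lookup is under the lock; IO runs after release. */
    String readOutsideLock(long blockId) {
        String path;
        datasetLock.lock();
        try {
            path = blockPaths.get(blockId);
        } finally {
            datasetLock.unlock();
        }
        return path == null ? null : slowRead(path);
    }

    /** Models another thread (e.g. an actor) that needs the lock within a deadline. */
    boolean tryMetadataOp(long timeoutMillis) {
        try {
            if (!datasetLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // lock holder is still busy, likely stuck in IO
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return true;
        } finally {
            datasetLock.unlock();
        }
    }
}
```

With a large simulatedIoMillis, a thread inside readHoldingLock would make 
tryMetadataOp on another thread time out, while a thread inside 
readOutsideLock would not, since the lock is released before the slow read 
begins.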



