[ 
https://issues.apache.org/jira/browse/HDFS-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDFS-12639:
-------------------------------------

    Assignee: Hanisha Koneru

> BPOfferService lock may stall all service actors
> ------------------------------------------------
>
>                 Key: HDFS-12639
>                 URL: https://issues.apache.org/jira/browse/HDFS-12639
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Assignee: Hanisha Koneru
>
> {{BPOfferService}} manages the {{BPServiceActor}} instances for the active 
> and standby NameNodes.  It uses a read-write lock, primarily to protect 
> registration information while determining the active/standby from heartbeats.
> Unfortunately the write lock is held during command processing.  If an actor 
> is experiencing high latency processing commands, the other actor will 
> neither be able to register (blocked in createRegistration, setNamespaceInfo, 
> verifyAndSetNamespaceInfo) nor process heartbeats (blocked in 
> updateActorStatesFromHeartbeat).
> The worst-case scenario for processing commands while holding the lock is 
> re-registration.  The actor will loop, catching and logging exceptions, 
> leaving the other actor blocked for a non-deterministic (possibly infinite) 
> amount of time.
> The lock must not be held during command processing.
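
The requirement above can be illustrated with a minimal sketch. This is not the Hadoop code and not a proposed patch: the class OfferServiceSketch, the method signatures, and the actor names "nn1" and "nn2" are hypothetical, and only the method name updateActorStatesFromHeartbeat is borrowed from the description. The idea shown is that shared state is read or written inside short critical sections, while the potentially slow command runs with no lock held, so a second actor can still update state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for BPOfferService; NOT the actual Hadoop class.
class OfferServiceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String activeActor; // guarded by lock

    // Short critical section: only shared state is touched under the write lock.
    void updateActorStatesFromHeartbeat(String actor) {
        lock.writeLock().lock();
        try {
            activeActor = actor;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reads the shared state under the read lock and releases it immediately.
    boolean isActive(String actor) {
        lock.readLock().lock();
        try {
            return actor.equals(activeActor);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Decide under the lock, then run the (possibly slow) command unlocked.
    boolean processCommand(String actor, Runnable command) {
        if (!isActive(actor)) {
            return false;
        }
        command.run(); // no lock held here
        return true;
    }

    // Returns true if a second actor's heartbeat update completes while the
    // first actor is still stuck inside a slow command.
    static boolean heartbeatProceedsDuringSlowCommand() {
        OfferServiceSketch bpos = new OfferServiceSketch();
        bpos.updateActorStatesFromHeartbeat("nn1");

        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch updated = new CountDownLatch(1);

        Thread slow = new Thread(() -> bpos.processCommand("nn1", () -> {
            started.countDown();
            try {
                release.await(); // simulates a high-latency command
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        Thread other = new Thread(() -> {
            bpos.updateActorStatesFromHeartbeat("nn2");
            updated.countDown();
        });

        try {
            slow.start();
            started.await();
            other.start();
            boolean ok = updated.await(5, TimeUnit.SECONDS);
            release.countDown();
            slow.join();
            other.join();
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If processCommand instead held the write lock around command.run(), the second thread would block in updateActorStatesFromHeartbeat until the slow command finished, which is the stall the description reports.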



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

