[ https://issues.apache.org/jira/browse/HDFS-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanisha Koneru reassigned HDFS-12639:
-------------------------------------

    Assignee: Hanisha Koneru

> BPOfferService lock may stall all service actors
> ------------------------------------------------
>
>                 Key: HDFS-12639
>                 URL: https://issues.apache.org/jira/browse/HDFS-12639
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>            Assignee: Hanisha Koneru
>
> {{BPOfferService}} manages {{BPServiceActor}} instances for the active and
> standby.  It uses a read-write lock primarily to protect registration
> information while determining the active/standby from heartbeats.
> Unfortunately, the write lock is held during command processing.  If an actor
> experiences high latency while processing commands, the other actor can
> neither register (blocked in createRegistration, setNamespaceInfo,
> verifyAndSetNamespaceInfo) nor process heartbeats (blocked in
> updateActorStatesFromHeartbeat).
> The worst-case scenario for processing commands while holding the lock is
> re-registration.  The actor will loop, catching and logging exceptions,
> leaving the other actor blocked for a non-deterministic (possibly infinite)
> amount of time.
> The lock must not be held during command processing.
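
The stall described in the issue can be reproduced in miniature. The following is only a sketch under stated assumptions, not the Hadoop code and not the eventual patch: OfferServiceSketch, processCommandHoldingLock, processCommandOutsideLock and updateFromHeartbeat are hypothetical names, and the "active actor" string stands in for the registration state the real lock protects. It contrasts running a slow command under the write lock with checking state under the read lock and running the command after the lock is released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for BPOfferService; NOT the Hadoop implementation.
class OfferServiceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String activeActor; // guarded by lock

    // Problem pattern: the write lock is held for the whole command.
    void processCommandHoldingLock(String actor, Runnable command) {
        lock.writeLock().lock();
        try {
            if (actor.equals(activeActor)) {
                command.run();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Narrowed pattern: read state under the read lock, run the command unlocked.
    void processCommandOutsideLock(String actor, Runnable command) {
        boolean isActive;
        lock.readLock().lock();
        try {
            isActive = actor.equals(activeActor);
        } finally {
            lock.readLock().unlock();
        }
        if (isActive) {
            command.run();
        }
    }

    // Stand-in for a heartbeat state update: needs the write lock briefly.
    // Returns false if the lock could not be acquired within timeoutMs.
    boolean updateFromHeartbeat(String actor, long timeoutMs) throws InterruptedException {
        if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            activeActor = actor;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs a slow command for actor "nn1" on a worker thread and reports whether
    // a heartbeat from actor "nn2" can update state while that command is running.
    static boolean heartbeatSucceedsDuringSlowCommand(boolean holdLock) {
        OfferServiceSketch svc = new OfferServiceSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Runnable slowCommand = () -> {
            started.countDown();
            try {
                finish.await(); // simulate a command that does not return promptly
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            svc.updateFromHeartbeat("nn1", 1000);
            Thread worker = new Thread(() -> {
                if (holdLock) {
                    svc.processCommandHoldingLock("nn1", slowCommand);
                } else {
                    svc.processCommandOutsideLock("nn1", slowCommand);
                }
            });
            worker.start();
            started.await(); // the command is now in progress
            boolean updated = svc.updateFromHeartbeat("nn2", 200);
            finish.countDown(); // let the slow command complete
            worker.join();
            return updated;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(heartbeatSucceedsDuringSlowCommand(true));
        System.out.println(heartbeatSucceedsDuringSlowCommand(false));
    }
}
```

With the lock held across the command, the second actor's update times out; with the command run outside the lock, it goes through, so main should print false and then true. Note the trade-off in the narrowed variant: the protected state may change while the command is still running, so an actual fix would have to decide which parts of command processing need a consistent view.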