[
https://issues.apache.org/jira/browse/ZOOKEEPER-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Hunt updated ZOOKEEPER-2201:
------------------------------------
Assignee: Donny Nadolny
> Network issues can cause cluster to hang due to near-deadlock
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-2201
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2201
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.6
> Reporter: Donny Nadolny
> Assignee: Donny Nadolny
> Priority: Critical
> Attachments: ZOOKEEPER-2201.patch
>
>
> {{DataTree.serializeNode}} synchronizes on the {{DataNode}} it is about to
> serialize then writes it out via {{OutputArchive.writeRecord}}, potentially
> to a network connection. Under default linux TCP settings, a network
> connection where the other side completely disappears will hang (blocking on
> the {{java.net.SocketOutputStream.socketWrite0}} call) for over 15 minutes.
> During this time, any attempt to create/delete/modify the {{DataNode}} will
> cause the leader to hang at the beginning of the request processor chain:
> {noformat}
> "ProcessThread(sid:5 cport:-1):" prio=10 tid=0x00000000026f1800 nid=0x379c
> waiting for monitor entry [0x00007fe6c2a8c000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.zookeeper.server.PrepRequestProcessor.getRecordForPath(PrepRequestProcessor.java:163)
> - waiting to lock <0x00000000d4cd9e28> (a
> org.apache.zookeeper.server.DataNode)
> - locked <0x00000000d2ef81d0> (a java.util.ArrayList)
> at
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:345)
> at
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:534)
> at
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:131)
> {noformat}
> Additionally, any attempt to send a snapshot to a follower or to disk will
> hang.
> Because the ping packets are sent by another thread which is unaffected,
> followers never time out and become leader, even though the cluster will make
> no progress until either the leader is killed or the TCP connection times
> out. This isn't exactly a deadlock since it will resolve itself eventually,
> but as mentioned above this will take > 15 minutes with the default TCP retry
> settings in linux.
> A simple solution to this is: in {{DataTree.serializeNode}} we can take a
> copy of the contents of the {{DataNode}} (as is done with its children) in
> the synchronized block, then call {{writeRecord}} with the copy of the
> {{DataNode}} outside of the original {{DataNode}} synchronized block.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)