-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73795
-----------------------------------------------------------

Ship it!


Ship It!

- Tom Beerbower


On Feb. 24, 2015, 5:36 a.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2015, 5:36 a.m.)
>
>
> Review request for Ambari, Nate Cole and Tom Beerbower.
>
>
> Bugs: AMBARI-9761
>     https://issues.apache.org/jira/browse/AMBARI-9761
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Another case of misunderstanding how locks work.
>
> During provisioning of a cluster with at least 200 hosts, Ambari Server
> becomes unresponsive. Based on the thread dump, there exists a deadlock
> between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
>
> qtp626652285-97 ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47 ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97 ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60 ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47 ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)
>
> The underlying problem is that a writeLock.lock() is parked which causes all
> subsequent readLock.lock() requests to also park. This includes the request
> from qtp1282624353-47 which is holding a writeLock on the SCH which, in turn,
> is blocking qtp626652285-97 (the original cluster readLock reader which
> blocks the cluster write)
>
> Long story short is that I think we need to revisit locks again after 2.0.0;
> I just don't see a need for locking on reads in most places - that's what the
> database is doing for us.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java 117526c
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea
>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java c43044c
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java 96a1443
>
> Diff: https://reviews.apache.org/r/31341/diff/
>
>
> Testing
> -------
>
> Reproduced the deadlock in a unit test first, and then verified the deadlock
> does not occur anymore in the test after applying the patch.
>
>
> Thanks,
>
> Jonathan Hurley
>
>
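
For anyone reading this thread later: the "parked writer blocks later readers" behaviour described in the quoted request can be sketched in isolation, roughly as below. This is not Ambari code and not the ClusterDeadlockTest from the patch. The class and method names are hypothetical, and it assumes a default (non-fair) java.util.concurrent.locks.ReentrantReadWriteLock, where a new reader queueing behind a waiting writer is an implementation heuristic of common OpenJDK builds rather than a documented guarantee.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone demo; none of these names exist in Ambari.
class ParkedWriterDemo {

    /**
     * Returns true if a brand-new reader thread obtained the read lock
     * while another reader held it and a writer was already parked.
     */
    static boolean newReaderAcquiresWhileWriterParked() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        AtomicBoolean acquired = new AtomicBoolean(false);

        // Plays the role of the long-running cluster reader.
        lock.readLock().lock();
        try {
            // Plays the role of the cluster writer: parks until the reader above releases.
            Thread writer = new Thread(() -> {
                lock.writeLock().lock();
                lock.writeLock().unlock();
            }, "writer");
            writer.start();

            // Wait until the writer is in the lock's wait queue, plus a small margin.
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(10);
            }
            Thread.sleep(50);

            // A second reader on a different thread. The timed tryLock follows the
            // same queueing policy as lock(), but gives up instead of hanging.
            Thread reader = new Thread(() -> {
                try {
                    if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired.set(true);
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "second-reader");
            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock(); // lets the parked writer proceed and finish
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired read lock: "
                + newReaderAcquiresWhileWriterParked());
    }
}
```

If the second reader times out here, that is the same situation as the cluster readLock request in the thread dump: it waits behind the parked writer, which in turn waits for the first reader. In the real deadlock the first reader never releases, because it is itself blocked on a lock held by that second thread.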
