[ 
https://issues.apache.org/jira/browse/NIFI-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121429#comment-15121429
 ] 

ASF subversion and git services commented on NIFI-1333:
-------------------------------------------------------

Commit f70f7e34470142437f2e4f2c7ef78fa7711858aa in nifi's branch 
refs/heads/master from [~ozhurakousky]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=f70f7e3 ]

NIFI-1333 fixed FlowController shutdown deadlock. put read lock back. This 
closes #148

Signed-off-by: Matt Gilman <[email protected]>


> FlowController fails to shut down gracefully even though there is nothing 
> going on in the flow
> ----------------------------------------------------------------------------------------------
>
>                 Key: NIFI-1333
>                 URL: https://issues.apache.org/jira/browse/NIFI-1333
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.4.1
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>            Priority: Trivial
>             Fix For: 0.5.0
>
>
> The following test fails: 
> https://github.com/olegz/nifi/blob/int-test/nifi-integration-tests/src/test/java/org/apache/nifi/test/flowcontroll/FlowControllerTests.java#L50
>  even though nothing in the flow gives it a reason to fail.
> Also, the message in the logs is confusing:
> {code}
> Initiated graceful shutdown of flow controller...waiting up to 10 seconds
> 2015-12-23 15:19:11,977 WARN [main] o.apache.nifi.controller.FlowController Controller hasn't terminated properly.  There exists an uninterruptable thread that will take an indeterminate amount of time to stop.  Might need to kill the program manually.
> {code}
> What actually happens is a deadlock during shutdown.
> Below is the relevant jstack output:
> {code}
> java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000007aeb20988> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>       at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
>       at org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1124)
>       at org.apache.nifi.test.s2s.SiteToSiteTests.bar(SiteToSiteTests.java:75)
> . . .
> "Framework Task Thread Thread-1" prio=5 tid=0x00007fc8a2064800 nid=0x6a03 waiting on condition [0x0000700001ded000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>       at org.apache.nifi.controller.FlowController.getRootGroupId(FlowController.java:1262)
>       at org.apache.nifi.controller.tasks.ExpireFlowFiles.run(ExpireFlowFiles.java:54)
> . . .
> "Timer-Driven Process Thread-1" prio=5 tid=0x00007fc8a3146800 nid=0x6c03 waiting on condition [0x0000700001ef0000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>       at org.apache.nifi.controller.FlowController.isClustered(FlowController.java:2984)
>       at org.apache.nifi.controller.FlowController.heartbeat(FlowController.java:3444)
> {code}
> The issue, as I see it, is that FlowController's _shutdown_ routine is 
> synchronized under the same lock as most of the FlowController callbacks made 
> by other threads. Shutdown holds that lock while it waits for those threads 
> to terminate, and they are blocked waiting for the same lock, so they cannot 
> be shut down: a deadlock.
> I don't think there is any reason to synchronize the shutdown routine, 
> since all we are trying to do is shut down the very same threads that are 
> blocking. Removing the synchronization resolves the issue.
> Will submit a patch shortly.
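
The locking pattern described above can be reproduced in isolation. The following is a minimal sketch, not NiFi code: the class name, the single-thread pool, the latch, and the 500 ms timeout are all assumptions chosen for illustration. It only models a shutdown routine that holds a write lock while calling awaitTermination on a pool whose task needs the read lock of the same ReentrantReadWriteLock, which is the shape the jstack output suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the pattern in the report; not the FlowController implementation.
public class ShutdownDeadlockSketch {

    /**
     * Starts one worker that needs the read lock, then shuts the pool down,
     * optionally while holding the write lock.
     * Returns true if the pool terminated within the timeout.
     */
    public static boolean terminatesWithinTimeout(boolean holdWriteLock) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch shutdownStarted = new CountDownLatch(1);

        // Worker: plays the role of a callback such as getRootGroupId() in the trace.
        pool.execute(() -> {
            try {
                shutdownStarted.await(); // proceed only once shutdown is in progress
            } catch (InterruptedException e) {
                return;
            }
            lock.readLock().lock(); // blocks while another thread holds the write lock
            try {
                // callback work would go here
            } finally {
                lock.readLock().unlock();
            }
        });

        if (holdWriteLock) {
            lock.writeLock().lock(); // models the "synchronized" shutdown routine
        }
        try {
            shutdownStarted.countDown();
            pool.shutdown();
            // With the write lock held, the worker cannot finish, so this times out.
            return pool.awaitTermination(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (holdWriteLock) {
                lock.writeLock().unlock(); // releasing the lock lets the worker finish
            }
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("write lock held: terminated=" + terminatesWithinTimeout(true));
        System.out.println("no lock held:    terminated=" + terminatesWithinTimeout(false));
    }
}
```

In this sketch the first call should report that the pool did not terminate within the timeout, and the second should report that it did, which matches the report's observation that removing the synchronization from shutdown resolves the hang. It says nothing about what other state the real lock protects during shutdown.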



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
