[ https://issues.apache.org/jira/browse/NIFI-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121429#comment-15121429 ]
ASF subversion and git services commented on NIFI-1333:
-------------------------------------------------------
Commit f70f7e34470142437f2e4f2c7ef78fa7711858aa in nifi's branch
refs/heads/master from [~ozhurakousky]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=f70f7e3 ]
NIFI-1333 fixed FlowController shutdown deadlock. put read lock back. This
closes #148
Signed-off-by: Matt Gilman <[email protected]>
> FlowController fails to shut down gracefully even though there is nothing
> going on in the flow
> ----------------------------------------------------------------------------------------------
>
> Key: NIFI-1333
> URL: https://issues.apache.org/jira/browse/NIFI-1333
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 0.4.1
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Priority: Trivial
> Fix For: 0.5.0
>
>
> Basically the following test fails:
> https://github.com/olegz/nifi/blob/int-test/nifi-integration-tests/src/test/java/org/apache/nifi/test/flowcontroll/FlowControllerTests.java#L50
> even though there is no compelling reason for it to fail based on what's in
> the flow.
> Also, the message in the logs is confusing:
> {code}
> Initiated graceful shutdown of flow controller...waiting up to 10 seconds
> 2015-12-23 15:19:11,977 WARN [main] o.apache.nifi.controller.FlowController Controller hasn't terminated properly. There exists an uninterruptable thread that will take an indeterminate amount of time to stop. Might need to kill the program manually.
> {code}
> What actually happens is a deadlock during shutdown.
> Below is the relevant jstack output:
> {code}
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007aeb20988> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
> at org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1124)
> at org.apache.nifi.test.s2s.SiteToSiteTests.bar(SiteToSiteTests.java:75)
> . . .
> "Framework Task Thread Thread-1" prio=5 tid=0x00007fc8a2064800 nid=0x6a03 waiting on condition [0x0000700001ded000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> at org.apache.nifi.controller.FlowController.getRootGroupId(FlowController.java:1262)
> at org.apache.nifi.controller.tasks.ExpireFlowFiles.run(ExpireFlowFiles.java:54)
> . . .
> "Timer-Driven Process Thread-1" prio=5 tid=0x00007fc8a3146800 nid=0x6c03 waiting on condition [0x0000700001ef0000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> at org.apache.nifi.controller.FlowController.isClustered(FlowController.java:2984)
> at org.apache.nifi.controller.FlowController.heartbeat(FlowController.java:3444)
> {code}
> The issue, as I see it, is that FlowController's _shutdown_ routine is
> synchronized under the same lock as most of the FlowController callbacks made
> by other threads, so those threads cannot be shut down: they are blocked
> waiting for the lock that shutdown itself holds.
> I don't think there is any reason to synchronize the shutdown routine,
> since all we are trying to do is shut down the very same threads that are
> blocking. Removing the synchronization resolves the issue.
> Will submit a patch in a few
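
The pattern described in the issue can be reproduced in isolation. The following is a minimal sketch, not NiFi code: the class ShutdownDeadlockSketch and its methods readState, shutdownHoldingWriteLock, shutdownWithoutLock and stopWorkers are hypothetical names, and it assumes the shutdown path takes the write side of a ReentrantReadWriteLock while worker callbacks take the read side, as the jstack output suggests. It models the reporter's proposal (not holding the lock while waiting), which may differ from the committed fix.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a controller; this is NOT NiFi's FlowController.
final class ShutdownDeadlockSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Models a callback such as getRootGroupId(): it needs the read lock.
    String readState() {
        rwLock.readLock().lock();
        try {
            return "root";
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Problematic shape: waits for workers while holding the lock they need.
    boolean shutdownHoldingWriteLock(long timeoutMillis) {
        rwLock.writeLock().lock();
        try {
            return stopWorkers(timeoutMillis);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Shape proposed in the issue: do not hold the lock while waiting.
    boolean shutdownWithoutLock(long timeoutMillis) {
        return stopWorkers(timeoutMillis);
    }

    // Returns true if all workers terminated within the timeout.
    private boolean stopWorkers(long timeoutMillis) {
        // Simulates a scheduled task that fires just as shutdown begins.
        executor.execute(() -> { readState(); });
        executor.shutdown(); // already-submitted tasks are still executed
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean withLock = new ShutdownDeadlockSketch().shutdownHoldingWriteLock(200);
        boolean withoutLock = new ShutdownDeadlockSketch().shutdownWithoutLock(5000);
        System.out.println("terminated while holding write lock: " + withLock);
        System.out.println("terminated without holding lock: " + withoutLock);
    }
}
```

In the first call the worker cannot acquire the read lock until the timeout expires and the write lock is released, so awaitTermination reports failure, which matches the "hasn't terminated properly" warning above. The second call is expected to terminate promptly because nothing blocks the worker.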
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)