[
https://issues.apache.org/jira/browse/FLUME-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879129#comment-15879129
]
Andras Beni commented on FLUME-3056:
------------------------------------
The issue was discovered while running tests with maven. JUnit ran test class
TestApplication for exactly 15 minutes which is the default timeout for forked
processes, and did not report on test results.
I was able to reproduce the problem (it is to be observed once every few
hundred runs) and made the thread dump of which a part is visible in this
issues description. The test method TestApplication#testFlume1854() starts an
application and stops it almost immediately. The application started contains a
PollingPropertiesFileConfigurationProvider. This configuration provider starts
a separate thread in its start() method, that polls the configuration file
every 30 seconds. When it find out, that the file has changed (and on first
run), it notifies the application through EventBus, that it has to reconfigure.
In the case of the test, the application is started, which starts the
configuration provider and its poller thread. By the time the thread tries to
send the configuration event to the Application object, it is already stopping,
so (method stop() being synchronized ) it locks on itself. But the poller
thread is also trying to lock on the application by calling
handleConfigurationEvent(...). As Application.stop() waits for the poller
thread to terminate and the poller thread waits for Application.stop() to
release the lock, we get in a deadlock.
In normal usage it is possible, that a user edits configuration and stop or
restarts the application. If the shutdown hook runs at the same time as the
poller notices the change, the application will get into the same deadlock, and
the user will not be able to shutdown flume properly.
> TestApplication hangs indefinitely
> ----------------------------------
>
> Key: FLUME-3056
> URL: https://issues.apache.org/jira/browse/FLUME-3056
> Project: Flume
> Issue Type: Bug
> Components: Configuration, Test
> Affects Versions: v1.7.0
> Reporter: Andras Beni
> Priority: Minor
>
> Unit test hangs indefinitely when TestApplication.testFLUME1854() becomes
> blocked in the following, deadlock-like situation:
> Application waits for PollingPropertiesFileConfigurationProvider to stop
> while PollingPropertiesFileConfigurationProvider tries to notify Application
> of configuration change.
> {noformat}
> "conf-file-poller-0" #17750 prio=5 os_prio=31 tid=0x00007fdeb7972000
> nid=0x638f waiting for monitor entry [0x000070000eb36000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.flume.node.Application.handleConfigurationEvent(Application.java:87)
> - waiting to lock <0x000000077a130178> (a
> org.apache.flume.node.Application)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:68)
> at
> com.google.common.eventbus.SynchronizedEventHandler.handleEvent(SynchronizedEventHandler.java:45)
> - locked <0x000000077a1301b0> (a
> com.google.common.eventbus.SynchronizedEventHandler)
> at com.google.common.eventbus.EventBus.dispatch(EventBus.java:313)
> at
> com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
> at com.google.common.eventbus.EventBus.post(EventBus.java:264)
> at
> org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - <0x000000077a130250> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "main" #1 prio=5 os_prio=31 tid=0x00007fdeb9000000 nid=0x1b03 waiting on
> condition [0x000070000d779000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000006c253e9f8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
> at
> java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
> at
> org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop(PollingPropertiesFileConfigurationProvider.java:88)
> at
> org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:104)
> - locked <0x000000077a138158> (a
> org.apache.flume.lifecycle.LifecycleSupervisor)
> at org.apache.flume.node.Application.stop(Application.java:92)
> - locked <0x000000077a130178> (a org.apache.flume.node.Application)
> at
> org.apache.flume.node.TestApplication.testFLUME1854(TestApplication.java:155)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)