[ 
https://issues.apache.org/jira/browse/FLUME-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879129#comment-15879129
 ] 

Andras Beni edited comment on FLUME-3056 at 2/22/17 8:14 PM:
-------------------------------------------------------------

The issue was discovered while running tests with maven. JUnit ran test class 
TestApplication for exactly 15 minutes which is the default timeout for forked 
processes, and did not report on test results. 
I was able to reproduce the problem (it is to be observed once every few 
hundred runs) and made the thread dump of which a part is visible in this 
issues description. The test method TestApplication.testFlume1854() starts an 
application and stops it almost immediately. The application started contains a 
 PollingPropertiesFileConfigurationProvider. This configuration provider starts 
a separate thread in its start() method, that polls the configuration file 
every 30 seconds. When it find out, that the file has changed (and on first 
run), it notifies the application through EventBus, that it has to reconfigure. 
In the case of the test, the application is started, which starts the 
configuration provider and its poller thread. By the time the thread tries to 
send the configuration event to the Application object, it is already stopping, 
so (method stop() being synchronized ) it locks on itself. But the poller 
thread is also trying to lock on the application by calling 
handleConfigurationEvent(...). As Application.stop() waits for the poller 
thread to terminate and the poller thread waits for Application.stop() to 
release the lock, we get in a deadlock.
In normal usage it is possible, that a user edits configuration and stops or 
restarts the application. If the shutdown hook runs at the same time as the 
poller notices the change, the application will get into the same deadlock, and 
the user will not be able to shutdown flume properly.

The same issue cannot happen with Application.start(), because start does not 
wait for the poller thread, so even if the poller has to wait for start to 
release the lock, it will not have to wait forever.


was (Author: andrasbeni):
The issue was discovered while running tests with maven. JUnit ran test class 
TestApplication for exactly 15 minutes which is the default timeout for forked 
processes, and did not report on test results. 
I was able to reproduce the problem (it is to be observed once every few 
hundred runs) and made the thread dump of which a part is visible in this 
issues description. The test method TestApplication#testFlume1854() starts an 
application and stops it almost immediately. The application started contains a 
 PollingPropertiesFileConfigurationProvider. This configuration provider starts 
a separate thread in its start() method, that polls the configuration file 
every 30 seconds. When it find out, that the file has changed (and on first 
run), it notifies the application through EventBus, that it has to reconfigure. 
In the case of the test, the application is started, which starts the 
configuration provider and its poller thread. By the time the thread tries to 
send the configuration event to the Application object, it is already stopping, 
so (method stop() being synchronized ) it locks on itself. But the poller 
thread is also trying to lock on the application by calling 
handleConfigurationEvent(...). As Application.stop() waits for the poller 
thread to terminate and the poller thread waits for Application.stop() to 
release the lock, we get in a deadlock.
In normal usage it is possible, that a user edits configuration and stop or 
restarts the application. If the shutdown hook runs at the same time as the 
poller notices the change, the application will get into the same deadlock, and 
the user will not be able to shutdown flume properly.

> TestApplication hangs indefinitely
> ----------------------------------
>
>                 Key: FLUME-3056
>                 URL: https://issues.apache.org/jira/browse/FLUME-3056
>             Project: Flume
>          Issue Type: Bug
>          Components: Configuration, Test
>    Affects Versions: v1.7.0
>            Reporter: Andras Beni
>            Priority: Minor
>
> Unit test hangs indefinitely when TestApplication.testFLUME1854() becomes 
> blocked in the following, deadlock-like situation:
> Application waits for PollingPropertiesFileConfigurationProvider to stop 
> while PollingPropertiesFileConfigurationProvider tries to notify Application 
> of configuration change.
> {noformat}
> "conf-file-poller-0" #17750 prio=5 os_prio=31 tid=0x00007fdeb7972000 
> nid=0x638f waiting for monitor entry [0x000070000eb36000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at 
> org.apache.flume.node.Application.handleConfigurationEvent(Application.java:87)
>       - waiting to lock <0x000000077a130178> (a 
> org.apache.flume.node.Application)
>       at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:68)
>       at 
> com.google.common.eventbus.SynchronizedEventHandler.handleEvent(SynchronizedEventHandler.java:45)
>       - locked <0x000000077a1301b0> (a 
> com.google.common.eventbus.SynchronizedEventHandler)
>       at com.google.common.eventbus.EventBus.dispatch(EventBus.java:313)
>       at 
> com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
>       at com.google.common.eventbus.EventBus.post(EventBus.java:264)
>       at 
> org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - <0x000000077a130250> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "main" #1 prio=5 os_prio=31 tid=0x00007fdeb9000000 nid=0x1b03 waiting on 
> condition [0x000070000d779000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000006c253e9f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>       at 
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
>       at 
> java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
>       at 
> org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop(PollingPropertiesFileConfigurationProvider.java:88)
>       at 
> org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:104)
>       - locked <0x000000077a138158> (a 
> org.apache.flume.lifecycle.LifecycleSupervisor)
>       at org.apache.flume.node.Application.stop(Application.java:92)
>       - locked <0x000000077a130178> (a org.apache.flume.node.Application)
>       at 
> org.apache.flume.node.TestApplication.testFLUME1854(TestApplication.java:155)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to