[
https://issues.apache.org/jira/browse/FLUME-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yan Jian updated FLUME-2786:
----------------------------
Attachment: flume-2786-v1.6.0.patch
This bug also occurred in our production environment.
It can lead to a nested monitor lockout between the threads
_agent-shutdown-hook_ and _conf-file-poller_. Details are as follows:
# _agent-shutdown-hook_ acquires the {{application}} lock and tries to stop the
{{executeService}} (a {{ThreadPoolExecutor}} instance).
# _conf-file-poller_ is scheduled to run in the {{executeService}}'s pool,
which prevents the {{executeService}} from being stopped.
# _conf-file-poller_ waits for the {{application}} lock, which is held by
_agent-shutdown-hook_.
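The three steps above can be reproduced in isolation. The following is a minimal sketch, not Flume code: the class {{NestedLockoutSketch}}, its method, and the latches used to force the ordering are all hypothetical, and the wait is bounded so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; none of these names come from Flume.
class NestedLockoutSketch {
    // Stand-in for the Application instance whose monitor both threads need.
    private final Object application = new Object();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Returns true if the pool terminated while the caller still held the monitor. */
    boolean stopWhilePollerIsPending(long timeoutMillis) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch pollerRunning = new CountDownLatch(1);

        // Poller task: starts only once the "shutdown" side holds the monitor,
        // then blocks trying to enter the same monitor (step 3).
        pool.submit(() -> {
            try {
                lockHeld.await();
            } catch (InterruptedException e) {
                return;
            }
            pollerRunning.countDown();
            synchronized (application) {
                // A real poller would apply the new configuration here.
            }
        });

        try {
            boolean terminated;
            synchronized (application) {          // step 1: take the monitor
                lockHeld.countDown();
                pollerRunning.await();            // step 2: the poller occupies the pool
                pool.shutdown();
                // Bounded wait; an unbounded retry loop here would never finish.
                terminated = pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            // Monitor released: the poller can finish and the pool can terminate.
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean terminated = new NestedLockoutSketch().stopWhilePollerIsPending(300);
        System.out.println("pool terminated while monitor was held: " + terminated);
    }
}
```

The reported value should be {{false}}: the pool cannot terminate until the poller task returns, and the poller task cannot return until the monitor is released.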
In our solution, {{synchronized}} is replaced with a {{ReentrantLock}}, and
_conf-file-poller_ checks the {{beingStopped}} condition every 500 ms while
trying to acquire the {{application}} lock.
Our solution, based on 1.6.0, is attached as +flume-2786-v1.6.0.patch+.
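A minimal sketch of this approach, not the contents of the attached patch: the class {{StoppableLockSketch}}, the {{Runnable}} parameters, and the choice to set {{beingStopped}} before taking the lock are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the described approach; not taken from the patch.
class StoppableLockSketch {
    private final ReentrantLock applicationLock = new ReentrantLock();
    private final AtomicBoolean beingStopped = new AtomicBoolean(false);

    /**
     * Poller side. Returns true if the event was handled, or false if it was
     * abandoned because a shutdown is in progress.
     */
    boolean handleConfigurationEvent(Runnable reconfigure) {
        try {
            while (!beingStopped.get()) {
                // Wait at most 500 ms for the lock, then re-check the flag.
                if (applicationLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                    try {
                        if (beingStopped.get()) {
                            return false;
                        }
                        reconfigure.run();
                        return true;
                    } finally {
                        applicationLock.unlock();
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    /** Shutdown side. The flag is set first so that waiting pollers give up. */
    void stop(Runnable stopComponents) {
        beingStopped.set(true);
        applicationLock.lock();
        try {
            stopComponents.run();
        } finally {
            applicationLock.unlock();
        }
    }

    public static void main(String[] args) {
        StoppableLockSketch app = new StoppableLockSketch();
        System.out.println("handled before stop: " + app.handleConfigurationEvent(() -> { }));
        app.stop(() -> { });
        System.out.println("handled after stop: " + app.handleConfigurationEvent(() -> { }));
    }
}
```

Because the poller never blocks indefinitely on the lock, a shutdown thread that holds the lock while waiting for the pool to drain is delayed by at most roughly one polling interval.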
> It will enter a deadlock state when modify the conf file before I stop
> flume-ng
> --------------------------------------------------------------------------------
>
> Key: FLUME-2786
> URL: https://issues.apache.org/jira/browse/FLUME-2786
> Project: Flume
> Issue Type: Bug
> Components: Master
> Affects Versions: v1.6.0
> Reporter: godfrey he
> Priority: Blocker
> Attachments: flume-2786-v1.6.0.patch
>
>
> When I modify the conf file and then stop flume-ng, it enters a
> deadlock state.
> jstack result:
> "agent-shutdown-hook" prio=10 tid=0x00007f2e26419800 nid=0x333ae waiting on condition [0x0000000042c16000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000eaff3df8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
> at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:635)
> at org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop(PollingPropertiesFileConfigurationProvider.java:87)
> at org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:106)
> - locked <0x00000000eaf2daa0> (a org.apache.flume.lifecycle.LifecycleSupervisor)
> at org.apache.flume.node.Application.stop(Application.java:93)
> - locked <0x00000000eaf3c580> (a org.apache.flume.node.Application)
> at org.apache.flume.node.Application$1.run(Application.java:348)
>
> "conf-file-poller-0" prio=10 tid=0x00007f2e2e8cd000 nid=0x21819 waiting for monitor entry [0x0000000041e3f000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.flume.node.Application.handleConfigurationEvent(Application.java:88)
> - waiting to lock <0x00000000eaf3c580> (a org.apache.flume.node.Application)