[ https://issues.apache.org/jira/browse/FLUME-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated FLUME-768:
----------------------------------

    Attachment: Flume-768.patch

Attaching the patch with the approved review changes.
                
> Agent deadlock possible due to blocked latch in driver thread.
> --------------------------------------------------------------
>
>                 Key: FLUME-768
>                 URL: https://issues.apache.org/jira/browse/FLUME-768
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>         Attachments: Flume-768.patch, Flume-768.patch
>
>
> There are three threads essentially blocked; two of the three are blocked
> because of the third.
> The main problem is that the roll sink's close is blocked waiting for a
> close to complete. It has a subordinate thread that normally triggers the
> latch that allows it to close, but that thread appears to be gone. My guess
> is that an exception caused the TriggerThread to exit, and because the latch
> countdown never ran, the ok-to-shutdown latch was never released.
> The other two threads are blocked because of this, and likely wouldn't be
> stuck here if that intermediate thread wasn't stuck.
> The agent's avro source queue is full, so it is blocked trying to enqueue
> more data.
> There is also another thread that is blocked: the WAL draining thread is
> blocked with nothing left to do (which is why everything is in the sent
> state). This doesn't seem to be part of the problem.
> Thread 21 (448511246@qtp-1388647956-1):
>   State: WAITING
>   Blocked count: 3
>   Waited count: 29
>   Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11031d18
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
>     com.cloudera.flume.handlers.avro.AvroEventSource.enqueue(AvroEventSource.java:114)
>     com.cloudera.flume.handlers.avro.AvroEventSource$1.append(AvroEventSource.java:135)
>     sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     java.lang.reflect.Method.invoke(Method.java:597)
>     org.apache.avro.specific.SpecificResponder.respond(SpecificResponder.java:93)
>     org.apache.avro.ipc.Responder.respond(Responder.java:136)
>     org.apache.avro.ipc.Responder.respond(Responder.java:88)
>     org.apache.avro.ipc.ResponderServlet.doPost(ResponderServlet.java:48)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
>     org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>     org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     org.mortbay.jetty.Server.handle(Server.java:326)
> Here's another thread that is essentially blocked:
> Thread 19 (logicalNode agent-19):
>   State: WAITING
>   Blocked count: 83
>   Waited count: 1143043
>   Waiting on java.util.concurrent.CountDownLatch$Sync@5c328896
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>     java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>     com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:213)
>     com.cloudera.flume.agent.durability.NaiveFileWALDeco.close(NaiveFileWALDeco.java:147)
>     com.cloudera.flume.agent.AgentSink.close(AgentSink.java:118)
>     com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     com.cloudera.flume.handlers.debug.LazyOpenDecorator.close(LazyOpenDecorator.java:81)
>     com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:121)
> Here's the WAL draining thread trying to pull things out of the WAL:
> Thread 24 (naive file wal transmit-24):
>   State: TIMED_WAITING
>   Blocked count: 156
>   Waited count: 171352
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>     java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:424)
>     com.cloudera.flume.agent.durability.NaiveFileWALManager.getUnackedSource(NaiveFileWALManager.java:763)
>     com.cloudera.flume.agent.durability.WALSource.next(WALSource.java:104)
>     com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:91)
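The failure mode described in the report can be sketched in plain Java: if a worker thread's countDown() call is not guarded by a finally block, any exception thrown before it leaves the shutdown latch permanently blocked. This is a hypothetical minimal reproduction, not Flume's actual RollSink/TriggerThread code; class and method names are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the suspected bug: the trigger thread dies on an exception
// before counting down the "ok to shutdown" latch, so close() hangs.
public class LatchDeadlockSketch {

    static void doWork() {
        throw new RuntimeException("simulated TriggerThread failure");
    }

    // Buggy pattern: countDown() sits after code that can throw.
    static boolean closeWithBuggyTrigger() throws InterruptedException {
        CountDownLatch okToShutdown = new CountDownLatch(1);
        Thread trigger = new Thread(() -> {
            doWork();                 // throws; the next line is never reached
            okToShutdown.countDown(); // latch stays at 1 forever
        });
        trigger.setUncaughtExceptionHandler((t, e) -> { }); // keep output quiet
        trigger.start();
        trigger.join();
        // The real close() calls await() with no timeout and hangs; a timed
        // wait is used here only so the sketch terminates.
        return okToShutdown.await(100, TimeUnit.MILLISECONDS);
    }

    // Fixed pattern: countDown() in a finally block always runs,
    // even when the worker exits via an exception.
    static boolean closeWithSafeTrigger() throws InterruptedException {
        CountDownLatch okToShutdown = new CountDownLatch(1);
        Thread trigger = new Thread(() -> {
            try {
                doWork();
            } finally {
                okToShutdown.countDown(); // runs on both success and failure
            }
        });
        trigger.setUncaughtExceptionHandler((t, e) -> { });
        trigger.start();
        trigger.join();
        return okToShutdown.await(100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("buggy trigger released latch: " + closeWithBuggyTrigger());
        System.out.println("safe trigger released latch:  " + closeWithSafeTrigger());
    }
}
```

The same defensive move (latch release in a finally block, or a timed await in close()) would also unstick the two downstream threads in the dump, since they are only blocked transitively on this latch.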

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
