[ 
https://issues.apache.org/jira/browse/FLUME-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117902#comment-13117902
 ] 

[email protected] commented on FLUME-768:
-----------------------------------------------------



bq.  On 2011-09-30 05:54:25, jmhsieh wrote:
bq.  > Prasad,
bq.  > 
bq.  > I'm currently running the full test suite on a dedicated machine, but the code looks good.  
bq.  > 
bq.  > Can you address some of the nits? 
bq.  > 
bq.  > Thanks,
bq.  > Jon.
bq.  > 
bq.  >

Also, there are some minor spacing nits.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2107/#review2207
-----------------------------------------------------------


On 2011-09-29 03:22:14, Prasad Mujumdar wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2107/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-29 03:22:14)
bq.  
bq.  
bq.  Review request for jmhsieh.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  It looks like the Trigger Thread is aborted due to some unexpected error. If it's killed for any reason other than an interrupt, then it doesn't clear the doneLatch, which leaves the pumper thread waiting forever. 
bq.  The patch simply clears that latch on exit in all cases.
bq.  
bq.  
bq.  This addresses bug Flume-768.
bq.      https://issues.apache.org/jira/browse/Flume-768
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java 32a3411 
bq.    flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 1fd788f 
bq.  
bq.  Diff: https://reviews.apache.org/r/2107/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  New unit test
bq.  Full unit test run
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prasad
bq.  
bq.
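
For readers following the summary above, here is a minimal sketch of the "clear the latch on exit in all cases" idea. It is not the actual RollSink patch: the class name LatchOnExitSketch, the nested TriggerThread stand-in, and its Runnable body are hypothetical names chosen for illustration. Only the doneLatch name and the general approach come from the review text. The point is that the countdown sits in a finally block, so a waiter is released whether the thread exits normally or dies from an unexpected exception.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the code from RollSink.java.
class LatchOnExitSketch {

    // Stand-in for a trigger thread whose exit must release a waiting closer.
    static class TriggerThread extends Thread {
        // Named after the latch mentioned in the review summary.
        final CountDownLatch doneLatch = new CountDownLatch(1);
        private final Runnable body; // assumed unit of work, for illustration only

        TriggerThread(Runnable body) {
            this.body = body;
        }

        @Override
        public void run() {
            try {
                body.run();
            } catch (RuntimeException e) {
                // An unexpected failure: report it, then fall through to finally.
                System.err.println("trigger thread failed: " + e);
            } finally {
                // Runs on normal exit and on failure, so close() never waits forever.
                doneLatch.countDown();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TriggerThread t = new TriggerThread(() -> {
            throw new IllegalStateException("unexpected error");
        });
        t.start();
        // The waiter (the pumper thread in the report) is released despite the failure.
        boolean released = t.doneLatch.await(5, TimeUnit.SECONDS);
        System.out.println("released=" + released);
    }
}
```

Without the finally block, the failing body above would leave the count at one and the await call would only return on timeout, which mirrors the hang described in the issue.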


                
> Agent deadlock possible due to blocked latch in driver thread.
> --------------------------------------------------------------
>
>                 Key: FLUME-768
>                 URL: https://issues.apache.org/jira/browse/FLUME-768
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v0.9.4
>            Reporter: Jonathan Hsieh
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>         Attachments: Flume-768.patch
>
>
> There are three threads essentially blocked. Two of the three are blocked 
> because of the third.  
> The main problem is that roll close is blocked waiting for a close to 
> complete.  It has a subordinate thread, which seems to be gone, that normally 
> triggers the latch that allows it to close.  My guess is that some exception 
> caused that TriggerThread to exit and, because the latch countdowns aren't 
> present on that path, the ok-to-shutdown latch never got cleared.
> The other two threads are blocked because of this, and likely wouldn't be 
> stuck here if that intermediate thread weren't stuck.
> The agent's avro source queue is full and it is blocked trying to enqueue 
> more data.
> There is also another thread that is blocked: the wal draining thread is 
> blocked with nothing left to do (why everything is in sent state).  This 
> doesn't seem to be part of the problem.
> Thread 21 (448511246@qtp-1388647956-1):
>   State: WAITING
>   Blocked count: 3
>   Waited count: 29
>   Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11031d18
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
>     com.cloudera.flume.handlers.avro.AvroEventSource.enqueue(AvroEventSource.java:114)
>     com.cloudera.flume.handlers.avro.AvroEventSource$1.append(AvroEventSource.java:135)
>     sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     java.lang.reflect.Method.invoke(Method.java:597)
>     org.apache.avro.specific.SpecificResponder.respond(SpecificResponder.java:93)
>     org.apache.avro.ipc.Responder.respond(Responder.java:136)
>     org.apache.avro.ipc.Responder.respond(Responder.java:88)
>     org.apache.avro.ipc.ResponderServlet.doPost(ResponderServlet.java:48)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
>     org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>     org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     org.mortbay.jetty.Server.handle(Server.java:326)
> Here's another thread that is essentially blocked:
> Thread 19 (logicalNode agent-19):
>   State: WAITING
>   Blocked count: 83
>   Waited count: 1143043
>   Waiting on java.util.concurrent.CountDownLatch$Sync@5c328896
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>     java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>     com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:213)
>     com.cloudera.flume.agent.durability.NaiveFileWALDeco.close(NaiveFileWALDeco.java:147)
>     com.cloudera.flume.agent.AgentSink.close(AgentSink.java:118)
>     com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     com.cloudera.flume.handlers.debug.LazyOpenDecorator.close(LazyOpenDecorator.java:81)
>     com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:121)
> Here's the wal draining thread trying to pull things out of the wal.
> Thread 24 (naive file wal transmit-24):
>   State: TIMED_WAITING
>   Blocked count: 156
>   Waited count: 171352
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>     java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:424)
>     com.cloudera.flume.agent.durability.NaiveFileWALManager.getUnackedSource(NaiveFileWALManager.java:763)
>     com.cloudera.flume.agent.durability.WALSource.next(WALSource.java:104)
>     com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:91


        
