Frantz Mazoyer created THRIFT-3313:
--------------------------------------

             Summary: Thrift java server hogs 100% CPU and clients are stuck 
                 Key: THRIFT-3313
                 URL: https://issues.apache.org/jira/browse/THRIFT-3313
             Project: Thrift
          Issue Type: Bug
          Components: Java - Library
    Affects Versions: 0.9.2, 0.9.1, 0.9, 0.8, 0.7
         Environment: Storm 0.9.5 (nimbus)
            Reporter: Frantz Mazoyer


Test environment: Storm 0.9.5 / Thrift Java 0.7.
Test scenario: 
  Deploy a Storm topology in a loop.
  When the nimbus cleanup timeout is reached, the Thrift server throws an error: 
  "Exception while invoking ..." ... TException

Test result:
  The Thrift Java server thread spins at 100% CPU in an infinite loop:

jstack:
{code}
"Thread-5" prio=10 tid=0x00007fb134aab800 nid=0x6767 runnable 
[0x00007fb129c9b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        ...
        at org.apache.thrift7.server.TNonblockingServer$SelectThread.select(TNonblockingServer.java:284)
{code}

strace:
{code}
epoll_wait(70, {{EPOLLIN, {u32=866, u64=866}}, {EPOLLIN, {u32=876, u64=876}}}, 4096, 4294967295) = 2
{code}

Investigation and tests show that any exception thrown during processor 
execution bypasses the call to {code}responseReady(){code}. As a result, the 
following decrement never runs, and the counter is not reduced by the size of 
the request buffer:
{code}
readBufferBytesAllocated.addAndGet(-buffer_.array().length);
{code}
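
The leak described above can be reproduced in isolation. The following is a minimal sketch, not the Thrift source: the class ReadBufferLeakSketch and the methods failingProcess() and bytesStillAccounted() are hypothetical names, and only readBufferBytesAllocated is taken from the report. It compares a release that sits after the processor call with a release placed in a finally block, which is one possible way to keep the counter balanced.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the server's read-buffer accounting; not Thrift code.
class ReadBufferLeakSketch {

    /** Hypothetical processor call that always fails, like the TException in the report. */
    static void failingProcess() {
        throw new RuntimeException("Exception while invoking ...");
    }

    /**
     * Runs `requests` failing invocations of `frameSize` bytes each and returns
     * how many bytes are still accounted for afterwards.
     */
    static long bytesStillAccounted(int requests, int frameSize, boolean releaseInFinally) {
        AtomicLong readBufferBytesAllocated = new AtomicLong();
        for (int i = 0; i < requests; i++) {
            readBufferBytesAllocated.addAndGet(frameSize); // reserved when the frame is read
            try {
                try {
                    failingProcess();
                    if (!releaseInFinally) {
                        // Leaky variant: only reached when the processor succeeds.
                        readBufferBytesAllocated.addAndGet(-frameSize);
                    }
                } finally {
                    if (releaseInFinally) {
                        // Safe variant: runs whether or not the processor threw.
                        readBufferBytesAllocated.addAndGet(-frameSize);
                    }
                }
            } catch (RuntimeException e) {
                // The server logs the error and carries on with the next request.
            }
        }
        return readBufferBytesAllocated.get();
    }
}
```

With three failing requests of 100 bytes, bytesStillAccounted(3, 100, false) returns 300, while bytesStillAccounted(3, 100, true) returns 0.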

After enough failed requests, this counter approaches its maximum value, 
MAX_READ_BUFFER_BYTES. Every subsequent request is then delayed forever, 
because the following test in {code}read(){code} is always true:
{code}
if (readBufferBytesAllocated.get() + frameSize > MAX_READ_BUFFER_BYTES)
{code}
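
To illustrate how quickly the check above becomes permanently true, here is a small sketch under stated assumptions: the limit of 1024 bytes is an arbitrary illustrative value rather than the server's configured one, and AdmissionCheckSketch, mustDefer() and failuresUntilStuck() are hypothetical names. It counts how many leaked frames of a given size fit before the next frame is refused.

```java
// Hypothetical model of the admission test quoted in the report; not Thrift code.
class AdmissionCheckSketch {
    // Assumption: a deliberately small limit chosen for illustration only.
    static final long MAX_READ_BUFFER_BYTES = 1024;

    /** Same shape as the test in read(): true means the frame is deferred. */
    static boolean mustDefer(long allocated, int frameSize) {
        return allocated + frameSize > MAX_READ_BUFFER_BYTES;
    }

    /**
     * Returns how many failed requests of `frameSize` bytes are accepted
     * before the next request of the same size is deferred.
     */
    static int failuresUntilStuck(int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        long allocated = 0; // plays the role of readBufferBytesAllocated
        int failures = 0;
        while (!mustDefer(allocated, frameSize)) {
            allocated += frameSize; // reserved, then never released because the call failed
            failures++;
        }
        return failures;
    }
}
```

With these assumed numbers, failuresUntilStuck(100) returns 10: after ten leaked 100-byte frames the counter sits at 1000, and 1000 + 100 exceeds 1024, so every later frame of that size is deferred. Since nothing ever decrements the counter, the condition never clears.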

In the end, the server thread loops in select(), which wakes up immediately 
for read() because the content of the socket was never drained.

The thread then cycles forever between select() and the read() method above, 
pinning the server thread at 100% CPU.
Moreover, all client requests are stuck forever.
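
The busy loop follows from the selector being level-triggered: as long as unread bytes remain on a channel, each select() reports it readable again. The sketch below shows this with a java.nio Pipe instead of a client socket, which is an assumption made to keep the example self-contained. UndrainedSelectSketch and readyRounds() are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demonstration of select() waking up repeatedly on undrained input.
class UndrainedSelectSketch {

    /**
     * Calls select() `rounds` times without ever reading the pending byte and
     * returns how many of those rounds reported the channel as readable.
     */
    static int readyRounds(int rounds) {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);
                sink.write(ByteBuffer.wrap(new byte[] {42})); // one byte that is never read

                int ready = 0;
                for (int i = 0; i < rounds; i++) {
                    if (selector.select(1000) > 0) {
                        ready++;
                    }
                    // Mimic a handler that defers the read: the key is cleared,
                    // but the data stays in the channel.
                    selector.selectedKeys().clear();
                }
                return ready;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every round is expected to report the channel as ready without waiting for the timeout, which is the same pattern as the server thread alternating between select() and a read() that keeps deferring the frame.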




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
