[ https://issues.apache.org/jira/browse/THRIFT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Duxbury closed THRIFT-1493.
---------------------------------

    Resolution: Not A Problem

> Possible infinite loop in TThreadPoolServer
> -------------------------------------------
>
>                 Key: THRIFT-1493
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1493
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.7
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>
> I just faced a major problem in Thrift in combination with Flume, but the
> problem actually could be tracked down to the Thrift library.
> I'm using Thrift in a typical client/server environment for tracking tons of
> data. We ran into an exception which basically looks like:
>
> 2012-01-11 14:57:30,487 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode newsletterImpressionLog01-21 in error state ThriftEventSource | CassandraSink because sleep interrupted
> 2012-01-11 17:18:14,808 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>         at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>         at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>         at java.net.ServerSocket.accept(ServerSocket.java:430)
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>         ... 2 more
> 2012-01-11 17:18:14,809 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>         at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>         at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>         at java.net.ServerSocket.accept(ServerSocket.java:430)
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>         ... 2 more
>
> Note: Flume uses its own implementation of TThreadPoolServer, which is
> copied and pasted from the original Thrift source code. Flume embeds this
> part of the Thrift library in a heavily multi-threaded environment.
> I was running out of socket connections, as indicated by the exception
> "Too many open files". This exception causes an infinite loop in this part
> of the serve() method:
>
>     while (!stopped_) {
>       int failureCount = 0;
>       try {
>         TTransport client = serverTransport_.accept();
>         WorkerProcess wp = new WorkerProcess(client);
>         executorService_.execute(wp);
>       } catch (TTransportException ttx) {
>         if (!stopped_) {
>           ++failureCount;
>           LOGGER.warn("Transport error occurred during acceptance of message.", ttx);
>         }
>       }
>     }
>
> Furthermore, during an overnight run I ran out of disk space because the
> logged exceptions grew the log file dramatically. There was no way to
> recover.
> If a critical exception occurs, the while loop never stops. It can only be
> stopped by calling the stop() method.
> The question is: how should exceptions like the one described above be
> handled in general?
> I can't even catch the exception, because it is only logged and not handled
> in any way. So there is no way to react, for example by doing some cleanup
> or restarting the server.
>
> Best Regards
> Bert Passek
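For illustration only: in the quoted loop, failureCount is declared inside the while body, so it is reset on every iteration and never read, and a failed accept() is retried immediately. One common way to avoid such a busy loop is to count consecutive failures outside the loop, back off between retries, and rethrow after a limit so the caller can clean up or restart. The sketch below is not Thrift or Flume code and is not how this issue was resolved; the names BackoffAcceptLoop, Acceptor, Sleeper, and the serve() parameters are hypothetical stand-ins for serverTransport_.accept() and Thread.sleep().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

class BackoffAcceptLoop {
    /** Hypothetical stand-in for a server transport's accept-and-dispatch step. */
    public interface Acceptor {
        void acceptOne() throws IOException;
    }

    /** Hypothetical sleep hook, so the delay can be stubbed out in a demo. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Accepts connections until stopped. After maxConsecutiveFailures failed
     * accepts in a row it rethrows, so the caller can clean up or restart.
     */
    public static void serve(Acceptor acceptor, Sleeper sleeper, BooleanSupplier stopped,
                             int maxConsecutiveFailures, long baseDelayMillis,
                             long maxDelayMillis)
            throws IOException, InterruptedException {
        int failureCount = 0;          // lives outside the loop, so it accumulates
        long delay = baseDelayMillis;
        while (!stopped.getAsBoolean()) {
            try {
                acceptor.acceptOne();
                failureCount = 0;      // a success resets the failure streak
                delay = baseDelayMillis;
            } catch (IOException e) {
                failureCount++;
                if (failureCount >= maxConsecutiveFailures) {
                    throw new IOException(
                        "accept failed " + failureCount + " times in a row", e);
                }
                sleeper.sleep(delay);  // wait instead of spinning and flooding the log
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> delays = new ArrayList<>();
        try {
            // Simulated acceptor that always fails, as with "Too many open files".
            serve(() -> { throw new IOException("Too many open files"); },
                  ms -> delays.add(ms), () -> false, 4, 10L, 1000L);
        } catch (IOException e) {
            System.out.println(e.getMessage() + "; recorded delays (ms): " + delays);
        }
    }
}
```

In a real server the Sleeper would be Thread::sleep and the limits would be tuned to the deployment; whether to rethrow, stop the server, or keep retrying at the maximum delay is a design choice this sketch does not settle.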