[ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297905#comment-15297905 ]
Siddharth Ahuja commented on FLUME-2905: ---------------------------------------- Thanks [~jarcec], I have found the issue here. It is to do with how I set up my source in TestNetcatSource.java. As I created a new NetcatSource in my test method and did not set up the channelProcessor for it manually it failed for you (but for some reason it worked fine for me in both my eclipse & command line environment). Regardless, I have modified the test so that it re-uses the existing source object with the channelProcessor already set up through the setup() method of the test suite. The point of the test is to ensure that when we start up a source that tries to bind on a port which is already bound we should stop the source. Creating a "new" source in the test suite is not important and we can re-use the already existing source. > NetcatSource - Socket not closed when an exception is encountered during > start() leading to file descriptor leaks > ----------------------------------------------------------------------------------------------------------------- > > Key: FLUME-2905 > URL: https://issues.apache.org/jira/browse/FLUME-2905 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources > Affects Versions: v1.6.0 > Reporter: Siddharth Ahuja > Assignee: Siddharth Ahuja > Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch > > > During the flume agent start-up, the flume configuration containing the > NetcatSource is parsed and the source's start() is called. If there is an > issue while binding the channel's socket to a local address to configure the > socket to listen for connections following exception is thrown but the socket > open just before is not closed. > {code} > 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: > Unable to start EventDrivenSourceRunner: { > source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - > Exception follows. > org.apache.flume.FlumeException: java.net.BindException: Address already in > use > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173) > at > org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44) > at > org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167) > ... 9 more > {code} > The source's start() is then called again leading to another socket being > opened but not closed and so on. This leads to file descriptor (socket) leaks. > This can be easily reproduced as follows: > 1. Set Netcat as the source in flume agent configuration. > 2. Set the bind port for the netcat source to a port which is already in use. > e.g. in my case I used 50010 which is the port for DataNode's XCeiver > Protocol in use by the HDFS service. > 3. Start flume agent and perform "lsof -p <flume_process_id> | wc -l". Notice > the file descriptors keep on growing due to socket leaks with errors like: > "can't identify protocol". -- This message was sent by Atlassian JIRA (v6.3.4#6332)