Siddharth Ahuja created FLUME-2905:
--------------------------------------
Summary: NetcatSource - Socket not closed when an exception is
encountered during start() leading to file descriptor leaks
Key: FLUME-2905
URL: https://issues.apache.org/jira/browse/FLUME-2905
Project: Flume
Issue Type: Bug
Components: Sinks+Sources
Affects Versions: v1.6.0
Reporter: Siddharth Ahuja
During Flume agent start-up, the agent configuration containing the
NetcatSource is parsed and the source's start() is called. If binding the
channel's socket to a local address (so that the socket can listen for
connections) fails, the following exception is thrown, but the socket opened
just before is not closed:
2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
org.apache.flume.FlumeException: java.net.BindException: Address already in use
    at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:444)
    at sun.nio.ch.Net.bind(Net.java:436)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
    at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
    ... 9 more
The LifecycleSupervisor then calls the source's start() again, which opens
another socket that is also never closed, and so on. Every retry therefore
leaks one more file descriptor (socket).
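A minimal Java sketch of the missing cleanup, assuming (as the `ServerSocketAdaptor.bind` frames under `NetcatSource.start` in the trace suggest) that the source opens a `ServerSocketChannel` and binds it through `socket().bind(...)`. The class and method names (`BindOrClose`, `bindOrClose`, `openAndBind`) are hypothetical; this is not the actual NetcatSource code or its fix, and the real start() additionally wraps the error in a `FlumeException`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

/** Hypothetical helper illustrating "close the channel if bind fails". Not Flume code. */
public class BindOrClose {

    /**
     * Binds the channel to addr. If binding fails, the channel is closed
     * before the exception is rethrown, so its file descriptor is released.
     */
    static void bindOrClose(ServerSocketChannel channel, InetSocketAddress addr)
            throws IOException {
        try {
            channel.socket().bind(addr); // may throw java.net.BindException
        } catch (IOException | RuntimeException e) {
            try {
                channel.close(); // release the socket opened by ServerSocketChannel.open()
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure); // keep the original bind error primary
            }
            throw e;
        }
    }

    /** Hypothetical start()-style helper: open, bind, and never leak on failure. */
    static ServerSocketChannel openAndBind(InetSocketAddress addr) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open(); // allocates a socket FD
        bindOrClose(channel, addr);
        return channel;
    }
}
```

With the close in the failure path, each failed retry gives its descriptor back, so the count reported by lsof should stay flat instead of growing.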
This can be easily reproduced as follows:
1. Configure a Netcat source in the Flume agent configuration.
2. Set the Netcat source's bind port to a port that is already in use. For
example, I used 50010, the DataNode's Xceiver protocol port, which the HDFS
service already had in use.
3. Start the Flume agent and run "lsof -p <flume_process_id> | wc -l"
repeatedly. The number of file descriptors keeps growing because of the leaked
sockets, which lsof lists with entries like "can't identify protocol".
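The retry behaviour in the steps above can also be sketched in-process in Java as a rough stand-in for the lsof check. The hypothetical `RetryLeakDemo` occupies a free loopback port (playing the role of the DataNode on 50010), simulates several failed start() attempts, and counts how many channels are still open afterwards. This is not Flume code, and counting `isOpen()` channels only approximates what lsof reports as leaked descriptors.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reproduction of repeated failed start() attempts. Not Flume code. */
public class RetryLeakDemo {

    /** Runs `attempts` bind attempts against `busy` and returns how many channels stay open. */
    static int leakedAfter(int attempts, InetSocketAddress busy, boolean closeOnFailure)
            throws IOException {
        List<ServerSocketChannel> opened = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            ServerSocketChannel channel = ServerSocketChannel.open(); // one new socket FD
            opened.add(channel);
            try {
                channel.socket().bind(busy); // fails: address already in use
            } catch (IOException e) {
                if (closeOnFailure) {
                    channel.close(); // the fixed behaviour
                }
                // Otherwise the channel stays open, like the socket left
                // behind by each failed start().
            }
        }
        int stillOpen = 0;
        for (ServerSocketChannel c : opened) {
            if (c.isOpen()) {
                stillOpen++;
            }
            c.close(); // clean up so the demo itself does not leak
        }
        return stillOpen;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel occupier = ServerSocketChannel.open()) {
            // Stand-in for the service already holding the port: grab any free port.
            occupier.socket().bind(new InetSocketAddress(loopback, 0));
            InetSocketAddress busy =
                    new InetSocketAddress(loopback, occupier.socket().getLocalPort());
            System.out.println("leaky: " + leakedAfter(5, busy, false) + " channels left open");
            System.out.println("fixed: " + leakedAfter(5, busy, true) + " channels left open");
        }
    }
}
```

With `closeOnFailure` set to false, every attempt should leave a channel open, mirroring the growing lsof count; with it set to true, none should remain.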
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)