[ 
https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058745#comment-16058745
 ] 

Siddharth Ahuja commented on FLUME-2905:
----------------------------------------

Hi [~sati], thanks a lot for your review.

Please find my answers to your points below:

* For point 1 ("calling stop() after writing out the exception"): I have moved 
stop() so that it runs after the exception is logged but just before it is 
thrown.
* For point 2: I think removing the "return" statement from the stop() method 
deserves a dedicated JIRA. It is a separate issue from the one I am fixing 
here, which is to prevent socket leaks when a port is already bound. A new 
JIRA would also make tracking easier; otherwise, any problems arising from 
that removal would be discussed here, side-tracking the original issue, which 
is potentially already resolved. What do you think?
* For point 3: I believe there is nothing for me to do here.
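
To illustrate the ordering described for point 1, here is a minimal sketch. It is not the code from the patch: the class name BindCleanupSketch, the isOpen() helper, and the use of IllegalStateException in place of FlumeException are assumptions made so that the example is self-contained and runs without Flume on the classpath.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;

// Hypothetical stand-in for a source that owns a listening channel.
public class BindCleanupSketch {
    private final int port;
    private ServerSocketChannel serverSocket;

    public BindCleanupSketch(int port) {
        this.port = port;
    }

    public void start() {
        try {
            serverSocket = ServerSocketChannel.open();
            serverSocket.socket().bind(new InetSocketAddress(port));
        } catch (IOException e) {
            // 1. Log the failure first.
            System.err.println("Unable to bind to port " + port + ": " + e);
            // 2. Release the descriptor opened above.
            stop();
            // 3. Only then propagate the failure to the caller.
            throw new IllegalStateException(e);
        }
    }

    public void stop() {
        if (serverSocket != null) {
            try {
                serverSocket.close();
            } catch (IOException e) {
                System.err.println("Unable to close socket: " + e);
            }
        }
    }

    // Helper for the demo: reports whether the channel is still open.
    public boolean isOpen() {
        return serverSocket != null && serverSocket.isOpen();
    }

    public static void main(String[] args) throws IOException {
        // Occupy an ephemeral port so the bind in start() fails.
        try (ServerSocket occupied = new ServerSocket(0)) {
            BindCleanupSketch source = new BindCleanupSketch(occupied.getLocalPort());
            try {
                source.start();
            } catch (IllegalStateException expected) {
                System.out.println("channel open after failed start: " + source.isOpen());
            }
        }
    }
}
```

Because stop() runs inside the catch block, a supervisor that retries start() never accumulates open channels, whatever the retry count.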

I have gone ahead and created a new patch, FLUME-2905-5.patch, for your review.

I will try to add this to Review Board soon (I haven't done that yet).

Thanks once again.

> NetcatSource - Socket not closed when an exception is encountered during 
> start() leading to file descriptor leaks
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2905
>                 URL: https://issues.apache.org/jira/browse/FLUME-2905
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.6.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>         Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, 
> FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch
>
>
> During Flume agent start-up, the Flume configuration containing the 
> NetcatSource is parsed and the source's start() is called. If binding the 
> channel's socket to a local address (to configure the socket to listen for 
> connections) fails, the following exception is thrown, but the socket opened 
> just before is not closed. 
> {code}
> 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
> org.apache.flume.FlumeException: java.net.BindException: Address already in use
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
>         at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:444)
>         at sun.nio.ch.Net.bind(Net.java:436)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
>         ... 9 more
> {code}
> The source's start() is then called again, which opens another socket that 
> is never closed, and so on. This leads to file descriptor (socket) leaks.
> This can be easily reproduced as follows:
> 1. Set Netcat as the source in the Flume agent configuration.
> 2. Set the bind port for the Netcat source to a port that is already in use, 
> e.g. in my case I used 50010, which is the port for the DataNode's XCeiver 
> Protocol in use by the HDFS service.
> 3. Start the Flume agent and run "lsof -p <flume_process_id> | wc -l". Notice 
> that the file descriptor count keeps growing due to socket leaks, with lsof 
> entries such as: "can't identify protocol".
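
The retry behaviour described in the issue can also be approximated without a Flume agent. The sketch below is an illustration only: RetryLeakSketch, retryWithoutClosing, and countOpen are hypothetical names, and the loop merely stands in for the supervisor calling start() repeatedly. It opens a channel per attempt, lets the bind fail on an occupied port, and deliberately skips close() to mirror the reported bug.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;
import java.util.ArrayList;
import java.util.List;

public class RetryLeakSketch {

    // Attempts to bind `attempts` times without closing on failure and
    // returns every channel that was opened along the way.
    public static List<ServerSocketChannel> retryWithoutClosing(int port, int attempts) {
        List<ServerSocketChannel> opened = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            try {
                ServerSocketChannel channel = ServerSocketChannel.open();
                opened.add(channel);
                channel.socket().bind(new InetSocketAddress(port));
            } catch (IOException e) {
                // Deliberately no close() here: this mirrors the reported bug.
            }
        }
        return opened;
    }

    // Counts the channels that still hold a descriptor.
    public static long countOpen(List<ServerSocketChannel> channels) {
        return channels.stream().filter(ServerSocketChannel::isOpen).count();
    }

    public static void main(String[] args) throws IOException {
        // Occupy an ephemeral port so that every bind attempt fails.
        try (ServerSocket occupied = new ServerSocket(0)) {
            List<ServerSocketChannel> leaked =
                    retryWithoutClosing(occupied.getLocalPort(), 5);
            System.out.println("channels left open: " + countOpen(leaked));
            // Clean up so that the demo itself does not leak.
            for (ServerSocketChannel channel : leaked) {
                channel.close();
            }
        }
    }
}
```

Each failed attempt leaves one open, unbound channel behind, which is consistent with the growing lsof count in step 3.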



