[
https://issues.apache.org/jira/browse/FLUME-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267338#comment-13267338
]
Will McQueen edited comment on FLUME-1175 at 5/3/12 10:48 AM:
--------------------------------------------------------------
I'm not sure, but I think this may be what's happening:
1) A reconfig event occurs.
2) RollingFileSink finishes executing its process() method, or the sink
runner thread calling RollingFileSink.process() is interrupted partway
through the call. Either way, I believe process() can return while the
outputStream field still holds a non-null reference to a
BufferedOutputStream object.
3) As part of the reconfig steps, the sink's configure() is called next.
configure() sets the 'directory' field, but the outputStream field still
holds the old value.
4) Next, stop() is called (again as part of the reconfig steps). The object
referenced by outputStream is flushed and closed, but the field is not
nulled out afterwards.
5) Next, start() is called. The reconfig steps are done.
6) Next, the sink runner calls process(), which checks whether
outputStream != null. The check is true, since outputStream still points to
the old, closed BufferedOutputStream, so process() presumably writes to the
closed stream instead of opening a new file, which would explain the Bad
File Descriptor error.
So one possible fix would be to null out the outputStream field in stop()
(and the serializer field too, while we're at it), something like this:
{code}
if (serializer != null) {
  try {
    serializer.flush();
    serializer.beforeClose();
  } catch (IOException e) {
    logger.error("Unable to cleanup serializer. Exception follows.", e);
  } finally {
    serializer = null;
  }
}
if (outputStream != null) {
  logger.debug("Closing file {}", pathController.getCurrentFile());
  try {
    outputStream.flush();
    outputStream.close();
  } catch (IOException e) {
    logger.error("Unable to close outputStream. Exception follows.", e);
  } finally {
    outputStream = null;
  }
}
{code}
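To make the sequence above concrete, here is a minimal standalone sketch of
it. It is not Flume code: MiniSink, nullOnStop, and survivesRestart are
hypothetical names made up for illustration, and the lifecycle is reduced to
process() and stop(). It assumes the sink lazily opens its file in process()
only when outputStream is null, which is what the check in step 6 suggests.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class StaleStreamSketch {

    /** Hypothetical stand-in for a file sink; not the real RollingFileSink. */
    static class MiniSink {
        private final File file;
        private final boolean nullOnStop; // true = apply the proposed fix
        private OutputStream outputStream;

        MiniSink(File file, boolean nullOnStop) {
            this.file = file;
            this.nullOnStop = nullOnStop;
        }

        /** Opens the file only when no stream is held, then writes and flushes. */
        void process(byte[] body) throws IOException {
            if (outputStream == null) {
                outputStream = new BufferedOutputStream(new FileOutputStream(file, true));
            }
            outputStream.write(body);
            outputStream.flush(); // fails here if the stream was closed but kept
        }

        /** Flushes and closes the stream; optionally drops the reference. */
        void stop() {
            if (outputStream != null) {
                try {
                    outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    System.err.println("Unable to close outputStream: " + e);
                } finally {
                    if (nullOnStop) {
                        outputStream = null;
                    }
                }
            }
        }
    }

    /** Returns true if process() still works after a stop(), false if it throws. */
    static boolean survivesRestart(boolean nullOnStop) throws IOException {
        File file = File.createTempFile("mini-sink", ".log");
        file.deleteOnExit();
        MiniSink sink = new MiniSink(file, nullOnStop);
        sink.process("before reconfig\n".getBytes(StandardCharsets.UTF_8));
        sink.stop(); // reconfig: stop(), then the runner calls process() again
        try {
            sink.process("after reconfig\n".getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without nulling: " + survivesRestart(false));
        System.out.println("with nulling:    " + survivesRestart(true));
    }
}
```

I'd expect the first call to report false, because the flush on the closed
stream throws an IOException, and the second to report true. The exact
exception message may differ from "Bad file descriptor" depending on the JDK
version.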
> RollingFileSink complains of Bad File Descriptor upon a reconfig event
> ----------------------------------------------------------------------
>
> Key: FLUME-1175
> URL: https://issues.apache.org/jira/browse/FLUME-1175
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0
> Environment: CentOS 6.2 64-bit
> Reporter: Will McQueen
> Fix For: v1.2.0
>
>
> Steps:
> 1) Create a config file that looks something like this:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = MEMORY
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = SEQ
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = FILE_ROLL
> agent.sinks.k1.sink.directory = /var/log/flume-ng
> agent.sinks.k1.sink.rollInterval = 0
> 2) Start the Flume NG agent
> 3) touch the config file so that a reconfig event is triggered within 30 secs
> 4) tail the output file to observe the sequence generator events:
> tail -f /var/log/flume-ng/XXXXXXXXXXXX
> 5) Notice that the flow suddenly stops at the reconfig event (within 30 secs
> after touching the config file). Flow doesn't continue. The flume log shows a
> Bad File Descriptor error for the RollingFileSink:
> 2012-05-03 01:34:34,806 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to process event: [Event headers = {timestamp=1336034074797, nanos=3762297996593382, pri=INFO, host=<mysupersecrethost>, FlumeOG=yes, execcmd=java.nio.HeapByteBuffer[pos=0 lim=24 cap=24], procsource=java.nio.HeapByteBuffer[pos=0 lim=6 cap=6], service=java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]}, body.length = 26 ]
> at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:201)
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Bad file descriptor
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:193)
> ... 3 more