[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208414#comment-14208414 ]

Nina Safonova commented on FLUME-2307:
--------------------------------------

Flume 1.5.0-cdh5.1.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 14be91ec816bac5a91c321b9e8620ffb04acf04c
Compiled by jenkins on Sat Jul 12 09:17:48 PDT 2014
From source with checksum bf4451b17198a612fea60ad6f5420bbc

> Remove Log writetimeout
> -----------------------
>
>                 Key: FLUME-2307
>                 URL: https://issues.apache.org/jira/browse/FLUME-2307
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel
>    Affects Versions: v1.4.0
>            Reporter: Steve Zesch
>            Assignee: Hari Shreedharan
>             Fix For: v1.5.0
>
>         Attachments: FLUME-2307-1.patch, FLUME-2307.patch
>
> I've observed Flume failing to clean up old log data in FileChannels. The amount of old log data can range anywhere from tens to hundreds of GB. I was able to confirm that the channels were in fact empty. This behavior always occurs after lock timeouts when attempting to put, take, rollback, or commit to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old files. I was able to confirm that the Log's writeCheckpoint method was still being called and successfully obtaining a lock from tryLockExclusive(), but I was not able to confirm removeOldLogs being called. The application log did not include "Removing old file: log-xyz" for the old files, which the Log class would output if they were correctly being removed. I suspect the lock timeouts were due to high I/O load at the time.
> Some stack traces:
> {code}
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
>         at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
>         at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>         at dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
>         at dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
>         at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
>         at dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
>         at org.apache.avro.ipc.Responder.respond(Responder.java:151)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(Si
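For readers unfamiliar with the failure mode described above, the following minimal Java sketch shows the general shape of the timed-lock pattern that the "Failed to obtain lock for writing to the log" message comes from. It is not Flume's actual Log/FileChannel code; the class and method names are illustrative assumptions. The point it demonstrates is that writers acquire the lock with a timeout and fail when it expires (e.g. under heavy I/O), while old-file cleanup only runs at the tail of a successful checkpoint pass, which is why removing the write timeout was the direction this issue took.

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only, not Flume's implementation.
class TimedLogSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final long writeTimeoutSeconds = 10; // analogous in spirit to the write-timeout setting

  /** A put/commit/rollback-style operation that may give up waiting for the lock. */
  void appendWithTimeout(Runnable writeOp) throws InterruptedException {
    if (!lock.readLock().tryLock(writeTimeoutSeconds, TimeUnit.SECONDS)) {
      // Under heavy I/O the timeout fires and the caller sees an exception,
      // analogous to "Failed to obtain lock for writing to the log."
      throw new IllegalStateException("Failed to obtain lock for writing to the log");
    }
    try {
      writeOp.run();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** A checkpoint-style operation that must win the exclusive lock before cleanup. */
  void checkpointAndClean(Runnable checkpointOp, Runnable removeOldFilesOp)
      throws InterruptedException {
    if (!lock.writeLock().tryLock(writeTimeoutSeconds, TimeUnit.SECONDS)) {
      return; // checkpoint skipped this round; old files survive until the next attempt
    }
    try {
      checkpointOp.run();
      removeOldFilesOp.run(); // cleanup is only reached when the checkpoint itself succeeds
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}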
[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207648#comment-14207648 ]

Nina Safonova commented on FLUME-2307:
--------------------------------------

I didn't delete any files before I started to experience this issue. After I ran out of disk space I tried to restart the agent. When that didn't help to clean up the old files, I stopped the agent and deleted all the files manually.
[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207643#comment-14207643 ]

Nina Safonova commented on FLUME-2307:
--------------------------------------

I didn't try to delete the checkpoint files. To keep processing data, I deleted all the files (data and checkpoints) after I stopped the agent.
[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207623#comment-14207623 ]

Nina Safonova commented on FLUME-2307:
--------------------------------------

The OS is CentOS 6.5, the Flume version is 1.5 as I mentioned above, and all the relevant log output is also posted above. I waited long enough and no old files were deleted. It doesn't behave this way all the time: from startup until some random moment it works as expected and cleans up old files, but at some point it simply stops doing so, and at some later point the disk runs out of space.
[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202342#comment-14202342 ]

Nina Safonova commented on FLUME-2307:
--------------------------------------

It is still occurring.
[jira] [Comment Edited] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096325#comment-14096325 ]

Nina Safonova edited comment on FLUME-2307 at 8/14/14 12:01 AM:
----------------------------------------------------------------

Unfortunately I have already restarted Flume. But the channel was operating normally: sinks were reading from it and sources were writing to it. After the restart no old logs were deleted, so I deleted them manually. Here is the restart log with the channel1-related entries:

{code}
12 Aug 2014 21:36:38,022 INFO [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40) - Creating instance of channel channel1 type FILE
12 Aug 2014 21:36:38,035 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205) - Created channel channel1
12 Aug 2014 21:36:38,205 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.getConfiguration:119) - Channel channel1 connected to [es-sink1, es-sink4, es-sink3, es-sink2]
12 Aug 2014 21:36:38,221 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:139) - Starting new configuration:{ sourceRunners:{} sinkRunners:{google-BQ-perf-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6eb5845c counterGroup:{ name:null counters:{} } }, es-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4f04eccc counterGroup:{ name:null counters:{} } }, es-sink4=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4c566d9b counterGroup:{ name:null counters:{} } }, es-sink3=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3e360244 counterGroup:{ name:null counters:{} } }, es-sink2=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4bcede44 counterGroup:{ name:null counters:{} } }, google-BQ-sink4=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@7a62693d counterGroup:{ name:null counters:{} } }, google-BQ-sink3=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@52eb6290 counterGroup:{ name:null counters:{} } }, google-BQ-sink2=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5b940677 counterGroup:{ name:null counters:{} } }, google-BQ-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@53349d99 counterGroup:{ name:null counters:{} } }} channels:{channel1=FileChannel channel1 { dataDirs: [/local/flume-ng/data/channel1] }, channel2=FileChannel channel2 { dataDirs: [/local/flume-ng/data/channel2] }, channel3=org.apache.flume.channel.MemoryChannel{name: channel3}} }
12 Aug 2014 21:36:38,222 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:146) - Starting Channel channel1
12 Aug 2014 21:36:38,222 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:259) - Starting FileChannel channel1 { dataDirs: [/local/flume-ng/data/channel1] }...
12 Aug 2014 21:36:38,270 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.replay:385) - Found NextFileID 714, from [/local/flume-ng/data/channel1/log-691, /local/flume-ng/data/channel1/log-707, /local/flume-ng/data/channel1/log-696, /local/flume-ng/data/channel1/log-697, /local/flume-ng/data/channel1/log-702, /local/flume-ng/data/channel1/log-710, /local/flume-ng/data/channel1/log-704, /local/flume-ng/data/channel1/log-711, /local/flume-ng/data/channel1/log-698, /local/flume-ng/data/channel1/log-686, /local/flume-ng/data/channel1/log-688, /local/flume-ng/data/channel1/log-706, /local/flume-ng/data/channel1/log-712, /local/flume-ng/data/channel1/log-705, /local/flume-ng/data/channel1/log-714, /local/flume-ng/data/channel1/log-713, /local/flume-ng/data/channel1/log-700, /local/flume-ng/data/channel1/log-689, /local/flume-ng/data/channel1/log-687, /local/flume-ng/data/channel1/log-690, /local/flume-ng/data/channel1/log-708, /local/flume-ng/data/channel1/log-701, /local/flume-ng/data/channel1/log-703, /local/flume-ng/data/channel1/log-692, /local/flume-ng/data/channel1/log-694, /local/flume-ng/data/channel1/log-709, /local/flume-ng/data/channel1/log-695, /local/flume-ng/data/channel1/log-693, /local/flume-ng/data/channel1/log-699]
12 Aug 2014 21:36:38,288 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:53) - Starting up with /local/flume-ng/checkpoints/channel1/checkpoint and /local/flume-ng/checkpoints/channel1/checkpoint.meta
12 Aug 2014 21:36:38,289 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:57) - Reading checkpoint metadata from /local/flume-ng/checkpoints/channel1/checkpoint.meta
12 Aug 2014 21:36:38,723 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:249) - Starting replay of [/local/flume-ng/data/channel1/log-686, /local/flume-ng/data/channel1/log-687, /local/flume-ng/data/channel1/log-688, /local/flume-ng/data/channel1/log-689, /local/
[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095973#comment-14095973 ]

Nina Safonova commented on FLUME-2307:
--------------------------------------

Hi guys, we recently migrated to Flume 1.5 and we are experiencing a similar issue: at some point Flume stopped removing old files, but it kept creating new ones, so in the end we ran out of disk space. In the log I see many messages for the same file log-686 (exactly the one at which Flume stopped removing old logs):

12 Aug 2014 05:24:49,956 INFO [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.LogFile$RandomReader.close:504) - Closing RandomReader /local/flume-ng/data/channel1/log-686

I see no "Removing old file: /local/flume-ng/data/channel1/log-686" message, while for all the previous (and removed) logs I see:

12 Aug 2014 05:09:49,607 INFO [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log.removeOldLogs:1060) - Removing old file: /local/flume-ng/data/channel1/log-685
12 Aug 2014 05:09:49,715 INFO [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log.removeOldLogs:1060) - Removing old file: /local/flume-ng/data/channel1/log-685.meta

Our configuration is:

tracer.channels.channel1.type = FILE
tracer.channels.channel1.checkpointDir = /local/flume-ng/checkpoints/channel1
tracer.channels.channel1.dataDirs = /local/flume-ng/data/channel1
tracer.channels.channel1.transactionCapacity = 5000
tracer.channels.channel1.checkpointInterval = 10
tracer.channels.channel1.maxFileSize = 2097152000
tracer.channels.channel1.capacity = 1600
tracer.channels.channel1.write-timeout = 60

This happened twice during the last 2 days. I was able to debug it once and found that in org.apache.flume.channel.file.Log.removeOldLogs(SortedSet fileIDs), the fileIDs passed in contained just log-686 and the latest file (whose ID keeps increasing). Nothing was in pendingDeletes. There were 26 entries in idLogFileMap, but none of them was deleted because minFileID is 686 and all the other files have greater IDs. Why is this happening? Thanks
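The behavior described in the comment above is easier to follow with a small sketch of the pruning rule it reports: cleanup only considers data files whose ID is smaller than the minimum file ID still referenced, so a single stale reference to log-686 keeps every newer file on disk. This is an illustrative, self-contained approximation, not Flume's actual Log.removeOldLogs implementation; the class name and field types here are assumptions.

{code}
import java.io.File;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;

// Illustrative sketch only (not Flume's code): prune data files whose ID is below
// the smallest ID still referenced by the checkpoint/queue.
class OldLogPruningSketch {

  // Analogous in spirit to idLogFileMap: every data file currently on disk, keyed by ID.
  private final Map<Integer, File> idLogFileMap = new TreeMap<>();

  void removeOldLogs(SortedSet<Integer> referencedFileIds) {
    // The smallest file ID that anything (queue entries, in-flight transactions,
    // the file currently being written) still points at.
    int minFileId = referencedFileIds.first();

    idLogFileMap.entrySet().removeIf(entry -> {
      if (entry.getKey() < minFileId) {
        System.out.println("Removing old file: " + entry.getValue());
        return true;  // safe to delete: nothing references it any more
      }
      return false;   // keep: its ID is not below the minimum referenced ID
    });
  }
}

// In the scenario reported above, referencedFileIds effectively stays {686, <latest>},
// so minFileId is pinned at 686 and no file with ID 686 or higher is ever removed,
// even though the channel itself is empty.
{code}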