Review Request 18555: HTTP source handler doesn't allow for responses
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18555/ ---

Review request for Flume.

Bugs: FLUME-2333
    https://issues.apache.org/jira/browse/FLUME-2333

Repository: flume-git

Description
---
Add a bidirectional HTTP handler to operate beside the existing one. This shouldn't break any existing functionality.

Diffs
---
  flume-ng-core/src/main/java/org/apache/flume/source/http/BidirectionalHTTPSourceHandler.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/http/HTTPSource.java 115b34f
  flume-ng-core/src/test/java/org/apache/flume/source/http/TestHTTPSource.java 5b07a6e
  flume-ng-doc/sphinx/FlumeUserGuide.rst 8390cd2

Diff: https://reviews.apache.org/r/18555/diff/

Testing
---
Unit testing.

Thanks,
Jeremy Karlson
[jira] [Updated] (FLUME-2333) HTTP source handler doesn't allow for responses
[ https://issues.apache.org/jira/browse/FLUME-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Karlson updated FLUME-2333:
--
    Description:
Existing HTTP source handlers receive events via an HttpServletRequest. This works, but because the handler doesn't have access to the HttpServletResponse, there is no way to return a response. This makes it unsuitable for any protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source to allow both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.

Also includes minor documentation updates to reflect this.

Review: https://reviews.apache.org/r/18555/

  was:
Existing HTTP source handlers receive events via an HttpServletRequest. This works, but because the handler doesn't have access to the HttpServletResponse, there is no way to return a response. This makes it unsuitable for any protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source to allow both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.

Also includes minor documentation updates to reflect this.

> HTTP source handler doesn't allow for responses
> ---
>
>                 Key: FLUME-2333
>                 URL: https://issues.apache.org/jira/browse/FLUME-2333
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Jeremy Karlson
>            Assignee: Jeremy Karlson
>         Attachments: FLUME-2333.diff
>
>
> Existing HTTP source handlers receive events via an HttpServletRequest. This works, but because the handler doesn't have access to the HttpServletResponse, there is no way to return a response. This makes it unsuitable for any protocol that relies on bidirectional communication.
> My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source to allow both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.
> Also includes minor documentation updates to reflect this.
> Review: https://reviews.apache.org/r/18555/

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (FLUME-2333) HTTP source handler doesn't allow for responses
[ https://issues.apache.org/jira/browse/FLUME-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Karlson updated FLUME-2333:
--
    Attachment: FLUME-2333.diff

> HTTP source handler doesn't allow for responses
> ---
>
>                 Key: FLUME-2333
>                 URL: https://issues.apache.org/jira/browse/FLUME-2333
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Jeremy Karlson
>            Assignee: Jeremy Karlson
>         Attachments: FLUME-2333.diff
>
>
> Existing HTTP source handlers receive events via an HttpServletRequest. This works, but because the handler doesn't have access to the HttpServletResponse, there is no way to return a response. This makes it unsuitable for any protocol that relies on bidirectional communication.
> My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source to allow both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.
> Also includes minor documentation updates to reflect this.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Created] (FLUME-2333) HTTP source handler doesn't allow for responses
Jeremy Karlson created FLUME-2333:
-

             Summary: HTTP source handler doesn't allow for responses
                 Key: FLUME-2333
                 URL: https://issues.apache.org/jira/browse/FLUME-2333
             Project: Flume
          Issue Type: Improvement
            Reporter: Jeremy Karlson

Existing HTTP source handlers receive events via an HttpServletRequest. This works, but because the handler doesn't have access to the HttpServletResponse, there is no way to return a response. This makes it unsuitable for any protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source to allow both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.

Also includes minor documentation updates to reflect this.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
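For illustration, a handler interface along these lines would give implementations access to the response object. This is a hedged sketch of what such an interface might look like, not the contents of the attached patch; the name and method signature are assumptions based on the description above and on the existing org.apache.flume.source.http.HTTPSourceHandler contract.

    // Hypothetical sketch only -- the real interface is defined in the
    // FLUME-2333 patch. It mirrors the existing HTTPSourceHandler, with the
    // servlet response added so a handler can write a reply to the client.
    import java.util.List;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.flume.Event;
    import org.apache.flume.conf.Configurable;

    public interface BidirectionalHTTPSourceHandler extends Configurable {
      /**
       * Parse events from the request, then use the response to send data
       * back to the client (status code, headers, body).
       */
      List<Event> getEvents(HttpServletRequest request,
                            HttpServletResponse response) throws Exception;
    }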
File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
Hi,

I'm using the Flume NG 1.4 (CDH 4.4) tarball for collecting aggregated logs. I am running a two-tier (agent, collector) Flume configuration with custom plugins. There are approximately 20 agent machines (receiving data) and 6 collector machines (writing to HDFS), all running independently. However, I have been seeing some file channel exceptions on the collector side. The agents appear to be working fine.

Error stack trace:

org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)

And I keep getting the same error.

P.S.: The same exception is repeated on most of the Flume collector machines, but not at the same time; there is usually a difference of a couple of hours or more.

1. HDFS sinks write to an Amazon EC2 cloud instance.
2. The data dir and checkpoint dir of the file channel on each collector instance are mounted on a separate Hadoop EBS drive. This makes sure that two separate collectors do not overlap their log and checkpoint dirs. There is a symbolic link, i.e. /usr/lib/flume-ng/datasource -> /hadoop/ebs/mnt-1.
3. Flume works fine for a couple of days, and all the agents and collectors are initialized properly without exceptions.

Questions:

1. About the exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]": according to the documentation, such an exception occurs only if two processes are accessing the same file/directory. However, each channel is configured separately, so no two channels should access the same dir. Hence, this exception does not indicate anything. Please correct me if I'm wrong. (The relevant per-channel settings are sketched below.)
2. Also, hdfs.callTimeout bounds calls to HDFS for open/write operations. If there is no response within that duration, the call times out, and on timeout the sink closes the file. Please correct me if I'm wrong. Also, is there a way to specify the number of retries before it closes the file?

Your inputs/suggestions will be thoroughly appreciated.

Regards,
Kushal Mangtani
Software Engineer
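For reference, the isolation and timeouts discussed above map onto a handful of standard properties. This is a hedged sketch against Flume 1.4 property names (write-timeout existed in 1.4 and was later removed by FLUME-2307); the agent, channel, and sink names are placeholders, not the poster's actual configuration.

    # Hypothetical agent "collector"; file channel c2 feeds HDFS sink k1.
    collector.channels.c2.type = file
    # Each collector must point at its own directories -- two agents sharing
    # these paths would contend for the same file locks.
    collector.channels.c2.checkpointDir = /hadoop/ebs/mnt-1/flume/checkpoint
    collector.channels.c2.dataDirs = /hadoop/ebs/mnt-1/flume/data
    # Seconds to wait for the log write lock before the ChannelException
    # above is thrown (Flume 1.4; removed in later releases by FLUME-2307).
    collector.channels.c2.write-timeout = 30

    collector.sinks.k1.type = hdfs
    # Milliseconds allowed for each HDFS open/write/flush/close call; on
    # timeout the operation fails and the sink closes the file.
    collector.sinks.k1.hdfs.callTimeout = 30000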
Review Request 18544: Hive Streaming sink
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18544/ ---

Review request for Flume.

Bugs: FLUME-1734
    https://issues.apache.org/jira/browse/FLUME-1734

Repository: flume-git

Description
---
Hive streaming sink.

Diffs
---
  flume-ng-doc/sphinx/FlumeUserGuide.rst 8390cd2
  flume-ng-sinks/flume-hive-sink/pom.xml PRE-CREATION
  flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java PRE-CREATION
  flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveWriter.java PRE-CREATION
  flume-ng-sinks/flume-hive-sink/src/test/resources/log4j.properties PRE-CREATION
  flume-ng-sinks/pom.xml 6ac2b4d
  pom.xml 362fb45

Diff: https://reviews.apache.org/r/18544/diff/

Testing
---
This version lacks unit tests.

Thanks,
Roshan Naik
[jira] [Updated] (FLUME-1734) Create a Hive Sink based on the new Hive Streaming support
[ https://issues.apache.org/jira/browse/FLUME-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated FLUME-1734:
---
    Attachment: FLUME-1734.draft.1.patch

Draft patch for review. No tests currently.

> Create a Hive Sink based on the new Hive Streaming support
> ---
>
>                 Key: FLUME-1734
>                 URL: https://issues.apache.org/jira/browse/FLUME-1734
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Roshan Naik
>            Assignee: Roshan Naik
>              Labels: features
>         Attachments: FLUME-1734.draft.1.patch
>
>
> Create a sink that would stream data into HCatalog partitions. The primary goal is that once the data is loaded into Hadoop, it should be automatically queryable (using, say, Hive or Pig) without requiring additional post-processing steps on behalf of the users. The sink should manage the creation of new partitions and commit them periodically.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
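As background, the Hive Streaming API that such a sink builds on follows a connect / transaction-batch / write / commit cycle, which is what makes the data queryable as soon as each transaction commits. The sketch below is a hedged illustration against the org.apache.hive.hcatalog.streaming API, not code from the attached patch; the metastore URI, table, and field names are placeholders.

    import java.util.Arrays;
    import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    public class HiveStreamingSketch {
      public static void main(String[] args) throws Exception {
        // Endpoint identifies the metastore, table, and target partition.
        HiveEndPoint endPoint = new HiveEndPoint(
            "thrift://metastore-host:9083", "default", "web_logs",
            Arrays.asList("2014", "02", "26"));            // partition values
        StreamingConnection conn = endPoint.newConnection(true); // auto-create partition
        // Writer maps delimited event bodies onto the table's columns.
        DelimitedInputWriter writer =
            new DelimitedInputWriter(new String[] {"host", "msg"}, ",", endPoint);
        // Events are written inside transactions drawn from a batch; the
        // commit makes them visible to Hive queries immediately.
        TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
        txnBatch.beginNextTransaction();
        txnBatch.write("host1,hello".getBytes());
        txnBatch.write("host2,world".getBytes());
        txnBatch.commit();
        txnBatch.close();
        conn.close();
      }
    }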
[jira] [Commented] (FLUME-2225) Elasticsearch Sink for ES HTTP API
[ https://issues.apache.org/jira/browse/FLUME-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913276#comment-13913276 ]

Otis Gospodnetic commented on FLUME-2225:
-
[~hshreedharan] - I see another Guava upgrade to 14.x in FLUME-2286.

> Elasticsearch Sink for ES HTTP API
> ---
>
>                 Key: FLUME-2225
>                 URL: https://issues.apache.org/jira/browse/FLUME-2225
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.5.0
>            Reporter: Otis Gospodnetic
>             Fix For: v1.4.1, v1.5.0
>
>         Attachments: FLUME-2225-0.patch, FLUME-2225-1.patch
>
>
> The existing ElasticSearchSink uses the ES TransportClient. As such, one cannot use the ES HTTP API, which is sometimes easier and doesn't have issues around client and server/cluster components using incompatible versions; currently, both the client and the server/cluster need to be on the same version.
> See http://search-hadoop.com/m/k76HH9Te68/otis&subj=Elasticsearch+sink+that+uses+HTTP+API

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
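To make the distinction concrete: where the TransportClient speaks Elasticsearch's internal binary protocol (and so must match the cluster version), an HTTP-based sink only needs to POST newline-delimited JSON to the _bulk endpoint. A minimal, hedged sketch using plain java.net; the host, index, and type names are placeholders, not anything from the attached patches.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class EsHttpBulkSketch {
      public static void main(String[] args) throws Exception {
        // Bulk format: an action line followed by a source line, one pair
        // per event, each line terminated by a newline (including the last).
        String body =
            "{\"index\":{\"_index\":\"flume-2014-02-26\",\"_type\":\"log\"}}\n"
          + "{\"@message\":\"hello from flume\"}\n";

        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://es-host:9200/_bulk").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
          out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // 200 means the bulk request was accepted; per-item errors are
        // reported inside the JSON response body.
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
      }
    }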
[jira] [Comment Edited] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913186#comment-13913186 ]

Otis Gospodnetic edited comment on FLUME-1227 at 2/26/14 5:20 PM:
--
Was just about to write to the ML asking about this functionality. It looks like all known issues have been fixed, plus this is new functionality, so it should go in and get some real-world action, which we'd love to give it as soon as 1.5.0 is out!

+10 for committing this. Any chance of this going in before 1.5.0 is cut? It's got 32 eyeballs watching it, so there is clear interest.

was (Author: otis):
Was just about to write to the ML asking about this functionality. It looks like all known issues have been fixed, plus this is new functionality, so it should go in and get some real-world action, which we'd love to give it as soon as 1.5.0 is out!

+10 for committing this. Any chance of this going in before 1.5.0 is cut?

> Introduce some sort of SpillableChannel
> ---
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe (https://github.com/facebook/scribe). It would be something between the memory and file channels. Input events would be saved directly to memory (only) and would be served from there. In case the memory became full, we would spill the events to file.
> Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just a limited number of machines from which we would send the data to HDFS (a sort of staging layer). The reason for this second layer is our need to decouple event aggregation and frontend code onto separate machines. Using the memory channel is fully sufficient, as we can survive the loss of some portion of the events. However, in order to sustain maintenance windows or networking issues, we would have to end up with a lot of memory assigned to those "staging" machines. The referenced scribe deals with this problem by implementing the following logic: events are saved in memory, similarly to our MemoryChannel. However, in case the memory gets full (because of maintenance, networking issues, ...), it will spill data to disk, where it will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its durability guarantees would be the same as MemoryChannel's: in case someone removed the power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent of any other channel's internal code.
> Jarcec

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913190#comment-13913190 ]

Thilo Seidel commented on FLUME-1227:
-
Good day, I am not in the office today. Your mail will neither be read nor automatically forwarded until my return. Best regards, Thilo Seidel

> Introduce some sort of SpillableChannel
> ---
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe (https://github.com/facebook/scribe). It would be something between the memory and file channels. Input events would be saved directly to memory (only) and would be served from there. In case the memory became full, we would spill the events to file.
> Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just a limited number of machines from which we would send the data to HDFS (a sort of staging layer). The reason for this second layer is our need to decouple event aggregation and frontend code onto separate machines. Using the memory channel is fully sufficient, as we can survive the loss of some portion of the events. However, in order to sustain maintenance windows or networking issues, we would have to end up with a lot of memory assigned to those "staging" machines. The referenced scribe deals with this problem by implementing the following logic: events are saved in memory, similarly to our MemoryChannel. However, in case the memory gets full (because of maintenance, networking issues, ...), it will spill data to disk, where it will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its durability guarantees would be the same as MemoryChannel's: in case someone removed the power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent of any other channel's internal code.
> Jarcec

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel
[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913186#comment-13913186 ]

Otis Gospodnetic commented on FLUME-1227:
-
Was just about to write to the ML asking about this functionality. It looks like all known issues have been fixed, plus this is new functionality, so it should go in and get some real-world action, which we'd love to give it as soon as 1.5.0 is out!

+10 for committing this. Any chance of this going in before 1.5.0 is cut?

> Introduce some sort of SpillableChannel
> ---
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe (https://github.com/facebook/scribe). It would be something between the memory and file channels. Input events would be saved directly to memory (only) and would be served from there. In case the memory became full, we would spill the events to file.
> Let me describe the use case behind this request. We have plenty of frontend servers that are generating events. We want to send all events to just a limited number of machines from which we would send the data to HDFS (a sort of staging layer). The reason for this second layer is our need to decouple event aggregation and frontend code onto separate machines. Using the memory channel is fully sufficient, as we can survive the loss of some portion of the events. However, in order to sustain maintenance windows or networking issues, we would have to end up with a lot of memory assigned to those "staging" machines. The referenced scribe deals with this problem by implementing the following logic: events are saved in memory, similarly to our MemoryChannel. However, in case the memory gets full (because of maintenance, networking issues, ...), it will spill data to disk, where it will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its durability guarantees would be the same as MemoryChannel's: in case someone removed the power cord, this channel would lose data. Based on the discussion in FLUME-1201, I would propose to have the implementation completely independent of any other channel's internal code.
> Jarcec

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
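The memory-first / spill-on-full idea described in the issue is simple to sketch. The following is a hedged, deliberately simplified toy, not the FLUME-1227 implementation (which also handles transactions, capacity accounting, and recovery, per the attached design docs); every name here is invented for the example.

    import java.util.ArrayDeque;
    import java.util.Queue;

    /**
     * Toy illustration: puts land in a bounded in-memory queue and spill to
     * an overflow queue only when memory is full; takes drain memory before
     * the overflow. A real channel adds transactions, thread safety, and a
     * genuinely disk-backed overflow; strict ordering is not preserved here.
     */
    public class SpillableQueueSketch<E> {
      private final int memoryCapacity;
      private final Queue<E> memory = new ArrayDeque<>();
      private final Queue<E> overflow = new ArrayDeque<>(); // stand-in for a file-backed queue

      public SpillableQueueSketch(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
      }

      public void put(E event) {
        if (memory.size() < memoryCapacity) {
          memory.add(event);      // fast path: memory only
        } else {
          overflow.add(event);    // memory full: spill
        }
      }

      public E take() {
        // Prefer memory; fall back to spilled events once memory drains.
        E e = memory.poll();
        return (e != null) ? e : overflow.poll();
      }
    }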
[jira] [Issue Comment Deleted] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun updated FLUME-2307:
--
    Comment: was deleted

(was: Is it possible to port this patch to the 1.4.0 branch?)

> Remove Log writetimeout
> ---
>
>                 Key: FLUME-2307
>                 URL: https://issues.apache.org/jira/browse/FLUME-2307
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel
>    Affects Versions: v1.4.0
>            Reporter: Steve Zesch
>            Assignee: Hari Shreedharan
>             Fix For: v1.5.0
>
>         Attachments: FLUME-2307-1.patch, FLUME-2307.patch
>
>
> I've observed Flume failing to clean up old log data in FileChannels. The amount of old log data can range anywhere from tens to hundreds of GB. I was able to confirm that the channels were in fact empty. This behavior always occurs after lock timeouts when attempting to put, take, rollback, or commit to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old files. I was able to confirm that the Log's writeCheckpoint method was still being called and successfully obtaining a lock from tryLockExclusive(), but I was not able to confirm removeOldLogs being called. The application log did not include "Removing old file: log-xyz" for the old files, which the Log class would output if they were correctly being removed. I suspect the lock timeouts were due to high I/O load at the time.
> Some stack traces:
> {code}
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
>         at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
>         at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
>
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>         at dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
>         at dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)
>
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
>         at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
>         at dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
>         at org.apache.avro.ipc.Responder.respond(Responder.java:151)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUps
[jira] [Commented] (FLUME-2307) Remove Log writetimeout
[ https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912985#comment-13912985 ]

Arun commented on FLUME-2307:
-
Is it possible to port this patch to the 1.4.0 branch?

> Remove Log writetimeout
> ---
>
>                 Key: FLUME-2307
>                 URL: https://issues.apache.org/jira/browse/FLUME-2307
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel
>    Affects Versions: v1.4.0
>            Reporter: Steve Zesch
>            Assignee: Hari Shreedharan
>             Fix For: v1.5.0
>
>         Attachments: FLUME-2307-1.patch, FLUME-2307.patch
>
>
> I've observed Flume failing to clean up old log data in FileChannels. The amount of old log data can range anywhere from tens to hundreds of GB. I was able to confirm that the channels were in fact empty. This behavior always occurs after lock timeouts when attempting to put, take, rollback, or commit to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old files. I was able to confirm that the Log's writeCheckpoint method was still being called and successfully obtaining a lock from tryLockExclusive(), but I was not able to confirm removeOldLogs being called. The application log did not include "Removing old file: log-xyz" for the old files, which the Log class would output if they were correctly being removed. I suspect the lock timeouts were due to high I/O load at the time.
> Some stack traces:
> {code}
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
>         at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
>         at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
>
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>         at dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
>         at dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)
>
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
>         at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
>         at dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
>         at org.apache.avro.ipc.Responder.respond(Responder.java:151)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleU
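For context on where the message comes from: the file channel guards its write-ahead log with a timed exclusive lock, and the ChannelException in the traces above is raised when that timed acquisition fails; FLUME-2307 removes the timeout in favor of blocking. Below is a hedged sketch of the general pattern, assuming a java.util.concurrent lock; it is illustrative only, not the FileChannel source.

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class TimedLogLockSketch {
      private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

      /**
       * Mirrors the failure mode in the stack traces: if the write lock
       * cannot be acquired within the timeout (e.g. under heavy I/O while a
       * checkpoint holds it), the transaction throws instead of blocking.
       */
      public void writeToLog(byte[] record, long timeoutSeconds) throws Exception {
        if (!lock.writeLock().tryLock(timeoutSeconds, TimeUnit.SECONDS)) {
          throw new Exception(
              "Failed to obtain lock for writing to the log. "
              + "Try increasing the log write timeout value.");
        }
        try {
          // ... append the record to the write-ahead log ...
        } finally {
          lock.writeLock().unlock();
        }
      }
    }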
[jira] [Updated] (FLUME-2331) Large TMP files created and never closed
[ https://issues.apache.org/jira/browse/FLUME-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated FLUME-2331:
-
    Attachment: flumelog.zip

Thanks! Attached a zipped log file that contains several hours' worth of logging data from Flume.

> Large TMP files created and never closed
> ---
>
>                 Key: FLUME-2331
>                 URL: https://issues.apache.org/jira/browse/FLUME-2331
>             Project: Flume
>          Issue Type: Bug
>          Components: File Channel
>    Affects Versions: v1.4.0
>            Reporter: Krishna Kumar
>         Attachments: flumelog.zip
>
>
> We are currently writing files to Hadoop partitioned by year, month, and day via Flume. File rollovers are done every 5 minutes. Recently, we noticed that this file rollover stops happening at some point during the day and that further data is written to an open TMP file. Because there are no further file rollovers, this TMP file becomes very large. At the end of the day, the TMP file is also not closed, and Flume moves on to the next day, creating new files.
> We use a "." prefix to prevent Hive from complaining about the open TMP file. Because of this issue where the TMP file is never closed, the file remains hidden from Hive even after the day ends.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
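For readers reproducing this setup: the five-minute rollover and the dot prefix described above map onto standard HDFS sink properties. A hedged sketch of such a configuration follows; the agent/sink names and the path are placeholders, not the reporter's actual settings.

    agent.sinks.k1.type = hdfs
    # Partition output by year/month/day, as described in the report.
    agent.sinks.k1.hdfs.path = hdfs://namenode/data/%Y/%m/%d
    # Roll (close and rename) the open file every 5 minutes (300 s). If the
    # roll never fires, the in-use .tmp file keeps growing -- the bug here.
    agent.sinks.k1.hdfs.rollInterval = 300
    # Disable size- and count-based rolling so time is the only trigger.
    agent.sinks.k1.hdfs.rollSize = 0
    agent.sinks.k1.hdfs.rollCount = 0
    # A leading "." hides the in-flight file from Hive until it is closed.
    agent.sinks.k1.hdfs.inUsePrefix = .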
[jira] [Commented] (FLUME-2324) Support writing to multiple HBase clusters using HBaseSink
[ https://issues.apache.org/jira/browse/FLUME-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912746#comment-13912746 ]

Gopinathan A commented on FLUME-2324:
-
+1 for the patch. Small suggestion: please update the patch with FlumeUserGuide documentation; it would also be better to mention the configuration precedence.

> Support writing to multiple HBase clusters using HBaseSink
> ---
>
>                 Key: FLUME-2324
>                 URL: https://issues.apache.org/jira/browse/FLUME-2324
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: FLUME-2324.patch
>
>
> The AsyncHBaseSink can already write to multiple HBase clusters, but HBaseSink cannot.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
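On the precedence point: the usual way a sink targets a specific cluster is to override the ZooKeeper quorum per sink, so that two HBase sinks in one agent can write to two different clusters instead of both inheriting the hbase-site.xml on the classpath (hence the precedence question). This is a hedged sketch assuming the patch exposes per-sink zookeeperQuorum/znodeParent properties like those the AsyncHBaseSink already supports; all names and hosts are placeholders.

    # Two HBase sinks in one agent, each pointed at a different cluster by
    # overriding the ZooKeeper quorum, which would otherwise come from the
    # hbase-site.xml on the classpath.
    agent.sinks.hbase1.type = hbase
    agent.sinks.hbase1.table = events
    agent.sinks.hbase1.columnFamily = cf
    agent.sinks.hbase1.zookeeperQuorum = zk1a:2181,zk1b:2181,zk1c:2181
    agent.sinks.hbase1.znodeParent = /hbase

    agent.sinks.hbase2.type = hbase
    agent.sinks.hbase2.table = events
    agent.sinks.hbase2.columnFamily = cf
    agent.sinks.hbase2.zookeeperQuorum = zk2a:2181,zk2b:2181,zk2c:2181
    agent.sinks.hbase2.znodeParent = /hbase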