Re: Review Request: FLUME-1586. File Channel should support verifying integrity of individual events.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10944/ ---

(Updated May 23, 2013, 1:52 a.m.)

Review request for Flume.

Changes
---
Add tests + refactoring.

Description
---
Patch to add a checksum to each event and, if an event is found to be corrupt, replace it with a noop event using a tool. This addresses bug FLUME-1586. https://issues.apache.org/jira/browse/FLUME-1586

Diffs (updated)
---
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/CorruptEventException.java PRE-CREATION
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java cc0d38a
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEvent.java c447335
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventPointer.java 5f06ab7
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java 1918baa
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java d3db896
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFileV3.java d9a2a9b
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/NoopRecordException.java PRE-CREATION
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Pair.java dfcdd73
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Put.java 4235a79
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/ReplayHandler.java fc47b23
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Serialization.java d6897e1
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/TransactionEventRecord.java 073042f
  flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/proto/ProtosFactory.java 4860ac2
  flume-ng-channels/flume-file-channel/src/main/proto/filechannel.proto 1e668d2
  flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java 0f7d14d
  flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestLog.java 54978f8
  flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestLogFile.java bef22ef
  flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestTransactionEventRecordV3.java f403422
  flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestUtils.java 563dbcc
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AbstractHDFSWriter.java bc3b383
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java 2c2be6a
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java b8214be
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java 0383744
  flume-tools/pom.xml PRE-CREATION
  flume-tools/src/main/java/org/apache/flume/tools/FileChannelIntegrityTool.java PRE-CREATION
  flume-tools/src/main/java/org/apache/flume/tools/FlumeTool.java PRE-CREATION
  flume-tools/src/main/java/org/apache/flume/tools/FlumeToolType.java PRE-CREATION
  flume-tools/src/main/java/org/apache/flume/tools/FlumeToolsMain.java PRE-CREATION
  flume-tools/src/test/java/org/apache/flume/tools/TestFileChannelIntegrityTool.java PRE-CREATION
  flume-tools/src/test/java/org/apache/flume/tools/TestFlumeToolsMain.java PRE-CREATION
  pom.xml a6992f6

Diff: https://reviews.apache.org/r/10944/diff/

Testing
---
Added unit tests for the cases where corrupt and noop events are encountered. I will add tests for the tool soon as well; I have not yet tested the tool completely. This patch aims at gathering feedback on the approach.

Thanks,
Hari Shreedharan
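To make the approach concrete: a per-event checksum is computed at write time and verified at read time, and the offline tool replaces events that fail verification with a noop record. The sketch below is illustrative only — the class and method names are hypothetical and CRC32 is chosen arbitrarily; it is not the code in this patch (which adds CorruptEventException and NoopRecordException for these cases).

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.zip.CRC32;

    // Sketch of checksum-on-write / verify-on-read. All names are hypothetical.
    public final class ChecksummedRecord {
      private final byte[] body;
      private final long checksum;

      private ChecksummedRecord(byte[] body, long checksum) {
        this.body = body;
        this.checksum = checksum;
      }

      // At write time: compute a checksum over the event body and store it alongside.
      public static ChecksummedRecord forWrite(byte[] eventBody) {
        return new ChecksummedRecord(Arrays.copyOf(eventBody, eventBody.length), crc(eventBody));
      }

      // At read time: recompute and compare. A mismatch signals a corrupt event;
      // the integrity tool would then swap the record for a noop event.
      public byte[] verifyAndGet() throws IOException {
        if (crc(body) != checksum) {
          throw new IOException("Checksum mismatch: event is corrupt");
        }
        return body;
      }

      private static long crc(byte[] data) {
        CRC32 crc32 = new CRC32();
        crc32.update(data, 0, data.length);
        return crc32.getValue();
      }
    }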
Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hi Phil, Since this is more of a dev discussion I'll just continue the conversation here on this list. FYI the latest Spool Directory Source has support for resuming reading files. Trunk / Flume 1.4 have some new code around this aspect. Regarding Linux, doing lsof is a pretty cool idea but not portable to all systems. Also in Linux, two processes are allowed to have the same file open for write (it's still a bad idea though). I don't know of a portable way to check whether some other process have a given file open. We could, however, check to see if the file changed, and if so just stop processing that file for a while and try again later. I just don't want people to think spooldir is good for "tailing" a file, because it's not… :) Regards, Mike On Wed, May 22, 2013 at 5:24 PM, Phil Scala wrote: > Hey Ed / Mike > > Besides a comment in the users mailing list that I commented on the file > spool starting from the beginning of the file if there was a failure. The > code does have that well commented (in the retireCurrentFile) [if you don't > retire the file then you run the risk of duplicates...fine with my use :)] > > > As Ed mentioned we have been chatting about ensuring there are no > invariants muddled up during file spool processing. I see this as 2 or 3 > pieces...I think the code is pretty solid, with one area I want to look > into. > > I would like to give this more thought... > > The file the spool source has decided is the "next file"... is it in > use/has the "upload" to the spool directory completed. > > Discussions mentioned some "time" delay -> that could be > artificial and still never solve the problem. I need to do some learning > here, coming from windows the file locking was pretty exclusive. I want to > see about FileChannel locks in nio and Linux file management.This could > maybe be an area to look at. Right now there are no locks obtained for the > file being processed. > > I will come back with something a little better formulated soon... > > Thanks > > > Phil Scala > Software Developer / Architect > Global Relay > > phil.sc...@globalrelay.net > > 866.484.6630 | i...@globalrelay.net | globalrelay.com > > -Original Message- > From: ejsa...@gmail.com [mailto:ejsa...@gmail.com] On Behalf Of Edward > Sargisson > Sent: Wednesday, May 22, 2013 12:22 PM > To: Mike Percy; dev@flume.apache.org > Subject: Re: spooldir source reading Flume itself and thinking the file > has changed (1.3.1) > > Hi Mike, > I haven't tried log4j2 in my environments but my review of the log4j2 > change is that it should work. > > What would I change? > Phil Scala may have some thoughts. > > It would be nice if we thought through the file locking. I want to be able > to put a file in the spooldir and know that Flume isn't going to get > started until I'm ready. This certainly involves thinking about what the > file-putting process is doing but it's not clear to me how to ensure this > whole part is safe. > > The thing that is currently annoying is handling stack traces. All logging > systems I've seen (except recent log4j2) output the stack trace with each > frame on a new line. This means that each frame gets its own log event and > the timestamp has to be added by Flume (instead of taken from the original > event). That Flume timestamp might be delayed by up to 1 minute (because of > log rolling so its pretty crap). Logstash has a multiline filter that > somewhat solves this. > > My current approach is to try and get the Log4j2 FlumeAppender and Flume > 1.3.1 reliable and trustworthy. 
> > Cheers, > Edward > > "Hi Edward, > Did the fixes in LOG4J2-254 fix your file rolling issue? > > What are your thoughts on how to improve spooling directory source's error > handling when it detects a change in the file? Just bail and retry later? I > suppose that's a pretty reasonable approach. > > Regards, > Mike > > > On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson > wrote: > > > Unless I'm mistaken (and concurrent code is easy to be mistaken about) > this > > is a race condition in apache-log4j-extras RollingFileAppender. I live > > in hope that when log4j2 becomes GA we can move to it and then be able > > to use it to log Flume itself. > > > > Evidence: > > File: castellan-reader. > 20130514T2058.log.COMPLETED > > 2013-05-14 20:57:05,330 INFO ... > > > > File: castellan-reader.20130514T2058.log > > 2013-05-14 21:23:05,709 DEBUG ... > > > > Why would an event from 2123 be written into a file from 2058? > > > > My understanding of log4j shows that the RollingFileAppenders end up > > calling this: > > FileAppender: > > public synchronized void setFile(String fileName, boolean append, > boolean > > bufferedIO, int bufferSize) > > > > Which shortly calls: > > this.qw = new QuietWriter(writer, errorHandler); > > > > However, the code to actually write to the writer is this: > > protected > > void subAppend(LoggingEvent event) { > > this.qw.write(this.layout.format(event)); > > > > Un
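A minimal sketch of the "check whether the file changed and try again later" idea Mike suggests above might look like the following. The names are hypothetical and this is not the spooling directory source code; it only illustrates snapshotting a file's size and modification time and backing off if either changes.

    import java.io.File;

    // Record a file's state before reading, re-check before retiring it.
    final class FileSnapshot {
      final long length;
      final long lastModified;

      FileSnapshot(File f) {
        this.length = f.length();
        this.lastModified = f.lastModified();
      }

      boolean changedSince(File f) {
        return f.length() != length || f.lastModified() != lastModified;
      }
    }

    // Usage idea (hypothetical): instead of failing the agent on a change,
    //   FileSnapshot before = new FileSnapshot(currentFile);
    //   ... read events ...
    //   if (before.changedSince(currentFile)) {
    //     // skip this file for now and revisit it on a later poll
    //   }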
RE: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hey Ed / Mike Besides a comment in the users mailing list that I commented on the file spool starting from the beginning of the file if there was a failure. The code does have that well commented (in the retireCurrentFile) [if you don't retire the file then you run the risk of duplicates...fine with my use :)] As Ed mentioned we have been chatting about ensuring there are no invariants muddled up during file spool processing. I see this as 2 or 3 pieces...I think the code is pretty solid, with one area I want to look into. I would like to give this more thought... The file the spool source has decided is the "next file"... is it in use/has the "upload" to the spool directory completed. Discussions mentioned some "time" delay -> that could be artificial and still never solve the problem. I need to do some learning here, coming from windows the file locking was pretty exclusive. I want to see about FileChannel locks in nio and Linux file management.This could maybe be an area to look at. Right now there are no locks obtained for the file being processed. I will come back with something a little better formulated soon... Thanks Phil Scala Software Developer / Architect Global Relay phil.sc...@globalrelay.net 866.484.6630 | i...@globalrelay.net | globalrelay.com -Original Message- From: ejsa...@gmail.com [mailto:ejsa...@gmail.com] On Behalf Of Edward Sargisson Sent: Wednesday, May 22, 2013 12:22 PM To: Mike Percy; dev@flume.apache.org Subject: Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1) Hi Mike, I haven't tried log4j2 in my environments but my review of the log4j2 change is that it should work. What would I change? Phil Scala may have some thoughts. It would be nice if we thought through the file locking. I want to be able to put a file in the spooldir and know that Flume isn't going to get started until I'm ready. This certainly involves thinking about what the file-putting process is doing but it's not clear to me how to ensure this whole part is safe. The thing that is currently annoying is handling stack traces. All logging systems I've seen (except recent log4j2) output the stack trace with each frame on a new line. This means that each frame gets its own log event and the timestamp has to be added by Flume (instead of taken from the original event). That Flume timestamp might be delayed by up to 1 minute (because of log rolling so its pretty crap). Logstash has a multiline filter that somewhat solves this. My current approach is to try and get the Log4j2 FlumeAppender and Flume 1.3.1 reliable and trustworthy. Cheers, Edward "Hi Edward, Did the fixes in LOG4J2-254 fix your file rolling issue? What are your thoughts on how to improve spooling directory source's error handling when it detects a change in the file? Just bail and retry later? I suppose that's a pretty reasonable approach. Regards, Mike On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson wrote: > Unless I'm mistaken (and concurrent code is easy to be mistaken about) this > is a race condition in apache-log4j-extras RollingFileAppender. I live > in hope that when log4j2 becomes GA we can move to it and then be able > to use it to log Flume itself. > > Evidence: > File: castellan-reader. 20130514T2058.log.COMPLETED > 2013-05-14 20:57:05,330 INFO ... > > File: castellan-reader.20130514T2058.log > 2013-05-14 21:23:05,709 DEBUG ... > > Why would an event from 2123 be written into a file from 2058? 
> > My understanding of log4j shows that the RollingFileAppenders end up > calling this: > FileAppender: > public synchronized void setFile(String fileName, boolean append, boolean > bufferedIO, int bufferSize) > > Which shortly calls: > this.qw = new QuietWriter(writer, errorHandler); > > However, the code to actually write to the writer is this: > protected > void subAppend(LoggingEvent event) { > this.qw.write(this.layout.format(event)); > > Unless I'm mistaken there's no happens-before edge between setting the > qw and calling subappend. The code path to get to subAppend appears > not to go through any method synchronized on FileAppender's monitor. > this.qw is not volatile. > > Oh, and based on my cursory inspection of the log4j2 code this exists > in > log4j2 as well. I've just raised log4j2-254 to cover it. We'll see if > I'm actually right... > > Cheers, > Edward > > > > > On Mon, May 13, 2013 at 8:45 AM, Edward Sargisson > wrote: > > > Hi Mike, > > Based on my reading of the various logging frameworks' source code > > and > the > > Java documentation I come to the conclusion that relying on an > > atomic > move > > is not wise. (Next time I see this I might try and prove that the spooled > > file is incomplete). > > > > So I suggest two things: > > 1) A breach of that check should not cause the entire Flume instance > > to stop passing traffic. > > 2) A configurable wait time might work. If you're using the sp
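The hazard described in the quoted analysis reduces to a few lines. The sketch below is schematic, not log4j source: one thread publishes a new writer under a lock while another thread keeps reading the field without any synchronization, so there is no happens-before edge between the two.

    import java.io.IOException;
    import java.io.Writer;

    // Thread A calls roll() while thread B is concurrently calling append().
    class AppenderSketch {
      private Writer writer;                // analogous to FileAppender.qw: not volatile

      // Analogous to the synchronized setFile(): guarded on the writing side...
      synchronized void roll(Writer newWriter) {
        writer = newWriter;
      }

      // ...but the append path does not acquire the same monitor, so there is no
      // happens-before edge and this thread may keep using a stale (rolled) writer.
      void append(String event) throws IOException {
        writer.write(event);
      }

      // Two conventional fixes: declare the field volatile, or make append()
      // synchronized so both paths use the object's monitor.
    }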
Re: [DISCUSS] Flume 1.4 release plan
+1 for Flume 1.4 +1 for Mike being RM. On Wed, May 22, 2013 at 1:25 PM, Roshan Naik wrote: > +1 for both (non binding) > > > On Wed, May 22, 2013 at 10:41 AM, Mubarak Seyed wrote: > > > +1 for Flume 1.4 > > +1 for Mike being RM > > > > > > -Mubarak > > > > On May 22, 2013, at 9:51 AM, Venkatesh S R wrote: > > > > > +1 for both! Thanks Mike! > > > > > > Best, > > > Venkatesh > > > > > > > > > On Wed, May 22, 2013 at 9:41 AM, Will McQueen > wrote: > > > > > >> +1 for Flume 1.4 > > >> +1 for Mike being RM. > > >> > > >> On May 22, 2013, at 9:28 AM, Edward Sargisson > wrote: > > >> > > >>> Hi All, > > >>> +1/+1 for 1.4 and Mike. > > >>> > > >>> > > >>> I'm very keen to have a 1.4 for the environments I manage. There's a > > lot > > >> of > > >>> stuff I'm keen on in there. > > >>> > > >>> On my pre-1.4 list: > > >>> 1. compile with elasticsearch 0.90 > > >>> 2. figure out file channel state issue which is stopping Flume > logging > > >> via > > >>> itself. > > >>> > > >>> 1. Currently we compile with es 0.19. If somebody wants to run es > 0.20 > > >> they > > >>> have to recompile (es made an interface change that is source > > compatible > > >>> but requires a recompile). es 0.90 has been out for 2-ish weeks so > safe > > >>> enough to change the compile to. I think I'll raise an empty Jira to > > >> record > > >>> this. > > >>> > > >>> 2. I haven't reported this because I haven't isolated it well enough. > > I'm > > >>> having issues with the 1.3.1 file channel which I'd like to resolve. > > >>> > > >>> Cheers, > > >>> Edward > > >>> > > >>> "Hi folks, > > >>> We have had over 100 commits since 1.3.1, and a bunch of new features > > and > > >>> improvements including a Thrift source, much improved ElasticSearch > > sink, > > >>> support for a new plugins directory and layout, compression support > in > > >> the > > >>> avro sink/source, improved checkpointing in the file channel and > more, > > >> plus > > >>> a lot of bug fixes. > > >>> > > >>> It seems to me that it's time to start thinking about cutting a 1.4 > > >>> release. I would be happy to volunteer to RM the release. Worth > noting > > >> that > > >>> I will be unavailable for the next two weeks... but after that I'd be > > >> happy > > >>> to pick this up and run with it. That's also a decent amount of time > > for > > >>> people to get moving on patches and reviews for their favorite > > features, > > >>> bug fixes, etc. > > >>> > > >>> If this all sounds OK, I'd like to suggest targeting the last week of > > >> June > > >>> as a release date. If we can release in time for Hadoop Summit then > > that > > >>> would be pretty nice. Otherwise, if something comes up and we can't > get > > >> the > > >>> release out that week, let's shoot for the first week of July at the > > >> latest. > > >>> > > >>> Please let me know your thoughts. > > >>> > > >>> Regards, > > >>> Mike > > >>> > > >>> +1 for Flume 1.4 > > >>> +1 for Mike being RM. > > >>> > > >>> > > >>> Cheers, > > >>> Hari" > > >> > > > > > -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[FLUME-1995] Remote Channel for Apache Flume
I wanted to get some feedback from others before deciding whether or not to continue working on this. I initially filed this improvement/new feature because of use cases where there is a hardware failure on the machine where the agent is currently running. In terms of disaster recovery, having the events queue up on a remote machine (preferably in the same internal network) will allow another agent with the same configuration to pick it up from another machine and restart the process of data transport towards the sink. Sometimes, events may take a while to process and they may end up staying in the channels (FileChannel) for a long time, during which hardware failure could occur. If the data in the events is mission critical, this could cause a lot of headaches if there is no easy way to recover from the hardware failure after events have been queued up in the file channel. What are your thoughts towards the remote channel? I understand there is a JDBC Channel (http://flume.apache.org/FlumeUserGuide.html#jdbc-channel) but I have heard it has performance issues. This is why I am deciding to use a NoSQL store to solve this. I would like to get some feedback from others so that I can prioritize the tasks in my JIRA queue especially with the 1.4.0 release deadline drawing nearer. Thanks.
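For anyone weighing in, the rough shape such a channel would take against the Flume NG channel API is sketched below. The remote-store calls are placeholders for whatever backing store ends up being chosen; this is only an outline of the contract, not the proposed implementation.

    import org.apache.flume.Event;
    import org.apache.flume.channel.BasicChannelSemantics;
    import org.apache.flume.channel.BasicTransactionSemantics;

    // Skeleton of a channel backed by a remote store. Placeholder logic only.
    public class RemoteChannelSketch extends BasicChannelSemantics {

      @Override
      protected BasicTransactionSemantics createTransaction() {
        return new RemoteTransaction();
      }

      private class RemoteTransaction extends BasicTransactionSemantics {
        @Override
        protected void doPut(Event event) throws InterruptedException {
          // Buffer the event locally; it is only durable once doCommit succeeds.
        }

        @Override
        protected Event doTake() throws InterruptedException {
          // Fetch (and tentatively reserve) the next event from the remote store.
          return null; // placeholder
        }

        @Override
        protected void doCommit() throws InterruptedException {
          // Flush buffered puts to the remote store / acknowledge reserved takes.
        }

        @Override
        protected void doRollback() throws InterruptedException {
          // Discard buffered puts / release reserved takes back to the store.
        }
      }
    }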
[jira] [Commented] (FLUME-1995) CassandraChannel - A Distributed Channel Backed By Apache Cassandra as a Persistent Store for Events
[ https://issues.apache.org/jira/browse/FLUME-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664406#comment-13664406 ] Israel Ekpo commented on FLUME-1995: Hi Edward, Thank you for the feedback. After reading the internal architecture document, I do see your point. I will explore another option then as this would lead to performance and disk space issues down the line. My reason for queuing outside of the same computer as the Flume agent is that there are uses cases where the data is very important and if there happens to be a non-recoverable hardware failure on the agent's machine where the file channel is located, it would be easier to restart another agent with the same configuration from a separate machine since the events would still be available elsewhere. We can discuss more alternatives offline (regarding what the queuing solution should be) but I still think there is a need for a high-performing channel that queues the events outside of the machine where the agent is running. > CassandraChannel - A Distributed Channel Backed By Apache Cassandra as a > Persistent Store for Events > > > Key: FLUME-1995 > URL: https://issues.apache.org/jira/browse/FLUME-1995 > Project: Flume > Issue Type: New Feature > Components: Channel >Affects Versions: v1.4.0 >Reporter: Israel Ekpo >Assignee: Israel Ekpo > > Apache Cassandra Channel > The events received by this channel are queued up in Cassandra to be picked > up later when sinks send pickup requests to the channel. > This type of channel is suitable for use cases where recoverability in the > event of a hardware failure on the agent machine is important. > The Cassandra cluster can be located on a remote machine. > Cassandra also supports replication which could back up and replicate the > events further to other nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [DISCUSS] Flume 1.4 release plan
+1 for both (non binding) On Wed, May 22, 2013 at 10:41 AM, Mubarak Seyed wrote: > +1 for Flume 1.4 > +1 for Mike being RM > > > -Mubarak > > On May 22, 2013, at 9:51 AM, Venkatesh S R wrote: > > > +1 for both! Thanks Mike! > > > > Best, > > Venkatesh > > > > > > On Wed, May 22, 2013 at 9:41 AM, Will McQueen wrote: > > > >> +1 for Flume 1.4 > >> +1 for Mike being RM. > >> > >> On May 22, 2013, at 9:28 AM, Edward Sargisson wrote: > >> > >>> Hi All, > >>> +1/+1 for 1.4 and Mike. > >>> > >>> > >>> I'm very keen to have a 1.4 for the environments I manage. There's a > lot > >> of > >>> stuff I'm keen on in there. > >>> > >>> On my pre-1.4 list: > >>> 1. compile with elasticsearch 0.90 > >>> 2. figure out file channel state issue which is stopping Flume logging > >> via > >>> itself. > >>> > >>> 1. Currently we compile with es 0.19. If somebody wants to run es 0.20 > >> they > >>> have to recompile (es made an interface change that is source > compatible > >>> but requires a recompile). es 0.90 has been out for 2-ish weeks so safe > >>> enough to change the compile to. I think I'll raise an empty Jira to > >> record > >>> this. > >>> > >>> 2. I haven't reported this because I haven't isolated it well enough. > I'm > >>> having issues with the 1.3.1 file channel which I'd like to resolve. > >>> > >>> Cheers, > >>> Edward > >>> > >>> "Hi folks, > >>> We have had over 100 commits since 1.3.1, and a bunch of new features > and > >>> improvements including a Thrift source, much improved ElasticSearch > sink, > >>> support for a new plugins directory and layout, compression support in > >> the > >>> avro sink/source, improved checkpointing in the file channel and more, > >> plus > >>> a lot of bug fixes. > >>> > >>> It seems to me that it's time to start thinking about cutting a 1.4 > >>> release. I would be happy to volunteer to RM the release. Worth noting > >> that > >>> I will be unavailable for the next two weeks... but after that I'd be > >> happy > >>> to pick this up and run with it. That's also a decent amount of time > for > >>> people to get moving on patches and reviews for their favorite > features, > >>> bug fixes, etc. > >>> > >>> If this all sounds OK, I'd like to suggest targeting the last week of > >> June > >>> as a release date. If we can release in time for Hadoop Summit then > that > >>> would be pretty nice. Otherwise, if something comes up and we can't get > >> the > >>> release out that week, let's shoot for the first week of July at the > >> latest. > >>> > >>> Please let me know your thoughts. > >>> > >>> Regards, > >>> Mike > >>> > >>> +1 for Flume 1.4 > >>> +1 for Mike being RM. > >>> > >>> > >>> Cheers, > >>> Hari" > >> > >
Re: [DISCUSS] Flume 1.4 release plan
+1 for Flume 1.4 +1 for Mike being RM -Mubarak On May 22, 2013, at 9:51 AM, Venkatesh S R wrote: > +1 for both! Thanks Mike! > > Best, > Venkatesh > > > On Wed, May 22, 2013 at 9:41 AM, Will McQueen wrote: > >> +1 for Flume 1.4 >> +1 for Mike being RM. >> >> On May 22, 2013, at 9:28 AM, Edward Sargisson wrote: >> >>> Hi All, >>> +1/+1 for 1.4 and Mike. >>> >>> >>> I'm very keen to have a 1.4 for the environments I manage. There's a lot >> of >>> stuff I'm keen on in there. >>> >>> On my pre-1.4 list: >>> 1. compile with elasticsearch 0.90 >>> 2. figure out file channel state issue which is stopping Flume logging >> via >>> itself. >>> >>> 1. Currently we compile with es 0.19. If somebody wants to run es 0.20 >> they >>> have to recompile (es made an interface change that is source compatible >>> but requires a recompile). es 0.90 has been out for 2-ish weeks so safe >>> enough to change the compile to. I think I'll raise an empty Jira to >> record >>> this. >>> >>> 2. I haven't reported this because I haven't isolated it well enough. I'm >>> having issues with the 1.3.1 file channel which I'd like to resolve. >>> >>> Cheers, >>> Edward >>> >>> "Hi folks, >>> We have had over 100 commits since 1.3.1, and a bunch of new features and >>> improvements including a Thrift source, much improved ElasticSearch sink, >>> support for a new plugins directory and layout, compression support in >> the >>> avro sink/source, improved checkpointing in the file channel and more, >> plus >>> a lot of bug fixes. >>> >>> It seems to me that it's time to start thinking about cutting a 1.4 >>> release. I would be happy to volunteer to RM the release. Worth noting >> that >>> I will be unavailable for the next two weeks... but after that I'd be >> happy >>> to pick this up and run with it. That's also a decent amount of time for >>> people to get moving on patches and reviews for their favorite features, >>> bug fixes, etc. >>> >>> If this all sounds OK, I'd like to suggest targeting the last week of >> June >>> as a release date. If we can release in time for Hadoop Summit then that >>> would be pretty nice. Otherwise, if something comes up and we can't get >> the >>> release out that week, let's shoot for the first week of July at the >> latest. >>> >>> Please let me know your thoughts. >>> >>> Regards, >>> Mike >>> >>> +1 for Flume 1.4 >>> +1 for Mike being RM. >>> >>> >>> Cheers, >>> Hari" >>
[jira] [Created] (FLUME-2050) Upgrade to log4j2 (when GA)
Edward Sargisson created FLUME-2050: --- Summary: Upgrade to log4j2 (when GA) Key: FLUME-2050 URL: https://issues.apache.org/jira/browse/FLUME-2050 Project: Flume Issue Type: Improvement Reporter: Edward Sargisson Log4j1 is being abandoned in favour of log4j2. Log4j2, by all that I've seen, has better concurrency handling and the Log4j2 FlumeAppender is nice (easily configurable, 3 different styles of agents). Log4j1 has a concurrency defect which means that rolling over a log file into a directory for the Flume spool directory source will not be reliable. Log4j2 has fixed this. Alternatively the log4j2 FlumeAppender may allow Flume to log its own logs via itself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [DISCUSS] Flume 1.4 release plan
+1 for both! Thanks Mike! Best, Venkatesh On Wed, May 22, 2013 at 9:41 AM, Will McQueen wrote: > +1 for Flume 1.4 > +1 for Mike being RM. > > On May 22, 2013, at 9:28 AM, Edward Sargisson wrote: > > > Hi All, > > +1/+1 for 1.4 and Mike. > > > > > > I'm very keen to have a 1.4 for the environments I manage. There's a lot > of > > stuff I'm keen on in there. > > > > On my pre-1.4 list: > > 1. compile with elasticsearch 0.90 > > 2. figure out file channel state issue which is stopping Flume logging > via > > itself. > > > > 1. Currently we compile with es 0.19. If somebody wants to run es 0.20 > they > > have to recompile (es made an interface change that is source compatible > > but requires a recompile). es 0.90 has been out for 2-ish weeks so safe > > enough to change the compile to. I think I'll raise an empty Jira to > record > > this. > > > > 2. I haven't reported this because I haven't isolated it well enough. I'm > > having issues with the 1.3.1 file channel which I'd like to resolve. > > > > Cheers, > > Edward > > > > "Hi folks, > > We have had over 100 commits since 1.3.1, and a bunch of new features and > > improvements including a Thrift source, much improved ElasticSearch sink, > > support for a new plugins directory and layout, compression support in > the > > avro sink/source, improved checkpointing in the file channel and more, > plus > > a lot of bug fixes. > > > > It seems to me that it's time to start thinking about cutting a 1.4 > > release. I would be happy to volunteer to RM the release. Worth noting > that > > I will be unavailable for the next two weeks... but after that I'd be > happy > > to pick this up and run with it. That's also a decent amount of time for > > people to get moving on patches and reviews for their favorite features, > > bug fixes, etc. > > > > If this all sounds OK, I'd like to suggest targeting the last week of > June > > as a release date. If we can release in time for Hadoop Summit then that > > would be pretty nice. Otherwise, if something comes up and we can't get > the > > release out that week, let's shoot for the first week of July at the > latest. > > > > Please let me know your thoughts. > > > > Regards, > > Mike > > > > +1 for Flume 1.4 > > +1 for Mike being RM. > > > > > > Cheers, > > Hari" >
Re: [DISCUSS] Flume 1.4 release plan
+1 for Flume 1.4 +1 for Mike being RM. On May 22, 2013, at 9:28 AM, Edward Sargisson wrote: > Hi All, > +1/+1 for 1.4 and Mike. > > > I'm very keen to have a 1.4 for the environments I manage. There's a lot of > stuff I'm keen on in there. > > On my pre-1.4 list: > 1. compile with elasticsearch 0.90 > 2. figure out file channel state issue which is stopping Flume logging via > itself. > > 1. Currently we compile with es 0.19. If somebody wants to run es 0.20 they > have to recompile (es made an interface change that is source compatible > but requires a recompile). es 0.90 has been out for 2-ish weeks so safe > enough to change the compile to. I think I'll raise an empty Jira to record > this. > > 2. I haven't reported this because I haven't isolated it well enough. I'm > having issues with the 1.3.1 file channel which I'd like to resolve. > > Cheers, > Edward > > "Hi folks, > We have had over 100 commits since 1.3.1, and a bunch of new features and > improvements including a Thrift source, much improved ElasticSearch sink, > support for a new plugins directory and layout, compression support in the > avro sink/source, improved checkpointing in the file channel and more, plus > a lot of bug fixes. > > It seems to me that it's time to start thinking about cutting a 1.4 > release. I would be happy to volunteer to RM the release. Worth noting that > I will be unavailable for the next two weeks... but after that I'd be happy > to pick this up and run with it. That's also a decent amount of time for > people to get moving on patches and reviews for their favorite features, > bug fixes, etc. > > If this all sounds OK, I'd like to suggest targeting the last week of June > as a release date. If we can release in time for Hadoop Summit then that > would be pretty nice. Otherwise, if something comes up and we can't get the > release out that week, let's shoot for the first week of July at the latest. > > Please let me know your thoughts. > > Regards, > Mike > > +1 for Flume 1.4 > +1 for Mike being RM. > > > Cheers, > Hari"
[jira] [Created] (FLUME-2049) Compile ElasticSearchSink with elasticsearch 0.90
Edward Sargisson created FLUME-2049: --- Summary: Compile ElasticSearchSink with elasticsearch 0.90 Key: FLUME-2049 URL: https://issues.apache.org/jira/browse/FLUME-2049 Project: Flume Issue Type: Improvement Components: Sinks+Sources Affects Versions: v1.3.1, v1.4.0 Reporter: Edward Sargisson Assignee: Edward Sargisson The ElasticSearchSink currently compiles against es 0.19. Using 0.20 requires a recompile. I haven't tried 0.90 yet. 0.90 has been out for 2-ish weeks so we should change to compiling with it prior to 1.4 release. I'm hoping to get priority for doing this in the next two weeks or so - however I have no issue with somebody else doing it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [DISCUSS] Flume 1.4 release plan
Hi All, +1/+1 for 1.4 and Mike. I'm very keen to have a 1.4 for the environments I manage. There's a lot of stuff I'm keen on in there. On my pre-1.4 list: 1. compile with elasticsearch 0.90 2. figure out file channel state issue which is stopping Flume logging via itself. 1. Currently we compile with es 0.19. If somebody wants to run es 0.20 they have to recompile (es made an interface change that is source compatible but requires a recompile). es 0.90 has been out for 2-ish weeks so safe enough to change the compile to. I think I'll raise an empty Jira to record this. 2. I haven't reported this because I haven't isolated it well enough. I'm having issues with the 1.3.1 file channel which I'd like to resolve. Cheers, Edward "Hi folks, We have had over 100 commits since 1.3.1, and a bunch of new features and improvements including a Thrift source, much improved ElasticSearch sink, support for a new plugins directory and layout, compression support in the avro sink/source, improved checkpointing in the file channel and more, plus a lot of bug fixes. It seems to me that it's time to start thinking about cutting a 1.4 release. I would be happy to volunteer to RM the release. Worth noting that I will be unavailable for the next two weeks... but after that I'd be happy to pick this up and run with it. That's also a decent amount of time for people to get moving on patches and reviews for their favorite features, bug fixes, etc. If this all sounds OK, I'd like to suggest targeting the last week of June as a release date. If we can release in time for Hadoop Summit then that would be pretty nice. Otherwise, if something comes up and we can't get the release out that week, let's shoot for the first week of July at the latest. Please let me know your thoughts. Regards, Mike +1 for Flume 1.4 +1 for Mike being RM. Cheers, Hari"
Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hi Mike, I haven't tried log4j2 in my environments but my review of the log4j2 change is that it should work. What would I change? Phil Scala may have some thoughts. It would be nice if we thought through the file locking. I want to be able to put a file in the spooldir and know that Flume isn't going to get started until I'm ready. This certainly involves thinking about what the file-putting process is doing but it's not clear to me how to ensure this whole part is safe. The thing that is currently annoying is handling stack traces. All logging systems I've seen (except recent log4j2) output the stack trace with each frame on a new line. This means that each frame gets its own log event and the timestamp has to be added by Flume (instead of taken from the original event). That Flume timestamp might be delayed by up to 1 minute (because of log rolling so its pretty crap). Logstash has a multiline filter that somewhat solves this. My current approach is to try and get the Log4j2 FlumeAppender and Flume 1.3.1 reliable and trustworthy. Cheers, Edward "Hi Edward, Did the fixes in LOG4J2-254 fix your file rolling issue? What are your thoughts on how to improve spooling directory source's error handling when it detects a change in the file? Just bail and retry later? I suppose that's a pretty reasonable approach. Regards, Mike On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson wrote: > Unless I'm mistaken (and concurrent code is easy to be mistaken about) this > is a race condition in apache-log4j-extras RollingFileAppender. I live in > hope that when log4j2 becomes GA we can move to it and then be able to use > it to log Flume itself. > > Evidence: > File: castellan-reader. 20130514T2058.log.COMPLETED > 2013-05-14 20:57:05,330 INFO ... > > File: castellan-reader.20130514T2058.log > 2013-05-14 21:23:05,709 DEBUG ... > > Why would an event from 2123 be written into a file from 2058? > > My understanding of log4j shows that the RollingFileAppenders end up > calling this: > FileAppender: > public synchronized void setFile(String fileName, boolean append, boolean > bufferedIO, int bufferSize) > > Which shortly calls: > this.qw = new QuietWriter(writer, errorHandler); > > However, the code to actually write to the writer is this: > protected > void subAppend(LoggingEvent event) { > this.qw.write(this.layout.format(event)); > > Unless I'm mistaken there's no happens-before edge between setting the qw > and calling subappend. The code path to get to subAppend appears not to go > through any method synchronized on FileAppender's monitor. this.qw is not > volatile. > > Oh, and based on my cursory inspection of the log4j2 code this exists in > log4j2 as well. I've just raised log4j2-254 to cover it. We'll see if I'm > actually right... > > Cheers, > Edward > > > > > On Mon, May 13, 2013 at 8:45 AM, Edward Sargisson > wrote: > > > Hi Mike, > > Based on my reading of the various logging frameworks' source code and > the > > Java documentation I come to the conclusion that relying on an atomic > move > > is not wise. (Next time I see this I might try and prove that the spooled > > file is incomplete). > > > > So I suggest two things: > > 1) A breach of that check should not cause the entire Flume instance to > > stop passing traffic. > > 2) A configurable wait time might work. If you're using the spooling > > source then you've already decided to have some latency so a little more > is > > fine. However, there is still a risk of a race condition because there is > > no signal that the copy is finished. 
> > > > Cheers, > > Edward > > > > "Hi Edward, > > Thanks for investigating. I'm definitely open to suggestions for > > improvement with this. Maybe dying is a bit extreme… the goal was to > ensure > > that people could not possibly try to use it to tail a file, which will > > definitely not work correctly! :) > > > > Mike > > > > > > > > On Fri, May 10, 2013 at 5:02 PM, Edward Sargisson > > wrote: > > > > > Hi Mike, > > > I was curious so I went on a bit of a hunt through logger source code. > > > The result is that loggers can't be relied on to atomically roll the > > > file so a feature to allow a delay before checking the file would be > > > of great utility. > > > > > > For that matter, having Flume not die completely in this scenario > > > would also be good. > > > > > > apache-log4j-extras does this [1]: > > > return source.renameTo(destination); > > > > > > logback does this [2]: > > > boolean result = srcFile.renameTo(targetFile); > > > > > > log4j2 does this [3]: > > > srcStream = new FileInputStream(source); > > > destStream = new FileOutputStream(destination); > > > srcChannel = srcStream.getChannel(); > > > destChannel = destStream.getChannel(); > > > destChannel.transferFrom( > > srcChannel, 0, srcChannel.size()); > > > > > > The JavaDoc for File.renameTo says: > > > Many aspects of the behavior of this method are inherently > > > platform-d
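As an illustration of the multiline handling Edward mentions above — folding stack-trace continuation lines into the preceding line so a whole exception becomes one event — a minimal sketch might look like this. It is illustrative only, not an existing Flume deserializer, and the continuation heuristics are simplistic.

    import java.util.ArrayList;
    import java.util.List;

    // Group raw log lines into per-event strings by joining continuation lines.
    public final class StackTraceJoiner {

      private static boolean isContinuation(String line) {
        return line.startsWith(" ") || line.startsWith("\t")
            || line.startsWith("at ") || line.startsWith("Caused by:")
            || line.startsWith("...");
      }

      public static List<String> join(List<String> rawLines) {
        List<String> events = new ArrayList<String>();
        StringBuilder current = null;
        for (String line : rawLines) {
          if (current != null && isContinuation(line)) {
            current.append('\n').append(line);   // same event: keep the original timestamp line
          } else {
            if (current != null) {
              events.add(current.toString());
            }
            current = new StringBuilder(line);   // start of a new event
          }
        }
        if (current != null) {
          events.add(current.toString());
        }
        return events;
      }
    }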
[jira] [Commented] (FLUME-1995) CassandraChannel - A Distributed Channel Backed By Apache Cassandra as a Persistent Store for Events
[ https://issues.apache.org/jira/browse/FLUME-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664219#comment-13664219 ] Edward Sargisson commented on FLUME-1995: - I think this is bad idea. I just finished working on a team for 18 months heavily using Cassandra and know a little about its internal design. Generally speaking, the advice is that Cassandra is a bad choice for a queue. Queuing behaviour means that you have some producers adding items and consumers deleting items. Cassandra doesn't really delete - it's an append only system so a delete means that it creates a tombstone in the latest SSTable. Then, sometime later, a repair process is run which ensures that all the records are actually deleted. In the meantime you run the risk of some of the nodes replying with a 'latest' record that may have been deleted off some other node but the update hasn't propagated yet. If this is not convincing enough then I'll discuss it on the Cassandra list and bring the results back here. If you happen to want large scalable queueing then a common solution I've seen is to use Redis. However, I don't see why you wouldn't use multiple Flume agents and file channels to solve the same problem. > CassandraChannel - A Distributed Channel Backed By Apache Cassandra as a > Persistent Store for Events > > > Key: FLUME-1995 > URL: https://issues.apache.org/jira/browse/FLUME-1995 > Project: Flume > Issue Type: New Feature > Components: Channel >Affects Versions: v1.4.0 >Reporter: Israel Ekpo >Assignee: Israel Ekpo > > Apache Cassandra Channel > The events received by this channel are queued up in Cassandra to be picked > up later when sinks send pickup requests to the channel. > This type of channel is suitable for use cases where recoverability in the > event of a hardware failure on the agent machine is important. > The Cassandra cluster can be located on a remote machine. > Cassandra also supports replication which could back up and replicate the > events further to other nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [DISCUSS] Flume 1.4 release plan
+1 (non-binding) for both! Thanks for the effort, Mike! On May 22, 2013, at 10:09 AM, Jarek Jarcec Cecho wrote: > +1 for releasing Flume 1.4 > +1 for Mike being the RM for this release > > Jarcec > > On Wed, May 22, 2013 at 01:02:06AM -0700, Arvind Prabhakar wrote: >> Thanks for taking this initiative Mike! >> >> +1 for 1.4 and Mike as RM. >> >> Regards, >> Arvind Prabhakar >> >> On Wed, May 22, 2013 at 12:45 AM, Hari Shreedharan < >> hshreedha...@cloudera.com> wrote: >> >>> +1 for Flume 1.4 >>> +1 for Mike being RM. >>> >>> >>> Cheers, >>> Hari >>> >>> >>> On Wednesday, May 22, 2013 at 12:33 AM, Mike Percy wrote: >>> Hi folks, We have had over 100 commits since 1.3.1, and a bunch of new features and improvements including a Thrift source, much improved ElasticSearch sink, support for a new plugins directory and layout, compression support in >>> the avro sink/source, improved checkpointing in the file channel and more, >>> plus a lot of bug fixes. It seems to me that it's time to start thinking about cutting a 1.4 release. I would be happy to volunteer to RM the release. Worth noting >>> that I will be unavailable for the next two weeks... but after that I'd be >>> happy to pick this up and run with it. That's also a decent amount of time for people to get moving on patches and reviews for their favorite features, bug fixes, etc. If this all sounds OK, I'd like to suggest targeting the last week of >>> June as a release date. If we can release in time for Hadoop Summit then that would be pretty nice. Otherwise, if something comes up and we can't get >>> the release out that week, let's shoot for the first week of July at the >>> latest. Please let me know your thoughts. Regards, Mike >>> >>> >>> -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Group: http://goo.gl/N8pCF
Re: [DISCUSS] Flume 1.4 release plan
+1 for releasing Flume 1.4 +1 for Mike being the RM for this release Jarcec On Wed, May 22, 2013 at 01:02:06AM -0700, Arvind Prabhakar wrote: > Thanks for taking this initiative Mike! > > +1 for 1.4 and Mike as RM. > > Regards, > Arvind Prabhakar > > On Wed, May 22, 2013 at 12:45 AM, Hari Shreedharan < > hshreedha...@cloudera.com> wrote: > > > +1 for Flume 1.4 > > +1 for Mike being RM. > > > > > > Cheers, > > Hari > > > > > > On Wednesday, May 22, 2013 at 12:33 AM, Mike Percy wrote: > > > > > Hi folks, > > > We have had over 100 commits since 1.3.1, and a bunch of new features and > > > improvements including a Thrift source, much improved ElasticSearch sink, > > > support for a new plugins directory and layout, compression support in > > the > > > avro sink/source, improved checkpointing in the file channel and more, > > plus > > > a lot of bug fixes. > > > > > > It seems to me that it's time to start thinking about cutting a 1.4 > > > release. I would be happy to volunteer to RM the release. Worth noting > > that > > > I will be unavailable for the next two weeks... but after that I'd be > > happy > > > to pick this up and run with it. That's also a decent amount of time for > > > people to get moving on patches and reviews for their favorite features, > > > bug fixes, etc. > > > > > > If this all sounds OK, I'd like to suggest targeting the last week of > > June > > > as a release date. If we can release in time for Hadoop Summit then that > > > would be pretty nice. Otherwise, if something comes up and we can't get > > the > > > release out that week, let's shoot for the first week of July at the > > latest. > > > > > > Please let me know your thoughts. > > > > > > Regards, > > > Mike > > > > > > > > > > > > signature.asc Description: Digital signature
Re: [DISCUSS] Flume 1.4 release plan
Thanks for taking this initiative Mike! +1 for 1.4 and Mike as RM. Regards, Arvind Prabhakar On Wed, May 22, 2013 at 12:45 AM, Hari Shreedharan < hshreedha...@cloudera.com> wrote: > +1 for Flume 1.4 > +1 for Mike being RM. > > > Cheers, > Hari > > > On Wednesday, May 22, 2013 at 12:33 AM, Mike Percy wrote: > > > Hi folks, > > We have had over 100 commits since 1.3.1, and a bunch of new features and > > improvements including a Thrift source, much improved ElasticSearch sink, > > support for a new plugins directory and layout, compression support in > the > > avro sink/source, improved checkpointing in the file channel and more, > plus > > a lot of bug fixes. > > > > It seems to me that it's time to start thinking about cutting a 1.4 > > release. I would be happy to volunteer to RM the release. Worth noting > that > > I will be unavailable for the next two weeks... but after that I'd be > happy > > to pick this up and run with it. That's also a decent amount of time for > > people to get moving on patches and reviews for their favorite features, > > bug fixes, etc. > > > > If this all sounds OK, I'd like to suggest targeting the last week of > June > > as a release date. If we can release in time for Hadoop Summit then that > > would be pretty nice. Otherwise, if something comes up and we can't get > the > > release out that week, let's shoot for the first week of July at the > latest. > > > > Please let me know your thoughts. > > > > Regards, > > Mike > > > > > > >
[jira] [Updated] (FLUME-2048) Avro container file deserializer
[ https://issues.apache.org/jira/browse/FLUME-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated FLUME-2048: -- Attachment: FLUME-2048.patch > Avro container file deserializer > > > Key: FLUME-2048 > URL: https://issues.apache.org/jira/browse/FLUME-2048 > Project: Flume > Issue Type: New Feature > Components: Sinks+Sources >Reporter: Mike Percy >Assignee: Mike Percy > Attachments: FLUME-2048.patch > > > It would be great to support an avro container format deserializer in the > spool directory source. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (FLUME-2048) Avro container file deserializer
[ https://issues.apache.org/jira/browse/FLUME-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated FLUME-2048: -- Attachment: (was: FLUME-2048.patch) > Avro container file deserializer > > > Key: FLUME-2048 > URL: https://issues.apache.org/jira/browse/FLUME-2048 > Project: Flume > Issue Type: New Feature > Components: Sinks+Sources >Reporter: Mike Percy >Assignee: Mike Percy > Attachments: FLUME-2048.patch > > > It would be great to support an avro container format deserializer in the > spool directory source. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (FLUME-2048) Avro container file deserializer
[ https://issues.apache.org/jira/browse/FLUME-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated FLUME-2048: -- Attachment: FLUME-2048.patch Unit test is currently failing, but here is some partial progress on this feature. > Avro container file deserializer > > > Key: FLUME-2048 > URL: https://issues.apache.org/jira/browse/FLUME-2048 > Project: Flume > Issue Type: New Feature > Components: Sinks+Sources >Reporter: Mike Percy >Assignee: Mike Percy > Attachments: FLUME-2048.patch > > > It would be great to support an avro container format deserializer in the > spool directory source. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (FLUME-2048) Avro container file deserializer
Mike Percy created FLUME-2048: - Summary: Avro container file deserializer Key: FLUME-2048 URL: https://issues.apache.org/jira/browse/FLUME-2048 Project: Flume Issue Type: New Feature Components: Sinks+Sources Reporter: Mike Percy Attachments: FLUME-2048.patch It would be great to support an avro container format deserializer in the spool directory source. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
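For context, reading an Avro container file with the stock Avro APIs looks roughly like the sketch below; a spool-directory deserializer for this format would wrap something like it. This is illustrative only and is not the attached patch.

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    // Iterate over the records in an Avro container (data) file.
    public class AvroContainerReadSketch {
      public static void main(String[] args) throws IOException {
        File avroFile = new File(args[0]); // path to an Avro container file
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(avroFile, datumReader);
        try {
          // The container file embeds its writer schema, so none needs to be supplied.
          System.out.println("Schema: " + fileReader.getSchema());
          while (fileReader.hasNext()) {
            GenericRecord record = fileReader.next();
            // A Flume deserializer would turn each record into an Event here.
            System.out.println(record);
          }
        } finally {
          fileReader.close();
        }
      }
    }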
[jira] [Assigned] (FLUME-2048) Avro container file deserializer
[ https://issues.apache.org/jira/browse/FLUME-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy reassigned FLUME-2048: - Assignee: Mike Percy > Avro container file deserializer > > > Key: FLUME-2048 > URL: https://issues.apache.org/jira/browse/FLUME-2048 > Project: Flume > Issue Type: New Feature > Components: Sinks+Sources >Reporter: Mike Percy >Assignee: Mike Percy > Attachments: FLUME-2048.patch > > > It would be great to support an avro container format deserializer in the > spool directory source. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [DISCUSS] Flume 1.4 release plan
+1 for Flume 1.4 +1 for Mike being RM. Cheers, Hari On Wednesday, May 22, 2013 at 12:33 AM, Mike Percy wrote: > Hi folks, > We have had over 100 commits since 1.3.1, and a bunch of new features and > improvements including a Thrift source, much improved ElasticSearch sink, > support for a new plugins directory and layout, compression support in the > avro sink/source, improved checkpointing in the file channel and more, plus > a lot of bug fixes. > > It seems to me that it's time to start thinking about cutting a 1.4 > release. I would be happy to volunteer to RM the release. Worth noting that > I will be unavailable for the next two weeks... but after that I'd be happy > to pick this up and run with it. That's also a decent amount of time for > people to get moving on patches and reviews for their favorite features, > bug fixes, etc. > > If this all sounds OK, I'd like to suggest targeting the last week of June > as a release date. If we can release in time for Hadoop Summit then that > would be pretty nice. Otherwise, if something comes up and we can't get the > release out that week, let's shoot for the first week of July at the latest. > > Please let me know your thoughts. > > Regards, > Mike > >
[DISCUSS] Flume 1.4 release plan
Hi folks, We have had over 100 commits since 1.3.1, and a bunch of new features and improvements including a Thrift source, much improved ElasticSearch sink, support for a new plugins directory and layout, compression support in the avro sink/source, improved checkpointing in the file channel and more, plus a lot of bug fixes. It seems to me that it's time to start thinking about cutting a 1.4 release. I would be happy to volunteer to RM the release. Worth noting that I will be unavailable for the next two weeks... but after that I'd be happy to pick this up and run with it. That's also a decent amount of time for people to get moving on patches and reviews for their favorite features, bug fixes, etc. If this all sounds OK, I'd like to suggest targeting the last week of June as a release date. If we can release in time for Hadoop Summit then that would be pretty nice. Otherwise, if something comes up and we can't get the release out that week, let's shoot for the first week of July at the latest. Please let me know your thoughts. Regards, Mike
Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)
Hi Edward, Did the fixes in LOG4J2-254 fix your file rolling issue? What are your thoughts on how to improve spooling directory source's error handling when it detects a change in the file? Just bail and retry later? I suppose that's a pretty reasonable approach. Regards, Mike On Tue, May 14, 2013 at 4:50 PM, Edward Sargisson wrote: > Unless I'm mistaken (and concurrent code is easy to be mistaken about) this > is a race condition in apache-log4j-extras RollingFileAppender. I live in > hope that when log4j2 becomes GA we can move to it and then be able to use > it to log Flume itself. > > Evidence: > File: castellan-reader.20130514T2058.log.COMPLETED > 2013-05-14 20:57:05,330 INFO ... > > File: castellan-reader.20130514T2058.log > 2013-05-14 21:23:05,709 DEBUG ... > > Why would an event from 2123 be written into a file from 2058? > > My understanding of log4j shows that the RollingFileAppenders end up > calling this: > FileAppender: > public synchronized void setFile(String fileName, boolean append, boolean > bufferedIO, int bufferSize) > > Which shortly calls: > this.qw = new QuietWriter(writer, errorHandler); > > However, the code to actually write to the writer is this: > protected > void subAppend(LoggingEvent event) { > this.qw.write(this.layout.format(event)); > > Unless I'm mistaken there's no happens-before edge between setting the qw > and calling subappend. The code path to get to subAppend appears not to go > through any method synchronized on FileAppender's monitor. this.qw is not > volatile. > > Oh, and based on my cursory inspection of the log4j2 code this exists in > log4j2 as well. I've just raised log4j2-254 to cover it. We'll see if I'm > actually right... > > Cheers, > Edward > > > > > On Mon, May 13, 2013 at 8:45 AM, Edward Sargisson > wrote: > > > Hi Mike, > > Based on my reading of the various logging frameworks' source code and > the > > Java documentation I come to the conclusion that relying on an atomic > move > > is not wise. (Next time I see this I might try and prove that the spooled > > file is incomplete). > > > > So I suggest two things: > > 1) A breach of that check should not cause the entire Flume instance to > > stop passing traffic. > > 2) A configurable wait time might work. If you're using the spooling > > source then you've already decided to have some latency so a little more > is > > fine. However, there is still a risk of a race condition because there is > > no signal that the copy is finished. > > > > Cheers, > > Edward > > > > "Hi Edward, > > Thanks for investigating. I'm definitely open to suggestions for > > improvement with this. Maybe dying is a bit extreme… the goal was to > ensure > > that people could not possibly try to use it to tail a file, which will > > definitely not work correctly! :) > > > > Mike > > > > > > > > On Fri, May 10, 2013 at 5:02 PM, Edward Sargisson > > wrote: > > > > > Hi Mike, > > > I was curious so I went on a bit of a hunt through logger source code. > > > The result is that loggers can't be relied on to atomically roll the > > > file so a feature to allow a delay before checking the file would be > > > of great utility. > > > > > > For that matter, having Flume not die completely in this scenario > > > would also be good. 
> > > > > > apache-log4j-extras does this [1]: > > > return source.renameTo(destination); > > > > > > logback does this [2]: > > > boolean result = srcFile.renameTo(targetFile); > > > > > > log4j2 does this [3]: > > > srcStream = new FileInputStream(source); > > > destStream = new FileOutputStream(destination); > > > srcChannel = srcStream.getChannel(); > > > destChannel = destStream.getChannel(); > > > destChannel.transferFrom( > > srcChannel, 0, srcChannel.size()); > > > > > > The JavaDoc for File.renameTo says: > > > Many aspects of the behavior of this method are inherently > > > platform-dependent: The rename operation might not be able to move a > > > file from one filesystem to another, it might not be atomic, and it > > > might not succeed if a file with the destination abstract pathname > > > already exists. The return value should always be checked to make > sure > > > that the rename operation was successful. > > > > > > My conclusion is that the loggers (except possibly log4j2) can't be > > > relied on to atomically roll the file. > > > > > > Cheers, > > > Edward > > > > > > > > > Links: > > > [1] > > > > > > http://svn.apache.org/viewvc/logging/log4j/companions/extras/trunk/src/main/java/org/apache/log4j/rolling/helper/FileRenameAction.java?view=markup > > > l77 > > > > > > [2] > > > > > > https://github.com/qos-ch/logback/blob/master/logback-core/src/main/java/ch/qos/logback/core/rolling/helper/RenameUtil.java > > > , > > > l63 > > > [3] > > > > > > https://svn.apache.org/repos/asf/logging/log4j/log4j2/trunk/core/src/main/java/org/apache/logging/log4j/core/appender/rolling/helper/FileRenameAction.java > > > > > > > > > >Hi Edw
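As a point of comparison with the renameTo() calls quoted above, the NIO.2 file API (Java 7+) lets the caller request an atomic move explicitly and fails loudly if the filesystem cannot provide one. The sketch below is illustrative only; it is not code from any of the logging frameworks discussed, and the fallback branch shows exactly the non-atomic case this thread worries about.

    import java.io.IOException;
    import java.nio.file.AtomicMoveNotSupportedException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    // Roll a finished log file into the spool directory, atomically if possible.
    public class AtomicRollSketch {
      public static void rollIntoSpoolDir(String source, String target) throws IOException {
        Path src = Paths.get(source);
        Path dst = Paths.get(target);
        try {
          Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
          // e.g. crossing filesystems: fall back to copy + delete, but then the
          // spooldir consumer may observe a partially written file, which is the
          // race condition discussed in this thread.
          Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
          Files.delete(src);
        }
      }
    }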