[jira] [Commented] (FLUME-1485) File Channel should support checksum
[ https://issues.apache.org/jira/browse/FLUME-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434811#comment-13434811 ]

Hari Shreedharan commented on FLUME-1485:
-----------------------------------------

I'm OK with a file-level checksum. A transaction-level checksum might need considerable changes to the file format, etc.

> File Channel should support checksum
> -------------------------------------
>
>          Key: FLUME-1485
>          URL: https://issues.apache.org/jira/browse/FLUME-1485
>      Project: Flume
>   Issue Type: New Feature
>   Components: Channel
>     Reporter: Hari Shreedharan
>      Fix For: v1.3.0
[jira] [Updated] (FLUME-1485) File Channel should support checksum
[ https://issues.apache.org/jira/browse/FLUME-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan updated FLUME-1485:
------------------------------------
    Issue Type: New Feature  (was: Bug)

Ideally, this will be implemented in such a way that the channel file format does not change. The checksum will therefore need to go either at the end of the file or in a separate file. The new code can then check for the (optional) checksum and use it; if the checksum does not match, alert the user to possible channel corruption.

> File Channel should support checksum
> -------------------------------------
>
>          Key: FLUME-1485
>          URL: https://issues.apache.org/jira/browse/FLUME-1485
>      Project: Flume
>   Issue Type: New Feature
>   Components: Channel
>     Reporter: Hari Shreedharan
>      Fix For: v1.3.0
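[Editor's note] The ticket describes the approach without code. A minimal sketch of what "optional checksum kept outside the log format" could look like follows; the class name, the ".crc" side-file convention, and the whole-file CRC32 are assumptions for illustration, not Flume's implementation.

{code:title=ChecksumSidecar.java|borderStyle=solid}
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

/** Hypothetical sketch: a file-level checksum kept beside a channel log file. */
public class ChecksumSidecar {

  /** Computes a CRC32 over the whole log file. */
  static long checksumOf(File logFile) throws IOException {
    CRC32 crc = new CRC32();
    InputStream in = new BufferedInputStream(new FileInputStream(logFile));
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        crc.update(buf, 0, n);
      }
    } finally {
      in.close();
    }
    return crc.getValue();
  }

  /** Writes the checksum to a side file ("<log>.crc") so the log format is untouched. */
  static void writeChecksum(File logFile) throws IOException {
    long value = checksumOf(logFile);
    Files.write(Paths.get(logFile.getPath() + ".crc"),
        Long.toString(value).getBytes("UTF-8"));
  }

  /**
   * Verifies the log against its side file. A missing side file is treated as
   * "no checksum recorded" (old-format file) rather than corruption, matching
   * the optional semantics described above.
   */
  static boolean verify(File logFile) throws IOException {
    Path crcPath = Paths.get(logFile.getPath() + ".crc");
    if (!Files.exists(crcPath)) {
      return true; // no checksum recorded; nothing to verify
    }
    long expected =
        Long.parseLong(new String(Files.readAllBytes(crcPath), "UTF-8").trim());
    return expected == checksumOf(logFile);
  }
}
{code}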
[jira] [Commented] (FLUME-1485) File Channel should support checksum
[ https://issues.apache.org/jira/browse/FLUME-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434809#comment-13434809 ]

Denny Ye commented on FLUME-1485:
---------------------------------

Which level - each transaction, or each identified file?

> File Channel should support checksum
> -------------------------------------
>
>          Key: FLUME-1485
>          URL: https://issues.apache.org/jira/browse/FLUME-1485
>      Project: Flume
>   Issue Type: Bug
>   Components: Channel
>     Reporter: Hari Shreedharan
>      Fix For: v1.3.0
[jira] [Updated] (FLUME-1485) File Channel should support checksum
[ https://issues.apache.org/jira/browse/FLUME-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan updated FLUME-1485:
------------------------------------
    Component/s: Channel
  Fix Version/s: v1.3.0

> File Channel should support checksum
> -------------------------------------
>
>          Key: FLUME-1485
>          URL: https://issues.apache.org/jira/browse/FLUME-1485
>      Project: Flume
>   Issue Type: Bug
>   Components: Channel
>     Reporter: Hari Shreedharan
>      Fix For: v1.3.0
[jira] [Created] (FLUME-1485) File Channel should support checksum
Hari Shreedharan created FLUME-1485:
------------------------------------

     Summary: File Channel should support checksum
         Key: FLUME-1485
         URL: https://issues.apache.org/jira/browse/FLUME-1485
     Project: Flume
  Issue Type: Bug
    Reporter: Hari Shreedharan
Re: Review Request: Flume adopt message from existing local Scribe
> On Aug. 15, 2012, 4:01 a.m., Juhani Connolly wrote:
> > Ok, I had a look over and tested the new code, and it seems fine. The other additions look good too.
> >
> > It would have been nice to have the documentation as part of the review too, but this issue has been in review long enough, so I might open another ticket later to review the documentation and add more information on usage.

Thanks Juhani. I agree that the documentation needs a separate ticket for review. I'd appreciate your advice on ScribeSource usage.

- Denny

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6089/#review10316
-----------------------------------------------------------

On Aug. 14, 2012, 6:53 a.m., Denny Ye wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6089/
> -----------------------------------------------------------
>
> (Updated Aug. 14, 2012, 6:53 a.m.)
>
> Review request for Flume and Hari Shreedharan.
>
> Description
> -----------
>
> There may be others like me who want to replace a central Scribe deployment with Flume while keeping the existing ingest system, so the change is smooth for application users. Here is the ScribeSource, placed in the legacy folder, which passes messages through without deserializing them.
>
> This addresses bug FLUME-1382.
>     https://issues.apache.org/jira/browse/FLUME-1382
>
> Diffs
> -----
>
>   trunk/flume-ng-dist/pom.xml 1370121
>   trunk/flume-ng-sources/flume-scribe-source/pom.xml PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/LogEntry.java PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/ResultCode.java PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/Scribe.java PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/ScribeSource.java PRE-CREATION
>   trunk/flume-ng-sources/pom.xml PRE-CREATION
>   trunk/pom.xml 1370121
>
> Diff: https://reviews.apache.org/r/6089/diff/
>
> Testing
> -------
>
> I have been running ScribeSource in my local environment and testing it over the past week. It works with the existing local Scribe interface.
>
> Thanks,
>
> Denny Ye
Re: Review Request: Flume adopt message from existing local Scribe
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6089/#review10316
-----------------------------------------------------------

Ship it!

Ok, I had a look over and tested the new code, and it seems fine. The other additions look good too.

It would have been nice to have the documentation as part of the review too, but this issue has been in review long enough, so I might open another ticket later to review the documentation and add more information on usage.

- Juhani Connolly

On Aug. 14, 2012, 6:53 a.m., Denny Ye wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6089/
> -----------------------------------------------------------
>
> (Updated Aug. 14, 2012, 6:53 a.m.)
>
> Review request for Flume and Hari Shreedharan.
>
> Description
> -----------
>
> There may be others like me who want to replace a central Scribe deployment with Flume while keeping the existing ingest system, so the change is smooth for application users. Here is the ScribeSource, placed in the legacy folder, which passes messages through without deserializing them.
>
> This addresses bug FLUME-1382.
>     https://issues.apache.org/jira/browse/FLUME-1382
>
> Diffs
> -----
>
>   trunk/flume-ng-dist/pom.xml 1370121
>   trunk/flume-ng-sources/flume-scribe-source/pom.xml PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/LogEntry.java PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/ResultCode.java PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/Scribe.java PRE-CREATION
>   trunk/flume-ng-sources/flume-scribe-source/src/main/java/org/apache/flume/source/scribe/ScribeSource.java PRE-CREATION
>   trunk/flume-ng-sources/pom.xml PRE-CREATION
>   trunk/pom.xml 1370121
>
> Diff: https://reviews.apache.org/r/6089/diff/
>
> Testing
> -------
>
> I have been running ScribeSource in my local environment and testing it over the past week. It works with the existing local Scribe interface.
>
> Thanks,
>
> Denny Ye
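[Editor's note] The patch itself is not inlined in this thread, but the shape of a Scribe-compatible source follows from the public Scribe Thrift IDL (a service with ResultCode Log(list<LogEntry> messages), where a LogEntry carries a category and a message). The sketch below shows how such a handler might hand messages to Flume. It is an illustration, not the reviewed patch; the Thrift-generated types are assumed to match the files listed in the diff, including the getter names.

{code:title=ScribeHandler.java|borderStyle=solid}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

// Thrift-generated types from the diff above.
import org.apache.flume.source.scribe.LogEntry;
import org.apache.flume.source.scribe.ResultCode;
import org.apache.flume.source.scribe.Scribe;

// Hypothetical handler sketch: turns incoming Scribe log entries into
// Flume events without touching the message bodies.
public class ScribeHandler implements Scribe.Iface {

  private final AbstractSource source;

  public ScribeHandler(AbstractSource source) {
    this.source = source;
  }

  @Override
  public ResultCode Log(List<LogEntry> messages) {
    try {
      for (LogEntry entry : messages) {
        // Keep the raw message body; carry the Scribe category as a header.
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("category", entry.getCategory());
        Event event =
            EventBuilder.withBody(entry.getMessage().getBytes("UTF-8"), headers);
        source.getChannelProcessor().processEvent(event);
      }
      return ResultCode.OK;
    } catch (Exception e) {
      // Tell the Scribe client to buffer and retry later.
      return ResultCode.TRY_LATER;
    }
  }
}
{code}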
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Yongkun Wang,

You're welcome! Very happy to hear your thoughts.

Regards,
Mike

On Tue, Aug 14, 2012 at 8:03 PM, Wang, Yongkun | Yongkun | BDD <yongkun.w...@mail.rakuten.com> wrote:

> Thanks Mike.
>
> This is really a nice reply based on a thorough understanding of my proposal.
>
> I agree that it might be a potential design change. So I will carefully evaluate it before submitting it to you guys to make the decision.
>
> Cheers,
> Yongkun Wang
>
> On 12/08/13 9:17, "Mike Percy" wrote:
>
> >Hi,
> >Due to design decisions made very early on in Flume NG - specifically the fact that Sink only has a simple process() method - I don't see a good way to get multiple sinks pulling from the same channel in a way that is backwards-compatible with the current implementation.
> >
> >Probably the "right" way to support this would be to have an interface where the SinkRunner (or something outside of each Sink) is in control of the transaction, and then it can easily send events to each sink serially or in parallel within a single transaction. I think that is basically what you are describing. If you look at SourceRunner and SourceProcessor you will see similar ideas to what you are describing, but they are only implemented at the Source->Channel level. The current SinkProcessor is not an analog of SourceProcessor, but if it were then I think that's where this functionality might fit. However, what happens when you do that is you have to handle a ton of failure cases and threading models in a very general way, which might be tough to get right for all use cases. I'm not 100% sure, but I think that's why this was not pursued at the time.
> >
> >To me, this seems like a potential design change (it would have to be very carefully thought out) to consider for a future major Flume code line (maybe a Flume 2.x).
> >
> >By the way, if one is trying to get maximum throughput, then duplicating events onto multiple channels and having different threads running the sinks (the current design) will be faster and more resilient in general than a single thread and a single channel writing to multiple sinks/destinations. The multiple-channel design pattern allows periodic downtimes or delays on a single sink to not affect the others, assuming the channel sizes are large enough for buffering during downtime and assuming that each sink is fast enough to recover from temporary delays. Without a dedicated buffer per destination, one is at the mercy of the slowest sink at every stage in the transaction.
> >
> >One last thing worth noting is that the current channels are all well ordered. This means that Flume currently provides a weak ordering guarantee (across a single hop). That is a helpful property in the context of testing and validation, as well as being what many people expect if they are storing logs on a single hop. I hope we don't backpedal on that weak ordering guarantee without a really good reason.
> >
> >Regards,
> >Mike
> >
> >On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <yongkun.w...@mail.rakuten.com> wrote:
> >
> >> Hi Juhani,
> >>
> >> Yes, we can use two (or several) channels to fan out data to different sinks. Then we will have two channels with the same data, which may not be an optimized solution. So I want to use just ONE channel, creating a processor to pull the data once from the channel and then distribute it to different sinks.
> >>
> >> Regards,
> >> Yongkun Wang
> >>
> >> On 12/08/10 18:07, "Juhani Connolly" wrote:
> >>
> >> >Hi Yongkun,
> >> >
> >> >I'm curious why you need to pull the data twice from the sink? Do you need all sinks to have read the same amount of data? Normally, for the case of splitting data into batch and analytics, we will send data from the source to two separate channels and have the sinks read from separate channels.
> >> >
> >> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
> >> >> Hi Denny,
> >> >>
> >> >> I am working on the patch now; it's not difficult. I have listed the changes in that JIRA.
> >> >> I think you misunderstand my design; I didn't maintain the order of the events. Instead, I make sure that each sink will get the same events (or different events specified by a selector).
> >> >>
> >> >> Suppose Channel (mc) contains the following events: 4,3,2,1
> >> >>
> >> >> If this is simply enabled by configuration, it may work like this:
> >> >> Sink "hsa" may get 1,3;
> >> >> Sink "hsb" may get 2,4.
> >> >> So different sinks will get different data. Is this what the user wants?
> >> >>
> >> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical case where the user wants to fan out the data into two places (e.g. one for batch and another for real-time analysis).
> >> >>
> >> >> Regards,
> >> >> Yongkun Wang
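[Editor's note] Mike's idea of "something outside each Sink" owning the transaction can be made concrete. The following is a rough illustration only, not a Flume API: DeliverySink is an invented interface, and a real implementation would need the failure handling and threading work Mike warns about. Because all delivery happens inside one channel transaction, a failure at any sink rolls the whole batch back, so every sink sees the same events on retry.

{code:title=ReplicatingTransactionRunner.java|borderStyle=solid}
import java.util.List;

import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.Transaction;

// Hypothetical fan-out runner: one channel transaction, every sink sees
// the same events.
public class ReplicatingTransactionRunner {

  /** Invented interface; not part of Flume. */
  public interface DeliverySink {
    void deliver(Event event) throws Exception;
  }

  public void runOnce(Channel channel, List<DeliverySink> sinks, int batchSize) {
    Transaction tx = channel.getTransaction();
    tx.begin();
    try {
      for (int i = 0; i < batchSize; i++) {
        Event event = channel.take();
        if (event == null) {
          break; // channel drained
        }
        // Serial delivery: if any sink fails, the whole batch rolls back
        // and every sink will see these events again on retry.
        for (DeliverySink sink : sinks) {
          sink.deliver(event);
        }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw new RuntimeException("Fan-out delivery failed; batch rolled back", e);
    } finally {
      tx.close();
    }
  }
}
{code}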
[jira] [Created] (FLUME-1484) Flume support throughput in Agent, Source, Sink level at JMX
Denny Ye created FLUME-1484:
-----------------------------

         Summary: Flume support throughput in Agent, Source, Sink level at JMX
             Key: FLUME-1484
             URL: https://issues.apache.org/jira/browse/FLUME-1484
         Project: Flume
      Issue Type: Improvement
      Components: Node, Sinks+Sources
Affects Versions: v1.2.0
        Reporter: Denny Ye

From a user's point of view, we would like to see the current throughput in a monitoring tool. A web UI would be best, of course; JMX is a simple way to implement throughput monitoring. The Agent should report input and output throughput aggregated over its Sources and Sinks. Here is some simple code from my environment that monitors the throughput of a Source.

{code:title=ThroughputCounter.java|borderStyle=solid}
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.flume.instrumentation.SourceCounter;

public class ThroughputCounter {

  private volatile boolean isRunning;

  // Bytes accumulated since the last one-second tick.
  private AtomicInteger cache = new AtomicInteger();

  SourceCounter sourceCounter;

  public ThroughputCounter(SourceCounter sourceCounter) {
    this.sourceCounter = sourceCounter;
  }

  public void start() {
    isRunning = true;
    Counter counter = new Counter();
    counter.start();
  }

  public void stop() {
    isRunning = false;
  }

  public void addWriteBytes(int bytes) {
    cache.getAndAdd(bytes);
  }

  // Wakes up once a second and publishes the accumulated byte count.
  private class Counter extends Thread {
    Counter() {
      super("ThroughputCounterThread");
    }

    public void run() {
      while (isRunning) {
        try {
          Thread.sleep(1000);
          sourceCounter.incrementSourceThroughput(cache.getAndSet(0));
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }
  }
}
{code}
[jira] [Commented] (FLUME-1435) Proposal of Transactional Multiplex (fan out) Sink
[ https://issues.apache.org/jira/browse/FLUME-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434783#comment-13434783 ]

Yongkun Wang commented on FLUME-1435:
--------------------------------------

Nice comment from Mike.

On 12/08/13 9:17, "Mike Percy" wrote:

Hi,
Due to design decisions made very early on in Flume NG - specifically the fact that Sink only has a simple process() method - I don't see a good way to get multiple sinks pulling from the same channel in a way that is backwards-compatible with the current implementation.

Probably the "right" way to support this would be to have an interface where the SinkRunner (or something outside of each Sink) is in control of the transaction, and then it can easily send events to each sink serially or in parallel within a single transaction. I think that is basically what you are describing. If you look at SourceRunner and SourceProcessor you will see similar ideas to what you are describing, but they are only implemented at the Source->Channel level. The current SinkProcessor is not an analog of SourceProcessor, but if it were then I think that's where this functionality might fit. However, what happens when you do that is you have to handle a ton of failure cases and threading models in a very general way, which might be tough to get right for all use cases. I'm not 100% sure, but I think that's why this was not pursued at the time.

To me, this seems like a potential design change (it would have to be very carefully thought out) to consider for a future major Flume code line (maybe a Flume 2.x).

By the way, if one is trying to get maximum throughput, then duplicating events onto multiple channels and having different threads running the sinks (the current design) will be faster and more resilient in general than a single thread and a single channel writing to multiple sinks/destinations. The multiple-channel design pattern allows periodic downtimes or delays on a single sink to not affect the others, assuming the channel sizes are large enough for buffering during downtime and assuming that each sink is fast enough to recover from temporary delays. Without a dedicated buffer per destination, one is at the mercy of the slowest sink at every stage in the transaction.

One last thing worth noting is that the current channels are all well ordered. This means that Flume currently provides a weak ordering guarantee (across a single hop). That is a helpful property in the context of testing and validation, as well as being what many people expect if they are storing logs on a single hop. I hope we don't backpedal on that weak ordering guarantee without a really good reason.

Regards,
Mike

> Proposal of Transactional Multiplex (fan out) Sink
> ---------------------------------------------------
>
>          Key: FLUME-1435
>          URL: https://issues.apache.org/jira/browse/FLUME-1435
>      Project: Flume
>   Issue Type: New Feature
>   Components: Channel, Sinks+Sources
> Affects Versions: v1.2.0
>     Reporter: Yongkun Wang
>     Assignee: Yongkun Wang
>       Labels: features
>
> Hi,
> I proposed this design by email several weeks ago and received a comment from Brock. I guess you guys are very busy, so I think I'd better create this JIRA and put the slides and patch here to explain it more clearly.
> Regards,
> Yongkun
> The following is the design from the previous email; I will attach slides later.
> From: "Wang, Yongkun"
> Date: Wed, 25 Jul 2012 10:32:31 GMT
> To: "dev@flume.apache.org"
> Cc: "u...@flume.apache.org"
> Subject: Transactional Multiplex (fan out) Sink
>
> Hi,
>
> In our system, we need to fan out the aggregated flow to several destinations. Usually the flow to each destination is identical.
>
> There is a nice feature of NG, the "multiplexing flow", which can satisfy our requirements. It is implemented using separate channels, which makes transaction control easy.
>
> But in our case, the fan-out is replicating in most cases. If we use the current "Replicating to Channels" configuration, we will get several identical channels on the same host, which may consume a large amount of resources (memory, disk, etc.). Performance may drop, and the events to each destination may not be synchronized.
>
> Reading the NG source, I think I could move the multiplexing from Channel to Sink - that is, use a single Channel and fan out to different Sinks, which may solve the problems of multiple Channels (resource usage, performance, event synchronization).
>
> I see that there are "LoadBalancingSinkProcessor" and "SinkSelector" classes, but they cannot be used to replicate events from one Channel to different Sinks.
>
> The following is an optional implementation of the Transactional Multiplex (or fan out) Sink:
> 1. Add a Transactional Multiplex Sink Processor, which will
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Thanks Mike.

This is really a nice reply based on a thorough understanding of my proposal.

I agree that it might be a potential design change. So I will carefully evaluate it before submitting it to you guys to make the decision.

Cheers,
Yongkun Wang

On 12/08/13 9:17, "Mike Percy" wrote:

>Hi,
>Due to design decisions made very early on in Flume NG - specifically the fact that Sink only has a simple process() method - I don't see a good way to get multiple sinks pulling from the same channel in a way that is backwards-compatible with the current implementation.
>
>Probably the "right" way to support this would be to have an interface where the SinkRunner (or something outside of each Sink) is in control of the transaction, and then it can easily send events to each sink serially or in parallel within a single transaction. I think that is basically what you are describing. If you look at SourceRunner and SourceProcessor you will see similar ideas to what you are describing, but they are only implemented at the Source->Channel level. The current SinkProcessor is not an analog of SourceProcessor, but if it were then I think that's where this functionality might fit. However, what happens when you do that is you have to handle a ton of failure cases and threading models in a very general way, which might be tough to get right for all use cases. I'm not 100% sure, but I think that's why this was not pursued at the time.
>
>To me, this seems like a potential design change (it would have to be very carefully thought out) to consider for a future major Flume code line (maybe a Flume 2.x).
>
>By the way, if one is trying to get maximum throughput, then duplicating events onto multiple channels and having different threads running the sinks (the current design) will be faster and more resilient in general than a single thread and a single channel writing to multiple sinks/destinations. The multiple-channel design pattern allows periodic downtimes or delays on a single sink to not affect the others, assuming the channel sizes are large enough for buffering during downtime and assuming that each sink is fast enough to recover from temporary delays. Without a dedicated buffer per destination, one is at the mercy of the slowest sink at every stage in the transaction.
>
>One last thing worth noting is that the current channels are all well ordered. This means that Flume currently provides a weak ordering guarantee (across a single hop). That is a helpful property in the context of testing and validation, as well as being what many people expect if they are storing logs on a single hop. I hope we don't backpedal on that weak ordering guarantee without a really good reason.
>
>Regards,
>Mike
>
>On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <yongkun.w...@mail.rakuten.com> wrote:
>
>> Hi Juhani,
>>
>> Yes, we can use two (or several) channels to fan out data to different sinks. Then we will have two channels with the same data, which may not be an optimized solution. So I want to use just ONE channel, creating a processor to pull the data once from the channel and then distribute it to different sinks.
>>
>> Regards,
>> Yongkun Wang
>>
>> On 12/08/10 18:07, "Juhani Connolly" wrote:
>>
>> >Hi Yongkun,
>> >
>> >I'm curious why you need to pull the data twice from the sink? Do you need all sinks to have read the same amount of data? Normally, for the case of splitting data into batch and analytics, we will send data from the source to two separate channels and have the sinks read from separate channels.
>> >
>> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
>> >> Hi Denny,
>> >>
>> >> I am working on the patch now; it's not difficult. I have listed the changes in that JIRA.
>> >> I think you misunderstand my design; I didn't maintain the order of the events. Instead, I make sure that each sink will get the same events (or different events specified by a selector).
>> >>
>> >> Suppose Channel (mc) contains the following events: 4,3,2,1
>> >>
>> >> If this is simply enabled by configuration, it may work like this:
>> >> Sink "hsa" may get 1,3;
>> >> Sink "hsb" may get 2,4.
>> >> So different sinks will get different data. Is this what the user wants?
>> >>
>> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical case where the user wants to fan out the data into two places (e.g. one for batch and another for real-time analysis).
>> >>
>> >> Regards,
>> >> Yongkun Wang
>> >>
>> >> On 12/08/10 14:29, "Denny Ye" wrote:
>> >>
>> >>> hi Yongkun,
>> >>>
>> >>> JIRA can be accessed now.
>> >>>
>> >>> I think it might be difficult to understand the order of events from your thought. If we don't care about the order, we can discuss the value and feasibility. In my opinion, the data ingest flow is order-unaware; at least, ordering is not that important for us.
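[Editor's note] For comparison, the multiple-channel pattern Mike recommends is plain configuration in today's Flume; the replicating channel selector duplicates each event onto every channel the source feeds. A minimal sketch follows; the agent and component names are made up for illustration.

{code:title=flume.properties|borderStyle=solid}
# One source replicates every event onto two channels;
# each sink drains its own channel on its own thread.
agent.sources = src
agent.channels = ch-batch ch-realtime
agent.sinks = hsa hsb

agent.sources.src.channels = ch-batch ch-realtime
agent.sources.src.selector.type = replicating

agent.sinks.hsa.channel = ch-batch
agent.sinks.hsb.channel = ch-realtime
{code}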
Flume builds back online
The Flume builds were previously disabled due to a repository change. Updating the configuration and restricting it to the nodes that have Git support seems to have worked:

https://builds.apache.org/job/flume-trunk/281/

I also took the liberty of enabling email notifications, but to minimize the overall volume of generated mail, I reduced the frequency from hourly to daily.

Regards,
Arvind Prabhakar
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/
-----------------------------------------------------------

(Updated Aug. 14, 2012, 10:02 p.m.)

Review request for Flume.

Changes
-------

Updating description.

Description (updated)
---------------------

This patch adds a spooling-directory-based source. The idea is that a user can have a spool directory where files are deposited for ingestion into Flume. Once ingested, the files are renamed to clearly mark them as done, and the implementation guarantees at-least-once delivery semantics similar to those achieved within Flume itself, even across failures and restarts of the JVM running the code.

This helps fill the gap for people who want reliable delivery of events into Flume but don't want to write their application directly against the Flume API. They can simply drop log files off in a spool directory and let Flume ingest them asynchronously (using shell scripts or some other automated process).

Unlike the prior iteration, this patch implements a first-class source. It also extends the avro client to support spooling in a similar manner.

This addresses bug FLUME-1425.
    https://issues.apache.org/jira/browse/FLUME-1425

Diffs
-----

  flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7
  flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c
  flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae
  flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION
  flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc

Diff: https://reviews.apache.org/r/6377/diff/

Testing
-------

Extensive unit tests. I also built and played with this using a stub flume agent. If you look at the JIRA, I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this.

Thanks,

Patrick Wendell
[jira] [Updated] (FLUME-1425) Create a SpoolDirectory Source and Client
[ https://issues.apache.org/jira/browse/FLUME-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated FLUME-1425:
------------------------------------
    Attachment: FLUME-1425.v5.patch.txt

This patch is ready for review. It creates a new source (SpoolDirectorySource) and also adds a spooling directory capability to the existing avro client. It includes extensive unit tests - probably best to start with those.

> Create a SpoolDirectory Source and Client
> ------------------------------------------
>
>          Key: FLUME-1425
>          URL: https://issues.apache.org/jira/browse/FLUME-1425
>      Project: Flume
>   Issue Type: Improvement
>     Reporter: Patrick Wendell
>     Assignee: Patrick Wendell
>  Attachments: FLUME-1425.avro-conf-file.txt, FLUME-1425.patch.v1.txt, FLUME-1425.v5.patch.txt
>
> The proposal is to create a small executable client which reads logs from a spooling directory and sends them to a flume sink, then performs cleanup on the directory (either by deleting or moving the logs). It would make the following assumptions:
> - Files placed in the directory are uniquely named
> - Files placed in the directory are immutable
> The problem this is trying to solve is that there is currently no way to do guaranteed event delivery across flume agent restarts when the data is being collected through an asynchronous source (and not directly from the client API). Say, for instance, you are using an exec("tail -F") source. If the agent restarts due to error or intentionally, tail may pick up at a new location and you lose the intermediate data.
> At the same time, there are users who want at-least-once semantics, and expect those to apply as soon as the data is written to disk by the initial logger process (e.g. apache logs), not just once it has reached a flume agent. This idea would bridge that gap, assuming the user is able to copy immutable logs to a spooling directory through a cron script or something.
> The basic internal logic of such a client would be as follows:
> - Scan the directory for files
> - Choose a file and read through, while sending events to an agent
> - Close the file and delete it (or rename, or otherwise mark completed)
> That's about it. We could add sync-points to make recovery more efficient in the case of failure.
> A key question is whether this should be implemented as a standalone client or as a source. My instinct is actually to do this as a source, but there could be some benefit to not requiring an entire agent in order to run this, specifically that it would become platform independent and you could stick it on Windows machines. Others I have talked to have also sided on a standalone executable.
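[Editor's note] The basic logic listed in the issue (scan, read and send, then mark completed) is small enough to sketch. The following illustrates that loop only; the ".COMPLETED" suffix, the LineSender interface, and all other names are invented, not the API of the attached patch.

{code:title=SpoolLoop.java|borderStyle=solid}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical sketch of the spooling loop described in the JIRA.
public class SpoolLoop {

  private static final String COMPLETED_SUFFIX = ".COMPLETED"; // assumed marker

  /** Invented stand-in for "send events to an agent". */
  public interface LineSender {
    void send(String line) throws IOException;
  }

  public static void drainOnce(File spoolDir, LineSender sender) throws IOException {
    File[] candidates = spoolDir.listFiles();
    if (candidates == null) {
      return; // directory missing or unreadable
    }
    for (File file : candidates) {
      // Skip files we have already processed and marked.
      if (!file.isFile() || file.getName().endsWith(COMPLETED_SUFFIX)) {
        continue;
      }

      // Read through the file, sending each line as an event.
      BufferedReader reader = new BufferedReader(new FileReader(file));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          sender.send(line);
        }
      } finally {
        reader.close();
      }

      // Mark completed by renaming rather than deleting, so ingestion
      // is observable and safe to re-run across restarts.
      File done = new File(file.getPath() + COMPLETED_SUFFIX);
      if (!file.renameTo(done)) {
        throw new IOException("Unable to mark " + file + " as completed");
      }
    }
  }
}
{code}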
Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/
-----------------------------------------------------------

(Updated Aug. 14, 2012, 9:54 p.m.)

Review request for Flume.

Changes
-------

Updating title to change scope and adding whitespace fixes.

Summary (updated)
-----------------

FLUME-1425: Create a SpoolDirectory Source and Client

Description
-----------

This patch extends the existing avro client to support a file-based spooling mechanism. See the in-line documentation for precise details, but the basic idea is that a user can have a spool directory where files are deposited for ingestion into Flume. Once ingested, the files are renamed to clearly mark them as done, and the implementation guarantees at-least-once delivery semantics similar to those achieved within Flume itself, even across failures and restarts of the JVM running the code.

I feel vaguely uneasy about building this as part of the standalone avro client rather than as its own source. An alternative would be to build this as a proper source (in fact, there are some ad-hoc transaction semantics used here which would really be a better fit for a source). I'm interested in hearing feedback on that as well. The benefit of having this in the avro client is that you don't need the flume runner scripts, which are not Windows compatible.

This addresses bug FLUME-1425.
    https://issues.apache.org/jira/browse/FLUME-1425

Diffs (updated)
---------------

  flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java da804d7
  flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java abbbf1c
  flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 4a5ecae
  flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java PRE-CREATION
  flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java PRE-CREATION
  flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java PRE-CREATION
  flume-ng-doc/sphinx/FlumeUserGuide.rst 45dd7cc

Diff: https://reviews.apache.org/r/6377/diff/

Testing
-------

Extensive unit tests. I also built and played with this using a stub flume agent. If you look at the JIRA, I have a configuration file for an agent that will print out Avro events to the command line - that's helpful when testing this.

Thanks,

Patrick Wendell
Re: Transforming 1 event to n events
Hi Mike,

I think I'm still blocked on this, or I'll have to move the splitting of the data up to the source, which I know will work for sure. I've just been trying to avoid that because I didn't want to deploy this to all of the web servers.

I'm looking into the EventSerializer and I don't think it's going to work for me either. All of the examples I've seen so far write data to an output stream that seems to be the raw data file. It looks like append is only called once per event. This prevents me from writing multiple events as separate records in the SequenceFile on HDFS.

https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java#L72

Am I off base here?

J

On Mon, Aug 13, 2012 at 8:59 PM, Mike Percy wrote:
> On Mon, Aug 13, 2012 at 3:34 PM, Jeremy Custenborder <jcustenbor...@gmail.com> wrote:
>
>> I need to have the multiple objects available to Hive. The upstream object is actually a protobuf with hierarchy. I was planning on flattening the object for Hive. Here is an example of what I'm collecting. The actual protobuf has many more fields, but this gives you an idea.
>>
>> requestid
>> page
>> timestamp
>> useragent
>> impressions = [12345, 43212, 12344, 12345, 43122, etc]
>>
>> transforming for each impression:
>>
>> requestid
>> page
>> timestamp
>> useragent
>> index
>> objectid
>>
>> This gives me one row in Hive per impression. This might be a little more contextual; I picked the earlier example because I didn't want to get caught up in my use case. I could move this code to serializers, but I would need to do similar logic twice, since I'm incrementing a counter in HBase per impression and adding a row per impression in HDFS (Hive). If I transformed the event into multiple events earlier in the pipe, I would only have to write code to generate keys per event. At this point I'm going to implement two serializers: one to handle HDFS and one for HBase.
>
> Hi Jeremy,
>
> Thanks for the extra color. It's an interesting flow. As more people continue to adopt Flume, I think we'll start to see patterns where the design or implementation of Flume is lacking, and we can work towards bridging those gaps; your use case provides valuable data on that. As for where we are now, I'm happy to hear that you have found a way forward.
>
> If you can keep us apprised as things progress with your Flume deployment, I would love to hear about it!
>
> Regards,
> Mike
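[Editor's note] The one-to-n expansion Jeremy describes (one request event becoming one record per impression) is independent of which serializer hook it ends up in. A self-contained sketch of just that expansion step follows; the Record shape, the header names ("requestid", "impressions"), and the comma-separated encoding are all invented for illustration and are not a Flume API.

{code:title=ImpressionExpander.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Event;

// Hypothetical one-to-n expansion: a single "request" event becomes one
// flat record per impression, usable by both the HDFS and HBase paths.
public class ImpressionExpander {

  /** Invented record type; a real serializer would emit its own key/value type. */
  public static class Record {
    public final String key;
    public final String value;

    Record(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  public List<Record> expand(Event event) {
    String requestId = event.getHeaders().get("requestid");
    // Assume the impression ids arrive as a comma-separated header value.
    String[] impressions = event.getHeaders().get("impressions").split(",");

    List<Record> records = new ArrayList<Record>(impressions.length);
    for (int i = 0; i < impressions.length; i++) {
      // One row per impression, keyed by request id plus index so the
      // same expansion yields stable keys in both sinks.
      records.add(new Record(requestId + "/" + i, impressions[i].trim()));
    }
    return records;
  }
}
{code}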
WriteableEvent and WriteableEventKey
In flume-og there was a sink called seqfile which wrote to HDFS using a key of WriteableEvent and WriteableEventKey. Is there an equivalent in flume-ng? I currently have a few terabytes of data stored in this format. I'd prefer to not have to move all of it to a different format. Thanks! j
[jira] [Updated] (FLUME-1425) Create a SpoolDirectory Source and Client
[ https://issues.apache.org/jira/browse/FLUME-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated FLUME-1425:
------------------------------------
    Summary: Create a SpoolDirectory Source and Client  (was: Create Standalone Spooling Client)

> Create a SpoolDirectory Source and Client
> ------------------------------------------
>
>          Key: FLUME-1425
>          URL: https://issues.apache.org/jira/browse/FLUME-1425
>      Project: Flume
>   Issue Type: Improvement
>     Reporter: Patrick Wendell
>     Assignee: Patrick Wendell
>  Attachments: FLUME-1425.avro-conf-file.txt, FLUME-1425.patch.v1.txt
>
> The proposal is to create a small executable client which reads logs from a spooling directory and sends them to a flume sink, then performs cleanup on the directory (either by deleting or moving the logs). It would make the following assumptions:
> - Files placed in the directory are uniquely named
> - Files placed in the directory are immutable
> The problem this is trying to solve is that there is currently no way to do guaranteed event delivery across flume agent restarts when the data is being collected through an asynchronous source (and not directly from the client API). Say, for instance, you are using an exec("tail -F") source. If the agent restarts due to error or intentionally, tail may pick up at a new location and you lose the intermediate data.
> At the same time, there are users who want at-least-once semantics, and expect those to apply as soon as the data is written to disk by the initial logger process (e.g. apache logs), not just once it has reached a flume agent. This idea would bridge that gap, assuming the user is able to copy immutable logs to a spooling directory through a cron script or something.
> The basic internal logic of such a client would be as follows:
> - Scan the directory for files
> - Choose a file and read through, while sending events to an agent
> - Close the file and delete it (or rename, or otherwise mark completed)
> That's about it. We could add sync-points to make recovery more efficient in the case of failure.
> A key question is whether this should be implemented as a standalone client or as a source. My instinct is actually to do this as a source, but there could be some benefit to not requiring an entire agent in order to run this, specifically that it would become platform independent and you could stick it on Windows machines. Others I have talked to have also sided on a standalone executable.
Re: [DISCUSS] Annotating interface scope/stability in Flume
We definitely need something like this, given that extensibility is the expectation in Flume (not the exception). It's likely that the Flume developer community will overlap with the Hadoop developer community, so using a subset of the Hadoop annotations makes sense. @Private and @Public seem like a good fit for what we need right now.

- Patrick

On Tue, Aug 14, 2012 at 10:29 AM, Ralph Goers wrote:
>
> On Aug 14, 2012, at 9:47 AM, Will McQueen wrote:
>
>> Something to consider: Eclipse uses the string 'internal' in their package path to denote packages (public or otherwise) that are not intended to be used as a public API:
>>
>> "All packages that are part of the platform implementation but contain no API that should be exposed to ISVs are considered internal implementation packages. All implementation packages should be flagged as internal, with the tag occurring just after the major package name. ISVs will be told that all packages marked internal are out of bounds. (A simple text search for ".internal." detects suspicious references in source files; likewise, "/internal/" is suspicious in .class files.)"
>
> FWIW, I also think this is a good idea.
>
>> [Ref: http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages]
>>
>> Here are some additional links on evolving Java APIs:
>> http://wiki.eclipse.org/Evolving_Java-based_APIs
>> http://wiki.eclipse.org/Evolving_Java-based_APIs_2
>> http://wiki.eclipse.org/Evolving_Java-based_APIs_3
>
>>> Flume configuration is actually pretty brittle
>> FLUME-1051 might address this concern.
>
> I'm not sure how. That issue seems more about guaranteeing validity than making the configuration provider more flexible.
>
> Ralph
Re: Review Request: HDFS file handle not closed properly when date bucketing
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6578/
-----------------------------------------------------------

(Updated Aug. 14, 2012, 6:39 p.m.)

Review request for Flume.

Changes
-------

Address Brock's two comments.

Description
-----------

This resolves FLUME-1350, which describes a problem where the HDFS file handle is not closed properly when date bucketing is used. The fix adds a map between the sink's configured path and its current real path. Whenever a new real path is generated by date bucketing, it closes the BucketWriter associated with the existing real path and updates the link between the sink's path and its real path.

This addresses bug FLUME-1350.
    https://issues.apache.org/jira/browse/FLUME-1350

Diffs (updated)
---------------

  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java fcb9642

Diff: https://reviews.apache.org/r/6578/diff/

Testing
-------

The fix has been manually tested.

Thanks,

Yongcheng Li
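[Editor's note] The review summarizes the fix without code. A rough sketch of the described bookkeeping (track the current real path per configured path; when date bucketing rolls the real path over, close the stale writer first) might look like the following. The Writer and WriterFactory interfaces stand in for the real BucketWriter machinery, and every name here is an assumption, not the actual patch.

{code:title=BucketTracker.java|borderStyle=solid}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix described above: one open writer per configured sink
// path; when date bucketing produces a new real path, close the old writer.
public class BucketTracker {

  /** Minimal stand-in for the sink's real BucketWriter. */
  public interface Writer {
    void close() throws IOException;
  }

  public interface WriterFactory {
    Writer open(String realPath) throws IOException;
  }

  private final Map<String, String> realPathByConfiguredPath =
      new HashMap<String, String>();
  private final Map<String, Writer> writerByConfiguredPath =
      new HashMap<String, Writer>();

  public Writer writerFor(String configuredPath, String realPath,
      WriterFactory factory) throws IOException {
    String current = realPathByConfiguredPath.get(configuredPath);
    if (!realPath.equals(current)) {
      // The bucket rolled over (e.g. the date changed): close the stale
      // writer so its HDFS file handle is released, then track the new path.
      Writer stale = writerByConfiguredPath.remove(configuredPath);
      if (stale != null) {
        stale.close();
      }
      realPathByConfiguredPath.put(configuredPath, realPath);
      writerByConfiguredPath.put(configuredPath, factory.open(realPath));
    }
    return writerByConfiguredPath.get(configuredPath);
  }
}
{code}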
Re: [DISCUSS] Annotating interface scope/stability in Flume
On Aug 14, 2012, at 9:47 AM, Will McQueen wrote:

> Something to consider: Eclipse uses the string 'internal' in their package path to denote packages (public or otherwise) that are not intended to be used as a public API:
>
> "All packages that are part of the platform implementation but contain no API that should be exposed to ISVs are considered internal implementation packages. All implementation packages should be flagged as internal, with the tag occurring just after the major package name. ISVs will be told that all packages marked internal are out of bounds. (A simple text search for ".internal." detects suspicious references in source files; likewise, "/internal/" is suspicious in .class files.)"

FWIW, I also think this is a good idea.

> [Ref: http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages]
>
> Here are some additional links on evolving Java APIs:
> http://wiki.eclipse.org/Evolving_Java-based_APIs
> http://wiki.eclipse.org/Evolving_Java-based_APIs_2
> http://wiki.eclipse.org/Evolving_Java-based_APIs_3

>> Flume configuration is actually pretty brittle
> FLUME-1051 might address this concern.

I'm not sure how. That issue seems more about guaranteeing validity than making the configuration provider more flexible.

Ralph
Re: [DISCUSS] Annotating interface scope/stability in Flume
On Aug 14, 2012, at 9:47 AM, Will McQueen wrote:

>>> if you want to create a configuration using XML, JSON
> Flume is currently hardcoded to read from a Java Properties file. Please see Application.run():
>     AbstractFileConfigurationProvider configurationProvider = new PropertiesFileConfigurationProvider();

Yes, I know. That is only part of it. FlumeConfiguration also expects properties. PropertiesFileConfigurationProvider cannot be extended, even though it sort of looks like it can; all the useful methods are private.

It turns out I can't really leverage the way Flume startup works anyway. When the configuration changes, it basically does a restart. That works fine in the standalone case, since clients will fail over to another agent until reconfiguration is complete. But an embedded agent can't shut down like that, so I'm pretty much having to roll my own.

Ralph
Re: [DISCUSS] Annotating interface scope/stability in Flume
On Aug 14, 2012, at 2:38 AM, Mike Percy wrote:

> Right now, I feel we would get most of the bang for the buck simply by adding two annotations: @Public and @Internal, which to me means "you can subclass or instantiate this directly", or "you can't directly use this if you expect future compatibility".

Why not keep the same nomenclature as in Hadoop? It would help keep some commonality in the ecosystem.

Arun
Re: [DISCUSS] Annotating interface scope/stability in Flume
Something to consider: Eclipse uses the string 'internal' in their package path to denote packages (public or otherwise) that are not intended to be used as a public API:

"All packages that are part of the platform implementation but contain no API that should be exposed to ISVs are considered internal implementation packages. All implementation packages should be flagged as internal, with the tag occurring just after the major package name. ISVs will be told that all packages marked internal are out of bounds. (A simple text search for ".internal." detects suspicious references in source files; likewise, "/internal/" is suspicious in .class files.)"

[Ref: http://wiki.eclipse.org/Naming_Conventions#Internal_Implementation_Packages]

Here are some additional links on evolving Java APIs:
http://wiki.eclipse.org/Evolving_Java-based_APIs
http://wiki.eclipse.org/Evolving_Java-based_APIs_2
http://wiki.eclipse.org/Evolving_Java-based_APIs_3

>> Flume configuration is actually pretty brittle
FLUME-1051 might address this concern.

>> if you want to create a configuration using XML, JSON
Flume is currently hardcoded to read from a Java Properties file. Please see Application.run():
    AbstractFileConfigurationProvider configurationProvider = new PropertiesFileConfigurationProvider();

Cheers,
Will

On Tue, Aug 14, 2012 at 9:14 AM, Ralph Goers wrote:

> I have a slightly different take on this.
>
> I've been trying to embed Flume within the Log4j 2 client and have found that things that look like they are extendable actually aren't, and that Flume configuration is actually pretty brittle (the properties don't seem to be isolated to one place but percolate through several), such that if you want to create a configuration using XML, JSON or anything else, you are forced to convert it into the property syntax. I've also noticed that the Sources, Sinks and Channels are all defined in enums - another brittle construct that puts third-party add-ons at a disadvantage compared to the components packaged with Flume.
>
> I tackled a lot of these issues in Log4j 2 and came up with different solutions for them. I used annotations, but not specifically to mark things as public or private - rather, to identify "plugins". For example, Sources, Sinks and Channels could all be annotated and made available by a short name specified on the annotation, which would get rid of the need for the enums. Of course, these also identify components that can be used as models for developers and users to emulate when adding their own components.
>
> While adding annotations as guidance to programmers is OK, you can accomplish the same thing just by writing good Javadoc. I'm more a fan of using annotations for things that are a bit more useful.
>
> Ralph
>
> On Aug 14, 2012, at 2:38 AM, Mike Percy wrote:
>
> > It seems we have reached a point in some of the Flume components where we want to add features, but that means adding new interfaces to maintain backwards compatibility. While this is natural, the more we do it, and the more we cast from interface to interface, the less readable and consistent the codebase becomes. Also, we have exposed much of our code as public class + public method, even when APIs may not be intended as stable extension points, for testing or other reasons. A few years ago, Hadoop faced this problem and ended up implementing annotations to document APIs as @Stable/@Evolving, @Public/@Limited/@Private. See <https://issues.apache.org/jira/browse/HADOOP-5073> for the history on that.
> >
> > I would like to propose the adoption of a similar mechanism in Flume, in order to give us more wiggle room in the future for evolutionary development. Thoughts?
> >
> > Right now, I feel we would get most of the bang for the buck simply by adding two annotations: @Public and @Internal, which to me means "you can subclass or instantiate this directly", or "you can't directly use this if you expect future compatibility".
> >
> > Regards,
> > Mike
Re: [DISCUSS] Annotating interface scope/stability in Flume
I have a slightly different take on this.

I've been trying to embed Flume within the Log4j 2 client and have found that things that look like they are extendable actually aren't, and that Flume configuration is actually pretty brittle (the properties don't seem to be isolated to one place but percolate through several), such that if you want to create a configuration using XML, JSON or anything else, you are forced to convert it into the property syntax. I've also noticed that the Sources, Sinks and Channels are all defined in enums - another brittle construct that puts third-party add-ons at a disadvantage compared to the components packaged with Flume.

I tackled a lot of these issues in Log4j 2 and came up with different solutions for them. I used annotations, but not specifically to mark things as public or private - rather, to identify "plugins". For example, Sources, Sinks and Channels could all be annotated and made available by a short name specified on the annotation, which would get rid of the need for the enums. Of course, these also identify components that can be used as models for developers and users to emulate when adding their own components.

While adding annotations as guidance to programmers is OK, you can accomplish the same thing just by writing good Javadoc. I'm more a fan of using annotations for things that are a bit more useful.

Ralph

On Aug 14, 2012, at 2:38 AM, Mike Percy wrote:

> It seems we have reached a point in some of the Flume components where we want to add features, but that means adding new interfaces to maintain backwards compatibility. While this is natural, the more we do it, and the more we cast from interface to interface, the less readable and consistent the codebase becomes. Also, we have exposed much of our code as public class + public method, even when APIs may not be intended as stable extension points, for testing or other reasons. A few years ago, Hadoop faced this problem and ended up implementing annotations to document APIs as @Stable/@Evolving, @Public/@Limited/@Private. See <https://issues.apache.org/jira/browse/HADOOP-5073> for the history on that.
>
> I would like to propose the adoption of a similar mechanism in Flume, in order to give us more wiggle room in the future for evolutionary development. Thoughts?
>
> Right now, I feel we would get most of the bang for the buck simply by adding two annotations: @Public and @Internal, which to me means "you can subclass or instantiate this directly", or "you can't directly use this if you expect future compatibility".
>
> Regards,
> Mike
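[Editor's note] Ralph's plugin idea is the pattern Log4j 2 uses: components declare a short name via an annotation and are discovered at runtime, so third-party components register exactly like bundled ones. A minimal sketch of what that could look like in Flume follows; the @FlumePlugin annotation and the registry are hypothetical, and a real implementation would likely drive registration from classpath scanning rather than manual calls.

{code:title=FlumePlugin.java|borderStyle=solid}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin annotation: a short name replaces the built-in enums.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface FlumePlugin {
  String name();     // short name used in configuration, e.g. "avro"
  String category(); // "source", "sink", or "channel"
}

class PluginRegistry {
  private final Map<String, Class<?>> byKey = new HashMap<String, Class<?>>();

  // Registration shown manually here; scanning could populate this instead.
  void register(Class<?> type) {
    FlumePlugin plugin = type.getAnnotation(FlumePlugin.class);
    if (plugin == null) {
      throw new IllegalArgumentException(type + " is not annotated with @FlumePlugin");
    }
    byKey.put(plugin.category() + ":" + plugin.name(), type);
  }

  Class<?> lookup(String category, String name) {
    return byKey.get(category + ":" + name);
  }
}
{code}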
[jira] [Resolved] (FLUME-1420) Exception should be thrown if we cannot instantiate an EventSerializer
[ https://issues.apache.org/jira/browse/FLUME-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland resolved FLUME-1420.
---------------------------------
    Resolution: Fixed
 Fix Version/s: v1.3.0

Committed here:
https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=e01403004af5aa5964f1df3312549c7bb9e2c686

Thank you for your contribution, Patrick!

> Exception should be thrown if we cannot instantiate an EventSerializer
> -----------------------------------------------------------------------
>
>          Key: FLUME-1420
>          URL: https://issues.apache.org/jira/browse/FLUME-1420
>      Project: Flume
>   Issue Type: Bug
>   Components: Sinks+Sources
> Affects Versions: v1.2.0
>     Reporter: Brock Noland
>     Assignee: Patrick Wendell
>      Fix For: v1.3.0
>  Attachments: FLUME-1420.patch.v1.txt, FLUME-1420.patch.v2.txt
>
> Currently, EventSerializerFactory returns null if it cannot instantiate the class. The caller then NPEs because it doesn't expect null. If we cannot satisfy the caller, we should throw an exception.
> {noformat}
> 2012-08-02 16:38:26,489 ERROR serialization.EventSerializerFactory: Unable to instantiate Builder from org.apache.flume.serialization.BodyTextEventSerializer
> 2012-08-02 16:38:26,490 WARN hdfs.HDFSEventSink: HDFS IO error
> java.io.IOException: java.lang.NullPointerException
>         at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:202)
>         at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
>         at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
>         at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NullPointerException
>         at org.apache.flume.sink.hdfs.HDFSDataStream.open(HDFSDataStream.java:75)
>         at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:188)
>         ... 13 more
> {noformat}
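[Editor's note] The committed change is not inlined here, but the pattern the issue asks for is straightforward: fail fast in the factory instead of returning null and letting the caller NPE deep inside the HDFS sink. A generic sketch follows (the class and exception choice are assumptions, not the actual patch):

{code:title=SerializerFactorySketch.java|borderStyle=solid}
// Sketch of the fail-fast pattern requested in the JIRA: if the serializer
// builder cannot be instantiated, throw immediately rather than return null.
public class SerializerFactorySketch {

  public static Object getInstance(String builderClassName) {
    try {
      Class<?> builderClass = Class.forName(builderClassName);
      return builderClass.newInstance();
    } catch (Exception e) {
      // Surface the real cause at the point of misconfiguration instead of
      // letting a null propagate into an NPE later.
      throw new IllegalStateException(
          "Unable to instantiate serializer builder: " + builderClassName, e);
    }
  }
}
{code}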
Re: [DISCUSS] Annotating interface scope/stability in Flume
I'm also in favour of that idea.

Jarcec

On Tue, Aug 14, 2012 at 11:40:17AM +0100, Brock Noland wrote:
> Hi,
>
> I think this makes sense. Can we clearly state and define a contract for Public and Internal?
>
> Brock
>
> On Tue, Aug 14, 2012 at 10:38 AM, Mike Percy wrote:
>
> > It seems we have reached a point in some of the Flume components where we want to add features, but that means adding new interfaces to maintain backwards compatibility. While this is natural, the more we do it, and the more we cast from interface to interface, the less readable and consistent the codebase becomes. Also, we have exposed much of our code as public class + public method, even when APIs may not be intended as stable extension points, for testing or other reasons. A few years ago, Hadoop faced this problem and ended up implementing annotations to document APIs as @Stable/@Evolving, @Public/@Limited/@Private. See <https://issues.apache.org/jira/browse/HADOOP-5073> for the history on that.
> >
> > I would like to propose the adoption of a similar mechanism in Flume, in order to give us more wiggle room in the future for evolutionary development. Thoughts?
> >
> > Right now, I feel we would get most of the bang for the buck simply by adding two annotations: @Public and @Internal, which to me means "you can subclass or instantiate this directly", or "you can't directly use this if you expect future compatibility".
> >
> > Regards,
> > Mike
Re: [DISCUSS] Annotating interface scope/stability in Flume
Hi, I think this makes sense. Can we clearly state and define a contract for Public and Internal? Brock On Tue, Aug 14, 2012 at 10:38 AM, Mike Percy wrote: > It seems we have reached a point in some of the Flume components where we > want to add features but that means adding new interfaces to maintain > backwards compatibility. While this is natural, the more we do it, and the > more we cast from interface to interface, the less readable and consistent > the codebase becomes. Also, we have exposed much of our code as public > class + public method, even when APIs may not be intended as stable > extension points, for testing or other reasons. A few years ago, Hadoop > faced this problem and ended up implementing annotations to document APIs > as @Stable/@Evolving, @Public/@Limited/@Private. See < > https://issues.apache.org/jira/browse/HADOOP-5073> for the history on > that. > > I would like to propose the adoption of a similar mechanism in Flume, in > order to give us more wiggle room in the future for evolutionary > development. Thoughts? > > Right now, I feel we would get most of the bang for the buck simply by > adding two annotations: @Public and @Internal, which to me means "you can > subclass or instantiate this directly", or "you can't directly use this if > you expect future compatibility". > > Regards, > Mike > -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
[DISCUSS] Annotating interface scope/stability in Flume
It seems we have reached a point in some of the Flume components where we want to add features, but that means adding new interfaces to maintain backwards compatibility. While this is natural, the more we do it, and the more we cast from interface to interface, the less readable and consistent the codebase becomes. Also, we have exposed much of our code as public class + public method, even when APIs may not be intended as stable extension points, for testing or other reasons. A few years ago, Hadoop faced this problem and ended up implementing annotations to document APIs as @Stable/@Evolving, @Public/@Limited/@Private. See <https://issues.apache.org/jira/browse/HADOOP-5073> for the history on that. I would like to propose the adoption of a similar mechanism in Flume, in order to give us more wiggle room in the future for evolutionary development. Thoughts? Right now, I feel we would get most of the bang for the buck simply by adding two annotations: @Public and @Internal, which to me mean, respectively, "you can subclass or instantiate this directly" and "you can't directly use this if you expect future compatibility". Regards, Mike
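To make the proposal concrete, here is a rough sketch of what the two annotations might look like, loosely modeled on Hadoop's InterfaceAudience work from HADOOP-5073. The names, holder class, and retention policy below are assumptions for illustration, not a committed design.
{noformat}
// Hypothetical sketch of the proposed marker annotations; names and
// retention are assumptions, loosely following Hadoop's approach.
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public final class InterfaceScope {

    /** Safe to subclass or instantiate directly; future compatibility is intended. */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Public { }

    /** Implementation detail; no compatibility guarantees across releases. */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Internal { }

    private InterfaceScope() { } // non-instantiable holder class
}
{noformat}
A class would then be tagged @InterfaceScope.Public or @InterfaceScope.Internal, giving downstream users a documented, compiler-visible signal of what is safe to depend on.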
[jira] [Updated] (FLUME-1479) Multiple Sinks can connect to single Channel
[ https://issues.apache.org/jira/browse/FLUME-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denny Ye updated FLUME-1479: Attachment: FLUME-1479.patch > Multiple Sinks can connect to single Channel > > > Key: FLUME-1479 > URL: https://issues.apache.org/jira/browse/FLUME-1479 > Project: Flume > Issue Type: Bug > Components: Configuration >Affects Versions: v1.2.0 >Reporter: Denny Ye >Assignee: Denny Ye > Fix For: v1.3.0 > > Attachments: FLUME-1479.patch > > > If we have one Channel (mc) and two Sinks (hsa, hsb), they may be > connected to each other with a configuration such as: > {quote} > agent.sinks.hsa.channel = mc > agent.sinks.hsb.channel = mc > {quote} > This means that multiple Sinks can connect to a single Channel. > Normally, one Sink can only connect to a single Channel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
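For context, a fuller (hypothetical) agent configuration showing the shape of setup the description refers to, with two sinks both draining the same channel; the component types here are illustrative, and the key point is the shared channel property:
{noformat}
# Hypothetical agent config: two sinks (hsa, hsb) draining one channel (mc).
# Component types are illustrative.
agent.channels = mc
agent.sinks = hsa hsb

agent.channels.mc.type = memory

agent.sinks.hsa.type = logger
agent.sinks.hsa.channel = mc

agent.sinks.hsb.type = logger
agent.sinks.hsb.channel = mc
{noformat}
Each sink still reads from exactly one channel; what the patch concerns is how configuration validation should treat several sinks pointing at that same channel.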
Re: [jira] [Moved] (FLUME-1483) Implement command "show framework" end to end
Too bad. Thanks for the info, Jarcec! Regards, Mike On Mon, Aug 13, 2012 at 11:49 PM, Jarek Jarcec Cecho wrote: > The manual mentioned two different ways to fix this issue (moving the > issue around, running an integrity check). Neither of them actually worked :-/ > > I'm afraid that we will need to delete the problematic JIRAs and recreate > them manually. > > Jarcec > > On Mon, Aug 13, 2012 at 10:50:50PM -0700, Mike Percy wrote: > > Hi Jarcec, > > It seems moving it between projects did not work? > > > > Regards, > > Mike > > > > On Mon, Aug 13, 2012 at 1:33 AM, Jarek Jarcec Cecho >wrote: > > > > > I've moved it to flume so that I can move it back to sqoop to solve > > > missing Workflow actions: > > > > > > > > > > https://confluence.atlassian.com/display/JIRAKB/Missing+Available+Workflow+Actions > > > > > > Apparently I do not have enough privileges to move it back :-) Might I > ask > > > for higher JIRA permissions? > > > > > > Jarcec > > > > > > On Mon, Aug 13, 2012 at 07:30:37PM +1100, Jarek Jarcec Cecho (JIRA) > wrote: > > > > > > > > [ > > > > https://issues.apache.org/jira/browse/FLUME-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] > > > > > > > > Jarek Jarcec Cecho moved SQOOP-569 to FLUME-1483: > > > > - > > > > > > > > Release Note: > > > > (Move ticket around to solve workflow issues on JIRA side: > > > > > > > > > > > > https://confluence.atlassian.com/display/JIRAKB/Missing+Available+Workflow+Actions > > > > Key: FLUME-1483 (was: SQOOP-569) > > > > Project: Flume (was: Sqoop) > > > > > Implement command "show framework" end to end > > > > > - > > > > > > > > > > Key: FLUME-1483 > > > > > URL: https://issues.apache.org/jira/browse/FLUME-1483 > > > > > Project: Flume > > > > > Issue Type: Task > > > > >Reporter: Jarek Jarcec Cecho > > > > >Assignee: Jarek Jarcec Cecho > > > > >Priority: Minor > > > > > > > > > > Implement command "show framework" to show framework metadata from end > > > to end. > > > > -- > > > > This message is automatically generated by JIRA. > > > > If you think it was sent incorrectly, please contact your JIRA > > > administrators: > > > > https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa > > > > For more information on JIRA, see: > > > http://www.atlassian.com/software/jira > > > > > > > >
[jira] [Updated] (FLUME-1382) Flume adopt message from existing local Scribe
[ https://issues.apache.org/jira/browse/FLUME-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denny Ye updated FLUME-1382: Attachment: (was: FLUME-1382-doc.patch) > Flume adopt message from existing local Scribe > -- > > Key: FLUME-1382 > URL: https://issues.apache.org/jira/browse/FLUME-1382 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Affects Versions: v1.2.0 >Reporter: Denny Ye >Assignee: Denny Ye >Priority: Minor > Labels: scribe, thrift > Fix For: v1.3.0 > > Attachments: FLUME-1382-3.patch, FLUME-1382-doc-2.patch > > > Currently, we are using Scribe in our data ingest system. The central Scribe > is hard to maintain and upgrade. Thus, we would like to replace the central > Scribe with Flume and accept messages from the existing, numerous local > Scribe instances. This can be treated as a legacy-support component. > We have implemented a ScribeSource that uses more efficient Thrift code > without deserializing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (FLUME-1382) Flume adopt message from existing local Scribe
[ https://issues.apache.org/jira/browse/FLUME-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denny Ye updated FLUME-1382: Attachment: FLUME-1382-3.patch FLUME-1382-doc-2.patch > Flume adopt message from existing local Scribe > -- > > Key: FLUME-1382 > URL: https://issues.apache.org/jira/browse/FLUME-1382 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Affects Versions: v1.2.0 >Reporter: Denny Ye >Assignee: Denny Ye >Priority: Minor > Labels: scribe, thrift > Fix For: v1.3.0 > > Attachments: FLUME-1382-3.patch, FLUME-1382-doc-2.patch > > > Currently, we are using Scribe in our data ingest system. The central Scribe > is hard to maintain and upgrade. Thus, we would like to replace the central > Scribe with Flume and accept messages from the existing, numerous local > Scribe instances. This can be treated as a legacy-support component. > We have implemented a ScribeSource that uses more efficient Thrift code > without deserializing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (FLUME-1382) Flume adopt message from existing local Scribe
[ https://issues.apache.org/jira/browse/FLUME-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denny Ye updated FLUME-1382: Attachment: (was: FLUME-1382.patch) > Flume adopt message from existing local Scribe > -- > > Key: FLUME-1382 > URL: https://issues.apache.org/jira/browse/FLUME-1382 > Project: Flume > Issue Type: Improvement > Components: Sinks+Sources >Affects Versions: v1.2.0 >Reporter: Denny Ye >Assignee: Denny Ye >Priority: Minor > Labels: scribe, thrift > Fix For: v1.3.0 > > Attachments: FLUME-1382-3.patch, FLUME-1382-doc-2.patch > > > Currently, we are using Scribe in our data ingest system. The central Scribe > is hard to maintain and upgrade. Thus, we would like to replace the central > Scribe with Flume and accept messages from the existing, numerous local > Scribe instances. This can be treated as a legacy-support component. > We have implemented a ScribeSource that uses more efficient Thrift code > without deserializing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
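To illustrate how such a source might be wired into an agent, here is a hypothetical configuration sketch. The fully qualified source type, the property keys, and the port are assumptions based on the description above (1463 being Scribe's conventional Thrift port), not taken from the attached patches.
{noformat}
# Hypothetical config for a ScribeSource-based agent; the type name,
# property keys, and port are assumptions, not taken from the patch.
agent.sources = scribeSrc
agent.channels = mc

agent.sources.scribeSrc.type = org.apache.flume.source.scribe.ScribeSource
agent.sources.scribeSrc.port = 1463
agent.sources.scribeSrc.channels = mc

agent.channels.mc.type = memory
{noformat}
The idea would be that existing local Scribe daemons keep forwarding to the same Thrift endpoint, allowing the central Scribe tier to be retired incrementally.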