[jira] [Commented] (FLUME-2917) Provide netcat UDP source as alternative to TCP
[ https://issues.apache.org/jira/browse/FLUME-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062094#comment-16062094 ]

Chris Horrocks commented on FLUME-2917:
---------------------------------------

Tristan, is there any help you need with this? The existing SyslogUDP source has timestamp handling which prevents its use with non-RFC-3164-formatted logs, and this source would be highly beneficial in getting around that constraint.

> Provide netcat UDP source as alternative to TCP
> -----------------------------------------------
>
>                 Key: FLUME-2917
>                 URL: https://issues.apache.org/jira/browse/FLUME-2917
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: 1.6.0
>            Reporter: Tristan Stevens
>            Assignee: Tristan Stevens
>            Priority: Minor
>         Attachments: FLUME-2917.patch
>
> Currently Flume provides a Netcat TCP source; however, Netcat is often used
> with UDP. There is an implementation of a UDP client in the SyslogUDP source;
> this request takes that implementation and strips out the Syslog parts, thus
> forming a Netcat UDP source, where each datagram is recorded as a Flume
> event.
> The implementation is provided at
> https://github.com/tmgstevens/FlumeNetcatUDPSource and is also provided as an
> attached patch for inclusion.
> N.B. Unit tests are provided for this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
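Configuration for such a source would presumably mirror the existing netcat TCP source. A sketch of what an agent config might look like, assuming the new source registers under an alias such as `netcatudp` (the alias and the `bind`/`port` property names here are assumptions modelled on the existing NetcatSource, not confirmed by the patch):

```properties
# Hypothetical Netcat UDP source config (alias and property names are assumptions)
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcatudp
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
```

Under this model you could test it the same way as the TCP variant, e.g. `echo "hello" | nc -u -w1 localhost 6666`, with each datagram becoming one Flume event.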
Re: Flume benchmarks
I've got a pretty well-resourced pre-production environment that might do the trick quite nicely.

--
Chris Horrocks

On Thu, Oct 13, 2016 at 4:55 pm, Lior Zeno <'liorz...@gmail.com'> wrote:
I think that we can come up with an initial version with little effort. The
simplest scenario I can think of is running a Flume instance (with a SeqGen
source and a Null sink) for one minute, and then reporting the average events
per second.

On Thu, Oct 13, 2016 at 6:43 PM, Attila Simon <s...@cloudera.com> wrote:
> Good idea! What would be required to set up something similar for Flume?
> i.e. the initial time cost of setting up the infrastructure and the periodic
> time cost of adding new use-cases.
>
> Cheers,
> Attila
>
> On Thu, Oct 13, 2016 at 5:19 PM, Lior Zeno <liorz...@gmail.com> wrote:
>
> > Hi All,
> >
> > Monitoring Flume's performance over time is an important step in every
> > production-level application. Benchmarking Flume on a nightly basis has
> > the following advantages:
> >
> > * Better understanding of Flume's bottlenecks.
> > * Allowing users to compare the performance of different solutions, such
> >   as Logstash and Fluentd.
> > * Better understanding of the influence of recent commits on performance.
> >
> > Logstash already conducts various performance tests; more details at this
> > link:
> > http://logstash-benchmarks.elastic.co/
> >
> > I propose adding a few micro-benchmarks showing Flume's TPS vs date (of
> > course, in the ideal case where the input and/or output do not bottleneck
> > the system), e.g. using the SeqGen source.
> >
> > Thoughts?
> >
> > Thanks
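The scenario Lior describes maps directly onto a minimal agent config. A sketch using the stock SeqGen source (`seq`), memory channel, and Null sink (agent/component names and capacity values here are arbitrary choices, not from the thread):

```properties
# Minimal throughput benchmark: SeqGen source -> memory channel -> Null sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = seq
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = null
a1.sinks.k1.channel = c1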
[jira] [Commented] (FLUME-2171) Add Interceptor to remove headers from event
[ https://issues.apache.org/jira/browse/FLUME-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418653#comment-15418653 ]

Chris Horrocks commented on FLUME-2171:
---------------------------------------

This would be incredibly helpful, given the lack of flexibility in some of the sources, which add their own headers that may conflict with others further downstream. I'll try the patch out in our test environment later today and push Cloudera to backport it.

> Add Interceptor to remove headers from event
> --------------------------------------------
>
>                 Key: FLUME-2171
>                 URL: https://issues.apache.org/jira/browse/FLUME-2171
>             Project: Flume
>          Issue Type: New Feature
>          Components: Easy
>    Affects Versions: v1.4.0
>            Reporter: Gabriel Commeau
>            Assignee: Gabriel Commeau
>         Attachments: FLUME-2171-0.patch
>
> I found Flume OG's decorators for handling event headers useful, and some to
> be missing from Flume NG. More specifically, we could have an interceptor to
> remove headers from an event.
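Wiring the patched interceptor into an agent would presumably follow the standard interceptor pattern. A sketch, assuming the interceptor registers under an alias like `remove_header` and takes the header name to drop via a `withName` property (both names are assumptions about the attached patch, not confirmed by the issue):

```properties
# Hypothetical header-removing interceptor config (alias and property names assumed)
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = remove_header
a1.sources.r1.interceptors.i1.withName = timestamp
```

Placed last in the interceptor chain, this would let a source's own automatically-added headers be stripped before they conflict with anything downstream, which is the use case Chris describes.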
[jira] [Commented] (FLUME-2173) Exactly once semantics for Flume
[ https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390761#comment-15390761 ]

Chris Horrocks commented on FLUME-2173:
---------------------------------------

I would have assumed that writing to the sink would be conditional on having successfully incremented the offset. Either way, it is obviously not ideal as is: an upstream source receiving events from multiple sinks would need to independently track the state of each offset, presenting scalability and persistence challenges.

> Exactly once semantics for Flume
> --------------------------------
>
>                 Key: FLUME-2173
>                 URL: https://issues.apache.org/jira/browse/FLUME-2173
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>             Fix For: v2.0.0
>
> Currently Flume guarantees only at-least-once semantics. This jira is meant
> to track exactly-once semantics for Flume. My initial idea is to include UUID
> event ids on events at the original source (use a config to mark a source as
> an original source) and identify destination sinks. At the destination sinks,
> use a unique ZK znode to track the events. If already seen (and so
> configured), pull the duplicate out.
> This might need some refactoring, but my belief is we can do this in a
> backward-compatible way.
[jira] [Commented] (FLUME-2173) Exactly once semantics for Flume
[ https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390753#comment-15390753 ]

Chris Horrocks commented on FLUME-2173:
---------------------------------------

Is there not precedent for this in the way Kafka uses ZK to track offsets for the consumption of each topic? Could each Flume agent not register its sinks as ZK endpoints and update ZK with the offsets of messages (prepended to the headers of Flume events) as they are pulled from the channel?

> Exactly once semantics for Flume
> --------------------------------
>
>                 Key: FLUME-2173
>                 URL: https://issues.apache.org/jira/browse/FLUME-2173
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>             Fix For: v2.0.0
>
> Currently Flume guarantees only at-least-once semantics. This jira is meant
> to track exactly-once semantics for Flume. My initial idea is to include UUID
> event ids on events at the original source (use a config to mark a source as
> an original source) and identify destination sinks. At the destination sinks,
> use a unique ZK znode to track the events. If already seen (and so
> configured), pull the duplicate out.
> This might need some refactoring, but my belief is we can do this in a
> backward-compatible way.
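The core of Hari's proposal (tag events with a UUID at the original source, suppress redeliveries at the destination sink) can be illustrated without ZooKeeper. A minimal in-memory sketch; a real implementation would persist the seen-ID state durably (e.g. in a ZK znode, as the issue suggests) rather than in process memory, and would need to bound or expire the set:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Minimal sketch of sink-side deduplication for exactly-once delivery.
// "Seen" IDs live in process memory here; a production version would
// persist them (per the issue, in a unique ZK znode per sink) so that
// dedup survives agent restarts.
public class DedupSketch {
    private final Set<String> seen = new HashSet<>();

    /** Returns true if the event should be delivered, false if it is a duplicate. */
    public boolean accept(String eventId) {
        // Set.add returns false when the ID was already present.
        return seen.add(eventId);
    }

    public static void main(String[] args) {
        DedupSketch sink = new DedupSketch();
        String id = UUID.randomUUID().toString(); // ID stamped at the original source
        System.out.println(sink.accept(id));      // first delivery: delivered
        System.out.println(sink.accept(id));      // redelivery after a retry: dropped
    }
}
```

This also shows why Chris's scalability concern applies: every destination sink carries its own tracking state, and that state must be persistent and shared-nothing per sink to avoid becoming a bottleneck.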
SyslogUDPSource input format
Hi,

I’ve been using flume-ng to ingest data into Kafka for approximately 18 months. I’ve deployed agents into a multi-vendor environment and have been struggling to process a number of syslog formats (RFC and non-RFC compliant) in a consistent way. Looking at the SyslogUtils code in flume-ng-core, there are regex parsers for the RFC 3164 and RFC 5424 message formats.

Currently, whenever I have to ingest events transported via syslog which are non-RFC compliant, I have to use a regex interceptor to provide the event header extraction. I can’t prove this is less efficient, but reason would suggest it, especially if the source is always going to try to extract the RFC-compliant headers regardless of whether an interceptor is configured.

My question: is it worth looking at extending the supported input formats? Could the source be made configurable with a library of input formats (i.e. a grok dictionary, as with the MorphlineSolr sink), or is this a case of moving the event formatting upstream, away from Flume?

Thanks,
Chris
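For reference, the regex-interceptor workaround Chris describes is typically built on the stock `regex_extractor` interceptor. An illustrative sketch (the regex and the `priority`/`hostname` header names here are made up for a hypothetical non-RFC line like `<13> myhost something happened`):

```properties
# Illustrative regex_extractor config for non-RFC syslog header extraction
# (backslashes are doubled per Java properties-file escaping)
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = ^<(\\d+)>\\s+(\\S+)
a1.sources.r1.interceptors.i1.serializers = s1 s2
a1.sources.r1.interceptors.i1.serializers.s1.name = priority
a1.sources.r1.interceptors.i1.serializers.s2.name = hostname
```

Each capture group is written to the event header named by the corresponding serializer, which is exactly the per-format duplication of effort that a pluggable input-format library would avoid.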
[jira] [Commented] (FLUME-2755) Kafka Source reading multiple topics
[ https://issues.apache.org/jira/browse/FLUME-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064995#comment-15064995 ]

Chris Horrocks commented on FLUME-2755:
---------------------------------------

Just add another source for the additional topic, bound to the same memory channel?

> Kafka Source reading multiple topics
> ------------------------------------
>
>                 Key: FLUME-2755
>                 URL: https://issues.apache.org/jira/browse/FLUME-2755
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>            Reporter: Guillaume Desbieys
>            Priority: Minor
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2755.patch
>
> At the moment the Kafka source can only read messages from a single topic.
> I attached a patch to read from multiple topics using a wildcard.
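Chris's workaround looks something like the following sketch, using the Flume 1.6-era Kafka source property names (`zookeeperConnect`, `topic`); the host and topic names are placeholders:

```properties
# Workaround sketch: one Kafka source per topic, all feeding the same channel
a1.sources = r1 r2
a1.channels = c1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = zkhost:2181
a1.sources.r1.topic = topicA
a1.sources.r1.channels = c1

a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.zookeeperConnect = zkhost:2181
a1.sources.r2.topic = topicB
a1.sources.r2.channels = c1

a1.channels.c1.type = memory
```

This works but scales linearly in config and consumer overhead with the number of topics, which is why a wildcard subscription in a single source, as the patch proposes, is the cleaner fix.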