[jira] [Commented] (FLUME-2917) Provide netcat UDP source as alternative to TCP

2017-06-24 Thread Chris Horrocks (JIRA)

[ https://issues.apache.org/jira/browse/FLUME-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062094#comment-16062094 ]

Chris Horrocks commented on FLUME-2917:
---

Tristan, is there any help you need with this?

The existing SyslogUDP source has timestamp handling that prevents its use
with logs not formatted according to RFC 3164, and this source would be highly
beneficial in getting around that constraint.

> Provide netcat UDP source as alternative to TCP
> ---
>
> Key: FLUME-2917
> URL: https://issues.apache.org/jira/browse/FLUME-2917
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: 1.6.0
>Reporter: Tristan Stevens
>Assignee: Tristan Stevens
>Priority: Minor
> Attachments: FLUME-2917.patch
>
>
> Currently Flume provides a Netcat TCP source; however, Netcat is often used
> with UDP. There is an implementation of a UDP client in the SyslogUDP source;
> this request takes that implementation and strips out the Syslog parts, thus
> forming a Netcat UDP source, where each datagram is recorded as a Flume
> event.
> The implementation is provided at
> https://github.com/tmgstevens/FlumeNetcatUDPSource and also as an
> attached patch for inclusion.
> N.B. Unit tests are provided for this.
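The behaviour described in the issue (one datagram in, one event out, with none of the SyslogUDP timestamp handling) can be sketched in plain Java. This is not the attached patch: `UdpEvent` is a stand-in for Flume's `Event` so the sketch compiles without Flume on the classpath, and `NetcatUdpSketch`, `REMOTE_ADDRESS_HEADER` and `receiveOne` are hypothetical names.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Sketch: each UDP datagram becomes exactly one event; no syslog parsing. */
final class NetcatUdpSketch {

    /** Hypothetical header for the sender's IP address. */
    static final String REMOTE_ADDRESS_HEADER = "remoteAddress";

    /** Minimal stand-in for Flume's Event (headers + body), not the real class. */
    static final class UdpEvent {
        final Map<String, String> headers;
        final byte[] body;

        UdpEvent(Map<String, String> headers, byte[] body) {
            this.headers = Collections.unmodifiableMap(headers);
            this.body = body;
        }
    }

    /** Copies only the bytes actually received, respecting the packet's offset. */
    static UdpEvent toEvent(DatagramPacket packet) {
        int from = packet.getOffset();
        byte[] body = Arrays.copyOfRange(packet.getData(), from, from + packet.getLength());
        Map<String, String> headers = new HashMap<>();
        if (packet.getAddress() != null) {
            headers.put(REMOTE_ADDRESS_HEADER, packet.getAddress().getHostAddress());
        }
        return new UdpEvent(headers, body);
    }

    /** Blocks until one datagram arrives on the bound socket, then converts it. */
    static UdpEvent receiveOne(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[65535]; // large enough for any UDP payload
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);
        return toEvent(packet);
    }
}
```

A source loop would call `receiveOne` repeatedly on a `DatagramSocket` bound to the configured port and hand each event to the channel. Copying by offset and length matters because `getData()` returns the whole receive buffer, which is usually much larger than the datagram.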



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Flume benchmarks

2016-10-14 Thread Chris Horrocks
I've got a pretty well resourced pre-production environment that might do the 
trick quite nicely.


-- Chris Horrocks


On Thu, Oct 13, 2016 at 4:55 pm, Lior Zeno <liorz...@gmail.com> wrote:
I think that we can come up with an initial version with little effort.
The simplest scenario I can think of is running a Flume instance (with a
SeqGen source and a Null sink) for one minute and then reporting the average
events per second.
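Lior's simplest scenario (generate events, discard them, report the average events per second over a fixed window) reduces to a timed loop. The sketch below mimics that shape in plain Java rather than running a real agent; the class and method names are hypothetical, and for publishable numbers a harness such as JMH, or an actual Flume agent with the SeqGen source and Null sink, would be more trustworthy than a hand-rolled loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

/** Sketch of a SeqGen -> Null sink style throughput measurement, without Flume. */
final class SeqGenNullSinkBenchmark {

    /** Written once per run so the JIT cannot treat the loop body as dead code. */
    static volatile long blackhole;

    /** Average events per second for a given event count and elapsed time. */
    static double averageEventsPerSecond(long events, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return events * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
    }

    /** Generates sequence-number events and discards them for the given duration. */
    static double run(long duration, TimeUnit unit) {
        long start = System.nanoTime();
        long deadline = start + unit.toNanos(duration);
        long seq = 0;
        long bytesSeen = 0;
        long now;
        while ((now = System.nanoTime()) < deadline) {
            // "Source": the event body is the sequence number as text.
            byte[] body = Long.toString(seq).getBytes(StandardCharsets.UTF_8);
            // "Null sink": look at the event, then drop it.
            bytesSeen += body.length;
            seq++;
        }
        blackhole = bytesSeen;
        return averageEventsPerSecond(seq, now - start);
    }
}
```

`SeqGenNullSinkBenchmark.run(1, TimeUnit.MINUTES)` corresponds to the one-minute run described above. The `blackhole` field is only a crude guard against the JIT eliminating the loop; that class of pitfall is the main reason to prefer a dedicated harness.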

On Thu, Oct 13, 2016 at 6:43 PM, Attila Simon <s...@cloudera.com> wrote:

> Good idea! What would be required to set up something similar for Flume?
> I.e., the initial time cost of setting up the infrastructure and the ongoing
> time cost of adding new use cases.
>
> Cheers,
> Attila
>
>
>
> On Thu, Oct 13, 2016 at 5:19 PM, Lior Zeno <liorz...@gmail.com> wrote:
>
> > Hi All,
> >
> > Monitoring Flume's performance over time is an important step in every
> > production-level application. Benchmarking Flume on a nightly basis has
> > the following advantages:
> >
> > * Better understanding of Flume's bottlenecks.
> > * Allowing users to compare the performance of different solutions, such as
> > Logstash and Fluentd.
> > * Better understanding of the influence of recent commits on performance.
> >
> > Logstash already runs various performance tests; more details at this
> > link:
> > http://logstash-benchmarks.elastic.co/
> >
> > I propose adding a few micro-benchmarks showing Flume's TPS vs date (of
> > course, in the ideal case where the input and/or output do not bottleneck
> > the system), e.g. using the SeqGen source.
> >
> > Thoughts?
> >
> > Thanks
> >
>

[jira] [Commented] (FLUME-2171) Add Interceptor to remove headers from event

2016-08-12 Thread Chris Horrocks (JIRA)

[ https://issues.apache.org/jira/browse/FLUME-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418653#comment-15418653 ]

Chris Horrocks commented on FLUME-2171:
---

This would be incredibly helpful, given how inflexibly some sources add their
own headers, which may conflict with other headers further downstream. I'll try
the patch out in our test environment later today and push Cloudera to
backport it.

> Add Interceptor to remove headers from event
> 
>
> Key: FLUME-2171
> URL: https://issues.apache.org/jira/browse/FLUME-2171
> Project: Flume
>  Issue Type: New Feature
>  Components: Easy
>Affects Versions: v1.4.0
>Reporter: Gabriel Commeau
>Assignee: Gabriel Commeau
> Attachments: FLUME-2171-0.patch
>
>
> I found Flume OG's decorators for handling event headers useful, and some of
> them are missing from Flume NG. More specifically, we could have an
> interceptor to remove headers from an event.
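A header-removing interceptor of the kind the issue asks for can be sketched as a filter over each event's header map. This is an illustration, not the attached patch: it works on a plain `Map<String, String>` instead of `org.apache.flume.Event`, and the class name and the exact-name-plus-regex configuration are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Sketch of a header-removing interceptor. It mirrors the per-event and
 * per-batch shape of Flume's Interceptor, but uses plain header maps.
 */
final class RemoveHeadersSketch {
    private final Set<String> exactNames;
    private final Pattern namePattern; // may be null: no regex-based removal

    RemoveHeadersSketch(Set<String> exactNames, Pattern namePattern) {
        this.exactNames = new HashSet<>(exactNames);
        this.namePattern = namePattern;
    }

    /** Removes matching headers in place and returns the same map. */
    Map<String, String> intercept(Map<String, String> headers) {
        headers.keySet().removeIf(name ->
                exactNames.contains(name)
                        || (namePattern != null && namePattern.matcher(name).matches()));
        return headers;
    }

    /** Batch variant: applies the same removal to every event's headers. */
    List<Map<String, String>> intercept(List<Map<String, String>> batch) {
        for (Map<String, String> headers : batch) {
            intercept(headers);
        }
        return batch;
    }
}
```

Constructed with the name `host` and the pattern `kafka\..*`, it would drop `host` and every `kafka.`-prefixed header while leaving the rest untouched, which is the kind of downstream conflict the comment above describes.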





[jira] [Commented] (FLUME-2173) Exactly once semantics for Flume

2016-07-23 Thread Chris Horrocks (JIRA)

[ https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390761#comment-15390761 ]

Chris Horrocks commented on FLUME-2173:
---

I would have assumed that writing to the sink would be conditional on having
successfully incremented the offset. Either way, this is obviously not ideal
as is: an upstream source receiving events from multiple sinks would need to
independently track the state of each offset, presenting scalability and
persistence challenges.

> Exactly once semantics for Flume
> 
>
> Key: FLUME-2173
> URL: https://issues.apache.org/jira/browse/FLUME-2173
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Fix For: v2.0.0
>
>
> Currently Flume guarantees only at-least-once semantics. This jira is meant
> to track exactly-once semantics for Flume. My initial idea is to include UUID
> event ids on events at the original source (use a config to mark a source as
> an original source) and identify destination sinks. At the destination sinks,
> use a unique ZK znode to track the events. If an event has been seen before
> (and this is configured), pull the duplicate out.
> This might need some refactoring, but my belief is we can do this in a
> backward-compatible way.
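Hari's proposal (stamp a UUID at the original source, drop any event whose id the destination has already seen) can be sketched as below. The ZooKeeper znode is replaced by a bounded in-memory map purely for illustration; the header name `eventId`, the class names and the size bound are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Sketch: UUID stamped at the origin, duplicates dropped at the destination. */
final class UuidDedupSketch {

    /** Hypothetical header carrying the event id. */
    static final String EVENT_ID_HEADER = "eventId";

    /** At the "original source": assign an id once; later hops keep it. */
    static Map<String, String> stampAtOrigin(Map<String, String> headers) {
        headers.putIfAbsent(EVENT_ID_HEADER, UUID.randomUUID().toString());
        return headers;
    }

    /** At a destination sink: a bounded, in-memory stand-in for the ZK znode. */
    static final class DestinationFilter {
        private final Map<String, Boolean> seen;

        DestinationFilter(final int maxRemembered) {
            // Access-ordered map that evicts the least recently seen id when full.
            this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > maxRemembered;
                }
            };
        }

        /** Keeps events with an unseen id; events without an id pass through. */
        List<Map<String, String>> filter(List<Map<String, String>> batch) {
            List<Map<String, String>> kept = new ArrayList<>();
            for (Map<String, String> headers : batch) {
                String id = headers.get(EVENT_ID_HEADER);
                if (id == null || seen.put(id, Boolean.TRUE) == null) {
                    kept.add(headers);
                }
            }
            return kept;
        }
    }
}
```

The bound is a deliberate trade-off: an unbounded set would grow forever, while a bounded one only catches duplicates that arrive within the remembered window. A shared store such as ZooKeeper is what would let several destination sinks agree on what has already been seen.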





[jira] [Commented] (FLUME-2173) Exactly once semantics for Flume

2016-07-23 Thread Chris Horrocks (JIRA)

[ https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390753#comment-15390753 ]

Chris Horrocks commented on FLUME-2173:
---

Is there not a precedent for this in the way Kafka uses ZK to track offsets for
the consumption of each topic? Could each Flume agent not register its sinks as
ZK endpoints and update ZK with the offsets of messages (added to the headers of
Flume events) as they are pulled from the channel?
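The per-sink offset idea can be sketched the same way: a registry keyed by sink id (an in-memory stand-in for one ZooKeeper path per sink), events carrying a monotonically increasing offset in a hypothetical `offset` header, and a sink that skips anything at or below its committed offset. The names, and the assumption of a single monotonic offset sequence per upstream source, are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Sketch: per-sink committed offsets, with replayed events skipped. */
final class PerSinkOffsetSketch {

    /** Hypothetical header holding a monotonically increasing offset. */
    static final String OFFSET_HEADER = "offset";

    /** In-memory stand-in for one ZooKeeper path per sink. */
    static final class OffsetRegistry {
        private final Map<String, Long> committed = new ConcurrentHashMap<>();

        long committedOffset(String sinkId) {
            return committed.getOrDefault(sinkId, -1L); // -1: nothing committed yet
        }

        void commit(String sinkId, long offset) {
            committed.merge(sinkId, offset, Long::max); // never move backwards
        }
    }

    /** Writes events above the sink's committed offset, then commits the highest one. */
    static int deliver(String sinkId, List<Map<String, String>> batch,
                       OffsetRegistry registry, Consumer<Map<String, String>> writer) {
        long last = registry.committedOffset(sinkId);
        long highest = last;
        int written = 0;
        for (Map<String, String> event : batch) {
            long offset = Long.parseLong(event.get(OFFSET_HEADER));
            if (offset <= last) {
                continue; // already delivered by this sink before
            }
            writer.accept(event);
            highest = Math.max(highest, offset);
            written++;
        }
        if (highest > last) {
            registry.commit(sinkId, highest); // a crash before this line replays the batch
        }
        return written;
    }
}
```

Writing first and committing afterwards means a crash between the two replays the batch, so this alone is still at-least-once; committing first, as in the other comment on this issue, flips that to at-most-once. Exactly-once needs the write and the commit to happen atomically.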






SyslogUDPSource input format

2016-03-20 Thread Chris Horrocks
Hi,

I’ve been using flume-ng to ingest data into Kafka for approximately 18 months.
I’ve deployed agents into a multi-vendor environment and have been struggling
to process a number of syslog formats (RFC-compliant and non-compliant) in a
consistent way.

Looking at the SyslogUtils code in flume-ng-core, there are regex parsers for
the RFC 3164 and RFC 5424 message formats. Currently, whenever I have to ingest
events transported via syslog that are not RFC-compliant, I have to use a regex
interceptor to extract the event headers. I can’t prove this is less efficient,
but reason would suggest it, especially if the source always tries to extract
the RFC-compliant headers regardless of whether an interceptor is configured.

My question is: is it worth looking at extending the supported input formats?
Could the source be made configurable with a library of input formats (e.g. a
grok dictionary, as with the MorphlineSolr sink), or is this a case of moving
the event formatting upstream, away from Flume?
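One way to picture the configurable "library of input formats" is a list of named regular expressions tried in order, with named groups becoming headers and unmatched lines passed through untouched. The patterns here are simplified illustrations (not the ones in `SyslogUtils`), and the class, the `pipe-delimited` vendor format and the `format` header are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: named regex formats tried in order; named groups become headers. */
final class SyslogFormatLibrary {

    private static final class Format {
        final String name;
        final Pattern pattern;
        final String[] groups; // named groups to copy into headers

        Format(String name, Pattern pattern, String[] groups) {
            this.name = name;
            this.pattern = pattern;
            this.groups = groups;
        }
    }

    private final List<Format> formats = new ArrayList<>();

    /** Adds a format; earlier registrations are tried first. */
    SyslogFormatLibrary register(String name, String regex, String... groups) {
        formats.add(new Format(name, Pattern.compile(regex), groups));
        return this;
    }

    /** Headers from the first format that matches the whole line, else an empty map. */
    Map<String, String> extractHeaders(String line) {
        for (Format format : formats) {
            Matcher m = format.pattern.matcher(line);
            if (m.matches()) {
                Map<String, String> headers = new HashMap<>();
                headers.put("format", format.name);
                for (String group : format.groups) {
                    headers.put(group, m.group(group));
                }
                return headers;
            }
        }
        return Collections.emptyMap(); // unknown format: body passes through untouched
    }

    /** Example library: a simplified RFC 3164-style line and a made-up vendor format. */
    static SyslogFormatLibrary withExamples() {
        return new SyslogFormatLibrary()
                .register("rfc3164-like",
                        "<(?<pri>\\d{1,3})>(?<timestamp>[A-Z][a-z]{2} [ \\d]\\d \\d{2}:\\d{2}:\\d{2}) "
                                + "(?<host>\\S+) (?<msg>.*)",
                        "pri", "timestamp", "host", "msg")
                .register("pipe-delimited",
                        "(?<device>[^|]+)\\|(?<severity>[^|]+)\\|(?<msg>.*)",
                        "device", "severity", "msg");
    }
}
```

In this arrangement an event in a non-RFC format is matched only against the configured patterns, instead of going through the built-in RFC parsing and then a separate regex interceptor.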


Thanks,

Chris




[jira] [Commented] (FLUME-2755) Kafka Source reading multiple topics

2015-12-18 Thread Chris Horrocks (JIRA)

[ https://issues.apache.org/jira/browse/FLUME-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064995#comment-15064995 ]

Chris Horrocks commented on FLUME-2755:
---

Just add another source for the additional topic bound to the same memory 
channel?
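The suggestion amounts to a fan-in: one source per topic, all writing to the same channel. In the sketch below, threads stand in for the Kafka sources and a bounded `LinkedBlockingQueue` stands in for the memory channel; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: one producer thread per topic, all feeding one bounded "channel". */
final class SharedChannelSketch {

    /** Runs one source per topic and drains the shared channel until all messages arrive. */
    static List<String> fanIn(List<List<String>> topics, int channelCapacity) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>(channelCapacity);
        List<Thread> sources = new ArrayList<>();
        int expected = 0;
        for (List<String> topic : topics) {
            expected += topic.size();
            Thread source = new Thread(() -> {
                for (String message : topic) {
                    try {
                        channel.put(message); // blocks while the channel is full
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            sources.add(source);
            source.start();
        }
        List<String> drained = new ArrayList<>(expected);
        try {
            // The calling thread plays the sink, taking events off the channel.
            for (int i = 0; i < expected; i++) {
                drained.add(channel.take());
            }
            for (Thread source : sources) {
                source.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return drained;
    }
}
```

Per-topic order survives because each topic has a single producer and the queue is FIFO, but messages from different topics interleave arbitrarily. The bounded capacity gives the same back-pressure a full memory channel would.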

> Kafka Source reading multiple topics
> 
>
> Key: FLUME-2755
> URL: https://issues.apache.org/jira/browse/FLUME-2755
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Reporter: Guillaume Desbieys
>Priority: Minor
> Fix For: v1.7.0
>
> Attachments: FLUME-2755.patch
>
>
> At the moment, the Kafka source can only read messages from a single topic.
> I attached a patch to read from multiple topics using wildcard.
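The wildcard approach boils down to matching topic names against a regular expression. The sketch shows only that selection step over a given topic list; in a real consumer the list would come from cluster metadata, and Kafka's Java consumer offers a `subscribe(Pattern, ConsumerRebalanceListener)` overload for this. Whether the attached patch uses it is not stated in the issue, and the class name here is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch: which topics a regex "wildcard" subscription would cover. */
final class TopicWildcardSketch {

    /** Returns the topics whose full name matches the regex, sorted for stable output. */
    static List<String> matchingTopics(List<String> allTopics, String topicRegex) {
        Pattern pattern = Pattern.compile(topicRegex);
        return allTopics.stream()
                .filter(topic -> pattern.matcher(topic).matches())
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Using `matches()` rather than `find()` means `web-.*` selects `web-access` but not `old-web-access`, which is usually what a wildcard subscription is expected to do.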


