Ingesting data from an API
Hi,

I have an API that emits output that I want to use as a data source for Flink. I have written a custom source function as follows:

    public class DynamicRuleSource extends AlertingRuleSource {
        private List<Rule> rules = new ArrayList<>();

        public void run(SourceContext<Rule> ctx) throws Exception {
            System.out.println("In run");
            while (true) {
                while (!rules.isEmpty()) {
                    Rule rule = rules.remove(0);
                    ctx.collectWithTimestamp(rule, 0);
                    ctx.emitWatermark(new Watermark(0));
                }
                Thread.sleep(1000);
            }
        }

        public void addRule(Rule rule) {
            rules.add(rule);
        }

        @Override
        public void cancel() {
        }
    }

When the API is invoked, it calls the addRule method on my custom source. The run method polls for any data to be ingested. The same object instance is shared between the API and the Flink execution environment; however, the output of the API does not get ingested into the Flink DataStream.

Is this the right pattern to use, or is Kafka the recommended way of streaming data into Flink?

--Aarti

Aarti Gupta <https://www.linkedin.com/company/qualys>
Director, Engineering, Correlation
aagu...@qualys.com
Qualys, Inc. – Blog <https://qualys.com/blog> | Community <https://community.qualys.com> | Twitter <https://twitter.com/qualys>
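One likely issue with the source above is that the plain list is mutated from the API thread while the source thread reads it, with no synchronization, so additions may never become visible to the polling loop. A minimal, Flink-free sketch of a thread-safe hand-off between the two threads (the RuleBuffer class and the Rule record are hypothetical stand-ins, not Flink API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the Rule type in the original source.
record Rule(String name) {}

// A thread-safe buffer between the API thread and the source loop.
class RuleBuffer {
    private final BlockingQueue<Rule> rules = new LinkedBlockingQueue<>();

    // Called from the API thread.
    public void addRule(Rule rule) {
        rules.add(rule);
    }

    // Called from the source loop; blocks up to the timeout instead of a
    // busy sleep. Returns null if no rule arrived in time.
    public Rule poll(long timeoutMillis) {
        try {
            return rules.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Inside run(), the inner while/sleep loop could then become a single poll(1000) call, collecting the rule whenever the result is non-null.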
Re: Grok and Flink
Interesting, thanks Sebastien. Will take a look.

--Aarti

On Thu, Aug 30, 2018 at 5:59 PM, Lehuede sebastien wrote:
> Hi,
>
> To parse my logs and reuse all my Grok patterns, I use the Java Grok API
> directly in my DataStream. Please see: https://github.com/thekrakken/java-grok
>
> With that you should be able to get rid of the full Logstash piece and use
> only the Grok part.
>
> Another solution, for example if you have logs/events in CEF format, is to
> just use 'split' in a flatMap function.
>
> Hope this helps.
>
> Regards,
> Sebastien.
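The java-grok library suggested above is the real Grok-compatible option. As a dependency-free illustration of the same idea, moving the parsing out of Logstash and into a map/flatMap in the stream, named capture groups in java.util.regex can play the role of Grok's %{TYPE:field} captures. The pattern and field names below are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Dependency-free stand-in for a Grok pattern: named capture groups play
// the role of Grok's %{TYPE:field} semantics. Field names are illustrative.
class SyslogLineParser {
    private static final Pattern LINE = Pattern.compile(
        "(?<host>\\S+)\\s+(?<app>\\S+):\\s+(?<message>.*)");

    // Returns the captured fields, or an empty map when the line does not match.
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            fields.put("host", m.group("host"));
            fields.put("app", m.group("app"));
            fields.put("message", m.group("message"));
        }
        return fields;
    }
}
```

A method like this slots directly into a MapFunction or FlatMapFunction, emitting the field map (or an enriched event) downstream.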
Grok and Flink
Hi,

We are using the Grok filter in Logstash to parse and enrich our data. Grok provides built-in parsing for common log sources such as Apache; this allows us to add structure to unstructured data. After the data has been parsed in Logstash, we stream it over Kafka to Flink for further CEP processing.

We are looking to see if we can get rid of the Logstash piece and do all of the data enrichment and parsing in Flink. Our question: does Flink have a built-in library, similar to Grok, that provides out-of-the-box parsing for common log formats?

Thanks in advance,
Aarti
Re: Dynamic Rule Evaluation in Flink
Hi,

We are evaluating Esper <http://www.espertech.com/> to use as a CEP engine plugged into Flink. We would want to use Flink's connected streams to connect our rules and events streams, and then invoke the Esper CEP engine in the co-process function to evaluate the rules against the events.

Would there be any gotchas if we did this?

--Aarti

On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske wrote:
> Hi,
>
> > Flink doesn't support connecting multiple streams with heterogeneous schema
>
> This is not correct.
> Flink is very well able to connect streams with different schema. However,
> you cannot union two streams with different schema.
> In order to reconfigure an operator with changing rules, you can use
> BroadcastProcessFunction or KeyedBroadcastProcessFunction [1].
>
> In order to dynamically reconfigure aggregations and windowing, you would
> need to implement the processing logic yourself in the process function
> using state and timers.
> There is no built-in support to reconfigure such operators.
>
> Best,
> Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/broadcast_state.html
>
> 2018-07-05 14:41 GMT+02:00 Puneet Kinra:
>
>> Hi Aarti
>>
>> Flink doesn't support connecting multiple streams with heterogeneous
>> schema; you can try the below solution:
>>
>> a) If stream A is sending some events, make the output of that a
>> String/JSON string.
>>
>> b) If stream B is sending some events, make the output of that a
>> String/JSON string.
>>
>> c) Now, using the union function, you can connect all the streams and use
>> a FlatMap or process function to evaluate all these streams against your
>> defined rules.
>>
>> d) For storing your aggregations and rules, you can build your own cache
>> layer and pass it as an argument to the constructor of that flatmap.
>>
>> On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta wrote:
>>
>>> [...]
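The connected-streams pattern discussed in this thread (one input carrying rules, the other carrying events) can be modeled without Flink to show the shape of the logic that would live inside a CoProcessFunction or KeyedBroadcastProcessFunction. The class and the rule representation below are illustrative assumptions, not Flink or Esper API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Flink-free model of a connected-streams operator: one input updates the
// rule set, the other evaluates events against it. Names are illustrative.
class RuleEvaluator<E> {
    record NamedRule<T>(String name, Predicate<T> condition) {}

    private final List<NamedRule<E>> rules = new ArrayList<>();

    // Corresponds to processing an element of the rules (broadcast) stream.
    public void onRule(NamedRule<E> rule) {
        rules.add(rule);
    }

    // Corresponds to processing an element of the events stream:
    // returns the names of all rules the event matches.
    public List<String> onEvent(E event) {
        List<String> matches = new ArrayList<>();
        for (NamedRule<E> rule : rules) {
            if (rule.condition().test(event)) {
                matches.add(rule.name());
            }
        }
        return matches;
    }
}
```

In the Esper variant, onEvent would instead push the event into the Esper runtime, and onRule would deploy or undeploy an EPL statement; the surrounding structure stays the same.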
Re: Dynamic Rule Evaluation in Flink
Thanks everyone, will take a look.

--Aarti

On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske wrote:
> [...]
Re: Dynamic Rule Evaluation in Flink
+Ken.

--Aarti

On Thu, Jul 5, 2018 at 6:48 PM, Aarti Gupta wrote:
> Thanks everyone, will take a look.
>
> --Aarti
>
> [...]
Dynamic Rule Evaluation in Flink
Hi,

We are currently evaluating Flink to build a real-time rule engine that looks at events in a stream and evaluates them against a set of rules.

The rules are dynamically configured and can be of three types:

1. Simple conditions - these require you to look inside a single event. Example: match the rule if A happens.
2. Aggregations - these require you to aggregate multiple events. Example: match the rule if more than five A's happen.
3. Complex patterns - these require you to look at multiple events and detect patterns. Example: match the rule if A happens and then B happens.

Since the rules are dynamically configured, we cannot use CEP.

As an alternative, we are using connected streams and the CoFlatMap function to store the rules in shared state, and evaluate each incoming event against the stored rules. The implementation is similar to what's outlined here <https://data-artisans.com/blog/bettercloud-dynamic-alerting-apache-flink>.

My questions:

1. Since the CoFlatMap function works on a single event, how do we evaluate rules that require aggregations across events (match the rule if more than 5 A events happen)?
2. Since the CoFlatMap function works on a single event, how do we evaluate rules that require pattern detection across events (match the rule if A happens, followed by B)?
3. How do you dynamically define a window function?

--Aarti
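For the aggregation case (rule type 2, "more than five A's happen"), the per-rule state can be modeled as a count over a sliding time window. A Flink-free sketch of that logic, as it might be kept in keyed state inside a process function (the class name and window handling are illustrative assumptions, not the blog post's implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-count check for rules like "more than N matching events in W ms".
// Models what would be kept in Flink keyed state; names are illustrative.
class SlidingCountRule {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingCountRule(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    // Record a matching event at the given time; returns true when the
    // count within the window exceeds the threshold.
    public boolean onEvent(long eventTimeMillis) {
        timestamps.addLast(eventTimeMillis);
        // Evict events that have fallen out of the window.
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= eventTimeMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() > threshold;
    }
}
```

In a Flink job this deque would live in per-key state and the eviction could be driven by timers, which is the state-and-timers approach Fabian describes in his reply.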