Ingesting data from an API

2018-11-11 Thread Aarti Gupta
Hi,

I have an API that emits output that I want to use as a data source for
Flink.

I have written a custom source function that is as follows -

public class DynamicRuleSource extends AlertingRuleSource {

    // Rules added by the API and not yet emitted into the stream.
    private final ArrayList<Rule> rules = new ArrayList<>();

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Rule> ctx) throws Exception {
        System.out.println("In run");
        while (running) {
            // Drain any rules the API has added since the last pass.
            while (!rules.isEmpty()) {
                Rule rule = rules.remove(0);
                ctx.collectWithTimestamp(rule, 0);
                ctx.emitWatermark(new Watermark(0));
            }
            Thread.sleep(1000);
        }
    }

    // Called by the API endpoint whenever a new rule is posted.
    public void addRule(Rule rule) {
        rules.add(rule);
    }

    @Override
    public void cancel() {
        running = false;
    }
}


When the API is invoked, it calls the addRule method on my custom source
function.

The run method in the custom source polls for any data to be ingested.

The same object instance is shared between the API and the Flink execution
environment; however, the output of the API never gets ingested into the
Flink DataStream.
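
For reference, the wiring looks roughly like this (a simplified sketch, assuming
AlertingRuleSource ultimately implements SourceFunction<Rule>; the job details
are illustrative):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// The same instance is handed to the API layer, which calls addRule() on it.
DynamicRuleSource ruleSource = new DynamicRuleSource();

DataStream<Rule> ruleStream = env.addSource(ruleSource);
ruleStream.print();

env.execute("dynamic-rule-ingestion");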

Is this the right pattern to use, or is Kafka the recommended way of
streaming data into Flink?

--Aarti

-- 
Aarti Gupta <https://www.linkedin.com/company/qualys>
Director, Engineering, Correlation
aagu...@qualys.com

Qualys, Inc. – Blog <https://qualys.com/blog> | Community
<https://community.qualys.com> | Twitter <https://twitter.com/qualys>


Re: Grok and Flink

2018-08-30 Thread Aarti Gupta
Interesting, thanks Lehuede. Will take a look.

--Aarti

On Thu, Aug 30, 2018 at 5:59 PM, Lehuede sebastien wrote:

> Hi,
>
> To parse my logs and reuse all my Grok patterns, I use the Java Grok API
> directly in my DataStream. Please see: https://github.com/thekrakken/java-grok
>
> With that you should be able to get rid of the full Logstash piece and use
> only the Grok part.
>
> Another solution, if you have logs/events in CEF format for example, is to
> just use 'split' in the flatMap function.
>
> Hope this helps.
>
> Regards,
> Sebastien.
>
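
For reference, a minimal sketch of the approach Sebastien describes, with
java-grok compiled once in open() and applied per record in a RichMapFunction
(assuming the io.krakens GrokCompiler API; the pattern and the Map-based output
type are illustrative):

import java.util.Map;

import io.krakens.grok.api.Grok;
import io.krakens.grok.api.GrokCompiler;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class GrokParseFunction extends RichMapFunction<String, Map<String, Object>> {

    private transient Grok grok;

    @Override
    public void open(Configuration parameters) {
        // Compile the Grok pattern once per task, not once per record.
        GrokCompiler compiler = GrokCompiler.newInstance();
        compiler.registerDefaultPatterns();
        grok = compiler.compile("%{COMBINEDAPACHELOG}");
    }

    @Override
    public Map<String, Object> map(String logLine) {
        // capture() returns the named Grok fields (clientip, verb, request, ...) as a map.
        return grok.match(logLine).capture();
    }
}

It could then be dropped into a pipeline with something like
rawLogs.map(new GrokParseFunction()).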





Grok and Flink

2018-08-30 Thread Aarti Gupta
Hi,

We are using the Grok filter in Logstash to parse and enrich our data. Grok
provides built-in parsing for common log sources such as Apache, which allows
us to add structure to unstructured data.

After the data has been parsed in Logstash, we stream it over Kafka to Flink
for further CEP processing.
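
On the Flink side, the Kafka hand-off is the standard consumer connector,
roughly as follows (the topic name, broker address, and 0.11 connector version
are illustrative):

// env is the StreamExecutionEnvironment of the job.
Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092");   // illustrative address
props.setProperty("group.id", "flink-cep");

// Records arrive as the JSON strings produced by the Logstash/Grok stage.
DataStream<String> parsedLogs = env.addSource(
    new FlinkKafkaConsumer011<>("parsed-logs", new SimpleStringSchema(), props));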

We are looking to see if we can get rid of the Logstash piece and do all of
the data enrichment and parsing in Flink.

Our question: does Flink have a built-in library, similar to Grok, that
provides out-of-the-box parsing for common log formats?

Thanks in advance,
Aarti



Re: Dynamic Rule Evaluation in Flink

2018-07-09 Thread Aarti Gupta
Hi,

We are evaluating Esper <http://www.espertech.com/> to use as a CEP engine
plugged into Flink.

We would like to use Flink's connected streams to connect our rule and event
streams, and then invoke Esper CEP in the co-process function to evaluate the
rules against the events.
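
Roughly, the wiring we have in mind looks like the sketch below (the
EsperService wrapper is a hypothetical placeholder for the actual Esper calls,
and Rule/Event/Alert are simplified types):

DataStream<Event> events = ...;   // event stream, e.g. consumed from Kafka
DataStream<Rule> rules = ...;     // dynamically configured rules

DataStream<Alert> alerts = events
    .connect(rules)
    .process(new CoProcessFunction<Event, Rule, Alert>() {

        // Hypothetical wrapper around the Esper runtime, not a real Esper class.
        private transient EsperService esper;

        @Override
        public void open(Configuration parameters) {
            esper = new EsperService();
        }

        @Override
        public void processElement1(Event event, Context ctx, Collector<Alert> out) {
            // Feed each event into Esper and emit any rule matches as alerts.
            for (Alert alert : esper.evaluate(event)) {
                out.collect(alert);
            }
        }

        @Override
        public void processElement2(Rule rule, Context ctx, Collector<Alert> out) {
            // Register or update the EPL statement for this rule in Esper.
            esper.registerRule(rule);
        }
    });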

Would there be any gotchas if we did this?

--Aarti





On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske  wrote:

> Hi,
>
> > Flink doesn't support connecting multiple streams with heterogeneous
> schema
>
> This is not correct.
> Flink is very well able to connect streams with different schema. However,
> you cannot union two streams with different schema.
> In order to reconfigure an operator with changing rules, you can use
> BroadcastProcessFunction or KeyedBroadcastProcessFunction [1].
>
> In order to dynamically reconfigure aggregations and windowing, you would
> need to implement the processing logic yourself in the process function
> using state and timers.
> There is no built-in support to reconfigure such operators.
>
> Best,
> Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/broadcast_state.html
>
>
> 2018-07-05 14:41 GMT+02:00 Puneet Kinra 
> :
>
>> Hi Aarti
>>
>> Flink doesn't support connecting multiple streams with heterogeneous
>> schema; you can try the solution below:
>>
>> a) If stream A is sending some events, make its output a
>> String/JSON string.
>>
>> b) If stream B is sending some events, make its output a
>> String/JSON string.
>>
>> c) Now, using the union function, you can connect all the streams and use
>> a FlatMap or process function to evaluate all these streams against your
>> defined rules.
>>
>> d) For storing your aggregations and rules, you can build your own cache
>> layer and pass it as an argument to the constructor of that flatMap.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta  wrote:
>>
>>> Hi,
>>>
>>> We are currently evaluating Flink to build a real time rule engine that
>>> looks at events in a stream and evaluates them against a set of rules.
>>>
>>> The rules are dynamically configured and can be of three types -
>>> 1. Simple Conditions - these require you to look inside a single event.
>>> Example, match rule if A happens.
>>> 2. Aggregations - these require you to aggregate multiple events.
>>> Example, match rule if more than five A's happen.
>>> 3. Complex patterns - these require you to look at multiple events and
>>> detect patterns. Example, match rule if A happens and then B happens.
>>>
>>> Since the rules are dynamically configured, we cannot use CEP.
>>>
>>> As an alternative, we are using connected streams and the CoFlatMap
>>> function to store the rules in shared state, and evaluate each incoming
>>> event against the stored rules.  Implementation is similar to what's
>>> outlined here
>>> <https://data-artisans.com/blog/bettercloud-dynamic-alerting-apache-flink>
>>> .
>>>
>>> My questions -
>>>
>>>    1. Since the CoFlatMap function works on a single event, how do we
>>>    evaluate rules that require aggregations across events. (Match rule if
>>>    more than 5 A events happen)
>>>    2. Since the CoFlatMap function works on a single event, how do we
>>>    evaluate rules that require pattern detection across events (Match rule
>>>    if A happens, followed by B).
>>>    3. How do you dynamically define a window function.
>>>
>>>
>>> --Aarti
>>>
>>>
>>>
>>
>>
>>
>> --
>> Cheers,
>> Puneet Kinra
>> Mobile: +918800167808 | Skype: puneet.ki...@customercentria.com
>> e-mail: puneet.ki...@customercentria.com
>>
>>
>>
>
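
A minimal sketch of the broadcast-state pattern Fabian points to above might
look like the following (Rule.getId(), Rule.matches(), and the Alert type are
placeholders for our actual classes):

MapStateDescriptor<String, Rule> ruleStateDescriptor =
    new MapStateDescriptor<>("rules", String.class, Rule.class);

BroadcastStream<Rule> ruleBroadcast = rules.broadcast(ruleStateDescriptor);

DataStream<Alert> alerts = events
    .connect(ruleBroadcast)
    .process(new BroadcastProcessFunction<Event, Rule, Alert>() {

        @Override
        public void processElement(Event event, ReadOnlyContext ctx,
                                   Collector<Alert> out) throws Exception {
            // Read-only view of the broadcast rules on the event side.
            for (Map.Entry<String, Rule> entry :
                    ctx.getBroadcastState(ruleStateDescriptor).immutableEntries()) {
                if (entry.getValue().matches(event)) {   // placeholder predicate
                    out.collect(new Alert(entry.getValue(), event));
                }
            }
        }

        @Override
        public void processBroadcastElement(Rule rule, Context ctx,
                                            Collector<Alert> out) throws Exception {
            // Every parallel instance receives the rule and stores it in broadcast state.
            ctx.getBroadcastState(ruleStateDescriptor).put(rule.getId(), rule);
        }
    });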




Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Aarti Gupta
Thanks everyone, will take a look.

--Aarti





Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Aarti Gupta
+Ken.

--Aarti






Dynamic Rule Evaluation in Flink

2018-07-02 Thread Aarti Gupta
Hi,

We are currently evaluating Flink to build a real-time rule engine that
looks at events in a stream and evaluates them against a set of rules.

The rules are dynamically configured and can be of three types -
1. Simple conditions - these require you to look inside a single event.
For example, match a rule if A happens.
2. Aggregations - these require you to aggregate multiple events. For
example, match a rule if more than five A's happen.
3. Complex patterns - these require you to look at multiple events and
detect patterns. For example, match a rule if A happens and then B happens.

Since the rules are dynamically configured, we cannot use CEP.

As an alternative, we are using connected streams and the CoFlatMap
function to store the rules in shared state and evaluate each incoming
event against the stored rules. The implementation is similar to what's
outlined here
<https://data-artisans.com/blog/bettercloud-dynamic-alerting-apache-flink>.
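
Concretely, the current implementation looks roughly like the sketch below
(Rule.matches(), the Alert type, and the in-memory rule list are simplified
stand-ins for our actual code):

DataStream<Event> events = ...;   // event stream
DataStream<Rule> rules = ...;     // rule configuration stream

DataStream<Alert> alerts = events
    .connect(rules)
    .flatMap(new CoFlatMapFunction<Event, Rule, Alert>() {

        // Rules seen so far, held per parallel subtask.
        private final List<Rule> activeRules = new ArrayList<>();

        @Override
        public void flatMap1(Event event, Collector<Alert> out) {
            // Evaluate each incoming event against every stored rule.
            for (Rule rule : activeRules) {
                if (rule.matches(event)) {   // placeholder predicate
                    out.collect(new Alert(rule, event));
                }
            }
        }

        @Override
        public void flatMap2(Rule rule, Collector<Alert> out) {
            // A new or updated rule arrives on the rules stream.
            activeRules.add(rule);
        }
    });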

My questions -

   1. Since the CoFlatMap function works on a single event, how do we
   evaluate rules that require aggregations across events? (Match a rule if
   more than 5 A events happen.)
   2. Since the CoFlatMap function works on a single event, how do we
   evaluate rules that require pattern detection across events? (Match a
   rule if A happens, followed by B.)
   3. How do you dynamically define a window function?


--Aarti

