multi tenant workflow execution

2017-01-23 Thread Chen Qin
Hi there,

I am researching running a single Flink job to support customized event-driven
workflow executions. The use case is to run various workflows that listen to a
set of Kafka topics and perform various RPC checks, where a user travels
through multiple stages of a rule execution (workflow execution). E.g.:

Kafka topic: user click stream
RPC checks:

- whether the user is a member
- whether the user has shown interest in signing up


Workflows:

workflow 1: user click -> if the user is a member, do A, then do B
workflow 2: user click -> if the user has shown interest in signing up, do A;
otherwise wait 60 minutes and recheck; expire after 24 hours

The goal, as I said, is to run workflow 1 and workflow 2 in one Flink job.

My initial thinking is described below.

The sources are a series of Kafka topics. All events go through a CoMap that
does a cached lookup of the event -> rules mapping and fans out to multiple
{rule, user} tuples. Based on the rule definition and the stage the user is in
for a given rule, the job runs a series of async RPC checks and side-outputs
to various sinks.

   - If a {rule, user} tuple needs to stay in operator state for a longer
   period (e.g., 1 day), there should be a window following the async RPC
   checks, with a customized purging trigger that fires the tuples that pass
   and purges tuples that have either passed the check or expired.
   - If a {rule, user} tuple reaches a stage that waits for a Kafka event, it
   should be added to the cache and hooked up with the CoMap lookups near the
   sources (a rough sketch of the fan-out follows below).
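
Roughly, the fan-out near the sources could look like the following untested
sketch against the Flink 1.2 DataStream API (inside the job's main method).
ClickEvent, ClickEventSchema, and RuleRegistry are placeholder names for our
event type, its deserialization schema, and the cached event -> rules mapping;
env and kafkaProps are assumed to be set up already, using the Kafka 0.10
connector.

// Consume the click stream from Kafka.
DataStream<ClickEvent> clicks = env.addSource(
        new FlinkKafkaConsumer010<>("user-clicks", new ClickEventSchema(), kafkaProps));

// Fan out: emit one {rule, user} tuple per rule subscribed to this event type.
DataStream<Tuple2<String, String>> ruleUserPairs = clicks
        .flatMap(new FlatMapFunction<ClickEvent, Tuple2<String, String>>() {
            @Override
            public void flatMap(ClickEvent event, Collector<Tuple2<String, String>> out) {
                // Cached lookup of event type -> subscribed rules.
                for (String ruleId : RuleRegistry.rulesFor(event.getType())) {
                    out.collect(Tuple2.of(ruleId, event.getUserId()));
                }
            }
        });

// Key by {rule, user} so downstream state and timers are scoped per tuple.
KeyedStream<Tuple2<String, String>, Tuple> ruleUserStream = ruleUserPairs.keyBy(0, 1);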


Does that make sense?

Thanks,
Chen


Re: multi tenant workflow execution

2017-01-24 Thread Fabian Hueske
Hi Chen,

if you plan to implement your application on top of the upcoming Flink
1.2.0 release, you might find the new AsyncFunction [1] and the
ProcessFunction [2] helpful.
AsyncFunction can be used for non-blocking calls to external services and
maintains the checkpointing semantics.
ProcessFunction allows you to register and react to timers. This might be
easier to use than a window for the 24h timeout.
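
For example, a minimal, untested sketch of a non-blocking RPC check against
the 1.2 async I/O API could look like the following. MembershipCheck,
MembershipClient, and CheckResult are placeholders for your RPC check, RPC
client, and result type; the client is assumed to expose a
CompletableFuture-based call.

import java.util.Collections;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

public class MembershipCheck
        extends RichAsyncFunction<Tuple2<String, String>, CheckResult> {

    private transient MembershipClient client; // placeholder async RPC client

    @Override
    public void open(Configuration parameters) {
        client = new MembershipClient();
    }

    @Override
    public void asyncInvoke(Tuple2<String, String> ruleUser,
            AsyncCollector<CheckResult> collector) {
        // The collector is completed from the RPC callback; no task thread blocks.
        client.isMember(ruleUser.f1).thenAccept(isMember ->
                collector.collect(Collections.singletonList(
                        new CheckResult(ruleUser.f0, ruleUser.f1, isMember))));
    }
}

// Wiring it in; the timeout (1s) and capacity (100 in-flight requests) are
// example values. A bounded capacity makes slow RPCs apply backpressure.
DataStream<CheckResult> checked = AsyncDataStream.unorderedWait(
        ruleUserStream, new MembershipCheck(), 1000, TimeUnit.MILLISECONDS, 100);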

Best,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html



Re: multi tenant workflow execution

2017-01-24 Thread Chen Qin
Hi Fabian,

AsyncFunction and ProcessFunction do help!

I assume the per-event timers I create when implementing RichProcessFunction
will be part of the key-grouped state and cached in memory during runtime,
right? I am interested in this because we are targeting a large deployment
with a million-TPS event source, and I would like to understand the
checkpoint size and speed implications.

How about checkpointing an iteration stream? Can we achieve at-least-once
semantics on iteration jobs in 1.2?

Thanks,
Chen



Re: multi tenant workflow execution

2017-01-25 Thread Fabian Hueske
Hi Chen,

yes, timers of a ProcessFunction are organized by key (you can have
multiple timers per key as well), stored in the keyed state, checkpointed,
and restored.
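
As a rough, untested sketch of the 24h expiry with a per-key timer (following
the RichProcessFunction pattern from the 1.2 docs; the class and state names
here are made up):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.RichProcessFunction;
import org.apache.flink.util.Collector;

public class ExpiryFunction
        extends RichProcessFunction<Tuple2<String, String>, Tuple2<String, String>> {

    private transient ValueState<Long> deadline;

    @Override
    public void open(Configuration parameters) {
        deadline = getRuntimeContext().getState(
                new ValueStateDescriptor<>("deadline", Long.class));
    }

    @Override
    public void processElement(Tuple2<String, String> ruleUser, Context ctx,
            Collector<Tuple2<String, String>> out) throws Exception {
        if (deadline.value() == null) {
            // First element for this {rule, user} key: arm a 24h processing-time timer.
            long expireAt = ctx.timerService().currentProcessingTime()
                    + 24 * 60 * 60 * 1000L;
            deadline.update(expireAt);
            // The timer lives in keyed state, so it is checkpointed and restored.
            ctx.timerService().registerProcessingTimeTimer(expireAt);
        }
        out.collect(ruleUser);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, String>> out) throws Exception {
        // 24 hours elapsed without completion: clear the per-key state to purge the tuple.
        deadline.clear();
    }
}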

I'm not sure about the guarantees for iterative streams.

Best, Fabian

>