Re: Question a possible use can for Iterative Streams.

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi Marco,

In the ideal setup, enrichment data existing in external databases is
bootstrapped into the streaming job via Flink's State Processor API, and any
follow-up changes to the enrichment data is streamed into the job as a
second union input on the enrichment operator.
For this solution to scale, lookups to the enrichment data needs to be by
the same key as the input data, i.e. the enrichment data is co-partitioned
with the input data stream.

I assume you've already thought about whether or not this would work for
your case, as it's a common setup for streaming enrichment.

Otherwise, I believe your brainstorming is heading in the right direction,
in the case that remote database lookups + local caching in state is a must.
I'm personally not familiar with the iterative streams in Flink, but in
general I think it is currently discouraged to use it.

On the other hand, I think using Stateful Function's [1] programing
abstraction might work here, as it allows arbitrary messaging between
functions and cyclic dataflows.
There's also an SDK that allows you to embed StateFun functions within a
Flink DataStream job [2].

Very briefly, the way you would model this database cache hit / remote
lookup is by implementing a function, e.g. called DatabaseCache.
The function would expect message types of Lookup(lookupKey), and replies
with a response of Result(lookupKey, value). The abstraction allows you, for
on incoming message, to register state (similar to vanilla Flink), as well
as register async operations with which you'll use to perform remote
database lookups in case of cache / state miss. It also provides means for
"timers" in the form of delayed messages being sent to itself, if you need
some mechanism for cache invalidation.

Hope this provides some direction for you to think about!

Cheers,
Gordon

[1] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/
[2]
https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/sdk/flink-datastream.html



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Question a possible use can for Iterative Streams.

2021-02-02 Thread Marco Villalobos
Hi everybody,

I am brainstorming how it might be possible to perform database enrichment
with the DataStream API, use keyed state for caching, and also utilize
Async IO.

Since AsyncIO does not support keyed state, then is it possible to use an
Iterative Stream that uses keyed state for caching in the main body, and
uses feedback to fetch cache misses with AsyncIO?

I hope this diagram conveys my idea.

[image: image.png]

I am thinking of leveraging an Iterative Stream in this manner, but I am
not quite sure how Iterative Steams work since the example is not clear to
me.