Hello Flink devs!

My feature suggestion is to allow the HybridSource component to hold a
multiway graph of sources, where each node may have multiple branches,
instead of a linear chain of sources.
At each step of the chain, there could be multiple sources that run in
parallel.

For example:

[batch-source] -> [live-source1, live-source2]

[batch-source1, batch-source2] -> live-source
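
To make the intended semantics concrete, here is a minimal sketch in plain
Java. It is only a model of the idea, not Flink code: StagedSourcePlan,
addStage, activeSources and markFinished are hypothetical names that do not
exist in Flink. It also assumes that a stage hands over to the next one only
once every source in it has finished, which is one possible switching rule
and is open for discussion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal; none of these names are Flink APIs.
public class StagedSourcePlan {
    private final List<List<String>> stages = new ArrayList<>();
    private final Set<String> finished = new HashSet<>();
    private int current = 0;

    // Adds a stage; all sources listed in it are meant to run in parallel.
    public StagedSourcePlan addStage(String... sources) {
        if (sources.length == 0) {
            throw new IllegalArgumentException("a stage needs at least one source");
        }
        stages.add(List.of(sources));
        return this;
    }

    // Sources that are reading right now (the current stage).
    public List<String> activeSources() {
        return stages.get(current);
    }

    // Assumed switching rule: advance only when the whole stage is done.
    public void markFinished(String source) {
        if (!stages.get(current).contains(source)) {
            throw new IllegalArgumentException(source + " is not active");
        }
        finished.add(source);
        boolean stageDone = finished.containsAll(stages.get(current));
        if (stageDone && current < stages.size() - 1) {
            current++;
            finished.clear();
        }
    }

    // Renders the plan in the arrow notation used in this email.
    public String describe() {
        List<String> parts = new ArrayList<>();
        for (List<String> stage : stages) {
            parts.add("[" + String.join(", ", stage) + "]");
        }
        return String.join(" -> ", parts);
    }

    public static void main(String[] args) {
        StagedSourcePlan plan = new StagedSourcePlan()
                .addStage("batch-source1", "batch-source2")
                .addStage("live-source");
        System.out.println(plan.describe());
        plan.markFinished("batch-source1");
        System.out.println(plan.activeSources()); // still the batch stage
        plan.markFinished("batch-source2");
        System.out.println(plan.activeSources()); // now the live stage
    }
}
```

In this model the live stage never starts early: with two batch sources,
finishing only one of them leaves both batch sources as the active stage.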

As a Flink user, many of my use cases require a "state warm-up": ingesting
all of the pre-existing state into Flink when booting up my application
without a checkpoint (for example, because of a structural change in the
job graph or a change in the state structure that is not backward
compatible).
From my experience, the easiest way to implement such a state warm-up is
through the HybridSource component, which lets me connect a batch source
that ingests the state first and then, once it is done, lets the real-time
streams start reading messages.

The problem arises when the real-time stream is composed of multiple
sources from different places that have to be unioned afterwards: the
HybridSource component doesn't support that. It only supports putting a
single live source after the batch one.
In addition, on the warm-up side, there is no option to run multiple batch
sources in parallel.

I know there is a way to do this with a batch application that creates a
savepoint beforehand, and then starting the live application from that
savepoint. That solution requires a lot of ops overhead for the developer,
who has to build a process around an external orchestrator.

Any feedback would be welcome! Thank you so much.
