Hi Chirag, Hi Vadim,

from the top of my head, I see two options here:

* Buffer the "fast" stream inside the KeyedBroadcastProcessFunction until
relevant (whatever this means in your use case) broadcast events have
arrived. Advantage: operationally easy, events are emitted as early as
possible. Disadvantage: state size might become very large, depending on
the nature of the broadcast stream it might be hard to know, when the
"relevant broadcast events have arrived".

* Start your job and only consume the broadcast stream (by configuration).
Once the stream is "fully processed", i.e. has caught up, take a savepoint.
Finally, start the job from this savepoint with the correct "fast" stream.
There is a small race condition between taking the savepoint and restarting
the job, which might matter (or not) depending on your use case.

This topic is related to event-time alignment in sources, which has been
actively discussed in the community in the past and we might be able to
solve this in a similar way in the future.

Cheers,

Konstantin

On Fri, Feb 8, 2019 at 5:48 PM Chirag Dewan <chirag.dewa...@yahoo.in> wrote:

> Hi Vadim,
>
> I would be interested in this too.
>
> Presently, I have to read my lookup source in the *open *method and keep
> it in a cache. By doing that I cannot make use of the broadcast state until
> ofcourse the first emit comes on the *Broadcast *stream.
>
> The problem with waiting the event stream is the lack of knowledge that I
> have read all the data from the lookup source. There is no possibility of
> having a special marker in the data as well for my use case.
>
> So pre loading the data seems to be the only option right now.
>
> Thanks,
>
> Chirag
>
>
>
> On Friday, 8 February, 2019, 7:45:37 pm IST, Vadim Vararu <
> vadim.var...@adswizz.com> wrote:
>
>
> Hi all,
>
> I need to use the broadcast state mechanism (
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html)
> for the next scenario.
>
> I have a reference data stream (slow) and an events stream (fast running)
> and I want to do a kind of lookup in the reference stream for each
> event. The broadcast state mechanism seems to fit perfect the scenario.
>
> From documentation:
> *As an example where broadcast state can emerge as a natural fit, one can
> imagine a low-throughput stream containing a set of rules which we want to
> evaluate against all elements coming from another stream.*
>
> However, I am not sure what is the correct way to delay the consumption of
> the fast running stream until the slow one is fully read (in case of a
> file) or until a marker is emitted (in case of some other source). Is there
> any way to accomplish that? It doesn't seem to be a rare use case.
>
> Thanks, Vadim.
>


-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen

Reply via email to