Handling Large Broadcast States

2021-06-16 Thread Rion Williams
Hey Flink folks,

I was discussing the use of the Broadcast Pattern with some colleagues today 
for a potential enrichment use-case and noticed that it wasn’t currently backed 
by RocksDB. This seems to indicate that it would be solely limited to the 
memory allocated, which might not support a large enrichment data set that our 
use case might run into (thousands of tenants with users and various other 
entities to enrich by).

Are there any plans to eventually add support for BroadcastState to be backed 
by a non-memory source? Or are there technical limitations that would make 
that impossible? If the latter is true, is there a preferred pattern for 
handling enrichment/lookups against a very large data set that may not fit in 
memory?

Any advice or thoughts would be welcome!

Rion

Re: Handling Large Broadcast States

2021-06-18 Thread Piotr Nowojski
Hi,

As far as I know, there are no plans to support other state backends with
BroadcastState. I don't know of any particular technical limitation; it
probably just hasn't been done. I also don't know how much effort that
would be. It probably wouldn't be easy.

Timo, can you chip in on how, for example, Table API/SQL is solving this
problem? I'm pretty sure the Table API is using broadcast joins after all?

Best,
Piotrek



Re: Handling Large Broadcast States

2021-06-18 Thread Timo Walther

Hi Rion,

As far as I know, we also don't support broadcast streaming joins in 
Table API/SQL.


Are you sure that you need a broadcast pattern? Or would a regular hash 
join using connect() with a CoProcessFunction also work for you? Maybe 
with an artificial key to spread the load more evenly?
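
One way to read the "artificial key" idea is key salting. The sketch below is plain Java with no Flink dependency, and the class name, method names, and the "#" separator are all hypothetical. It assumes events carry a tenant ID and a numeric event ID, and that a fixed number of buckets is chosen up front.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: salted join keys so one large tenant spreads over several keys. */
public final class SaltedKeys {
    private SaltedKeys() {}

    /** Key for a main-stream event: tenant ID plus a bucket derived from the event ID. */
    public static String eventKey(String tenantId, long eventId, int buckets) {
        // floorMod keeps the bucket non-negative even for negative event IDs.
        int bucket = (int) Math.floorMod(eventId, (long) buckets);
        return tenantId + "#" + bucket;
    }

    /** Keys for an enrichment record: one per bucket, so every event bucket can find it. */
    public static List<String> enrichmentKeys(String tenantId, int buckets) {
        List<String> keys = new ArrayList<>(buckets);
        for (int bucket = 0; bucket < buckets; bucket++) {
            keys.add(tenantId + "#" + bucket);
        }
        return keys;
    }
}
```

In a job, both streams would be keyed by these strings before connect(), so the enrichment data sits in keyed state, which can live in RocksDB. The cost is that each enrichment record is stored once per bucket. If the natural key (for example a user ID) is already evenly distributed, no salt is needed.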


What we do support, in both Table API/SQL and the DataStream API, is async 
lookups to external sources. E.g. the Table API JDBC connector has lookup 
functionality.
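
To illustrate the async lookup idea, here is a minimal sketch in plain Java rather than Flink code. The class AsyncEnricher, its lookup function, and the output format are hypothetical. It assumes the external source can be called through a function that returns a CompletableFuture. In the DataStream API, logic like this would typically sit inside an AsyncFunction passed to AsyncDataStream.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical enricher: non-blocking lookups against an external source. */
public final class AsyncEnricher {
    // Assumed client call: user ID -> future holding the user's display name.
    private final Function<String, CompletableFuture<String>> lookup;

    // Unbounded cache for brevity only. A real job would bound and expire it,
    // and would avoid caching failed lookups.
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();

    public AsyncEnricher(Function<String, CompletableFuture<String>> lookup) {
        this.lookup = lookup;
    }

    /** Returns immediately; the future completes once the lookup result is available. */
    public CompletableFuture<String> enrich(String userId, String payload) {
        return cache.computeIfAbsent(userId, lookup)
                .thenApply(userName -> payload + " [user=" + userName + "]");
    }
}
```

With this approach the enrichment data stays in the external system and only the entries that are actually requested are held in memory, at the price of lookup latency and load on that system.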


I hope this helps a bit.

Regards,
Timo


