Hi guys,

The subject is a bit provocative, but the topic is real and keeps
coming up again and again with Beam usage: how can a DoFn handle some
"chunking"?

The need is to be able to commit every N records, with N not too big.

The natural API for that in Beam is the bundle one, but bundles are not
reliable since they can be very small (Flink) - we can say that is "ok"
even if it has some perf impact - or too big (Spark does full size /
#workers).

The workaround is what we see in the ES I/O: a maxSize which does an
eager flush. The issue is that the checkpoint is then not respected
and you can process the same records multiple times.

Any plan to make this API reliable and controllable from a Beam point
of view (at least for setting a maximum bundle size)?

Thanks,
Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn
