I agree I may have been too quick to claim that DoFn output needs to fit in
memory. I am actually not sure what the Beam model says on this matter, or
what the output managers of particular runners do about it.

But SparkRunner definitely has an issue here. I tried setting a small
`fetchSize` for JdbcIO as well as changing `storageLevel` to MEMORY_AND_DISK.
Every attempt fails with OOM.
Looking at the heap, most of it is used by the linked-list multimap of
DoFnOutputManager here:
https://github.com/apache/beam/blob/v2.15.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/MultiDoFnFunction.java#L234
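
To illustrate the pattern I suspect is behind this, here is a minimal sketch in plain Java. It is not the Beam code: the class `BufferingOutputSketch` and its methods are hypothetical, and a plain map of lists stands in for the multimap. It only shows that when an output manager retains every emitted element until the caller drains it, heap use grows with the number of outputs, regardless of how small the source's fetch size is.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an output manager that buffers all output.
// Not Beam code; names are made up for illustration only.
public class BufferingOutputSketch {

    // Every emitted element is retained here, keyed by its output tag.
    private final Map<String, List<Object>> outputs = new HashMap<>();

    // Called once per emitted element; nothing is handed downstream yet.
    public void output(String tag, Object element) {
        outputs.computeIfAbsent(tag, k -> new ArrayList<>()).add(element);
    }

    // Number of elements currently held in memory across all tags.
    public int bufferedCount() {
        int n = 0;
        for (List<Object> values : outputs.values()) {
            n += values.size();
        }
        return n;
    }

    public static void main(String[] args) {
        BufferingOutputSketch manager = new BufferingOutputSketch();
        // Simulate one input element that produces many output rows,
        // as a database read does.
        for (int row = 0; row < 100_000; row++) {
            manager.output("main", "row-" + row);
        }
        // All rows are still referenced by the buffer at this point.
        System.out.println(manager.bufferedCount());
    }
}
```

If the real output manager behaves like this sketch, then neither a smaller `fetchSize` nor a different `storageLevel` would help, because the buffer is filled before anything is persisted or passed downstream.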
