Yeah - in this case your primary option is to use JdbcIO.readAll() and shard
your query, as suggested above.
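
To illustrate, here is a minimal sketch of sharding a query with
JdbcIO.readAll(). The driver, connection string, table, and shard bounds are
all hypothetical; in practice you would derive the ranges from MIN/MAX of the
key column:

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;

public class ShardedJdbcRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Hypothetical [lo, hi) shard ranges over a numeric primary key.
    List<KV<Long, Long>> shards = new ArrayList<>();
    for (long lo = 0L; lo < 1_000_000L; lo += 100_000L) {
      shards.add(KV.of(lo, lo + 100_000L));
    }

    p.apply("Shards",
            Create.of(shards)
                .withCoder(KvCoder.of(VarLongCoder.of(), VarLongCoder.of())))
     .apply("ReadShard", JdbcIO.<KV<Long, Long>, String>readAll()
         .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
             "org.postgresql.Driver", "jdbc:postgresql://host:5432/mydb"))
         // Each shard issues its own bounded query instead of one huge scan.
         .withQuery("SELECT name FROM my_table WHERE id >= ? AND id < ?")
         .withParameterSetter((KV<Long, Long> shard, PreparedStatement st) -> {
           st.setLong(1, shard.getKey());
           st.setLong(2, shard.getValue());
         })
         .withRowMapper((ResultSet rs) -> rs.getString(1))
         .withCoder(StringUtf8Coder.of()));

    p.run().waitUntilFinish();
  }
}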

Alternative hypothesis: is the result set of your query actually big enough
that it *shouldn't* fit in memory? Or could it be a matter of inefficient
storage of its elements? Could you briefly describe how big the result
set is and in what form you store its elements?

On Fri, Oct 25, 2019 at 5:47 AM Jozef Vilcek <jozo.vil...@gmail.com> wrote:

> I agree I may have been too quick to claim that DoFn output needs to fit
> in memory. Actually, I am not sure what the Beam model says on this matter,
> or what the output managers of particular runners do about it.
>
> But SparkRunner definitely has an issue here. I did try setting a small
> `fetchSize` for JdbcIO, as well as changing `storageLevel` to
> MEMORY_AND_DISK. Both still fail with OOM.
> When looking at the heap, most of it is used by the linked-list multimap
> of the DoFnOutputManager here:
>
> https://github.com/apache/beam/blob/v2.15.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/MultiDoFnFunction.java#L234
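For reference, the settings tried in the quoted message would look roughly
like the following (connection details hypothetical). Note that withFetchSize
only bounds how many rows the JDBC driver buffers per round trip; it does not
bound what the runner accumulates downstream:

// Small JDBC fetch size (value hypothetical):
PCollection<String> rows = p.apply(
    JdbcIO.<String>read()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://host:5432/mydb"))
        .withQuery("SELECT name FROM my_table")
        .withFetchSize(1000)  // rows buffered per driver round trip
        .withRowMapper((java.sql.ResultSet rs) -> rs.getString(1))
        .withCoder(StringUtf8Coder.of()));

and the Spark runner's storage level is set via a pipeline option:

    --runner=SparkRunner --storageLevel=MEMORY_AND_DISK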
