Hi,

I need to read a big-ish data set via JdbcIO. This forced me to bump up
the memory for my executor (I am currently using SparkRunner). It seems
that JdbcIO requires all of the data to fit in memory, since it uses a
DoFn to unfold the query into a list of elements.

A BoundedSource would not need to fit the whole result in memory, but
JdbcIO uses a DoFn. Also, in a recent discussion [1] it was suggested that
BoundedSource should not be used, as it is obsolete.

Has anyone faced this issue? What would be the best way to solve it? If
the DoFn should be kept, then I can only think of splitting the query into
ranges and trying to find the best number of rows to read at once.
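
To make the range idea concrete, here is a rough sketch of what I have in
mind. It is plain Java with no Beam dependency, and it assumes the table
has a numeric key column whose minimum and maximum are known up front. The
names QueryRanges, Range, split and rowsPerRange are made up for
illustration and are not part of JdbcIO.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts a numeric key interval into fixed-size ranges. */
public class QueryRanges {

    /** Half-open key range [start, end). */
    public static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Splits the inclusive key interval [minId, maxId] into half-open ranges
     * that each cover at most rowsPerRange key values.
     * Returns an empty list when minId > maxId.
     * Overflow for keys close to Long.MAX_VALUE is not handled in this sketch.
     */
    public static List<Range> split(long minId, long maxId, long rowsPerRange) {
        if (rowsPerRange <= 0) {
            throw new IllegalArgumentException("rowsPerRange must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += rowsPerRange) {
            // Clamp the last range so it stops just past maxId.
            long end = Math.min(start + rowsPerRange, maxId + 1);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }
}
```

Each range would then become the two parameters of a query along the lines
of "SELECT ... FROM my_table WHERE id >= ? AND id < ?" (my_table and id are
placeholders), so that a single DoFn call only has to hold one range of
rows. If the keys are sparse or skewed, the ranges would contain uneven
numbers of rows, which is why the right rowsPerRange would still need
tuning.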

I appreciate any thoughts.

[1]
https://lists.apache.org/list.html?dev@beam.apache.org:lte=1M:Reading%20from%20RDB%2C%20ParDo%20or%20BoundedSource
