Re: JdbcIO read needs to fit in memory

Alexey Romanenko Thu, 24 Oct 2019 09:32:10 -0700

Jozef, do you have any NPE stacktrace to share?


> On 24 Oct 2019, at 15:26, Jozef Vilcek <jozo.vil...@gmail.com> wrote:
> 
> Hi,
> 
> I need to read a big-ish data set via JdbcIO. This forced me to bump 
> up the memory for my executor (right now using SparkRunner). It seems that 
> JdbcIO requires all data to fit in memory, as it uses a DoFn to unfold the 
> query into a list of elements.
> 
> A BoundedSource would not need to fit the result in memory, but JdbcIO 
> uses a DoFn. Also, in a recent discussion [1] it was suggested that 
> BoundedSource should not be used as it is obsolete.
> 
> Has anyone faced this issue? What would be the best way to solve it? If the 
> DoFn should be kept, then I can only think of splitting the query into ranges 
> and trying to find the most fitting number of rows to read at once.
> 
> I appreciate any thoughts. 
> 
> [1] 
> https://lists.apache.org/list.html?dev@beam.apache.org:lte=1M:Reading%20from%20RDB%2C%20ParDo%20or%20BoundedSource
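
For reference, a minimal sketch of the range-splitting idea from the quoted message, assuming the table has a numeric key column whose minimum and maximum are known up front. RangeSplitter and its parameters are hypothetical names used only for illustration; nothing here is part of Beam, and it is not a claim about how JdbcIO works internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam): splits a numeric key range into chunks. */
final class RangeSplitter {

  /**
   * Splits the inclusive range [minId, maxId] into consecutive inclusive
   * sub-ranges, each covering at most rowsPerChunk key values.
   * Returns an empty list when minId > maxId.
   */
  static List<long[]> split(long minId, long maxId, long rowsPerChunk) {
    if (rowsPerChunk <= 0) {
      throw new IllegalArgumentException("rowsPerChunk must be positive");
    }
    List<long[]> ranges = new ArrayList<>();
    long lo = minId;
    while (lo <= maxId) {
      // Clamp the upper bound to maxId; comparing the remaining distance
      // first avoids overflowing lo + rowsPerChunk near Long.MAX_VALUE.
      long hi = (maxId - lo < rowsPerChunk) ? maxId : lo + rowsPerChunk - 1;
      ranges.add(new long[] {lo, hi});
      if (hi == maxId) {
        break;
      }
      lo = hi + 1;
    }
    return ranges;
  }
}
```

Each {lo, hi} pair could then be bound to a parameterized query such as "SELECT ... WHERE id BETWEEN ? AND ?" (the column name is an assumption), so that one DoFn call only holds one chunk of rows. Gaps in the key space make the chunks uneven, so rowsPerChunk is an upper bound on rows per query, not an exact count.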
