Re: Connection leaks with PostgreSQL instance

Reuven Lax Thu, 17 Jan 2019 07:22:17 -0800

But I think it will be called unless the process crashes though.

On Thu, Jan 17, 2019, 7:05 AM Ismaël Mejía <ieme...@gmail.com wrote:


> Couldn't this be related also to the fact that @Teardown is
> best-effort in Dataflow?
>
> On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko
> <aromanenko....@gmail.com> wrote:
> >
> > Kenn,
> >
> > I’m not sure that we have a connection leak in JdbcIO since new
> connection is being obtained from an instance of javax.sql.DataSource
> (created in @Setup) and which is org.apache.commons.dbcp2.BasicDataSource
> by default. BasicDataSource uses connection pool and closes all idle
> connections in "close()”.
> >
> > In its turn, JdbcIO calls DataSource.close() in @Teardown, so all idle
> connections should be closed and released there in case of fails. Though,
> potentially some connections, that has been delegated to client before and
> were not not properly returned to pool, could be leaked… Anyway, I think it
> could be a good idea to call "connection.close()” (return to connection
> pool) explicitly in case of any exception happened during bundle processing.
> >
> > Probably JB may provide more details as original author of JdbcIO.
> >
> > On 14 Jan 2019, at 21:37, Kenneth Knowles <k...@apache.org> wrote:
> >
> > Hi Jonathan,
> >
> > JdbcIO.write() just invokes this DoFn:
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
> >
> > It establishes a connection in @StartBundle and then in @FinishBundle it
> commits a batch and closes the connection. If an error happens in
> @StartBundle or @ProcessElement there will be a retry with a fresh instance
> of the DoFn, which will establish a new connection. It looks like neither
> @StartBundle nor @ProcessElement closes the connection, so I'm guessing
> that the old connection sticks around because the worker process was not
> terminated. So the Beam runner and Dataflow service are working as intended
> and this is an issue with JdbcIO, unless I've made a mistake in my reading
> or analysis.
> >
> > Would you mind reporting these details to
> https://issues.apache.org/jira/projects/BEAM/ ?
> >
> > Kenn
> >
> > On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron <
> jonathan.per...@lumapps.com> wrote:
> >>
> >> Hello !
> >>
> >> My question is maybe mainly GCP-oriented, so I apologize if it is not
> fully related to the Beam community.
> >>
> >> We have a streaming pipeline running on Dataflow which writes data to a
> PostgreSQL instance hosted on Cloud SQL. This database is suffering from
> connection leak spikes on a regular basis:
> >>
> >> <ofbkcnmdfbgcoooc.png>
> >>
> >> The connections are kept alive until the pipeline is canceled/drained:
> >>
> >> <gngklddbhnckgpni.png>
> >>
> >> We are writing to the database with:
> >>
> >> - individual DoFn where we open/close the connection using the standard
> JDBC try/catch (SQLException ex)/finally statements;
> >>
> >> - a Pipeline.apply(JdbcIO.<SessionData>write()) operations.
> >>
> >> I observed that these spikes happens most of the time after I/O errors
> with the database. Has anyone observed the same situation ?
> >>
> >> I have several questions/observations, please correct me if I am wrong
> (I am not from the java environment, so some can seem pretty trivial) :
> >>
> >> - Does the apply method handles SQLException or I/O errors ?
> >>
> >> - Would the use of a connection pool prevents such behaviours ? If so,
> how would one implement it to allow all workers to use it ? Could it be
> implemented with JDBC Connection pooling ?
> >>
> >> I am worrying about the serialization if one would pass a Connection
> item as an argument of a DoFn.
> >>
> >> Thank you in advance for your comments and reactions.
> >>
> >> Best regards,
> >>
> >> Jonathan
> >
> >
>

Re: Connection leaks with PostgreSQL instance

Reply via email to