But I think it will be called unless the process crashes though. On Thu, Jan 17, 2019, 7:05 AM Ismaël Mejía <ieme...@gmail.com wrote:
> Couldn't this be related also to the fact that @Teardown is > best-effort in Dataflow? > > On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko > <aromanenko....@gmail.com> wrote: > > > > Kenn, > > > > I’m not sure that we have a connection leak in JdbcIO since new > connection is being obtained from an instance of javax.sql.DataSource > (created in @Setup) and which is org.apache.commons.dbcp2.BasicDataSource > by default. BasicDataSource uses connection pool and closes all idle > connections in "close()”. > > > > In its turn, JdbcIO calls DataSource.close() in @Teardown, so all idle > connections should be closed and released there in case of fails. Though, > potentially some connections, that has been delegated to client before and > were not not properly returned to pool, could be leaked… Anyway, I think it > could be a good idea to call "connection.close()” (return to connection > pool) explicitly in case of any exception happened during bundle processing. > > > > Probably JB may provide more details as original author of JdbcIO. > > > > On 14 Jan 2019, at 21:37, Kenneth Knowles <k...@apache.org> wrote: > > > > Hi Jonathan, > > > > JdbcIO.write() just invokes this DoFn: > https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765 > > > > It establishes a connection in @StartBundle and then in @FinishBundle it > commits a batch and closes the connection. If an error happens in > @StartBundle or @ProcessElement there will be a retry with a fresh instance > of the DoFn, which will establish a new connection. It looks like neither > @StartBundle nor @ProcessElement closes the connection, so I'm guessing > that the old connection sticks around because the worker process was not > terminated. So the Beam runner and Dataflow service are working as intended > and this is an issue with JdbcIO, unless I've made a mistake in my reading > or analysis. > > > > Would you mind reporting these details to > https://issues.apache.org/jira/projects/BEAM/ ? > > > > Kenn > > > > On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron < > jonathan.per...@lumapps.com> wrote: > >> > >> Hello ! > >> > >> My question is maybe mainly GCP-oriented, so I apologize if it is not > fully related to the Beam community. > >> > >> We have a streaming pipeline running on Dataflow which writes data to a > PostgreSQL instance hosted on Cloud SQL. This database is suffering from > connection leak spikes on a regular basis: > >> > >> <ofbkcnmdfbgcoooc.png> > >> > >> The connections are kept alive until the pipeline is canceled/drained: > >> > >> <gngklddbhnckgpni.png> > >> > >> We are writing to the database with: > >> > >> - individual DoFn where we open/close the connection using the standard > JDBC try/catch (SQLException ex)/finally statements; > >> > >> - a Pipeline.apply(JdbcIO.<SessionData>write()) operations. > >> > >> I observed that these spikes happens most of the time after I/O errors > with the database. Has anyone observed the same situation ? > >> > >> I have several questions/observations, please correct me if I am wrong > (I am not from the java environment, so some can seem pretty trivial) : > >> > >> - Does the apply method handles SQLException or I/O errors ? > >> > >> - Would the use of a connection pool prevents such behaviours ? If so, > how would one implement it to allow all workers to use it ? Could it be > implemented with JDBC Connection pooling ? > >> > >> I am worrying about the serialization if one would pass a Connection > item as an argument of a DoFn. > >> > >> Thank you in advance for your comments and reactions. > >> > >> Best regards, > >> > >> Jonathan > > > > >