Couldn't this be related also to the fact that @Teardown is
best-effort in Dataflow?

On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko
<[email protected]> wrote:
>
> Kenn,
>
> I’m not sure that we have a connection leak in JdbcIO, since a new connection
> is obtained from an instance of javax.sql.DataSource (created in @Setup),
> which is org.apache.commons.dbcp2.BasicDataSource by default. BasicDataSource
> uses a connection pool and closes all idle connections in close().
>
> In turn, JdbcIO calls DataSource.close() in @Teardown, so all idle
> connections should be closed and released there in case of failure. Still,
> connections that were handed out to a client earlier and never properly
> returned to the pool could be leaked. In any case, I think it would be a good
> idea to call connection.close() (i.e. return the connection to the pool)
> explicitly if any exception happens during bundle processing.
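
For what it's worth, here is a minimal plain-Java sketch of the explicit-return
pattern suggested above (this is not JdbcIO's actual code; TinyPool is a
hypothetical stand-in for a dbcp2-style pool). Wrapping the leased connection
in try-with-resources guarantees that close() runs and the connection goes back
to the pool even when bundle processing throws:

```java
// TinyPool models a dbcp2-style pool: close() on a lease returns it to the
// pool instead of tearing down the underlying socket.
class TinyPool {
    int leased = 0;

    class Lease implements AutoCloseable {
        @Override
        public void close() {   // "return to pool", not a real disconnect
            leased--;
        }
    }

    Lease borrow() {
        leased++;
        return new Lease();
    }
}

public class PoolReturnDemo {
    public static void main(String[] args) {
        TinyPool pool = new TinyPool();
        try (TinyPool.Lease conn = pool.borrow()) {
            throw new RuntimeException("simulated I/O error mid-bundle");
        } catch (RuntimeException e) {
            // The lease was already returned before control reached here.
        }
        System.out.println("leaked=" + pool.leased);  // prints "leaked=0"
    }
}
```

With a real pooled DataSource the structure is the same: Connection.close()
returns the connection to the pool, so putting it in try-with-resources (or a
finally block) around the batch write would prevent the leak on error paths.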
>
> Probably JB may provide more details as original author of JdbcIO.
>
> On 14 Jan 2019, at 21:37, Kenneth Knowles <[email protected]> wrote:
>
> Hi Jonathan,
>
> JdbcIO.write() just invokes this DoFn: 
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
>
> It establishes a connection in @StartBundle and then in @FinishBundle it 
> commits a batch and closes the connection. If an error happens in 
> @StartBundle or @ProcessElement there will be a retry with a fresh instance 
> of the DoFn, which will establish a new connection. It looks like neither 
> @StartBundle nor @ProcessElement closes the connection, so I'm guessing that 
> the old connection sticks around because the worker process was not 
> terminated. So the Beam runner and Dataflow service are working as intended 
> and this is an issue with JdbcIO, unless I've made a mistake in my reading or 
> analysis.
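
To make that failure mode concrete, here is a plain-Java model of the lifecycle
described above (illustrative names, not the Beam API or JdbcIO's real code):
the connection opened in startBundle() is only closed in finishBundle(), so a
failure in processElement() abandons it unless a teardown()-style safety net
also closes it:

```java
public class LifecycleLeakDemo {
    static class WriteFn {
        boolean connectionOpen = false;

        void startBundle()  { connectionOpen = true; }   // opens the connection
        void processElement(boolean fail) {              // may throw mid-bundle
            if (fail) throw new RuntimeException("simulated SQL error");
        }
        void finishBundle() { connectionOpen = false; }  // commits batch + closes
        void teardown()     { connectionOpen = false; }  // proposed safety net
    }

    public static void main(String[] args) {
        WriteFn fn = new WriteFn();
        fn.startBundle();
        try {
            fn.processElement(true);  // bundle fails before finishBundle() runs
            fn.finishBundle();
        } catch (RuntimeException e) {
            // The retry uses a fresh DoFn instance; this one is abandoned.
        }
        System.out.println("leaked=" + fn.connectionOpen);  // prints "leaked=true"
        fn.teardown();                                      // closes the straggler
        System.out.println("leaked=" + fn.connectionOpen);  // prints "leaked=false"
    }
}
```

Note that since @Teardown is best-effort on Dataflow (per the top of this
thread), a teardown-based close would reduce but not fully eliminate the leak.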
>
> Would you mind reporting these details to 
> https://issues.apache.org/jira/projects/BEAM/ ?
>
> Kenn
>
> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron 
> <[email protected]> wrote:
>>
>> Hello !
>>
>> My question is perhaps mainly GCP-oriented, so I apologize if it is not
>> fully related to the Beam community.
>>
>> We have a streaming pipeline running on Dataflow which writes data to a 
>> PostgreSQL instance hosted on Cloud SQL. This database is suffering from 
>> connection leak spikes on a regular basis:
>>
>> <ofbkcnmdfbgcoooc.png>
>>
>> The connections are kept alive until the pipeline is canceled/drained:
>>
>> <gngklddbhnckgpni.png>
>>
>> We are writing to the database with:
>>
>> - individual DoFns where we open/close the connection using the standard
>> JDBC try/catch (SQLException)/finally pattern;
>>
>> - a Pipeline.apply(JdbcIO.<SessionData>write()) operation.
>>
>> I observed that these spikes happen most of the time after I/O errors with
>> the database. Has anyone observed the same situation?
>>
>> I have several questions/observations; please correct me if I am wrong (I do
>> not come from the Java world, so some may seem pretty trivial):
>>
>> - Does the apply method handle SQLExceptions or I/O errors?
>>
>> - Would the use of a connection pool prevent such behaviours? If so, how
>> would one implement it so that all workers can use it? Could it be done with
>> JDBC connection pooling?
>>
>> I am worried about serialization if one were to pass a Connection object as
>> an argument to a DoFn.
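
The usual way around that serialization worry is to never ship anything live:
keep only serializable configuration in the DoFn, mark the pool/DataSource
transient, and rebuild it on each worker in @Setup. A plain-Java sketch of the
idea (hypothetical names; the Object field stands in for a real pooled
DataSource, and Java serialization stands in for what the runner does):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientPoolDemo {
    static class MyWriteFn implements Serializable {
        final String jdbcUrl;            // serializable config travels with the fn
        transient Object dataSource;     // the pool does NOT travel

        MyWriteFn(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

        void setup() {                   // analogue of @Setup, runs on the worker
            dataSource = new Object();   // stand-in for building a pooled DataSource
        }
    }

    // Serialize and deserialize, roughly what shipping the fn to a worker does.
    static MyWriteFn roundTrip(MyWriteFn fn) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (MyWriteFn) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MyWriteFn fn = new MyWriteFn("jdbc:postgresql://host/db");
        fn.setup();
        MyWriteFn onWorker = roundTrip(fn);
        System.out.println(onWorker.dataSource == null);  // prints "true": no pool shipped
        onWorker.setup();                                 // worker builds its own pool
        System.out.println(onWorker.dataSource != null);  // prints "true"
    }
}
```

This is also how JdbcIO itself avoids the problem: the DataSource is created in
@Setup on each worker, so a Connection never needs to cross the serialization
boundary.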
>>
>> Thank you in advance for your comments and reactions.
>>
>> Best regards,
>>
>> Jonathan
>
>
