Re: Connection leaks with PostgreSQL instance

2019-01-21 Thread Jonathan Perron

Dear all,

It happened again on Friday morning:

You can see a baseline in the connection amount from the 16th to the 18th.

Looking at the pg_stat_activity, all connections are used, even when the 
pipeline is not used at 100 % (my use case is processing data from a 
platform which is not used a lot during the week-end).


I opened a ticket on the project JIRA: 
https://issues.apache.org/jira/browse/BEAM-6475. I added as many 
information as I could gather, but let me know if anything else is required.


Best regards,

Jonathan

On 18/01/2019 06:17, Kenneth Knowles wrote:
My mistake - using @Teardown in this way is a good approach. It may 
not be executed sometimes, but like Reuven says it means the process 
died.


Kenn

On Thu, Jan 17, 2019 at 9:31 AM Jean-Baptiste Onofré > wrote:


Hi,

I don't think we have connection leak in normal behavior.

The actual SQL statement is executed in @FinishBundle, where the
connection is closed.

The processElement adds record to process.

Does it mean that an Exception occurs in the batch addition ?

Regards
JB

On 17/01/2019 12:41, Alexey Romanenko wrote:
> Kenn,
>
> I’m not sure that we have a connection leak in JdbcIO since new
> connection is being obtained from an instance
of /javax.sql.DataSource/
> (created in @Setup) and which
> is /org.apache.commons.dbcp2.BasicDataSource/ by default.
> /BasicDataSource/ uses connection pool and closes all idle
connections
> in "close()”.
>
> In its turn, JdbcIO calls/DataSource.close()/ in @Teardown, so
all idle
> connections should be closed and released there in case of fails.
> Though, potentially some connections, that has been delegated to
client
> before and were not not properly returned to pool, could be leaked…
> Anyway, I think it could be a good idea to call
"/connection.close()/”
> (return to connection pool) explicitly in case of any exception
happened
> during bundle processing.
>
> Probably JB may provide more details as original author of JdbcIO.
>
>> On 14 Jan 2019, at 21:37, Kenneth Knowles mailto:k...@apache.org>
>> >> wrote:
>>
>> Hi Jonathan,
>>
>> JdbcIO.write() just invokes this
>> DoFn:

https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
>>
>> It establishes a connection in @StartBundle and then in
@FinishBundle
>> it commits a batch and closes the connection. If an error happens
>> in @StartBundle or @ProcessElement there will be a retry with a
fresh
>> instance of the DoFn, which will establish a new connection. It
looks
>> like neither @StartBundle nor @ProcessElement closes the
connection,
>> so I'm guessing that the old connection sticks around because the
>> worker process was not terminated. So the Beam runner and Dataflow
>> service are working as intended and this is an issue with JdbcIO,
>> unless I've made a mistake in my reading or analysis.
>>
>> Would you mind reporting these details
>> to https://issues.apache.org/jira/projects/BEAM/ ?
>>
>> Kenn
>>
>> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron
>> mailto:jonathan.per...@lumapps.com>
>> wrote:
>>
>>     Hello !
>>
>>     My question is maybe mainly GCP-oriented, so I apologize if
it is
>>     not fully related to the Beam community.
>>
>>     We have a streaming pipeline running on Dataflow which
writes data
>>     to a PostgreSQL instance hosted on Cloud SQL. This database is
>>     suffering from connection leak spikes on a regular basis:
>>
>>     
>>
>>     The connections are kept alive until the pipeline is
canceled/drained:
>>
>>     
>>
>>     We are writing to the database with:
>>
>>     - individual DoFn where we open/close the connection using the
>>     standard JDBC try/catch (SQLException ex)/finally statements;
>>
>>     - a Pipeline.apply(JdbcIO.write()) operations.
>>
>>     I observed that these spikes happens most of the time after I/O
>>     errors with the database. Has anyone observed the same
situation ?
>>
>>     I have several questions/observations, please correct me if
I am
>>     wrong (I am not from the java environment, so some can seem
pretty
>>     trivial) :
>>
>>     - Does the apply method handles SQLException or I/O errors ?
>>
>>     - Would the use of a connection pool prevents such
behaviours ? If
>>     so, how would one implement it to allow all workers to use it ?
>>     Could it be implemented with JDBC Connection pooling ?
>>
>>     I am 

Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Kenneth Knowles
My mistake - using @Teardown in this way is a good approach. It may not be
executed sometimes, but like Reuven says it means the process died.

Kenn

On Thu, Jan 17, 2019 at 9:31 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> I don't think we have connection leak in normal behavior.
>
> The actual SQL statement is executed in @FinishBundle, where the
> connection is closed.
>
> The processElement adds record to process.
>
> Does it mean that an Exception occurs in the batch addition ?
>
> Regards
> JB
>
> On 17/01/2019 12:41, Alexey Romanenko wrote:
> > Kenn,
> >
> > I’m not sure that we have a connection leak in JdbcIO since new
> > connection is being obtained from an instance of /javax.sql.DataSource/
> > (created in @Setup) and which
> > is /org.apache.commons.dbcp2.BasicDataSource/ by default.
> > /BasicDataSource/ uses connection pool and closes all idle connections
> > in "close()”.
> >
> > In its turn, JdbcIO calls/DataSource.close()/ in @Teardown, so all idle
> > connections should be closed and released there in case of fails.
> > Though, potentially some connections, that has been delegated to client
> > before and were not not properly returned to pool, could be leaked…
> > Anyway, I think it could be a good idea to call "/connection.close()/”
> > (return to connection pool) explicitly in case of any exception happened
> > during bundle processing.
> >
> > Probably JB may provide more details as original author of JdbcIO.
> >
> >> On 14 Jan 2019, at 21:37, Kenneth Knowles  >> > wrote:
> >>
> >> Hi Jonathan,
> >>
> >> JdbcIO.write() just invokes this
> >> DoFn:
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
> >>
> >> It establishes a connection in @StartBundle and then in @FinishBundle
> >> it commits a batch and closes the connection. If an error happens
> >> in @StartBundle or @ProcessElement there will be a retry with a fresh
> >> instance of the DoFn, which will establish a new connection. It looks
> >> like neither @StartBundle nor @ProcessElement closes the connection,
> >> so I'm guessing that the old connection sticks around because the
> >> worker process was not terminated. So the Beam runner and Dataflow
> >> service are working as intended and this is an issue with JdbcIO,
> >> unless I've made a mistake in my reading or analysis.
> >>
> >> Would you mind reporting these details
> >> to https://issues.apache.org/jira/projects/BEAM/ ?
> >>
> >> Kenn
> >>
> >> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron
> >> mailto:jonathan.per...@lumapps.com>>
> wrote:
> >>
> >> Hello !
> >>
> >> My question is maybe mainly GCP-oriented, so I apologize if it is
> >> not fully related to the Beam community.
> >>
> >> We have a streaming pipeline running on Dataflow which writes data
> >> to a PostgreSQL instance hosted on Cloud SQL. This database is
> >> suffering from connection leak spikes on a regular basis:
> >>
> >> 
> >>
> >> The connections are kept alive until the pipeline is
> canceled/drained:
> >>
> >> 
> >>
> >> We are writing to the database with:
> >>
> >> - individual DoFn where we open/close the connection using the
> >> standard JDBC try/catch (SQLException ex)/finally statements;
> >>
> >> - a Pipeline.apply(JdbcIO.write()) operations.
> >>
> >> I observed that these spikes happens most of the time after I/O
> >> errors with the database. Has anyone observed the same situation ?
> >>
> >> I have several questions/observations, please correct me if I am
> >> wrong (I am not from the java environment, so some can seem pretty
> >> trivial) :
> >>
> >> - Does the apply method handles SQLException or I/O errors ?
> >>
> >> - Would the use of a connection pool prevents such behaviours ? If
> >> so, how would one implement it to allow all workers to use it ?
> >> Could it be implemented with JDBC Connection pooling ?
> >>
> >> I am worrying about the serialization if one would pass a
> >> Connection item as an argument of a DoFn.
> >>
> >> Thank you in advance for your comments and reactions.
> >>
> >> Best regards,
> >>
> >> Jonathan
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Jean-Baptiste Onofré
Hi,

I don't think we have connection leak in normal behavior.

The actual SQL statement is executed in @FinishBundle, where the
connection is closed.

The processElement adds record to process.

Does it mean that an Exception occurs in the batch addition ?

Regards
JB

On 17/01/2019 12:41, Alexey Romanenko wrote:
> Kenn,
> 
> I’m not sure that we have a connection leak in JdbcIO since new
> connection is being obtained from an instance of /javax.sql.DataSource/
> (created in @Setup) and which
> is /org.apache.commons.dbcp2.BasicDataSource/ by default.
> /BasicDataSource/ uses connection pool and closes all idle connections
> in "close()”. 
> 
> In its turn, JdbcIO calls/DataSource.close()/ in @Teardown, so all idle
> connections should be closed and released there in case of fails.
> Though, potentially some connections, that has been delegated to client
> before and were not not properly returned to pool, could be leaked…
> Anyway, I think it could be a good idea to call "/connection.close()/”
> (return to connection pool) explicitly in case of any exception happened
> during bundle processing.
> 
> Probably JB may provide more details as original author of JdbcIO.
> 
>> On 14 Jan 2019, at 21:37, Kenneth Knowles > > wrote:
>>
>> Hi Jonathan,
>>
>> JdbcIO.write() just invokes this
>> DoFn: 
>> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
>>
>> It establishes a connection in @StartBundle and then in @FinishBundle
>> it commits a batch and closes the connection. If an error happens
>> in @StartBundle or @ProcessElement there will be a retry with a fresh
>> instance of the DoFn, which will establish a new connection. It looks
>> like neither @StartBundle nor @ProcessElement closes the connection,
>> so I'm guessing that the old connection sticks around because the
>> worker process was not terminated. So the Beam runner and Dataflow
>> service are working as intended and this is an issue with JdbcIO,
>> unless I've made a mistake in my reading or analysis.
>>
>> Would you mind reporting these details
>> to https://issues.apache.org/jira/projects/BEAM/ ?
>>
>> Kenn
>>
>> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron
>> mailto:jonathan.per...@lumapps.com>> wrote:
>>
>> Hello !
>>
>> My question is maybe mainly GCP-oriented, so I apologize if it is
>> not fully related to the Beam community.
>>
>> We have a streaming pipeline running on Dataflow which writes data
>> to a PostgreSQL instance hosted on Cloud SQL. This database is
>> suffering from connection leak spikes on a regular basis:
>>
>> 
>>
>> The connections are kept alive until the pipeline is canceled/drained:
>>
>> 
>>
>> We are writing to the database with:
>>
>> - individual DoFn where we open/close the connection using the
>> standard JDBC try/catch (SQLException ex)/finally statements;
>>
>> - a Pipeline.apply(JdbcIO.write()) operations.
>>
>> I observed that these spikes happens most of the time after I/O
>> errors with the database. Has anyone observed the same situation ?
>>
>> I have several questions/observations, please correct me if I am
>> wrong (I am not from the java environment, so some can seem pretty
>> trivial) :
>>
>> - Does the apply method handles SQLException or I/O errors ?
>>
>> - Would the use of a connection pool prevents such behaviours ? If
>> so, how would one implement it to allow all workers to use it ?
>> Could it be implemented with JDBC Connection pooling ?
>>
>> I am worrying about the serialization if one would pass a
>> Connection item as an argument of a DoFn.
>>
>> Thank you in advance for your comments and reactions.
>>
>> Best regards,
>>
>> Jonathan
>>
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Reuven Lax
But I think it will be called unless the process crashes though.

On Thu, Jan 17, 2019, 7:05 AM Ismaël Mejía  Couldn't this be related also to the fact that @Teardown is
> best-effort in Dataflow?
>
> On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko
>  wrote:
> >
> > Kenn,
> >
> > I’m not sure that we have a connection leak in JdbcIO since new
> connection is being obtained from an instance of javax.sql.DataSource
> (created in @Setup) and which is org.apache.commons.dbcp2.BasicDataSource
> by default. BasicDataSource uses connection pool and closes all idle
> connections in "close()”.
> >
> > In its turn, JdbcIO calls DataSource.close() in @Teardown, so all idle
> connections should be closed and released there in case of fails. Though,
> potentially some connections, that has been delegated to client before and
> were not not properly returned to pool, could be leaked… Anyway, I think it
> could be a good idea to call "connection.close()” (return to connection
> pool) explicitly in case of any exception happened during bundle processing.
> >
> > Probably JB may provide more details as original author of JdbcIO.
> >
> > On 14 Jan 2019, at 21:37, Kenneth Knowles  wrote:
> >
> > Hi Jonathan,
> >
> > JdbcIO.write() just invokes this DoFn:
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
> >
> > It establishes a connection in @StartBundle and then in @FinishBundle it
> commits a batch and closes the connection. If an error happens in
> @StartBundle or @ProcessElement there will be a retry with a fresh instance
> of the DoFn, which will establish a new connection. It looks like neither
> @StartBundle nor @ProcessElement closes the connection, so I'm guessing
> that the old connection sticks around because the worker process was not
> terminated. So the Beam runner and Dataflow service are working as intended
> and this is an issue with JdbcIO, unless I've made a mistake in my reading
> or analysis.
> >
> > Would you mind reporting these details to
> https://issues.apache.org/jira/projects/BEAM/ ?
> >
> > Kenn
> >
> > On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron <
> jonathan.per...@lumapps.com> wrote:
> >>
> >> Hello !
> >>
> >> My question is maybe mainly GCP-oriented, so I apologize if it is not
> fully related to the Beam community.
> >>
> >> We have a streaming pipeline running on Dataflow which writes data to a
> PostgreSQL instance hosted on Cloud SQL. This database is suffering from
> connection leak spikes on a regular basis:
> >>
> >> 
> >>
> >> The connections are kept alive until the pipeline is canceled/drained:
> >>
> >> 
> >>
> >> We are writing to the database with:
> >>
> >> - individual DoFn where we open/close the connection using the standard
> JDBC try/catch (SQLException ex)/finally statements;
> >>
> >> - a Pipeline.apply(JdbcIO.write()) operations.
> >>
> >> I observed that these spikes happens most of the time after I/O errors
> with the database. Has anyone observed the same situation ?
> >>
> >> I have several questions/observations, please correct me if I am wrong
> (I am not from the java environment, so some can seem pretty trivial) :
> >>
> >> - Does the apply method handles SQLException or I/O errors ?
> >>
> >> - Would the use of a connection pool prevents such behaviours ? If so,
> how would one implement it to allow all workers to use it ? Could it be
> implemented with JDBC Connection pooling ?
> >>
> >> I am worrying about the serialization if one would pass a Connection
> item as an argument of a DoFn.
> >>
> >> Thank you in advance for your comments and reactions.
> >>
> >> Best regards,
> >>
> >> Jonathan
> >
> >
>


Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Ismaël Mejía
Couldn't this be related also to the fact that @Teardown is
best-effort in Dataflow?

On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko
 wrote:
>
> Kenn,
>
> I’m not sure that we have a connection leak in JdbcIO since new connection is 
> being obtained from an instance of javax.sql.DataSource (created in @Setup) 
> and which is org.apache.commons.dbcp2.BasicDataSource by default. 
> BasicDataSource uses connection pool and closes all idle connections in 
> "close()”.
>
> In its turn, JdbcIO calls DataSource.close() in @Teardown, so all idle 
> connections should be closed and released there in case of fails. Though, 
> potentially some connections, that has been delegated to client before and 
> were not not properly returned to pool, could be leaked… Anyway, I think it 
> could be a good idea to call "connection.close()” (return to connection pool) 
> explicitly in case of any exception happened during bundle processing.
>
> Probably JB may provide more details as original author of JdbcIO.
>
> On 14 Jan 2019, at 21:37, Kenneth Knowles  wrote:
>
> Hi Jonathan,
>
> JdbcIO.write() just invokes this DoFn: 
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
>
> It establishes a connection in @StartBundle and then in @FinishBundle it 
> commits a batch and closes the connection. If an error happens in 
> @StartBundle or @ProcessElement there will be a retry with a fresh instance 
> of the DoFn, which will establish a new connection. It looks like neither 
> @StartBundle nor @ProcessElement closes the connection, so I'm guessing that 
> the old connection sticks around because the worker process was not 
> terminated. So the Beam runner and Dataflow service are working as intended 
> and this is an issue with JdbcIO, unless I've made a mistake in my reading or 
> analysis.
>
> Would you mind reporting these details to 
> https://issues.apache.org/jira/projects/BEAM/ ?
>
> Kenn
>
> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron 
>  wrote:
>>
>> Hello !
>>
>> My question is maybe mainly GCP-oriented, so I apologize if it is not fully 
>> related to the Beam community.
>>
>> We have a streaming pipeline running on Dataflow which writes data to a 
>> PostgreSQL instance hosted on Cloud SQL. This database is suffering from 
>> connection leak spikes on a regular basis:
>>
>> 
>>
>> The connections are kept alive until the pipeline is canceled/drained:
>>
>> 
>>
>> We are writing to the database with:
>>
>> - individual DoFn where we open/close the connection using the standard JDBC 
>> try/catch (SQLException ex)/finally statements;
>>
>> - a Pipeline.apply(JdbcIO.write()) operations.
>>
>> I observed that these spikes happens most of the time after I/O errors with 
>> the database. Has anyone observed the same situation ?
>>
>> I have several questions/observations, please correct me if I am wrong (I am 
>> not from the java environment, so some can seem pretty trivial) :
>>
>> - Does the apply method handles SQLException or I/O errors ?
>>
>> - Would the use of a connection pool prevents such behaviours ? If so, how 
>> would one implement it to allow all workers to use it ? Could it be 
>> implemented with JDBC Connection pooling ?
>>
>> I am worrying about the serialization if one would pass a Connection item as 
>> an argument of a DoFn.
>>
>> Thank you in advance for your comments and reactions.
>>
>> Best regards,
>>
>> Jonathan
>
>


Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Alexey Romanenko
Kenn,

I’m not sure that we have a connection leak in JdbcIO since new connection is 
being obtained from an instance of javax.sql.DataSource (created in @Setup) and 
which is org.apache.commons.dbcp2.BasicDataSource by default. BasicDataSource 
uses connection pool and closes all idle connections in "close()”. 

In its turn, JdbcIO calls DataSource.close() in @Teardown, so all idle 
connections should be closed and released there in case of fails. Though, 
potentially some connections, that has been delegated to client before and were 
not not properly returned to pool, could be leaked… Anyway, I think it could be 
a good idea to call "connection.close()” (return to connection pool) explicitly 
in case of any exception happened during bundle processing.

Probably JB may provide more details as original author of JdbcIO.

> On 14 Jan 2019, at 21:37, Kenneth Knowles  wrote:
> 
> Hi Jonathan,
> 
> JdbcIO.write() just invokes this DoFn: 
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
>  
> 
> 
> It establishes a connection in @StartBundle and then in @FinishBundle it 
> commits a batch and closes the connection. If an error happens in 
> @StartBundle or @ProcessElement there will be a retry with a fresh instance 
> of the DoFn, which will establish a new connection. It looks like neither 
> @StartBundle nor @ProcessElement closes the connection, so I'm guessing that 
> the old connection sticks around because the worker process was not 
> terminated. So the Beam runner and Dataflow service are working as intended 
> and this is an issue with JdbcIO, unless I've made a mistake in my reading or 
> analysis.
> 
> Would you mind reporting these details to 
> https://issues.apache.org/jira/projects/BEAM/ 
>  ?
> 
> Kenn
> 
> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron  > wrote:
> Hello !
> 
> My question is maybe mainly GCP-oriented, so I apologize if it is not fully 
> related to the Beam community.
> 
> We have a streaming pipeline running on Dataflow which writes data to a 
> PostgreSQL instance hosted on Cloud SQL. This database is suffering from 
> connection leak spikes on a regular basis:
> 
> 
> 
> The connections are kept alive until the pipeline is canceled/drained:
> 
> 
> 
> We are writing to the database with:
> 
> - individual DoFn where we open/close the connection using the standard JDBC 
> try/catch (SQLException ex)/finally statements;
> 
> - a Pipeline.apply(JdbcIO.write()) operations.
> 
> I observed that these spikes happens most of the time after I/O errors with 
> the database. Has anyone observed the same situation ?
> 
> I have several questions/observations, please correct me if I am wrong (I am 
> not from the java environment, so some can seem pretty trivial) :
> 
> - Does the apply method handles SQLException or I/O errors ?
> 
> - Would the use of a connection pool prevents such behaviours ? If so, how 
> would one implement it to allow all workers to use it ? Could it be 
> implemented with JDBC Connection pooling ?
> 
> I am worrying about the serialization if one would pass a Connection item as 
> an argument of a DoFn.
> 
> Thank you in advance for your comments and reactions.
> 
> Best regards,
> 
> Jonathan



Re: Connection leaks with PostgreSQL instance

2019-01-16 Thread Jonathan Perron

Hi Kenneth,

Thank you for your reply. I find out that the leak was coming somehow 
from my individual DoFns. I replaced all the connections by a connection 
pooling and I haven't seen connection leaks since. I will keep 
monitoring the pipeline state and if I see new leaks, I would 
investigate to provide more details to the JIRA board.


Best regards,

Jonathan

On 14/01/2019 21:37, Kenneth Knowles wrote:

Hi Jonathan,

JdbcIO.write() just invokes this DoFn: 
https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765


It establishes a connection in @StartBundle and then in @FinishBundle 
it commits a batch and closes the connection. If an error happens 
in @StartBundle or @ProcessElement there will be a retry with a fresh 
instance of the DoFn, which will establish a new connection. It looks 
like neither @StartBundle nor @ProcessElement closes the connection, 
so I'm guessing that the old connection sticks around because the 
worker process was not terminated. So the Beam runner and Dataflow 
service are working as intended and this is an issue with JdbcIO, 
unless I've made a mistake in my reading or analysis.


Would you mind reporting these details to 
https://issues.apache.org/jira/projects/BEAM/ ?


Kenn

On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron 
mailto:jonathan.per...@lumapps.com>> wrote:


Hello !

My question is maybe mainly GCP-oriented, so I apologize if it is
not fully related to the Beam community.

We have a streaming pipeline running on Dataflow which writes data
to a PostgreSQL instance hosted on Cloud SQL. This database is
suffering from connection leak spikes on a regular basis:

The connections are kept alive until the pipeline is canceled/drained:

We are writing to the database with:

- individual DoFn where we open/close the connection using the
standard JDBC try/catch (SQLException ex)/finally statements;

- a Pipeline.apply(JdbcIO.write()) operations.

I observed that these spikes happens most of the time after I/O
errors with the database. Has anyone observed the same situation ?

I have several questions/observations, please correct me if I am
wrong (I am not from the java environment, so some can seem pretty
trivial) :

- Does the apply method handles SQLException or I/O errors ?

- Would the use of a connection pool prevents such behaviours ? If
so, how would one implement it to allow all workers to use it ?
Could it be implemented with JDBC Connection pooling ?

I am worrying about the serialization if one would pass a
Connection item as an argument of a DoFn.

Thank you in advance for your comments and reactions.

Best regards,

Jonathan



Re: Connection leaks with PostgreSQL instance

2019-01-14 Thread Kenneth Knowles
Hi Jonathan,

JdbcIO.write() just invokes this DoFn:
https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765

It establishes a connection in @StartBundle and then in @FinishBundle it
commits a batch and closes the connection. If an error happens
in @StartBundle or @ProcessElement there will be a retry with a fresh
instance of the DoFn, which will establish a new connection. It looks like
neither @StartBundle nor @ProcessElement closes the connection, so I'm
guessing that the old connection sticks around because the worker process
was not terminated. So the Beam runner and Dataflow service are working as
intended and this is an issue with JdbcIO, unless I've made a mistake in my
reading or analysis.

Would you mind reporting these details to
https://issues.apache.org/jira/projects/BEAM/ ?

Kenn

On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron <
jonathan.per...@lumapps.com> wrote:

> Hello !
>
> My question is maybe mainly GCP-oriented, so I apologize if it is not
> fully related to the Beam community.
>
> We have a streaming pipeline running on Dataflow which writes data to a
> PostgreSQL instance hosted on Cloud SQL. This database is suffering from
> connection leak spikes on a regular basis:
>
> The connections are kept alive until the pipeline is canceled/drained:
>
> We are writing to the database with:
>
> - individual DoFn where we open/close the connection using the standard
> JDBC try/catch (SQLException ex)/finally statements;
>
> - a Pipeline.apply(JdbcIO.write()) operations.
>
> I observed that these spikes happens most of the time after I/O errors
> with the database. Has anyone observed the same situation ?
>
> I have several questions/observations, please correct me if I am wrong (I
> am not from the java environment, so some can seem pretty trivial) :
>
> - Does the apply method handles SQLException or I/O errors ?
>
> - Would the use of a connection pool prevents such behaviours ? If so, how
> would one implement it to allow all workers to use it ? Could it be
> implemented with JDBC Connection pooling ?
>
> I am worrying about the serialization if one would pass a Connection item
> as an argument of a DoFn.
>
> Thank you in advance for your comments and reactions.
>
> Best regards,
>
> Jonathan
>


Connection leaks with PostgreSQL instance

2019-01-14 Thread Jonathan Perron

Hello !

My question is maybe mainly GCP-oriented, so I apologize if it is not 
fully related to the Beam community.


We have a streaming pipeline running on Dataflow which writes data to a 
PostgreSQL instance hosted on Cloud SQL. This database is suffering from 
connection leak spikes on a regular basis:


The connections are kept alive until the pipeline is canceled/drained:

We are writing to the database with:

- individual DoFn where we open/close the connection using the standard 
JDBC try/catch (SQLException ex)/finally statements;


- a Pipeline.apply(JdbcIO.write()) operations.

I observed that these spikes happens most of the time after I/O errors 
with the database. Has anyone observed the same situation ?


I have several questions/observations, please correct me if I am wrong 
(I am not from the java environment, so some can seem pretty trivial) :


- Does the apply method handles SQLException or I/O errors ?

- Would the use of a connection pool prevents such behaviours ? If so, 
how would one implement it to allow all workers to use it ? Could it be 
implemented with JDBC Connection pooling ?


I am worrying about the serialization if one would pass a Connection 
item as an argument of a DoFn.


Thank you in advance for your comments and reactions.

Best regards,

Jonathan