[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83777 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 19:36 Start Date: 23/Mar/18 19:36 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-375776291 Cool ! Thanks a lot !! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83777) Time Spent: 7.5h (was: 7h 20m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83767 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 19:07 Start Date: 23/Mar/18 19:07 Worklog Time Spent: 10m Work Description: jkff closed pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/io/jdbc/pom.xml b/sdks/java/io/jdbc/pom.xml index 9151cfb80ae..577d4f55a74 100644 --- a/sdks/java/io/jdbc/pom.xml +++ b/sdks/java/io/jdbc/pom.xml @@ -279,6 +279,11 @@ commons-dbcp2 2.1.1 + + org.apache.commons + commons-pool2 + 2.4.2 + joda-time diff --git a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java index f7a66045886..35e4aacd5ac 100644 --- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java +++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java @@ -52,6 +52,11 @@ import org.apache.beam.sdk.values.PCollectionView; import org.apache.beam.sdk.values.PDone; import org.apache.commons.dbcp2.BasicDataSource; +import org.apache.commons.dbcp2.DataSourceConnectionFactory; +import org.apache.commons.dbcp2.PoolableConnectionFactory; +import org.apache.commons.dbcp2.PoolingDataSource; +import org.apache.commons.pool2.impl.GenericObjectPool; +import org.apache.commons.pool2.impl.GenericObjectPoolConfig; import org.joda.time.Duration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -235,17 +240,17 @@ public static DataSourceConfiguration create(DataSource dataSource) { checkArgument(dataSource != null, "dataSource can not be null"); checkArgument(dataSource instanceof Serializable, "dataSource must be Serializable"); return new AutoValue_JdbcIO_DataSourceConfiguration.Builder() - .setDataSource(dataSource) - .build(); + .setDataSource(dataSource) + .build(); } public static DataSourceConfiguration create(String driverClassName, String url) { checkArgument(driverClassName != null, "driverClassName can not be null"); checkArgument(url != null, "url can not be null"); return new AutoValue_JdbcIO_DataSourceConfiguration.Builder() - .setDriverClassName(ValueProvider.StaticValueProvider.of(driverClassName)) - .setUrl(ValueProvider.StaticValueProvider.of(url)) - .build(); + .setDriverClassName(ValueProvider.StaticValueProvider.of(driverClassName)) + .setUrl(ValueProvider.StaticValueProvider.of(url)) + .build(); } public static DataSourceConfiguration create(ValueProvider driverClassName, @@ -254,8 +259,7 @@ public static DataSourceConfiguration create(ValueProvider driverClassNa checkArgument(url != null, "url can not be null"); return new AutoValue_JdbcIO_DataSourceConfiguration.Builder() .setDriverClassName(driverClassName) - .setUrl(url) - .build(); + .setUrl(url).build(); } public DataSourceConfiguration withUsername(String username) { @@ -307,9 +311,10 @@ private void populateDisplayData(DisplayData.Builder builder) { } } -DataSource buildDatasource() throws Exception{ +DataSource buildDatasource() throws Exception { + DataSource current = null; if (getDataSource() != null) { -return getDataSource(); +current = getDataSource(); } else { BasicDataSource basicDataSource = new BasicDataSource(); if (getDriverClassName() != null) { @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(1);
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83766 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 19:06 Start Date: 23/Mar/18 19:06 Worklog Time Spent: 10m Work Description: jkff commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-375769221 Yes, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83766) Time Spent: 7h 10m (was: 7h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83640 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 16:20 Start Date: 23/Mar/18 16:20 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-375720383 @jkff is it OK for you if I merge this one ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83640) Time Spent: 7h (was: 6h 50m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83546=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83546 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 10:14 Start Date: 23/Mar/18 10:14 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-375606145 By the way, here's the pipeline I used for the test: https://github.com/jbonofre/beam-samples/blob/master/iot/src/main/java/org/apache/beam/samples/iot/JmsToJdbc.java This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83546) Time Spent: 6h 50m (was: 6h 40m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83545 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 10:13 Start Date: 23/Mar/18 10:13 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-375605850 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83545) Time Spent: 6h 40m (was: 6.5h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83544 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 23/Mar/18 10:13 Start Date: 23/Mar/18 10:13 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-375605773 I tested with monitoring the connections on a local Derby database: ``` ij> show connections; CONNECTION0* - jdbc:derby://localhost:1527/test ``` I confirm that the connection is released while the pipeline doesn't get new data. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 83544) Time Spent: 6.5h (was: 6h 20m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=82423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82423 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 20/Mar/18 19:28 Start Date: 20/Mar/18 19:28 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-374726889 I did a first test and so far so good: I can see the connection released. Let me add some instrumentation in the output to paste here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82423) Time Spent: 6h 20m (was: 6h 10m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=81992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81992 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 19/Mar/18 19:50 Start Date: 19/Mar/18 19:50 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-374127575 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81992) Time Spent: 6h 10m (was: 6h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=81990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81990 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 19/Mar/18 19:49 Start Date: 19/Mar/18 19:49 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-374347060 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81990) Time Spent: 6h (was: 5h 50m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=81766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81766 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 19/Mar/18 07:54 Start Date: 19/Mar/18 07:54 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-374127575 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81766) Time Spent: 5h 50m (was: 5h 40m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80984 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 15/Mar/18 21:28 Start Date: 15/Mar/18 21:28 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174938254 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(1); + poolConfig.setMinIdle(0); + poolConfig.setMinEvictableIdleTimeMillis(1); Review comment: I don't know if it's worth it to verify the behavior automatically, I'm just asking to manually run a streaming pipeline with JdbcIO and confirm that it actually creates and releases connections when the pipeline is idle for some time (maybe enable logging in dbcp somewhere. Or at least confirm that the pipeline works at all for a few minutes). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 80984) Time Spent: 5.5h (was: 5h 20m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80977 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 15/Mar/18 20:35 Start Date: 15/Mar/18 20:35 Worklog Time Spent: 10m Work Description: jbonofre commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174923818 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(1); + poolConfig.setMinIdle(0); + poolConfig.setMinEvictableIdleTimeMillis(1); Review comment: Basically, you mean trying to implement a test to illustrate/test ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 80977) Time Spent: 5h 20m (was: 5h 10m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80607 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 14/Mar/18 22:05 Start Date: 14/Mar/18 22:05 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174624347 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -509,7 +530,9 @@ public void populateDisplayData(DisplayData.Builder builder) { builder.add(DisplayData.item("query", getQuery())); builder.add(DisplayData.item("rowMapper", getRowMapper().getClass().getName())); builder.add(DisplayData.item("coder", getCoder().getClass().getName())); - getDataSourceConfiguration().populateDisplayData(builder); + if (getDataSourceConfiguration() != null) { Review comment: Ditto This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 80607) Time Spent: 4h 50m (was: 4h 40m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80609 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 14/Mar/18 22:05 Start Date: 14/Mar/18 22:05 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174624327 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -423,7 +444,9 @@ public void populateDisplayData(DisplayData.Builder builder) { builder.add(DisplayData.item("query", getQuery())); builder.add(DisplayData.item("rowMapper", getRowMapper().getClass().getName())); builder.add(DisplayData.item("coder", getCoder().getClass().getName())); - getDataSourceConfiguration().populateDisplayData(builder); + if (getDataSourceConfiguration() != null) { Review comment: It can't be null, as per the check at line 422 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 80609) Time Spent: 5h 10m (was: 5h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80608 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 14/Mar/18 22:05 Start Date: 14/Mar/18 22:05 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174624597 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(1); + poolConfig.setMinIdle(0); + poolConfig.setMinEvictableIdleTimeMillis(1); Review comment: Thanks, this looks reasonable, though is there any way to manually verify that this is working as intended in practice? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 80608) Time Spent: 5h (was: 4h 50m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80502 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 14/Mar/18 19:36 Start Date: 14/Mar/18 19:36 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-373148447 Rebased and updated according to @jkff comment. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 80502) Time Spent: 4h 40m (was: 4.5h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79991 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 13/Mar/18 17:58 Start Date: 13/Mar/18 17:58 Worklog Time Spent: 10m Work Description: jbonofre commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174228923 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: Oh sorry, you are right. I thought I added `dataSourceFactory` in another PR ;) In that case, I agree with you, I can be removed ;) I do the cleanup. Thanks ! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79991) Time Spent: 4.5h (was: 4h 20m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79989 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 13/Mar/18 17:56 Start Date: 13/Mar/18 17:56 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174228175 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: What do you mean by backwards compatibility? This *is* the PR that adds `dataSourceFactory`, so not adding it in this PR does not break compatibility. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79989) Time Spent: 4h 20m (was: 4h 10m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79926 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 13/Mar/18 16:19 Start Date: 13/Mar/18 16:19 Worklog Time Spent: 10m Work Description: jbonofre commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r174194623 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: Maybe for backward compatibility, it would be better to still support `dataSourceFactory`, right ? In that case, I would "wrap" `dataSourceFactory` in the pool as you suggested previously. Agreed ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79926) Time Spent: 4h 10m (was: 4h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79520 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 12/Mar/18 16:48 Start Date: 12/Mar/18 16:48 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173867345 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: In other words: In my view, dataSourceFactory is rather for cases where the DataSource requires some trickery to even construct, e.g. requires setting some vendor-specific connection parameters to establish the connection. I don't know if such cases exist at all, and I'd be open to simply dropping this feature from this PR. But if we think it's necessary, then I think we still want to wrap that with pooling, and I think for a user to try to configure pooling themselves is always a bad idea. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79520) Time Spent: 4h (was: 3h 50m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79518 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 12/Mar/18 16:44 Start Date: 12/Mar/18 16:44 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173865797 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: Can you think of a case where the current default pooling is not a good choice AND a user can provide a pooling configuration that will reliably give better behavior regardless of runner and of implementation details of JdbcIO? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79518) Time Spent: 3h 50m (was: 3h 40m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79336 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 12/Mar/18 06:31 Start Date: 12/Mar/18 06:31 Worklog Time Spent: 10m Work Description: jbonofre commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173702692 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: In the case where the user provides its own `dataSourceFactory`, I guess he wants to have complete control and can "wrap" himself in a pool no ? The pooling is in the case of the user provides a `dataSource` or configuration to create a `dataSource` internally in the IO (using DBCP). Thoughts ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79336) Time Spent: 3h 40m (was: 3.5h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79305 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 18:26 Start Date: 11/Mar/18 18:26 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173664163 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,12 +342,42 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(getPoolMaxTotal()); + poolConfig.setBlockWhenExhausted(true); Review comment: And you also need to configure eviction as per https://stackoverflow.com/questions/38685092/dbcp2-when-are-idle-connections-removed-from-the-pool - otherwise, with minIdle=0, the idle connections will be closed immediately on release rather than after an idle timeout. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79305) Time Spent: 3h 10m (was: 3h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79306 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 18:26 Start Date: 11/Mar/18 18:26 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173664096 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder builder) { private ReadFn( DataSourceConfiguration dataSourceConfiguration, +DataSourceFactory dataSourceFactory, ValueProvider query, PreparedStatementSetter parameterSetter, RowMapper rowMapper) { this.dataSourceConfiguration = dataSourceConfiguration; + this.dataSourceFactory = dataSourceFactory; this.query = query; this.parameterSetter = parameterSetter; this.rowMapper = rowMapper; } @Setup public void setup() throws Exception { - dataSource = dataSourceConfiguration.buildDatasource(); + if (dataSourceFactory != null) { +dataSource = dataSourceFactory.create(); Review comment: Don't we want to wrap the result of `dataSourceFactory` in a pool? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79306) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79310 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 18:26 Start Date: 11/Mar/18 18:26 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173664103 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -664,9 +751,18 @@ public WriteFn(Write spec) { @Setup public void setup() throws Exception { -dataSource = spec.getDataSourceConfiguration().buildDatasource(); +if (spec.getDataSourceFactory() != null) { + dataSource = spec.getDataSourceFactory().create(); Review comment: Ditto This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79310) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79309 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 18:26 Start Date: 11/Mar/18 18:26 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173663972 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,12 +342,42 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(getPoolMaxTotal()); + poolConfig.setBlockWhenExhausted(true); Review comment: Almost all of these parameters are also irrelevant because there will be at most 1 connection - please keep only the necessary ones. I think really you only need maxTotal=1, minIdle=0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79309) Time Spent: 3.5h (was: 3h 20m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79308 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 18:26 Start Date: 11/Mar/18 18:26 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173664072 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,12 +342,42 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(getPoolMaxTotal()); + poolConfig.setBlockWhenExhausted(true); + poolConfig.setMaxIdle(getPoolMaxTotal()); + poolConfig.setMinIdle(0); + poolConfig.setTestOnBorrow(false); + poolConfig.setTestOnReturn(false); + poolConfig.setTestWhileIdle(true); + poolConfig.setLifo(true); + GenericObjectPool connectionPool = + new GenericObjectPool(poolableConnectionFactory, poolConfig); + poolableConnectionFactory.setPool(connectionPool); + poolableConnectionFactory.setDefaultAutoCommit(false); Review comment: I don't think these settings are necessary either. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79308) Time Spent: 3h 20m (was: 3h 10m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79307 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 18:26 Start Date: 11/Mar/18 18:26 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173663913 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -297,6 +306,10 @@ public DataSourceConfiguration withConnectionProperties( return builder().setConnectionProperties(connectionProperties).build(); } +public DataSourceConfiguration withPoolMaxTotal(int poolMaxTotal) { Review comment: This parameter has no effect, please remove it. A single DoFn instance is used from a single thread, so there will never be more than 1 connection at the same time. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79307) Time Spent: 3h 20m (was: 3h 10m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79304 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 11/Mar/18 17:55 Start Date: 11/Mar/18 17:55 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#issuecomment-372134799 Rebased and only expose max number of connections in the pool to end users. @jkff please let me know if it's OK for you ;) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 79304) Time Spent: 3h (was: 2h 50m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78957 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 09/Mar/18 18:00 Start Date: 09/Mar/18 18:00 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173522401 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties( return builder().setConnectionProperties(connectionProperties).build(); } +/** + * Allows to tweak the {@link PoolingDataSource} created internally by {@link JdbcIO}. + */ +public DataSourceConfiguration withPoolConfiguration(int maxTotal, Review comment: Yeah I understand the intention, but these parameters are intended to be tuned according to the user's particular usage pattern, and here the usage pattern is almost completely outside the user's control - it's hidden behind: * implementation details of JdbcIO which are allowed to change at any moment without notice (and in fact *are* changing - this PR being an example) * implementation details of the particular runner, which also can do anything it wants as long as it complies with the Beam model; different runners do different things at different times with different data So there does not exist any goal that can be reliably accomplished by changing these parameters, because any user's assumptions of "what will happen if I change this parameter to value X" are going to be wrong. This is covered in https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose - I'm happy to expand that section with the argument above, if you think that'd be useful. validationQuery is not required either. AFAIK since JDBC4 (introduced in 2006, with Java 6), JDBC connections can validate themselves in a driver-specific fashion via connection.isValid(), and dbcp takes advantage of that. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 78957) Time Spent: 2h 50m (was: 2h 40m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78955 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 09/Mar/18 17:59 Start Date: 09/Mar/18 17:59 Worklog Time Spent: 10m Work Description: jkff commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173522401 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties( return builder().setConnectionProperties(connectionProperties).build(); } +/** + * Allows to tweak the {@link PoolingDataSource} created internally by {@link JdbcIO}. + */ +public DataSourceConfiguration withPoolConfiguration(int maxTotal, Review comment: Yeah I understand the intention, but these parameters are intended to be tuned according to the user's particular usage pattern, and here the usage pattern is almost completely outside the user's control - it's hidden behind: * implementation details of JdbcIO which are allowed to change at any moment without notice (and in fact *are* changing - this PR being an example) * implementation details of the particular runner, which also can do anything it wants as long as it complies with the Beam model; different runners do different things at different times with different data So there does not exist any goal that can be reliably accomplished by changing these parameters. This is covered in https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose - I'm happy to expand that section with the argument above, if you think that'd be useful. validationQuery is not required either. AFAIK since JDBC4 (introduced in 2006, with Java 6), JDBC connections can validate themselves in a driver-specific fashion via connection.isValid(), and dbcp takes advantage of that. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 78955) Time Spent: 2h 40m (was: 2.5h) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78790 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 09/Mar/18 05:19 Start Date: 09/Mar/18 05:19 Worklog Time Spent: 10m Work Description: jbonofre commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173367376 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties( return builder().setConnectionProperties(connectionProperties).build(); } +/** + * Allows to tweak the {@link PoolingDataSource} created internally by {@link JdbcIO}. + */ +public DataSourceConfiguration withPoolConfiguration(int maxTotal, Review comment: OK, I wanted to give complete control to users. I think, only the validationQuery is required IMHO. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 78790) Time Spent: 2.5h (was: 2h 20m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management
[ https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78788 ] ASF GitHub Bot logged work on BEAM-3500: Author: ASF GitHub Bot Created on: 09/Mar/18 05:18 Start Date: 09/Mar/18 05:18 Worklog Time Spent: 10m Work Description: jbonofre commented on a change in pull request #4461: [BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory allowing full control of the way the DataSource is created URL: https://github.com/apache/beam/pull/4461#discussion_r173367286 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java ## @@ -327,12 +428,47 @@ DataSource buildDatasource() throws Exception{ if (getConnectionProperties() != null && getConnectionProperties().get() != null) { basicDataSource.setConnectionProperties(getConnectionProperties().get()); } -return basicDataSource; +current = basicDataSource; } + + // wrapping the datasource as a pooling datasource + DataSourceConnectionFactory connectionFactory = new DataSourceConnectionFactory(current); + PoolableConnectionFactory poolableConnectionFactory = + new PoolableConnectionFactory(connectionFactory, null); + GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig(); + poolConfig.setMaxTotal(getPoolMaxTotal()); + poolConfig.setBlockWhenExhausted(getPoolBlockWhenExhausted()); + poolConfig.setMaxWaitMillis(getPoolMaxWaitMillis()); + poolConfig.setMaxIdle(getPoolMaxIdle()); + poolConfig.setMinIdle(getPoolMinIdle()); + poolConfig.setTestOnBorrow(getPoolTestOnBorrow()); + poolConfig.setTestOnReturn(getPoolTestOnReturn()); + poolConfig.setNumTestsPerEvictionRun(getPoolNumTestsPerEvictionRun()); + poolConfig.setMinEvictableIdleTimeMillis(getPoolMinEvictableIdleTimeMillis()); + poolConfig.setTestWhileIdle(getPoolTestWhileIdle()); + poolConfig.setSoftMinEvictableIdleTimeMillis(getPoolSoftMinEvictableIdleTimeMillis()); + poolConfig.setLifo(getPoolLifo()); + GenericObjectPool connectionPool = + new GenericObjectPool(poolableConnectionFactory, poolConfig); + poolableConnectionFactory.setPool(connectionPool); + poolableConnectionFactory.setValidationQuery("SELECT 1 FROM DUAL"); Review comment: My bad, I forgot to update with the provided value. By default, the `validationQuery` should be null and the user can define it depending of his database. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 78788) Time Spent: 2h 20m (was: 2h 10m) > JdbcIO: Improve connection management > - > > Key: BEAM-3500 > URL: https://issues.apache.org/jira/browse/BEAM-3500 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Affects Versions: 2.2.0 >Reporter: Pawel Bartoszek >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > JdbcIO write DoFn acquires connection in {{@Setup}} and release it in > {{@Teardown}} methods, which means that connection might stay opened for days > in streaming job case. Keeping single connection open for so long might be > very risky as it's exposed to database, network etc issues. > *Taking connection from the pool when it is actually needed* > I suggest that connection would be taken from the connection pool in > {{executeBatch}} method and released when the batch is flushed. This will > allow the pool to take care of any returned unhealthy connections etc. > *Make JdbcIO accept data source factory* > It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource > itself. I am saying that because sink checks if DataSource implements > `Serializable` interface, which make it impossible to pass > BasicDataSource(used internally by sink) as it doesn’t implement this > interface. Something like: > {code:java} > interface DataSourceFactory extends Serializable{ > DataSource create(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)