[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83777
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 19:36
Start Date: 23/Mar/18 19:36
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-375776291
 
 
   Cool ! Thanks a lot !!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83777)
Time Spent: 7.5h  (was: 7h 20m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83767
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 19:07
Start Date: 23/Mar/18 19:07
Worklog Time Spent: 10m 
  Work Description: jkff closed pull request #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/io/jdbc/pom.xml b/sdks/java/io/jdbc/pom.xml
index 9151cfb80ae..577d4f55a74 100644
--- a/sdks/java/io/jdbc/pom.xml
+++ b/sdks/java/io/jdbc/pom.xml
@@ -279,6 +279,11 @@
   commons-dbcp2
   2.1.1
 
+
+  org.apache.commons
+  commons-pool2
+  2.4.2
+
 
 
   joda-time
diff --git 
a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java 
b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
index f7a66045886..35e4aacd5ac 100644
--- a/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
+++ b/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
@@ -52,6 +52,11 @@
 import org.apache.beam.sdk.values.PCollectionView;
 import org.apache.beam.sdk.values.PDone;
 import org.apache.commons.dbcp2.BasicDataSource;
+import org.apache.commons.dbcp2.DataSourceConnectionFactory;
+import org.apache.commons.dbcp2.PoolableConnectionFactory;
+import org.apache.commons.dbcp2.PoolingDataSource;
+import org.apache.commons.pool2.impl.GenericObjectPool;
+import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
 import org.joda.time.Duration;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -235,17 +240,17 @@ public static DataSourceConfiguration create(DataSource 
dataSource) {
   checkArgument(dataSource != null, "dataSource can not be null");
   checkArgument(dataSource instanceof Serializable, "dataSource must be 
Serializable");
   return new AutoValue_JdbcIO_DataSourceConfiguration.Builder()
-  .setDataSource(dataSource)
-  .build();
+  .setDataSource(dataSource)
+  .build();
 }
 
 public static DataSourceConfiguration create(String driverClassName, 
String url) {
   checkArgument(driverClassName != null, "driverClassName can not be 
null");
   checkArgument(url != null, "url can not be null");
   return new AutoValue_JdbcIO_DataSourceConfiguration.Builder()
-  
.setDriverClassName(ValueProvider.StaticValueProvider.of(driverClassName))
-  .setUrl(ValueProvider.StaticValueProvider.of(url))
-  .build();
+  
.setDriverClassName(ValueProvider.StaticValueProvider.of(driverClassName))
+  .setUrl(ValueProvider.StaticValueProvider.of(url))
+  .build();
 }
 
 public static DataSourceConfiguration create(ValueProvider 
driverClassName,
@@ -254,8 +259,7 @@ public static DataSourceConfiguration 
create(ValueProvider driverClassNa
   checkArgument(url != null, "url can not be null");
   return new AutoValue_JdbcIO_DataSourceConfiguration.Builder()
   .setDriverClassName(driverClassName)
-  .setUrl(url)
-  .build();
+  .setUrl(url).build();
 }
 
 public DataSourceConfiguration withUsername(String username) {
@@ -307,9 +311,10 @@ private void populateDisplayData(DisplayData.Builder 
builder) {
   }
 }
 
-DataSource buildDatasource() throws Exception{
+DataSource buildDatasource() throws Exception {
+  DataSource current = null;
   if (getDataSource() != null) {
-return getDataSource();
+current = getDataSource();
   } else {
 BasicDataSource basicDataSource = new BasicDataSource();
 if (getDriverClassName() != null) {
@@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(1);

[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83766
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 19:06
Start Date: 23/Mar/18 19:06
Worklog Time Spent: 10m 
  Work Description: jkff commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-375769221
 
 
   Yes, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83766)
Time Spent: 7h 10m  (was: 7h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83640
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 16:20
Start Date: 23/Mar/18 16:20
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-375720383
 
 
   @jkff is it OK for you if I merge this one ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83640)
Time Spent: 7h  (was: 6h 50m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83546=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83546
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 10:14
Start Date: 23/Mar/18 10:14
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-375606145
 
 
   By the way, here's the pipeline I used for the test:
   
   
https://github.com/jbonofre/beam-samples/blob/master/iot/src/main/java/org/apache/beam/samples/iot/JmsToJdbc.java


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83546)
Time Spent: 6h 50m  (was: 6h 40m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83545
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 10:13
Start Date: 23/Mar/18 10:13
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-375605850
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83545)
Time Spent: 6h 40m  (was: 6.5h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=83544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83544
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 23/Mar/18 10:13
Start Date: 23/Mar/18 10:13
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-375605773
 
 
   I tested with monitoring the connections on a local Derby database:
   
   ```
   ij> show connections;
   CONNECTION0* -  jdbc:derby://localhost:1527/test
   ```
   I confirm that the connection is released while the pipeline doesn't get new 
data.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 83544)
Time Spent: 6.5h  (was: 6h 20m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=82423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82423
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 20/Mar/18 19:28
Start Date: 20/Mar/18 19:28
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-374726889
 
 
   I did a first test and so far so good: I can see the connection released. 
Let me add some instrumentation in the output to paste here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82423)
Time Spent: 6h 20m  (was: 6h 10m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=81992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81992
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 19/Mar/18 19:50
Start Date: 19/Mar/18 19:50
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-374127575
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 81992)
Time Spent: 6h 10m  (was: 6h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=81990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81990
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 19/Mar/18 19:49
Start Date: 19/Mar/18 19:49
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-374347060
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 81990)
Time Spent: 6h  (was: 5h 50m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=81766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81766
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 19/Mar/18 07:54
Start Date: 19/Mar/18 07:54
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-374127575
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 81766)
Time Spent: 5h 50m  (was: 5h 40m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80984
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 15/Mar/18 21:28
Start Date: 15/Mar/18 21:28
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174938254
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(1);
+  poolConfig.setMinIdle(0);
+  poolConfig.setMinEvictableIdleTimeMillis(1);
 
 Review comment:
   I don't know if it's worth it to verify the behavior automatically, I'm just 
asking to manually run a streaming pipeline with JdbcIO and confirm that it 
actually creates and releases connections when the pipeline is idle for some 
time (maybe enable logging in dbcp somewhere. Or at least confirm that the 
pipeline works at all for a few minutes).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 80984)
Time Spent: 5.5h  (was: 5h 20m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80977
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 15/Mar/18 20:35
Start Date: 15/Mar/18 20:35
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174923818
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(1);
+  poolConfig.setMinIdle(0);
+  poolConfig.setMinEvictableIdleTimeMillis(1);
 
 Review comment:
   Basically, you mean trying to implement a test to illustrate/test ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 80977)
Time Spent: 5h 20m  (was: 5h 10m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80607
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 14/Mar/18 22:05
Start Date: 14/Mar/18 22:05
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174624347
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -509,7 +530,9 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   builder.add(DisplayData.item("query", getQuery()));
   builder.add(DisplayData.item("rowMapper", 
getRowMapper().getClass().getName()));
   builder.add(DisplayData.item("coder", getCoder().getClass().getName()));
-  getDataSourceConfiguration().populateDisplayData(builder);
+  if (getDataSourceConfiguration() != null) {
 
 Review comment:
   Ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 80607)
Time Spent: 4h 50m  (was: 4h 40m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80609
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 14/Mar/18 22:05
Start Date: 14/Mar/18 22:05
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174624327
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -423,7 +444,9 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   builder.add(DisplayData.item("query", getQuery()));
   builder.add(DisplayData.item("rowMapper", 
getRowMapper().getClass().getName()));
   builder.add(DisplayData.item("coder", getCoder().getClass().getName()));
-  getDataSourceConfiguration().populateDisplayData(builder);
+  if (getDataSourceConfiguration() != null) {
 
 Review comment:
   It can't be null, as per the check at line 422


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 80609)
Time Spent: 5h 10m  (was: 5h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80608
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 14/Mar/18 22:05
Start Date: 14/Mar/18 22:05
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174624597
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,8 +332,25 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(1);
+  poolConfig.setMinIdle(0);
+  poolConfig.setMinEvictableIdleTimeMillis(1);
 
 Review comment:
   Thanks, this looks reasonable, though is there any way to manually verify 
that this is working as intended in practice?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 80608)
Time Spent: 5h  (was: 4h 50m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=80502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80502
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 14/Mar/18 19:36
Start Date: 14/Mar/18 19:36
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-373148447
 
 
   Rebased and updated according to @jkff comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 80502)
Time Spent: 4h 40m  (was: 4.5h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79991
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 13/Mar/18 17:58
Start Date: 13/Mar/18 17:58
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174228923
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   Oh sorry, you are right. I thought I added `dataSourceFactory` in another PR 
;) In that case, I agree with you, I can be removed ;) I do the cleanup. Thanks 
!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79991)
Time Spent: 4.5h  (was: 4h 20m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79989
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 13/Mar/18 17:56
Start Date: 13/Mar/18 17:56
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174228175
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   What do you mean by backwards compatibility? This *is* the PR that adds 
`dataSourceFactory`, so not adding it in this PR does not break compatibility.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79989)
Time Spent: 4h 20m  (was: 4h 10m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79926
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 13/Mar/18 16:19
Start Date: 13/Mar/18 16:19
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r174194623
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   Maybe for backward compatibility, it would be better to still support 
`dataSourceFactory`, right ? In that case, I would "wrap" `dataSourceFactory` 
in the pool as you suggested previously. Agreed ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79926)
Time Spent: 4h 10m  (was: 4h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79520
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 12/Mar/18 16:48
Start Date: 12/Mar/18 16:48
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173867345
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   In other words: In my view, dataSourceFactory is rather for cases where the 
DataSource requires some trickery to even construct, e.g. requires setting some 
vendor-specific connection parameters to establish the connection. I don't know 
if such cases exist at all, and I'd be open to simply dropping this feature 
from this PR. But if we think it's necessary, then I think we still want to 
wrap that with pooling, and I think for a user to try to configure pooling 
themselves is always a bad idea.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79520)
Time Spent: 4h  (was: 3h 50m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79518
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 12/Mar/18 16:44
Start Date: 12/Mar/18 16:44
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173865797
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   Can you think of a case where the current default pooling is not a good 
choice AND a user can provide a pooling configuration that will reliably give 
better behavior regardless of runner and of implementation details of JdbcIO?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79518)
Time Spent: 3h 50m  (was: 3h 40m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79336
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 12/Mar/18 06:31
Start Date: 12/Mar/18 06:31
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173702692
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   In the case where the user provides its own `dataSourceFactory`, I guess he 
wants to have complete control and can "wrap" himself in a pool no ? The 
pooling is in the case of the user provides a `dataSource` or configuration to 
create a `dataSource` internally in the IO (using DBCP). Thoughts ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79336)
Time Spent: 3h 40m  (was: 3.5h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79305
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 18:26
Start Date: 11/Mar/18 18:26
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173664163
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,12 +342,42 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(getPoolMaxTotal());
+  poolConfig.setBlockWhenExhausted(true);
 
 Review comment:
   And you also need to configure eviction as per 
https://stackoverflow.com/questions/38685092/dbcp2-when-are-idle-connections-removed-from-the-pool
 - otherwise, with minIdle=0, the idle connections will be closed immediately 
on release rather than after an idle timeout.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79305)
Time Spent: 3h 10m  (was: 3h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79306
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 18:26
Start Date: 11/Mar/18 18:26
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173664096
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   Don't we want to wrap the result of `dataSourceFactory` in a pool?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79306)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79310
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 18:26
Start Date: 11/Mar/18 18:26
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173664103
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -664,9 +751,18 @@ public WriteFn(Write spec) {
 
   @Setup
   public void setup() throws Exception {
-dataSource = spec.getDataSourceConfiguration().buildDatasource();
+if (spec.getDataSourceFactory() != null) {
+  dataSource = spec.getDataSourceFactory().create();
 
 Review comment:
   Ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79310)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79309
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 18:26
Start Date: 11/Mar/18 18:26
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173663972
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,12 +342,42 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(getPoolMaxTotal());
+  poolConfig.setBlockWhenExhausted(true);
 
 Review comment:
   Almost all of these parameters are also irrelevant because there will be at 
most 1 connection - please keep only the necessary ones. I think really you 
only need maxTotal=1, minIdle=0.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79309)
Time Spent: 3.5h  (was: 3h 20m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79308
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 18:26
Start Date: 11/Mar/18 18:26
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173664072
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,12 +342,42 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(getPoolMaxTotal());
+  poolConfig.setBlockWhenExhausted(true);
+  poolConfig.setMaxIdle(getPoolMaxTotal());
+  poolConfig.setMinIdle(0);
+  poolConfig.setTestOnBorrow(false);
+  poolConfig.setTestOnReturn(false);
+  poolConfig.setTestWhileIdle(true);
+  poolConfig.setLifo(true);
+  GenericObjectPool connectionPool =
+  new GenericObjectPool(poolableConnectionFactory, poolConfig);
+  poolableConnectionFactory.setPool(connectionPool);
+  poolableConnectionFactory.setDefaultAutoCommit(false);
 
 Review comment:
   I don't think these settings are necessary either.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79308)
Time Spent: 3h 20m  (was: 3h 10m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79307
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 18:26
Start Date: 11/Mar/18 18:26
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173663913
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -297,6 +306,10 @@ public DataSourceConfiguration withConnectionProperties(
   return builder().setConnectionProperties(connectionProperties).build();
 }
 
+public DataSourceConfiguration withPoolMaxTotal(int poolMaxTotal) {
 
 Review comment:
   This parameter has no effect, please remove it. A single DoFn instance is 
used from a single thread, so there will never be more than 1 connection at the 
same time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79307)
Time Spent: 3h 20m  (was: 3h 10m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79304
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 11/Mar/18 17:55
Start Date: 11/Mar/18 17:55
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #4461: [BEAM-3500] "Attach" 
JDBC connection to the bundle and add DataSourceFactory allowing full control 
of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#issuecomment-372134799
 
 
   Rebased and only expose max number of connections in the pool to end users. 
@jkff please let me know if it's OK for you ;)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79304)
Time Spent: 3h  (was: 2h 50m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78957
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 09/Mar/18 18:00
Start Date: 09/Mar/18 18:00
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173522401
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties(
   return builder().setConnectionProperties(connectionProperties).build();
 }
 
+/**
+ * Allows to tweak the {@link PoolingDataSource} created internally by 
{@link JdbcIO}.
+ */
+public DataSourceConfiguration withPoolConfiguration(int maxTotal,
 
 Review comment:
   Yeah I understand the intention, but these parameters are intended to be 
tuned according to the user's particular usage pattern, and here the usage 
pattern is almost completely outside the user's control - it's hidden behind:
   
   * implementation details of JdbcIO which are allowed to change at any moment 
without notice (and in fact *are* changing - this PR being an example)
   * implementation details of the particular runner, which also can do 
anything it wants as long as it complies with the Beam model; different runners 
do different things at different times with different data
   
   So there does not exist any goal that can be reliably accomplished by 
changing these parameters, because any user's assumptions of "what will happen 
if I change this parameter to value X" are going to be wrong.
   
   This is covered in 
https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose
 - I'm happy to expand that section with the argument above, if you think 
that'd be useful.
   
   validationQuery is not required either. AFAIK since JDBC4 (introduced in 
2006, with Java 6), JDBC connections can validate themselves in a 
driver-specific fashion via connection.isValid(), and dbcp takes advantage of 
that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 78957)
Time Spent: 2h 50m  (was: 2h 40m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78955
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 09/Mar/18 17:59
Start Date: 09/Mar/18 17:59
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173522401
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties(
   return builder().setConnectionProperties(connectionProperties).build();
 }
 
+/**
+ * Allows to tweak the {@link PoolingDataSource} created internally by 
{@link JdbcIO}.
+ */
+public DataSourceConfiguration withPoolConfiguration(int maxTotal,
 
 Review comment:
   Yeah I understand the intention, but these parameters are intended to be 
tuned according to the user's particular usage pattern, and here the usage 
pattern is almost completely outside the user's control - it's hidden behind:
   
   * implementation details of JdbcIO which are allowed to change at any moment 
without notice (and in fact *are* changing - this PR being an example)
   * implementation details of the particular runner, which also can do 
anything it wants as long as it complies with the Beam model; different runners 
do different things at different times with different data
   
   So there does not exist any goal that can be reliably accomplished by 
changing these parameters.
   
   This is covered in 
https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose
 - I'm happy to expand that section with the argument above, if you think 
that'd be useful.
   
   validationQuery is not required either. AFAIK since JDBC4 (introduced in 
2006, with Java 6), JDBC connections can validate themselves in a 
driver-specific fashion via connection.isValid(), and dbcp takes advantage of 
that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 78955)
Time Spent: 2h 40m  (was: 2.5h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-08 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78790
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 09/Mar/18 05:19
Start Date: 09/Mar/18 05:19
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173367376
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties(
   return builder().setConnectionProperties(connectionProperties).build();
 }
 
+/**
+ * Allows to tweak the {@link PoolingDataSource} created internally by 
{@link JdbcIO}.
+ */
+public DataSourceConfiguration withPoolConfiguration(int maxTotal,
 
 Review comment:
   OK, I wanted to give complete control to users. I think, only the 
validationQuery is required IMHO.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 78790)
Time Spent: 2.5h  (was: 2h 20m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-08 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78788
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 09/Mar/18 05:18
Start Date: 09/Mar/18 05:18
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173367286
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -327,12 +428,47 @@ DataSource buildDatasource() throws Exception{
 if (getConnectionProperties() != null && 
getConnectionProperties().get() != null) {
   
basicDataSource.setConnectionProperties(getConnectionProperties().get());
 }
-return basicDataSource;
+current = basicDataSource;
   }
+
+  // wrapping the datasource as a pooling datasource
+  DataSourceConnectionFactory connectionFactory = new 
DataSourceConnectionFactory(current);
+  PoolableConnectionFactory poolableConnectionFactory =
+  new PoolableConnectionFactory(connectionFactory, null);
+  GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
+  poolConfig.setMaxTotal(getPoolMaxTotal());
+  poolConfig.setBlockWhenExhausted(getPoolBlockWhenExhausted());
+  poolConfig.setMaxWaitMillis(getPoolMaxWaitMillis());
+  poolConfig.setMaxIdle(getPoolMaxIdle());
+  poolConfig.setMinIdle(getPoolMinIdle());
+  poolConfig.setTestOnBorrow(getPoolTestOnBorrow());
+  poolConfig.setTestOnReturn(getPoolTestOnReturn());
+  poolConfig.setNumTestsPerEvictionRun(getPoolNumTestsPerEvictionRun());
+  
poolConfig.setMinEvictableIdleTimeMillis(getPoolMinEvictableIdleTimeMillis());
+  poolConfig.setTestWhileIdle(getPoolTestWhileIdle());
+  
poolConfig.setSoftMinEvictableIdleTimeMillis(getPoolSoftMinEvictableIdleTimeMillis());
+  poolConfig.setLifo(getPoolLifo());
+  GenericObjectPool connectionPool =
+  new GenericObjectPool(poolableConnectionFactory, poolConfig);
+  poolableConnectionFactory.setPool(connectionPool);
+  poolableConnectionFactory.setValidationQuery("SELECT 1 FROM DUAL");
 
 Review comment:
   My bad, I forgot to update with the provided value. By default, the 
`validationQuery` should be null and the user can define it depending of his 
database.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 78788)
Time Spent: 2h 20m  (was: 2h 10m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)