[jira] [Commented] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set
[ https://issues.apache.org/jira/browse/BEAM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460042#comment-16460042 ] Jean-Baptiste Onofré commented on BEAM-3714: Just a mistake from my side ;) Sorry about that. > JdbcIO.read() should create a forward-only, read-only result set > > > Key: BEAM-3714 > URL: https://issues.apache.org/jira/browse/BEAM-3714 > Project: Beam > Issue Type: Bug > Components: io-java-jdbc >Reporter: Eugene Kirpichov >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.5.0 > > Time Spent: 4h > Remaining Estimate: 0h > > [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934] > - a user is trying to load a large table from MySQL, and the MySQL JDBC > driver requires special measures when loading large result sets. > JdbcIO currently calls simply "connection.prepareStatement(query)" > https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508 > - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY > - these values should always be used. > Seems that different databases have different requirements for streaming > result sets. > E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must > not be in autocommit mode." > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . > Oracle, I think, doesn't have any special requirements but I don't know. > Fetch size should probably still be set to a reasonably large value. > Seems that the common denominator of these requirements is: set fetch size to > a reasonably large but not maximum value; disable autocommit (there's nothing > to commit in read() anyway). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set
[ https://issues.apache.org/jira/browse/BEAM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459818#comment-16459818 ] Eugene Kirpichov commented on BEAM-3714: JB, what's the reason for reassigning this issue to yourself? It has been successfully closed after having been implemented by Innocent. > JdbcIO.read() should create a forward-only, read-only result set > > > Key: BEAM-3714 > URL: https://issues.apache.org/jira/browse/BEAM-3714 > Project: Beam > Issue Type: Bug > Components: io-java-jdbc >Reporter: Eugene Kirpichov >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.5.0 > > Time Spent: 4h > Remaining Estimate: 0h > > [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934] > - a user is trying to load a large table from MySQL, and the MySQL JDBC > driver requires special measures when loading large result sets. > JdbcIO currently calls simply "connection.prepareStatement(query)" > https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508 > - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY > - these values should always be used. > Seems that different databases have different requirements for streaming > result sets. > E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must > not be in autocommit mode." > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . > Oracle, I think, doesn't have any special requirements but I don't know. > Fetch size should probably still be set to a reasonably large value. > Seems that the common denominator of these requirements is: set fetch size to > a reasonably large but not maximum value; disable autocommit (there's nothing > to commit in read() anyway). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set
[ https://issues.apache.org/jira/browse/BEAM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382636#comment-16382636 ] Eugene Kirpichov commented on BEAM-3714: The point of JdbcIO.read() is typically to read *a lot* of data, quickly. Smaller fetch size = lower performance (more database roundtrips to fetch the same amount of data), larger fetch size = better performance but more memory usage (e.g. a fetch size of 1 billion rows is definitely not practical, as is an unlimited fetch size). Basically, we need the largest fetch size that will not cause us to run out of memory. I think something in the vicinity of like 50k rows would be reasonable. > JdbcIO.read() should create a forward-only, read-only result set > > > Key: BEAM-3714 > URL: https://issues.apache.org/jira/browse/BEAM-3714 > Project: Beam > Issue Type: Bug > Components: io-java-jdbc >Reporter: Eugene Kirpichov >Assignee: Innocent >Priority: Major > > [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934] > - a user is trying to load a large table from MySQL, and the MySQL JDBC > driver requires special measures when loading large result sets. > JdbcIO currently calls simply "connection.prepareStatement(query)" > https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508 > - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY > - these values should always be used. > Seems that different databases have different requirements for streaming > result sets. > E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must > not be in autocommit mode." > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . > Oracle, I think, doesn't have any special requirements but I don't know. > Fetch size should probably still be set to a reasonably large value. > Seems that the common denominator of these requirements is: set fetch size to > a reasonably large but not maximum value; disable autocommit (there's nothing > to commit in read() anyway). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set
[ https://issues.apache.org/jira/browse/BEAM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381404#comment-16381404 ] Innocent commented on BEAM-3714: Hi Eugene, Thanks for your support, I am not sure about the Fetch Size part. according to this documentation from Oracle [https://docs.oracle.com/cd/A81042_01/DOC/java.816/a81354/resltse5.htm] Fetch size is set to 10 by default. I do not have much experience with this did you had a specific value/ range of values in mind when suggesting that it should be set to a big value ? > JdbcIO.read() should create a forward-only, read-only result set > > > Key: BEAM-3714 > URL: https://issues.apache.org/jira/browse/BEAM-3714 > Project: Beam > Issue Type: Bug > Components: io-java-jdbc >Reporter: Eugene Kirpichov >Assignee: Innocent >Priority: Major > > [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934] > - a user is trying to load a large table from MySQL, and the MySQL JDBC > driver requires special measures when loading large result sets. > JdbcIO currently calls simply "connection.prepareStatement(query)" > https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508 > - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY > - these values should always be used. > Seems that different databases have different requirements for streaming > result sets. > E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must > not be in autocommit mode." > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . > Oracle, I think, doesn't have any special requirements but I don't know. > Fetch size should probably still be set to a reasonably large value. > Seems that the common denominator of these requirements is: set fetch size to > a reasonably large but not maximum value; disable autocommit (there's nothing > to commit in read() anyway). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set
[ https://issues.apache.org/jira/browse/BEAM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381382#comment-16381382 ] Eugene Kirpichov commented on BEAM-3714: Hey Innocent, thanks for taking this! I'll be happy to help if you have any questions, and review your PR when it's ready. > JdbcIO.read() should create a forward-only, read-only result set > > > Key: BEAM-3714 > URL: https://issues.apache.org/jira/browse/BEAM-3714 > Project: Beam > Issue Type: Bug > Components: io-java-jdbc >Reporter: Eugene Kirpichov >Assignee: Innocent >Priority: Major > > [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934] > - a user is trying to load a large table from MySQL, and the MySQL JDBC > driver requires special measures when loading large result sets. > JdbcIO currently calls simply "connection.prepareStatement(query)" > https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508 > - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY > - these values should always be used. > Seems that different databases have different requirements for streaming > result sets. > E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must > not be in autocommit mode." > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . > Oracle, I think, doesn't have any special requirements but I don't know. > Fetch size should probably still be set to a reasonably large value. > Seems that the common denominator of these requirements is: set fetch size to > a reasonably large but not maximum value; disable autocommit (there's nothing > to commit in read() anyway). -- This message was sent by Atlassian JIRA (v7.6.3#76005)