[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22461

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r221411618

--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).
 * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
 * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the data source options for special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable this flag to avoid an Oracle DATE being resolved as a timestamp.
--- End diff --

@maropu @gatorsmile Could you please suggest something else to do here?
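The one-line example quoted in the diff above can be expanded into a fuller sketch of how a vendor-specific driver property rides along with the regular data source options. This is only an illustration: the URL, table name, and credentials below are placeholders, not values from the pull request; the source only confirms that Spark forwards unrecognized options such as `oracle.jdbc.mapDateToTimestamp` to the underlying JDBC driver.

```scala
// Sketch (not the suite code): reading an Oracle table while passing a
// vendor-specific JDBC property through the Spark data source options.
// URL, table name, and credentials are hypothetical placeholders.
val oracleJdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"

val df = spark.read
  .format("jdbc")
  .option("url", oracleJdbcUrl)
  .option("dbtable", "DATETIMES")   // hypothetical table
  .option("user", "system")         // placeholder credentials
  .option("password", "oracle")
  // Oracle driver property, forwarded as-is by Spark: without this, Oracle
  // DATE columns are reported as TIMESTAMP and resolved as Catalyst
  // TimestampType rather than DateType.
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  .load()
```

Since the option key is not one of Spark's own JDBC options, it is simply passed through in the connection properties, which is why this works for any driver-specific flag, not just Oracle's.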
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219546151

--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).
 * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
 * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the data source options for special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable this flag to avoid an Oracle DATE being resolved as a timestamp.
--- End diff --

This looks fine to me. @maropu Your idea is great! We need more examples in the troubleshooting section. Currently, our documentation for Spark SQL and Core needs a major update. Maybe we can do it after we finish the reorg of the documentation (https://issues.apache.org/jira/browse/SPARK-24499)?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219542061

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

I see. We still have a date column in the input dataframe.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219540966

--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).
 * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
 * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the data source options for special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable this flag to avoid an Oracle DATE being resolved as a timestamp.
--- End diff --

Ur.. I see. This section is troubleshooting, so I think it'd be better to illustrate a concrete trouble case first, and then describe a solution for it.
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219539062

--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in the data source options.
 user and password are normally provided as connection properties for
-logging into the data sources. In addition to the connection properties, Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

I have moved this description to the `Troubleshooting` section. I also tried to brush up the description. Writing good documentation is sometimes more difficult than writing code. I really need your help :)
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219537942

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
+      // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail test result
+      // comparison. E.g. Expected 2018-07-06 while Actual 2018-07-06 00:00:00.0.
--- End diff --

OK, I will remove the duplicate description.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219522094

--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in the data source options.
 user and password are normally provided as connection properties for
-logging into the data sources. In addition to the connection properties, Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

Also, it'd be better to brush up the description a bit more so that users can understand it easily...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219520789

--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in the data source options.
 user and password are normally provided as connection properties for
-logging into the data sources. In addition to the connection properties, Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

Probably, I think it's better to put this description in the `Troubleshooting` section: https://spark.apache.org/docs/2.3.1/sql-programming-guide.html#troubleshooting
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219518705

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
+      // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail test result
+      // comparison. E.g. Expected 2018-07-06 while Actual 2018-07-06 00:00:00.0.
--- End diff --

I think we don't need this duplicate description.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219518607

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
+      // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail test result
+      // comparison. E.g. Expected 2018-07-06 while Actual 2018-07-06 00:00:00.0.
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
+      .option("sessionInitStatement",
+        "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF'")
--- End diff --

I think we don't need this duplicate description.
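For context, the options quoted in this diff combine into a partitioned JDBC read over a timestamp column. A rough sketch of how they fit together, assuming a hypothetical table `datetimes` with a DATE column `d` and a TIMESTAMP partition column `t` (the `oracleJdbcUrl` variable is a placeholder; the exact suite code lives in OracleIntegrationSuite.scala):

```scala
// Sketch only: a partitioned read where Spark splits the scan on the
// TIMESTAMP column `t` between lowerBound and upperBound into 2 partitions.
// Table name, column names, and URL are assumptions for illustration.
val df = spark.read
  .format("jdbc")
  .option("url", oracleJdbcUrl)       // placeholder connection URL
  .option("dbtable", "datetimes")     // hypothetical table
  .option("partitionColumn", "t")
  .option("lowerBound", "2018-07-04 03:30:00.0")
  .option("upperBound", "2018-07-27 14:11:05.0")
  .option("numPartitions", 2)
  // Keep Oracle DATE columns as Catalyst DateType instead of TimestampType.
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  // Executed once per JDBC session, so the string bounds above parse
  // consistently regardless of the session's default NLS settings.
  .option("sessionInitStatement",
    "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF'")
  .load()
```

The interplay matters: `sessionInitStatement` pins the session timestamp format, while `oracle.jdbc.mapDateToTimestamp=false` controls how the driver reports DATE columns; dropping either one can change the resolved Catalyst schema or the bound parsing.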
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219403696

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

Yes, we need this. Otherwise, Spark will read column `d` values as Catalyst type timestamp, which will fail the test.

(screenshot: https://user-images.githubusercontent.com/12194089/45865915-9e730800-bdb1-11e8-9a42-a1394c601166.png)
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219392779

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

Do we need this line?
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219386919

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-06")
       .option("upperBound", "2018-07-20")
       .option("numPartitions", 3)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

OK. I will add notes to http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases, and will also add comments to the code.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219359831

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-06")
       .option("upperBound", "2018-07-20")
       .option("numPartitions", 3)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

Also leave comments before line 445 to explain why we need to do this.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219359638

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-06")
       .option("upperBound", "2018-07-20")
       .option("numPartitions", 3)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

This is good to know. Could you help update the doc to explain that we can pass options to the underlying JDBC driver, and also give some examples?