[GitHub] spark pull request #15263: [SPARK-14525][SQL][FOLLOWUP] Clean up JdbcRelatio...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r82490916 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,52 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { --- End diff -- one thing -- please avoid using Option.contains in the future. It is in general actually very confusing and more error prone, especially to people not super familiar with Scala, to use option as a collection (or monad). 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
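rxin's point about `Option.contains` can be seen in a small, self-contained sketch. The `isCascadingTruncateTable` stand-in below is hypothetical (the real one is defined on the JDBC dialects and its return values here are made up); it only serves to contrast the collection-style call with the explicit comparison:

```scala
// Hypothetical stand-in: the real isCascadingTruncateTable lives in the JDBC
// dialects and returns None when the dialect's behaviour is unknown.
def isCascadingTruncateTable(url: String): Option[Boolean] =
  if (url.startsWith("jdbc:mysql")) Some(false) else None

val url = "jdbc:mysql://localhost/test"

// Collection-style: terse, but readers unfamiliar with Option-as-collection
// can misread what "contains(false)" means.
val viaContains = isCascadingTruncateTable(url).contains(false)

// Explicit comparison: reads unambiguously as "known to be non-cascading".
val viaEquality = isCascadingTruncateTable(url) == Some(false)

assert(viaContains && viaEquality)
```

Both expressions are equivalent; the review asks for the second form purely for readability.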
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15263
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81283911 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- Yup. This is even documented in https://github.com/databricks/scala-style-guide#imports as you might already know. I just thought this is an exceptional case.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81283611 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- This might be an exception. My comment is just a general principle.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81283082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- Yeap, but it seems `import scala.collection.JavaConverters._` is being used a lot across the codebase. So, I thought both are fine.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81282421 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- We should avoid using `x.y.z._`, if possible
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81278191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- I think both are fine but will fix.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81275138 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- revert it back?
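The import being debated in this thread differs only in scope. A minimal sketch of what the single-converter import buys (the parameter map values are made up):

```scala
import java.util.Properties
// Importing just the converter that is used keeps the import self-documenting;
// the wildcard form pulls in every Java/Scala collection converter.
import scala.collection.JavaConverters.mapAsJavaMapConverter

val parameters = Map("url" -> "jdbc:h2:mem:test", "dbtable" -> "t")
val props = new Properties()
props.putAll(parameters.asJava) // .asJava is provided by the imported converter
```

With `import scala.collection.JavaConverters._` the same code compiles unchanged, which is why the thread settles on this being a borderline case.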
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81274703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. --- End diff -- `Creates a table with a given schema`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81274640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. + */ + def createTable( + schema: StructType, + url: String, + table: String, + createTableOptions: String, + conn: Connection): Unit = { +val strSchema = schemaString(schema, url) +// Create the table if the table didn't exist. --- End diff -- `didn't` -> `does not`
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81147277 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. + */ + def createTable( + df: DataFrame, --- End diff -- Yeap. I intended to minimise the changes. Let me fix them together.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81124219 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. + */ + def createTable( + df: DataFrame, --- End diff -- why does `createTable` take a `DataFrame` instead of a `StructType`? I think it's clearer to make `schemaString` and this method take `StructType`.
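cloud-fan's suggestion is to narrow the parameter from `DataFrame` to `StructType`, since only the schema is consumed. A self-contained sketch of the idea, with the schema modelled as hypothetical `(name, jdbcType)` pairs instead of Spark's `StructType` so it runs without Spark on the classpath:

```scala
// Stand-in for JdbcUtils.schemaString: render a schema as a column list.
def schemaString(schema: Seq[(String, String)]): String =
  schema.map { case (name, jdbcType) => s"$name $jdbcType" }.mkString(", ")

// The method takes only what it needs (the schema), not a whole DataFrame.
def createTableSql(schema: Seq[(String, String)], table: String, createTableOptions: String): String =
  s"CREATE TABLE $table (${schemaString(schema)}) $createTableOptions".trim

val sql = createTableSql(Seq("name" -> "TEXT", "age" -> "INTEGER"), "people", "")
// sql == "CREATE TABLE people (name TEXT, age INTEGER)"
```

Passing the narrowest type that suffices also makes the helper testable without constructing a `DataFrame`, which is the gist of the review comment.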
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80957517 --- Diff: docs/sql-programming-guide.md --- @@ -1058,7 +1059,21 @@ the Data Sources API. The following options are supported: The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). - + + +batchsize --- End diff -- Same here. No need to change the document in this PR. How about minimizing the code changes and removing all these unrelated changes? Thanks!
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80957168 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -61,3 +61,12 @@ class JDBCOptions( // TODO: to reuse the existing partition parameters for those partition specific options val createTableOptions = parameters.getOrElse("createTableOptions", "") } + +object JDBCOptions { + // TODO: Theses property names are used in `JdbcUtils`, `PostgresDialect` and `JDBCRDD`. It'd be --- End diff -- If you plan to resolve this in a separate PR, how about removing all these changes from this follow-up PR? These changes are not necessary, right?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80845890 --- Diff: docs/sql-programming-guide.md --- @@ -1058,7 +1059,21 @@ the Data Sources API. The following options are supported: The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). - + + +batchsize + + The JDBC batch size, which determines how many rows to insert per round trip. This can help performance on JDBC drivers. + + + + +isolationLevel + + The transaction isolation level, which applies to current connection. Please refer the documenation in java.sql.Connection. --- End diff -- `current connection` -> `the current connection`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80845726 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,48 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. 
+ dropTable(conn, table) + createTable(df, url, table, createTableOptions, conn) + saveTable(df, url, table, props) +} + case SaveMode.Append => +saveTable(df, url, table, props) - val (doCreate, doSave) = (mode, tableExists) match { -case (SaveMode.Ignore, true) => (false, false) -case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( - s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") -case (SaveMode.Overwrite, true) => - if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { -JdbcUtils.truncateTable(conn, table) -(false, true) - } else { -JdbcUtils.dropTable(conn, table) -(true, true) - } -case (SaveMode.Append, true) => (false, true) -case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + - " for handling existing tables.") -case (_, false) => (true, true) - } + case SaveMode.ErrorIfExists => +throw new AnalysisException( + s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") --- End diff -- `Table or view '$table' already exists. SaveMode: ErrorIfExists.`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80845381 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,48 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. --- End diff -- This comment looks weird. 
In RDBMS, users might do `drop and recreate` a table, instead of truncating it. Thus, please change it to `// Otherwise, do not truncate but just drop and recreate`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80844741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,48 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. 
+ dropTable(conn, table) + createTable(df, url, table, createTableOptions, conn) + saveTable(df, url, table, props) +} + case SaveMode.Append => +saveTable(df, url, table, props) - val (doCreate, doSave) = (mode, tableExists) match { -case (SaveMode.Ignore, true) => (false, false) -case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( - s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") -case (SaveMode.Overwrite, true) => - if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { -JdbcUtils.truncateTable(conn, table) -(false, true) - } else { -JdbcUtils.dropTable(conn, table) -(true, true) - } -case (SaveMode.Append, true) => (false, true) -case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + - " for handling existing tables.") -case (_, false) => (true, true) - } + case SaveMode.ErrorIfExists => +throw new AnalysisException( + s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") - if (doCreate) { -val schema = JdbcUtils.schemaString(data, url) -// To allow certain options to append when create a new table, which can be -// table_options or partition_options. -// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8" -val createtblOptions = jdbcOptions.createTableOptions -val sql = s"CREATE TABLE $table ($schema) $createtblOptions" -val statement = conn.createStatement -try { - statement.executeUpdate(sql) -} finally { - statement.close() + case SaveMode.Ignore => // Just ignore this case. --- End diff -- Please explicitly explain the behavior. `// Just ignore this case` is almost useless.
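The branch structure under review (for the table-exists case) can be summarised as a small decision table. A simplified, self-contained sketch, with the JDBC actions replaced by string labels since the real helpers need a live connection:

```scala
// Plan of actions when the target table already exists. Labels stand in for
// JdbcUtils.truncateTable / dropTable / createTable / saveTable.
def planForExistingTable(mode: String, truncate: Boolean, cascading: Option[Boolean]): Seq[String] =
  mode match {
    case "Overwrite" if truncate && cascading == Some(false) =>
      Seq("truncate", "save")        // truncate in place, then load
    case "Overwrite" =>
      Seq("drop", "create", "save")  // drop and recreate, then load
    case "Append" =>
      Seq("save")
    case "ErrorIfExists" =>
      throw new IllegalStateException("table already exists")
    case "Ignore" =>
      Seq.empty                      // leave the existing table untouched
  }

assert(planForExistingTable("Overwrite", truncate = true, Some(false)) == Seq("truncate", "save"))
assert(planForExistingTable("Ignore", truncate = false, None).isEmpty)
```

Spelling out the `Ignore` row explicitly, as above, is what the review comment asks for: the mode leaves existing data alone rather than "just ignoring" an arbitrary case.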
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80739029 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,52 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. 
+ dropTable(conn, table) + createTable(df, url, table, createTableOptions, conn) + saveTable(df, url, table, props) +} + case SaveMode.Append => +saveTable(df, url, table, props) - val (doCreate, doSave) = (mode, tableExists) match { -case (SaveMode.Ignore, true) => (false, false) -case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( - s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") -case (SaveMode.Overwrite, true) => - if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { -JdbcUtils.truncateTable(conn, table) -(false, true) - } else { -JdbcUtils.dropTable(conn, table) -(true, true) - } -case (SaveMode.Append, true) => (false, true) -case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + - " for handling existing tables.") -case (_, false) => (true, true) - } + case SaveMode.ErrorIfExists => +sys.error(s"Table $table already exists.") + + case SaveMode.Ignore => // Just ignore this case. +} + } else { +mode match { + case SaveMode.Overwrite | SaveMode.Append | SaveMode.ErrorIfExists => +createTable(df, url, table, createTableOptions, conn) +saveTable(df, url, table, props) - if (doCreate) { -val schema = JdbcUtils.schemaString(data, url) -// To allow certain options to append when create a new table, which can be -// table_options or partition_options. -// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8" -val createtblOptions = jdbcOptions.createTableOptions -val sql = s"CREATE TABLE $table ($schema) $createtblOptions" -val statement = conn.createStatement -try { - statement.executeUpdate(sql) -} finally { - statement.close() + case SaveMode.Ignore => // Just ignore this case. --- End diff -- Definitely. I will mark
Github user JustinPihony commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80738565

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
(same hunk as quoted above)
--- End diff --

Also, I'd say add a
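The branch removed by this PR assembled the `CREATE TABLE` DDL by appending the user-supplied `createTableOptions` verbatim after the generated schema. A minimal sketch of that string assembly; the inputs are hard-coded here for illustration, whereas in Spark the schema string comes from `JdbcUtils.schemaString` and the options from the data source parameters:

```scala
object CreateTableSqlDemo {
  // Mirrors the removed code's concatenation: options are appended verbatim,
  // so they can carry table_options or partition_options for the target DB.
  def createTableSql(table: String, schema: String, createTableOptions: String): String =
    s"CREATE TABLE $table ($schema) $createTableOptions"

  def main(args: Array[String]): Unit = {
    val sql = createTableSql("t", "name TEXT, age INTEGER", "ENGINE=InnoDB DEFAULT CHARSET=utf8")
    // prints: CREATE TABLE t (name TEXT, age INTEGER) ENGINE=InnoDB DEFAULT CHARSET=utf8
    println(sql)
  }
}
```

Because the options are spliced in unvalidated, they are database-specific by design (the `ENGINE=InnoDB` example in the old comment is MySQL syntax).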
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15263

[SPARK-14525][SQL][FOLLOWUP] Clean up JdbcRelationProvider

## What changes were proposed in this pull request?

This PR proposes cleaning up the confusing part in `createRelation` as discussed in https://github.com/apache/spark/pull/12601/files#r80627940

## How was this patch tested?

Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-14525

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15263.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15263

commit fcb2bda185018c4ea214e0d07cc532559e24f590
Author: hyukjinkwon
Date:   2016-09-27T15:18:41Z

    Clean up confusing bit
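The `(mode, tableExists)` behavior matrix that the old comment block documented can be sketched as a plain decision function. This is a simplified standalone model with hypothetical names, not Spark's actual API; it returns the `(doCreate, doSave)` pair the old code computed:

```scala
object SaveModePlan {
  // Simplified stand-ins for org.apache.spark.sql.SaveMode.
  sealed trait Mode
  case object Overwrite extends Mode
  case object Append extends Mode
  case object ErrorIfExists extends Mode
  case object Ignore extends Mode

  // Returns (doCreate, doSave), matching the matrix in the removed comment.
  // `truncate` models Overwrite with the truncate option on a dialect known
  // to be safe: truncate-and-save instead of drop-create-and-save.
  def plan(mode: Mode, tableExists: Boolean, truncate: Boolean): (Boolean, Boolean) =
    (mode, tableExists) match {
      case (Ignore, true)        => (false, false)
      case (ErrorIfExists, true) => sys.error("table already exists")
      case (Overwrite, true)     => if (truncate) (false, true) else (true, true)
      case (Append, true)        => (false, true)
      case (_, false)            => (true, true)
    }

  def main(args: Array[String]): Unit = {
    assert(plan(Append, tableExists = true, truncate = false) == (false, true))
    assert(plan(Overwrite, tableExists = true, truncate = true) == (false, true))
    assert(plan(Ignore, tableExists = false, truncate = false) == (true, true))
  }
}
```

The PR's cleanup replaces this tuple-encoded plan with two explicit `mode match` blocks, one per `tableExists` branch, so each `SaveMode` case reads as a direct sequence of actions.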