[GitHub] spark pull request #15263: [SPARK-14525][SQL][FOLLOWUP] Clean up JdbcRelatio...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r82490916 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,52 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { --- End diff -- one thing -- please avoid using Option.contains in the future. It is in general actually very confusing and more error prone, especially to people not super familiar with Scala, to use option as a collection (or monad). 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
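rxin's point about `Option.contains` can be seen in a small, self-contained sketch. The `isCascadingTruncateTable` stand-in below is hypothetical (the real one is defined on the JDBC dialects and its return values here are made up); it only serves to contrast the collection-style call with the explicit comparison:

```scala
// Hypothetical stand-in: the real isCascadingTruncateTable lives in the JDBC
// dialects and returns None when the dialect's behaviour is unknown.
def isCascadingTruncateTable(url: String): Option[Boolean] =
  if (url.startsWith("jdbc:mysql")) Some(false) else None

val url = "jdbc:mysql://localhost/test"

// Collection-style: terse, but readers unfamiliar with Option-as-collection
// can misread what "contains(false)" means.
val viaContains = isCascadingTruncateTable(url).contains(false)

// Explicit comparison: reads unambiguously as "known to be non-cascading".
val viaEquality = isCascadingTruncateTable(url) == Some(false)

assert(viaContains && viaEquality)
```

Both expressions are equivalent; the review asks for the second form purely for readability.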
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15263
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81283911 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- Yup. This is even documented in https://github.com/databricks/scala-style-guide#imports as you might already know. I just thought this is an exceptional case.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81283611 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- This might be an exception. My comment is just a general principle.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81283082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- Yeap, but it seems `import scala.collection.JavaConverters._` is being used a lot across the codebase. So, I thought both are fine.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81282421 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- We should avoid using `x.y.z._`, if possible
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81278191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- I think both are fine but will fix.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81275138 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -19,9 +19,10 @@ package org.apache.spark.sql.execution.datasources.jdbc import java.util.Properties -import scala.collection.JavaConverters.mapAsJavaMapConverter +import scala.collection.JavaConverters._ --- End diff -- revert it back?
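The import being debated in this thread differs only in scope. A minimal sketch of what the single-converter import buys (the parameter map values are made up):

```scala
import java.util.Properties
// Importing just the converter that is used keeps the import self-documenting;
// the wildcard form pulls in every Java/Scala collection converter.
import scala.collection.JavaConverters.mapAsJavaMapConverter

val parameters = Map("url" -> "jdbc:h2:mem:test", "dbtable" -> "t")
val props = new Properties()
props.putAll(parameters.asJava) // .asJava is provided by the imported converter
```

With `import scala.collection.JavaConverters._` the same code compiles unchanged, which is why the thread settles on this being a borderline case.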
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81274703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. --- End diff -- `Creates a table with a given schema`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81274640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. + */ + def createTable( + schema: StructType, + url: String, + table: String, + createTableOptions: String, + conn: Connection): Unit = { +val strSchema = schemaString(schema, url) +// Create the table if the table didn't exist. --- End diff -- `didn't` -> `does not`
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81147277 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. + */ + def createTable( + df: DataFrame, --- End diff -- Yeap. I intended to minimise the changes. Let me fix them together.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r81124219 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -697,4 +697,27 @@ object JdbcUtils extends Logging { getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel) ) } + + /** + * Creates a table according to the given schema. + */ + def createTable( + df: DataFrame, --- End diff -- why does `createTable` take a `DataFrame` instead of a `StructType`? I think it's clearer to make `schemaString` and this method take `StructType`.
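cloud-fan's suggestion is to narrow the parameter from `DataFrame` to `StructType`, since only the schema is consumed. A self-contained sketch of the idea, with the schema modelled as hypothetical `(name, jdbcType)` pairs instead of Spark's `StructType` so it runs without Spark on the classpath:

```scala
// Stand-in for JdbcUtils.schemaString: render a schema as a column list.
def schemaString(schema: Seq[(String, String)]): String =
  schema.map { case (name, jdbcType) => s"$name $jdbcType" }.mkString(", ")

// The method takes only what it needs (the schema), not a whole DataFrame.
def createTableSql(schema: Seq[(String, String)], table: String, createTableOptions: String): String =
  s"CREATE TABLE $table (${schemaString(schema)}) $createTableOptions".trim

val sql = createTableSql(Seq("name" -> "TEXT", "age" -> "INTEGER"), "people", "")
// sql == "CREATE TABLE people (name TEXT, age INTEGER)"
```

Passing the narrowest type that suffices also makes the helper testable without constructing a `DataFrame`, which is the gist of the review comment.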
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80957517 --- Diff: docs/sql-programming-guide.md --- @@ -1058,7 +1059,21 @@ the Data Sources API. The following options are supported: The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). - + + +batchsize --- End diff -- Same here. No need to change the document in this PR. How about minimizing the code changes and removing all these unrelated changes? Thanks!
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80957168 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -61,3 +61,12 @@ class JDBCOptions( // TODO: to reuse the existing partition parameters for those partition specific options val createTableOptions = parameters.getOrElse("createTableOptions", "") } + +object JDBCOptions { + // TODO: Theses property names are used in `JdbcUtils`, `PostgresDialect` and `JDBCRDD`. It'd be --- End diff -- If you plan to resolve this in a separate PR, how about removing all these changes from this follow-up PR? These changes are not necessary, right?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80845890 --- Diff: docs/sql-programming-guide.md --- @@ -1058,7 +1059,21 @@ the Data Sources API. The following options are supported: The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). - + + +batchsize + + The JDBC batch size, which determines how many rows to insert per round trip. This can help performance on JDBC drivers. + + + + +isolationLevel + + The transaction isolation level, which applies to current connection. Please refer the documenation in java.sql.Connection. --- End diff -- `current connection` -> `the current connection`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80845726 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,48 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. 
+ dropTable(conn, table) + createTable(df, url, table, createTableOptions, conn) + saveTable(df, url, table, props) +} + case SaveMode.Append => +saveTable(df, url, table, props) - val (doCreate, doSave) = (mode, tableExists) match { -case (SaveMode.Ignore, true) => (false, false) -case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( - s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") -case (SaveMode.Overwrite, true) => - if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { -JdbcUtils.truncateTable(conn, table) -(false, true) - } else { -JdbcUtils.dropTable(conn, table) -(true, true) - } -case (SaveMode.Append, true) => (false, true) -case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + - " for handling existing tables.") -case (_, false) => (true, true) - } + case SaveMode.ErrorIfExists => +throw new AnalysisException( + s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") --- End diff -- `Table or view '$table' already exists. SaveMode: ErrorIfExists.`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80845381 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,48 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. --- End diff -- This comment looks weird. 
In RDBMS, users might do `drop and recreate` a table, instead of truncating it. Thus, please change it to `// Otherwise, do not truncate but just drop and recreate`
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80844741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,48 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. 
+ dropTable(conn, table) + createTable(df, url, table, createTableOptions, conn) + saveTable(df, url, table, props) +} + case SaveMode.Append => +saveTable(df, url, table, props) - val (doCreate, doSave) = (mode, tableExists) match { -case (SaveMode.Ignore, true) => (false, false) -case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( - s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") -case (SaveMode.Overwrite, true) => - if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { -JdbcUtils.truncateTable(conn, table) -(false, true) - } else { -JdbcUtils.dropTable(conn, table) -(true, true) - } -case (SaveMode.Append, true) => (false, true) -case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + - " for handling existing tables.") -case (_, false) => (true, true) - } + case SaveMode.ErrorIfExists => +throw new AnalysisException( + s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") - if (doCreate) { -val schema = JdbcUtils.schemaString(data, url) -// To allow certain options to append when create a new table, which can be -// table_options or partition_options. -// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8" -val createtblOptions = jdbcOptions.createTableOptions -val sql = s"CREATE TABLE $table ($schema) $createtblOptions" -val statement = conn.createStatement -try { - statement.executeUpdate(sql) -} finally { - statement.close() + case SaveMode.Ignore => // Just ignore this case. --- End diff -- Please explicitly explain the behavior. `// Just ignore this case` is almost useless.
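The branch structure under review (for the table-exists case) can be summarised as a small decision table. A simplified, self-contained sketch, with the JDBC actions replaced by string labels since the real helpers need a live connection:

```scala
// Plan of actions when the target table already exists. Labels stand in for
// JdbcUtils.truncateTable / dropTable / createTable / saveTable.
def planForExistingTable(mode: String, truncate: Boolean, cascading: Option[Boolean]): Seq[String] =
  mode match {
    case "Overwrite" if truncate && cascading == Some(false) =>
      Seq("truncate", "save")        // truncate in place, then load
    case "Overwrite" =>
      Seq("drop", "create", "save")  // drop and recreate, then load
    case "Append" =>
      Seq("save")
    case "ErrorIfExists" =>
      throw new IllegalStateException("table already exists")
    case "Ignore" =>
      Seq.empty                      // leave the existing table untouched
  }

assert(planForExistingTable("Overwrite", truncate = true, Some(false)) == Seq("truncate", "save"))
assert(planForExistingTable("Ignore", truncate = false, None).isEmpty)
```

Spelling out the `Ignore` row explicitly, as above, is what the review comment asks for: the mode leaves existing data alone rather than "just ignoring" an arbitrary case.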
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80739029 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -50,67 +51,52 @@ class JdbcRelationProvider extends CreatableRelationProvider JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) } - /* - * The following structure applies to this code: - * |tableExists| !tableExists - * - * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation - * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation - * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation - * | saveTable, BaseRelation | - * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation - * - * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate - */ override def createRelation( sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], - data: DataFrame): BaseRelation = { -val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url -val table = jdbcOptions.table - + df: DataFrame): BaseRelation = { +val options = new JDBCOptions(parameters) +val url = options.url +val table = options.table +val createTableOptions = options.createTableOptions +val isTruncate = options.isTruncate val props = new Properties() props.putAll(parameters.asJava) -val conn = JdbcUtils.createConnectionFactory(url, props)() +val conn = JdbcUtils.createConnectionFactory(url, props)() try { val tableExists = JdbcUtils.tableExists(conn, url, table) + if (tableExists) { +mode match { + case SaveMode.Overwrite => +if (isTruncate && isCascadingTruncateTable(url).contains(false)) { + // In this case, we should truncate table and then load. + truncateTable(conn, table) + saveTable(df, url, table, props) +} else { + // Otherwise, do not truncate but just drop. 
+ dropTable(conn, table) + createTable(df, url, table, createTableOptions, conn) + saveTable(df, url, table, props) +} + case SaveMode.Append => +saveTable(df, url, table, props) - val (doCreate, doSave) = (mode, tableExists) match { -case (SaveMode.Ignore, true) => (false, false) -case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( - s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") -case (SaveMode.Overwrite, true) => - if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { -JdbcUtils.truncateTable(conn, table) -(false, true) - } else { -JdbcUtils.dropTable(conn, table) -(true, true) - } -case (SaveMode.Append, true) => (false, true) -case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + - " for handling existing tables.") -case (_, false) => (true, true) - } + case SaveMode.ErrorIfExists => +sys.error(s"Table $table already exists.") + + case SaveMode.Ignore => // Just ignore this case. +} + } else { +mode match { + case SaveMode.Overwrite | SaveMode.Append | SaveMode.ErrorIfExists => +createTable(df, url, table, createTableOptions, conn) +saveTable(df, url, table, props) - if (doCreate) { -val schema = JdbcUtils.schemaString(data, url) -// To allow certain options to append when create a new table, which can be -// table_options or partition_options. -// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8" -val createtblOptions = jdbcOptions.createTableOptions -val sql = s"CREATE TABLE $table ($schema) $createtblOptions" -val statement = conn.createStatement -try { - statement.executeUpdate(sql) -} finally { - statement.close() + case SaveMode.Ignore => // Just ignore this case. --- End diff -- Definitely. I will mark
Github user JustinPihony commented on a diff in the pull request: https://github.com/apache/spark/pull/15263#discussion_r80738565

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
(same hunk as quoted above)
--- End diff --

Also, I'd say add a
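The branch removed by this PR assembled the `CREATE TABLE` DDL by appending the user-supplied `createTableOptions` verbatim after the generated schema. A minimal sketch of that string assembly; the inputs are hard-coded here for illustration, whereas in Spark the schema string comes from `JdbcUtils.schemaString` and the options from the data source parameters:

```scala
object CreateTableSqlDemo {
  // Mirrors the removed code's concatenation: options are appended verbatim,
  // so they can carry table_options or partition_options for the target DB.
  def createTableSql(table: String, schema: String, createTableOptions: String): String =
    s"CREATE TABLE $table ($schema) $createTableOptions"

  def main(args: Array[String]): Unit = {
    val sql = createTableSql("t", "name TEXT, age INTEGER", "ENGINE=InnoDB DEFAULT CHARSET=utf8")
    // prints: CREATE TABLE t (name TEXT, age INTEGER) ENGINE=InnoDB DEFAULT CHARSET=utf8
    println(sql)
  }
}
```

Because the options are spliced in unvalidated, they are database-specific by design (the `ENGINE=InnoDB` example in the old comment is MySQL syntax).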
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15263

[SPARK-14525][SQL][FOLLOWUP] Clean up JdbcRelationProvider

## What changes were proposed in this pull request?

This PR proposes cleaning up the confusing part in `createRelation` as discussed in https://github.com/apache/spark/pull/12601/files#r80627940

## How was this patch tested?

Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-14525

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15263.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15263

commit fcb2bda185018c4ea214e0d07cc532559e24f590
Author: hyukjinkwon
Date:   2016-09-27T15:18:41Z

    Clean up confusing bit
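The `(mode, tableExists)` behavior matrix that the old comment block documented can be sketched as a plain decision function. This is a simplified standalone model with hypothetical names, not Spark's actual API; it returns the `(doCreate, doSave)` pair the old code computed:

```scala
object SaveModePlan {
  // Simplified stand-ins for org.apache.spark.sql.SaveMode.
  sealed trait Mode
  case object Overwrite extends Mode
  case object Append extends Mode
  case object ErrorIfExists extends Mode
  case object Ignore extends Mode

  // Returns (doCreate, doSave), matching the matrix in the removed comment.
  // `truncate` models Overwrite with the truncate option on a dialect known
  // to be safe: truncate-and-save instead of drop-create-and-save.
  def plan(mode: Mode, tableExists: Boolean, truncate: Boolean): (Boolean, Boolean) =
    (mode, tableExists) match {
      case (Ignore, true)        => (false, false)
      case (ErrorIfExists, true) => sys.error("table already exists")
      case (Overwrite, true)     => if (truncate) (false, true) else (true, true)
      case (Append, true)        => (false, true)
      case (_, false)            => (true, true)
    }

  def main(args: Array[String]): Unit = {
    assert(plan(Append, tableExists = true, truncate = false) == (false, true))
    assert(plan(Overwrite, tableExists = true, truncate = true) == (false, true))
    assert(plan(Ignore, tableExists = false, truncate = false) == (true, true))
  }
}
```

The PR's cleanup replaces this tuple-encoded plan with two explicit `mode match` blocks, one per `tableExists` branch, so each `SaveMode` case reads as a direct sequence of actions.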