[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2017-12-22 Thread danielvdende
GitHub user danielvdende opened a pull request:

https://github.com/apache/spark/pull/20057

[SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource

This commit adds the `cascadeTruncate` option to the JDBC datasource
API, for databases that support this functionality (PostgreSQL and
Oracle at the moment). This allows for applying a cascading truncate
that affects tables that have foreign key constraints on the table
being truncated.

## What changes were proposed in this pull request?

Add `cascadeTruncate` option to JDBC datasource API. Allow this to affect the `TRUNCATE` query for databases that support this option.

## How was this patch tested?
Existing tests for `truncateQuery` were updated. Also, an additional test was added to ensure that the correct syntax was applied, and that enabling the config for databases that do not support this option does not result in invalid queries.
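For context, a minimal usage sketch of the proposed option (not taken from the PR itself; the URL, table name, and `df` are placeholders):

```scala
// Usage sketch (placeholders throughout): overwrite a PostgreSQL table,
// truncating instead of dropping it, and cascading the truncate to tables
// that have foreign keys on it. `df` is assumed to be an existing DataFrame.
import org.apache.spark.sql.SaveMode

df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")
  .option("dbtable", "parent")
  .option("truncate", "true")         // keep table metadata, TRUNCATE instead of DROP
  .option("cascadeTruncate", "true")  // should emit TRUNCATE TABLE ONLY parent CASCADE
  .mode(SaveMode.Overwrite)
  .save()
```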

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielvdende/spark SPARK-22880

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20057


commit 842eca7fa282a81f0f4d459242869c022ef21080
Author: Daniel van der Ende 
Date:   2017-12-22T15:58:28Z

[SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource

This commit adds the cascadeTruncate option to the JDBC datasource
API, for databases that support this functionality (PostgreSQL and
Oracle at the moment). This allows for applying a cascading truncate
that affects tables that have foreign key constraints on the table
being truncated.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2017-12-22 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun this is the functionality we discussed in the PR for 
SPARK-22729; it would be great to hear your opinion on this :)


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2017-12-22 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r158555400
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,11 +121,12 @@ abstract class JdbcDialect extends Serializable {
* The SQL query that should be used to truncate a table. Dialects can 
override this method to
* return a query that is suitable for a particular database. For 
PostgreSQL, for instance,
* a different query is used to prevent "TRUNCATE" affecting other 
tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
+   * @param cascade (OPTIONAL) Whether or not to cascade the truncation. 
Default: false
* @return The SQL query to use for truncating a table
*/
   @Since("2.3.0")
-  def getTruncateQuery(table: String): String = {
+  def getTruncateQuery(table: String, cascade: Boolean = false): String = {
--- End diff --

That would force all dialects to implement this method though, which would 
lead to unnecessary code duplication. Moreover, removing the default value here 
would change the public API, as it would force others who have written custom 
dialects to change calls to this method. 
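The compatibility argument can be sketched as follows (an illustrative stand-in, not the actual Spark `JdbcDialect` source): adding the parameter with a default value keeps every pre-existing one-argument call site compiling unchanged.

```scala
// Illustrative stand-in (hypothetical class name, not the Spark source):
// adding `cascade` with a default value keeps existing single-argument
// callers (including third-party custom dialects) source-compatible.
abstract class DemoDialect {
  def getTruncateQuery(table: String, cascade: Boolean = false): String =
    if (cascade) s"TRUNCATE TABLE $table CASCADE"
    else s"TRUNCATE TABLE $table"
}

object CompatDemo extends App {
  val dialect = new DemoDialect {}
  // Old call sites need no change:
  assert(dialect.getTruncateQuery("weblogs") == "TRUNCATE TABLE weblogs")
  // New call sites can opt in:
  assert(dialect.getTruncateQuery("weblogs", cascade = true) ==
    "TRUNCATE TABLE weblogs CASCADE")
}
```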


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2017-12-22 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
I think the default for all dialects should be false, regardless of whether 
the cascade feature is even supported; even for those dialects that do support 
it, it should default to false.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2017-12-22 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r158557222
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -120,11 +121,12 @@ abstract class JdbcDialect extends Serializable {
* The SQL query that should be used to truncate a table. Dialects can 
override this method to
* return a query that is suitable for a particular database. For 
PostgreSQL, for instance,
* a different query is used to prevent "TRUNCATE" affecting other 
tables.
-   * @param table The name of the table.
+   * @param table The table to truncate
+   * @param cascade (OPTIONAL) Whether or not to cascade the truncation. 
Default: false
* @return The SQL query to use for truncating a table
*/
   @Since("2.3.0")
-  def getTruncateQuery(table: String): String = {
+  def getTruncateQuery(table: String, cascade: Boolean = false): String = {
--- End diff --

I think the default for all dialects should be false, regardless of whether 
the cascade feature is even supported; even for those dialects that do support 
it, it should default to false. It's a feature that you should only use if you 
explicitly want to cascade, imho. 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2017-12-22 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@gatorsmile could you explain why you have doubts about the feature? Thanks!


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2017-12-26 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@gatorsmile would be great to hear why you doubt the value of the feature 
:). I know that for us it would be extremely valuable (at the moment we have to 
do an extra step in our data pipeline because this feature is missing in 
Spark), but of course we're not the only ones using Spark.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-01-02 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r159183234
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1339,6 +1339,13 @@ the following case-insensitive options:
  This is a JDBC writer related option. When 
SaveMode.Overwrite is enabled, this option causes Spark to 
truncate an existing table instead of dropping and recreating it. This can be 
more efficient, and prevents the table metadata (e.g., indices) from being 
removed. However, it will not work in some cases, such as when the new data has 
a different schema. It defaults to false. This option applies only 
to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this option allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing.
--- End diff --

I think it raises the question of how complete/incomplete the Spark JDBC 
API should be, and what the use cases are that it should serve. For the most 
simple cases, in which no key constraints are set between tables, you won't 
need this option. However, as soon as foreign key constraints are introduced, 
it is very important. I agree that not every functionality from SQL (dialects) 
should be included, but I personally feel this is quite fundamental 
functionality. 

Moreover, as it's a configuration option, users that don't want it don't 
have to use it. I think we also discussed this functionality in a previous PR 
with @dongjoon-hyun here: 
[SPARK-22729](https://github.com/apache/spark/pull/19911)


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-01-02 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r159370871
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1339,6 +1339,13 @@ the following case-insensitive options:
  This is a JDBC writer related option. When 
SaveMode.Overwrite is enabled, this option causes Spark to 
truncate an existing table instead of dropping and recreating it. This can be 
more efficient, and prevents the table metadata (e.g., indices) from being 
removed. However, it will not work in some cases, such as when the new data has 
a different schema. It defaults to false. This option applies only 
to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this option allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing.
--- End diff --

@dongjoon-hyun apologies if I misrepresented your comments/opinions from 
the previous PR, wasn't my intention :-).

I've given it some more thought, and I can see your point about the default 
value. I'll make the change we discussed.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-01-02 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r159371171
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1339,6 +1339,13 @@ the following case-insensitive options:
  This is a JDBC writer related option. When 
SaveMode.Overwrite is enabled, this option causes Spark to 
truncate an existing table instead of dropping and recreating it. This can be 
more efficient, and prevents the table metadata (e.g., indices) from being 
removed. However, it will not work in some cases, such as when the new data has 
a different schema. It defaults to false. This option applies only 
to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this option allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing.
--- End diff --

@dongjoon-hyun one thing I'm thinking of (just curious to hear your 
opinion): could we use the value of `isCascadingTruncateTable` for each dialect 
as the default value for the `cascade` boolean? In that way, there is only a 
single boolean per dialect specifying what the default behaviour is with regard 
to cascading during truncates.
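The idea above can be sketched like this (a simplified model with hypothetical names, not the actual Spark dialect classes): the per-dialect `isCascadingTruncateTable()` value is reused as the default for the `cascade` flag.

```scala
// Simplified model (hypothetical names): each dialect's
// isCascadingTruncateTable() doubles as the default for the cascade flag,
// giving a single per-dialect source of truth for default truncate behaviour.
abstract class Dialect {
  def isCascadingTruncateTable(): Option[Boolean] = None
  def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String =
    s"TRUNCATE TABLE $table"
}

object PostgresLikeDialect extends Dialect {
  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
  override def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String =
    cascade match {
      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
      case _          => s"TRUNCATE TABLE ONLY $table"
    }
}

object DefaultDemo extends App {
  // With no explicit setting, the dialect default (Some(false)) applies:
  assert(PostgresLikeDialect.getTruncateQuery("weblogs") ==
    "TRUNCATE TABLE ONLY weblogs")
  // An explicit opt-in cascades:
  assert(PostgresLikeDialect.getTruncateQuery("weblogs", Some(true)) ==
    "TRUNCATE TABLE ONLY weblogs CASCADE")
}
```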


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-01-03 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
I've made some changes to the code; the hardcoded `false` value for 
`cascade` in `JdbcDialects` is now replaced by a default value of 
`isCascadingTruncateTable`. This is also the case for each individual dialect, 
with a pattern match in the dialects for Postgres and Oracle, as they support 
the cascading behaviour. Curious to hear your thoughts on this @dongjoon-hyun 
@gatorsmile 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-01-10 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun @gatorsmile any further thoughts on this? 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-01-18 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun @gatorsmile any further update? 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-05 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@Stephan202 thanks for pointing out those docs issues, just pushed the 
changes :-).
@gatorsmile @dongjoon-hyun would you have a chance to take a look at this 
again? 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun @gatorsmile sorry to keep asking, but could you let me know 
when we can get this merged?


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-15 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Tests are failing on a Spark Streaming test. I think it's probably because 
of the age of this PR; I'll rebase to pick up the changes that were merged 
into master since I opened it.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Hmm, now it fails the OrcQuerySuite. This PR doesn't touch any of the Orc 
implementation in Spark. Could this be a flaky test @gatorsmile ?
```
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 12 times over 10.15875468798 seconds. Last failure message: There are 1 possibly leaked file streams..
```


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Hi guys, @Fokko @gatorsmile, I completely agree with what @Fokko mentioned; 
our main reason for wanting to move away from Sqoop is also stability, and 
getting rid of MapReduce in preparation for our move to Kubernetes (or 
something similar). We've also found it to be much faster than Sqoop. In terms 
of why we need the feature in this PR: we have some tables in PostgreSQL that 
have foreign keys linking them. We have also specified a schema for these 
tables. If we use the drop-and-recreate option, Spark will determine the 
schema, overriding our PostgreSQL schema. Obviously, these should match up, but 
I personally don't like that Spark can do this (and that you can't explicitly 
tell it not to). 

Because of this behaviour, we currently require 2 tasks in Airflow (as 
@Fokko mentioned) to ensure the tables are truncated, but the schema stays in 
place. This PR would enable us to specify in a single, idempotent (Airflow) 
task that we want to truncate the table before putting new data in there. The 
cascade enables us to not break foreign key relations and cause errors.

To be clear, this therefore isn't emulating a Sqoop feature (as a Sqoop 
task isn't idempotent), but is in fact improving on what Sqoop offers.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Thanks guys! @gatorsmile @dongjoon-hyun 
Happy to help out expanding Spark SQL JDBC where necessary to match and 
improve on Sqoop :-)


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168921808
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -119,6 +119,8 @@ class JDBCOptions(
   // 
   // if to truncate the table from the JDBC database
   val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean
+
+  val isCascadeTruncate: Option[Boolean] = 
parameters.get(JDBC_CASCADE_TRUNCATE).map(_.toBoolean)
--- End diff --

The reason I didn't do that is the existence of the 
`isCascadingTruncateTable` function for each dialect. According to the docs, 
that indicates whether or not a `TRUNCATE TABLE` command results in cascading 
behaviour by default for a given dialect. I thought it would be nice to use 
that value as the default value for `isCascadeTruncate`; that way, if there 
ever is a dialect that cascades truncations by default, we don't 'hardcode' a 
default value.
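The option handling described here can be sketched as follows (a simplified stand-in for the options class, using the `cascadeTruncate` key from the docs): absence of the user-supplied setting stays `None` rather than being collapsed to `false`, so a dialect-level default can still apply downstream.

```scala
// Sketch of the option parsing (simplified; not the actual JDBCOptions class).
// A missing "cascadeTruncate" entry stays None instead of defaulting to false,
// leaving room for the dialect's own default to take effect.
class DemoOptions(parameters: Map[String, String]) {
  val isCascadeTruncate: Option[Boolean] =
    parameters.get("cascadeTruncate").map(_.toBoolean)
}

object OptionDemo extends App {
  assert(new DemoOptions(Map.empty).isCascadeTruncate.isEmpty)
  assert(new DemoOptions(Map("cascadeTruncate" -> "true"))
    .isCascadeTruncate.contains(true))
  assert(new DemoOptions(Map("cascadeTruncate" -> "false"))
    .isCascadeTruncate.contains(false))
}
```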


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168921833
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -102,7 +102,12 @@ object JdbcUtils extends Logging {
 val dialect = JdbcDialects.get(options.url)
 val statement = conn.createStatement
 try {
-  statement.executeUpdate(dialect.getTruncateQuery(options.table))
+  if (options.isCascadeTruncate.isDefined) {
+statement.executeUpdate(dialect.getTruncateQuery(options.table,
+  options.isCascadeTruncate))
+  } else {
+statement.executeUpdate(dialect.getTruncateQuery(options.table))
+  }
--- End diff --

see the previous comment on why this is an `Option[Boolean]` 


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168921851
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1372,6 +1372,13 @@ the following case-insensitive options:
  This is a JDBC writer related option. When 
SaveMode.Overwrite is enabled, this option causes Spark to 
truncate an existing table instead of dropping and recreating it. This can be 
more efficient, and prevents the table metadata (e.g., indices) from being 
removed. However, it will not work in some cases, such as when the new data has 
a different schema. It defaults to false. This option applies only 
to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this option allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing.
--- End diff --

As mentioned in another comment, I think we should use the value of 
`isCascadingTruncateTable` as the default, rather than always `false`. Seems 
like the correct use of that variable. I can add a sentence to the docs 
specifying that.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168922006
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -860,14 +860,41 @@ class JDBCSuite extends SparkFunSuite
 val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
 val h2 = JdbcDialects.get(url)
 val derby = JdbcDialects.get("jdbc:derby:db")
+val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
+
 val table = "weblogs"
 val defaultQuery = s"TRUNCATE TABLE $table"
 val postgresQuery = s"TRUNCATE TABLE ONLY $table"
+
 assert(MySQL.getTruncateQuery(table) == defaultQuery)
 assert(Postgres.getTruncateQuery(table) == postgresQuery)
 assert(db2.getTruncateQuery(table) == defaultQuery)
 assert(h2.getTruncateQuery(table) == defaultQuery)
 assert(derby.getTruncateQuery(table) == defaultQuery)
+assert(oracle.getTruncateQuery(table) == defaultQuery)
+  }
+
+  test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") {
+// cascade in a truncate should only be applied for databases that 
support this,
+// even if the parameter is passed.
+val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
--- End diff --

Good point, I saw a few other tests in there (untouched by the PR 
initially) that had this, fixed those now too.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168922010
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -860,14 +860,41 @@ class JDBCSuite extends SparkFunSuite
 val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
 val h2 = JdbcDialects.get(url)
 val derby = JdbcDialects.get("jdbc:derby:db")
+val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
+
 val table = "weblogs"
 val defaultQuery = s"TRUNCATE TABLE $table"
 val postgresQuery = s"TRUNCATE TABLE ONLY $table"
+
 assert(MySQL.getTruncateQuery(table) == defaultQuery)
 assert(Postgres.getTruncateQuery(table) == postgresQuery)
 assert(db2.getTruncateQuery(table) == defaultQuery)
 assert(h2.getTruncateQuery(table) == defaultQuery)
 assert(derby.getTruncateQuery(table) == defaultQuery)
+assert(oracle.getTruncateQuery(table) == defaultQuery)
+  }
+
+  test("SPARK-22880: Truncate table with CASCADE by jdbc dialect") {
+// cascade in a truncate should only be applied for databases that 
support this,
+// even if the parameter is passed.
+val MySQL = JdbcDialects.get("jdbc:mysql://127.0.0.1/db")
+val Postgres = JdbcDialects.get("jdbc:postgresql://127.0.0.1/db")
+val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1/db")
+val h2 = JdbcDialects.get(url)
+val derby = JdbcDialects.get("jdbc:derby:db")
+val oracle = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
--- End diff --

Done 👍 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Thanks for your review @dongjoon-hyun 👍 I've corrected all the 
indentation problems (didn't show up locally when running scalastyle checks for 
some reason). Added comments where necessary for the other points. 

Sure, no problem to hold this until Spark 2.4, would be great if it could 
land there. 


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168929792
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala ---
@@ -42,4 +42,17 @@ private object MsSqlServerDialect extends JdbcDialect {
   }
 
   override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable(). Ignored for MsSql 
as it is unsupported.
--- End diff --

Hmm, I disagree with this; the reason I added it was to show users that 
even if they specify `cascadeTruncate` as `true`, it doesn't take effect 
for some databases (like MsSQL). `isCascadingTruncateTable` only shows the 
default behaviour, which for all databases at the moment is false. 
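The point about databases without cascade support can be sketched as (hypothetical names, mirroring the behaviour described for the MsSql dialect):

```scala
// Sketch (hypothetical names): a dialect without cascade support accepts the
// flag for API uniformity but never emits CASCADE, so cascadeTruncate=true
// cannot produce invalid SQL for e.g. MsSQL.
object MsSqlLikeDialect {
  def isCascadingTruncateTable(): Option[Boolean] = Some(false)
  def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String =
    s"TRUNCATE TABLE $table" // `cascade` deliberately ignored
}

object IgnoreDemo extends App {
  // Even an explicit opt-in is a no-op for this dialect:
  assert(MsSqlLikeDialect.getTruncateQuery("weblogs", Some(true)) ==
    "TRUNCATE TABLE weblogs")
}
```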


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168929798
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -94,5 +94,21 @@ private case object OracleDialect extends JdbcDialect {
 case _ => value
   }
 
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The JDBCOptions.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
--- End diff --

Done


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-17 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168929874
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1372,6 +1372,13 @@ the following case-insensitive options:
  This is a JDBC writer related option. When 
SaveMode.Overwrite is enabled, this option causes Spark to 
truncate an existing table instead of dropping and recreating it. This can be 
more efficient, and prevents the table metadata (e.g., indices) from being 
removed. However, it will not work in some cases, such as when the new data has 
a different schema. It defaults to false. This option applies only 
to writing.

   
+  
+  
+cascadeTruncate
+
+This is a JDBC writer related option. If enabled and supported by the JDBC database (PostgreSQL and Oracle at the moment), this option allows execution of a TRUNCATE TABLE t CASCADE. This will affect other tables, and thus should be used with care. This option applies only to writing. It defaults to the default cascading truncate behaviour of the JDBC database in question, as specified by isCascadingTruncateTable in each JDBCDialect.
--- End diff --

Yeah, so the "problem" there is that PostgreSQL has inheritance across 
tables (as we discovered in 
[SPARK-22729](https://github.com/apache/spark/pull/19911)). To be completely 
transparent to users, I think the `ONLY` part of the query for PostgreSQL 
should also be configurable, but that's outside the scope of this PR. 
I've added a comment saying that for PostgreSQL an `ONLY` is added to prevent 
descendant tables from being truncated.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-17 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Thanks again @dongjoon-hyun, much appreciated 👍. I've made changes 
accordingly, and added comments in this PR where appropriate.


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-18 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r168943159
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect {
 s"SELECT 1 FROM $table LIMIT 1"
   }
 
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
   /**
-  * The SQL query used to truncate a table. For Postgres, the default 
behaviour is to
-  * also truncate any descendant tables. As this is a (possibly unwanted) 
side-effect,
-  * the Postgres dialect adds 'ONLY' to truncate only the table in question
-  * @param table The name of the table.
-  * @return The SQL query to use for truncating a table
-  */
-  override def getTruncateQuery(table: String): String = {
-s"TRUNCATE TABLE ONLY $table"
+   * The SQL query used to truncate a table. For Postgres, the default 
behaviour is to
+   * also truncate any descendant tables. As this is a (possibly unwanted) 
side-effect,
+   * the Postgres dialect adds 'ONLY' to truncate only the table in 
question
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the value of
+   *isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+  table: String,
+  cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+cascade match {
+  case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
--- End diff --

Sure, I made a quick example; as you can see, using `TRUNCATE TABLE ONLY 
$table CASCADE` truncates the foreign-keyed table but leaves the 
inheritance relationship intact:
```
postgres=# CREATE SCHEMA test;
CREATE SCHEMA
postgres=# CREATE TABLE parent(a INT);
CREATE TABLE
postgres=# ALTER TABLE parent ADD CONSTRAINT some_constraint PRIMARY KEY(a);
ALTER TABLE
postgres=# CREATE TABLE child(b INT) INHERITS (parent);
CREATE TABLE
postgres=# CREATE TABLE forkey(c INT);
CREATE TABLE
postgres=# ALTER TABLE forkey ADD FOREIGN KEY(c) REFERENCES parent(a);
ALTER TABLE

postgres=# INSERT INTO parent VALUES(1);
INSERT 0 1
postgres=# select * from parent;
 a
---
 1
(1 row)

postgres=# select * from child;
 a | b
---+---
(0 rows)

postgres=# INSERT INTO child VALUES(2);
INSERT 0 1
postgres=# select * from child;
 a | b
---+---
 2 |
(1 row)

postgres=# select * from parent;
 a
---
 1
 2
(2 rows)

postgres=# INSERT INTO forkey VALUES(1);
INSERT 0 1
postgres=# select * from forkey;
 c
---
 1
(1 row)

postgres=# TRUNCATE TABLE ONLY parent CASCADE;
NOTICE:  truncate cascades to table "forkey"
TRUNCATE TABLE
postgres=# select * from parent;
 a
---
 2
(1 row)

postgres=# select * from child;
 a | b
---+---
 2 |
(1 row)

postgres=# select * from forkey;
 c
---
(0 rows)
```
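
The Scala side of this can be sketched standalone. This is a simplified illustration of the pattern under review, not Spark's actual `PostgresDialect` (the object name here is made up):

```scala
// Simplified, self-contained sketch of the cascade-aware truncate logic
// discussed above; PostgresLikeDialect is an illustrative name, not Spark code.
object PostgresLikeDialect {
  // Postgres TRUNCATE does not cascade by default.
  def isCascadingTruncateTable(): Option[Boolean] = Some(false)

  def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String =
    cascade match {
      // ONLY keeps descendant (inherited) tables out of the truncation;
      // CASCADE additionally truncates tables with foreign-key references.
      case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
      case _          => s"TRUNCATE TABLE ONLY $table"
    }
}

val plain     = PostgresLikeDialect.getTruncateQuery("parent")             // "TRUNCATE TABLE ONLY parent"
val cascading = PostgresLikeDialect.getTruncateQuery("parent", Some(true)) // "TRUNCATE TABLE ONLY parent CASCADE"
```

This mirrors the psql session above: the CASCADE variant empties the foreign-key table as well, while the inheritance child is untouched either way.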


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-19 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r169025602
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,15 +85,24 @@ private object PostgresDialect extends JdbcDialect {
 s"SELECT 1 FROM $table LIMIT 1"
   }
 
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
   /**
-  * The SQL query used to truncate a table. For Postgres, the default 
behaviour is to
-  * also truncate any descendant tables. As this is a (possibly unwanted) 
side-effect,
-  * the Postgres dialect adds 'ONLY' to truncate only the table in question
-  * @param table The name of the table.
-  * @return The SQL query to use for truncating a table
-  */
-  override def getTruncateQuery(table: String): String = {
-s"TRUNCATE TABLE ONLY $table"
+   * The SQL query used to truncate a table. For Postgres, the default 
behaviour is to
+   * also truncate any descendant tables. As this is a (possibly unwanted) 
side-effect,
+   * the Postgres dialect adds 'ONLY' to truncate only the table in 
question
+   * @param table The table to truncate
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the value of
+   *isCascadingTruncateTable()
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+  table: String,
+  cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+cascade match {
+  case Some(true) => s"TRUNCATE TABLE ONLY $table CASCADE"
--- End diff --

Done :-)


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-21 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r169583476
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
 case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
 case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable(). Ignored for 
Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+  table: String,
+  cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+s"TRUNCATE TABLE $table"
--- End diff --

Thanks @klinvill for helping out :-)
@dongjoon-hyun I've made the changes to `TeradataDialect` and updated the 
tests accordingly.
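
For contrast with the Postgres case, the pattern for a database without a CASCADE clause can be sketched like this (an assumed simplification of the `TeradataDialect` change, with an illustrative object name):

```scala
// Sketch of a dialect that accepts the cascade flag for API uniformity
// but deliberately ignores it; TeradataLikeDialect is an illustrative name.
object TeradataLikeDialect {
  def isCascadingTruncateTable(): Option[Boolean] = Some(false)

  def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable()): String =
    s"TRUNCATE TABLE $table" // same SQL whatever cascade is: unsupported here
}

val ignored = TeradataLikeDialect.getTruncateQuery("t", Some(true)) // "TRUNCATE TABLE t"
```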


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-21 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun Made the changes you pointed out, thanks! 👍 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-03-05 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun I saw that Spark 2.3 was released a few days ago, congrats 
on the release! :-) Is there anything stopping us from merging this PR into 
master now? 


---




[GitHub] spark pull request #18304: [SPARK-21098] Add lineseparator parameter to csv ...

2017-06-14 Thread danielvdende
GitHub user danielvdende opened a pull request:

https://github.com/apache/spark/pull/18304

[SPARK-21098] Add lineseparator parameter to csv options

This commit adds the lineSeparator option to the spark support for
reading/writing csv files.

## What changes were proposed in this pull request?

This PR adds support for specifying the `lineSeparator` option when 
reading/writing to/from csv. This functionality is supported by the Univocity 
parsers library in use by Spark.

## How was this patch tested?

A file with DOS (`\r\n`) line endings was added to `CSVSuite`, which 
should produce the exact same results given the correct line separator.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielvdende/spark 
add-csv-line-separator-option

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18304


commit 9684a39394b1fd35636af6adc3e45394b01934cf
Author: Daniel van der Ende 
Date:   2017-06-14T18:19:50Z

[SPARK-21098] Add lineseparator parameter to csv options

This commit adds the lineSeparator option to the spark support for
reading/writing csv files.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18304: [SPARK-21098] Add lineseparator parameter to csv ...

2017-06-15 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/18304#discussion_r122135698
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -90,6 +90,7 @@ class CSVOptions(
   val quote = getChar("quote", '\"')
   val escape = getChar("escape", '\\')
   val comment = getChar("comment", '\u0000')
+  val lineSeparator = parameters.getOrElse("lineSeparator", "\n")
--- End diff --

Hi, thanks for pointing this out, I wasn't aware of this. Looking at it 
though, how would you propose fixing this? We would need a `writerSettings` 
instance to fetch the univocity default, I think. Also, looking at the other 
parameters, I see hardcoded values there too (for instance `val quote = 
getChar("quote", '\"')`). Personally, I like being able to see the default 
used directly in Spark, to prevent changes in the univocity lib from 
impacting Spark usage (and to avoid confusion for users).
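
The option-with-hardcoded-default pattern being discussed can be shown in isolation (a sketch with a stand-in `parameters` map, not the real `CSVOptions` class):

```scala
// parameters stands in for the user-supplied CSV option map.
val parameters: Map[String, String] = Map("quote" -> "'")

def getChar(paramName: String, default: Char): Char =
  parameters.get(paramName).map(_.head).getOrElse(default)

val quote         = getChar("quote", '\"')                      // user-supplied here
val escape        = getChar("escape", '\\')                     // falls back to the default
val lineSeparator = parameters.getOrElse("lineSeparator", "\n") // default visible at the call site
```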


---




[GitHub] spark pull request #18304: [SPARK-21098] Add lineseparator parameter to csv ...

2017-06-15 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/18304#discussion_r122144727
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -90,6 +90,7 @@ class CSVOptions(
   val quote = getChar("quote", '\"')
   val escape = getChar("escape", '\\')
   val comment = getChar("comment", '\u0000')
+  val lineSeparator = parameters.getOrElse("lineSeparator", "\n")
--- End diff --

Ah, I think I understand now, thanks. 

Maybe I'm overlooking it, but I don't see the `wholeFile` option being used 
in `CSVOptions` at all at the moment (nor `multiline`)?

I'll take a further look at fixing this up considering the dependency on 
`LineRecordReader` (which only supports the two line separators you mentioned)


---




[GitHub] spark pull request #18304: [SPARK-21098] Add lineseparator parameter to csv ...

2017-06-15 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/18304#discussion_r122189644
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -90,6 +90,7 @@ class CSVOptions(
   val quote = getChar("quote", '\"')
   val escape = getChar("escape", '\\')
   val comment = getChar("comment", '\u0000')
+  val lineSeparator = parameters.getOrElse("lineSeparator", "\n")
--- End diff --

Yeah, I did some digging, but that will probably be a lot trickier, because 
it would involve making changes in the Hadoop repo as well. For now, I've 
changed this PR to contain only the hardcoding to `\n`, as you suggested. I 
may take a look at the Hadoop code that would need changing as well, but that 
would require much more work than the scope of this story suggests.


---




[GitHub] spark pull request #18304: [SPARK-21098] Set lineseparator csv multiline and...

2017-06-15 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/18304#discussion_r122194350
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -90,6 +90,7 @@ class CSVOptions(
   val quote = getChar("quote", '\"')
   val escape = getChar("escape", '\\')
   val comment = getChar("comment", '\u0000')
+  val lineSeparator = parameters.getOrElse("lineSeparator", "\n")
--- End diff --

Updated the PR (had already done the Jira and the commit).

I'll definitely take a look at that proposal. It seems like a very valuable 
addition to Spark (I've heard a few people around me want the ability to set 
the line separator themselves)


---




[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...

2017-06-26 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/18304
  
@HyukjinKwon any further update on this?


---




[GitHub] spark pull request #19911: [SPARK-21098] Correct cascade default for postgre...

2017-12-06 Thread danielvdende
GitHub user danielvdende opened a pull request:

https://github.com/apache/spark/pull/19911

[SPARK-21098] Correct cascade default for postgres

The PostgresDialect indicates that cascade is true by default for Postgres.
This is not the case. This patch fixes this.

## What changes were proposed in this pull request?

Set correct default cascade behaviour for Postgres.

## How was this patch tested?

Existing tests all pass.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielvdende/spark SPARK-22717

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19911


commit 76b63a7cb5e86f7cbf6c8a0531c887a6b78d7794
Author: Daniel van der Ende 
Date:   2017-12-06T14:06:25Z

[SPARK-21098] Correct cascade default for postgres

The PostgresDialect indicates that cascade is true by default for Postgres.
This is not the case. This patch fixes this.




---




[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres

2017-12-06 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@srowen yep, I can add the details from the JIRA to the PR if you like 
(just to make it easier to read this PR in future if necessary)


---




[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...

2017-12-06 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155330583
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
 
   }
 
-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
--- End diff --

@dongjoon-hyun indeed, Spark does not use `TRUNCATE` for Postgres, due to 
the value of `JdbcUtils.isCascadingTruncateTable(url)`. The result, however, is 
that it will try to drop the table instead, which will fail for the exact same 
reason if there are tables dependent on the table to be truncated/dropped 
(e.g. via a foreign key constraint). I think this raises another issue: it 
should be possible to specify a `cascade` flag for these operations (but 
that's for another JIRA). Moreover, the documented contract of 
`JdbcUtils.isCascadingTruncateTable` is "Return Some[true] iff 
`TRUNCATE TABLE` causes cascading by default", which isn't the case for 
Postgres (as mentioned in the JIRA).
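
The overwrite decision being described can be sketched as follows. This is a simplification of the behaviour, with an illustrative function name and return values, not Spark's actual code:

```scala
// Sketch: Spark only issues TRUNCATE on overwrite when the truncate option
// is set AND the dialect guarantees truncation does not cascade.
def overwriteAction(
    truncateOptionSet: Boolean,
    isCascadingTruncateTable: Option[Boolean]): String =
  if (truncateOptionSet && isCascadingTruncateTable.contains(false)) {
    "TRUNCATE"          // safe: no cascading side effects
  } else {
    "DROP_AND_RECREATE" // fallback when cascading is true or unknown
  }

// With the old Some(true) default, Postgres always took the fallback path:
val beforeFix = overwriteAction(truncateOptionSet = true, Some(true))  // "DROP_AND_RECREATE"
val afterFix  = overwriteAction(truncateOptionSet = true, Some(false)) // "TRUNCATE"
```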


---




[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...

2017-12-06 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155331702
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
 
   }
 
-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
--- End diff --

```
CASCADE

Automatically truncate all tables that have foreign-key references to
any of the named tables, or to any tables added to the group due to CASCADE.
```
So I think the difference is that `CASCADE` will also affect tables with 
foreign-key references to the table in question, rather than only its 
descendant tables. The function definition (and its name) points more 
towards the behaviour of `CASCADE` than of `ONLY`, imho.


---




[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...

2017-12-06 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155335791
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
 
   }
 
-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
--- End diff --

@dongjoon-hyun to be clear, I think there are 2 problems:
1. The PostgresDialect indicates that `CASCADE` is enabled by default for 
Postgres. This isn't the case, as the Postgres docs show.
2. As you correctly mention (this is what I meant in my previous comment), 
Spark doesn't use `CASCADE` at all, which, especially considering the method 
this PR edits, is a bit odd I think. I plan to open a separate JIRA ticket for 
this and address it there. That will be more work, and is outside the scope of 
the current JIRA.


---




[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres

2017-12-06 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
Sure, I'll make the changes :). I'll use this PR; that seems to make sense, 
as it would fix truncate for Postgres.


---




[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres

2017-12-07 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@dongjoon-hyun I made the changes as suggested by @bolkedebruin (so 
modifying the Postgres dialect to use `TRUNCATE ONLY`). Let me know what you 
think :) (and if I missed anything) 


---




[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...

2017-12-07 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155477436
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
 s"SELECT * FROM $table WHERE 1=0"
   }
 
+  /**
+* The SQL query that should be used to truncate a table. Dialects can 
override this method to
--- End diff --

Fixed


---




[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...

2017-12-07 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155477421
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,6 +85,10 @@ private object PostgresDialect extends JdbcDialect {
 s"SELECT 1 FROM $table LIMIT 1"
   }
 
+  override def getTruncateQuery(table: String): String = {
+s"TRUNCATE ONLY $table"
--- End diff --

yes will do 


---




[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...

2017-12-07 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155477578
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
 s"SELECT * FROM $table WHERE 1=0"
   }
 
+  /**
+* The SQL query that should be used to truncate a table. Dialects can 
override this method to
+* return a query that is suitable for a particular database
+* @param table The name of the table.
+* @return The SQL query to use for truncating a table
+*/
+  @Since("2.3.0")
--- End diff --

So create a new "improvement" JIRA and change the ticket number in this PR? 
(just checking I understand you correctly)


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-07 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@dongjoon-hyun I've addressed the issues you mentioned :) 


---




[GitHub] spark pull request #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDi...

2017-12-07 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155515480
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
@@ -85,6 +85,10 @@ private object PostgresDialect extends JdbcDialect {
 s"SELECT 1 FROM $table LIMIT 1"
   }
 
+  override def getTruncateQuery(table: String): String = {
+s"TRUNCATE ONLY $table"
--- End diff --

Done


---




[GitHub] spark pull request #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDi...

2017-12-07 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155515466
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
 s"SELECT * FROM $table WHERE 1=0"
   }
 
+  /**
+* The SQL query that should be used to truncate a table. Dialects can 
override this method to
+* return a query that is suitable for a particular database
+* @param table The name of the table.
+* @return The SQL query to use for truncating a table
+*/
+  @Since("2.3.0")
--- End diff --

Done


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-07 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@dongjoon-hyun I was considering this. Wanted to hear your opinion on it 
first ;-). I'll make those changes.


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-07 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@dongjoon-hyun ok made the changes, also replaced the test that was in 
place for `isCascadingTruncateTable` with one for the `getTruncateQuery` 
method. Right now, I've left the method in `JdbcDialects` unimplemented. This 
will cause a compile-time error if you extend `JdbcDialect`. Another option 
would be to surface this at runtime, i.e. implement the method with a `throw 
new UnimplementedException` and let each dialect override it. Which of these 
options do you prefer? 


---




[GitHub] spark pull request #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDi...

2017-12-07 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155622687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala ---
@@ -48,5 +48,7 @@ private object DB2Dialect extends JdbcDialect {
 case _ => None
   }
 
-  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+  override def getTruncateQuery(table: String): String = {
+s"TRUNCATE $table"
--- End diff --

The `JdbcDialect` object no longer provides a default implementation 
for `getTruncateQuery` (as per @dongjoon-hyun's suggestion). The reason is 
that this explicitly forces dialects to implement a truncate operation without 
side effects. 

One thing I don't like about this solution is the code duplication across 
dialects. But it does prevent dialects from inheriting default behaviour and 
(accidentally) truncating with side effects.
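
The trade-off weighed here can be made concrete with two minimal trait shapes (illustrative names and queries, not Spark's actual API):

```scala
// (a) No default implementation: every dialect must supply its own truncate
// query; forgetting one is a compile-time error.
trait StrictDialect {
  def getTruncateQuery(table: String): String
}

// (b) Shared default implementation: less duplication, but a dialect whose
// database cascades by default could silently inherit unsafe behaviour.
trait LenientDialect {
  def getTruncateQuery(table: String): String = s"TRUNCATE TABLE $table"
}

object Db2Like extends StrictDialect {
  // Duplicated per dialect under (a), but explicit and side-effect free.
  override def getTruncateQuery(table: String): String = s"TRUNCATE $table"
}
```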


---




[GitHub] spark pull request #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDi...

2017-12-09 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155916248
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -147,14 +157,6 @@ abstract class JdbcDialect extends Serializable {
 case arrayValue: Array[Any] => 
arrayValue.map(compileValue).mkString(", ")
 case _ => value
   }
-
-  /**
-   * Return Some[true] iff `TRUNCATE TABLE` causes cascading default.
-   * Some[true] : TRUNCATE TABLE causes cascading.
-   * Some[false] : TRUNCATE TABLE does not cause cascading.
-   * None: The behavior of TRUNCATE TABLE is unknown (default).
-   */
-  def isCascadingTruncateTable(): Option[Boolean] = None
--- End diff --

As a middle ground, could it be marked deprecated? Seeing as we don't 
really need it anymore...


---




[GitHub] spark pull request #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDi...

2017-12-09 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/19911#discussion_r155916550
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -147,14 +157,6 @@ abstract class JdbcDialect extends Serializable {
 case arrayValue: Array[Any] => 
arrayValue.map(compileValue).mkString(", ")
 case _ => value
   }
-
-  /**
-   * Return Some[true] iff `TRUNCATE TABLE` causes cascading default.
-   * Some[true] : TRUNCATE TABLE causes cascading.
-   * Some[false] : TRUNCATE TABLE does not cause cascading.
-   * None: The behavior of TRUNCATE TABLE is unknown (default).
-   */
-  def isCascadingTruncateTable(): Option[Boolean] = None
--- End diff --

Although custom third-party dialects may want it. 


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-09 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@dongjoon-hyun @gatorsmile As @gatorsmile pointed out, the 
`isCascadingTruncateTable` is a method in the public API, so we can't just drop 
it. I've made changes again, now the truncate query method is defined for all 
dialects by default, with only the PostgresDialect overriding the definition 
(for reasons we discussed before). I have also reinstated the test I removed 
before (to test the `isCascadingTruncateTable` function). Let me know what you 
think; once you're OK with the changes, I'll squash the commits into one.

Thanks for helping out :), much appreciated.


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-11 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
retest this please


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-12 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
Forgive my impatience, but is it supposed to take this long? @dongjoon-hyun 
@gatorsmile 


---




[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-12 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
@srowen Sorry, I should have been clearer: I meant the triggering of the 
tests (I see they're running now, thanks!)


---




[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...

2018-09-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/18304
  
Sounds good to me :+1: @HyukjinKwon 


---




[GitHub] spark pull request #18304: [SPARK-21098] Set lineseparator csv multiline and...

2018-09-27 Thread danielvdende
Github user danielvdende closed the pull request at:

https://github.com/apache/spark/pull/18304


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-03-13 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Gentle ping @gatorsmile :-)


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-03-26 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Gentle ping again @gatorsmile @dongjoon-hyun :-)


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-04-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@gatorsmile @dongjoon-hyun Guys, any update?


---




[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-07-20 Thread danielvdende
Github user danielvdende commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r203955129
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -102,7 +102,12 @@ object JdbcUtils extends Logging {
 val dialect = JdbcDialects.get(options.url)
 val statement = conn.createStatement
 try {
-  statement.executeUpdate(dialect.getTruncateQuery(options.table))
+  if (options.isCascadeTruncate.isDefined) {
+statement.executeUpdate(dialect.getTruncateQuery(options.table,
+  options.isCascadeTruncate))
+  } else {
+statement.executeUpdate(dialect.getTruncateQuery(options.table))
+  }
--- End diff --

Sorry, I missed this comment. I'll make the change (@gatorsmile, I think this 
is the comment you meant, right?)
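
For illustration only (these names are not Spark's actual API), a minimal 
Scala sketch of the idea behind the simplification: if the dialect's 
truncate-query builder takes the cascade flag as an `Option[Boolean]` with a 
default, the `if`/`else` at the call site above collapses into a single 
`executeUpdate` call.

```scala
object TruncateQuerySketch {
  // Hypothetical dialect helper: builds a TRUNCATE statement, appending
  // CASCADE only when the option is explicitly set to true. A None or
  // Some(false) value falls back to a plain TRUNCATE.
  def getTruncateQuery(table: String, cascade: Option[Boolean] = None): String =
    cascade match {
      case Some(true) => s"TRUNCATE TABLE $table CASCADE"
      case _          => s"TRUNCATE TABLE $table"
    }

  def main(args: Array[String]): Unit = {
    // The call site no longer needs to branch on whether cascade is defined.
    println(getTruncateQuery("users"))             // TRUNCATE TABLE users
    println(getTruncateQuery("users", Some(true))) // TRUNCATE TABLE users CASCADE
  }
}
```

With this shape, `JdbcUtils` could pass `options.isCascadeTruncate` straight 
through, letting the dialect decide how (or whether) to honor it — dialects 
without cascade support would simply ignore the flag.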


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-07-20 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Just made the changes you mentioned @gatorsmile :-) 


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-07-10 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
@dongjoon-hyun @gatorsmile Any update, guys? 


---
