[GitHub] spark pull request #19911: [SPARK-22717][SQL] Correct cascade default for po...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155478097

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
```
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
     s"SELECT * FROM $table WHERE 1=0"
   }

+ /**
+* The SQL query that should be used to truncate a table. Dialects can override this method to
+* return a query that is suitable for a particular database
+* @param table The name of the table.
+* @return The SQL query to use for truncating a table
+*/
+ @Since("2.3.0")
```
--- End diff --

Yep. And we need to fix the following.

```
The PostgresDialect indicates that cascade is true by default for Postgres.
This is not the case. This patch fixes this.
```

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
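The override hook being reviewed above can be sketched as follows. These are simplified stand-ins for the real `JdbcDialect`/`PostgresDialect` classes, not Spark's actual code; the `@Since` annotation and the rest of the Spark plumbing are omitted:

```scala
// Minimal sketch of the dialect hook under review: the base dialect supplies
// a generic truncate statement, and Postgres overrides it because a plain
// Postgres TRUNCATE also truncates descendant (inherited) tables.
abstract class SketchJdbcDialect {
  // Default used by most databases.
  def getTruncateQuery(table: String): String = s"TRUNCATE TABLE $table"
}

// The override proposed in this PR: ONLY restricts truncation to the named table.
object SketchPostgresDialect extends SketchJdbcDialect {
  override def getTruncateQuery(table: String): String = s"TRUNCATE ONLY $table"
}

object Main extends App {
  val generic = new SketchJdbcDialect {}
  println(generic.getTruncateQuery("my_table"))               // TRUNCATE TABLE my_table
  println(SketchPostgresDialect.getTruncateQuery("my_table")) // TRUNCATE ONLY my_table
}
```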
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155477703

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
```
@@ -98,10 +98,11 @@ object JdbcUtils extends Logging {
   /**
    * Truncates a table from the JDBC database.
    */
-  def truncateTable(conn: Connection, table: String): Unit = {
+  def truncateTable(conn: Connection, options: JDBCOptions): Unit = {
```
--- End diff --

Ur, it looks like a counter-intuitive change.
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155477421

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -85,6 +85,10 @@ private object PostgresDialect extends JdbcDialect {
     s"SELECT 1 FROM $table LIMIT 1"
   }

+  override def getTruncateQuery(table: String): String = {
+s"TRUNCATE ONLY $table"
```
--- End diff --

Yes, will do.
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155477578

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
```
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
     s"SELECT * FROM $table WHERE 1=0"
   }

+ /**
+* The SQL query that should be used to truncate a table. Dialects can override this method to
+* return a query that is suitable for a particular database
+* @param table The name of the table.
+* @return The SQL query to use for truncating a table
+*/
+ @Since("2.3.0")
```
--- End diff --

So create a new "improvement" JIRA and change the ticket number in this PR? (Just checking I understand you correctly.)
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155477436

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
```
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
     s"SELECT * FROM $table WHERE 1=0"
   }

+ /**
+* The SQL query that should be used to truncate a table. Dialects can override this method to
```
--- End diff --

Fixed
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155476723

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -85,6 +85,10 @@ private object PostgresDialect extends JdbcDialect {
     s"SELECT 1 FROM $table LIMIT 1"
   }

+  override def getTruncateQuery(table: String): String = {
+s"TRUNCATE ONLY $table"
```
--- End diff --

Also, please add a similar explanation to `JdbcDialect.scala`, too.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155476152

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -85,6 +85,10 @@ private object PostgresDialect extends JdbcDialect {
     s"SELECT 1 FROM $table LIMIT 1"
   }

+  override def getTruncateQuery(table: String): String = {
+s"TRUNCATE ONLY $table"
```
--- End diff --

Could you add a function comment about why we use `TRUNCATE ONLY` here?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155475469

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
```
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
     s"SELECT * FROM $table WHERE 1=0"
   }

+ /**
+* The SQL query that should be used to truncate a table. Dialects can override this method to
```
--- End diff --

Indentation? The following will check Scala style violations for you.

```
$ dev/scalastyle
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155473951

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
```
@@ -116,6 +116,17 @@ abstract class JdbcDialect extends Serializable {
     s"SELECT * FROM $table WHERE 1=0"
   }

+ /**
+* The SQL query that should be used to truncate a table. Dialects can override this method to
+* return a query that is suitable for a particular database
+* @param table The name of the table.
+* @return The SQL query to use for truncating a table
+*/
+ @Since("2.3.0")
```
--- End diff --

@danielvdende. As we see here, this is a new feature. It seems that we had better create a new JIRA as a `New Feature` with a title like `Add getTruncateQuery to JdbcDialect`.
Github user bolkedebruin commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155342020

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

I'm not sure if you are agreeing or not agreeing with me now :-), but at least, as just shown, truncate is supported and does not cascade by default on Postgres. Can we conclude that this change seems right for Postgres, which is the only affected database supported by Spark (others default to truncate)?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155340831

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Thanks for the extended example. The following was the original motivation of #14086. I don't disagree with you on this.

```
`drop/create` is an expensive operation
```
Github user bolkedebruin commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155337956

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Please note that drop/create is an expensive operation. In addition, I don't think (imho) Spark should ever do a drop/create, as it changes the schema.
Github user bolkedebruin commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155336986

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

```
airflow=# CREATE TABLE products (
airflow(#     product_no integer PRIMARY KEY,
airflow(#     name text,
airflow(#     price numeric
airflow(# );
CREATE TABLE
airflow=#
airflow=# CREATE TABLE orders (
airflow(#     order_id integer PRIMARY KEY,
airflow(#     product_no integer REFERENCES products (product_no),
airflow(#     quantity integer
airflow(# );
CREATE TABLE
airflow=# insert into products VALUES (1, 1, 1);
INSERT 0 1
airflow=# insert into orders VALUES (1,1,1);
INSERT 0 1
airflow=# select * from products;
 product_no | name | price
------------+------+-------
          1 | 1    |     1
(1 row)

airflow=# select * from orders;
 order_id | product_no | quantity
----------+------------+----------
        1 |          1 |        1
(1 row)

airflow=# truncate orders;
TRUNCATE TABLE
airflow=# select * from products;
 product_no | name | price
------------+------+-------
          1 | 1    |     1
(1 row)
```
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155335791

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

@dongjoon-hyun to be clear, I think there are 2 problems:

1. The PostgresDialect indicates that `CASCADE` is enabled by default for Postgres. This isn't the case, as the Postgres docs show.
2. As you correctly mention (this is what I meant in my previous comment), Spark doesn't use `CASCADE` at all, which, especially considering the method this PR edits, is a bit odd I think. I plan to open a different JIRA ticket for this, and add it. This will be more work, and is outside the scope of the current JIRA.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155334735

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

I've looked at your test case again. `fix` is not a descendant table of `info`; it's a parent table.

```
domino=> truncate domino.info;
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "fix" references "info".
HINT:  Truncate table "fix" at the same time, or use TRUNCATE ... CASCADE.
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155333091

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Spark doesn't use `CASCADE` at all. I'm wondering whether your PR will actually solve your problem.

```scala
def truncateTable(conn: Connection, table: String): Unit = {
  val statement = conn.createStatement
  try {
    statement.executeUpdate(s"TRUNCATE TABLE $table")
  } finally {
    statement.close()
  }
}
```
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155332414

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

@danielvdende. Actually, I proposed the TRUNCATE version as an optimized version of this operation. If DROP TABLE fails for the same reason, it seems to be legitimate for some reason (your configuration or privileges). BTW, if you enable this, does it solve your situation?
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155331702

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

```
CASCADE

Automatically truncate all tables that have foreign-key references to any of
the named tables, or to any tables added to the group due to CASCADE.
```

So I think the difference is that `CASCADE` will also affect tables with foreign-key references to the table in question, rather than only its descendant tables. The function definition (and its name) seems to point more towards the definition of `CASCADE` behaviour than `ONLY`, imho.
Github user danielvdende commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155330583

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

@dongjoon-hyun indeed, Spark does not use `TRUNCATE` for Postgres, due to the value of `JdbcUtils.isCascadingTruncateTable(url)`. The result, however, is that it will try to drop the table, which will fail for the exact same reason if there are tables dependent on the table to be truncated/dropped (e.g. a foreign key constraint). I think this raises another issue: it should be possible to specify a `cascade` flag for these operations (but that's for another JIRA).

Moreover, the definition of the `JdbcUtils.isCascadingTruncateTable` function is ```Return Some[true] iff `TRUNCATE TABLE` causes cascading default.```, which isn't the case for Postgres (as mentioned in the JIRA).
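The `cascade` flag floated above (and deferred to a separate JIRA) could look roughly like the following. This is purely a hypothetical sketch, not Spark API: the `postgresTruncateQuery` helper and its `cascade` parameter are invented for illustration.

```scala
// Hypothetical sketch of a dialect-level truncate builder with an explicit
// cascade flag, as suggested in the comment above. Not actual Spark code.
object TruncateSketch {
  // Postgres allows combining ONLY (don't recurse into descendant tables)
  // with CASCADE (also truncate tables holding foreign-key references).
  def postgresTruncateQuery(table: String, cascade: Boolean): String =
    if (cascade) s"TRUNCATE ONLY $table CASCADE"
    else s"TRUNCATE ONLY $table"
}

object Main extends App {
  println(TruncateSketch.postgresTruncateQuery("orders", cascade = false)) // TRUNCATE ONLY orders
  println(TruncateSketch.postgresTruncateQuery("orders", cascade = true))  // TRUNCATE ONLY orders CASCADE
}
```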
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155330593

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Previously, we referred to the following sentence in https://www.postgresql.org/docs/current/static/sql-truncate.html.

```
name

The name (optionally schema-qualified) of a table to truncate. If ONLY is
specified before the table name, only that table is truncated. If ONLY is not
specified, the table and all its descendant tables (if any) are truncated.
Optionally, * can be specified after the table name to explicitly indicate
that descendant tables are included.
```

For your test case, it seems to be different, but could you point us to the exact sentence you're referring to?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155328166

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

@bolkedebruin. Originally, Spark used `DROP TABLE` for Overwrite mode. #14086 introduced the `TRUNCATE` operation when `JdbcUtils.isCascadingTruncateTable(url) == Some(false)`. So, for PostgresDialect, Spark doesn't use TRUNCATE here.
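The decision described above can be modeled roughly as follows. The names here are simplified stand-ins for the real `JdbcRelationProvider`/`JdbcUtils` code, and `truncateRequested` stands for the user-supplied `truncate` JDBC option:

```scala
// Rough model of the Overwrite-mode decision: Spark only issues TRUNCATE when
// the user asks for it AND the dialect reports that TRUNCATE does not cascade
// (Some(false)); otherwise it falls back to DROP TABLE + CREATE TABLE.
object OverwriteSketch {
  def chooseAction(truncateRequested: Boolean,
                   isCascadingTruncateTable: Option[Boolean]): String =
    if (truncateRequested && isCascadingTruncateTable == Some(false)) "TRUNCATE"
    else "DROP_AND_CREATE"
}

object Main extends App {
  // With Some(true) (the pre-patch PostgresDialect value), TRUNCATE is never taken:
  println(OverwriteSketch.chooseAction(truncateRequested = true, Some(true)))  // DROP_AND_CREATE
  // With Some(false) (this patch), the cheaper TRUNCATE path becomes available:
  println(OverwriteSketch.chooseAction(truncateRequested = true, Some(false))) // TRUNCATE
}
```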
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155325104

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Thank you for pinging me, @gatorsmile.
Github user bolkedebruin commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155318711

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Sorry, you are the author of that comment. Missed that :)
Github user bolkedebruin commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155318011

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

Actually, @gatorsmile made the comment in #14086 that Postgres cascades truncates by default, pointing to the exact same documentation as I do in the JIRA, but then referring to the opposite behavior. I still stand by my JIRA and the included proof (see the JIRA). Also, Postgres supports rollback of a truncate.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19911#discussion_r155306118

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala ---
```
@@ -100,5 +100,5 @@ private object PostgresDialect extends JdbcDialect {
   }

-  override def isCascadingTruncateTable(): Option[Boolean] = Some(true)
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
```
--- End diff --

cc @dongjoon-hyun https://github.com/apache/spark/pull/14086