[ https://issues.apache.org/jira/browse/SPARK-32013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Noritaka Sekiyama updated SPARK-32013:
--------------------------------------
Description:
For ETL workloads, there is a common requirement to execute SQL statements before/after reading/writing over JDBC. Here are some examples:
- Create a view with specific conditions
- Delete/update some records
- Truncate a table (already possible via the `truncate` option)
- Execute a stored procedure

Currently, the `query` option is available to specify a SQL statement against a JDBC data source when loading data as a DataFrame.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
However, this query is only for reading data, and it does not support the common examples listed above.

If Spark supported executing SQL statements against JDBC data sources before/after reading/writing over JDBC, it could cover many common use cases.

Note: Databricks' old Redshift connector has similar options, `preactions` and `postactions`.
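The request amounts to running arbitrary SQL against the same database immediately before and after the data transfer. The following is a minimal sketch of that ordering. It assumes Python's built-in `sqlite3` as a stand-in for the JDBC target. `run_with_actions` is a hypothetical helper, not a Spark or JDBC API, and the `preactions`/`postactions` parameter names only echo the Redshift connector options mentioned above. The example covers two of the listed cases: creating a filtered view for a read, and deleting stale records before a write.

```python
import sqlite3
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_actions(
    conn: sqlite3.Connection,
    preactions: Sequence[str],
    body: Callable[[sqlite3.Connection], T],
    postactions: Sequence[str],
) -> T:
    """Hypothetical helper (not a Spark API): run SQL preactions, then the
    read/write body, then SQL postactions, and commit at the end."""
    for stmt in preactions:
        conn.execute(stmt)
    result = body(conn)  # stands in for the DataFrame read or write
    for stmt in postactions:
        conn.execute(stmt)
    conn.commit()
    return result


# In-memory database standing in for the JDBC target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.execute("CREATE TABLE load_log (tbl TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "us", 10.0), (2, "eu", 20.0), (3, "us", 30.0)],
)
conn.commit()

# Read: a preaction creates a view with specific conditions, the body
# queries it (as the `query` option would), and a postaction drops it.
us_rows = run_with_actions(
    conn,
    preactions=[
        "CREATE VIEW us_sales AS SELECT id, amount FROM sales WHERE region = 'us'"
    ],
    body=lambda c: c.execute("SELECT id, amount FROM us_sales ORDER BY id").fetchall(),
    postactions=["DROP VIEW us_sales"],
)

# Write: a preaction deletes stale records, the body appends new rows,
# and a postaction records the load in a log table.
run_with_actions(
    conn,
    preactions=["DELETE FROM sales WHERE region = 'eu'"],
    body=lambda c: c.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)", [(4, "eu", 40.0)]
    ),
    postactions=["INSERT INTO load_log VALUES ('sales')"],
)
```

In this sketch, every action runs on a single driver-side connection. The issue does not specify how Spark would order such actions relative to writes performed by executors over their own connections.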
> Support query execution before/after reading/writing over JDBC
> --------------------------------------------------------------
>
> Key: SPARK-32013
> URL: https://issues.apache.org/jira/browse/SPARK-32013
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Noritaka Sekiyama
> Priority: Major