[ https://issues.apache.org/jira/browse/SPARK-32013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-32013:
------------------------------------

    Assignee: Apache Spark

> Support query execution before/after reading/writing over JDBC
> --------------------------------------------------------------
>
>                 Key: SPARK-32013
>                 URL: https://issues.apache.org/jira/browse/SPARK-32013
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Noritaka Sekiyama
>            Assignee: Apache Spark
>            Priority: Major
>
> For ETL workloads, there is a common requirement to run SQL statements
> before/after reading/writing over JDBC.
> Here are some examples:
> - Create a view with specific conditions
> - Delete/update some records
> - Truncate a table (already possible with the `truncate` option)
> - Execute a stored procedure (also requested in SPARK-32014)
> Currently, the `query` option is available to specify a SQL statement against
> a JDBC data source when loading data as a DataFrame.
> However, this query is only for reading data, and it does not support the
> common examples listed above.
> On the other hand, the `sessionInitStatement` option is available before
> writing data from a DataFrame.
> This option runs custom SQL in order to implement session
> initialization code. Since it runs per session, it cannot be used for
> non-idempotent operations.
>
> If Spark supported executing SQL statements against JDBC data sources
> before/after reading/writing over JDBC, it would cover a lot of common
> use cases.
> Note: Databricks' old Redshift connector has similar options, such as
> `preactions` and `postactions`. [https://github.com/databricks/spark-redshift]
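
The semantics requested in the issue (statements that run exactly once before and once after the data movement, unlike `sessionInitStatement`, which runs per session) can be sketched outside Spark. The snippet below is only an illustration under stated assumptions: `run_with_actions` is a hypothetical helper, the `preactions`/`postactions` parameter names are borrowed from the Redshift connector mentioned above, and an in-memory `sqlite3` database stands in for the JDBC target so the example runs without a cluster. In a real job, the `body` callback would be where the Spark JDBC read or write happens.

```python
import sqlite3
from typing import Callable, Sequence


def run_with_actions(
    conn: sqlite3.Connection,
    body: Callable[[sqlite3.Connection], None],
    preactions: Sequence[str] = (),
    postactions: Sequence[str] = (),
) -> None:
    """Hypothetical helper: run preactions once, then body, then postactions once."""
    for stmt in preactions:
        conn.execute(stmt)
    conn.commit()

    body(conn)  # stand-in for the actual JDBC read/write

    for stmt in postactions:
        conn.execute(stmt)
    conn.commit()


def load_rows(conn: sqlite3.Connection) -> None:
    # Stand-in for writing a DataFrame into the staging table.
    conn.executemany(
        "INSERT INTO staging(id, name) VALUES (?, ?)",
        [(1, "a"), (2, "b")],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging(id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target(id INTEGER, name TEXT)")
conn.execute("INSERT INTO staging VALUES (99, 'stale')")  # leftover from an earlier run

run_with_actions(
    conn,
    load_rows,
    preactions=["DELETE FROM staging"],  # non-idempotent cleanup, run once
    postactions=["INSERT INTO target SELECT id, name FROM staging"],
)

rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)
```

Because the statements are issued from a single connection around the whole operation, non-idempotent actions such as the `DELETE` are not repeated for each partition or session.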