[ 
https://issues.apache.org/jira/browse/SPARK-32013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noritaka Sekiyama updated SPARK-32013:
--------------------------------------
    Description: 
For ETL workloads, there is a common requirement to execute SQL statements 
before/after reading/writing over JDBC.
 Here are some examples:
 - Create a view with specific conditions
 - Delete/update some records
 - Truncate a table (already possible via the `truncate` option)
 - Execute a stored procedure (also requested in SPARK-32014)

Currently, the `query` option is available to specify a SQL statement against a 
JDBC data source when loading data as a DataFrame.
 However, this option only applies when reading data, so it does not cover the 
common examples listed above.
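For context, a read with the existing `query` option is configured like this (the URL, credentials, and table columns below are placeholders; only the option names mirror Spark's JDBC data source):

```python
# Sketch only: placeholder URL/credentials. The option names ("url", "query",
# "user", "password") are the documented Spark JDBC data source option names.
jdbc_options = {
    "url": "jdbc:postgresql://example-host:5432/mydb",
    "query": "SELECT id, region, amount FROM sales WHERE amount > 100",
    "user": "etl_user",
    "password": "***",
}

# In a Spark job these options would be passed as:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
print(jdbc_options["query"])
```

Note that `query` only shapes what is read; there is no hook here for running a standalone statement (DELETE, stored procedure, etc.) before or after the load.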

On the other hand, there is the `sessionInitStatement` option, which runs custom 
SQL for session initialization after each database session is opened (and before 
data is read). Since it runs once per session rather than around a write, it 
cannot be used for write operations.
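`sessionInitStatement` is set the same way as any other JDBC option; a sketch with placeholder connection details (the schema name and statement are made up for illustration):

```python
# Sketch only: placeholder connection details. "sessionInitStatement" is the
# documented Spark JDBC option name; it runs once per opened database session,
# before data is read.
jdbc_options = {
    "url": "jdbc:postgresql://example-host:5432/mydb",
    "dbtable": "analytics.sales",
    "sessionInitStatement": "SET search_path TO analytics",
}

# In a Spark job:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
```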

 

If Spark supported executing SQL statements against JDBC data sources 
before/after reading/writing over JDBC, it would cover many common use cases.

Note: Databricks' old Redshift connector has similar options, `preactions` and 
`postactions`. [https://github.com/databricks/spark-redshift]
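A minimal, database-agnostic sketch of the proposed `preactions`/`postactions` semantics, using Python's sqlite3 as a stand-in for a JDBC connection (the table name and statements are invented for illustration; this is not Spark API):

```python
import sqlite3

def write_with_actions(conn, rows, preactions, postactions):
    """Run 'preactions' SQL, write the rows, then run 'postactions' --
    the semantics this issue proposes for the JDBC writer (sketch only)."""
    cur = conn.cursor()
    for stmt in preactions:          # e.g. clear stale records, create a view
        cur.execute(stmt)
    cur.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    for stmt in postactions:         # e.g. build an index, drop a staging table
        cur.execute(stmt)
    conn.commit()

conn = sqlite3.connect(":memory:")
write_with_actions(
    conn,
    rows=[("east", 100), ("west", 250)],
    preactions=["CREATE TABLE sales (region TEXT, amount INTEGER)",
                "DELETE FROM sales"],
    postactions=["CREATE INDEX idx_region ON sales (region)"],
)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # → 2
```

Running both action lists on the same connection/transaction as the write is what distinguishes this from `sessionInitStatement`, which fires only at session open.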



> Support query execution before/after reading/writing over JDBC
> --------------------------------------------------------------
>
>                 Key: SPARK-32013
>                 URL: https://issues.apache.org/jira/browse/SPARK-32013
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Noritaka Sekiyama
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
