[ https://issues.apache.org/jira/browse/SPARK-52576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-52576:
----------------------------------
        Parent: SPARK-51727
    Issue Type: Sub-task  (was: Improvement)

> In Declarative Pipelines, drop/recreate on full refresh and MV update
> ---------------------------------------------------------------------
>
>                 Key: SPARK-52576
>                 URL: https://issues.apache.org/jira/browse/SPARK-52576
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Declarative Pipelines
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Some pipeline runs result in wiping out and replacing all the data for a 
> table:
>  * Every run of a materialized view
>  * Runs of streaming tables that have the "full refresh" flag
> In the current implementation, this "wipe out and replace" is carried out in 
> two steps (sketched below):
>  * Truncating the table
>  * Altering the table to drop, update, or add columns so that its schema 
> matches the DataFrame for the current run
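> A minimal Scala sketch of what that amounts to. The helper name and 
> structure are hypothetical, not the actual pipeline code, and type changes 
> ("update") are omitted for brevity:
> {code:scala}
> import org.apache.spark.sql.{DataFrame, SparkSession}
>
> // Hypothetical helper: wipe a table's rows, then reconcile its columns with
> // the schema of the DataFrame produced by the current run.
> def truncateAndAlign(spark: SparkSession, table: String, df: DataFrame): Unit = {
>   // Keep the table object itself, so readers and ACLs survive; only rows go.
>   spark.sql(s"TRUNCATE TABLE $table")
>
>   val existing = spark.table(table).schema
>   val incoming = df.schema
>
>   // Drop columns the current run no longer produces. This is the step that
>   // fails on catalogs, such as Hive, that cannot drop columns.
>   for (field <- existing if !incoming.exists(_.name == field.name)) {
>     spark.sql(s"ALTER TABLE $table DROP COLUMN ${field.name}")
>   }
>   // Add columns that are new in the current run's DataFrame.
>   for (field <- incoming if !existing.exists(_.name == field.name)) {
>     spark.sql(s"ALTER TABLE $table ADD COLUMN ${field.name} ${field.dataType.sql}")
>   }
> }
> {code}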
> The reason we originally wanted to truncate + alter instead of drop / 
> recreate is that dropping has some undesirable effects. E.g. it interrupts 
> readers of the table and wipes away things like ACLs.
> However, we discovered that not all catalogs support dropping columns (e.g. 
> Hive does not), and there's no way to tell whether a catalog supports 
> dropping columns or not. So this change switches the implementation to 
> drop/recreate the table instead of truncate/alter (sketched below).
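> A matching sketch of the drop/recreate approach this issue switches to; the 
> helper name is again hypothetical:
> {code:scala}
> import org.apache.spark.sql.{DataFrame, SparkSession}
>
> // Hypothetical helper: replace the table wholesale with this run's output.
> def dropAndRecreate(spark: SparkSession, table: String, df: DataFrame): Unit = {
>   // Dropping interrupts readers and loses metadata such as ACLs, but it
>   // works against every catalog.
>   spark.sql(s"DROP TABLE IF EXISTS $table")
>   // Recreate with exactly the schema of the current run's DataFrame, so no
>   // per-column reconciliation (and no DROP COLUMN support) is needed.
>   df.write.saveAsTable(table)
> }
> {code}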


