[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2

2023-01-21 Thread Anton Okolnychyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679464#comment-17679464
 ] 

Anton Okolnychyi commented on SPARK-35801:
--

We should probably keep it open even beyond 3.4 as the item is not complete. I 
will add sub-tasks as we go.

> SPIP: Row-level operations in Data Source V2
> 
>
> Key: SPARK-35801
> URL: https://issues.apache.org/jira/browse/SPARK-35801
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Anton Okolnychyi
> Priority: Major
> Labels: SPIP
>
> Row-level operations such as UPDATE, DELETE, MERGE are becoming more and more 
> important for modern Big Data workflows. Use cases include but are not 
> limited to deleting a set of records for regulatory compliance, updating a 
> set of records to fix an issue in the ingestion pipeline, applying changes in 
> a transaction log to a fact table. Row-level operations allow users to easily 
> express their use cases that would otherwise require much more SQL. Common 
> patterns for updating partitions are to read, union, and overwrite or read, 
> diff, and append. Using commands like MERGE, these operations are easier to 
> express and can be more efficient to run.
> Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/] 
> and Spark should implement similar support.
> SPIP: 
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2

2023-01-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677256#comment-17677256
 ] 

Dongjoon Hyun commented on SPARK-35801:
---

Hi, [~viirya] and [~aokolnychyi]. Are we going to leave this open in Apache 
Spark 3.4.0 as `Unresolved`?




[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2

2022-02-01 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485508#comment-17485508
 ] 

L. C. Hsieh commented on SPARK-35801:
-

I think we can leave this open and put sub-tasks under this, like 
https://issues.apache.org/jira/browse/SPARK-34849.




[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2

2022-02-01 Thread Anton Okolnychyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485502#comment-17485502
 ] 

Anton Okolnychyi commented on SPARK-35801:
--

[~viirya], shall we keep this one open until the implementation is done or can 
we close it now? The community has already voted on this SPIP.

> SPIP: Row-level operations in Data Source V2
> 
>
> Key: SPARK-35801
> URL: https://issues.apache.org/jira/browse/SPARK-35801
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Anton Okolnychyi
> Priority: Major
> Labels: SPIP
>
> Row-level operations such as UPDATE, DELETE, MERGE are becoming more and more 
> important for modern Big Data workflows. Use cases include but are not 
> limited to deleting a set of records for regulatory compliance, updating a 
> set of records to fix an issue in the ingestion pipeline, applying changes in 
> a transaction log to a fact table. Row-level operations allow users to easily 
> express their use cases that would otherwise require much more SQL. Common 
> patterns for updating partitions are to read, union, and overwrite or read, 
> diff, and append. Using commands like MERGE, these operations are easier to 
> express and can be more efficient to run.
> Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/] 
> and Spark should implement similar support.
> SPIP: 
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60


