[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679464#comment-17679464 ]

Anton Okolnychyi commented on SPARK-35801:
------------------------------------------

We should probably keep it open even beyond 3.4 as the item is not complete. I will add sub-tasks as we go.

> SPIP: Row-level operations in Data Source V2
> --------------------------------------------
>
>                 Key: SPARK-35801
>                 URL: https://issues.apache.org/jira/browse/SPARK-35801
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>              Labels: SPIP
>
> Row-level operations such as UPDATE, DELETE, MERGE are becoming more and more
> important for modern Big Data workflows. Use cases include but are not
> limited to deleting a set of records for regulatory compliance, updating a
> set of records to fix an issue in the ingestion pipeline, applying changes in
> a transaction log to a fact table. Row-level operations allow users to easily
> express their use cases that would otherwise require much more SQL. Common
> patterns for updating partitions are to read, union, and overwrite or read,
> diff, and append. Using commands like MERGE, these operations are easier to
> express and can be more efficient to run.
> Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/]
> and Spark should implement similar support.
> SPIP:
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
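
The issue description quoted in the message above mentions applying changes from a transaction log to a fact table as a use case for MERGE. As a rough illustration of those row-level semantics only, here is a minimal sketch in plain Python. It is not Spark code and not the design proposed in the SPIP; the function name `apply_change_log`, the `op` field, and the sample rows are all hypothetical.

```python
from typing import Dict, Hashable, List

Row = Dict[str, object]


def apply_change_log(target: List[Row], changes: List[Row], key: str) -> List[Row]:
    """Apply a change log to a table, MERGE-style (hypothetical helper).

    Each change row carries the key column plus an assumed "op" field:
    a change with op == "delete" removes the matching row; any other change
    updates the matching row, or is inserted if no row matches.
    """
    # Index the target by key so each change touches only the row it matches.
    by_key: Dict[Hashable, Row] = {row[key]: dict(row) for row in target}
    for change in changes:
        # Drop the bookkeeping field; what remains are the column values.
        values = {k: v for k, v in change.items() if k != "op"}
        if change.get("op") == "delete":
            by_key.pop(change[key], None)      # WHEN MATCHED ... THEN DELETE
        elif change[key] in by_key:
            by_key[change[key]].update(values)  # WHEN MATCHED THEN UPDATE
        else:
            by_key[change[key]] = values        # WHEN NOT MATCHED THEN INSERT
    return list(by_key.values())


fact_table = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
change_log = [
    {"id": 1, "amount": 15, "op": "update"},
    {"id": 2, "op": "delete"},
    {"id": 3, "amount": 30, "op": "insert"},
]
merged = apply_change_log(fact_table, change_log, key="id")
print(merged)
```

Without a single MERGE-like operation, the same result would need the read, union, and overwrite pattern the description refers to, where the caller rewrites the whole table or partition by hand.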
[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677256#comment-17677256 ]

Dongjoon Hyun commented on SPARK-35801:
---------------------------------------

Hi, [~viirya] and [~aokolnychyi]. Are we going to leave this open in Apache Spark 3.4.0 as `Unresolved`?
[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485508#comment-17485508 ]

L. C. Hsieh commented on SPARK-35801:
-------------------------------------

I think we can leave this open and put sub-tasks under this, like https://issues.apache.org/jira/browse/SPARK-34849.
[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485502#comment-17485502 ]

Anton Okolnychyi commented on SPARK-35801:
------------------------------------------

[~viirya], shall we keep this one open until the implementation is done or can we close it now? The community has already voted on this SPIP.

> SPIP: Row-level operations in Data Source V2
> --------------------------------------------
>
>                 Key: SPARK-35801
>                 URL: https://issues.apache.org/jira/browse/SPARK-35801
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>              Labels: SPIP