The general idea looks great. This is indeed a complicated API, and we
probably need more time to evaluate its design. It would be better to
commit this work early so that we have more time to verify it before the
3.3 release. Maybe we can commit the group-based API first and then the
delta-based one, since the delta-based API is significantly more convoluted.
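
For readers new to the thread, here is an illustrative sketch (not Spark code, and not the SPIP's actual interfaces; all names are hypothetical and the model is deliberately simplified) of the difference between the two strategies: a group-based operation rewrites whole groups (e.g. data files) that contain matching rows, while a delta-based one leaves data files untouched and records a delta of changes that readers apply on scan.

```python
# Hypothetical sketch contrasting the two row-level strategies.
# A "group" here stands for a data file; rows are plain values.

def group_based_delete(files, predicate):
    """Copy-on-write style: rewrite every group that contains a matching
    row, dropping the matching rows; untouched groups are kept as-is."""
    new_files = []
    for rows in files:
        if any(predicate(r) for r in rows):
            kept = [r for r in rows if not predicate(r)]
            if kept:                       # rewrite the affected group
                new_files.append(kept)
        else:
            new_files.append(rows)         # group untouched
    return new_files

def delta_based_delete(files, predicate):
    """Merge-on-read style: leave data files alone and emit a delta of
    deleted row positions that readers must apply at scan time."""
    deletes = [(i, j)
               for i, rows in enumerate(files)
               for j, r in enumerate(rows)
               if predicate(r)]
    return files, deletes

def read_with_deletes(files, deletes):
    """Apply positional deletes while scanning."""
    dropped = set(deletes)
    return [r for i, rows in enumerate(files)
            for j, r in enumerate(rows)
            if (i, j) not in dropped]
```

The sketch also hints at why the delta-based API is the more convoluted of the two: the group-based plan is just a scan plus a rewrite of matched groups, whereas the delta-based plan has to thread extra metadata (row positions) from the scan through to the write and to every subsequent read.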

On Thu, Oct 28, 2021 at 12:53 AM L. C. Hsieh <vii...@apache.org> wrote:

>
> Thanks for the initial feedback.
>
> I think the community was previously busy with work related to the Spark
> 3.2 release. Now that the 3.2 release is done, I'd like to bring this up
> again and seek more discussion and feedback.
>
> Thanks.
>
> On 2021/06/25 15:49:49, huaxin gao <huaxin.ga...@gmail.com> wrote:
> > I took a quick look at the PR, and it looks like a great feature to
> > have. It provides unified APIs for data sources to perform commonly
> > used operations easily and efficiently, so users don't have to
> > implement custom extensions on their own. Thanks Anton for the work!
> >
> > On Thu, Jun 24, 2021 at 9:42 PM L. C. Hsieh <vii...@apache.org> wrote:
> >
> > > Thanks Anton. I volunteer to be the shepherd of the SPIP. This is
> > > also my first time shepherding a SPIP, so please let me know if
> > > there is anything I can improve.
> > >
> > > These look like great features, and the rationale in the proposal
> > > makes sense. These operations are becoming more common and more
> > > important in big data workloads. Instead of individual data sources
> > > building custom extensions, it makes more sense for Spark to support
> > > the API.
> > >
> > > Please provide your thoughts about the proposal and the design.
> > > Appreciate your feedback. Thank you!
> > >
> > > On 2021/06/24 23:53:32, Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
> > > > Hey everyone,
> > > >
> > > > I'd like to start a discussion on adding support for executing
> > > > row-level operations such as DELETE, UPDATE, and MERGE for v2
> > > > tables (SPARK-35801). The execution should be the same across data
> > > > sources, and the best way to achieve that is to implement it in
> > > > Spark.
> > > >
> > > > Right now, Spark can only parse and, to some extent, analyze
> > > > DELETE, UPDATE, and MERGE commands. Data sources that support
> > > > row-level changes have to build custom Spark extensions to execute
> > > > such statements. The goal of this effort is to come up with a
> > > > flexible and easy-to-use API that works across data sources.
> > > >
> > > > Design doc:
> > > > https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
> > > >
> > > > PR for handling DELETE statements:
> > > > https://github.com/apache/spark/pull/33008
> > > >
> > > > Any feedback is more than welcome.
> > > >
> > > > Liang-Chi was kind enough to shepherd this effort. Thanks!
> > > >
> > > > - Anton
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> >
>
