+1 (binding)

On Sun, 4 Aug 2024 at 21:14, Jarek Potiuk <ja...@potiuk.com> wrote:

> +1 (binding). I think it's a big improvement, and I agree the "incremental"
> part might be misleading as we essentially always "replace" (but with finer
> granularity - partition level) - so we never "add" things incrementally and
> people might be misled here.
>
> On Sun, Aug 4, 2024 at 10:02 PM Shahar Epstein <sha...@apache.org> wrote:
>
> > +1 (binding)
> >
> > On Fri, Aug 2, 2024 at 10:43 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Yup, I am fine removing that language to make it explicit but leave it
> > > up to TP.
> > >
> > > On Fri, 2 Aug 2024 at 19:56, Daniel Standish
> > > <daniel.stand...@astronomer.io.invalid> wrote:
> > >
> > > > My concern with the AIP is the talk of support for incremental data
> > > > pipelines.  In an incremental data pipeline, you don't think of a delta
> > > > load (let's say a collection of updated rows) as a partition.  A
> > > > partition in data is defined by a partition key, which should be an
> > > > immutable field or fields in a record.  You can't use an "updated at"
> > > > field as a partition key because then the same record can be in
> > > > multiple partitions.  And it doesn't make sense either when you think
> > > > about what it would mean to "reprocess a partition" -- the rows that
> > > > were in that partition now might not be there anymore.  So I think this
> > > > AIP needs to not brand itself as any kind of solution for incremental
> > > > loads.
> > > > If you're processing Hive partitions (by time), and those data can be
> > > > updated, you might need to reprocess the last N partitions each time.
> > > > That's a common way to handle updates.  (And maybe something that we
> > > > should consider supporting in this AIP.)  If you're doing some kind of
> > > > change tracking, you're just processing rows or new files, and it
> > > > doesn't make sense to consider those a partition.
> > > > My suggestion would be to remove the language talking about incremental
> > > > loads from this AIP.
> > > >
> > >
> >
>
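
The "reprocess the last N partitions" approach described in the quoted
message can be sketched roughly as below. This is only an illustration in
plain Python and is not taken from the AIP or from Airflow: the `event_date`
field, the dict-based store, and both function names are hypothetical. The
partition key is an immutable field, and each partition is replaced as a
whole rather than appended to.

```python
from collections import defaultdict
from datetime import date, timedelta


def last_n_partitions(latest: date, n: int) -> list[date]:
    """Return the last n daily partition keys, oldest first, ending at latest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [latest - timedelta(days=offset) for offset in range(n - 1, -1, -1)]


def rebuild_partitions(store: dict, source_rows: list[dict], keys: list[date]) -> None:
    """Replace each listed partition wholesale with the current source rows.

    Rows are assigned to a partition by `event_date`, a hypothetical field
    assumed to be immutable, so a row belongs to exactly one partition.
    """
    wanted = set(keys)
    grouped = defaultdict(list)
    for row in source_rows:
        key = row["event_date"]
        if key in wanted:
            grouped[key].append(row)
    for key in keys:
        # Replace, never append: stale rows in the old partition are dropped.
        store[key] = grouped.get(key, [])


if __name__ == "__main__":
    store = {date(2024, 8, 1): [{"id": 1, "event_date": date(2024, 8, 1), "value": "old"}]}
    rows = [
        {"id": 1, "event_date": date(2024, 8, 1), "value": "new"},
        {"id": 2, "event_date": date(2024, 8, 2), "value": "x"},
    ]
    rebuild_partitions(store, rows, last_n_partitions(date(2024, 8, 2), 2))
    print(store)
```

Keying on an "updated at" timestamp instead would break this: an updated row
would move to a new partition while its old copy stayed behind in the
previous one unless that partition were also rebuilt.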
