I meant to do this last week but totally forgot. Since no one has raised any
further concerns, I’ll finish the process and announce the results in a new
thread.

TP


> On 5 Aug 2024, at 14:15, Ephraim Anierobi <ephraimanier...@gmail.com> wrote:
> 
> +1 (binding)
> 
> On Sun, 4 Aug 2024 at 21:14, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
>> +1 (binding). I think it's a big improvement, and I agree the "incremental"
>> part might be misleading as we essentially always "replace" (but with finer
>> granularity - partition level) - so we never "add" things incrementally and
>> people might be misled here.
>> 
>> On Sun, Aug 4, 2024 at 10:02 PM Shahar Epstein <sha...@apache.org> wrote:
>> 
>>> +1 (binding)
>>> 
>>> On Fri, Aug 2, 2024 at 10:43 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>> 
>>>> Yup, I am fine removing that language to make it explicit but leave it
>>>> up to TP.
>>>> 
>>>> On Fri, 2 Aug 2024 at 19:56, Daniel Standish
>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>> 
>>>>> My concern with the AIP is the talk of support for incremental data
>>>>> pipelines.  In an incremental data pipeline, you don't think of a delta
>>>>> load (let's say a collection of updated rows) as a partition.  A
>>>>> partition in data is defined by a partition key, which should be an
>>>>> immutable field or fields in a record.  You can't use an "updated at"
>>>>> field as a partition key because then the same record can be in
>>>>> multiple partitions.  And it doesn't make sense either when you think
>>>>> about what it would mean to "reprocess a partition" -- the rows that
>>>>> were in that partition now might not be there anymore.  So I think this
>>>>> AIP needs to not brand itself as any kind of solution for incremental
>>>>> loads.
>>>>> If you're processing Hive partitions (by time), and those data can be
>>>>> updated, you might need to reprocess the last N partitions each time.
>>>>> That's a common way to handle updates.  (And maybe something that we
>>>>> should consider supporting in this AIP.)  If you're doing some kind of
>>>>> change tracking, you're just processing rows or new files, and it
>>>>> doesn't make sense to consider those a partition.
>>>>> My suggestion would be to remove the language talking about incremental
>>>>> loads from this AIP.
>>>>> 
>>>> 
>>> 
>> 
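
The point in the quoted message about partition keys can be illustrated with
a minimal Python sketch. It is not taken from the AIP or from Airflow: the
record fields (id, created_on, updated_at) and the helpers partition_ids,
partitions_containing, and last_n_partitions are hypothetical names chosen
for illustration, and daily partitions are assumed. It shows that keying on
an immutable field keeps a record in one partition, that keying on an
"updated at" field spreads the same record over several, and what
"reprocess the last N partitions" might look like.

```python
from collections import defaultdict
from datetime import date, timedelta


def partition_ids(rows, key):
    """Group record ids by the value of the chosen partition key."""
    parts = defaultdict(set)
    for row in rows:
        parts[row[key]].add(row["id"])
    return dict(parts)


def partitions_containing(parts, record_id):
    """Return the sorted partition keys whose partition holds record_id."""
    return sorted(k for k, ids in parts.items() if record_id in ids)


def last_n_partitions(latest, n):
    """Daily partition keys to reprocess: the latest day, then the n-1 days before it."""
    return [latest - timedelta(days=i) for i in range(n)]


# Hypothetical data: record 1 is created on Aug 1 and updated on Aug 3,
# so two versions of it arrive over time.
rows = [
    {"id": 1, "created_on": date(2024, 8, 1), "updated_at": date(2024, 8, 1)},
    {"id": 1, "created_on": date(2024, 8, 1), "updated_at": date(2024, 8, 3)},
    {"id": 2, "created_on": date(2024, 8, 2), "updated_at": date(2024, 8, 2)},
]

by_created = partition_ids(rows, "created_on")  # immutable key
by_updated = partition_ids(rows, "updated_at")  # mutable key

# Immutable key: record 1 lives in exactly one partition.
print(partitions_containing(by_created, 1))
# Mutable key: record 1 shows up in two partitions.
print(partitions_containing(by_updated, 1))
# Handling late updates by reprocessing the last 3 daily partitions.
print(last_n_partitions(date(2024, 8, 3), 3))
```

With the mutable key, "reprocessing" the Aug 1 partition is ambiguous because
the current version of record 1 now belongs to Aug 3, which is the problem
the quoted message describes.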

