Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-03 Thread Vikram Koka
Good point Jed.
I responded to your comment in the doc as well and am very open to
changing the term used there.

I used the term "interactive DAG run" to mean the ability to invoke or
trigger a DAG run through the API, with the expectation of getting a result
back immediately. An alternate term could be "synchronous DAG run".

Regardless, this is a significant change, so a good term to indicate the
expansion from "batch runs only" is warranted. Very open to different terms
here.

On Fri, May 3, 2024 at 4:05 PM Jed Cunningham 
wrote:

> Very exciting! Looks like we will have a busy period of time ahead of us.
> Overall I like the plan so far, especially using this year's Airflow Summit
> as an opportunity to announce and gather feedback, and the 2025 version to
> pitch upgrading.
>
> I left a comment in the doc, but we might want to iterate on the
> terminology we use for high priority or "synchronous" DAG runs to serve LLM
> responses - I find "interactive DAG runs" a bit confusing.
>


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-03 Thread Jed Cunningham
Very exciting! Looks like we will have a busy period of time ahead of us.
Overall I like the plan so far, especially using this year's Airflow Summit
as an opportunity to announce and gather feedback, and the 2025 version to
pitch upgrading.

I left a comment in the doc, but we might want to iterate on the
terminology we use for high priority or "synchronous" DAG runs to serve LLM
responses - I find "interactive DAG runs" a bit confusing.


Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach

2024-05-03 Thread Kaxil Naik
Hi all,

As promised, we are pleased to share our proposal for Airflow 3.


We met several community members in the last few days to get feedback on
this proposal, and we are glad to say that most of the things in the doc
resonated with them. It was also very pleasing to see that everyone we
talked to asked how they could help contribute towards Airflow 3 and making
it successful.

We would like to now open it up for feedback from the entire community.
Please add comments to this Google doc. This proposal is purposefully
high-level to get feedback on the general direction and we will have
several AIPs for the big pieces mentioned in the doc.

If there aren't any strong objections to us (as the Airflow community)
working on Airflow 3, we propose a dedicated fortnightly recurring
call starting the first week of June. This will give enough time to get
feedback on our proposal, incorporate any feedback and then focus our
discussion on the What & How of Airflow 3 as opposed to Why.

As we hit the 10-year mark at this year’s Airflow Summit, we have a unique
marketing opportunity to officially announce Airflow 3. We can either use
this milestone to just look back at a decade of growth or make it more
exciting to not only showcase growth but innovation and, more importantly,
get the Airflow users excited about the next chapter in Airflow’s story.
We’ll have the perfect platform to gather immediate feedback, engage the
community in shaping future features, and share our vision for what lies
ahead. We can then focus Airflow Summit 2025 on how Airflow 3
features are used in the wild by companies around the world, showcase the
migration tools and utilities that will have matured by then, and gather
more community insights to continue improving Airflow. This will give our
users more confidence in Airflow 3 and help them feel comfortable upgrading
to it. This is, of course, just our proposal, and we would love to hear
what others think. It’s certainly ambitious, but nothing we haven’t done in
the past, and having a goal-post will help us all.

Hope you all have a great weekend.

Regards,
Kaxil, Vikram & Constance

On Mon, 29 Apr 2024 at 05:11, Amogh Desai  wrote:

> Thanks for starting the discussion, Jarek!
>
> I too agree that with the new upcoming features and AIPs, it might just be
> the right time
> to discuss the possibility of having Airflow 3. I agree with most reasons
> pointed out by others and
> I would love to see it happen, and also be a part of it.
>
> Since this is a major step for the future of Airflow, we need to carefully
> consider the user experience for
> users coming from versions of Airflow 2 and would not want this migration
> to be a pain.
>
> Btw, I concur with Jens and I too am not very clear when we say that "Gen
> AI is going to be the new trigger for Airflow".
> Would be obliged if someone could explain that portion to me :)
>
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Mon, Apr 22, 2024 at 1:52 PM Jarek Potiuk  wrote:
>
> > Just one comment here - while maybe "shocking" for some cases - yes, this
> > one has been clearly coming. Actually, it took a lot of my brain cycles
> > recently to think "what's next". Too much, to the point that I started
> the
> > thread.
> > I thought it might be quite a valuable opening from someone who always
> said
> > "well, we have to have **really** good reason to do Airflow 3" and "maybe
> > there will not be Airflow 3".
> >
> > And I quite agree with Kaxil - that trying to organise our thoughts
> around
> > what to do and how to approach Airflow 3 based on just this thread
> is
> > a bit too early.
> > I do not think this one thread here will lead to us deciding what to do -
> > if we try to do it now in a discussion thread or even a confluence doc,
> we
> > might fail to achieve the goal.
> >
> > My main point here was to really get the feel and open thoughts of those
> > who are actively involved in Airflow - on what we should do next. And to
> > see if this is the right time to start thinking in "two" modes: Airflow 2
> > and Airflow 3 (even if we do not know yet what Airflow 3 will be).
> >
> > I'd rather let a free stream of thoughts of what people think should
> happen
> > here continue. Merely opening our minds to the possibility of Airflow 3.
> > And I would love to keep it flowing for others - without the goal of
> > organizing it or achieving consensus.
> >
> > And I think all that Kaxil writes about - starting a series of calls,
> > organizing our discussions, getting "product manager(s)" working on
> > organizing those discussions is the **right** thing to do.
> > How exactly to do that, how to make sure everyone is involved, while we
> are
> > not tied up in endless discussions and bike-shedding, should materialize
> > from our discussion.
> >
> > But I would propose (and encourage) others' thoughts here as well - just
> a
> 

Re: Refactor Scheduler Timed Events to be Async?

2024-05-03 Thread Ash Berlin-Taylor
One thing to bear in mind here is the number of db connections - each 
connection can only be used by one thread or coroutine at a time, so even when 
the scheduler is changed to use async db calls, we might not be able to do a 
lot of the scheduled tasks concurrently.

At least not without thinking a bit more about it anyways. Connection pooling 
would likely help here (and in fact it's the only time it makes sense; right 
now there should "never" be a reason for a sync Airflow process to have more 
than one open connection).
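A minimal sketch of the constraint described above, under the assumption that each coroutine must hold one connection for the duration of its query. The names are hypothetical and an `asyncio.Semaphore` stands in for a real connection pool; with more coroutines than connections, the extras simply wait:

```python
import asyncio
import time

POOL_SIZE = 2  # hypothetical: max concurrent DB connections in the pool


async def db_call(pool: asyncio.Semaphore, name: str, results: list):
    # Each coroutine holds a "connection" while its query runs; with more
    # coroutines than connections, the extras block on the semaphore.
    async with pool:
        await asyncio.sleep(0.1)  # stand-in for an async DB query
        results.append(name)


async def run_all() -> float:
    pool = asyncio.Semaphore(POOL_SIZE)
    results: list = []
    start = time.monotonic()
    # 4 concurrent "scheduled tasks", but only 2 "connections":
    await asyncio.gather(*(db_call(pool, f"task-{i}", results) for i in range(4)))
    return time.monotonic() - start


elapsed = asyncio.run(run_all())
# 4 queries over 2 connections need two waves of ~0.1s each, so the async
# rewrite alone does not buy full concurrency without more connections.
print(f"{elapsed:.2f}s")
```

So even with fully async DB calls, throughput is capped by the pool size, which is Ash's point.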

-ash

On 3 May 2024 18:10:45 BST, Daniel Standish 
 wrote:
>But you could run them in a thread or subprocess.
>
>Another option would be to just take all of the timed events and make them
>all asyncio and then run them all via asyncio in one continually running
>thread.  That would be a bite size step towards AIP-70.  Though, it might
>be a large bite :)
>
>On Fri, May 3, 2024 at 6:29 AM Hussein Awala  wrote:
>
>> If we don't have many Asyncio tasks running in the event loop, there will
>> not be any benefit from migrating to asynchronous, IMHO it will be anyway
>> rewritten to be asynchronous as a part of AIP-70
>> <
>> https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-70+Migrating+to+asynchronous+programming
>> >
>> (WIP)
>> where we will need to rewrite the scheduler if the AIP is accepted.
>>
>> On Fri, May 3, 2024 at 2:49 PM Ryan Hatter
>>  wrote:
>>
>> > This might be a dumb question as I don't have experience with asyncio,
>> but
>> > should the EventScheduler
>> > <
>> >
>> https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L930
>> > >
>> > in
>> > the Airflow scheduler be rewritten to be asynchronous?
>> >
>> > The so-called "timed events" (e.g. zombie reaping, handling tasks stuck
>> > in queued, etc.) scheduled by this EventScheduler are blocking and run
>> > queries against the DB that can occasionally be expensive and cause
>> > substantial delays in the scheduler, which can result in repeated
>> > scheduler restarts.
>> >
>> > Below is a trivialized example of what this might look like -- curious to
>> > hear your thoughts!
>> >
>> > import asyncio
>> > import threading
>> > import time
>> >
>> > class AsyncEventScheduler:
>> >     def __init__(self):
>> >         self.tasks = []
>> >
>> >     async def call_regular_interval(self, interval, action, *args, **kwargs):
>> >         """Schedules action to be called every `interval` seconds
>> >         asynchronously."""
>> >         while True:
>> >             await asyncio.sleep(interval)
>> >             await action(*args, **kwargs)
>> >
>> >     def schedule_task(self, interval, action, *args, **kwargs):
>> >         """Add tasks that run periodically in an asynchronous manner."""
>> >         task = asyncio.create_task(
>> >             self.call_regular_interval(interval, action, *args, **kwargs)
>> >         )
>> >         self.tasks.append(task)
>> >
>> > async def detect_zombies():
>> >     print("🧟")
>> >
>> > async def detect_stuck_queued_tasks():
>> >     print("Oh no! A task is stuck in queued!")
>> >
>> > def scheduler_loop():
>> >     while True:
>> >         print("Starting scheduling loop...")
>> >         time.sleep(10)
>> >
>> > def _do_scheduling():
>> >     thread = threading.Thread(target=scheduler_loop)
>> >     thread.start()
>> >
>> > async def main():
>> >     scheduler = AsyncEventScheduler()
>> >     scheduler.schedule_task(3, detect_zombies)
>> >     scheduler.schedule_task(5, detect_stuck_queued_tasks)
>> >
>> >     _do_scheduling()
>> >
>> >     while True:
>> >         print("EventScheduler running")
>> >         await asyncio.sleep(1)
>> >
>> > asyncio.run(main())
>> >
>>


Re: Refactor Scheduler Timed Events to be Async?

2024-05-03 Thread Daniel Standish
But you could run them in a thread or subprocess.

Another option would be to just take all of the timed events and make them
all asyncio and then run them all via asyncio in one continually running
thread.  That would be a bite-size step towards AIP-70.  Though, it might
be a large bite :)
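A minimal sketch of that option, assuming hypothetical names: one dedicated thread runs an event loop forever, and the (still synchronous) scheduler submits the timed events to it with `asyncio.run_coroutine_threadsafe`:

```python
import asyncio
import threading

# One dedicated thread runs the event loop "forever"; all timed events
# live on that loop, off the scheduler's main thread.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

fired = []


async def timed_event(name: str):
    await asyncio.sleep(0.05)  # stand-in for an async check (zombies, stuck-in-queued, ...)
    fired.append(name)


# Submit coroutines from the synchronous side; run_coroutine_threadsafe
# returns a concurrent.futures.Future we can wait on.
futures = [
    asyncio.run_coroutine_threadsafe(timed_event(n), loop)
    for n in ("detect_zombies", "detect_stuck_queued")
]
for f in futures:
    f.result(timeout=5)

# Shut the loop down cleanly from outside its thread.
loop.call_soon_threadsafe(loop.stop)
thread.join()
print(fired)
```

Both events run concurrently on the one loop thread, which is the "all timed events via asyncio in one continually running thread" shape, though a real implementation would still need to address Ash's point about DB connections.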

On Fri, May 3, 2024 at 6:29 AM Hussein Awala  wrote:

> If we don't have many Asyncio tasks running in the event loop, there will
> not be any benefit from migrating to asynchronous, IMHO it will be anyway
> rewritten to be asynchronous as a part of AIP-70
> <
> https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-70+Migrating+to+asynchronous+programming
> >
> (WIP)
> where we will need to rewrite the scheduler if the AIP is accepted.
>
> On Fri, May 3, 2024 at 2:49 PM Ryan Hatter
>  wrote:
>
> > This might be a dumb question as I don't have experience with asyncio,
> but
> > should the EventScheduler
> > <
> >
> https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L930
> > >
> > in
> > the Airflow scheduler be rewritten to be asynchronous?
> >
> > The so-called "timed events" (e.g. zombie reaping, handling tasks stuck
> > in queued, etc.) scheduled by this EventScheduler are blocking and run
> > queries against the DB that can occasionally be expensive and cause
> > substantial delays in the scheduler, which can result in repeated
> > scheduler restarts.
> >
> > Below is a trivialized example of what this might look like -- curious to
> > hear your thoughts!
> >
> > import asyncio
> > import threading
> > import time
> >
> > class AsyncEventScheduler:
> >     def __init__(self):
> >         self.tasks = []
> >
> >     async def call_regular_interval(self, interval, action, *args, **kwargs):
> >         """Schedules action to be called every `interval` seconds
> >         asynchronously."""
> >         while True:
> >             await asyncio.sleep(interval)
> >             await action(*args, **kwargs)
> >
> >     def schedule_task(self, interval, action, *args, **kwargs):
> >         """Add tasks that run periodically in an asynchronous manner."""
> >         task = asyncio.create_task(
> >             self.call_regular_interval(interval, action, *args, **kwargs)
> >         )
> >         self.tasks.append(task)
> >
> > async def detect_zombies():
> >     print("🧟")
> >
> > async def detect_stuck_queued_tasks():
> >     print("Oh no! A task is stuck in queued!")
> >
> > def scheduler_loop():
> >     while True:
> >         print("Starting scheduling loop...")
> >         time.sleep(10)
> >
> > def _do_scheduling():
> >     thread = threading.Thread(target=scheduler_loop)
> >     thread.start()
> >
> > async def main():
> >     scheduler = AsyncEventScheduler()
> >     scheduler.schedule_task(3, detect_zombies)
> >     scheduler.schedule_task(5, detect_stuck_queued_tasks)
> >
> >     _do_scheduling()
> >
> >     while True:
> >         print("EventScheduler running")
> >         await asyncio.sleep(1)
> >
> > asyncio.run(main())
> >
>


Re: [DISCUSS] simplifying try_number handling

2024-05-03 Thread Brent Bovenzi
+1
Pumped to remove confusion around tries

On Fri, May 3, 2024 at 5:01 AM Wei Lee  wrote:

> Thanks, Daniel! +1 for this one. This was confusing when I worked on the
> starting from triggerer stuff.
>
> Best,
> Wei
>
>
> > On May 3, 2024, at 11:59 AM, Amogh Desai 
> wrote:
> >
> > Looks good to me.
> >
> > Personally I never ran into any issues with this so far but I agree with
> > the issues it solves.
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Fri, May 3, 2024 at 2:50 AM Vincent Beck  wrote:
> >
> >> I am all +1 on this one. This thing gave me headaches when working on
> >> AIP-44 and I could not understand the difference between the private
> >> "_try_number" and the public "try_number". Thanks for simplifying it!
> >>
> >> This is obviously assuming it does not break anything I am not aware of
> :)
> >>
> >> On 2024/05/02 19:37:32 Daniel Standish wrote:
> >>> TLDR
> >>> * changing handling of try_number in
> >>> https://github.com/apache/airflow/pull/39336
> >>> * no more private attr
> >>> * no more getter that changes value based on state of task
> >>> * no more decrementing
> >>> * try number now only handled by scheduler
> >>> * hope that sounds good to all of you
> >>>
> >>> For more detail read on...
> >>>
> >>> In https://github.com/apache/airflow/pull/39336 I am doing some work
> to
> >>> resolve some longstanding pain and frustration caused by try_number.
> >>>
> >>> The way we handle try_number has for quite some time been messy and
> >>> problematic.
> >>>
> >>> For example, if you access `ti.try_number` and then change the state to
> >> or
> >>> from RUNNING, you will get a different value if you access it again!
> >>>
> >>> And the responsibility for managing this number has been distributed
> >>> throughout the codebase.  For example the task itself always increments
> >>> when it starts running.  But then if it defers or reschedules itself,
> it
> >>> decrements it back down so that when it runs again and naively
> >> increments,
> >>> then it will be right again.
> >>>
> >>> Recently more issues have become visible as I have worked with AIP-44
> >>> because for example pydantic does not like private attrs and it's just
> >>> awkward to know *what value to use* when serializing it when the TI
> will
> >>> give you a different answer depending on the state of the task!
> >>>
> >>> And there's yet another edge case being solved in this community PR
> >>>  >.
> >>> And then when we start looking at try history and AIP-64, it also
> >> forces a
> >>> look at this.
> >>>
> >>> So it all sounds bad and indeed it is bad but I think I have a
> solution.
> >>>
> >>> What I do is, just have the scheduler increment try_number at the
> moment
> >>> when it schedules the task.  It alone will have the responsibility for
> >>> incrementing try_number.  And nowhere will it ever be decremented.  It
> >>> will not be incremented when resuming after deferral or reschedule.
> And
> >>> that's about all there is to it.
> >>>
> >>> I've tested it out and it works.  But I'm working through many test
> >>> failures that need to be resolved (there's lots of asserts re
> >> try_number).
> >>>
> >>> One small thing I just want to point out is that if a user were
> >> previously
> >>> to be doing `task.run()` sort of manually without the task having been
> >>> scheduled by the scheduler, well now their try_number won't be
> >>> automatically incremented.  Same if they just do `airflow tasks run` --
> >>> because now the responsibility is going to be solely with the
> scheduler.
> >>> But airflow was never designed to assume that tasks will be run without
> >>> having been scheduled, so I do not think that counts as a breaking
> >> change.
> >>> So I don't think that's a blocker for this.
> >>>
> >>> Thanks for the consideration.  Let me know if you have any concerns.
> >>>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>
> >>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>
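The scheme Daniel describes above (try_number incremented only by the scheduler, never decremented, untouched on resume) can be sketched as follows. This is a hypothetical toy model, not the actual PR code; the class and method names are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    state: str = "none"
    try_number: int = 0  # plain attribute: no private _try_number, no stateful getter


class Scheduler:
    def schedule(self, ti: TaskInstance):
        # The single place in the system where try_number changes.
        ti.try_number += 1
        ti.state = "scheduled"

    def resume(self, ti: TaskInstance):
        # Resuming after deferral/reschedule is NOT a new try:
        # no increment, and therefore nothing to decrement later.
        ti.state = "scheduled"


scheduler = Scheduler()
ti = TaskInstance()

scheduler.schedule(ti)   # first try
assert ti.try_number == 1
ti.state = "deferred"    # task defers itself mid-run
scheduler.resume(ti)
assert ti.try_number == 1  # same value regardless of state transitions

ti.state = "failed"
scheduler.schedule(ti)   # a genuine retry
print(ti.try_number)     # 2
```

Reading `try_number` here always returns the same value no matter the task state, which is exactly the property the current getter lacks.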


Re: Refactor Scheduler Timed Events to be Async?

2024-05-03 Thread Hussein Awala
If we don't have many asyncio tasks running in the event loop, there will
not be any benefit from migrating to asynchronous. IMHO it will be
rewritten to be asynchronous anyway as part of AIP-70 (WIP),
where we will need to rewrite the scheduler if the AIP is accepted.

On Fri, May 3, 2024 at 2:49 PM Ryan Hatter
 wrote:

> This might be a dumb question as I don't have experience with asyncio, but
> should the EventScheduler
> <
> https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L930
> >
> in
> the Airflow scheduler be rewritten to be asynchronous?
>
> The so-called "timed events" (e.g. zombie reaping, handling tasks stuck in
> queued, etc.) scheduled by this EventScheduler are blocking and run queries
> against the DB that can occasionally be expensive and cause substantial
> delays in the scheduler, which can result in repeated scheduler restarts.
>
> Below is a trivialized example of what this might look like -- curious to
> hear your thoughts!
>
> import asyncio
> import threading
> import time
>
> class AsyncEventScheduler:
>     def __init__(self):
>         self.tasks = []
>
>     async def call_regular_interval(self, interval, action, *args, **kwargs):
>         """Schedules action to be called every `interval` seconds
>         asynchronously."""
>         while True:
>             await asyncio.sleep(interval)
>             await action(*args, **kwargs)
>
>     def schedule_task(self, interval, action, *args, **kwargs):
>         """Add tasks that run periodically in an asynchronous manner."""
>         task = asyncio.create_task(
>             self.call_regular_interval(interval, action, *args, **kwargs)
>         )
>         self.tasks.append(task)
>
> async def detect_zombies():
>     print("🧟")
>
> async def detect_stuck_queued_tasks():
>     print("Oh no! A task is stuck in queued!")
>
> def scheduler_loop():
>     while True:
>         print("Starting scheduling loop...")
>         time.sleep(10)
>
> def _do_scheduling():
>     thread = threading.Thread(target=scheduler_loop)
>     thread.start()
>
> async def main():
>     scheduler = AsyncEventScheduler()
>     scheduler.schedule_task(3, detect_zombies)
>     scheduler.schedule_task(5, detect_stuck_queued_tasks)
>
>     _do_scheduling()
>
>     while True:
>         print("EventScheduler running")
>         await asyncio.sleep(1)
>
> asyncio.run(main())
>


Refactor Scheduler Timed Events to be Async?

2024-05-03 Thread Ryan Hatter
This might be a dumb question as I don't have experience with asyncio, but
should the EventScheduler in
the Airflow scheduler be rewritten to be asynchronous?

The so-called "timed events" (e.g. zombie reaping, handling tasks stuck in
queued, etc.) scheduled by this EventScheduler are
blocking and run queries against the DB that can occasionally be expensive
and cause substantial delays in the scheduler, which can result in repeated
scheduler restarts.

Below is a trivialized example of what this might look like -- curious to
hear your thoughts!

import asyncio
import threading
import time

class AsyncEventScheduler:
    def __init__(self):
        self.tasks = []

    async def call_regular_interval(self, interval, action, *args, **kwargs):
        """Schedules action to be called every `interval` seconds asynchronously."""
        while True:
            await asyncio.sleep(interval)
            await action(*args, **kwargs)

    def schedule_task(self, interval, action, *args, **kwargs):
        """Add tasks that run periodically in an asynchronous manner."""
        task = asyncio.create_task(
            self.call_regular_interval(interval, action, *args, **kwargs)
        )
        self.tasks.append(task)

async def detect_zombies():
    print("🧟")

async def detect_stuck_queued_tasks():
    print("Oh no! A task is stuck in queued!")

def scheduler_loop():
    while True:
        print("Starting scheduling loop...")
        time.sleep(10)

def _do_scheduling():
    thread = threading.Thread(target=scheduler_loop)
    thread.start()

async def main():
    scheduler = AsyncEventScheduler()
    scheduler.schedule_task(3, detect_zombies)
    scheduler.schedule_task(5, detect_stuck_queued_tasks)

    _do_scheduling()

    while True:
        print("EventScheduler running")
        await asyncio.sleep(1)

asyncio.run(main())


Re: [VOTE] Airflow Providers prepared on May 01, 2024

2024-05-03 Thread Pankaj Koti
+1 (non-binding)

Tested my set of changes.

Best regards,

*Pankaj Koti*
Senior Software Engineer (Airflow OSS Engineering team)
Location: Pune, Maharashtra, India
Timezone: Indian Standard Time (IST)


On Fri, May 3, 2024 at 4:25 PM Hussein Awala  wrote:

> +1 (binding). Checked licences, checksums, signatures and sources, and ran
> some test DAGs for the cncf.kubernetes and amazon providers.
>
> On Fri, May 3, 2024 at 9:29 AM Wei Lee  wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Wei
> >
> > > On May 3, 2024, at 2:53 PM, Pankaj Singh 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > On Thu, May 2, 2024 at 6:03 PM Elad Kalif  wrote:
> > >
> > >> I will exclude pinecone from this release.
> > >> Please continue voting excluding pinecone provider
> > >>
> > >> On Thu, May 2, 2024 at 3:19 PM Ankit Chaurasia 
> > >> wrote:
> > >>
> > >>> -1 non-binding for Pinecone: 2.0.0rc1
> > >>>
> > >>> There is an issue with Pinecone: 2.0.0rc1 with the system test. This
> > bug
> > >> is
> > >>> introduced as part of the following PR Pinecone provider support for
> > >>> pinecone-client>=3 (#37307): @rawwar
> > >>> 
> > >>>
> > >>> PR #39365  should fix
> > it.
> > >>>
> > >>> *Ankit Chaurasia*
> > >>> HomePage  |  LinkedIn
> > >>> 
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Thu, May 2, 2024 at 10:47 AM Amogh Desai <
> amoghdesai@gmail.com>
> > >>> wrote:
> > >>>
> >  +1 non binding
> > 
> >  Installed the tarball and ran some example DAGs for hive and cncf
> > >>> provider,
> >  works as expected.
> > 
> >  Thanks & Regards,
> >  Amogh Desai
> > 
> > 
> >  On Wed, May 1, 2024 at 10:49 PM Vincent Beck 
> > >>> wrote:
> > 
> > > +1 non binding. All AWS system tests are working successfully
> against
> > > apache-airflow-providers-amazon==8.21.0rc1. You can see the results
> > >>> here:
> > >
> > 
> > >>>
> > >>
> >
> https://aws-mwaa.github.io/#/open-source/system-tests/version/fe4605a10e26f1b8a180979ba5765d1cb7fb0111_8.21.0rc1.html
> >  .
> > > The only failure (example_bedrock) is a known issue in the test
> > >> itself
> >  and
> > > is currently being worked on.
> > >
> > > On 2024/05/01 13:03:07 Elad Kalif wrote:
> > >> Hey all,
> > >>
> > >> I have just cut the new wave Airflow Providers packages. This
> email
> > >>> is
> > >> calling a vote on the release,
> > >> which will last for 72 hours - which means that it will end on May
> > >>> 04,
> > > 2024
> > >> 13:00 PM UTC and until 3 binding +1 votes have been received.
> > >>
> > >>
> > >> Consider this my (binding) +1.
> > >>
> > >> *Note some of the providers are rc2 and some rc1.*
> > >>
> > >> Airflow Providers are available at:
> > >> https://dist.apache.org/repos/dist/dev/airflow/providers/
> > >>
> > >> *apache-airflow-providers--*.tar.gz* are the binary
> > >> Python "sdist" release - they are also official "sources" for the
> > > provider
> > >> packages.
> > >>
> > >> *apache_airflow_providers_-*.whl are the binary
> > >> Python "wheel" release.
> > >>
> > >> The test procedure for PMC members is described in
> > >>
> > >
> > 
> > >>>
> > >>
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> > >>
> > >> The test procedure for Contributors who would like to test this
> this
> > >>> RC
> >  is
> > >> described in:
> > >>
> > >
> > 
> > >>>
> > >>
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> > >>
> > >>
> > >> Public keys are available at:
> > >> https://dist.apache.org/repos/dist/release/airflow/KEYS
> > >>
> > >> Please vote accordingly:
> > >>
> > >> [ ] +1 approve
> > >> [ ] +0 no opinion
> > >> [ ] -1 disapprove with the reason
> > >>
> > >> Only votes from PMC members are binding, but members of the
> > >> community
> >  are
> > >> encouraged to test the release and vote with "(non-binding)".
> > >>
> > >> Please note that the version number excludes the 'rcX' string.
> > >> This will allow us to rename the artifact without modifying
> > >> the artifact checksums when we actually release.
> > >>
> > >> The status of testing the providers by the community is kept here:
> > >> https://github.com/apache/airflow/issues/39346
> > >>
> > >> The issue is also the easiest way to see important PRs included in
> > >>> the
> >  RC
> > >> candidates.
> > >> Detailed changelog for the providers will be published in the
> > > documentation
> > >> after the
> > >> RC 

Re: [VOTE] Airflow Providers prepared on May 01, 2024

2024-05-03 Thread Hussein Awala
+1 (binding). Checked licences, checksums, signatures and sources, and ran
some test DAGs for the cncf.kubernetes and amazon providers.

On Fri, May 3, 2024 at 9:29 AM Wei Lee  wrote:

> +1 (non-binding)
>
> Best,
> Wei
>
> > On May 3, 2024, at 2:53 PM, Pankaj Singh 
> wrote:
> >
> > +1 (non-binding)
> >
> > On Thu, May 2, 2024 at 6:03 PM Elad Kalif  wrote:
> >
> >> I will exclude pinecone from this release.
> >> Please continue voting excluding pinecone provider
> >>
> >> On Thu, May 2, 2024 at 3:19 PM Ankit Chaurasia 
> >> wrote:
> >>
> >>> -1 non-binding for Pinecone: 2.0.0rc1
> >>>
> >>> There is an issue with Pinecone: 2.0.0rc1 with the system test. This
> bug
> >> is
> >>> introduced as part of the following PR Pinecone provider support for
> >>> pinecone-client>=3 (#37307): @rawwar
> >>> 
> >>>
> >>> PR #39365  should fix
> it.
> >>>
> >>> *Ankit Chaurasia*
> >>> HomePage  |  LinkedIn
> >>> 
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, May 2, 2024 at 10:47 AM Amogh Desai 
> >>> wrote:
> >>>
>  +1 non binding
> 
>  Installed the tarball and ran some example DAGs for hive and cncf
> >>> provider,
>  works as expected.
> 
>  Thanks & Regards,
>  Amogh Desai
> 
> 
>  On Wed, May 1, 2024 at 10:49 PM Vincent Beck 
> >>> wrote:
> 
> > +1 non binding. All AWS system tests are working successfully against
> > apache-airflow-providers-amazon==8.21.0rc1. You can see the results
> >>> here:
> >
> 
> >>>
> >>
> https://aws-mwaa.github.io/#/open-source/system-tests/version/fe4605a10e26f1b8a180979ba5765d1cb7fb0111_8.21.0rc1.html
>  .
> > The only failure (example_bedrock) is a known issue in the test
> >> itself
>  and
> > is currently being worked on.
> >
> > On 2024/05/01 13:03:07 Elad Kalif wrote:
> >> Hey all,
> >>
> >> I have just cut the new wave Airflow Providers packages. This email
> >>> is
> >> calling a vote on the release,
> >> which will last for 72 hours - which means that it will end on May
> >>> 04,
> > 2024
> >> 13:00 PM UTC and until 3 binding +1 votes have been received.
> >>
> >>
> >> Consider this my (binding) +1.
> >>
> >> *Note some of the providers are rc2 and some rc1.*
> >>
> >> Airflow Providers are available at:
> >> https://dist.apache.org/repos/dist/dev/airflow/providers/
> >>
> >> *apache-airflow-providers--*.tar.gz* are the binary
> >> Python "sdist" release - they are also official "sources" for the
> > provider
> >> packages.
> >>
> >> *apache_airflow_providers_-*.whl are the binary
> >> Python "wheel" release.
> >>
> >> The test procedure for PMC members is described in
> >>
> >
> 
> >>>
> >>
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-pmc-members
> >>
> >> The test procedure for Contributors who would like to test this
> >>> RC
>  is
> >> described in:
> >>
> >
> 
> >>>
> >>
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-candidate-by-contributors
> >>
> >>
> >> Public keys are available at:
> >> https://dist.apache.org/repos/dist/release/airflow/KEYS
> >>
> >> Please vote accordingly:
> >>
> >> [ ] +1 approve
> >> [ ] +0 no opinion
> >> [ ] -1 disapprove with the reason
> >>
> >> Only votes from PMC members are binding, but members of the
> >> community
>  are
> >> encouraged to test the release and vote with "(non-binding)".
> >>
> >> Please note that the version number excludes the 'rcX' string.
> >> This will allow us to rename the artifact without modifying
> >> the artifact checksums when we actually release.
> >>
> >> The status of testing the providers by the community is kept here:
> >> https://github.com/apache/airflow/issues/39346
> >>
> >> The issue is also the easiest way to see important PRs included in
> >>> the
>  RC
> >> candidates.
> >> Detailed changelog for the providers will be published in the
> > documentation
> >> after the
> >> RC candidates are released.
> >>
> >> You can find the RC packages in PyPI following these links:
> >>
> >>
> >> https://pypi.org/project/apache-airflow-providers-airbyte/3.8.0rc1/
> >>
> >> https://pypi.org/project/apache-airflow-providers-alibaba/2.8.0rc1/
> >>
> >> https://pypi.org/project/apache-airflow-providers-amazon/8.21.0rc1/
> >>
> 
> >> https://pypi.org/project/apache-airflow-providers-apache-beam/5.7.0rc1/
> >>
> >
> 
> >>>
> >>
> https://pypi.org/project/apache-airflow-providers-apache-cassandra/3.5.0rc1/
> >>
> 
> >>
> 

Re: [DISCUSS] simplifying try_number handling

2024-05-03 Thread Wei Lee
Thanks, Daniel! +1 for this one. This was confusing when I worked on the 
start-from-triggerer work. 

Best,
Wei


> On May 3, 2024, at 11:59 AM, Amogh Desai  wrote:
> 
> Looks good to me.
> 
> Personally I have never run into any issues with this myself, but I agree
> with the problems it solves.
> Thanks & Regards,
> Amogh Desai
> 
> 
> On Fri, May 3, 2024 at 2:50 AM Vincent Beck  wrote:
> 
>> I am all +1 on this one. This thing gave me headaches when working on
>> AIP-44 and I could not understand the difference between the private
>> "_try_number" and the public "try_number". Thanks for simplifying it!
>> 
>> This is obviously assuming it does not break anything I am not aware of :)
>> 
>> On 2024/05/02 19:37:32 Daniel Standish wrote:
>>> TLDR
>>> * changing handling of try_number in
>>> https://github.com/apache/airflow/pull/39336
>>> * no more private attr
>>> * no more getter that changes value based on state of task
>>> * no more decrementing
>>> * try number now only handled by scheduler
>>> * hope that sounds good to all of you
>>> 
>>> For more detail read on...
>>> 
>>> In https://github.com/apache/airflow/pull/39336 I am doing some work to
>>> resolve some longstanding pain and frustration caused by try_number.
>>> 
>>> The way we handle try_number has for quite some time been messy and
>>> problematic.
>>> 
>>> For example, if you access `ti.try_number` and then change the state to
>> or
>>> from RUNNING, you will get a different value if you access it again!
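The state-dependent getter described above can be sketched roughly like this (a hypothetical simplification for illustration, not the actual TaskInstance code):

```python
from enum import Enum


class State(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"


class OldTaskInstance:
    """Hypothetical simplification of the old behavior -- NOT the real class."""

    def __init__(self) -> None:
        self._try_number = 0  # private attr, historically mutated all over
        self.state = State.QUEUED

    @property
    def try_number(self) -> int:
        # The getter's answer depends on the task's state: while RUNNING it
        # reports the stored value; otherwise it reports the *next* try.
        if self.state == State.RUNNING:
            return self._try_number
        return self._try_number + 1


ti = OldTaskInstance()
ti._try_number = 1
print(ti.try_number)  # 2: not running, so the getter adds one
ti.state = State.RUNNING
print(ti.try_number)  # 1: same attribute, different answer
```

The same object reports two different try numbers depending only on its state, which is exactly the surprise described above.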
>>> 
>>> And the responsibility for managing this number has been distributed
>>> throughout the codebase.  For example, the task itself always increments
>>> it when it starts running.  But if the task defers or reschedules itself,
>>> it decrements the number back down so that, when the task runs again and
>>> naively increments, the value comes out right.
>>> 
>>> Recently more issues have become visible as I have worked on AIP-44:
>>> pydantic, for example, does not like private attrs, and it is awkward to
>>> know *what value to use* when serializing a TI that gives you a different
>>> answer depending on the state of the task!
>>> 
>>> And there's yet another edge case being solved in this community PR
>>> .
>>> And then when we start looking at try history and AIP-64, it also
>> forces a
>>> look at this.
>>> 
>>> So it all sounds bad and indeed it is bad but I think I have a solution.
>>> 
>>> What I do is have the scheduler increment try_number at the moment it
>>> schedules the task.  The scheduler alone has the responsibility for
>>> incrementing try_number, and nowhere is it ever decremented.  It is not
>>> incremented when resuming after a deferral or reschedule.  And that's
>>> about all there is to it.
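The proposed model can be sketched as follows (the names and states here are assumptions for illustration, not the actual Airflow code):

```python
from dataclasses import dataclass


@dataclass
class TI:
    """Toy stand-in for a task instance; field names are assumptions."""

    try_number: int = 0  # plain public attribute, no getter magic
    state: str = "none"


def schedule(ti: TI) -> None:
    # The scheduler is the single place try_number is ever incremented.
    ti.try_number += 1
    ti.state = "scheduled"


def resume_after_deferral(ti: TI) -> None:
    # Resuming after a deferral or reschedule does NOT touch try_number.
    ti.state = "scheduled"


ti = TI()
schedule(ti)               # first real try
assert ti.try_number == 1
resume_after_deferral(ti)  # same try continues
assert ti.try_number == 1
schedule(ti)               # a genuine retry
assert ti.try_number == 2
```

With a single writer there is no need for a private attribute, a state-aware getter, or any decrementing.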
>>> 
>>> I've tested it out and it works.  But I'm working through many test
>>> failures that need to be resolved (there are lots of asserts on
>>> try_number).
>>> 
>>> One small thing I want to point out: if a user was previously running
>>> `task.run()` manually, without the task having been scheduled by the
>>> scheduler, their try_number will no longer be automatically incremented.
>>> The same applies if they just do `airflow tasks run` -- the
>>> responsibility now rests solely with the scheduler.  But Airflow was
>>> never designed to assume that tasks will be run without having been
>>> scheduled, so I do not think this counts as a breaking change, and I
>>> don't think it's a blocker.
>>> 
>>> Thanks for the consideration.  Let me know if you have any concerns.
>>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> 
>> 





Re: [VOTE] Airflow Providers prepared on May 01, 2024

2024-05-03 Thread Wei Lee
+1 (non-binding)

Best,
Wei

> On May 3, 2024, at 2:53 PM, Pankaj Singh  wrote:
> 
> +1 (non-binding)
> 

Re: [VOTE] Airflow Providers prepared on May 01, 2024

2024-05-03 Thread Pankaj Singh
+1 (non-binding)

On Thu, May 2, 2024 at 6:03 PM Elad Kalif  wrote:

> I will exclude pinecone from this release.
> Please continue voting excluding pinecone provider