Re: [DISCUSS] Remove `max_active_tasks_per_dag`? Or at least the default

2024-10-04 Thread Ryan Hatter
I think I agree with this:

> I feel it should be applied at the dag *run* scope
> and not across all dag runs.
>

Just a thought: If someone *did* want to run multiple DAG runs at the same
time and limit the max active tasks per DAG, they could create a pool for
that DAG and pass the pool in default_args.

On Fri, Oct 4, 2024 at 1:51 PM Daniel Standish
 wrote:

> Ok, sorry, these concurrency settings are confusing.
>
> Let me clarify.
>
> `max_active_tasks_per_dag` is a core airflow setting and it provides the
> default for DAG.max_active_tasks.
>
> DAG.max_active_tasks I think is a reasonable config to have but the problem
> in my view is the scope.  I feel it should be applied at the dag *run* scope
> and not across all dag runs.  That just gets into confusing and footgunish
> territory if you allow many concurrent dag runs but limit the number of
> concurrent tasks.  Then you might have many many dags running but all
> limping along.
>
> So I guess let me change my proposal.  I would propose that we have
> DAG.max_active_tasks be applied at the dag *run* scope.  Not limiting
> concurrency across all dag runs.
>
> I think in practice this is essentially what it already is, because I would
> expect that the vast majority of dag runs are the only dag run running for
> a given dag at a given time.  It's only when you have many dag runs of the
> same dag running that this parameter ends up meaning something different.
>
> So, I propose, DAG.max_active_tasks should be evaluated per-dag-run.  And
> we can change the name accordingly if folks are on board.
>
> Now whether a mapped task is a task or not, I leave that for another day :)
>
>
>
>
>
>
> On Fri, Oct 4, 2024 at 10:28 AM Daniel Standish <
> daniel.stand...@astronomer.io> wrote:
>
> > The setting  max_active_tasks_per_dag seems mostly useless to me / and
> > footgunish.
> >
> > Why?
> >
> > Because you already have a setting for max active dag runs.  If you don't
> > want to run more tasks, don't create the extra dag runs.
> >
> > We also already have a mechanism (param on base operator) for limiting
> > individual tasks across all dag runs where that may be needed.  But just a
> > general "i don't want more than 16 tasks running across all dag runs of all
> > types and for all tasks" seems just, imprecise and not useful.
> >
> > I actually think it makes sense to remove this param entirely.  But at
> > least we should remove the default.
> >
> > WDYT
> >
>
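
To make the scoping difference discussed in this thread concrete, here is a small back-of-the-envelope model in plain Python. It is not Airflow code: the function names and the example numbers (four concurrent runs with ten runnable tasks each, a limit of 16) are made up for illustration, and the model ignores pools, executor slots and global parallelism.

```python
def running_tasks_dag_scope(runnable_per_run, max_active_tasks):
    """Limit shared by all runs of one DAG (the behaviour described in the thread)."""
    return min(sum(runnable_per_run), max_active_tasks)


def running_tasks_run_scope(runnable_per_run, max_active_tasks):
    """Limit applied to each DAG run separately (the behaviour proposed in the thread)."""
    return sum(min(n, max_active_tasks) for n in runnable_per_run)


# Hypothetical workload: 4 concurrent runs, each with 10 runnable tasks.
runs = [10, 10, 10, 10]
limit = 16

print(running_tasks_dag_scope(runs, limit))  # 16
print(running_tasks_run_scope(runs, limit))  # 40
```

With a single active run the two functions return the same number, which matches the observation that the two scopes only diverge when several runs of the same DAG are active at once.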


Re: [VOTE] Airflow 2.11 as bridge release

2024-09-09 Thread Ryan Hatter
If there are no features, why wouldn't we follow semver here and release
2.10.x?

On Wed, Sep 4, 2024 at 9:07 PM Kaxil Naik  wrote:

> Hi all,
>
> As discussed in
> https://lists.apache.org/thread/7jf12p2mk0nr5495f26r67gnpm3jq8oj I am
> calling for a lazy consensus on marking 2.11 as a bridge release with
> the following ideas:
>
>
>    1. It would only have bug fixes & deprecation warnings as a BRIDGE
>    release -- NO features. The exception is when adding something to aid
>    Airflow 2 to 3 migration, for which there can be an explicit mailing list
>    discussion
>    2. We would release it after December so Airflow 3.0 features & removals
>    are more concrete
>    3. We only release 2.11 if we have to -- if most of the things we have
>    removed already had deprecation warnings from many minor releases, then it
>    might not be worth it.
>
>
> The vote will be closed on Sun, 8th of Sep 2024, 3 am BST.
>
> Everyone is encouraged to vote, although only PMC members' and committers'
> votes are considered binding.
>
> This is my +1.
>
> Regards,
> Kaxil
>


Re: [ANNOUNCE] New committer: Ryan Hatter

2024-07-01 Thread Ryan Hatter
Thanks y'all! Much appreciated!

On Sun, Jun 30, 2024 at 3:48 PM Abhishek Bhakat
 wrote:

> Congratulations Ryan 👏
>
> On Sat, Jun 29, 2024 at 7:13 PM Ankit Chaurasia 
> wrote:
>
> > Congratulations Ryan!
> >
> > *Ankit Chaurasia*
> >
> >
> > On Sat, 29 Jun 2024 at 20:02, Hemkumar Chheda
> >  wrote:
> >
> > > Great News! Congratulations Ryan!!
> > >
> > > > On 29 Jun 2024, at 1:55 PM, Shahar Epstein  wrote:
> > > >
> > > > Congrats!
> > > >
> > > > -Original Message-
> > > > From: Jed Cunningham 
> > > > Sent: Friday, 28 June 2024 19:58
> > > > To: dev@airflow.apache.org
> > > > Subject: [ANNOUNCE] New committer: Ryan Hatter
> > > >
> > > > > The Project Management Committee (PMC) for Apache Airflow has invited
> > > > > Ryan Hatter to become a committer and we are pleased to announce that they
> > > > > have accepted.
> > > > >
> > > > > Ryan has been involved in the Airflow community for a few years now -
> > > > > contributing code, triaging and creating issues, answering questions on
> > > > > Slack, etc.
> > > > > He also contributed the custom names for mapped tasks feature in Airflow
> > > > > 2.9, which was a long time ask from the community.
> > > > Congratulations Ryan, and welcome!
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> > >
> >
>


Refactor Scheduler Timed Events to be Async?

2024-05-03 Thread Ryan Hatter
This might be a dumb question as I don't have experience with asyncio, but
should the EventScheduler in the Airflow scheduler be rewritten to be
asynchronous?

The so-called "timed events" (e.g. zombie reaping, handling tasks stuck in
queued, etc.) scheduled by this EventScheduler are
blocking and run queries against the DB that can occasionally be expensive
and cause substantial delays in the scheduler, which can result in repeated
scheduler restarts.

Below is a trivialized example of what this might look like -- curious to
hear your thoughts!

import asyncio
import threading
import time


class AsyncEventScheduler:
    def __init__(self):
        self.tasks = []

    async def call_regular_interval(self, interval, action, *args, **kwargs):
        """Schedules action to be called every `interval` seconds asynchronously."""
        while True:
            await asyncio.sleep(interval)
            await action(*args, **kwargs)

    def schedule_task(self, interval, action, *args, **kwargs):
        """Add tasks that run periodically in an asynchronous manner."""
        task = asyncio.create_task(
            self.call_regular_interval(interval, action, *args, **kwargs)
        )
        self.tasks.append(task)


async def detect_zombies():
    print("🧟")


async def detect_stuck_queued_tasks():
    print("Oh no! A task is stuck in queued!")


def scheduler_loop():
    while True:
        print("Starting scheduling loop...")
        time.sleep(10)


def _do_scheduling():
    thread = threading.Thread(target=scheduler_loop)
    thread.start()


async def main():
    scheduler = AsyncEventScheduler()
    scheduler.schedule_task(3, detect_zombies)
    scheduler.schedule_task(5, detect_stuck_queued_tasks)

    _do_scheduling()

    while True:
        print("EventScheduler running")
        await asyncio.sleep(1)


asyncio.run(main())
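
One caveat worth keeping in mind with an example like the one above: declaring a callback `async def` does not by itself make a blocking DB query non-blocking, since the event loop stalls for as long as the call runs. The following is a minimal sketch, assuming Python 3.9+ (for `asyncio.to_thread`), of pushing a blocking call onto a worker thread so other coroutines keep running. `slow_query`, `heartbeat` and `run_demo` are hypothetical names, and `time.sleep` stands in for an expensive query; whether this is appropriate for the scheduler's DB sessions is a separate question.

```python
import asyncio
import time


def slow_query(seconds):
    """Hypothetical stand-in for an expensive, blocking DB query."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, interval):
    """Record a timestamp every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def run_demo(query_seconds=0.3, interval=0.05):
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks, interval))
    # The blocking call runs in a worker thread, so the loop keeps ticking.
    result = await asyncio.to_thread(slow_query, query_seconds)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

If `slow_query` were called directly inside a coroutine instead, the heartbeat would record no ticks until the query returned.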


Re: [CALL FOR HELP] Help on Connexion 3 migration needed

2024-04-16 Thread Ryan Hatter
Does the scope of this PR warrant an AIP?

On Tue, Apr 16, 2024 at 6:40 AM Jarek Potiuk  wrote:

> Hello here,
>
> I have a kind request for help from maintainers (and other contributors who
> are not maintainers) - on the Connexion 3 migration for Airflow. PR here
> (unfortunately - it's one big PR and cannot be split):
> https://github.com/apache/airflow/pull/39055.
>
> I would love some general comments on this - especially from those who are
> more experts than me on those web frameworks - is it safe and ok to
> migrate, do we need to do some more testing on that? What do other
> maintainers think?
>
> This is not a "simple" change - it introduces a pretty fundamental change
> in how our web app is handled - It changes from WSGI to ASGI interface
> (though we use gunicorn as WSGI). But it's also absolutely needed -
> because we already had some security issues connected with old
> dependencies (Werkzeug) - raised - and Connexion 3 migration seems to be
> the easiest way to get to the latest, maintained versions of the
> dependencies.
>
> That's why I'd really like a few more maintainers - and people from
> Astronomer, Google and AWS to take it for a spin and help to test that
> change and say "yep. It looks good, we can merge it".  I would especially
> appreciate some more "scale" testing on it. It seems that performance and
> resource usage is not affected and ASGI interface and uvicorn should nicely
> replace all the different worker types we could have for gunicorn - but I
> would love to have confirmation for that.
>
> The PR has been started by Vlada and Maks from the Google team - and
> with the help of Sudipto and Satoshi - two interns from Major League
> Hacking - supported by Royal Bank of Canada - finally we have a stable,
> working version and green PR.  Airflow webserver + API seems to work well,
> stable (and generally back-compatible) on both - development (local +
> breeze) and PROD image.
>
> I took a mentorship and leading role on it - but personally I have been
> learning on the go about WSGI/ASGI and all changes needed - I am not an
> expert at all in those. We followed the directions from Connexion's
> maintainer Robb Snyders - and I asked him to help and comment on the PR in
> a number of places - but that's why I need more help and experts' eyes and
> hands to be quite sure it can be safely merged.
>
> I extracted it and squashed more than 100 commits on it into a single one
> to make it easier to start new conversations.
>
> Once again PR here: https://github.com/apache/airflow/pull/39055
>
> Also - we need to decide when is the best time to merge the PR - it does
> not introduce a lot of changes in the code of the app, but it changes a lot
> of test code to make it compatible with Starlette test client - we can
> continue rebasing it and fixing new changes for a short while - but I think
> the sooner we migrate it - the better - it will give more time for testing
> in the future MINOR airflow version.
>
> J.
>
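
For readers less familiar with the WSGI-to-ASGI change mentioned in the message above, here is a minimal standard-library sketch of the two application interfaces. It is not Airflow or Connexion code: `wsgi_app`, `asgi_app`, `call_wsgi` and `call_asgi` are illustrative names, and the two `call_*` helpers only imitate, in a very reduced way, what a WSGI server (such as gunicorn) or an ASGI server (such as uvicorn) does for a single request.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable, invoked once per request."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope, receive, send):
    """ASGI: an async callable that exchanges event dicts with the server."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def call_wsgi(app):
    """Drive the WSGI app by hand with a tiny fake request."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


def call_asgi(app):
    """Drive the ASGI app by hand with a tiny fake request."""
    events = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(event):
        events.append(event)

    scope = {"type": "http", "method": "GET", "path": "/"}
    asyncio.run(app(scope, receive, send))
    return events[0]["status"], events[1]["body"]


print(call_wsgi(wsgi_app))
print(call_asgi(asgi_app))
```

The difference in call shape (a blocking function returning an iterable versus a coroutine sending events) is one reason test code written against a WSGI test client has to change when the app moves to ASGI.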


Re: Subject: Expressing Interest in Contributing to Apache Airflow

2024-02-15 Thread Ryan Hatter
It's also worth checking out the community page, and specifically joining the
Airflow community Slack.

On Thu, Feb 15, 2024 at 10:24 AM Jarek Potiuk  wrote:

> Hello Rahul,
>
> There is nothing more than the Contributors' guide:
> https://github.com/apache/airflow/blob/main/contributing-docs/README.rst ,
> everything is described there, including explanation on how to look for
> good first issues - https://github.com/apache/airflow/contribute.
>
> You do not need any permission to contribute. Your contributions start with
> just ... contributing
>
> J.
>
>
> On Thu, Feb 15, 2024 at 4:20 PM Mridula Pachen <
> chameleonlabs2...@gmail.com>
> wrote:
>
> > Dear Apache Foundation Open Source Community,
> >
> > I hope this message finds you well. My name is Rahul Poolanchalil, and I am
> > reaching out to express my keen interest in contributing to the Apache
> > Airflow project. As an admirer of open-source initiatives and a
> > professional with a background in data engineering, I am eager to lend my
> > skills and enthusiasm to further enhance the capabilities of Apache
> > Airflow.
> >
> > Having followed the project for some time, I am impressed by its
> > robustness, flexibility, and the vibrant community that supports it. The
> > way Airflow simplifies complex workflow management has not only been
> > revolutionary for data engineering tasks but also serves as a testament to
> > the power of collaborative, open-source development.
> >
> > I possess a strong foundation in Python programming. I am particularly
> > interested in improving documentation, developing new features, fixing bugs
> > etc. I believe that my background can contribute to the ongoing efforts to
> > make Airflow even more efficient, user-friendly, and accessible to a
> > broader audience.
> >
> > I am familiar with the contribution guidelines as outlined in the project's
> > documentation and am ready to start contributing under the guidance of the
> > community. I am also open to participating in discussions, code reviews,
> > and any other activities that can help me integrate into the community and
> > understand the project's current needs and future direction better.
> >
> > Could you please guide me on how best to get started or if there are any
> > specific areas or issues that require immediate attention? I am more than
> > willing to take on tasks that align with my skills and interests and where
> > I can make the most impact.
> >
> > Thank you for considering my interest in contributing to Apache Airflow. I
> > look forward to the opportunity to contribute to this incredible project
> > and to learn from the esteemed members of the community. Please let me know
> > the next steps or any additional information you need from me.
> >
> > Warm regards,
> >
> > Rahul Poolanchalil
> >
>


Re: [LAZY CONSENSUS] Rename slack channels

2024-02-15 Thread Ryan Hatter
Ah! Great idea!

On Tue, Feb 13, 2024 at 12:22 PM Akash Sharma <2akash111...@gmail.com>
wrote:

> +1
>
> On Tue, 13 Feb 2024, 22:50 Briana Okyere,
>  wrote:
>
> > +1
> >
> > On Mon, Feb 12, 2024 at 11:42 AM Jarek Potiuk  wrote:
> >
> > > Hey here,
> > >
> > > Following the earlier discussion, I am calling for a lazy consensus on
> > > renaming of our slack channels:
> > >
> > > #development -> *#contributors*
> > > #first-pr-support -> *#new-contributors*
> > > #troubleshooting -> *#user-troubleshooting*
> > > #random -> *#random* (hehe)
> > >
> > > > The current #development channel will be archived and we will ask people to
> > > > add themselves to #contributors if they are working on contributing to
> > > > airflow. If they are starting their journey, they should ask questions
> > > > in #new-contributors
> > > >
> > > > New channel: *#user-best-practices* - this is for sharing best practices
> > > > with and between users. Let's see how popular it will be - we can close it
> > > > if not used.
> > >
> > > Discussion was held here:
> > > https://lists.apache.org/thread/fjswvwg4ttbkr14bhq9s5xkc50h0qc76
> > >
> > > > I will run the lazy consensus till Thursday, Feb 15th 2024, 8PM CET (72 hrs
> > > > from now).
> > > >
> > > > There is no need or expectation to respond to it if you are in agreement.
> > > Lack of response is a silent agreement.
> > >
> > > J.
> > >
> >
>


Re: [VOTE] AIP 61 - Hybrid Executors

2024-02-01 Thread Ryan Hatter
+1 non-binding. This will be a great feature.

On Thu, Feb 1, 2024 at 1:27 PM Ferruzzi, Dennis 
wrote:

> +1 binding
>
>
>  - ferruzzi
>
>
> 
> From: Igor Kholopov 
> Sent: Thursday, February 1, 2024 5:31 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [VOTE] AIP 61 - Hybrid Executors
>
>
> +1 (non-binding)
>
> Best,
> Igor
>
> On Thu, Feb 1, 2024 at 2:23 PM Wei Lee  wrote:
>
> > +1 (non-binding)
> >
> > Looking forward to it!
> >
> > Best,
> > Wei
> >
> > > On Feb 1, 2024, at 7:57 PM, Aritra Basu 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > > This would definitely be a good value add. Looked through the proposal and
> > > > it looks solid! Great job!
> > >
> > > --
> > > Regards,
> > > Aritra Basu
> > >
> > > On Thu, Feb 1, 2024, 12:58 PM Amogh Desai 
> > wrote:
> > >
> > >> +1 binding
> > >>
> > >> Good work on the proposal, Niko.
> > >>
> > > >> The most important part of this vote for me is the scoping you have done.
> > > >> Great work on that!
> > >>
> > >> Thanks & Regards,
> > >> Amogh Desai
> > >>
> > >> On Thu, Feb 1, 2024 at 12:12 PM Jarek Potiuk 
> wrote:
> > >>
> > > >>> +1 (binding). I think we know the scope (more importantly we also know
> > > >>> what's out of the scope)
> > >>>
> > >>> On Thu, Feb 1, 2024 at 6:45 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> > >>>  wrote:
> > >>>
> >  +1 binding - looking forward to implementation!
> > 
> >  Sent from Outlook for iOS
> >  
> >  From: Oliveira, Niko 
> >  Sent: Thursday, February 1, 2024 6:12:02 AM
> >  To: dev@airflow.apache.org 
> >  Subject: [VOTE] AIP 61 - Hybrid Executors
> > 
> >  Hey folks,
> > 
> > 
> >  The AIP for Hybrid Executors has been out for a few weeks now. Some great
> >  feedback came in and some challenges to scope which I think have all been
> >  addressed, and the AIP document has been updated where applicable.
> > 
> > 
> >  At this point I'd like to call a vote, and if all goes well, begin
> >  development soon!
> > 
> > 
> >  You can find the AIP here:
> >
> >  https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-61+Hybrid+Execution
> >
> >  Discussion threads:
> >
> >  https://lists.apache.org/thread/94sg7l4m3qjk4b3vfq3lr94oc5fs9q4j
> >
> > 
> >  The voting will last for 6 days (until 6th of February 2024, 22:00 PST),
> >  and until at least 3 binding votes have been cast.
> > 
> >  Please vote accordingly:
> > 
> >  [ ] + 1 approve
> >  [ ] + 0 no opinion
> >  [ ] - 1 disapprove with the reason
> > 
> >  Only votes from PMC members and committers are binding, but other members
> >  of the community are encouraged to check the AIP and vote with
> >  "(non-binding)".
> > 
> >  Thanks!
> > 
> > 
> > 
> > >>>
> > >>
> >
> >
> >
> >
>


Re: [ANNOUNCE] Starting experimenting with "Require conversation resolution" setting

2024-01-30 Thread Ryan Hatter
In my experience outside of Airflow, the benefit of not missing a review
comment outweighs the friction of being required to resolve each
conversation.

On Mon, Jan 29, 2024 at 8:47 PM Wei Lee  wrote:

> I didn't notice much of a difference as a contributor. +1 vote
>
> Best,
> Wei
>
> > On Jan 30, 2024, at 11:41 AM, Amogh Desai 
> wrote:
> >
> > Contrary to my initial expectation of the trouble this would bring in for
> > reviewers, it has been
> > pretty nice. I have not faced any issues in marking the conversations as
> > resolved for the pull
> > requests I have reviewed and it has even given me a chance to re review
> > prior to approval.
> >
> > I am happy with this overall and my vote will be a +1
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Mon, Jan 29, 2024 at 7:56 PM Aritra Basu 
> > wrote:
> >
> >> I personally haven't had too much friction due to the change and it has
> >> helped me keep track of any comments people have made. I remain +1 to the
> >> change so far.
> >>
> >> --
> >> Regards,
> >> Aritra Basu
> >>
> >> On Mon, Jan 29, 2024, 6:11 PM Jarek Potiuk  wrote:
> >>
> >>> Just wanted to remind everyone, we are nearing the end of the trial period
> >>> for the "require conversation resolution" feature. I have my own
> >>> likely biased, so I'd love to hear from others what their feedback and
> >>> assessment is. Or maybe we need more time to assess it ?
> >>>
> >>> I would love to hear your thoughts.
> >>>
> >>> J,
> >>>
> >>>
> >>> On Sat, Dec 30, 2023 at 2:20 PM Jarek Potiuk  wrote:
> >>>
>  After an initial indentation problem in .asf.yaml it's now working as
>  expected. So let's see how resolving conversations will work for us.
> 
>  On Sat, Dec 30, 2023 at 12:17 PM Amogh Desai <
> amoghdesai@gmail.com
> >>>
>  wrote:
> 
> > Wooho! Looking to see how this turns out for airflow 😃
> >
> > On Sat, 30 Dec 2023 at 1:35 PM, Jarek Potiuk 
> >> wrote:
> >
> >> Hello everyone,
> >>
> >> As discussed in
> >> https://lists.apache.org/thread/cs6mcvpn2lk9w2p4oz43t20z3fg5nl7l I just
> >> enabled "require conversation resolution" for our main/stable branches. We
> >> have not used it in the past so it might not work as we think or we might
> >> need to tweak something.
> >>
> >> Generally speaking (if all works) all conversations on PRs should be
> >> resolved before we can merge the PR. This "resolving" is encouraged to be
> >> done by the author when they think the conversation is resolved, but it can
> >> also be done by reviewers or the maintainer who wants to merge the PR.
> >>
> >> We attempted to describe some basic rules and expectations here:
> >> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#step-5-pass-pr-review
> >> but undoubtedly there will be questions and issues that we might want to
> >> solve - so feel free to discuss it here or raise question/issues in
> >> #development channel in slack (I am also happy to be pinged directly about
> >> it and help to resolve any issues/gather feedback).
> >>
> >> J.
> >>
> >
> 
> >>>
> >>
>
>
>
>


Re: [VOTE] January 2024 PR of the Month

2024-01-23 Thread Ryan Hatter
Gotta agree with Constance and go with 22253 -- how cool that the author
stuck with it all this time!

On Tue, Jan 23, 2024 at 12:25 AM Aritra Basu 
wrote:

> My vote is for #36537 it's been a huge effort and it makes huge
> improvements in our packaging. Great to see it make it into airflow.
>
> --
> Regards,
> Aritra Basu
>
> On Tue, Jan 23, 2024, 10:13 AM Amogh Desai 
> wrote:
>
> > Is there a possibility to vote for more than one? I guess not :/
> >
> > My vote goes to #36537 for the enhancements that have come in with it. I
> > have followed the discussions
> > at a higher level and it surely wasn't easy :)
> > (If I could vote again, it would surely be #36537 for the endless
> > perseverance and dedication of the author)
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Tue, Jan 23, 2024 at 3:21 AM Jarek Potiuk  wrote:
> >
> > > > Heck, why not. I will shamelessly vote on my #36537. While it took just a
> > > > few weeks to merge, it leapfrogged our legacy packaging setup to
> > > > more-or-less bleeding edge from what was there since the beginning of
> > > > Airflow (almost 10 years) and was already "old-ish" when I joined the
> > > > project more than 4 years ago. And with hatch and cleanups in extras, it
> > > > has a positive impact on both - contributors and users (or so I hope).
> > >
> > > On Mon, Jan 22, 2024 at 7:48 PM Constance Martineau
> > >  wrote:
> > >
> > > > +1 #22253
> > > >
> > > > > The PR was opened in March 2022, and was finally merged last week! I admire
> > > > > the author's persistence in getting this merged in, and think the
> > > > > simplifications to the interface make the Operator more user-friendly for
> > > > > our Data Science users.
> > > >
> > > > On Mon, Jan 22, 2024 at 1:29 PM Briana Okyere
> > > >  wrote:
> > > >
> > > > > Hey All,
> > > > >
> > > > > It’s once again time to vote for the PR of the Month.
> > > > >
> > > > > > With the help of the `get_important_pr_candidates` script in dev/stats,
> > > > > > we've identified the following candidates:
> > > > > >
> > > > > > PR #36513: Include plugins in the architecture diagrams.
> > > > > >
> > > > > > PR #32867: Sanitize the conn_id to disallow potential script execution.
> > > > > > <https://github.com/apache/airflow/pull/32867>
> > > > > >
> > > > > > PR #22253: Add SparkKubernetesOperator crd implementation.
> > > > > >
> > > > > > PR #36171: Implement AthenaSQLHook.
> > > > > >
> > > > > > PR #36537: Standardize airflow build process and switch to Hatchling build
> > > > > > backend.
> > > > >
> > > > > Please reply to this thread with your selection or offer your own
> > > > > nominee(s).
> > > > >
> > > > > > Voting will close on Jan. 26th at 1 PM PST. The winner(s) will be featured
> > > > > > in the next issue of the Airflow newsletter.
> > > > > >
> > > > > > Also, if there’s an article or event that you think should be included in
> > > > > > this or a future issue of the newsletter, please drop me a line at
> > > > > > <briana.oky...@astronomer.io>.
> > > > >
> > > > > --
> > > > > Briana Okyere
> > > > > Community Manager
> > > > > *Astronomer*
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Constance Martineau
> > > > Senior Product Manager
> > > >
> > > > Email: consta...@astronomer.io
> > > > Time zone: US Eastern (EST UTC-5 / EDT UTC-4)
> > > >
> > > >
> > > > 
> > > >
> > >
> >
>


Re: [DISCUSSION] Enhanced Multi-Tenant Dataset Management in Airflow: Potential First Steps

2024-01-22 Thread Ryan Hatter
I don't think it makes sense to include the create endpoint without also
including dataset update and delete endpoints and updating the Datasets
view in the UI to be able to manage externally created Datasets.

With that said, I don't think the fact that Datasets are tightly coupled
with DAGs is a good reason not to include additional Dataset endpoints. It
makes sense to me to be able to interact with Datasets from outside of
Airflow.

On Sat, Jan 20, 2024 at 6:13 AM Eduardo Nicastro 
wrote:

> Hello all, I have created a Pull Request (
> https://github.com/apache/airflow/pull/36929) to make it possible to
> create a dataset through the API as a modest step forward. This PR is open
> for your feedback. I'm preparing another PR to build upon the insights from
> https://github.com/apache/airflow/pull/29433. Your thoughts and
> contributions are highly encouraged.
>
> Best Regards,
> Eduardo Nicastro
>
> On Thu, Jan 11, 2024 at 4:30 PM Eduardo Nicastro 
> wrote:
>
>> Hello all,
>>
>> I'm reaching out to propose a topic for discussion that has recently
>> emerged in our GitHub discussion threads (#36723). It revolves
>> around enhancing the management of datasets in a multi-tenant Airflow
>> architecture.
>>
>> Use case/motivation
>> In our multi-instance setup, synchronizing dataset dependencies across
>> instances poses significant challenges. With the advent of dataset
>> listeners, a new door has opened for cross-instance dataset awareness. I
>> propose we explore creating endpoints to export dataset updates to make it
>> possible to trigger DAGs consuming from a Dataset across tenants.
>>
>> Context
>> Below I will give some context about our current situation and solution
>> we have in place and propose a new workflow that would be more efficient.
>> To be able to implement this new workflow we would need a way to export
>> Dataset updates as mentioned.
>>
>> Current Workflow
>> In our organization, we're dealing with multiple Airflow tenants, let's
>> say Tenant 1 and Tenant 2, as examples. To synchronize Dataset A across
>> these tenants, we currently have a complex setup:
>>
>>    1. Containers run on a schedule to export metadata to CosmosDB (these
>>    will be replaced by the listener).
>>    2. Additional scheduled containers pull data from CosmosDB and write
>>    it to a shared file system, enabling generated DAGS to read it and mirror a
>>    dataset across tenants.
>>
>>
>> Proposed Workflow
>> Here's a breakdown of our proposed workflow:
>>
>>    1. Cross-Tenant Dataset Interaction: We have Dags in Tenant 1
>>    producing Dataset A. We need a mechanism to trigger all Dags consuming
>>    Dataset A in Tenant 2. This interaction is crucial for our data pipeline's
>>    efficiency and synchronicity.
>>    2. Dataset Listener Implementation: Our approach involves
>>    implementing a Dataset listener that programmatically creates Dataset A in
>>    all tenants where it's not present (like Tenant 2) and exports Dataset
>>    updates when they happen. This would trigger an update on all Dags
>>    consuming from that Dataset.
>>    3. Standardized Dataset Names: We plan to use standardized dataset
>>    names, which makes sense since a URI is its identifier and uniqueness is a
>>    logical requirement.
>>
>> [image: image.png]
>>
>> Why This Matters:
>>
>>    - It offers a streamlined, automated way to manage datasets across
>>    different Airflow instances.
>>    - It aligns with a need for efficient, interconnected workflows in a
>>    multi-tenant environment.
>>
>>
>> I invite the community to discuss:
>>
>>    - Are there alternative methods within Airflow's current framework
>>    that could achieve similar goals?
>>    - Any insights or experiences that could inform our approach?
>>
>> Your feedback and suggestions are invaluable, and I look forward to a
>> collaborative discussion.
>>
>> Best Regards,
>> Eduardo Nicastro
>>
>
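
The proposed workflow in the quoted message (a listener that creates the dataset in other tenants and forwards updates so consuming DAGs get triggered) can be sketched as a toy in-memory model. This is only an illustration of the flow, assuming nothing about Airflow's real listener or REST APIs: `Tenant`, `add_consumer`, `receive_update` and `on_dataset_changed` are hypothetical names, and the dataset URI and DAG id are made up.

```python
from collections import defaultdict


class Tenant:
    """Hypothetical stand-in for one Airflow instance."""

    def __init__(self, name):
        self.name = name
        self.datasets = set()               # known dataset URIs
        self.consumers = defaultdict(list)  # URI -> ids of consuming DAGs
        self.triggered = []                 # (dag_id, uri) pairs, for inspection

    def add_consumer(self, uri, dag_id):
        self.datasets.add(uri)
        self.consumers[uri].append(dag_id)

    def receive_update(self, uri):
        """Create the dataset if it is missing, then trigger its consumers."""
        self.datasets.add(uri)
        for dag_id in self.consumers[uri]:
            self.triggered.append((dag_id, uri))


def on_dataset_changed(uri, source, tenants):
    """Listener sketch: forward an update to every tenant except the source."""
    for tenant in tenants:
        if tenant is not source:
            tenant.receive_update(uri)


tenant1, tenant2 = Tenant("tenant-1"), Tenant("tenant-2")
tenant2.add_consumer("s3://bucket/dataset-a", "consumer_dag")

# A DAG in tenant 1 updates Dataset A; the listener mirrors it to tenant 2.
on_dataset_changed("s3://bucket/dataset-a", tenant1, [tenant1, tenant2])
print(tenant2.triggered)
```

In a real deployment the `receive_update` call would be a network request to the other instance, which is where the create and update endpoints discussed in this thread would come in.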


Re: AIP-61 - Hybrid Executors

2024-01-18 Thread Ryan Hatter
It sure does! Thank you.

On Thu, Jan 18, 2024 at 3:17 PM Oliveira, Niko 
wrote:

> Hey Ryan,
>
> Thanks for the reply! I'll make that note more clear in the AIP. It's a
> nuanced point, and the wording is a bit vague at best right now or
> borderline misleading ><
>
> What I'm trying to say here is that if you don't configure a specific
> executor for a task, it will run on the default/environment level executor
> and for each attempt that will be the case. But I was just pointing out
> that it's also true that the default executor can be reconfigured to a
> different executor at any time (with a scheduler restart of course) so that
> env/default executor can be _any_ executor the user has currently
> configured. And if that's done in the middle of a DAG run, or between
> retries, etc the default level could be different than it was before. And
> note that this is how Airflow behaves today already.
>
> Does that clear things up? Let me know if it doesn't and I'll have a third
> go at it :)
>
> Cheers,
> Niko
>
> 
> From: Ryan Hatter 
> Sent: Thursday, January 18, 2024 1:27:00 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] AIP-61 - Hybrid Executors
>
>
> >
> > *IMPORTANT NOTE*: task instances that run on the default/environment
> > executor (i.e. with no specific override provided) will not persist the
> > executor in the same way so that they can be re-run/retried on any executor.
>
>
> Does this mean that any task that doesn't have the `executor` parameter
> specified will run on the default executor for its first attempt, but any
> executor on subsequent attempts? If so, how will users specify the default
> executor? Also, I think this will provide some debugging challenges when a
> task runs on one executor on one attempt, but a different executor on a
> different attempt. It might be nice if users could set a
> `use_default_executor_on_retry` kind of parameter.
>
> On Mon, Jan 15, 2024 at 1:05 PM Oliveira, Niko wrote:
>
> > Hey folks!
> >
> > I'd like to announce a new proposal for Airflow. I've teased this before
> > in my talk on executors at this year's summit and it was also mentioned
> in
> > the townhall last week.
> >
> > It's a proposal to allow using multiple executors concurrently within a
> > single Airflow environment.
> >
> >
> > Let me know what you think!
> >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-61+Hybrid+Execution
> >
> > Cheers,
> > Niko
> >
>


Re: [DISCUSSION] Enabling `pre-commit.ci` application for Airflow

2024-01-18 Thread Ryan Hatter
I'm in favor of this. I love making docs changes directly in GitHub, but I
often make a tiny mistake like a trailing space and the tests fail. I think
things like this discourage new contributors, as contributing to docs is
the easiest way to start getting involved.
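
For anyone unfamiliar with what an "auto-fix" amounts to in the trailing-space case, here is a minimal sketch. It is plain Python written for illustration, not the actual pre-commit hook, and `strip_trailing_whitespace` is a hypothetical name.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping line endings."""
    fixed_lines = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")        # line content without its newline
        ending = line[len(body):]         # the original newline, if any
        fixed_lines.append(body.rstrip(" \t") + ending)
    return "".join(fixed_lines)


# A docs snippet edited in the GitHub UI, with a stray space and tab at line ends.
before = "Install Airflow:  \n\n    pip install apache-airflow\t\n"
after = strip_trailing_whitespace(before)
```

The bot would push the equivalent of `after` as a follow-up commit, so the contributor never has to run the check locally.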

On Thu, Jan 4, 2024 at 12:16 PM Jarek Potiuk  wrote:

> Yep. Also surprised by the 50/50 - so far the "easy" path is blocked
> by INFRA, so I am not sure if we will quickly do it, but I will likely
> see what we can do soon.
>
> And yes. This is the same for me - I **LOVE** black and always have
> pre-commit installed because I do not have to spend any mind-cycles on
> things that are extremely important for the project and readability
> (i.e. consistency) but extremely unnecessary to worry about it when I
> think about solving real problems.
>
> On Thu, Jan 4, 2024 at 8:05 PM Oliveira, Niko
>  wrote:
> >
> > Interesting how 50/50 this one has turned out to be!
> >
> > I'm personally in favour (+1). The less I have to worry about accidental
> typos, indentation, quoting, etc the better, I can focus on important
> changes. It will also unblock many PRs from contributors that are otherwise
> mergeable except for being stuck on very simple static check failures,
> which as a maintainer sounds very nice (it will solve having to post the
> regular comment of "please run and fix static checks").
> >
> > And ultimately if the bot does something silly (just as a human can and
> often does) we can catch it in the PR review.
> >
> >
> > Cheers,
> > Niko
> >
> > 
> > From: Wei Lee 
> > Sent: Tuesday, January 2, 2024 5:58:18 PM
> > To: dev@airflow.apache.org
> > Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSSION] Enabling `
> pre-commit.ci` application for Airflow
> >
> >
> > Same as Amogh. Even though I would like to fix that myself, it would
> make it much easier for those who aren’t familiar with these tools and
> still be able to contribute. But we might need to doc this behavior
> somewhere (GitHub PR issue might make more sense 🤔). Otherwise, the
> contributor might be surprised by the new commit.
> >
> > Best,
> > Wei
> >
> > > On Jan 3, 2024, at 12:21 AM, Vincent Beck  wrote:
> > >
> > > I like the concept! +1
> > >
> > > On 2023/12/30 11:16:35 Amogh Desai wrote:
> > >> I am aligning here with Pierre, but I am not against the idea of
> enabling
> > >> the pre commit ci application.
> > >>
> > >> I’d rather have myself fix the issue as it sometimes also lets me have
> > >> second,third or multiple passes at my code which is up for review.
> This is
> > >> a personal choice where I feel that we are trying to fix a problem
> that is
> > >> not too problematic.
> > >>
> > >> Again, only a personal choice but not against it. If it makes lives
> of the
> > >> stakeholders involved easier, I am all up for it!
> > >>
> > >> Thanks & Regards,
> > >> Amogh Desai
> > >>
> > >> On Sat, 30 Dec 2023 at 2:35 PM, Pierre Jeambrun <
> pierrejb...@gmail.com>
> > >> wrote:
> > >>
> > > >>> I like the idea, but in practice auto-fixable static checks are very
> > > >>> obvious to fix and don't require much work.
> > > >>>
> > > >>> On the other hand, most static check failures are 'real issues' and not
> > > >>> auto-fixable, for instance mypy, spelling, sphinx, db session usage, etc.
> > > >>> (And ruff fix is a little aggressive IMO regarding linting.)
> > >>>
> > >>> So I would say that in practice it solves a painless problem
> (formatting,
> > >>> import sorting and other obvious things) and can’t do much about
> other
> > >>> issues.
> > >>>
> > >>> This is why I am not sure it is worth the confusion for users. (But
> I am
> > >>> not against it)
> > >>>
> > >>> On Sat 30 Dec 2023 at 09:19, Scheffler Jens (XC-DX/PJ-PACE-E03)
> > >>>  wrote:
> > >>>
> >  I‘d also like to have auto-fixing included in CI. It is a classic
> pitfall
> >  and all that can be automated does not need to be a manual burden.
> >  Though I am not sure whether the plugin is able to use all the
> custom
> >  stuff as we also depend during execution on the CI image and docker.
> >  Besides security things this would be something that needs testing
> if it
> >  works.
> > 
> >  TLDR: +1 opinion
> > 
> >  Sent from Outlook for iOS
> >  
> >  From: Pankaj Koti 
> >  Sent: Saturday, December 30, 2023 7:50:10 AM
> >  To: dev@airflow.apache.org 
> >  Subject: Re: [DISCUSSION] Enabling `pre-commit.ci` ap

Re: AIP-61 - Hybrid Executors

2024-01-18 Thread Ryan Hatter
>
> *IMPORTANT NOTE*: task instances that run on the default/environment
> executor (i.e. with no specific override provided) will not persist the
> executor in the same way so that they can be re-run/retried on any executor.


Does this mean that any task that doesn't have the `executor` parameter
specified will run on the default executor for its first attempt, but any
executor on subsequent attempts? If so, how will users specify the default
executor? Also, I think this will provide some debugging challenges when a
task runs on one executor on one attempt, but a different executor on a
different attempt. It might be nice if users could set a
`use_default_executor_on_retry` kind of parameter.
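
To make the question concrete, here is a minimal sketch of one possible reading of the note quoted above. It is plain Python, not Airflow code: `TaskInstanceModel`, `resolve_executor`, and the way the default is passed in are all assumptions made for illustration, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInstanceModel:
    """Hypothetical stand-in for a task instance; not an Airflow class."""
    task_id: str
    executor: Optional[str] = None  # explicit per-task override, if any


def resolve_executor(ti: TaskInstanceModel, default_executor: str) -> str:
    """Return the executor an attempt would run on.

    An explicit override is stored with the task instance and reused on every
    attempt. Without one, nothing is persisted, so each attempt uses whatever
    the environment default happens to be at that moment.
    """
    return ti.executor if ti.executor is not None else default_executor


pinned = TaskInstanceModel("load", executor="KubernetesExecutor")
unpinned = TaskInstanceModel("extract")

# First attempt: the environment default is CeleryExecutor.
first = (resolve_executor(pinned, "CeleryExecutor"),
         resolve_executor(unpinned, "CeleryExecutor"))

# The default is reconfigured (scheduler restart) before the retry.
retry = (resolve_executor(pinned, "LocalExecutor"),
         resolve_executor(unpinned, "LocalExecutor"))
```

Under this reading, only the task without an override can move between executors across attempts, and only if someone changes the environment default in between.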

On Mon, Jan 15, 2024 at 1:05 PM Oliveira, Niko 
wrote:

> Hey folks!
>
> I'd like to announce a new proposal for Airflow. I've teased this before
> in my talk on executors at this year's summit and it was also mentioned in
> the townhall last week.
>
> It's a proposal to allow using multiple executors concurrently within a
> single Airflow environment.
>
>
> Let me know what you think!
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-61+Hybrid+Execution
>
> Cheers,
> Niko
>


Re: The "no_status" state

2023-11-28 Thread Ryan Hatter
rking out "backwards" - guessing why might be needed. But  possibly
> it
> > > could be helped with some extra information stored by the scheduler.
> > >
> > > I think we will not have a complete and fully accurate picture, but I
> > think
> > > iteratively we could get this better and better.
> > >
> > > J
> > >
> > >
> > > On Mon, Oct 16, 2023 at 11:55 PM Oliveira, Niko
> > > 
> > > wrote:
> > >
> > > > I really like this idea as well! One of the _the most common_
> > questions I
> > > > get from people managing an Airflow env is "Why is my task stuck in
> > state
> > > > X". Anything we can do to make that more discoverable and user
> > friendly,
> > > > especially in the UI instead of (or in addition to) logs would be
> > > fantastic!
> > > >
> > > > Thanks to Jens for having a think and pointing out a lot of the
> > > > implications, I agree a quick AIP might be nice for this one.
> > > >
> > > > Cheers,
> > > > Niko
> > > >
> > > > 
> > > > > From: Scheffler Jens (XC-DX/ETV5)
> > > > Sent: Thursday, September 28, 2023 10:36:00 PM
> > > > To: dev@airflow.apache.org
> > > > Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] The "no_status" state
> > > >
> > > >
> > > > Hi Ryan,
> > > >
> > > > I really like the idea of exposing some more scheduler details. More
> > > > transparency in scheduling also in the UI would help the user in (1)
> > > seeing
> > > > and understanding what is going on and (2) reduces the need to crawl
> > for
> > > > logs and raise support tickets if status is “strange”. I often also
> see
> > > > this as a problem. This is also sometimes generating a bit of “mis
> > trust”
> > > > in the scheduler stability.
> > > >
> > > > From point of scheduler “overhead” I assume as long as we are not
> > making
> > > a
> > > > “full scan” just to ensure that each and every task is always
> > up-to-date
> > > > (Scheduler stops processing today after enough tasks have been
> > > processed
> > > in
> > > > a loop or if scheduling limits are reached) this is OK for me and on
> > the
> > > > code side does not seem to be much overhead.
> > > > I have a bit of fear on the other hand that very many frequent
> updates
> > > > need to happen on the DB as another state would need to be written.
> So
> > > more
> > > > DB round trips are needed. This might hit performance for large DAGs
> or
> > > > cases where DAGs are scheduled. So at least it would need to filter
> to
> > > > update the state to DB only if changed to keep performance impact
> > > minimal.
> > > >
> > > > From point of naming I still think “no status” is good to indicate
> that
> > > > scheduler did not digest anything, maybe task was never looked at
> > because
> > > > scheduler actually is really stuck or too busy getting there. I would
> > > > propose if scheduler passes along a task and decides that it is not
> > ready
> > > > to schedule to have an additional state calling e.g. “not_ready” in
> the
> > > > state model between “none” and “scheduled”.
> > > >
> > > > Finally on the other hand, adding another state in the model, I am
> not
> > > > sure whether this 100% will help in the use case described by you.
> > Still
> > > > you might need to scratch your head a while if taking a look on UI
> > that a
> > > > DAG is “stuck” until you realize all the options you have configured.
> > > > Exp

Re: [PROPOSE] Airflow Monthly Town-Hall

2023-11-28 Thread Ryan Hatter
I'd like to be involved!

On Tue, Nov 28, 2023 at 4:16 PM Viraj Parekh 
wrote:

> I think this is a great idea -- I've heard from a lot of folks in the
> community that it can be hard to keep up with everything going on with
> Airflow. I think the community is really good at communicating these things
> within the contributors, but I'd imagine it can be a little hard to
> know where to look. A dedicated meeting every month for folks to hop on and
> hear updates, chat about AIPs, and other such things would go a long way!
>
> I also know that a lot of other OSS projects have open, monthly, virtual
> meetings where folks discuss development and other topics. It'll also be a
> great spot for folks to learn how they can get involved!
>
> The agenda/speakers/etc. will almost definitely evolve over time, but
> that's the fun part. We should also post notes + summaries on the devlist
> and slack after every meeting.
>
> When do you think would be the best time to start?
>
> Viraj
>
> On Tue, Nov 28, 2023 at 3:41 PM Briana Okyere
>  wrote:
>
> > Hey All,
> >
> > I've been speaking with Kaxil and Jarek about this, and would like to
> > propose it to you all.
> >
> > It seems those involved in contributing to Airflow regularly are very
> well
> > synced on the product. However, I think we can do a better job of
> involving
> > the community at large in the amazing work being done daily on this
> > product.
> >
> > So, I would like to propose a monthly town hall, where community members
> > can join together virtually each month to receive updates, give feedback,
> > and ideally, see where they can lend a hand.
> >
> > If anyone is interested in being involved, please let me know and we can
> > discuss further. Or if you simply have thoughts, I'd love to hear them!
> >
> > --
> > Briana Okyere
> > Community Manager
> > Email: briana.oky...@astronomer.io
> > Mobile: +1 415.713.9943
> > Time zone: US Pacific UTC
> >
> > 
> >
>


Re: [VOTE] November PR of the Month

2023-11-28 Thread Ryan Hatter
Another +1 for 32646... this will make DAG owners' lives much easier :)

On Tue, Nov 28, 2023 at 4:21 PM Hussein Awala  wrote:

> +1 for #32646
>
> On Tue, Nov 28, 2023 at 6:32 AM Rahul Vats  wrote:
>
> > +1 for #32646
> >
> > Regards,
> > Rahul Vats
> >
> > On Tue, 28 Nov, 2023, 09:55 Aritra Basu, 
> wrote:
> >
> > > +1 to #32646 from me too
> > >
> > > --
> > > Regards,
> > > Aritra Basu
> > >
> > > On Tue, Nov 28, 2023, 9:50 AM Amogh Desai 
> > > wrote:
> > >
> > > > I vote for #32646
> > > >
> > > > Great work!
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
> > > >
> > > > On Tue, Nov 28, 2023 at 5:41 AM Wei Lee  wrote:
> > > >
> > > > > + 1 for 32646
> > > > >
> > > > > Best,
> > > > > Wei
> > > > >
> > > > > > On Nov 28, 2023, at 5:35 AM, utkarsh sharma <
> > utkarshar...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > +1 for 32646, great work.
> > > > > >
> > > > > > Thanks,
> > > > > > Utkarsh Sharma
> > > > > >
> > > > > > On Tue, Nov 28, 2023 at 2:50 AM Collin McNulty
> > > > > 
> > > > > > wrote:
> > > > > >
> > > > > >> +1 to 32646. Been wanting this for a long time!
> > > > > >>
> > > > > >> On Mon, Nov 27, 2023 at 11:50 AM Briana Okyere
> > > > > >>  wrote:
> > > > > >>
> > > > > >>> Happy Holidays Everyone :)
> > > > > >>>
> > > > > >>> It’s once again time to vote for the PR of the Month.
> > > > > >>>
> > > > > >>> With the help of the `get_important_pr_candidates` script in
> > > > dev/stats,
> > > > > >>> we've identified the following candidates:
> > > > > >>>
> > > > > >>> PR #32646: Add task context logging feature to allow forwarding
> > > > > messages
> > > > > >> to
> > > > > >>> task logs. 
> > > > > >>>
> > > > > >>> PR #34964: Add task parameter to set custom logger name. <
> > > > > >>> https://github.com/apache/airflow/pull/34964>
> > > > > >>>
> > > > > >>> PR #34921: Add Cohere Provider. <
> > > > > >>> https://github.com/apache/airflow/pull/34921>
> > > > > >>>
> > > > > >>> PR #35488: Implement login and logout in AWS auth manager. <
> > > > > >>> https://github.com/apache/airflow/pull/35488>
> > > > > >>>
> > > > > >>> PR #35530: feature(providers): added `OpsgenieNotifier`. <
> > > > > >>> https://github.com/apache/airflow/pull/35530>
> > > > > >>>
> > > > > >>> Please reply to this thread with your selection or offer your
> own
> > > > > >>> nominee(s).
> > > > > >>>
> > > > > >>> Voting will close on December 1st at 1 PM PST. The winner(s)
> will
> > > be
> > > > > >>> featured in the next issue of the Airflow newsletter.
> > > > > >>>
> > > > > >>> Also, if there’s an article or event that you think should be
> > > > included
> > > > > in
> > > > > >>> this or a future issue of the newsletter, please drop me a line
> > at
> > > <
> > > > > >>> briana.oky...@astronomer.io>.
> > > > > >>>
> > > > > >>> --
> > > > > >>> Briana Okyere
> > > > > >>> Community Manager
> > > > > >>> Email: briana.oky...@astronomer.io
> > > > > >>> Mobile: +1 415.713.9943
> > > > > >>> Time zone: US Pacific UTC
> > > > > >>>
> > > > > >>> 
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Add providers for Pinecone, OpenAI & Cohere to enable first-class LLMOps

2023-10-27 Thread Ryan Hatter
+1 (non-binding)

On Thu, Oct 26, 2023 at 9:32 AM Oliveira, Niko 
wrote:

> +1 (binding)
>
> looking forward to having more native LLM capabilities in Airflow!
>
> 
> From: Aritra Basu 
> Sent: Wednesday, October 25, 2023 12:10:00 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [VOTE] Add providers for
> Pinecone, OpenAI & Cohere to enable first-class LLMOps
>
>
> +1 (non binding)
>
> --
> Regards,
> Aritra Basu
>
> On Wed, Oct 25, 2023, 11:02 PM Ferruzzi, Dennis
> 
> wrote:
>
> > +1 (binding)
> >
> >
> >  - ferruzzi
> >
> >
> > 
> > From: Jed Cunningham 
> > Sent: Wednesday, October 25, 2023 9:54 AM
> > To: dev@airflow.apache.org
> > Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [VOTE] Add providers for
> > Pinecone, OpenAI & Cohere to enable first-class LLMOps
> >
> >
> > +1 (binding)
> >
>


Re: Airflow Docs Development Issues

2023-10-26 Thread Ryan Hatter
yone who would like to commit to doing
> > it.
> > > > > > > >> > > >
> > > > > > > >> > > > J.
> > > > > > > >> > > >
> > > > > > > >> > > > On Fri, Oct 20, 2023 at 3:27 PM Pierre Jeambrun <
> > > > > > > >> pierrejb...@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > +1 from moving archived docs outside of
> airflow-site.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Even if that might mean a little more maintenance in
> > > case
> > > > we
> > > > > > > need
> > > > > > > >> to
> > > > > > > >> > > > > propagate changes to all historical versions, we
> would
> > > > have
> > > > > to
> > > > > > > >> > handle 2
> > > > > > > >> > > > > repositories, but that seems like a minor downside
> > > > compared
> > > > > to
> > > > > > > the
> > > > > > > >> > > > quality
> > > > > > > >> > > > > of life improvement that it would bring for
> > airflow-site
> > > > > > > >> > contributions.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Le jeu. 19 oct. 2023 à 16:11, Jarek Potiuk <
> > > > > ja...@potiuk.com>
> > > > > > a
> > > > > > > >> > écrit
> > > > > > > >> > > :
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Let me just clarify (because that could be
> unclear)
> > > what
> > > > > my
> > > > > > +1
> > > > > > > >> was
> > > > > > > >> > > > about.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > I was not talking (and I believe Ryan was not
> > talking
> > > > > > either)
> > > > > > > >> about
> > > > > > > >> > > > > > removing the old docs but about archiving them and
> > > > serving
> > > > > > > from
> > > > > > > >> > > > elsewhere
> > > > > > > >> > > > > > (cloud storage).
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > I think discussing changing to more shared
> > HTML/JS/CSS
> > > > is
> > > > > > > also a
> > > > > > > >> > good
> > > > > > > >> > > > > idea
> > > > > > > >> > > > > > to optimise it, but possibly can be handled
> > separately
> > > > as
> > > > > a
> > > > > > > >> longer
> > > > > > > >> > > > effort
> > > > > > > >> > > > > > of redesigning how the docs are built. But by all
> > > means
> > > > we
> > > > > > > could
> > > > > > > >> > also
> > > > > > > >> > > > > work
> > > > > > > >> > > > > > on that.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Maybe I jumped to conclusions, but the easiest,
> > > tactical
> > > > > > > >> solution
> > > > > > > >> > > (for
> > > > > > > >> > > > > the
> > > > > > > >> > > > > > most acute issue - size) is we just move the old
> > > > generated
> > > > > > > HTML
> > > > > > > >> > docs
> > > > > > > >> > > > from
> > > > > > > >> > > > > > the git repository of &quo

Airflow Docs Development Issues

2023-10-18 Thread Ryan Hatter
*tl;dr*

   1. The GitHub Action for building docs is running out of space. I think
   we should archive really old documentation for large packages to cloud
   storage.
   2. Contributing to and building Airflow docs is hard. We should migrate
   to a framework, preferably one that uses markdown (although I acknowledge
   rst -> md will be a massive overhaul).

*Problem Summary*
I recently set out to implement what I thought would be a straightforward
feature: warn users when they are viewing documentation for non-current
versions of Airflow and link them to the current/stable version. Jed pointed
me to the airflow-site repo, which contains all of the archived docs (that
is, documentation for non-current versions), and from there, I ran into a
brick wall.

I want to raise some concerns that I've developed after trying to
contribute what feel like a couple reasonably small docs updates:

   1. airflow-site
      1. Elad pointed out the problem posed by the sheer size of archived
      docs (more on this later).
      2. The airflow-site repo is confusing, and rather poorly documented.
         1. Hugo (static site generator) exists, but appears to only be
         used for the landing pages.
         2. In order to view any documentation locally other than the
         landing pages, you'll need to run the site.sh script then copy the
         output from one dir to another?
      3. All of the archived docs are raw HTML, making migrating to a
      static site generator a significant challenge, which makes it
      difficult to prevent the archived docs from continuing to grow and
      grow. Perhaps this is the wheel Khaleesi was referring to?
   2. airflow
      1. Building Airflow docs is a challenge. It takes several minutes and
      doesn't support auto-build, so the slightest issue could require
      waiting again and again until the changes are just so. I tried
      implementing sphinx-autobuild to no avail.
      2. Sphinx/restructured text has a steep learning curve.

*The most acute issue: disk space*
The size of the archived docs is causing the docs build GitHub Action to
almost run out of space. From the "Build site" Action from a couple weeks
ago (expand the build site step, scroll all the way to the bottom, expand
the `df -h` command), we can see the GitHub Action runner (or whatever it's
called) is nearly running out of space:

df -h
  *Filesystem      Size  Used Avail Use% Mounted on*
  /dev/root         84G   82G  2.1G  98% /


The available space is down to 1.8G on the most recent Action.
If we assume that trend is accurate, we have about two months before the
Action runner runs out of disk space. Here's a breakdown of the space
consumed by the 10 largest package documentation directories:

du -h -d 1 docs-archive/ | sort -h -r
* 14G* docs-archive/
*4.0G* docs-archive//apache-airflow-providers-google
*3.2G* docs-archive//apache-airflow
*1.7G* docs-archive//apache-airflow-providers-amazon
*560M* docs-archive//apache-airflow-providers-microsoft-azure
*254M* docs-archive//apache-airflow-providers-cncf-kubernetes
*192M* docs-archive//apache-airflow-providers-apache-hive
*153M* docs-archive//apache-airflow-providers-snowflake
*139M* docs-archive//apache-airflow-providers-databricks
*104M* docs-archive//apache-airflow-providers-docker
*101M* docs-archive//apache-airflow-providers-mysql
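
To put the listing in proportion, here is a small sketch that converts the human-readable sizes from the `du` output above and works out how much of the 14G archive the three largest packages account for. The helper names (`to_gib`, `share_of_total`) are mine, and because `du -h` rounds its output the result is only approximate.

```python
# Conversion factors from du's K/M/G suffixes to GiB.
UNITS = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0}


def to_gib(size: str) -> float:
    """Convert a du-style size such as '560M' or '4.0G' to GiB."""
    return float(size[:-1]) * UNITS[size[-1]]


# Sizes copied from the du listing (top-level total kept separately).
TOTAL = "14G"
PACKAGE_SIZES = {
    "apache-airflow-providers-google": "4.0G",
    "apache-airflow": "3.2G",
    "apache-airflow-providers-amazon": "1.7G",
    "apache-airflow-providers-microsoft-azure": "560M",
    "apache-airflow-providers-cncf-kubernetes": "254M",
    "apache-airflow-providers-apache-hive": "192M",
    "apache-airflow-providers-snowflake": "153M",
    "apache-airflow-providers-databricks": "139M",
    "apache-airflow-providers-docker": "104M",
    "apache-airflow-providers-mysql": "101M",
}


def share_of_total(sizes, total: str) -> float:
    """Fraction of the archive taken up by the given sizes."""
    return sum(to_gib(s) for s in sizes) / to_gib(total)


# Sort largest first and look at the top three packages.
ranked = sorted(PACKAGE_SIZES.items(), key=lambda kv: to_gib(kv[1]), reverse=True)
top_three = [name for name, _ in ranked[:3]]
top_three_share = share_of_total([size for _, size in ranked[:3]], TOTAL)
```

The google, core, and amazon docs together come to 8.9G, roughly 64% of the archive, which is why targeting only the large packages would already free most of the space.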


*Proposed solution: Archive old docs html for large packages to cloud
storage*
I'm wondering if it would be reasonable to truly archive the docs for some
of the older versions of these packages. Perhaps the last 18 months? Maybe
we could drop the html in a blob storage bucket with instructions for
building the docs if absolutely necessary?
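
A rough sketch of how the cut could be computed, assuming "the last 18 months" means keeping versions released within that window and archiving everything older. The version labels and release dates below are made up for illustration, and `versions_to_archive` is a hypothetical helper, not an existing script in either repo.

```python
from datetime import date, timedelta


def versions_to_archive(release_dates, today, keep_months=18):
    """Return the versions released before the retention cutoff.

    The cutoff approximates a month as 30 days, which is precise enough
    for deciding which docs directories to move to cloud storage.
    """
    cutoff = today - timedelta(days=keep_months * 30)
    return sorted(version for version, released in release_dates.items()
                  if released < cutoff)


# Hypothetical release dates for an imaginary package; not real Airflow data.
example_releases = {
    "1.0.0": date(2021, 6, 1),
    "2.0.0": date(2022, 3, 15),
    "3.0.0": date(2022, 11, 1),
    "4.0.0": date(2023, 9, 1),
}

to_archive = versions_to_archive(example_releases, today=date(2023, 10, 18))
```

With these made-up dates the first two versions fall outside the window and would be moved, while the two most recent stay in the repo.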

*Improving docs building moving forward*
There's an open Issue  for
migrating the docs to a framework, but it's not at all a straightforward
task for the archived docs. I think that we should institute a policy of
archiving old documentation to cloud storage after X time and use a
framework for building docs in a scalable and sustainable way moving
forward. Maybe we could chat with iceberg folks about how they moved from
mkdocs to hugo? 


Shoutout to Utkarsh for helping me through all this!


The "no_status" state

2023-09-28 Thread Ryan Hatter
Over the last couple of weeks I've come across a rather tricky problem a few
times. One DAG run gets "stuck" in the queued state, while subsequent DAG
runs are stuck running (screenshot below). One of these issues was
caused by `max_active_runs` being met when a task instance from a
previous DAG run was cleared, and one of the tasks had
`depends_on_past=True`. The DAG run stayed stuck in queued in
perpetuity until it was realized that the task that wasn't getting
scheduled needed the failed task in the preceding DAG run to be re-run.
In the meantime, the later DAG runs stayed stuck in running,
which caused quite a bit of confusion and stress.

Given that Airflow is pretty burnt out on task instance states and colors,
I propose replacing "no_status" with "dependencies_not_met" and surfacing
dependencies in the grid view instead of forcing users to already know
where to look (i.e. "more details" task instance details). Now that I typed
it out, I'm not sure there should be a reason for the "more details" button
and not just laying out all of a task instance's details in the grid view
similar to how the graph and code views are now included in the grid view.
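
To illustrate what I have in mind, here is a minimal sketch of the display logic. It is plain Python, not scheduler code: `TaskSnapshot`, `unmet_dependencies`, and `display_state` are hypothetical names, and it assumes the default "all upstream tasks succeeded" trigger rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaskSnapshot:
    """Hypothetical, simplified view of a task instance; not Airflow's model."""
    task_id: str
    state: Optional[str] = None                 # None is what the UI shows as "no_status"
    depends_on_past: bool = False
    previous_run_state: Optional[str] = None    # same task in the preceding DAG run
    upstream_states: Tuple[Optional[str], ...] = ()


def unmet_dependencies(ti: TaskSnapshot) -> List[str]:
    """List human-readable reasons why the task cannot be scheduled yet."""
    reasons = []
    if ti.depends_on_past and ti.previous_run_state != "success":
        previous = ti.previous_run_state or "no_status"
        reasons.append(
            f"depends_on_past: '{ti.task_id}' in the previous DAG run is {previous}"
        )
    unfinished = [s for s in ti.upstream_states if s != "success"]
    if unfinished:
        reasons.append(f"{len(unfinished)} upstream task(s) have not succeeded")
    return reasons


def display_state(ti: TaskSnapshot) -> str:
    """State label for the grid view under the proposed renaming."""
    if ti.state is not None:
        return ti.state
    return "dependencies_not_met" if unmet_dependencies(ti) else "no_status"


# The situation described above: the same task failed in the preceding run.
stuck = TaskSnapshot("transform", depends_on_past=True, previous_run_state="failed")
label = display_state(stuck)
reasons = unmet_dependencies(stuck)
```

The point is that the list returned by `unmet_dependencies` is what would be surfaced directly in the grid view, rather than hidden behind "more details".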

Anyway, I wanted to solicit feedback before I open an issue / start work on
this.

[image: image.png]


Re: Airflow projects

2023-09-27 Thread Ryan Hatter
You can find some Airflow community resources here:
https://airflow.apache.org/community/

Also feel free to join the Airflow community Slack:
https://apache-airflow-slack.herokuapp.com/

On Mon, Sep 18, 2023 at 7:59 AM Avitabayan Sarmah 
wrote:

> Thank you, I will do that.
>
> Regards,
> Avitab Ayan
>
> On Sun, Sep 17, 2023, 9:27 PM Brian Proffitt  wrote:
>
> > Avitabayan:
> >
> > I recommend you reach out to the Airflow community and share what you
> > have with them. I have copied their dev mailing list in this thread.
> >
> > BKP
> >
> >
> > Brian Proffitt
> > VP, Marketing & Publicity
> >
> > On Sun, Sep 17, 2023 at 11:55 AM Avitabayan Sarmah
> >  wrote:
> > >
> > > Hi Press,
> > >
> > > I am currently working on airflow with docker. I am practicing airflow
> > on different datasets.
> > >  I am a data engineer having 4-6 years of experience, it's been 3 years
> > I started working with airflow. I have strong working experience with
> > python programming language. I think I will be good fit for this project.
> > >
> > > Let me know what you think.
> > >
> > > Regards,
> > > Avitab Ayan Sarmah
> > > (Data engineer| software engineer)
> >
>


Re: [VOTE] September 2023 PR of the Month

2023-09-27 Thread Ryan Hatter
Gotta go with #28900 -- what a huge scope!

On Tue, Sep 26, 2023 at 1:45 PM Michael Robinson
 wrote:

> Hi folks,
>
> It’s once again time to vote for the PR of the Month.
>
> With the help of the `get_important_pr_candidates` script in dev/stats,
> I’ve identified the following candidates:
>
>  * PR #28900 by @vincbeck: Convert DagFileProcessor.execute_callbacks to
> Internal API. 
>  * PR #33903 by @dstandish: Ensure that tasks wait for running indirect
> setup. 
>  * PR #33901 by @vandonr-amz: Fix inheritance chain in security manager. <
> https://github.com/apache/airflow/pull/33901>
>  * PR #33637 by @pankajkoti: Use a trimmed version of README.md for PyPI. <
> https://github.com/apache/airflow/pull/33637>
>  * PR #34269 by @amoghrajesh: Add verification if short options in breeze
> are repeated. 
>
> Please reply to this thread with your selection or offer your own
> nominee(s).
>
> Voting will close on September 27th at 10 am ET. The winner will be
> featured in the next issue of the Airflow newsletter.
>
> Also, if there’s an article or event that you think should be included in
> this or a future issue, please drop Briana Okyere a line at <
> briana.oky...@astronomer.io >.
>
> Thanks!
>
> Michael Robinson
> Community Manager
> michael.robin...@astronomer.io 


Re: [VOTE] Airflow Providers prepared on September 08, 2023

2023-09-11 Thread Ryan Hatter
+1 (non-binding). My change works *mostly* as expected, and the unexpected
behavior isn't really a problem


On Mon, Sep 11, 2023 at 1:40 PM Josh Fell 
wrote:

> +1 (non-binding)
>
> Tested my changes (and another related one). Looks good.
>
> On Mon, Sep 11, 2023 at 2:58 AM Rahul Vats  wrote:
>
> > +1 (non-binding)
> >
> > Regards,
> > Rahul Vats
> > 9953794332
> >
> >
> > On Mon, 11 Sept 2023 at 11:37, Wei Lee  wrote:
> >
> > > +1 (non-binding)
> > >
> > > 1. Tested with #33825 ,
> > > #33822 , #34098 <
> > > https://github.com/apache/airflow/pull/34098>
> > > 2. astronomer-providers example DAGs ran fine, as Pankaj mentioned.
> > >
> > > It would be nice if we could include these documentation changes as
> well
> > > #34104 , #34103 <
> > > https://github.com/apache/airflow/pull/34103>, #34102 <
> > > https://github.com/apache/airflow/pull/34102>, #34101 <
> > > https://github.com/apache/airflow/pull/34101>, #34097 <
> > > https://github.com/apache/airflow/pull/34097>, #34096 <
> > > https://github.com/apache/airflow/pull/34096>, #34095 <
> > > https://github.com/apache/airflow/pull/34095>, #34094 <
> > > https://github.com/apache/airflow/pull/34094>, #34074 <
> > > https://github.com/apache/airflow/pull/34074>, #34073 <
> > > https://github.com/apache/airflow/pull/34073>?
> > > Thanks!
> > >
> > > Best,
> > > Wei
> > >
> > > > On Sep 11, 2023, at 12:48 PM, Jarek Potiuk  wrote:
> > > >
> > > > +1 (binding) - checked my changes, signatures, licences, checksums,
> > > > verified sources are the same in packages as in tags.
> > > >
> > > > On Mon, Sep 11, 2023 at 5:35 AM Phani Kumar
> > > >  wrote:
> > > >
> > > >> +1 non binding
> > > >>
> > > >> On Mon, 11 Sept 2023, 01:39 Pankaj Koti,  > > >> .invalid>
> > > >> wrote:
> > > >>
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> 1. Tested my set of changes in PR #34018.
> > > >>> 2. astronomer-providers DAGs ran fine for below list of RCs:
> > > >>>  apache-airflow-providers-amazon==8.7.0rc1
> > > >>>  apache-airflow-providers-apache-hive==6.1.6rc1
> > > >>>  apache-airflow-providers-apache-livy==3.5.4rc1
> > > >>>  apache-airflow-providers-cncf-kubernetes==7.5.1rc1
> > > >>>  apache-airflow-providers-databricks==4.5.0rc1
> > > >>>  apache-airflow-providers-dbt-cloud==3.3.0rc1
> > > >>>  apache-airflow-providers-google==10.8.0rc1
> > > >>>  apache-airflow-providers-http==4.5.2rc1
> > > >>>  apache-airflow-providers-microsoft-azure==7.0.0rc1
> > > >>>  apache-airflow-providers-sftp==4.6.1rc1
> > > >>>  apache-airflow-providers-snowflake==5.0.1rc1
> > > >>> 3. There is a discussion on PR
> > > >>> https://github.com/apache/airflow/pull/34257
> > > >>>WRT to Google provider RC's dependency on common-sql.
> > > >>>And I am okay with the release manager's decision on it.
> > > >>>
> > > >>>
> > > >>> Regards,
> > > >>>
> > > >>>
> > > >>>
> > > >>> Pankaj Koti
> > > >>>
> > > >>> *Senior Software Engineer, *OSS Engineering Team.
> > > >>> Location: Pune, India
> > > >>>
> > > >>> Timezone: Indian Standard Time (IST)
> > > >>>
> > > >>> Email: pankaj.k...@astronomer.io
> > > >>>
> > > >>> Mobile: +91 9730079985
> > > >>>
> > > >>>
> > > >>> On Sun, Sep 10, 2023 at 8:02 PM Pankaj Koti <
> > pankaj.k...@astronomer.io
> > > >
> > > >>> wrote:
> > > >>>
> > >  Hi,
> > > 
> > >  Tested my change #34018 <
> > https://github.com/apache/airflow/pull/34018
> > > >
> > > >>> in
> > >  the Google RC 10.8.0rc1. It works fine,
> > >  but it also has a dependency on common-sql provider 1.7.2.rc1 for
> > >  the change in the same PR. If the common-sql provider is not
> updated
> > >  then it fails. How do we handle cross-provider dependency bumps
> > >  during releases? Does it get handled automatically or do we need a
> > >  manual minimum version dependency bump here in Google RC to
> contain
> > >  common-sql>=1.7.2?
> > > 
> > >  Regards,
> > > 
> > > 
> > > 
> > >  Pankaj Koti
> > > 
> > >  *Senior Software Engineer, *OSS Engineering Team.
> > >  Location: Pune, India
> > > 
> > >  Timezone: Indian Standard Time (IST)
> > > 
> > >  Email: pankaj.k...@astronomer.io
> > > 
> > >  Mobile: +91 9730079985
> > > 
> > > 
> > >  On Sun, Sep 10, 2023 at 9:51 AM Amogh Desai <
> > amoghdesai@gmail.com
> > > >
> > >  wrote:
> > > 
> > > > I didn't have many changes this time but I tested out by running
> a
> > > few
> > > > dags, mostly on cncf provider and they work as expected.
> > > >
> > > > +1 non binding
> > > >
> > > > Thanks,
> > > > Amogh Desai
> > > >
> > > > On Sun, Sep 10, 2023, 03:41 Hussein Awala 
> > wrote:
> > > >

Re: Lazy Consensus - Removing the Experimental tag for Pluggy

2023-09-09 Thread Ryan Hatter
+1 (non-binding)

I've seen this used as a workaround for implementing a cluster policy when
(for whatever reason) modifying airflow_local_settings.py is not possible.

On Fri, Sep 8, 2023 at 8:18 PM  wrote:

>
> Oh! Yes I agree
>
> > and has been since its release in 2.6
>
> Misunderstood. Thought it was only released a few months ago. Time flies!
>
> +1 (non-binding)
>
> > On Sep 8, 2023, at 3:44 PM, Hussein Awala  wrote:
> >
> > 
> >>
> >> What’s the motivation to remove this now?
> >
> > The feature was introduced in Airflow 2.3.0 (1 year and five months ago);
> > IMHO, this period is more than sufficient to apply all possible changes
> to
> > the design or decide to remove the feature.
> >
> >> On Fri, Sep 8, 2023 at 9:30 PM  wrote:
> >>
> >> What’s the motivation to remove this now?
> >>
>  On Sep 8, 2023, at 2:55 AM, Cody Rich  wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> I'm calling a lazy consensus for removing the experimental tag for the
> >>> pluggy interface (pr #34174 <
> >> https://github.com/apache/airflow/pull/34174>).
> >>> It's currently denoted as experimental and has been since its release
> in
> >>> 2.6.  Since its release it has been pretty stable and after doing a
> quick
> >>> search of the Issues in Github and in the slack channel, I didn't see
> any
> >>> problems reported.
> >>>
> >>> This lazy consensus will conclude Friday, September 15th at 12:00pm
> >> (EDT).
> >>>
> >>> (This is my 1st lazy consensus, so please let me know if I'm missing
> >>> anything!)
> >>>
> >>> Thanks,
> >>> Cody
> >>> --
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>


Re: [DISCUSS] move from semver to a more "rolling" release cycle for core

2023-09-01 Thread Ryan Hatter
What about more frequently using news fragments + using configuration
settings as a way to introduce breaking changes that users can revert?
Disabling "trigger dag with config" without Params by default is more or
less what I'm referring to. While this was a breaking change, users were
given the option to "roll back".
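
To make the idea concrete, here is a minimal sketch of a breaking change gated
behind a setting that users can flip back. The section and option names
(`webserver`, `legacy_trigger_form`) and both helper functions are hypothetical
and are not Airflow's real option or code; only the `AIRFLOW__{SECTION}__{KEY}`
environment-variable naming follows Airflow's documented convention.

```python
import os

# Hypothetical setting, for illustration only (not a real Airflow option).
SECTION, KEY = "webserver", "legacy_trigger_form"


def legacy_behaviour_enabled(environ=None):
    """Return True only if the user explicitly opted back into the old behaviour."""
    environ = os.environ if environ is None else environ
    # Airflow-style env var name: AIRFLOW__{SECTION}__{KEY}, upper-cased.
    name = f"AIRFLOW__{SECTION.upper()}__{KEY.upper()}"
    # The new (breaking) behaviour is the default; an explicit "true" rolls it back.
    return environ.get(name, "false").strip().lower() in {"true", "1", "yes"}


def trigger_dag_ui(has_params, environ=None):
    """Decide whether the 'trigger with config' form is shown."""
    if has_params or legacy_behaviour_enabled(environ):
        return "show form"
    return "trigger immediately"
```

The point of the design is that the new behaviour ships as the default, while
a single documented setting (announced in a news fragment) restores the old one.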

On Thu, Aug 31, 2023 at 12:14 AM Amogh Desai 
wrote:

> Very good discussions going on here.
>
> Semver has been a point of concern for us too in our internal product.
> Some ideas emerging out of this could help me there. Thanks, Jarek and
> Niko.
>
> There are two points I'd like to stress on to say why semver is that
> important:
>
> - Compatibility: Without versioning that indicates compatibility,
> users might hesitate to upgrade due to concerns about potential
> breaking changes. This could lead to fragmentation, where different
> users are using different versions of the software, making support and
> maintenance more challenging.
>
> - Release Communication: Semantic versioning helps communicate the
> impact of a release to users, maintainers, and contributors. Without
> clear versioning, understanding the significance of a release becomes
> more difficult.
>
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Thu, Aug 31, 2023 at 3:56 AM Jarek Potiuk  wrote:
> >
> > > Now, one elephant in the room - the 5 year security patches thing Jarek
> > brought up. I freely admit I haven't tracked this at all, so please
> correct
> > me if I'm wrong. If that ends up panning out though, I think we will have
> > to reassess our strategy with providers too.
> >
> > Just to answer the last point. Yes. I believe so. I was never a big fan
> of
> > introducing breaking changes in providers to make maintainers' lives a
> bit
> > easier. While I was a big fan of doing it when we had a very good reason.
> >
> > I think we have too many breaking changes in providers now. And we do it
> > because we "can" - we mostly only consider maintainer's lives easier
> > currently. But I think when those regulations are in place we will have
> to
> > make a much more deliberate judgment on when we make a major release
> > in providers and take on a deliberate decision "is it worth doing it,
> > knowing that we will have to deliver patches for previous major
> versions".
> > This will be something that the regulations might make us think about
> when
> > making a decision "should we break?". And when we do - we should be
> > prepared to cherry-pick security fixes to those old versions. We
> currently
> > can have a process for it - and it is off-loaded mainly to stakeholders.
> >
> https://github.com/apache/airflow/blob/main/PROVIDERS.rst#mixed-governance-model
> > - where we mainly take care about releasing cherry-picked code.
> >
> > I believe the overwhelming majority of those "breaking" releases that we
> > really needed, were in providers where there is an active stakeholder
> > already cooperating with us. I have - honestly quite an expectation that
> > this will stay like that. In the proposed regulations, the stakeholders
> are
> > (much more than the OSS foundations) responsible for security of the
> > software they provide to their users and they will have incentive to fix
> > and provide fixes also for past releases of those integration. And I
> think
> > we can work out a collaborative model on that - very close to the "mixed
> > governance" we already have. And in other cases where we have no active
> > stakeholders, we might simply refrain from making breaking changes if not
> > absolutely needed.
> >
> > That would be my approach to the elephant.
> >
> > J,
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>


Re: [VOTE] Drop MsSQL as supported backend

2023-08-30 Thread Ryan Hatter
+1 non-binding

On Mon, Aug 28, 2023 at 3:29 PM Aritra Basu 
wrote:

> +1 (non-binding)
> Based on reading the previous mails, looks like a good idea to drop along
> with the migration support
>
> --
> Regards,
> Aritra Basu
>
> On Mon, Aug 28, 2023, 11:33 PM Oliveira, Niko  >
> wrote:
>
> > +1 (binding)
> >
> > 
> > From: Jed Cunningham 
> > Sent: Monday, August 28, 2023 9:32:43 AM
> > To: dev@airflow.apache.org
> > Subject: RE: [EXTERNAL] [VOTE] Drop MsSQL as supported backend
> >
> > CAUTION: This email originated from outside of the organization. Do not
> > click links or open attachments unless you can confirm the sender and
> know
> > the content is safe.
> >
> >
> >
> > +1 binding
> >
>


Re: [DISCUSS] AIP-1 and Airflow multi-tenancy

2021-04-14 Thread Ryan Hatter
I’d also like to be added please :)

> On Apr 13, 2021, at 21:27, Xinbin Huang  wrote:
> 
> 
> Hi Daniel & Ian, 
> 
> I am also interested in the idea of a serialization representation that can 
> be executed by workers directly. Can you also add me to the call?
> 
> Thanks
> Bin
> 
>> On Tue, Apr 13, 2021 at 2:49 PM Ian Buss  wrote:
>> Daniel,
>> 
>> Thanks for your warm welcome and quick response and the advice on providers! 
>> Will certainly check out the examples you sent.
>> 
>> 1. An "airflow register" command definitely sounds promising, would love to 
>> collaborate on an AIP there so let's set something up.
>> 2. We use KubernetesExecutor exclusively as well. We've noticed significant 
>> additional load on the metadata DB as we scale up task pods so I've also 
>> thought about an API-based approach. Such an API could also open up the 
>> possibility of per-task security tokens which are injected by the scheduler, 
>> which should improve the security of such a system. Food for thought at 
>> least. I will start putting some of these thoughts down on paper in a 
>> sharable format.
>> 
>> Ian
>> 
>>> On Tue, Apr 13, 2021 at 7:46 PM Daniel Imberman  
>>> wrote:
>>> Hi Ian,
>>> 
>>> Firstly, welcome to the Airflow community :). I'm glad to hear you've had a 
>>> positive experience so far. It's great to hear that you want to contribute 
>>> back, and I think that multi-tenancy/DAG isolation is a pretty fantastic 
>>> project for the community as a whole (a lot of these are things we 
>>> want but are limited by hours in a day).
>>> 
>>> 1. I've personally been kicking around some ideas lately about an "airflow 
>>> register" command that would write the DAG into the metadata DB in a way 
>>> that could be "gettable" by the workers via the API. This work is very 
>>> early. I'd love to get some help on it. Perhaps we can set up a zoom chat 
>>> to discuss drafting an AIP?
>>> 
>>> 2. Limiting worker access to the DB is not only good security practice; it 
>>> also opens up the door to a lot of valuable features. This feature would be 
>>> especially close to my heart as it would make the KubernetesExecutor 
>>> significantly more efficient. It should be possible to set up a system 
>>> where the workers only ever speak to an API server and never need to touch 
>>> the DB.
>>> 
>>> 3. This is not something I personally have insight into, but I think it 
>>> sounds like a good idea. 
>>> 
>>> Finally, addressing your question about a Cloudera provider. If anything, 
>>> it would probably give the provider _more_ legitimacy if you hosted it 
>>> under the Cloudera GitHub org (we very purposely created the provider 
>>> packages with this workflow in mind). There are multiple places where we 
>>> can work to surface this provider so it is easy to find and use.
>>> 
>>> Astronomer has a pretty good sample provider here. One example of it 
>>> running in the wild is the Great Expectations provider here. I'd also be 
>>> glad to get you in contact with people who have built providers in the past 
>>> to help you with that process.
>>> 
>>> Looking forward to seeing some of these things come to fruition!
>>> 
>>> Daniel
>>> 
>>> On Tue, Apr 13, 2021 at 9:43 AM, Ian Buss  wrote:
>>> Hi all,
>>> 
>>> First a quick introduction: I'm an engineer with Cloudera working on our 
>>> Data Engineering product (CDE). Airflow is working great for us so far. 
>>> We've been looking into how we can enhance the multi-tenancy story of 
>>> Apache Airflow as we currently deploy it. We have the following areas which 
>>> we'd like (with community consensus) to work on and contribute back to 
>>> Apache Airflow to enhance the isolation between tenants in a single Airflow 
>>> deployment.
>>> 
>>> 1. Isolating code execution and parsing of DAG files. At the moment, DAG 
>>> files are parsed in a few locations in Airflow, including the scheduler and 
>>> in tasks. There is already the concept of DAG serialization (and we're 
>>> using that for the web component) but we'd be interested to see if we can 
>>> sandbox the execution of arbitrary user code to a locked down 
>>> process/container without full access to the metadata DB and connection 
>>> secrets etc. The idea would be to parse and serialize the DAG in this 
>>> isolated container and pass back a serialized representation for 
>>> persistence in the DB. Has anyone explored this idea?
>>> 
>>> 2. Limiting task access to the metadata DB. It would be great if we could 
>>> remove the requirement for tasks to have full access to the metadata DB and 
>>> to report task status in a different (but still scalable) way. We'd need to 
>>> tackle access or injection of connection, variable and xcom data as well 
>>> for each task naturally.
>>> 
>>> 3. Finer-grained access controls on connection secrets. Right now, although 
>>> there are nice at-rest encryption options with Fernet or Vault, IIUC any 
>>> DAG can access any connection (and thus any secret). Since the "run a

Re: [DISCUSS] Add Breeze Support for CeleryExecutor and KubernetesExecutor

2021-03-27 Thread Ryan Hatter
This seems to work, with a tiny caveat: `airflow celery worker` is the
command to start the celery worker instead of `airflow worker`:

   1. Launch breeze with `--integration rabbitmq` or `--integration redis`
   2. Set the executor to celery with `export
   AIRFLOW__CORE__EXECUTOR=CeleryExecutor`
   3. Start a celery worker with `airflow celery worker -D`
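
For anyone wondering why step 2 works: Airflow reads environment variables
named `AIRFLOW__{SECTION}__{KEY}` as overrides for the matching config option,
so `AIRFLOW__CORE__EXECUTOR` overrides `[core] executor`. Below is a rough
sketch of that naming rule. The helper `airflow_env_overrides` is made up for
illustration and is not Airflow's own parser; it also ignores details such as
the `_CMD` and `_SECRET` variants.

```python
import os


def airflow_env_overrides(environ=None):
    """Map AIRFLOW__{SECTION}__{KEY} variables to {(section, key): value}."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, value in environ.items():
        # Split on the double underscore at most twice, so keys that
        # themselves contain single underscores (e.g. BROKER_URL) stay intact.
        parts = name.split("__", 2)
        if len(parts) == 3 and parts[0] == "AIRFLOW" and all(parts[1:]):
            section, key = parts[1].lower(), parts[2].lower()
            overrides[(section, key)] = value
    return overrides
```

Calling it with `{"AIRFLOW__CORE__EXECUTOR": "CeleryExecutor"}` yields a single
override for section `core`, key `executor`.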

Would I be crazy to ask if Breeze should be rewritten in Python? Arguments
in support, I think, might be...

   1. Easier modularization/extensibility
   2. I'd guess that more people are comfortable in Python than Bash
   3. At least one of the macOS issues could be resolved

Thoughts?


On Mon, Mar 22, 2021 at 5:51 PM Jarek Potiuk  wrote:

> Actually it is not that straightforward, but possibly we can make it work
> much more easily
>
> In order to make Celery Executor work you need to do a bit more (but
> should be easy to add as an option to Breeze):
>
> * you need to start rabbitmq or redis as integration (`--integration
> rabbitmq --integration redis`)
> * you need to start worker(s) (`airflow worker` in the background)
> * you might want to start flower optionally (the celery monitoring tool)
>
> So maybe we could add extra switch to start-airflow command
> (--use-celery-executor) that could set those integrations and start
> worker/flower additionally to running webserver/scheduler now in tmux ?
>
> WDYT? Maybe Ryan you can check if my recipe works ? I could add it then as
> an option.
>
> BTW. We already have a number of CeleryExecutor tests that use the
> integrations, so Breeze has all what's needed:
>
>
> https://github.com/apache/airflow/blob/master/tests/executors/test_celery_executor.py#L109
>
>  J.
>
> On Mon, Mar 22, 2021 at 2:24 PM Ryan Hatter  wrote:
>
>> Hmm, maybe I was just getting twisted around with docker then. I’ll have
>> a look at what you shared.
>>
>> Thanks Bin :)
>>
>> On Mar 22, 2021, at 01:52, Xinbin Huang  wrote:
>>
>> 
>> Hi Ryan,
>>
>> I believe breeze already provides tools for you to do that:
>>
>> 1. Would it make sense to allow developers to choose the executor for
>> their Breeze environment?
>>
>> You can set up environment variables and any other custom setup you want
>> in the file: /files/airflow-breeze-config/variables.env. To set up
>> CeleryExecutor, you just need to put `export
>> AIRFLOW__CORE__EXECUTOR=CeleryExecutor` in the file.
>>
>> 2. Developing using Docker
>> <https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html> 
>> would
>> require an update to the image each time you want to make a change to the
>> codebase
>>
>> Breeze automatically mounts the local source to the container unless you
>> explicitly skip it with the flag --skip-mounting-local-sources. You
>> can find more details here
>> https://github.com/apache/airflow/blob/master/BREEZE.rst#mounting-local-sources-to-breeze
>>
>> Best
>> Bin
>>
>>
>> On Sun, Mar 21, 2021 at 5:32 PM Ryan Hatter 
>> wrote:
>>
>>> I recently had some trouble trying to fix a bug in the CeleryExecutor
>>> <https://github.com/apache/airflow/pull/14883>. The code change was
>>> small, but it was really difficult to set up a development environment
>>> using the CeleryExecutor. I ultimately had to muck around with the test
>>> case that covers this situation, default_airflow.cfg, and
>>> default_celery.py. Developing using Docker
>>> <https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html>
>>> would require an update to the image each time you want to make a change to
>>> the codebase (or maybe `exec`ing into the relevant container?) which is a
>>> pain.
>>>
>>> This led me to two questions:
>>>
>>>1. Would it make sense to allow developers to choose the executor
>>>for their Breeze environment?
>>>2. If not, how do folks test out changes they make to the
>>>CeleryExecutor or KubernetesExecutor?
>>>
>>> Thanks!
>>> Ryan
>>>
>>>
>>>
>
> --
> +48 660 796 129
>


Re: [DISCUSS] Add Breeze Support for CeleryExecutor and KubernetesExecutor

2021-03-22 Thread Ryan Hatter
Yeah I'll give that a shot.

Do you think something similar for the KubernetesExecutor, using something
like Minikube, would also make sense?

On Mon, Mar 22, 2021 at 5:51 PM Jarek Potiuk  wrote:

> Actually it is not that straightforward, but possibly we can make it work
> much more easily
>
> In order to make Celery Executor work you need to do a bit more (but
> should be easy to add as an option to Breeze):
>
> * you need to start rabbitmq or redis as integration (`--integration
> rabbitmq --integration redis`)
> * you need to start worker(s) (`airflow worker` in the background)
> * you might want to start flower optionally (the celery monitoring tool)
>
> So maybe we could add extra switch to start-airflow command
> (--use-celery-executor) that could set those integrations and start
> worker/flower additionally to running webserver/scheduler now in tmux ?
>
> WDYT? Maybe Ryan you can check if my recipe works ? I could add it then as
> an option.
>
> BTW. We already have a number of CeleryExecutor tests that use the
> integrations, so Breeze has all what's needed:
>
>
> https://github.com/apache/airflow/blob/master/tests/executors/test_celery_executor.py#L109
>
>  J.
>
> On Mon, Mar 22, 2021 at 2:24 PM Ryan Hatter  wrote:
>
>> Hmm, maybe I was just getting twisted around with docker then. I’ll have
>> a look at what you shared.
>>
>> Thanks Bin :)
>>
>> On Mar 22, 2021, at 01:52, Xinbin Huang  wrote:
>>
>> 
>> Hi Ryan,
>>
>> I believe breeze already provides tools for you to do that:
>>
>> 1. Would it make sense to allow developers to choose the executor for
>> their Breeze environment?
>>
>> You can set up environment variables and any other custom setup you want
>> in the file: /files/airflow-breeze-config/variables.env. To set up
>> CeleryExecutor, you just need to put `export
>> AIRFLOW__CORE__EXECUTOR=CeleryExecutor` in the file.
>>
>> 2. Developing using Docker
>> <https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html> 
>> would
>> require an update to the image each time you want to make a change to the
>> codebase
>>
>> Breeze automatically mounts the local source to the container unless you
>> explicitly skip it with the flag --skip-mounting-local-sources. You
>> can find more details here
>> https://github.com/apache/airflow/blob/master/BREEZE.rst#mounting-local-sources-to-breeze
>>
>> Best
>> Bin
>>
>>
>> On Sun, Mar 21, 2021 at 5:32 PM Ryan Hatter 
>> wrote:
>>
>>> I recently had some trouble trying to fix a bug in the CeleryExecutor
>>> <https://github.com/apache/airflow/pull/14883>. The code change was
>>> small, but it was really difficult to set up a development environment
>>> using the CeleryExecutor. I ultimately had to muck around with the test
>>> case that covers this situation, default_airflow.cfg, and
>>> default_celery.py. Developing using Docker
>>> <https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html>
>>> would require an update to the image each time you want to make a change to
>>> the codebase (or maybe `exec`ing into the relevant container?) which is a
>>> pain.
>>>
>>> This led me to two questions:
>>>
>>>1. Would it make sense to allow developers to choose the executor
>>>for their Breeze environment?
>>>2. If not, how do folks test out changes they make to the
>>>CeleryExecutor or KubernetesExecutor?
>>>
>>> Thanks!
>>> Ryan
>>>
>>>
>>>
>
> --
> +48 660 796 129
>


Re: [DISCUSS] Add Breeze Support for CeleryExecutor and KubernetesExecutor

2021-03-22 Thread Ryan Hatter
Hmm, maybe I was just getting twisted around with docker then. I’ll have a look 
at what you shared. 

Thanks Bin :)

> On Mar 22, 2021, at 01:52, Xinbin Huang  wrote:
> 
> 
> Hi Ryan,
> 
> I believe breeze already provides tools for you to do that:
> 
> 1. Would it make sense to allow developers to choose the executor for their 
> Breeze environment?
> 
> You can set up environment variables and any other custom setup you want in 
> the file: /files/airflow-breeze-config/variables.env. To set up 
> CeleryExecutor, you just need to put `export 
> AIRFLOW__CORE__EXECUTOR=CeleryExecutor` in the file.
> 
> 2. Developing using Docker would require an update to the image each time you 
> want to make a change to the codebase
> 
> Breeze automatically mounts the local source to the container unless you 
> explicitly skip it with the flag --skip-mounting-local-sources. You can find 
> more details here 
> https://github.com/apache/airflow/blob/master/BREEZE.rst#mounting-local-sources-to-breeze
> 
> Best
> Bin
> 
> 
>> On Sun, Mar 21, 2021 at 5:32 PM Ryan Hatter  wrote:
>> I recently had some trouble trying to fix a bug in the CeleryExecutor. The 
>> code change was small, but it was really difficult to set up a development 
>> environment using the CeleryExecutor. I ultimately had to muck around with 
>> the test case that covers this situation, default_airflow.cfg, and 
>> default_celery.py. Developing using Docker would require an update to the 
>> image each time you want to make a change to the codebase (or maybe 
>> `exec`ing into the relevant container?) which is a pain.
>> 
>> This led me to two questions: 
>> Would it make sense to allow developers to choose the executor for their 
>> Breeze environment?
>> If not, how do folks test out changes they make to the CeleryExecutor or 
>> KubernetesExecutor?
>> Thanks!
>> Ryan
>> 
>> 


[DISCUSS] Add Breeze Support for CeleryExecutor and KubernetesExecutor

2021-03-21 Thread Ryan Hatter
I recently had some trouble trying to fix a bug in the CeleryExecutor. The
code change was small, but it was really difficult to set up a development
environment using the CeleryExecutor. I ultimately had to muck around with
the test case that covers this situation, default_airflow.cfg, and
default_celery.py. Developing using Docker would require an update to the
image each time you want to make a change to the codebase (or maybe
`exec`ing into the relevant container?) which is a pain.

This led me to two questions:

   1. Would it make sense to allow developers to choose the executor for
   their Breeze environment?
   2. If not, how do folks test out changes they make to the CeleryExecutor
   or KubernetesExecutor?

Thanks!
Ryan


Re: dbt Provider

2021-03-05 Thread Ryan Hatter
Hey Pete! 

That blog is actually what piqued my interest. It seems like most of the heavy 
lifting is done. I’m happy to help if I can.

I’ll reach out on slack :)

> On Mar 5, 2021, at 10:26, Peter DeJoy  wrote:
> 
> 
> Hey Ryan,
> 
> Excited that you’re interested in this- as it turns out, we @ Astronomer have 
> done some initial work with the dbt team to build an official provider 
> package for dbt and dbt Cloud. We wrote this blog series a couple of months 
> back, but we’re really hoping to evolve the integration between Airflow and 
> dbt so that running dbt models from your Airflow tasks is supported 
> first-party by a provider and its subsequent modules.
> 
> More on that soon, but happy to chat with you and anyone else in the 
> community about the user patterns you’d like to have supported. Feel free to 
> get in touch.
> 
> Pete
>> On Mar 4, 2021, 9:30 AM -0500, Ryan Hatter , wrote:
>> Would it be appropriate for me to reach out to one of the airflow-dbt 
>> maintainers and see if they’d be interested in managing a provider?
>> 
>>> On Mar 3, 2021, at 13:23, Ash Berlin-Taylor  wrote:
>>> 
>>> I honestly think no: not because it's not useful, but because dbt have the 
>>> time and ability to maintain their own provider package, much like Great 
>>> Expectations do:
>>> 
>>> https://greatexpectations.io/blog/airflow-operator/
>>> 
>>> I'd much rather we work with dbt to do what ever is needed to make it a 
>>> full provider than pull it in tree when it already exists as a third party 
>>> package.
>>> 
>>> -ash
>>> 
>>>> On 3 March 2021 18:04:01 GMT, Ryan Hatter  wrote:
>>>> Hey all,
>>>> 
>>>> dbt seems to continue to gain momentum. There's already an airflow-dbt 
>>>> project that is essentially a provider package. Would it make sense to 
>>>> fold the dbt_hook and dbt_operator into a provider package in the official 
>>>> airflow repo?


Re: dbt Provider

2021-03-04 Thread Ryan Hatter
Would it be appropriate for me to reach out to one of the airflow-dbt 
maintainers and see if they’d be interested in managing a provider?

> On Mar 3, 2021, at 13:23, Ash Berlin-Taylor  wrote:
> 
> I honestly think no: not because it's not useful, but because dbt have the 
> time and ability to maintain their own provider package, much like Great 
> Expectations do: 
> 
> https://greatexpectations.io/blog/airflow-operator/
> 
> I'd much rather we work with dbt to do what ever is needed to make it a full 
> provider than pull it in tree when it already exists as a third party package.
> 
> -ash
> 
>> On 3 March 2021 18:04:01 GMT, Ryan Hatter  wrote:
>> Hey all,
>> 
>> dbt seems to continue to gain momentum. There's already an airflow-dbt 
>> project that is essentially a provider package. Would it make sense to fold 
>> the dbt_hook and dbt_operator into a provider package in the official 
>> airflow repo?


dbt Provider

2021-03-03 Thread Ryan Hatter
Hey all,

dbt seems to continue to gain momentum. There's already an airflow-dbt
project that is essentially a provider package. Would it make sense to fold
the dbt_hook and dbt_operator into a provider package in the official
airflow repo?


Re: [VOTE] Airflow Providers - release candidates from 2021-02-27

2021-03-02 Thread Ryan Hatter
There were some changes to the operator after my PR was merged: 
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/transfers/gdrive_to_gcs.py

Pak Andrey (Scuall1992 on GitHub) might be able to confirm the operator is 
functional. 

> On Mar 2, 2021, at 13:16, Jarek Potiuk  wrote:
> 
> Hello everyone - just a reminder that we have voting (hopefully) finishing 
> tomorrow.
> 
> I'd love to get some votes for that.
> 
> Just to clarify what the PMC votes mean, because I believe there were some 
> question raised about the release process which we are going to discuss it 
> tomorrow at the dev call but let me just express my interpretation of 
> https://infra.apache.org/release-publishing.html
> 
> PMC member vote (as I understand it) does not mean that this PMC member 
> tested the release functionality (neither Release Manager).
> This merely means that the PMC member agrees that the software was released 
> according to the requirements and process described in 
> https://infra.apache.org/release-publishing.html and that the signatures, 
> hash-sums and software packages are as expected by the process. 
> This is how I interpret this part of the release process "Release managers do 
> the mechanical work; but the PMC in general, and the PMC chair in particular 
> (as an officer of the Foundation), are responsible for compliance with ASF 
> requirements." 
> 
> My understanding is that it is not feasible (for either Airflow or 
> Providers) for the PMC members (or the release manager) to test the software 
> and all features/bugfixes. We've never done that and I believe we never will. 
> We are reaching out to the community to test and we make a best effort to 
> test whatever we release automatically (unit tests, integration tests, 
> testing if providers are installable/importable with Airflow 2.0 and latest 
> source code  of Airflow). And we hardly can do more than that.
> 
> Happy to discuss it tomorrow, but in the meantime If some of the PMCs could 
> do the review of the process and check the compliance, to be ready to cast 
> your votes - I'd love that.
> 
> J.
> 
> On Tue, Mar 2, 2021 at 8:44 PM Jarek Potiuk  wrote:
>> Hey Ryan,
>> 
>> There is no **must** in re-testing it. Providing that you tested it before 
>> with real GSuite account is for me enough of a confirmation ;).
>> 
>> J.
>> 
>> On Sun, Feb 28, 2021 at 10:00 PM Abdur-Rahmaan Janhangeer 
>>  wrote:
>>> Salutes for having a GSuite account just for the functionality 👍👍👍
>>> 
>>> On Mon, 1 Mar 2021, 00:05 Ryan Hatter,  wrote:
>>>> I canceled my GSuite account when my PR for the gdrive to gcs operator was 
>>>> approved & merged. Could anyone maybe help me ensure correct functionality?
>>>> 
>>>> 
>>>>> On Feb 27, 2021, at 08:48, Jarek Potiuk  wrote:
>>>>> 
>>>>> I created issue, where we will track the status of tests for the 
>>>>> providers (again - it is experiment - but I'd really love to get feedback 
>>>>> on the new providers from those who contributed): 
>>>>> https://github.com/apache/airflow/issues/14511
>>>>> 
>>>>> On Sat, Feb 27, 2021 at 4:28 PM Jarek Potiuk  wrote:
>>>>>> 
>>>>>> Hey all,
>>>>>> 
>>>>>> I have just cut the new wave Airflow Providers packages. This email is 
>>>>>> calling a vote on the release,
>>>>>> which will last for 72 hours +  day for the weekend - which means that 
>>>>>> it will end on Wed 3 Mar 15:59:34 CET 2021.
>>>>>> 
>>>>>> Consider this my (binding) +1.
>>>>>> 
>>>>>> KIND REQUEST
>>>>>> 
>>>>>> There was a recent discussion about test quality of the providers and I 
>>>>>> would like to try to address it, still keeping the batch release process 
>>>>>> every 3 weeks.
>>>>>> 
>>>>>> We need a bit of help from the community. I have a kind request to the 
>>>>>> authors of fixes and new features. I group the providers into those that 
>>>>>> likely need more testing, and those that do not. I also added names of 
>>>>>> those who submitted the changes and are most likely to be able to verify 
>>>>>> if the RC packages are solving the problems/adding features. 
>>>>>> 
>>>>>> This is a bit of experiment (apologies for calling out)  - but if we 
>>>>>

Re: New Committers: James Timmins, Elad Kalif & Daniel Standish

2021-03-01 Thread Ryan Hatter
Awesome! Congrats all!

> On Mar 1, 2021, at 11:20, Tomasz Urbaszek  wrote:
> 
> 
> Congrats Elad, Daniel and James! Well deserved indeed! 
> 
> T.
> 
>> On Mon, 1 Mar 2021 at 19:12, Jarek Potiuk  wrote:
>> Woohoo!
>> 
>>> On Mon, Mar 1, 2021 at 6:04 PM Sumit Maheshwari  
>>> wrote:
>>> Congratulations to all!!
>>> 
>>>> On Mon, Mar 1, 2021 at 9:26 PM Felix Uellendall
>>>> wrote:
>>>> Congrats! Well deserved! :)
>>>>
>>>>
>>>>
>>>> Sent from ProtonMail Mobile
>>>>
>>>>
>>>>> On Mon, Mar 1, 2021 at 16:42, Vikram Koka
>>>>> wrote:
>>>>> Congratulations James, Elad, and Daniel, great work!
>>>>>
>>>>>
>>>>>> On Mon, Mar 1, 2021 at 6:57 AM Ry Walker wrote:
>>>>>> Congrats to all the new committers - appreciate the effort!
>>>>>>
>>>>>>> On Mon, Mar 1, 2021 at 9:10 AM Ryan Hamilton
>>>>>>> wrote:
>>>>>>> Congrats James, Elad, and Daniel!
>>>>>>>
>>>>>>>> On Mon, Mar 1, 2021 at 9:06 AM Kaxil Naik wrote:
>>>>>>>> Hello Airflow Community,
>>>>>>>>
>>>>>>>> The Project Management Committee (PMC) for Apache Airflow
>>>>>>>> has invited James Timmins, Elad Kalif & Daniel Standish to become
>>>>>>>> committers and we are pleased
>>>>>>>> to announce that they have accepted.
>>>>>>>>
>>>>>>>> Congratulations James, Elad and Daniel. Welcome aboard 🛳.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kaxil
>>>>>>>> on behalf of Airflow PMC
>>>>>>>>
>>>>>>>>
>> 
>> 
>> -- 
>> +48 660 796 129


Re: [VOTE] Airflow Providers - release candidates from 2021-02-27

2021-02-28 Thread Ryan Hatter
I canceled my GSuite account when my PR for the gdrive to gcs operator was 
approved & merged. Could anyone maybe help me ensure correct functionality?


> On Feb 27, 2021, at 08:48, Jarek Potiuk  wrote:
> 
> 
> I created an issue where we will track the status of tests for the providers 
> (again, it is an experiment, but I'd really love to get feedback on the new 
> providers from those who contributed): 
> https://github.com/apache/airflow/issues/14511
> 
>> On Sat, Feb 27, 2021 at 4:28 PM Jarek Potiuk  wrote:
>> 
>> Hey all,
>> 
>> I have just cut a new wave of Airflow Providers packages. This email is 
>> calling a vote on the release,
>> which will last for 72 hours plus a day for the weekend, which means that it 
>> will end on Wed 3 Mar 15:59:34 CET 2021.
>> 
>> Consider this my (binding) +1.
>> 
>> KIND REQUEST
>> 
>> There was a recent discussion about test quality of the providers and I 
>> would like to try to address it, still keeping the batch release process 
>> every 3 weeks.
>> 
>> We need a bit of help from the community. I have a kind request to the 
>> authors of fixes and new features. I group the providers into those that 
>> likely need more testing, and those that do not. I also added names of those 
>> who submitted the changes and are most likely to be able to verify if the RC 
>> packages are solving the problems/adding features. 
>> 
>> This is a bit of an experiment (apologies for calling people out), but if we 
>> find that it works, we can automate it. I will create a separate issue on 
>> GitHub where you will be able to "tick" the boxes for the providers you 
>> are listed under. It would not be a blocker if a provider is not tested, but 
>> it will be a great help if you could test the new RC provider and see if it 
>> works as expected according to your changes.
>> 
>> Providers with new features and fixes (likely need some testing):
>> 
>> * amazon : Cristòfol Torrens, Ruben Laguna, Arati Nagmal, Ivica Kolenkaš, 
>> JavierLopezT
>> * apache.druid: Xinbin Huang 
>> * apache.spark: Igor Khrol
>> * cncf.kubernetes: jpyen, Ash Berlin-Taylor, Daniel Imberman
>> * google: Vivek Bhojawala, Xinbin Huang, Pak Andrey, uma66, Ryan Yuan, 
>> morrme, Sam Wheating, YingyingPeng22, Ryan Hatter, Tobiasz Kędzierski
>> * jenkins: Maxim Lisovsky
>> * microsoft.azure: flvndh, yyu
>> * mysql: Constantino Schillebeeckx
>> * qubole: Xinbin Huang
>> * salesforce: Jyoti Dhiman
>> * slack: Igor Khrol
>> * tableau: Jyoti Dhiman
>> * telegram: Shekhar Sing, Adil Khashtamov
>> 
>> Providers with doc only changes (no need to test):
>> 
>> * apache-beam
>> * apache-hive
>> * dingding
>> * docker
>> * elasticsearch
>> * exasol
>> * http
>> * neo4j
>> * openfaas
>> * papermill
>> * presto
>> * sendgrid
>> * sftp
>> * snowflake
>> * sqlite
>> * ssh
>> * 
>> 
>> 
>> Airflow Providers are available at:
>> https://dist.apache.org/repos/dist/dev/airflow/providers/
>> 
>> *apache-airflow-providers--*-bin.tar.gz* are the binary
>>  Python "sdist" release - they are also official "sources" for the provider 
>> packages.
>> 
>> *apache_airflow_providers_-*.whl are the binary
>>  Python "wheel" release.
>> 
>> The test procedure for PMC members who would like to test the release 
>> candidates is described in
>> https://github.com/apache/airflow/blob/master/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-by-pmc-members
>> 
>> and for Contributors:
>> 
>> https://github.com/apache/airflow/blob/master/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-by-contributors
>> 
>> 
>> Public keys are available at:
>> https://dist.apache.org/repos/dist/release/airflow/KEYS
>> 
>> Please vote accordingly:
>> 
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove with the reason
>> 
>> 
>> Only votes from PMC members are binding, but members of the community are
>> encouraged to test the release and vote with "(non-binding)".
>> 
>> Please note that the version number excludes the 'rcX' string.
>> This will allow us to rename the artifact without modifying
>> the artifact checksums when we actually release.
>> 
>> 
>> Each of the packages contains a link to the detailed changelog. The 
>> changelogs are moved to the official airflow documentation:
>> https://github.com/apache/airflow-site/
>> 
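
The vote email above notes that leaving 'rcX' out of the version number lets the artifact be renamed without changing its checksum. A minimal sketch of why that holds, assuming a SHA-512 digest computed over the file contents only: the digest never sees the file name, so a rename leaves it unchanged. The helper names and the file names passed to them are hypothetical, chosen only for illustration; this is not the project's release tooling.

```python
import hashlib
import os
import tempfile


def sha512_of_file(path: str) -> str:
    """Return the hex SHA-512 digest of the file's bytes, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_survives_rename(payload: bytes, rc_name: str, final_name: str) -> bool:
    """Write payload as rc_name, rename it to final_name, and compare digests.

    Both file names are hypothetical placeholders supplied by the caller.
    """
    with tempfile.TemporaryDirectory() as workdir:
        rc_path = os.path.join(workdir, rc_name)
        final_path = os.path.join(workdir, final_name)
        with open(rc_path, "wb") as handle:
            handle.write(payload)
        before = sha512_of_file(rc_path)
        # Only the name changes here; the bytes on disk stay the same.
        os.rename(rc_path, final_path)
        after = sha512_of_file(final_path)
        return before == after
```

If the version string inside the package metadata contained 'rc1', dropping it for the final release would change the file contents and therefore the digest, which is the situation the note is avoiding.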

Re: Create official Apache Airflow publication on Medium.com

2021-02-20 Thread Ryan Hatter
I’d be interested in helping edit/proofread articles :)

> On Feb 20, 2021, at 11:40, Deng Xiaodong  wrote:
> 
> 
> It's a good idea to me!
> 
> Maybe worthwhile discussing how the Publication will be managed to ensure 
> high quality, like: criteria of stories/articles to include, review/approval 
> process, etc.
> 
> 
> Regards,
> XD
> 
>> On Sat, Feb 20, 2021 at 9:29 AM Jarek Potiuk  wrote:
>> I think this is a great idea. I did not know that "publication" exists, but 
>> I think it perfectly fits the purpose, reading the description.
>> 
>> J.
>> 
>> 
>>> On Sat, Feb 20, 2021 at 7:41 AM Tomasz Urbaszek  wrote:
>>> Hello,
>>> 
>>> Currently our community has a blog on our official website:
>>> http://airflow.apache.org/blog/
>>> 
>>> But I was wondering if it would make sense to create an official Apache
>>> Airflow publication on medium.com, which is much more often used for
>>> publishing blog posts:
>>> https://help.medium.com/hc/en-us/articles/115004681607-Getting-started-with-a-Medium-publication
>>> 
>>> There are currently many Airflow-related posts that are not associated
>>> with any existing publication. By creating an Airflow umbrella we may
>>> achieve a few things:
>>> - collect many Airflow posts in one place (one publication)
>>> - increase the audience and reach of all those posts
>>> - further improve the public image of Apache Airflow
>>> 
>>> 
>>> What do you think?
>>> 
>>> Cheers,
>>> Tomek
>> 
>> 
>> -- 
>> +48 660 796 129