Re: Call with Nielsen team demoing their DAG debugging feature

2024-06-09 Thread Abhishek Bhakat
Hi Jarek,

I would like to join as well, please.

Thanks,
Avi

On Sat, Jun 8, 2024 at 3:32 PM Buğra Öztürk  wrote:

> Hello Jarek,
>
> Thanks for sharing! It sounds very interesting. I would like to join. Could
> you please forward to me as well?
>
> Thanks!
>
> On Sat, 8 Jun 2024, 17:28 Jed Cunningham wrote:
>
> > Interesting. Can you forward to me as well Jarek? Thanks!
> >
>


Re: [DISCUSS] AIP-71 Generalizing DAG Loader and Processor for Ephemeral Storage

2024-06-09 Thread Jarek Potiuk
Re: what Jed wrote: I think the concern about local git/GitHub is not a big
issue (a custom FSSpec implementation can be added, as TP mentioned), but I
fully agree with the second concern - and I think FSSpec is a bad choice to
base this work on.

Versioning of DAGs is way more than versioning files. We should also
version various metadata if task isolation is going to be in place. Passing
connections, variables, and potentially XCom results (or handles to those) from
previous tasks to isolated tasks should eventually result in "bundles" of
execution. I called them "snapshots" in some earlier comments, but the more
I think about the isolation case, the more "bundle" seems the better name - as
it will be much more than a single "commit" snapshot.

I think FSSpec (and the filesystem in general) is simply the wrong abstraction
for what Airflow needs in the new world of Airflow 3. FSSpec - even with
versioning - is basically a fancier version of the DAG synchronization we
already have in Airflow 2, and sticking to the "file" abstraction mostly
roots us in the past that we want to move away from.

I think whatever abstraction we come up with in Airflow 3 for task
isolation and DAG versioning should be open to multiple cases (even if we
do not implement them in Airflow 3.0 straight away):

1) back-compatibility with the "shared FS" abstraction we currently have in
Airflow 2 - known as "DAG Folder"
2) bundling only the Python files needed to run the task, together with
the Connections, Variables, and XComs it needs
3) bundling **just** Connections, Variables, and XComs, without passing any
Python files at all (i.e. running tasks native to any language, using the
Airflow API to access those)
4) "virtual" tasks, where tasks are actually mapped onto externally executing
workflows (similar to the Cosmos case, where dbt model DAG tasks are mapped
onto Airflow Tasks) - without running anything at all as an "airflow task"

So while 1) and 2) are indeed mostly "FS"-backed and the FSSpec abstraction
could be somewhat OK (maybe connections and variables could be passed via
some tags/metadata), it pretty much breaks completely in cases
3) and 4) - where we generally have "no FS at all".
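
A minimal sketch of how the four cases above could sit behind one interface
that is not tied to files. Every name in it (Bundle, TaskContext,
DagFolderBundle, SlimBundle, MetadataOnlyBundle, VirtualBundle) is
hypothetical and made up for illustration; none of them is an existing
Airflow or fsspec API, and the shape of the metadata is an assumption.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class TaskContext:
    """Metadata versioned together with a task (hypothetical shape)."""

    connections: dict[str, str] = field(default_factory=dict)
    variables: dict[str, str] = field(default_factory=dict)
    xcom_handles: dict[str, str] = field(default_factory=dict)


class Bundle(ABC):
    """A versioned unit of execution. Hypothetical, not an Airflow class."""

    def __init__(self, version: str, context: TaskContext | None = None) -> None:
        self.version = version
        self.context = context if context is not None else TaskContext()

    @abstractmethod
    def files(self) -> list[Path]:
        """Paths shipped to the worker; empty when no filesystem is involved."""

    def needs_filesystem(self) -> bool:
        return bool(self.files())


class DagFolderBundle(Bundle):
    """Case 1: the whole shared DAG folder, as in Airflow 2."""

    def __init__(self, version: str, dag_folder: Path) -> None:
        super().__init__(version)
        self.dag_folder = dag_folder

    def files(self) -> list[Path]:
        return [self.dag_folder]


class SlimBundle(Bundle):
    """Case 2: only the files one task needs, plus its metadata."""

    def __init__(self, version: str, paths: list[Path], context: TaskContext) -> None:
        super().__init__(version, context)
        self.paths = list(paths)

    def files(self) -> list[Path]:
        return list(self.paths)


class MetadataOnlyBundle(Bundle):
    """Case 3: connections, variables and XComs only; no Python files."""

    def files(self) -> list[Path]:
        return []


class VirtualBundle(Bundle):
    """Case 4: a pointer to a workflow that executes outside Airflow."""

    def __init__(self, version: str, external_ref: str) -> None:
        super().__init__(version)
        self.external_ref = external_ref

    def files(self) -> list[Path]:
        return []
```

In this sketch only the first two bundle types report that they need a
filesystem; the last two carry metadata or an external reference and nothing
else, which is the part a file-based abstraction has no natural place for.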

J




On Sun, Jun 9, 2024 at 12:34 AM Tzu-ping Chung wrote:

> On Git (and other VCS, for that matter) specifically, I believe fsspec only
> supports GitHub because it uses the GitHub API instead of Git. I’m only
> guessing, but using Git for random access would have terrible performance
> and is likely not an option for them.
>
> Airflow does not have the same problems though since we are allowed to
> cache things on-disk and do a batch update on each DAG-parsing iteration. I
> believe it is possible to add custom backends to fsspec, so Airflow should
> be able to extend the feature to cover arbitrary repositories without too
> much work.
>
>
> > On Jun 8, 2024, at 23:25, Jed Cunningham wrote:
> >
> > Sorry for the delayed reply here. I've been chewing on this one a bit
> > though.
> >
> > One concern I have is that I highly value having a provider agnostic
> > remote git integration. fsspec, however, has local git or github - no
> > arbitrary remote git support. That means Airflow, in my view, can't just
> > rely on fsspec alone, but will have to also deal with cloning arbitrary
> > repos. Or we expect a "ready to go" cloned repo on local disk, which might
> > be a good tradeoff. This was something we'd have to tackle for AIP-63
> > anyways, though.
> >
> > Another concern, as I've been considering Airflow 3, is that I see wanting
> > to version "more" than just the DAG folder - possibly a whole venv (or
> > similar). Doing that versioning file-by-file doesn't feel particularly
> > practical to me. Which means that, I think, we could end up using fsspec
> > to get zips/tarballs/"assets", which is still very helpful imo. But it
> > does limit the "slickness" of being sorta behind the scenes like a custom
> > module loader and resource loader would be.
> >
> > I actually see this as also being coupled to Ash's upcoming “Task
> > Execution interface”.
> >
> > I admittedly didn't quite follow the VersionedFS idea you all were
> > discussing. I don't see how it could easily map Airflow's goal of "this
> > version of the dag folder and maybe more" to a backend that is versioned
> > by file. Maybe you can eli5 and connect the dots for me?
> >
> > All that said, my goal this next week is to spend some quality time in
> > this area so we can all start nailing down options for Airflow 3.
>
>


Re: [VOTE] Release Airflow 2.9.2 from 2.9.2rc1

2024-06-09 Thread Elad Kalif
+1 (binding). Ran some of my example DAGs; all looks good.

On Sun, Jun 9, 2024 at 1:48 PM Jarek Potiuk  wrote:

> Changing my vote to +1 (binding).
>
> Updated the constraints for 2.9.2rc1 with **just released** providers
> (celery 3.7.1, k8s 8.3.1, amazon 8.24.0). Installed Airflow 2.9.2rc1 and
> ran it with the celery executor - and it works well this time. I also earlier
> yanked celery-provider 3.7.1 to avoid people installing it accidentally.
>
> All looks good from my side - without having to release rc2.
>
> J.
>
> On Sun, Jun 9, 2024 at 7:37 AM Rahul Vats  wrote:
>
> > +1 (non-binding). Verified running our example DAGs. LGTM
> >
> > Regards,
> > Rahul Vats
> > 9953794332
> >
> >
> > On Sun, 9 Jun 2024 at 01:21, Scheffler Jens (XC-AS/EAE-ADA-T)
> >  wrote:
> >
> > > +1 non binding
> > >
> > > From: Phani Kumar 
> > > Sent: Saturday, June 8, 2024 3:45:18 PM
> > > To: dev@airflow.apache.org 
> > > Subject: Re: [VOTE] Release Airflow 2.9.2 from 2.9.2rc1
> > >
> > > +1 non binding
> > >
> > > On Sat, 8 Jun 2024, 15:02 Hussein Awala,  wrote:
> > >
> > > > +1 (binding) checked signatures, checksums, licences and sources.
> > > >
> > > > > On Saturday, June 8, 2024, rom sharon wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release Airflow 2.9.2 from 2.9.2rc1

2024-06-09 Thread Jarek Potiuk
Changing my vote to +1 (binding).

Updated the constraints for 2.9.2rc1 with **just released** providers
(celery 3.7.1, k8s 8.3.1, amazon 8.24.0). Installed Airflow 2.9.2rc1 and
ran it with the celery executor - and it works well this time. I also earlier
yanked celery-provider 3.7.1 to avoid people installing it accidentally.

All looks good from my side - without having to release rc2.

J.

On Sun, Jun 9, 2024 at 7:37 AM Rahul Vats  wrote:

> +1 (non-binding). Verified running our example DAGs. LGTM
>
> Regards,
> Rahul Vats
> 9953794332
>
>
> On Sun, 9 Jun 2024 at 01:21, Scheffler Jens (XC-AS/EAE-ADA-T)
>  wrote:
>
> > +1 non binding
> >
> > From: Phani Kumar 
> > Sent: Saturday, June 8, 2024 3:45:18 PM
> > To: dev@airflow.apache.org 
> > Subject: Re: [VOTE] Release Airflow 2.9.2 from 2.9.2rc1
> >
> > +1 non binding
> >
> > On Sat, 8 Jun 2024, 15:02 Hussein Awala,  wrote:
> >
> > > +1 (binding) checked signatures, checksums, licences and sources.
> > >
> > > On Saturday, June 8, 2024, rom sharon  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > >
> >
>


[ANNOUNCE] Apache Airflow Providers prepared on June 07, 2024 are released

2024-06-09 Thread Elad Kalif
Dear Airflow community,

I'm happy to announce that new versions of the Airflow Providers packages
prepared on June 07, 2024 were just released. The full list of released PyPI
packages is at the end of this message.

The source release, as well as the binary releases, are available here:

https://airflow.apache.org/docs/apache-airflow-providers/installing-from-sources

You can install the providers via PyPI:
https://airflow.apache.org/docs/apache-airflow-providers/installing-from-pypi

The documentation is available at https://airflow.apache.org/docs/ and
linked from the PyPI packages.



Full list of released PyPI packages:

https://pypi.org/project/apache-airflow-providers-amazon/8.24.0/
https://pypi.org/project/apache-airflow-providers-celery/3.7.2/
https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/8.3.1/

Cheers,
Elad Kalif


[RESULT][VOTE] Airflow Providers - release of June 07, 2024

2024-06-09 Thread Elad Kalif
Hello,

Apache Airflow Providers prepared on June 07, 2024 have been accepted.

4 "+1" binding votes received:
- Elad Kalif
- Jarek Potiuk
- Jed Cunningham
- Ephraim Anierobi

2 "+1" non-binding votes received:
- Rom Sharon
- Rahul Vats

Vote thread:
https://lists.apache.org/thread/zlh22nmt2sqkp38551plhc45vs1ndnl7

I'll continue with the release process, and the release announcement will
follow shortly.

Cheers,
Elad Kalif