Some initial learning and application of dask-distributed unit testing is
WIP in
- https://github.com/apache/airflow/pull/6984

All the dask-distributed pytest fixtures are already available in Airflow
(with the dask option installed). The contribution docs are helpful, i.e.
- https://docs.dask.org/en/latest/develop.html
- https://distributed.dask.org/en/latest/develop.html

On Mon, Jan 20, 2020 at 8:57 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yes definitely, we are actually going to add some content from the Workshop
> during the Airflow Summit to Airflow Website and a link to that page would
> be added to CONTRIBUTING.md.
>
> On Mon, Jan 20, 2020 at 10:18 PM Darren Weber <dweber.consult...@gmail.com>
> wrote:
>
> > Via the GSOC thread, I found
> > https://cwiki.apache.org/confluence/display/AIRFLOW/First+time+contributor%27s+workshop
> > - agree with a comment on that thread that a wiki page link from
> > CONTRIBUTING.md could be useful too
> >
> > On Mon, Jan 20, 2020 at 8:30 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> > > We are actually planning (pending confirmation) "First time Apache Airflow
> > > Contributor's" training at PyCon US in April. I think if there is a good
> > > usage of Dask and we got Scientific-oriented users using Airflow with Dask
> > > - I am all for having a closer cooperation on that topic :).
> > >
> > > J.
> > >
> > > On Mon, Jan 20, 2020 at 5:25 PM Darren Weber <dweber.consult...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the ping on https://github.com/dask/dask/issues/5803
> > > >
> > > > I'm curious about how dask async features might be low-hanging fruit for
> > > > Airflow scaling
> > > > - https://distributed.dask.org/en/latest/asynchronous.html
> > > > - https://github.com/apache/airflow/pull/6984
> > > >
> > > > Our company has scientific workflows and it uses dask, usually on large EC2
> > > > instances or batch jobs. I've been getting familiar with dask from a user
> > > > perspective; I don't yet know the internals from a dev-perspective.
> > > > I mostly use dask.delayed to scale threads/processes on a local host, with a
> > > > simple concurrent.futures API. Dask.distributed can also run a cluster
> > > > with client connections (I previously worked with spark a bit and dask has
> > > > some good documentation on the comparisons between spark and dask). There
> > > > are also some options for auto-scaling a dask cluster using k8s -
> > > > https://docs.dask.org/en/latest/setup/adaptive.html - so you get an
> > > > auto-scaling cluster with a lot of features for scientific computing with
> > > > the scipy-compatible stack.
> > > >
> > > > I can't promise to complete anything in a timely manner, despite any
> > > > proposals to remove dask executors entirely. I may be in-n-out of these
> > > > discussions from time-to-time, possibly silent for several weeks at a time
> > > > while I'm heads down on my full-time position. So if Airflow 2.0 removes
> > > > them for whatever reason, I would hope it could be possible to add them
> > > > back in Airflow 2.1 if the work can be done to get it working and the
> > > > design patterns make sense and/or there is a larger user community than
> > > > anyone is yet aware of. At present, I don't hear a clear specification for
> > > > having it work or an argument that it doesn't work at all, but I hear and
> > > > see that unit tests are disabled. It might be possible to identify in dask
> > > > itself how to setup the test environment. It might help to better
> > > > understand the niche that dask serves well.
> > > >
> > > > The online forums and github may suffice, but if it would be possible to
> > > > find funding to sponsor a joint hack-a-thon at PyCon or something, that
> > > > would be great.
> > > > As a new contributor to Airflow, I'm still learning the
> > > > ropes and it would be good to attend an Airflow contributor workshop (maybe
> > > > someone could spin one up in the bay-area?).
> > > >
> > > > Best,
> > > > Darren
> > > >
> > > > On Sun, Jan 19, 2020 at 9:28 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > wrote:
> > > >
> > > > > Seems like there is an interest https://github.com/dask/dask/issues/5803
> > > > > :).
> > > > > Let's see where it gets us.
> > > > >
> > > > > J.
> > > > >
> > > > > On Sat, Jan 18, 2020 at 9:46 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > wrote:
> > > > >
> > > > > > Following the discussion on Dask's gitter, I created an issue in Dask's
> > > > > > github: https://github.com/dask/dask/issues/5803
> > > > > >
> > > > > > Let's see if we can get someone from Dask community interested.
> > > > > >
> > > > > > On Fri, Jan 17, 2020 at 10:00 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Good idea :) doing that,
> > > > > >>
> > > > > >> On Fri, Jan 17, 2020 at 9:58 PM Daniel Imberman
> > > > > >> <daniel.imber...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Maybe we can reach out to a company that does Dask as a service?
> > > > > >>>
> > > > > >>> On Fri, Jan 17, 2020 at 9:31 AM, Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > >>> wrote:
> > > > > >>> Yeah. I think if we do not find anyone willing to champion it (no matter
> > > > > >>> committer or contributor), I would be for dropping it.
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > >>> On Fri, Jan 17, 2020 at 6:07 PM Daniel Imberman
> > > > > >>> <daniel.imber...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> > I think we need to ask “who is going to champion this executor.” I see
> > > > > >>> > that it is being used (a bit), but am concerned if no one with knowledge of
> > > > > >>> > this executor is willing to maintain it.
> > > > > >>> >
> > > > > >>> > I’ve personally never used Dask and the DaskExecutor isn’t super high on
> > > > > >>> > my priority list compared to things like autoscaling, DAG serialization,
> > > > > >>> > etc.
> > > > > >>> >
> > > > > >>> > On Fri, Jan 17, 2020 at 6:07 AM, Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > >>> > wrote:
> > > > > >>> > Do we have anyone here who uses Dask Executor and would like to test it/fix
> > > > > >>> > the tests? They are marked now as xfailed (expected to fail) and it would
> > > > > >>> > be great to fix them.
> > > > > >>> >
> > > > > >>> > J.
> > > > > >>> >
> > > > > >>> > On Tue, Jan 14, 2020 at 12:18 AM Darren Weber
> > > > > >>> > <dweber.consult...@gmail.com> wrote:
> > > > > >>> >
> > > > > >>> > > +1 for keeping it and fixing tests
> > > > > >>> > >
> > > > > >>> > > PS, I also noticed the skipped tests while looking at an option to use the
> > > > > >>> > > async client feature; if/when I get time to get back on that and figure out
> > > > > >>> > > how the test setup needs to work, I might also discover how to enable tests
> > > > > >>> > > for the non-async executor. No promises, just noting that I'm aware of it
> > > > > >>> > > too.
> > > > > >>> > >
> > > > > >>> > > On Mon, Jan 13, 2020 at 8:06 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > >>> > > wrote:
> > > > > >>> > >
> > > > > >>> > > > For now I marked the skipped tests we had (including Dask) as
> > > > > >>> > > > pytest.mark.xfail (means - expected to fail). They will be executed and
> > > > > >>> > > > summarized as XFail tests and we will have to deal with them at some
> > > > > >>> > > > point.
> > > > > >>> > > >
> > > > > >>> > > > I think we will have to decide if we want to keep it or not, and either
> > > > > >>> > > > remove both tests and executor or fix the tests.
> > > > > >>> > > >
> > > > > >>> > > > J.
> > > > > >>> > > >
> > > > > >>> > > > On Mon, Jan 13, 2020 at 4:40 PM Shaw, Damian P.
> > > > > >>> > > > <damian.sha...@credit-suisse.com> wrote:
> > > > > >>> > > >
> > > > > >>> > > > > FYI I used Dask instead of Local Executor when first starting Airflow, it
> > > > > >>> > > > > was a great way to make sure the Executor and Scheduler weren’t tied to
> > > > > >>> > > > > each other with no difficulty in set-up. But once I actually started
> > > > > >>> > > > > deploying to multiple boxes I needed queue names pretty quickly. So not
> > > > > >>> > > > > going to say it's needed but for me it was a helpful stepping stone.
> > > > > >>> > > > >
> > > > > >>> > > > > -----Original Message-----
> > > > > >>> > > > > From: Ash Berlin-Taylor <a...@apache.org>
> > > > > >>> > > > > Sent: Sunday, January 12, 2020 17:38
> > > > > >>> > > > > To: dev@airflow.apache.org
> > > > > >>> > > > > Cc: dev@airflow.apache.org
> > > > > >>> > > > > Subject: Re: Remove Dask Executor in Airflow 2.0 ?
> > > > > >>> > > > >
> > > > > >>> > > > > It hasn't been discussed before, but unlike the Mesos one this one has
> > > > > >>> > > > > seen a (tiny) bit of activity in 1.10 so at least one person is using it
> > > > > >>> > > > > https://github.com/apache/airflow/pull/5273
> > > > > >>> > > > >
> > > > > >>> > > > > On Jan 12 2020, at 9:05 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > > > >>> > > > > > I am finishing the PR on separating integrations and improving our CI
> > > > > >>> > > > > > footprint (https://github.com/apache/airflow/pull/7091) but during
> > > > > >>> > > > > > this change I have found that we have - apparently - dysfunctional
> > > > > >>> > > > > > DaskExecutor in Airflow 2.0.
> > > > > >>> > > > > >
> > > > > >>> > > > > > There is a "test_dask_executor.py" for which all tests are skipped.
> > > > > >>> > > > > > And they fail when I try to run the tests. I tried to look for any
> > > > > >>> > > > > > reference in devlist archives but I couldn't find anything about it.
> > > > > >>> > > > > >
> > > > > >>> > > > > > Can someone shed some light on this? Should we remove Dask executor
> > > > > >>> > > > > > completely from Airflow 2.0 ? Or should we fix the tests/executor ?
> > > > > >>> > > > > > Has it been discussed ?
> > > > > >>> > > > > >
> > > > > >>> > > > > > J.
> > > > > >>> > > > > >
> > > > > >>> > > > > > --
> > > > > >>> > > > > > Jarek Potiuk
> > > > > >>> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >>> > > > > >
> > > > > >>> > > > > > M: +48 660 796 129 <+48660796129>
> > > > > >>> > >
> > > > > >>> > > --
> > > > > >>> > > Darren L. Weber, Ph.D.
> > > > > >>> > > http://psdlw.users.sourceforge.net/
> > > > > >>> > > http://psdlw.users.sourceforge.net/wordpress/
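
For anyone following along, here is a rough sketch of what the pytest.mark.xfail marking mentioned in the thread looks like in practice. It is only an illustration, not the code in Airflow's test_dask_executor.py: the helper run_on_dask_executor and the test name are hypothetical stand-ins, and the helper simply raises to mimic a test that cannot pass yet.

```python
import pytest


def run_on_dask_executor(value):
    # Hypothetical stand-in for code that would submit work to a Dask
    # cluster; it raises to mimic a test that currently cannot pass.
    raise RuntimeError("no Dask scheduler available in this environment")


@pytest.mark.xfail(reason="DaskExecutor tests need a working test setup")
def test_dask_executor_runs_task():
    # Unlike a skipped test, pytest still executes this one. A failure is
    # reported as XFAIL instead of FAILED, and an unexpected pass as XPASS.
    assert run_on_dask_executor(1) == 2
```

Running pytest with the -rx option lists the xfailed tests with their reasons in the short summary, which makes it easier to keep track of what still needs fixing.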