We are actually planning (pending confirmation) a "First-time Apache Airflow Contributors" training at PyCon US in April. If there is good use of Dask and we get science-oriented users running Airflow with Dask, I am all for closer cooperation on that topic :).
J.

On Mon, Jan 20, 2020 at 5:25 PM Darren Weber <[email protected]> wrote:

> Thanks for the ping on https://github.com/dask/dask/issues/5803
>
> I'm curious about how dask async features might be low-hanging fruit for Airflow scaling
> - https://distributed.dask.org/en/latest/asynchronous.html
> - https://github.com/apache/airflow/pull/6984
>
> Our company has scientific workflows and it uses dask, usually on large EC2 instances or batch jobs. I've been getting familiar with dask from a user perspective; I don't yet know the internals from a dev-perspective. I mostly use dask.delayed to scale threads/processes on a local host, with a simple concurrent.futures API. Dask.distributed can also run a cluster with client connections (I previously worked with spark a bit and dask has some good documentation on the comparisons between spark and dask). There are also some options for auto-scaling a dask cluster using k8s - https://docs.dask.org/en/latest/setup/adaptive.html - so you get an auto-scaling cluster with a lot of features for scientific computing with the scipy-compatible stack.
>
> I can't promise to complete anything in a timely manner, despite any proposals to remove dask executors entirely. I may be in-n-out of these discussions from time-to-time, possibly silent for several weeks at a time while I'm heads down on my full-time position. So if Airflow 2.0 removes them for whatever reason, I would hope it could be possible to add them back in Airflow 2.1 if the work can be done to get it working and the design patterns make sense and/or there is a larger user community than anyone is yet aware of. At present, I don't hear a clear specification for having it work or an argument that it doesn't work at all, but I hear and see that unit tests are disabled. It might be possible to identify in dask itself how to set up the test environment. It might help to better understand the niche that dask serves well.
>
> The online forums and github may suffice, but if it would be possible to find funding to sponsor a joint hack-a-thon at PyCon or something, that would be great. As a new contributor to Airflow, I'm still learning the ropes and it would be good to attend an Airflow contributor workshop (maybe someone could spin one up in the bay-area?).
>
> Best,
> Darren
>
> On Sun, Jan 19, 2020 at 9:28 AM Jarek Potiuk <[email protected]> wrote:
>
> > Seems like there is interest: https://github.com/dask/dask/issues/5803 :).
> > Let's see where it gets us.
> >
> > J.
> >
> > On Sat, Jan 18, 2020 at 9:46 PM Jarek Potiuk <[email protected]> wrote:
> >
> > > Following the discussion on Dask's Gitter, I created an issue in Dask's GitHub:
> > > https://github.com/dask/dask/issues/5803
> > >
> > > Let's see if we can get someone from the Dask community interested.
> > >
> > > On Fri, Jan 17, 2020 at 10:00 PM Jarek Potiuk <[email protected]> wrote:
> > >
> > >> Good idea :) Doing that.
> > >>
> > >> On Fri, Jan 17, 2020 at 9:58 PM Daniel Imberman <[email protected]> wrote:
> > >>
> > >>> Maybe we can reach out to a company that does Dask as a service?
> > >>>
> > >>> On Fri, Jan 17, 2020 at 9:31 AM, Jarek Potiuk <[email protected]> wrote:
> > >>> Yeah. I think if we do not find anyone willing to champion it (no matter committer or contributor), I would be for dropping it.
> > >>>
> > >>> J.
> > >>>
> > >>> On Fri, Jan 17, 2020 at 6:07 PM Daniel Imberman <[email protected]> wrote:
> > >>>
> > >>> > I think we need to ask “who is going to champion this executor.” I see that it is being used (a bit), but am concerned if no one with knowledge of this executor is willing to maintain it.
> > >>> >
> > >>> > I’ve personally never used Dask and the DaskExecutor isn’t super high on my priority list compared to things like autoscaling, DAG serialization, etc.
> > >>> >
> > >>> > On Fri, Jan 17, 2020 at 6:07 AM, Jarek Potiuk <[email protected]> wrote:
> > >>> > Do we have anyone here who uses Dask Executor and would like to test it/fix the tests? They are now marked as xfailed (expected to fail) and it would be great to fix them.
> > >>> >
> > >>> > J.
> > >>> >
> > >>> > On Tue, Jan 14, 2020 at 12:18 AM Darren Weber <[email protected]> wrote:
> > >>> >
> > >>> > > +1 for keeping it and fixing tests
> > >>> > >
> > >>> > > PS, I also noticed the skipped tests while looking at an option to use the async client feature; if/when I get time to get back on that and figure out how the test setup needs to work, I might also discover how to enable tests for the non-async executor. No promises, just noting that I'm aware of it too.
> > >>> > >
> > >>> > > On Mon, Jan 13, 2020 at 8:06 AM Jarek Potiuk <[email protected]> wrote:
> > >>> > >
> > >>> > > > For now I marked the skipped tests we had (including Dask) as pytest.mark.xfail (meaning: expected to fail). They will be executed and summarized as XFail tests and we will have to deal with them at some point.
> > >>> > > >
> > >>> > > > I think we will have to decide if we want to keep it or not, and either remove both tests and executor or fix the tests.
> > >>> > > >
> > >>> > > > J.
> > >>> > > >
> > >>> > > > On Mon, Jan 13, 2020 at 4:40 PM Shaw, Damian P. <[email protected]> wrote:
> > >>> > > >
> > >>> > > > > FYI I used Dask instead of Local Executor when first starting Airflow; it was a great way to make sure the Executor and Scheduler weren’t tied to each other, with no difficulty in set-up. But once I actually started deploying to multiple boxes I needed queue names pretty quickly. So I'm not going to say it's needed, but for me it was a helpful stepping stone.
> > >>> > > > >
> > >>> > > > > -----Original Message-----
> > >>> > > > > From: Ash Berlin-Taylor <[email protected]>
> > >>> > > > > Sent: Sunday, January 12, 2020 17:38
> > >>> > > > > To: [email protected]
> > >>> > > > > Cc: [email protected]
> > >>> > > > > Subject: Re: Remove Dask Executor in Airflow 2.0 ?
> > >>> > > > >
> > >>> > > > > It hasn't been discussed before, but unlike the Mesos one this one has seen a (tiny) bit of activity in 1.10 so at least one person is using it https://github.com/apache/airflow/pull/5273
> > >>> > > > >
> > >>> > > > > On Jan 12 2020, at 9:05 pm, Jarek Potiuk <[email protected]> wrote:
> > >>> > > > > > I am finishing the PR on separating integrations and improving our CI footprint (https://github.com/apache/airflow/pull/7091) but during this change I have found that we have an apparently dysfunctional DaskExecutor in Airflow 2.0.
> > >>> > > > > >
> > >>> > > > > > There is a "test_dask_executor.py" for which all tests are skipped, and they fail when I try to run them. I tried to look for any reference in the devlist archives but I couldn't find anything about it.
> > >>> > > > > >
> > >>> > > > > > Can someone shed some light on this? Should we remove the Dask executor completely from Airflow 2.0? Or should we fix the tests/executor? Has it been discussed?
> > >>> > > > > >
> > >>> > > > > > J.
> > >>> > > > > >
> > >>> > > > > > --
> > >>> > > > > > Jarek Potiuk
> > >>> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>> > > > > >
> > >>> > > > > > M: +48 660 796 129 <+48660796129>
> > >>> > > > > > [image: Polidea] <https://www.polidea.com/>
> > >>> > >
> > >>> > > --
> > >>> > > Darren L. Weber, Ph.D.
> > >>> > > http://psdlw.users.sourceforge.net/
> > >>> > > http://psdlw.users.sourceforge.net/wordpress/

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
