Some initial work on learning and applying dask-distributed unit testing is
in progress (WIP) in
- https://github.com/apache/airflow/pull/6984

All the dask-distributed pytest fixtures are already available in Airflow
(with the dask option installed).

The contribution docs are helpful, i.e.
- https://docs.dask.org/en/latest/develop.html
- https://distributed.dask.org/en/latest/develop.html
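
For anyone following along, here is a minimal sketch of the submit/result
round trip that an executor unit test exercises. It is an illustration only,
not code from the PR: it uses the standard-library ThreadPoolExecutor as a
stand-in for a dask cluster (dask.distributed's Client has a similar
submit()/result() interface), and the names inc, run_round_trip and
test_round_trip are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    """Trivial task used as the unit of work."""
    return x + 1


def run_round_trip(executor, values):
    """Submit one task per value and collect results in submission order."""
    futures = [executor.submit(inc, v) for v in values]
    # A timeout keeps a hung worker from blocking the test run forever.
    return [f.result(timeout=10) for f in futures]


def test_round_trip():
    # Assumption: a local thread pool stands in for a cluster-backed client.
    with ThreadPoolExecutor(max_workers=2) as pool:
        assert run_round_trip(pool, [1, 2, 3]) == [2, 3, 4]
```

Under pytest, test_round_trip is collected like any other test function;
swapping the thread pool for a real dask client is the part that the
dask-distributed fixtures are meant to cover.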


On Mon, Jan 20, 2020 at 8:57 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yes definitely, we are actually going to add some content from the Workshop
> during the Airflow Summit to Airflow Website and a link to that page would
> be added to CONTRIBUTING.md.
>
>
> On Mon, Jan 20, 2020 at 10:18 PM Darren Weber <dweber.consult...@gmail.com>
> wrote:
>
> > Via the GSOC thread, I found
> > https://cwiki.apache.org/confluence/display/AIRFLOW/First+time+contributor%27s+workshop
> > - agree with a comment on that thread that a wiki page link from
> > CONTRIBUTING.md could be useful too
> >
> > On Mon, Jan 20, 2020 at 8:30 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> > > We are actually planning (pending confirmation) a "First time Apache
> > > Airflow Contributor's" training at PyCon US in April. I think if there
> > > is good usage of Dask and we get science-oriented users using Airflow
> > > with Dask, I am all for closer cooperation on that topic :).
> > >
> > > J.
> > >
> > >
> > > On Mon, Jan 20, 2020 at 5:25 PM Darren Weber <dweber.consult...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the ping on https://github.com/dask/dask/issues/5803
> > > >
> > > > I'm curious about how dask async features might be low-hanging fruit
> > > > for Airflow scaling
> > > > - https://distributed.dask.org/en/latest/asynchronous.html
> > > > - https://github.com/apache/airflow/pull/6984
> > > >
> > > > Our company has scientific workflows and it uses dask, usually on
> > > > large EC2 instances or batch jobs.  I've been getting familiar with
> > > > dask from a user perspective; I don't yet know the internals from a
> > > > dev-perspective.  I mostly use dask.delayed to scale
> > > > threads/processes on a local host, with a simple concurrent.futures
> > > > API.  Dask.distributed can also run a cluster with client
> > > > connections (I previously worked with spark a bit and dask has some
> > > > good documentation on the comparisons between spark and dask).
> > > > There are also some options for auto-scaling a dask cluster using
> > > > k8s - https://docs.dask.org/en/latest/setup/adaptive.html - so you
> > > > get an auto-scaling cluster with a lot of features for scientific
> > > > computing with the scipy-compatible stack.
> > > >
> > > > I can't promise to complete anything in a timely manner, despite any
> > > > proposals to remove dask executors entirely.  I may be in-n-out of
> > > > these discussions from time-to-time, possibly silent for several
> > > > weeks at a time while I'm heads down on my full-time position.  So if
> > > > Airflow 2.0 removes them for whatever reason, I would hope it could
> > > > be possible to add them back in Airflow 2.1 if the work can be done
> > > > to get it working and the design patterns make sense and/or there is
> > > > a larger user community than anyone is yet aware of.  At present, I
> > > > don't hear a clear specification for having it work or an argument
> > > > that it doesn't work at all, but I hear and see that unit tests are
> > > > disabled.  It might be possible to identify in dask itself how to set
> > > > up the test environment.  It might help to better understand the
> > > > niche that dask serves well.
> > > >
> > > > The online forums and github may suffice, but if it would be possible
> > > > to find funding to sponsor a joint hack-a-thon at PyCon or something,
> > > > that would be great.  As a new contributor to Airflow, I'm still
> > > > learning the ropes and it would be good to attend an Airflow
> > > > contributor workshop (maybe someone could spin one up in the
> > > > bay-area?).
> > > >
> > > > Best,
> > > > Darren
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Jan 19, 2020 at 9:28 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > wrote:
> > > >
> > > > > Seems like there is an interest
> > > > > https://github.com/dask/dask/issues/5803 :).
> > > > > Let's see where it gets us.
> > > > >
> > > > > J.
> > > > >
> > > > > On Sat, Jan 18, 2020 at 9:46 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > wrote:
> > > > >
> > > > > > Following a discussion on Dask's gitter, I created an issue on
> > > > > > Dask's GitHub:
> > > > > > https://github.com/dask/dask/issues/5803
> > > > > >
> > > > > > Let's see if we can get someone from Dask community interested.
> > > > > >
> > > > > > On Fri, Jan 17, 2020 at 10:00 PM Jarek Potiuk
> > > > > > <jarek.pot...@polidea.com> wrote:
> > > > > >
> > > > > >> Good idea :) doing that,
> > > > > >>
> > > > > >> On Fri, Jan 17, 2020 at 9:58 PM Daniel Imberman <
> > > > > >> daniel.imber...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Maybe we can reach out to a company that does Dask as a service?
> > > > > >>>
> > > > > >>> via Newton Mail [
> > > > > >>> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.32&pv=10.14.6&source=email_footer_2
> > > > > >>> ]
> > > > > >>> On Fri, Jan 17, 2020 at 9:31 AM, Jarek Potiuk
> > > > > >>> <jarek.pot...@polidea.com> wrote:
> > > > > >>> Yeah. I think if we do not find anyone willing to champion it (no
> > > > > >>> matter committer or contributor), I would be for dropping it.
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > >>> On Fri, Jan 17, 2020 at 6:07 PM Daniel Imberman <
> > > > > >>> daniel.imber...@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> > I think we need to ask “who is going to champion this
> > > > > >>> > executor.” I see that it is being used (a bit), but am
> > > > > >>> > concerned if no one with knowledge of this executor is willing
> > > > > >>> > to maintain it.
> > > > > >>> >
> > > > > >>> > I’ve personally never used Dask and the DaskExecutor isn’t
> > > > > >>> > super high on my priority list compared to things like
> > > > > >>> > autoscaling, DAG serialization, etc.
> > > > > >>> >
> > > > > >>> > On Fri, Jan 17, 2020 at 6:07 AM, Jarek Potiuk
> > > > > >>> > <jarek.pot...@polidea.com> wrote:
> > > > > >>> > Do we have anyone here who uses Dask Executor and would like to
> > > > > >>> > test it/fix the tests? They are marked now as xfailed (expected
> > > > > >>> > to fail) and it would be great to fix them.
> > > > > >>> >
> > > > > >>> > J.
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > On Tue, Jan 14, 2020 at 12:18 AM Darren Weber
> > > > > >>> > <dweber.consult...@gmail.com> wrote:
> > > > > >>> >
> > > > > >>> > > +1 for keeping it and fixing tests
> > > > > >>> > >
> > > > > >>> > > PS, I also noticed the skipped tests while looking at an
> > > > > >>> > > option to use the async client feature; if/when I get time to
> > > > > >>> > > get back on that and figure out how the test setup needs to
> > > > > >>> > > work, I might also discover how to enable tests for the
> > > > > >>> > > non-async executor. No promises, just noting that I'm aware
> > > > > >>> > > of it too.
> > > > > >>> > >
> > > > > >>> > > On Mon, Jan 13, 2020 at 8:06 AM Jarek Potiuk
> > > > > >>> > > <jarek.pot...@polidea.com> wrote:
> > > > > >>> > >
> > > > > >>> > > > For now I marked the skipped tests we had (including Dask)
> > > > > >>> > > > as pytest.mark.xfail (means - expected to fail). They will
> > > > > >>> > > > be executed and summarized as XFail tests and we will have
> > > > > >>> > > > to deal with them at some point.
> > > > > >>> > > >
> > > > > >>> > > > I think we will have to decide if we want to keep it or
> > > > > >>> > > > not, and either remove both tests and executor or fix the
> > > > > >>> > > > tests.
> > > > > >>> > > >
> > > > > >>> > > > J.
> > > > > >>> > > >
> > > > > >>> > > > On Mon, Jan 13, 2020 at 4:40 PM Shaw, Damian P. <
> > > > > >>> > > > damian.sha...@credit-suisse.com> wrote:
> > > > > >>> > > >
> > > > > >>> > > > > FYI I used Dask instead of Local Executor when first
> > > > > >>> > > > > starting Airflow, it was a great way to make sure the
> > > > > >>> > > > > Executor and Scheduler weren’t tied to each other with no
> > > > > >>> > > > > difficulty in set-up. But once I actually started
> > > > > >>> > > > > deploying to multiple boxes I needed queue names pretty
> > > > > >>> > > > > quickly. So not going to say it's needed but for me it
> > > > > >>> > > > > was a helpful stepping stone.
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > -----Original Message-----
> > > > > >>> > > > > From: Ash Berlin-Taylor <a...@apache.org>
> > > > > >>> > > > > Sent: Sunday, January 12, 2020 17:38
> > > > > >>> > > > > To: dev@airflow.apache.org
> > > > > >>> > > > > Cc: dev@airflow.apache.org
> > > > > >>> > > > > Subject: Re: Remove Dask Executor in Airflow 2.0 ?
> > > > > >>> > > > >
> > > > > >>> > > > > It hasn't been discussed before, but unlike the Mesos one
> > > > > >>> > > > > this one has seen a (tiny) bit of activity in 1.10 so at
> > > > > >>> > > > > least one person is using it
> > > > > >>> > > > > https://github.com/apache/airflow/pull/5273
> > > > > >>> > > > >
> > > > > >>> > > > > On Jan 12 2020, at 9:05 pm, Jarek Potiuk
> > > > > >>> > > > > <jarek.pot...@polidea.com> wrote:
> > > > > >>> > > > > > I am finishing the PR on separating integrations and
> > > > > >>> > > > > > improving our CI footprint
> > > > > >>> > > > > > (https://github.com/apache/airflow/pull/7091) but during
> > > > > >>> > > > > > this change I have found that we have - apparently -
> > > > > >>> > > > > > dysfunctional DaskExecutor in Airflow 2.0.
> > > > > >>> > > > > >
> > > > > >>> > > > > > There is a "test_dask_executor.py" for which all tests are
> > > > > >>> > > > > > skipped. And they fail when I try to run the tests. I tried
> > > > > >>> > > > > > to look for any reference in devlist archives but I
> > > > > >>> > > > > > couldn't find anything about it.
> > > > > >>> > > > > >
> > > > > >>> > > > > > Can someone shed some light on this? Should we remove Dask
> > > > > >>> > > > > > executor completely from Airflow 2.0 ? Or should we fix the
> > > > > >>> > > > > > tests/executor ? Has it been discussed ?
> > > > > >>> > > > > >
> > > > > >>> > > > > > J.
> > > > > >>> > > > > >
> > > > > >>> > > > > > --
> > > > > >>> > > > > > Jarek Potiuk
> > > > > >>> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >>> > > > > >
> > > > > >>> > > > > > M: +48 660 796 129 <+48660796129>
> > > > > >>> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > >>> > > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > ===============================================================================
> > > > > >>> > > > > Please access the attached hyperlink for an important
> > > > > >>> > > > > electronic communications disclaimer:
> > > > > >>> > > > > http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
> > > > > >>> > > > > ===============================================================================
> > > > > >>> > > > >
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > --
> > > > > >>> > > Darren L. Weber, Ph.D.
> > > > > >>> > > http://psdlw.users.sourceforge.net/
> > > > > >>> > > http://psdlw.users.sourceforge.net/wordpress/
> > > > > >>> > >
