> For the docker example, you'd almost want to inject or "layer" the DAG
> script and airflow package at run time.

Something sort of like Heroku build packs?

-a

On 20 December 2019 23:43:30 GMT, Maxime Beauchemin 
<[email protected]> wrote:
>This reminds me of the "DagFetcher" idea. Basically a new abstraction that
>can fetch a DAG object from anywhere and run a task. In theory you could
>extend it to do "zip on s3", "pex on GFS", "docker on artifactory" or
>whatever makes sense to your organization. In the proposal I wrote about
>using a universal URI scheme to identify DAG artifacts, with support for
>versioning, as in s3://company_dagbag/some_dag@latest
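A minimal sketch of what such a scheme-dispatched DagFetcher could look like (all class and function names here are hypothetical, not an existing Airflow API):

```python
from urllib.parse import urlparse

# Registry mapping URI schemes ("s3", "docker", ...) to fetcher classes.
FETCHERS = {}

def register_fetcher(scheme):
    """Class decorator that registers a fetcher for a URI scheme."""
    def wrap(cls):
        FETCHERS[scheme] = cls
        return cls
    return wrap

class DagFetcher:
    def fetch(self, uri):
        raise NotImplementedError

@register_fetcher("s3")
class S3ZipFetcher(DagFetcher):
    def fetch(self, uri):
        # Would download and unpack the zip from S3 here; stubbed out.
        return f"zip artifact from {uri}"

def parse_dag_uri(uri):
    """Split e.g. s3://company_dagbag/some_dag@latest into parts."""
    parsed = urlparse(uri)
    path, _, version = parsed.path.lstrip("/").partition("@")
    return parsed.scheme, parsed.netloc, path, version or "latest"

def fetch_dag(uri):
    """Dispatch to whichever fetcher handles the URI's scheme."""
    scheme, *_ = parse_dag_uri(uri)
    return FETCHERS[scheme]().fetch(uri)
```

New backends ("pex on GFS", "docker on artifactory") would then be one `@register_fetcher` class each, rather than branches in core code.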
>
>One challenge is around *not* serializing Airflow-specific code in the
>artifact/docker, otherwise you end up with a messy heterogeneous cluster
>that runs multiple Airflow versions. For the docker example, you'd almost
>want to inject or "layer" the DAG script and airflow package at run time.
>
>Max
>
>On Mon, Dec 16, 2019 at 7:17 AM Dan Davydov
><[email protected]>
>wrote:
>
>> The zip support is a bit of a hack and was a bit controversial when it
>> was added. I think if we go down the path of supporting more DAG
>> sources, we should make sure we have the right interface in place, so
>> that we avoid the current `if format == zip then: else:` and don't
>> tightly couple to specific DAG sourcing implementations. Personally I
>> feel that Docker makes more sense than wheels (since Docker images are
>> fully self-contained even at the binary dependency level), but if we go
>> down the interface route it might be fine to add support for both
>> Docker and wheels.
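One way to picture the interface being suggested here, sketched with hypothetical names: each format registers itself, and the loader picks a source by file extension instead of a hard-coded `if format == zip` branch.

```python
import os
from abc import ABC, abstractmethod

class DagSource(ABC):
    """A pluggable source of DAGs for one packaging format."""
    extensions: tuple = ()

    @abstractmethod
    def load(self, path):
        ...

_SOURCES = []

def register(source_cls):
    """Class decorator that adds a source to the lookup list."""
    _SOURCES.append(source_cls())
    return source_cls

@register
class ZipDagSource(DagSource):
    extensions = (".zip",)
    def load(self, path):
        return f"DAGs loaded from zip {path}"  # stub

@register
class WheelDagSource(DagSource):
    extensions = (".whl",)
    def load(self, path):
        return f"DAGs loaded from wheel {path}"  # stub

def load_dags(path):
    """Find the source whose extensions match and delegate to it."""
    ext = os.path.splitext(path)[1]
    for source in _SOURCES:
        if ext in source.extensions:
            return source.load(path)
    raise ValueError(f"no DAG source handles {ext!r}")
```

A Docker-based source would slot in the same way, keyed on an image reference rather than a file extension.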
>>
>> On Mon, Dec 16, 2019 at 11:19 AM Björn Pollex
>> <[email protected]> wrote:
>>
>> > Hi Jarek,
>> >
>> > This sounds great. Is this possibly related to the work started in
>> > https://github.com/apache/airflow/pull/730?
>> >
>> > I'm not sure I'm following your proposal entirely. A great first step
>> > would be to support loading DAGs from entry_points, as proposed in
>> > the closed PR above. This would already enable most of the features
>> > you've mentioned below. Each DAG could be a Python package, and it
>> > would carry all the information about required packages in its
>> > package metadata.
>> >
>> > Is that what you're envisioning? If so, I'd be happy to support you
>> > with the implementation!
>> >
>> > Also, I think that while the idea of creating a temporary virtual
>> > environment for running tasks is very useful, I'd like it to be
>> > optional, as it can also add a lot of overhead to running tasks.
>> >
>> > Cheers,
>> >
>> >         Björn
>> >
>> > > On 14. Dec 2019, at 11:10, Jarek Potiuk
><[email protected]>
>> > wrote:
>> > >
>> > > I had a lot of interesting discussions over the last few days with
>> > > Apache Airflow users at PyData Warsaw 2019 (I was actually quite
>> > > surprised how many people use Airflow in Poland). One discussion
>> > > brought up an interesting subject: packaging DAGs in wheel format.
>> > > The users mentioned that they are super-happy using .zip-packaged
>> > > DAGs, but they think it could be improved with the wheel format
>> > > (which is also .zip, BTW). Maybe it was already mentioned in some
>> > > discussions before, but I have not found any.
>> > >
>> > > *Context:*
>> > >
>> > > We are well on the way to implementing "AIP-21 Changing import
>> > > paths" and will provide backport packages for Airflow 1.10. As a
>> > > next step we want to target AIP-8.
>> > > One of the problems with implementing AIP-8 (splitting
>> > > hooks/operators into separate packages) is dependencies. Different
>> > > operators/hooks might have different dependencies if maintained
>> > > separately. Currently we have a common set of dependencies because
>> > > we have only one setup.py, but if we split into separate packages,
>> > > this might change.
>> > >
>> > > *Proposal:*
>> > >
>> > > Our users - who love the .zip DAG distribution - proposed that we
>> > > package the DAGs and all related packages in a wheel package
>> > > instead of a pure .zip. This would allow users to install extra
>> > > dependencies needed by the DAG. And it struck me that we could
>> > > indeed do that for DAGs, but also mitigate most of the dependency
>> > > problems for separately-packaged operators.
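A sketch of what such a DAG wheel's metadata could look like (package name, pins, and the entry-point group are all illustrative, not an agreed convention):

```toml
# pyproject.toml of a hypothetical DAG package, built with `pip wheel .`
[project]
name = "my-etl-dag"
version = "1.0.0"
dependencies = [
    # DAG-specific extras travel inside the wheel's metadata
    "requests>=2.22",
    "pandas>=0.25",
]

# Assumed discovery convention so the scheduler can find the DAG object
[project.entry-points."airflow.dags"]
my_etl = "my_etl_dag.dag:dag"
```

Installing the wheel would then pull in the DAG's own dependencies via pip, instead of shipping them loose inside a .zip.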
>> > >
>> > > The proposal from our users was to package the extra dependencies
>> > > together with the DAG in a wheel file. This is quite cool on its
>> > > own, but I thought we might actually use the same approach to solve
>> > > the dependency problem with AIP-8.
>> > >
>> > > I think we could implement "operator group" -> extra -> "pip
>> > > packages" dependencies (we need them anyway for AIP-21) and then we
>> > > could have wheel packages with all the "extra" dependencies for
>> > > each group of operators.
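The "operator group" -> extra -> "pip packages" mapping could be as simple as a lookup table; the group names and packages below are illustrative, not real Airflow metadata:

```python
# Hypothetical mapping from operator group to the pip packages its
# "extra" wheel would bundle.
OPERATOR_GROUP_EXTRAS = {
    "google": ["google-cloud-storage", "google-cloud-bigquery"],
    "amazon": ["boto3"],
    "postgres": ["psycopg2-binary"],
}

def packages_for_groups(groups):
    """Collect the pip packages needed for a set of operator groups."""
    packages = set()
    for group in groups:
        packages.update(OPERATOR_GROUP_EXTRAS.get(group, []))
    return sorted(packages)
```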
>> > >
>> > > A worker executing an operator could have the "core" dependencies
>> > > installed initially, but when it is supposed to run an operator it
>> > > could create a virtualenv, install the required "extra" from
>> > > wheels, run the task for this operator in that virtualenv, and
>> > > remove the virtualenv afterwards. We could have such package-wheels
>> > > prepared (one wheel package per operator group) and distributed
>> > > either the same way as DAGs or using some shared binary repository
>> > > (and cached on the worker).
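A minimal sketch of that per-task virtualenv flow using only the standard library (the function name and wheel-passing convention are hypothetical):

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

def run_task_in_venv(task_cmd, extra_wheels=()):
    """Create a throwaway venv, install the extra wheels into it,
    run the task with the venv's interpreter, then discard everything."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "task_env"
        # pip is only needed inside the venv if there are wheels to install
        venv.create(env_dir, with_pip=bool(extra_wheels))
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        python = env_dir / bin_dir / "python"
        if extra_wheels:
            subprocess.run(
                [str(python), "-m", "pip", "install", *map(str, extra_wheels)],
                check=True,
            )
        # Run the task under the isolated interpreter; the TemporaryDirectory
        # context removes the whole venv when the task finishes.
        return subprocess.run([str(python), *task_cmd], check=True)
```

The overhead Björn mentions above is visible here: a venv plus a pip install per task, which is why caching resolved environments (or making the whole mechanism optional) matters.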
>> > >
>> > > Such a dynamically created virtualenv also has the advantage that
>> > > if someone has a DAG with specific dependencies, they could be
>> > > embedded in the DAG wheel, installed from it into the virtualenv,
>> > > and the virtualenv would be removed after the task is finished.
>> > >
>> > > The advantage of this approach is that each DAG's extra
>> > > dependencies are isolated, and you could even have different
>> > > versions of the same dependency used by different DAGs. I think
>> > > that could save a lot of headaches for many users.
>> > >
>> > > For me that whole idea sounds pretty cool.
>> > >
>> > > Let me know what you think.
>> > >
>> > > J.
>> > >
>> > >
>> > > --
>> > >
>> > > Jarek Potiuk
>> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > >
>> > > M: +48 660 796 129 <+48660796129>
>> >
>> >
>>
