Hello Andrey,

I think both Maxime and I asked some important questions. If you want to
proceed with the donation, it would be great if you let us know what you
think about the issues we mentioned. I also know that Michael, whom I met
at the workshops in Berlin, was very interested in this, so maybe you can
take part in the discussion. If you are willing to donate the code and
continue the discussion on it, I think we have to start, well...
discussing :).

I just copied our points below to make it easier to answer both of us at
the same time:

Jarek:

1) Is the CWL package more of a converter from CWL to Python DAG files
(that can then be scheduled as usual), or does it run alongside the
scheduler and schedule tasks and operators separately using a different
scheduling engine? As a reference, there is the
https://github.com/GoogleCloudPlatform/oozie-to-airflow converter from
Oozie XML to Airflow DAGs. I think the biggest advantage of Airflow is
being able to modify and iterate quickly using Python code, so having a
Python DAG generated from CWL might be a good idea. Even if it is not
perfect, the user can still modify and extend it manually later rather
than relying on all the features of CWL being implemented.

2) I'd also like to understand what dependencies it introduces on Airflow:
does it rely on certain internals of Airflow that could make Airflow's
evolution more difficult? Also, we already have a roadmap for Airflow 2.0;
certain incompatibilities are already implemented, more are planned (and
more, not planned yet, are to come). Is the CWL importer compatible with
1.10 only, or with both 1.10 and (the current state of) 2.0? Have you been
following some of the discussions about 2.0, and are you aware of any
potential incompatibilities?

3) What benefits do you see in having the Airflow CWL package managed by
the Airflow community rather than the CWL one? It could work both ways: it
could be managed by either of the communities (as usual in the case of such
imports), but I think it has to be carefully weighed who maintains it
eventually. It all depends on how much one could rely on the other, what
the release cycle of new CWL versions is vs. Airflow versions, etc. Could
you share your thought process and why you think it should be part of
Airflow?

Maxime:

4) Personally I like the idea of an ecosystem of packages (and repos)
managed and maintained by their specialists. That way they can have their
own CI, their own release processes and cycles, and "namespaced"
notifications. If anything I'd rather push in the direction of breaking
Airflow into many smaller packages (core, scheduler, web, ...) as opposed
to tacking other projects on top of it.

5) Also arguably Airflow's DSL may be more "common" than CWL. Clearly CWL
has more focussed intentions around creating something universal, but to me
that doesn't necessarily make it more legitimate or common than other specs
(Oozie, Azkaban, Informatica, ...) and should be treated similarly (would
we want to include extensions to all these as part of Airflow?).

6) I also prefer the codegen/migration approach (I think the
`oozie-to-airflow` tool does that) to allow a path that resolves the common
denominator limitations. How can this tooling expose features that are
proper to Airflow (pools, priority weights, xcoms, callbacks!, ...)?
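
To make the codegen idea from points 1) and 6) a bit more concrete, here is
a minimal sketch of what "generate a DAG file, then edit it by hand" could
look like. It is only an illustration under assumptions: the input dict is
a made-up, simplified workflow description (not real CWL), the function
name is hypothetical, step names are assumed to be valid Python
identifiers, and none of this reflects how cwl-airflow or oozie-to-airflow
actually work. The import path in the generated text is the Airflow 1.10
one.

```python
# Hypothetical sketch: turn a simplified, CWL-like workflow description into
# the source text of an Airflow DAG file. The input format and every name
# here are illustrative assumptions, not the CWL spec or the cwl-airflow API.


def generate_dag_source(workflow):
    """Return Python source text for a DAG file built from `workflow`."""
    lines = [
        "from datetime import datetime",
        "from airflow import DAG",
        # Airflow 1.10 import path; it may differ in other versions.
        "from airflow.operators.bash_operator import BashOperator",
        "",
        "dag = DAG({!r}, start_date=datetime(2019, 1, 1), "
        "schedule_interval=None)".format(workflow["id"]),
        "",
    ]
    # One operator per step; step names are assumed to be valid identifiers.
    for step in workflow["steps"]:
        lines.append(
            "{name} = BashOperator(task_id={name!r}, "
            "bash_command={cmd!r}, dag=dag)".format(
                name=step["name"], cmd=step["command"]
            )
        )
    lines.append("")
    # Dependencies are emitted as plain `upstream >> downstream` statements.
    for step in workflow["steps"]:
        for upstream in step.get("after", []):
            lines.append("{} >> {}".format(upstream, step["name"]))
    return "\n".join(lines) + "\n"


example = {
    "id": "cwl_example",
    "steps": [
        {"name": "align", "command": "echo align"},
        {"name": "sort_reads", "command": "echo sort", "after": ["align"]},
    ],
}

source = generate_dag_source(example)
print(source)
```

Since the output is an ordinary DAG file, Airflow-specific arguments such
as `pool`, `priority_weight` or `on_failure_callback` could be added to the
generated operators by hand afterwards, which is the path I had in mind
above.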

J.

On Thu, Oct 31, 2019 at 1:57 AM Maxime Beauchemin <
[email protected]> wrote:

> As someone who has spent a lot of time acting as a maintainer, a code
> "donation" seems like a dangerous gift to accept.
>
> Personally I like the idea of an ecosystem of packages (and repos) managed
> and maintained by their specialists. That way they can have their own CI,
> their own release processes and cycles, and "namespaced" notifications. If
> anything I'd rather push in the direction of breaking Airflow into many
> smaller packages (core, scheduler, web, ...) as opposed to tacking other
> projects on top of it.
>
> Also arguably Airflow's DSL may be more "common" than CWL. Clearly CWL has
> more focussed intentions around creating something universal, but to me
> that doesn't necessarily make it more legitimate or common than other specs
> (Oozie, Azkaban , Informatica, ...) and should be treated similarly (would
> we want to include extensions to all these as part of Airflow?).
>
> I also prefer the codegen/migration approach (I think the
> `oozie-to-airflow` tool does that) to allow a path that resolves the common
> denominator limitations. How can this tooling expose features that are
> proper to Airflow (pools, priority weights, xcoms, callbacks!, ...)?
>
> Max
>
> On Wed, Oct 30, 2019 at 12:32 PM Andrey Kartashov <[email protected]>
> wrote:
>
> > My name is Andrey and I'm the developer behind CWL-Airflow.
> > This message is a follow-up to a Slack conversation. I copy-pasted some
> > messages from there here.
> >
> >
> > >> Slack chat:
> >
> > When I met the CWL team, there were no pipeline managers that supported
> > it. I picked Airflow just to prove the concept that it is possible.
> >
> > At the same time I was looking for a pipeline manager to use for
> > bioinformatic analysis and asked the Airflow team tons of questions; as
> > a result there is a special note in the documentation: "Beyond the
> > Horizon". Nevertheless, I adopted Airflow for our bioinformatic use.
> >
> > There are more than 200 different pipeline managers, and to believe that
> > in the nearest future there will be only one, perfect one sounds
> > impossible. So, to exchange pipeline logic between different pipeline
> > managers and people, it is good to have a standard (CWL is a perfect
> > fit), like the JavaScript standard and its different executors,
> > browsers...
> >
> > Apache Taverna (pipeline manager) has been working on adopting CWL for a
> > while now; we have code and it is already working.
> >
> > So yes, CWL-Airflow is developed and its use is simple: it extends the
> > Airflow DAG class. However, it is still required to put a .py file with
> > the DAG (CWLDAG in our case) into the DAG directory. I would like to
> > just put a .cwl file into the DAG directory to simplify the usage.
> >
> > I'm ready to develop what is necessary, but I'm not quite sure (I'm not
> > a big expert in Airflow code) which way to go: a plugin, some native
> > core code, or ...
> >
> > The project itself lives at https://github.com/Barski-lab/cwl-airflow,
> > and there are tons of CWL tests at
> > https://ci.commonwl.org/job/airflow-conformance/
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
