I believe there are just two big questions:
- what CWL is, why CWL-Airflow exists, and whether CWL or CWL-Airflow should be 
  part of Apache Airflow
- the technical implementation of CWL-Airflow

I'd like to start by saying I'm a big, big fan of Airflow; otherwise there 
wouldn’t be a CWL-Airflow :). I like everything about Airflow development, 
especially the way extra packages such as Kubernetes can be added with just 
pip install 'apache-airflow[kubernetes]'. I believe it would be nice to have 
pip install 'apache-airflow[cwl]' too.

1)
My thoughts below might seem emotional, but they express the idea of CWL in 
simple words.

Here I'm citing from http://commonwl.org: "The Common Workflow Language (CWL) 
is an open standard for describing analysis workflows and tools in a way that 
makes them portable and scalable across a variety of software and hardware 
environments". So CWL is a standard, and it is more about how to describe a 
workflow in a way that everybody will understand and be able to reproduce. It 
might not be widely implemented yet and it lacks some features, but it’s 
definitely not about a favorite programming language or pipeline manager. Of 
course, the CWL specification brings some limitations, and they exist because 
CWL attempts to formalize the most common features of any pipeline.

We believe that it’s worth moving towards the CWL standard, given the growing 
interest from people and big companies who run scientific pipelines. There are 
a lot of published scientific papers. Projects such as Toil, Arvados, Galaxy, 
Taverna and others already have solutions to run CWL pipelines. People are 
interested in pipelines that are easy to share. Even IBM released a CWL HPC 
executor.

The main idea that I want to express is that CWL is not another pipeline 
manager or executor; it is a specification, a standard. So we cannot compare 
it with Oozie, Azkaban, etc. We agree that there are more than 200 pipeline 
managers and no standard/specification. 

That’s why we are trying to show that CWL is not a burden but a solution to 
make at least some pipelines easily shareable and reproducible.

2)
I believe that it would not take too much time for the Airflow team to 
implement a CWL reader within the Airflow structure, if you would rather not 
accept a donation. I understand that CWL-Airflow depends on the cwltool 
package, which in turn depends on other libs like: ruamel.yaml, rdflib, 
shellescape, schema-salad, psutil, scandir, pathlib, …, but this can be easily 
simplified if necessary. CWL-Airflow creates an Airflow DAG with all its steps 
on the fly on every DagBag refresh. I think this behavior can be considered a 
converter. More details can be found here 
https://cwl-airflow.readthedocs.io/en/1.0.18/readme/how_it_works.html#what-s-inside
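
To make the "converter" idea a bit more concrete, here is a minimal sketch in 
Python. It is not the actual CWL-Airflow code, and it does not use cwltool or 
Airflow at all. It only shows, under simplifying assumptions, how the steps of 
a CWL workflow could be turned into an ordered set of tasks. The workflow, its 
step names and the tool file names (trim.cwl, align.cwl, summarize.cwl) are 
made up for illustration, and only the simplest string form of a step's "in" 
field is handled.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# A tiny CWL-style workflow in JSON form (CWL documents can be YAML or JSON).
# Step names and tool files are hypothetical, for illustration only.
WORKFLOW_JSON = """
{
  "cwlVersion": "v1.0",
  "class": "Workflow",
  "inputs": {"reads": "File"},
  "outputs": {"report": {"type": "File", "outputSource": "summarize/report"}},
  "steps": {
    "trim": {
      "run": "trim.cwl",
      "in": {"input_file": "reads"},
      "out": ["trimmed"]
    },
    "align": {
      "run": "align.cwl",
      "in": {"input_file": "trim/trimmed"},
      "out": ["aligned"]
    },
    "summarize": {
      "run": "summarize.cwl",
      "in": {"alignment": "align/aligned"},
      "out": ["report"]
    }
  }
}
"""


def step_dependencies(workflow):
    """Map each step name to the set of steps whose outputs it consumes."""
    steps = workflow["steps"]
    deps = {name: set() for name in steps}
    for name, step in steps.items():
        for source in step.get("in", {}).values():
            # "step/output" points at another step's output;
            # a bare name points at a workflow-level input.
            upstream, sep, _ = source.partition("/")
            if sep and upstream in steps:
                deps[name].add(upstream)
    return deps


def build_task_order(workflow):
    """Return step names in an order that respects their dependencies."""
    sorter = TopologicalSorter(step_dependencies(workflow))
    return list(sorter.static_order())


workflow = json.loads(WORKFLOW_JSON)
deps = step_dependencies(workflow)
order = build_task_order(workflow)
print(deps)
print(order)
```

In a real converter each step would presumably become an Airflow task, and 
each dependency found here would become an upstream/downstream relation 
between two tasks. Since this runs on every DagBag refresh, the parsing part 
needs to stay cheap.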
 

