I believe there are just two big questions:
1) what CWL is, why CWL-Airflow exists, and whether CWL or CWL-Airflow should be a part of Apache Airflow;
2) the technical implementation of CWL-Airflow.
I'd like to start by saying I'm a big, big fan of Airflow; otherwise there wouldn't be a CWL-Airflow :). I like everything about Airflow development, especially the way extra packages such as Kubernetes can be added with just pip install 'apache-airflow[kubernetes]'. I believe it would be nice to have pip install 'apache-airflow[cwl]' too.

1) My thoughts below might seem emotional, but they express the idea of CWL in simple words. Quoting from http://commonwl.org: "The Common Workflow Language (CWL) is an open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments". So CWL is a standard, and it is about how to describe a workflow in a way that everybody can understand and reproduce. It might not be widely implemented yet and it lacks some features, but it is definitely not about a favorite programming language or pipeline manager. Of course, the CWL specification brings some limitations; they exist because CWL attempts to formalize the most common features of any pipeline.

We believe it is worth moving towards the CWL standard, given the growing interest from people and big companies who run scientific pipelines. There are a lot of published scientific papers. Projects such as Toil, Arvados, Galaxy, Taverna and others already have solutions for running CWL pipelines. People are interested in pipelines that are easy to share. Even IBM released a CWL HPC executor.

The main idea I want to express is that CWL is not another pipeline manager or executor; it is a specification, a standard. So we cannot compare it with Oozie, Azkaban, etc. We agree that there are more than 200 pipeline managers and no standard/specification. That is why we are trying to show that CWL is not a burden but a solution that makes at least some pipelines easily shareable and reproducible.
2) I believe it would not take the Airflow team much time to implement a CWL reader within the Airflow structure, if you would rather not accept a donation. I understand that CWL-Airflow depends on the cwltool package, which in turn depends on other libraries such as ruamel.yaml, rdflib, shellescape, schema-salad, psutil, scandir, pathlib, …, but this can easily be simplified if necessary. CWL-Airflow creates an Airflow DAG with all its steps on the fly on every DagBag refresh. I think this behavior can be considered a converter. More details can be found here: https://cwl-airflow.readthedocs.io/en/1.0.18/readme/how_it_works.html#what-s-inside
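
To make the "converter" idea a bit more concrete, here is a minimal sketch of how step dependencies could be derived from a CWL workflow. This is not the actual CWL-Airflow code: the workflow dict, the step names and the helpers step_dependencies and task_order are all made up for illustration, and it uses only the Python standard library (graphlib, Python 3.9+) instead of cwltool or Airflow. It relies only on the CWL convention that a step input whose source looks like "step/output" refers to the output of another step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, heavily trimmed CWL workflow, already parsed into a dict.
# Step and file names are invented for this illustration.
workflow = {
    "class": "Workflow",
    "inputs": {"fastq": "File"},
    "steps": {
        "align": {"run": "align.cwl", "in": {"reads": "fastq"}, "out": ["bam"]},
        "sort": {"run": "sort.cwl", "in": {"bam": "align/bam"}, "out": ["sorted"]},
        "index": {"run": "index.cwl", "in": {"bam": "sort/sorted"}, "out": ["bai"]},
    },
}


def step_dependencies(wf):
    """Map every step name to the set of steps whose outputs it consumes."""
    steps = wf["steps"]
    deps = {}
    for name, step in steps.items():
        upstream = set()
        for source in step.get("in", {}).values():
            # An input may be a plain string or a dict with a "source" key.
            if isinstance(source, dict):
                source = source.get("source")
            # A source may also be a list of references.
            sources = source if isinstance(source, list) else [source]
            for src in sources:
                # "step/output" points at another step; a bare name is a workflow input.
                if isinstance(src, str) and "/" in src:
                    producer = src.split("/", 1)[0]
                    if producer in steps:
                        upstream.add(producer)
        deps[name] = upstream
    return deps


def task_order(wf):
    """Return the step names in an order that respects their dependencies."""
    return list(TopologicalSorter(step_dependencies(wf)).static_order())


if __name__ == "__main__":
    print(step_dependencies(workflow))
    print(task_order(workflow))
```

In a real converter, each step would presumably become an Airflow task and each entry in the dependency map an upstream relationship between tasks; the sketch only shows that this information can be read straight from the workflow description.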