Hello Andrey, I have nothing against emotions, I actually think emotions are super important and drive our passion to do stuff, but I'd love it if you could answer the questions we ask by quoting them and answering them inline.
> I'd like to start by saying I'm a big, big fan of Airflow, otherwise there wouldn't be CWL-Airflow :). I like everything about Airflow development, especially the way of adding extra packages to Airflow, like Kubernetes for example, with just pip install 'apache-airflow[kubernetes]'. I believe it would be nice to have pip install 'apache-airflow[cwl]' too.

This can be easily done, as Maxime said: have a separate cwl-airflow package in PyPI and have it configured as a dependency in the 'cwl' extra. I see no problem with that. I assume you have a perfect converter that always produces a good, schedulable DAG file. I imagine the users will run (on demand) the converter to convert the CWL descriptor + job to a RESULT.py in the dag folder, and otherwise they run a pretty standard Airflow to process it. Do you imagine a more "involved" integration with Airflow? If so, how do you imagine the use case? Could you explain how you envision the life cycle of such a workflow? (I put small sketches of both the extra and the converter output further down in this message.)

> Here I'm citing from http://commonwl.org: "The Common Workflow Language (CWL) is an open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments". So CWL is a standard, and it is more about how to describe a workflow in a way that everybody will understand and be able to reproduce. It might not yet be widely implemented and it lacks some features, but it's definitely not about a favorite programming language or pipeline manager. Of course, this CWL specification brings some limitations, and they exist because CWL attempts to formalize the most common features of any pipeline.

Do you imagine that ALL Airflow users start using CWL as their main workflow language? For the reasons described above (Python use / the data scientist approach) I think this is not going to happen for Airflow. I can understand that you want to run CWL workflows using Airflow, but I do not see Airflow DAG developers switching to CWL as their main workflow language. And for people who have 100s or 1000s of DAGs there is no easy way to convert them to CWL (is there?).

> We believe that it's worth moving towards the CWL standard based on the growing interest of people and big companies who run scientific pipelines. There are a lot of published scientific papers. Projects such as Toil, Arvados, Galaxy, Taverna and others already have solutions to run CWL pipelines. People are interested in pipelines that are easy to share. Even IBM released a CWL HPC executor.

I understand there are papers. I also understand that there is a group of people who would like to have easily shareable pipelines. Why do you think it is important for Airflow users to have them?

> The main idea that I want to express is that CWL is not another pipeline manager or executor, it is a specification, a standard. So we cannot compare it with Oozie or Azkaban etc. We agree that there are more than 200 pipeline managers and no standard/specification.

That's fine. I understand it's a standard. My point is: as long as you just want to treat Airflow as an executor of CWL and provide a converter to make a Python DAG out of that, that's perfectly fine (for someone who already uses CWL). Do you need anything more?

> That's why we are trying to show that CWL is not a burden but a solution to make at least some pipelines easily shareable and reproducible.

Is there anything lacking in the current "converter" that makes it not good at this job?
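To illustrate the 'cwl' extra idea from the beginning of this message: a minimal sketch, assuming cwl-airflow stays a separately maintained PyPI package and the extra only declares it as an optional dependency. This is not Airflow's actual setup.py, and the version handling is deliberately left out.

    # Simplified, illustrative setup.py fragment - not the real Airflow setup.py.
    # The only point: 'cwl' becomes an optional extra that pulls in the existing
    # cwl-airflow package from PyPI, the same way 'kubernetes' pulls in its client.
    from setuptools import setup, find_packages

    setup(
        name="apache-airflow",
        version="0.0.dev0",              # placeholder version
        packages=find_packages(),
        extras_require={
            "cwl": ["cwl-airflow"],      # left unpinned here only for brevity
        },
    )

With something like that in place, pip install 'apache-airflow[cwl]' installs Airflow together with cwl-airflow (and its transitive dependencies), while the CWL support itself keeps living and being released in its own repository.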
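And to make the "converter + standard Airflow" life cycle concrete, here is a deliberately simplified sketch of what a converted RESULT.py dropped into the dags/ folder could look like. This is not how cwl-airflow actually structures its DAGs (as I understand it, it expands every CWL step into its own task); the file name, paths and ids are made up, and I just shell out to cwltool to keep the example short.

    # RESULT.py - illustrative only, placed in the dags/ folder by the converter.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    with DAG(
        dag_id="cwl_example_workflow",
        start_date=datetime(2019, 1, 1),
        schedule_interval=None,  # triggered on demand, like any other DAG
    ) as dag:
        run_workflow = BashOperator(
            task_id="run_cwl_workflow",
            # hypothetical paths to the CWL descriptor and the job file
            bash_command="cwltool /data/workflows/example.cwl /data/jobs/example_job.yml",
        )

From Airflow's point of view this is just another DAG file: the scheduler picks it up on the next dagbag refresh, and the user triggers or schedules it like any other workflow.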
From what I understand, if someone wants to write a shareable and reproducible workflow, they can write it in CWL, use your converter, and run it via Airflow (or HPC or whatever other pipeline manager). Is there anything else that you think deeper Airflow integration can help with here?

> I believe that it will not take too much time for the Airflow team to implement a CWL reader within the Airflow structure, if you don't like donations. I understand that CWL-Airflow depends on the cwltool package, which in its turn depends on other libs like ruamel.yaml, rdflib, shellescape, schema-salad, psutil, scandir, pathlib, ..., but it can be easily simplified if necessary. CWL-Airflow creates an Airflow DAG with all its steps on the fly on every dagbag refresh. I think this behavior can be considered a converter. More details can be found here: https://cwl-airflow.readthedocs.io/en/1.0.18/readme/how_it_works.html#what-s-inside

It's not a question of time, but rather a question of focus (and missed opportunities): we already have a full roadmap for 2.0 and beyond, and we are all working hard on it. Since you already have a working solution, why do you not want to maintain it? What's the problem you want to solve by donating the code to the Airflow team?

J.

-- 
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>