Thanks a lot Jarek and Brian for your answers.

I will learn from the links you provided, and when I have other questions I will 
post them in the corresponding discussion threads. As for the main topic of 
this thread, I agree that for now it might be more reasonable to keep it as 
a separate project, but I really hope to find other people from the Airflow 
community who might be interested in CWL.
P.S. I will update the documentation to keep it up to date with the latest 
version of CWL-Airflow so it won't cause any misunderstanding of the basic 
concepts of our package :)

Best regards,
Michael



On 2019/11/16 08:55:07, Jarek Potiuk <jarek.pot...@polidea.com> wrote: 
> >
> > 1) The more DAGs I have in the dags folder, the longer it takes to parse
> > them all. Taking into account that in my case I also have to parse CWL
> > files, even such a simple operation takes more time. So I was wondering
> > whether there is any common solution to this issue. I was also wondering
> > whether I could use your Plugins mechanism to integrate some additional
> > functionality, such as parsing CWL files directly, without making any
> > changes in the core of Airflow.
> >
> 
> As a follow-up to the "political" decision, I would say the best solution
> will be to treat CWL-Airflow as a separate "converter" rather than to
> integrate it closely with Airflow. I would imagine that you have a separate
> folder with CWL files and a daemon watching that folder, starting the
> conversion process whenever any of the CWL files change and creating Python
> DAG files in Airflow's dag folder. That approach is very loosely coupled and
> relies on the basic behaviour of Airflow. It can then be easily combined
> with the Git-sync solution for Kubernetes or another way of synchronising
> DAGs.
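>
> To make that concrete, a minimal polling sketch of such a daemon could look
> like this (the paths and the "cwl-to-dag" command name are placeholders for
> whatever conversion entry point CWL-Airflow actually provides):
>
>     import os
>     import subprocess
>     import time
>
>     CWL_DIR = "/opt/workflows/cwl"    # watched folder with CWL files (placeholder)
>     DAGS_DIR = "/opt/airflow/dags"    # Airflow's dag folder (placeholder)
>     POLL_SECONDS = 30
>
>     def convert(cwl_path, dag_path):
>         # Placeholder command - call whatever converter CWL-Airflow exposes here.
>         subprocess.run(["cwl-to-dag", cwl_path, "-o", dag_path], check=True)
>
>     def main():
>         seen = {}  # cwl path -> last seen modification time
>         while True:
>             for name in os.listdir(CWL_DIR):
>                 if not name.endswith(".cwl"):
>                     continue
>                 src = os.path.join(CWL_DIR, name)
>                 mtime = os.path.getmtime(src)
>                 if seen.get(src) != mtime:
>                     dst = os.path.join(DAGS_DIR, name[:-4] + "_dag.py")
>                     convert(src, dst)
>                     seen[src] = mtime
>             time.sleep(POLL_SECONDS)
>
>     if __name__ == "__main__":
>         main()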
> 
> 
> > 2) I'm working on running CWL pipelines in Kubernetes through Airflow, and
> > one of the problems I have to deal with is sharing directories between
> > the pods. It looks like Kubernetes doesn't provide a direct solution to
> > this problem and mostly relies on the platform where it is installed. I
> > would appreciate it if you could direct me to the proper discussions or
> > threads where people solve similar problems.
> >
> 
> There are currently two ways of sharing DAGs: persistent volume claims and
> git sync. Generally, the approach is that you need third-party distributed
> storage to share the DAGs; the synchronisation mechanism is not (yet) built
> into Airflow. There is AIP-5 Remote DAG Fetcher (
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher),
> where it has been discussed at length, and there is the accompanying
> discussion thread (shorter than the discussion in the doc):
> https://lists.apache.org/thread.html/224d1e7d1b11e0b8314075f21b1b81708749f2899f4cce5af295e8a8@%3Cdev.airflow.apache.org%3E
> However, I don't think anyone in the community is actively working on AIP-5
> at the moment. I think the consensus in the community is that Airflow solves
> scheduling but does not solve distribution - it delegates distributing files
> to dedicated solutions (and you can choose whichever solution you already
> have for the task). This is really targeted at "corporate" deployments,
> where companies usually already have some distributed storage in place.
> Rather than force a single "distribution" solution on them, the assumption
> is that Airflow will use whatever solution is already deployed at that
> company. As a next step, we also plan to get rid of this completely in
> Airflow 2.0: provided that we implement full DAG serialisation, this problem
> will be gone. All the DAG data will be stored in the database and hopefully
> no more volume sharing will be needed.
> 
> Here you can find a simple description of using PVCs with Airflow on
> Kubernetes:
> https://medium.com/@ramandumcs/how-to-run-apache-airflow-on-kubernetes-1cb809a8c7ea
> Git Sync is also nice, but it requires a shared Git repo where the DAGs are
> stored.
> 
> There are other solutions. The Composer team, for example, uses 'gcsfuse' -
> a user-space synchronisation from a GCS bucket to a local pod volume (they
> have two containers in a pod: gcsfuse as a side-car to the Airflow worker,
> scheduler and UI, sharing a single volume). Then it is a matter of putting
> the generated DAGs into a GCS bucket (your daemon could do just that). You
> can use similar solutions for other dedicated "artifact" sharing - for
> example, we've implemented a similar side-car for Nexus, where production
> DAG files were shared as Nexus artifacts.
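>
> As a rough sketch of that upload step (the bucket name and local path below
> are made up, and it assumes the google-cloud-storage client library is
> available):
>
>     import os
>     from google.cloud import storage  # pip install google-cloud-storage
>
>     def upload_dags(local_dir, bucket_name, prefix="dags"):
>         # Push generated DAG .py files into the GCS bucket that gcsfuse
>         # mounts into the Airflow pods.
>         client = storage.Client()
>         bucket = client.bucket(bucket_name)
>         for name in os.listdir(local_dir):
>             if name.endswith(".py"):
>                 blob = bucket.blob(prefix + "/" + name)
>                 blob.upload_from_filename(os.path.join(local_dir, name))
>
>     # Example (hypothetical names):
>     # upload_dags("/tmp/generated_dags", "my-airflow-dags-bucket")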
> 
> 
> > Thanks a lot,
> > Michael
> >
> >
> >
> >
> >
> >
> >
> > On 2019/11/15 10:17:30, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > I am also -1. But I am happy to help with surfacing the CWL integration -
> > > both in the new package (together with Oozie-to-Airflow and maybe other
> > > converters) and by having it easily installable as an external package. I
> > > will talk to Andrey separately about this so that we do not clutter the
> > > devlist.
> > >
> > > J.
> > >
> > > On Fri, Nov 15, 2019 at 7:37 AM Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > After all the exploration of this topic here in this thread, I'm a
> > > > pretty hard -1 on this one.
> > > >
> > > > I think CWL and CWL-Airflow are great projects, but they can't rely on
> > > > the Airflow community to evolve/maintain/package this integration.
> > > >
> > > > Personally I think that, generally and *within reason* (winking at the
> > > > npm communities ;), smaller, targeted and loosely coupled packages
> > > > [and their corresponding smaller repositories with their own set of
> > > > maintainers] are better than bigger monoliths. Some reasons:
> > > > * separation of concerns
> > > > * faster, more targeted builds and test suites
> > > > * independent release cycles
> > > > * clearer ownership
> > > > * independent and adapted level of rigor / styling / standards
> > > > * more targeted notifications for people watching repos
> > > > * ...
> > > >
> > > > Max
> > > >
> > > > On Thu, Nov 14, 2019 at 12:33 PM Andrey Kartashov <por...@porter.st>
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > > I looked at the
> > > > > > https://cwl-airflow.readthedocs.io/en/1.0.18/readme/how_it_works.html#what-s-inside
> > > > > > to understand what CWL is, and that's where I took the descriptor +
> > > > > > job (in Key Concepts).
> > > > >
> > > > > Oh, this is an old one, but even the new one probably does not
> > > > > reflect the real picture.
> > > > >
> > > > >
> > > > > > OK. So, as I finally understand it, the problem you want to solve
> > > > > > is "To make Airflow more accessible to people who already use CWL
> > > > > > or who will find it easier to write DAGs in CWL". I still think
> > > > > > this does not necessarily have to be solved by donating CWL code to
> > > > > > Airflow (see below).
> > > > >
> > > > > I think there are many ways.
> > > > >
> > > > >
> > > > > > Ok. So what you basically say is that you think the Airflow
> > > > > > community has more capacity than the CWL community to maintain a
> > > > > > CWL converter.
> > > > >
> > > > > My understanding is that the CWL community is just developing the
> > > > > common standard (CWL), not converters :). And for me, the CWL-Airflow
> > > > > developer, the Airflow community definitely has far more capacity
> > > > > than me alone :)
> > > > >
> > > > > > I am not so sure about it (precisely because of the lost
> > > > > > opportunities). But maybe a better solution is to ask in the
> > > > > > Airflow community whether there are people who could join the
> > > > > > CWL-Airflow converter and grow the community there.
> > > > >
> > > > > That's probably a good start, just to check and see the interest.
> > > > >
> > > > > > I would not say for the whole community, but I would not feel
> > > > > > comfortable, as a community, taking responsibility for the
> > > > > > converter without prior knowledge and a detailed understanding of
> > > > > > CWL. Especially since it is rather for a small group of users (at
> > > > > > least initially). But I find CWL very interesting as an idea, and
> > > > > > maybe there are some people in the community who would love to
> > > > > > contribute to your project? Suggestion - maybe just ask, here and
> > > > > > in Slack, whether there is enough interest in contributing to
> > > > > > CWL-Airflow, rather than donating the code to Airflow? Just promote
> > > > > > your project in the community and ask for help.
> > > > >
> > > > > I tried but have not got any feedback :), but I'm not a promoter or
> > > > > a seller.
> > > > >
> > > > >
> > > > > >
> > > > > > I can see this as the best of both worlds - if you find a few
> > > > > > people who would like to help and get familiar with it, and they
> > > > > > are also part of the Airflow community, and we gain collective
> > > > > > knowledge about it, then eventually it might lead to incorporating
> > > > > > it into Airflow itself once our community gets more familiar with
> > > > > > CWL. I think this is the best way to achieve the ultimate goal of
> > > > > > incorporating CWL as part of Airflow.
> > > > > >
> > > > >
> > > > > Works for me.
> > > > >
> > > > >
> > > > > > In the meantime, I am happy to help make Airflow more "CWL
> > > > > > friendly" for the users - both from the documentation and the Helm
> > > > > > chart point of view.
> > > > > >
> > > > >
> > > > > Thank you, I appreciate that. How do we proceed?
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > >
> >
> 
> 
> -- 
> 
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> 
> M: +48 660 796 129 <+48660796129>
> 
