Andrew, thanks for chiming in - just to answer your questions and clarify the scope of the discussion:

Breeze is for developing Airflow itself; its purpose is not to develop and run DAGs. It was never intended to be used by the "users" of Airflow, for DAG development, or for testing DAGs. And while we were pondering that thought recently, I think it never will be; it is simply not fit for that purpose. Even the "start-airflow" command is there mainly for the developers of Airflow, not for its users. For example, it can be used to quickly test whether a new release candidate of Apache Airflow "works" - thanks to it, in a few minutes I can run a released version of Airflow in several combinations of Python/backend and see that it generally "works".

As for the "docker-compose user production image" - sure, it is needed, but this is a different issue, with different users and a completely different use case (even if the "docker-compose" name is there too). Those two are completely different use cases, starting from the fact that even the docker image used there is different. Maybe this is what both you and Ash are talking about. In that case I fully agree it's needed, but I believe we are not talking about it here.

If you want the kind of approach you are talking about, you can take a look at the issue here: https://github.com/apache/airflow/issues/8605. Nobody is actively working on it now, but I would love someone to take the lead on it and complete it. I am happy to help and review it as much as I can. But maybe you would like to take the lead on it, Andrew, since you have some experience and a real use case behind it? I think we need people there who are actual users of Airflow - which, sadly, I am mostly not :)

But let's not mix the two, please :). I'd love to keep this thread focused on *"Breeze, the development environment for Airflow itself"*. Even the tagline of Breeze is "*It's a Breeze to develop Airflow*" rather than "It's a Breeze to develop DAGs".

J.

On Wed, Nov 11, 2020 at 6:48 PM Jarek Potiuk wrote:

> Tomek:
>
> I started the discussion here just so everyone is aware of it even if they are not watching GH issues. I have now created the GH issue https://github.com/apache/airflow/issues/12282 so that I can gather together people with some interest, and I think it's best to continue the discussion there.
>
> What I plan to do within the next few days is to start a design document and design discussion. I would like to start with defining the actual users of Breeze, the use-cases it should serve, the purpose, and the set of assumptions that it should have. And only after we hash it all out, I would like to define the scope, decide whether we want to have one or many different tools for different users, how much of it is common and whether we can remove some of it completely or simplify it.
>
> I think we've gathered enormous experience from various levels of developers while using Breeze and it's a perfect moment to discuss (with those various users) what is useful, for whom, what makes sense, and how to provide the best interface. I see the current Breeze as a learning platform on what is useful and what is not, and I would love - this time - for decisions in it to be made by the actual users (of various kinds). And I would love to lead it - not as a developer this time, but as a "product manager" - listening to various voices and trying to make the best of it, reaching some consensus and working with others to implement it. I think this is the best use of the experience we had with Breeze and the "crowd-wisdom" of the developers of Airflow of different kinds and with different experience.
>
> J.
>
> On Wed, Nov 11, 2020 at 4:09 PM Andrew Harmon wrote:
>
>> I would agree as an end user, I’m not really sure what Breeze does. Is it for CI or is it a way to quickly spin up a containerized env for local development?
>> I do think it would be great to have something similar to Puckel that uses official airflow images. Very easy to quickly get started with to give airflow a try, but also a jumping off point for organizations to customize it to their needs. If this is docker-compose or something else, that’s fine. We use a customized version of puckel for all the engineers to do local dag development. It would be great if this was more “official” Airflow. I agree that python would make it easier for others to contribute. Finally, very clear documentation on the Airflow site would be very helpful too.
>>
>> Thanks,
>> Andrew Harmon
>>
>> On Nov 11, 2020, at 6:58 AM, Tomasz Urbaszek wrote:
>>
>> +1 for using python.
>>
>> > I would also say: make breeze do less. Right now it is three major things:
>> > * A local development environment
>> > * CI runner
>> > * It's recently grown the ability to run airflow for developing dags.
>>
>> My first thought was similar - breeze does too much now. However, I think the problem is not in the amount of functionality but in the technology used - bash. Using python or any other language will let us create a nice and clear structure for the project that will be easy to onboard, reason about and manage.
>>
>> Structuring breeze may allow us to leverage separate docker images and docker composes for different purposes (CI, DAG dev, Airflow dev). I like the way in which breeze is a "layer over docker" and I think this gives a nice experience. However, breeze has grown so big that I'm not sure I use even half of the functions it has.
>>
>> *Note:* where should we continue the discussion? The official place is devlist, but we have GH issue. Which one should we use to avoid two separate discussions?
>>
>> Tomek
>>
>> On Wed, Nov 11, 2020 at 12:13 PM Jarek Potiuk wrote:
>>
>>> I also created an issue for it: https://github.com/apache/airflow/issues/12282
>>>
>>> Anyone interested in taking part - please comment there!
>>>
>>> On Wed, Nov 11, 2020 at 11:59 AM Jarek Potiuk wrote:
>>>
>>>> You screamed (among many others) and I listened :). And I think the time is now to act.
>>>>
>>>> I believe the scope of "Breeze 2" should be part of the design discussion, where we will hear others' opinions (especially the first time or fresh contributors).
>>>>
>>>> For now, my vision is quite a bit different than yours Ash :). But I do not want to start a design discussion just yet, I want to make breathing space for others to chime in.
>>>>
>>>> I would love to hear many voices and interests of people before we deep dive into what "Breeze 2" might look like.
>>>>
>>>> What I am interested in is whether:
>>>>
>>>> a) it's the right time
>>>> b) python is the right choice
>>>> c) do I have several people who would like to join and offer both - help in designing the vision for it, as well as their time to implement it.
>>>>
>>>> I think it is crucial that those people who will be implementing it will be the main people who make design decisions about it, as I would love to have a strong group of people who would like to not only take part in developing it but also in maintaining it in the future.
>>>>
>>>> J.
>>>>
>>>> On Wed, Nov 11, 2020 at 11:11 AM Ash Berlin-Taylor wrote:
>>>>
>>>>> Omg yes. I have been screaming out for this for months.
>>>>>
>>>>> $ find scripts -name '*.sh' | xargs egrep -v '^#' | wc -l
>>>>> 6911
>>>>>
>>>>> That's entirely too much bash for my liking by about an order of magnitude ;)
>>>>>
>>>>> I would also say: make breeze do less.
>>>>> Right now it is three major things:
>>>>>
>>>>> * A local development environment
>>>>> * CI runner
>>>>> * It's recently grown the ability to run airflow for developing dags.
>>>>>
>>>>> That is too much. Yes there is overlap, but it's just too much in one tool, and too complex as a result. Some of this should just be replaced with a docker-compose file (that uses published release images, not floating master/nightly) and users told to run that.
>>>>>
>>>>> Make it simpler, fitting a core purpose - running CI consistently should be its only goal.
>>>>>
>>>>> -ash
>>>>>
>>>>> On Nov 11 2020, at 9:58 am, Jarek Potiuk wrote:
>>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> TL;DR: I was thinking for quite a while on this and I think this is the right time to raise that subject. It's been asked several times why Breeze is not written in something other than Bash, since it is "that big" or, as some people said, "monstrous" :). I think it's the right time to start a "rewrite" project with wide community involvement and Python seems to be the best choice :).
>>>>>
>>>>> While I was opposing this while we were focusing on Airflow 2.0, and there are some good reasons why initially I started Breeze in Bash, I think with the current state of Airflow 2.0 betas, with Airflow 2.0 fully based on Python 3.6 and with some "stability" and "good set of features" we have in Breeze and a good level of modularisation we achieved - it's the right time to think about a rewrite.
>>>>>
>>>>> I did not raise this subject to add a distraction on top of what is already a lot of work for 2.0, but I think having Breeze rewritten in Python could be the "one more thing" that we could do - as a community - to make the 2.0 experience even better, and one that can make the community even closer.
>>>>>
>>>>> I was thinking that Breeze is perfect to be split into separate smaller pieces, describe some assumptions that we will have for its use, and turn it into a true community effort where a lot of people will contribute and where we will be able to simplify some of the stuff, and - most importantly - make more people from the community know about how our CI and development environment works and be able to solve any problems there.
>>>>>
>>>>> Breeze (and the underlying bash libraries) are crucial to get our CI working and I am mostly the single point of contact (and failure!) when it comes to that - I would love to not be one :) and I think with most of the core committers busy with 2.0, this is also an opportunity for more of the contributors to take their part in it (and eventually earn their rank to become committers!). For the core committers, this is an extra opportunity to learn how the system works, influence its design, and possibly simplify some parts of it - even if they will be mostly focused on 2.0.
>>>>>
>>>>> I would like to do it well - write some assumptions in a design doc, plan the work and split it into separate issues, and lead the effort - but I would love it if most of the work is done by others, who would then become familiar with the whole of it.
>>>>>
>>>>> WDYT? Do you think it is a good idea? Do you think it is the right time? Are there some people in the community who would like to take part in it?
>>>>>
>>>>> J.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
