> > A lot of the problems that Quantopian experiences with Airflow can't be tackled without either "hacks" on top of Airflow; or deep reworkings of Airflow components. But that kind of rework is very challenging to implement with the current Airflow contribution process.

Can you elaborate on what some of the problems are that Quantopian has encountered, which would require significant re-work to Airflow to address?

On Wed, Apr 10, 2019 at 8:19 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:

> Hi James,
>
> Addressing your concerns one by one:
>
> - There are a lot of users of Airflow, but their use cases and feature usage are not well described. Something that seems trivial or unnecessary to one user turns out to be what someone else's entire workflow depends on.
>
> I think in general it is all about scheduling stuff. For me, this is also true for many software packages. 80% of the users only use 20% of the functionality. I think it is up to the committers to make sure that we don't remove any functionality too easily, and break the workflow for others. However, sometimes this is what you want, for example dropping Python 2 support. I strongly believe that the flexibility offered by Airflow is both a strength and a weakness: it allows you to do virtually everything; on the other hand, maybe you should not do that :-)
>
> - The Airflow JIRA feels completely unmaintained. Most of the issues I've reported have never even been acknowledged, and it's hard to know what versions an issue applies to. This makes it hard to know what to work on or what would be most impactful to other users.
>
> Keeping track of Jira is a full-time job. Periodically I go through all the tickets, but it is also (mis)used for dumping stack traces, or any other error. We should be more strict on this, as a community. If you're interested in doing this, let me know so I can grant you editor permissions.
>
> - Hacking on Airflow is challenging, especially if you need to run a real workload to examine your changes. (I saw the work for an improved local dev process - great stuff!)
>
> This is a known problem. I think the community is doing an awesome job here.
> For example, Breeze by Polidea (https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by ING/GoDataDriven (https://blog.godatadriven.com/open-source-airflow-local-development).
>
> - Keeping track of what's on master vs. what's in a release is challenging, particularly since so many commits are for operators we'll never use. (I know there's some discussion about breaking operators into their own repos, and I hope that goes through.)
>
> The main job of the committers is to keep compatibility on the interfaces. The versions are clearly set in Jira when a ticket is being worked on. If the change is compatible with the new minor version, it will be included; otherwise, it will be set to the next major version.
>
> - The PMCs are too busy to guarantee timely reviews, and rebasing is extremely costly with how much code reorganization is happening. This strongly discourages putting in time to develop anything other than relatively isolated features, often new features.
>
> The code grew rapidly over time. This required reorganizing a lot of code, which is needed to keep development possible and make the code more accessible to newcomers. For example, the splitting up of the infamous models.py (a file with well over 6k lines) was quite a pain with circular imports. This is periodically necessary to keep the code organized. Please note that reviewing isn't a task for the PMC alone; it is also for the committers and contributors. If there are any functionalities that you use a lot, please also provide reviews on that topic.
>
> For me, being a committer and PMC member on the project is just something that I do out of passion for Airflow. It isn't my job and I don't get paid for it. That being said, I do agree with getting more committers on board to strengthen the workforce.
>
> We're now preparing for Airflow 2.0, including a couple of AIPs.
> The question of whether there will be a true container-native, or cloud-native version of Airflow, is completely up to you and the community. I'm in favor of jumping on the container train, but this requires reworking the Airflow codebase.
>
> Cheers, Fokko
>
> On Wed, 10 Apr 2019 at 16:56, Szymon Przedwojski <szymon.przedwoj...@polidea.com> wrote:
>
> > I think it is quite clear that Airflow needs more committers. Looking at AIPs, PRs and this devlist there are quite a few active people who might be a good fit to become them. With the community and the project growing I think it should be natural to increase the number of committers as well. I know a new committer comes along every now and then, but maybe it’s still not enough and maybe Airflow should recruit them more “aggressively”?
> >
> > Szymon Przedwojski
> > Polidea | Software Engineer
> >
> > M: +48 500 330 790
> > E: szymon.przedwoj...@polidea.com
> >
> > > On 10 Apr 2019, at 16:47, airflowuser <airflowu...@protonmail.com.INVALID> wrote:
> > >
> > > The Jira is a mess and it requires committers' time to organize it. Ideally users should report issues and committers should tag them with priority, milestone / fix version, labels (this is how, for example, it's done with https://github.com/pandas-dev/pandas).
> > >
> > > When I have time I try to put together a list of Jira issues that require committers' attention and ashb fixes them, but it's progressing slowly.
> > >
> > > I think that at least it would be great if the version field in Jira were mandatory when a user submits a ticket.
> > >
> > > In the end... committers simply don't have time for this. They don't have enough time for reviewing PRs either, so I doubt anything will change in the near future.
> > >
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > On Wednesday, April 10, 2019 5:18 PM, James Meickle <jmeic...@quantopian.com.INVALID> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I've been following Airflow development fairly actively for over a year. In that time, the company I work at (Quantopian) has gone all-in on Airflow. It's a core part of our business and required for daily operations.
> > >>
> > >> However, I've had some concerns over the future of the project. Part of these concerns are because it's difficult to contribute to Airflow:
> > >>
> > >> - There are a lot of users of Airflow, but their use cases and feature usage are not well described. Something that seems trivial or unnecessary to one user turns out to be what someone else's entire workflow depends on.
> > >>
> > >> - The Airflow JIRA feels completely unmaintained. Most of the issues I've reported have never even been acknowledged, and it's hard to know what versions an issue applies to. This makes it hard to know what to work on or what would be most impactful to other users.
> > >>
> > >> - Hacking on Airflow is challenging, especially if you need to run a real workload to examine your changes. (I saw the work for an improved local dev process - great stuff!)
> > >>
> > >> - Keeping track of what's on master vs. what's in a release is challenging, particularly since so many commits are for operators we'll never use. (I know there's some discussion about breaking operators into their own repos, and I hope that goes through.)
> > >>
> > >> - The PMCs are too busy to guarantee timely reviews, and rebasing is extremely costly with how much code reorganization is happening. This strongly discourages putting in time to develop anything other than relatively isolated features, often new features.
> > >>
> > >> A lot of the problems that Quantopian experiences with Airflow can't be tackled without either "hacks" on top of Airflow; or deep reworkings of Airflow components. But that kind of rework is very challenging to implement with the current Airflow contribution process.
> > >>
> > >> I'm glad that we've recently adopted AIPs, but the way we're using them seems better suited to planning isolated features. The Airflow project does not have a well-maintained roadmap, nor any mechanism to produce one by weighing AIPs based on synergy vs. developer interest vs. user interest.
> > >>
> > >> I think that this lack of long-term planning makes it even more challenging to propose larger reworks that might require multiple AIPs to implement, each of which individually might yield little benefit. I worry that we may approve a series of "promising" AIPs that, taken together, don't amount to anything greater than a "pile of new features"; instead of balancing feature improvements with platform improvements that will unlock more fundamental changes to how Airflow can work.
> > >>
> > >> I'd like to see some discussion of what it would look like to set long term goals for Airflow. What is Airflow 2 going to look like? How much backwards compat will it break? When should we expect Airflow 3? Are they going to be "business as usual" releases, or will they embrace any new concepts or idioms? Will there be a true container-native, or cloud-native version of Airflow? Will we work to be better for current users, or to embrace new classes of users?
> > >>
> > >> I have some thoughts of my own, of course, but I'd like to hear what other people have to say on this topic first!