Hi James,

Addressing your concerns one by one:

- There are a lot of users of Airflow, but their use cases and feature usage are not well described. Something that seems trivial or unnecessary to one user turns out to be what someone else's entire workflow depends on.

I think in general it is all about scheduling stuff. For me, this is also true for many software packages: 80% of the users only use 20% of the functionality. I think it is up to the committers to make sure that we don't remove any functionality too easily and break the workflow for others. However, sometimes this is what you want, for example when dropping Python 2 support. I strongly believe that the flexibility offered by Airflow is both a strength and a weakness: it allows you to do virtually everything, but on the other hand, maybe you should not do that :-)

- The Airflow JIRA feels completely unmaintained. Most of the issues I've reported have never even been acknowledged, and it's hard to know what versions an issue applies to. This makes it hard to know what to work on or what would be most impactful to other users.

Keeping track of Jira is a full-time job. Periodically I go through all the tickets, but it is also (mis)used for dumping stack traces or any other error. We should be more strict on this as a community. If you're interested in doing this, let me know so I can grant you editor permissions.

- Hacking on Airflow is challenging, especially if you need to run a real workload to examine your changes. (I saw the work for an improved local dev process - great stuff!)

This is a known problem. I think the community is doing an awesome job here, for example with Breeze by Polidea (https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by ING/GoDataDriven (https://blog.godatadriven.com/open-source-airflow-local-development).

- Keeping track of what's on master vs. what's in a release is challenging, particularly since so many commits are for operators we'll never use. (I know there's some discussion about breaking operators into their own repos, and I hope that goes through.)

The main job of the committers is to keep compatibility on the interfaces. The versions are clearly set in Jira when a ticket is being worked on. If the change is compatible with the new minor version, it is included there; otherwise, it is set to the next major version.

- The PMCs are too busy to guarantee timely reviews, and rebasing is extremely costly with how much code reorganization is happening. This strongly discourages putting in time to develop anything other than relatively isolated features, often new features.

The code grew rapidly over time, which required reorganizing a lot of it. This is required to keep development possible and to make the code more accessible to newcomers. For example, splitting up the infamous models.py (a file with well over 6k lines) was quite a pain because of circular imports. This is periodically necessary to keep the code organized. Please note that reviewing isn't a task for the PMC only; it is also for the committers and contributors. If there are any functionalities that you use a lot, please also provide reviews on that topic. For me, being a committer and PMC member on the project is just something that I do out of passion for Airflow. It isn't my job and I don't get paid for it. That being said, I do agree with getting more committers on board to strengthen the workforce.

We're now preparing for Airflow 2.0, including a couple of AIPs. The question of whether there will be a true container-native or cloud-native version of Airflow is completely up to you and the community. I'm in favor of jumping on the container train, but this requires reworking the Airflow codebase.

Cheers, Fokko

On Wed, 10 Apr 2019 at 16:56, Szymon Przedwojski <szymon.przedwoj...@polidea.com> wrote:

> I think it is quite clear that Airflow needs more committers.
> Looking at AIPs, PRs and this devlist there are quite a few active people who might be a good fit to become them.
> With the community and the project growing I think this should be natural to increase the number of committers as well. I know there comes a new committer every now and then, but maybe it’s still not enough and maybe Airflow should recruit them more “aggressively”?
>
> Szymon Przedwojski
> Polidea | Software Engineer
>
> M: +48 500 330 790
> E: szymon.przedwoj...@polidea.com
>
> > On 10 Apr 2019, at 16:47, airflowuser <airflowu...@protonmail.com.INVALID> wrote:
> >
> > The Jira is a mess and it require committers time to organize it.
> > Ideally users should report issues and committers should tag them with priority, milestone / fix version, labels (This is how for example it's done with https://github.com/pandas-dev/pandas )
> >
> > When I have time I try to stack list of Jira issues that require committers attention and ashb fix them but it's progressing slowly.
> >
> > I think that at least it would be great if the version field in the Jira will be mandatory when user submit ticket.
> >
> > At the end... committers simply don't have time for this. They don't have enough time for reviewing PRs as well so I doubt something will change in the near future.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Wednesday, April 10, 2019 5:18 PM, James Meickle <jmeic...@quantopian.com.INVALID> wrote:
> >
> >> Hi all,
> >>
> >> I've been following Airflow development fairly actively for over a year. In that time, the company I work at (Quantopian) has gone all-in on Airflow. It's a core part of our business and required for daily operations.
> >>
> >> However, I've had some concerns over the future of the project. Part of these concerns are because it's difficult to contribute to Airflow:
> >>
> >> - There are a lot of users of Airflow, but their use cases and feature usage are not well described. Something that seems trivial or unnecessary to one user turns out to be what someone else's entire workflow depends on.
> >>
> >> - The Airflow JIRA feels completely unmaintained. Most of the issues I've reported have never even been acknowledged, and it's hard to know what versions an issue applies to. This makes it hard to know what to work on or what would be most impactful to other users.
> >>
> >> - Hacking on Airflow is challenging, especially if you need to run a real workload to examine your changes. (I saw the work for an improved local dev process - great stuff!)
> >>
> >> - Keeping track of what's on master vs. what's in a release is challenging, particularly since so many commits are for operators we'll never use. (I know there's some discussion about breaking operators into their own repos, and I hope that goes through.)
> >>
> >> - The PMCs are too busy to guarantee timely reviews, and rebasing is extremely costly with how much code reorganization is happening. This strongly discourages putting in time to develop anything other than relatively isolated features, often new features.
> >>
> >> A lot of the problems that Quantopian experiences with Airflow can't be tackled without either "hacks" on top of Airflow; or deep reworkings of Airflow components. But that kind of rework is very challenging to implement with the current Airflow contribution process.
> >>
> >> I'm glad that we've recently adopted AIPs, but the way we're using them seems better suited to planning isolated features. The Airflow project does not have a well-maintained roadmap, nor any mechanism to produce one by weighing AIPs based on synergy vs. developer interest vs. user interest.
> >>
> >> I think that this lack of long-term planning makes it even more challenging to propose larger reworks that might require multiple AIPs to implement, each of which individually might yield little benefit. I worry that we may approve a series of "promising" AIPs that, taken together, don't amount to anything greater than a "pile of new features"; instead of balancing feature improvements with platform improvements that will unlock more fundamental changes to how Airflow can work.
> >>
> >> I'd like to see some discussion of what it would look like to set long term goals for Airflow. What is Airflow 2 going to look like? How much backwards compat will it break? When should we expect Airflow 3? Are they going to be "business as usual" releases, or will they embrace any new concepts or idioms? Will there be a true container-native, or cloud-native version of Airflow? Will we work to be better for current users, or to embrace new classes of users?
> >>
> >> I have some thoughts of my own, of course, but I'd like to hear what other people have to say on this topic first!