Hello all,

The Beam project has had similar problems. One of the things they did was
formalize how contributions are tracked. I understand that tracking this
sort of information is difficult for the PMC, so if there's interest, I'd
be happy to work with the PMC on tools to track contributions (e.g. a
simple spreadsheet tracking contributions on PRs, StackOverflow answers,
public speaking, documentation, etc.) so that we can streamline the
"promotion" of new committers. This may also help incentivize
"housekeeping" work, such as triaging JIRA issues, testing, release
management, etc.
This may also help provide early feedback to people on track to becoming
a committer (e.g. private emails of the kind "Hi X. The Airflow PMC has
noticed and appreciates your contributions. We think you could improve by
doing Y or Z").

Let me know what you all think.

Best,
Aizhamal

On Wed, Apr 10, 2019 at 5:24 PM Gabriel Silk <gs...@dropbox.com.invalid> wrote:

> > A lot of the problems that Quantopian experiences with Airflow can't be
> > tackled without either "hacks" on top of Airflow; or deep reworkings of
> > Airflow components. But that kind of rework is very challenging to
> > implement with the current Airflow contribution process.
>
> Can you elaborate on what some of the problems are that Quantopian has
> encountered, which would require significant re-work to Airflow to address?
>
> On Wed, Apr 10, 2019 at 8:19 AM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
> > Hi James,
> >
> > Addressing your concerns one by one:
> >
> > - There are a lot of users of Airflow, but their use cases and feature
> > usage are not well described. Something that seems trivial or
> > unnecessary to one user turns out to be what someone else's entire
> > workflow depends on.
> >
> > I think in general it is all about scheduling stuff. For me, this is
> > also true for many software packages: 80% of the users only use 20% of
> > the functionality. I think it is up to the committers to make sure that
> > we don't remove any functionality too easily and break the workflow for
> > others. However, sometimes this is what you want, for example dropping
> > Python 2 support. I strongly believe that the flexibility offered by
> > Airflow is both a strength and a weakness: it allows you to do virtually
> > everything, but on the other hand, maybe you should not do that :-)
> >
> > - The Airflow JIRA feels completely unmaintained. Most of the issues
> > I've reported have never even been acknowledged, and it's hard to know
> > what versions an issue applies to.
> > This makes it hard to know what to work on or what would be most
> > impactful to other users.
> >
> > Keeping track of Jira is a full-time job. Periodically I go through all
> > the tickets, but it is also (mis)used for dumping stack traces, or any
> > other error. As a community, we should be stricter about this. If you're
> > interested in doing this, let me know so I can grant you editor
> > permissions.
> >
> > - Hacking on Airflow is challenging, especially if you need to run a
> > real workload to examine your changes. (I saw the work for an improved
> > local dev process - great stuff!)
> >
> > This is a known problem. I think the community is doing an awesome job
> > here. For example, Breeze by Polidea
> > (https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by
> > ING/GoDataDriven
> > (https://blog.godatadriven.com/open-source-airflow-local-development).
> >
> > - Keeping track of what's on master vs. what's in a release is
> > challenging, particularly since so many commits are for operators we'll
> > never use. (I know there's some discussion about breaking operators into
> > their own repos, and I hope that goes through.)
> >
> > The main job of the committers is to keep compatibility on the
> > interfaces. The versions are clearly set in Jira when a ticket is being
> > worked on. If the change is compatible with the new minor version, it
> > will be included; otherwise, it will be set to the next major version.
> >
> > - The PMCs are too busy to guarantee timely reviews, and rebasing is
> > extremely costly with how much code reorganization is happening. This
> > strongly discourages putting in time to develop anything other than
> > relatively isolated features, often new features.
> >
> > The code grew rapidly over time. This required reorganizing a lot of
> > code. This is required to keep development possible and make the code
> > more accessible to newcomers.
> > For example, splitting up the infamous models.py (a file with well
> > over 6k lines) was quite a pain because of circular imports. This is
> > periodically necessary to keep the code organized. Please note that
> > reviewing isn't a task for the PMC alone; it is also for the committers
> > and contributors. If there are any features that you use a lot, please
> > also provide reviews on that topic.
> >
> > For me, being a committer and PMC member on the project is just
> > something that I do out of passion for Airflow. It isn't my job and I
> > don't get paid for it. That being said, I do agree with getting more
> > committers on board to strengthen the workforce.
> >
> > We're now preparing for Airflow 2.0, including a couple of AIPs. The
> > question of whether there will be a true container-native, or
> > cloud-native version of Airflow, is completely up to you and the
> > community. I'm in favor of jumping on the container train, but this
> > requires reworking the Airflow codebase.
> >
> > Cheers, Fokko
> >
> > On Wed, 10 Apr 2019 at 16:56, Szymon Przedwojski
> > <szymon.przedwoj...@polidea.com> wrote:
> >
> > > I think it is quite clear that Airflow needs more committers.
> > > Looking at AIPs, PRs and this devlist, there are quite a few active
> > > people who might be a good fit to become committers.
> > > With the community and the project growing, I think it should be
> > > natural to increase the number of committers as well. I know a new
> > > committer joins every now and then, but maybe it’s still not enough
> > > and maybe Airflow should recruit them more “aggressively”?
> > >
> > > Szymon Przedwojski
> > > Polidea | Software Engineer
> > >
> > > M: +48 500 330 790
> > > E: szymon.przedwoj...@polidea.com
> > >
> > > > On 10 Apr 2019, at 16:47, airflowuser
> > > > <airflowu...@protonmail.com.INVALID> wrote:
> > > >
> > > > The Jira is a mess and it requires committers' time to organize it.
> > > > Ideally users should report issues and committers should tag them
> > > > with priority, milestone / fix version, labels (this is how, for
> > > > example, it's done with https://github.com/pandas-dev/pandas).
> > > >
> > > > When I have time I try to compile a list of Jira issues that require
> > > > committers' attention, and ashb fixes them, but it's progressing
> > > > slowly.
> > > >
> > > > I think that at least it would be great if the version field in
> > > > Jira were mandatory when a user submits a ticket.
> > > >
> > > > In the end... committers simply don't have time for this. They don't
> > > > have enough time for reviewing PRs either, so I doubt anything will
> > > > change in the near future.
> > > >
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Wednesday, April 10, 2019 5:18 PM, James Meickle
> > > > <jmeic...@quantopian.com.INVALID> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I've been following Airflow development fairly actively for over a
> > > >> year. In that time, the company I work at (Quantopian) has gone
> > > >> all-in on Airflow. It's a core part of our business and required
> > > >> for daily operations.
> > > >>
> > > >> However, I've had some concerns over the future of the project.
> > > >> Part of these concerns are because it's difficult to contribute to
> > > >> Airflow:
> > > >>
> > > >> - There are a lot of users of Airflow, but their use cases and
> > > >> feature usage are not well described. Something that seems trivial
> > > >> or unnecessary to one user turns out to be what someone else's
> > > >> entire workflow depends on.
> > > >>
> > > >> - The Airflow JIRA feels completely unmaintained. Most of the
> > > >> issues I've reported have never even been acknowledged, and it's
> > > >> hard to know what versions an issue applies to. This makes it hard
> > > >> to know what to work on or what would be most impactful to other
> > > >> users.
> > > >>
> > > >> - Hacking on Airflow is challenging, especially if you need to run
> > > >> a real workload to examine your changes. (I saw the work for an
> > > >> improved local dev process - great stuff!)
> > > >>
> > > >> - Keeping track of what's on master vs. what's in a release is
> > > >> challenging, particularly since so many commits are for operators
> > > >> we'll never use. (I know there's some discussion about breaking
> > > >> operators into their own repos, and I hope that goes through.)
> > > >>
> > > >> - The PMCs are too busy to guarantee timely reviews, and rebasing
> > > >> is extremely costly with how much code reorganization is happening.
> > > >> This strongly discourages putting in time to develop anything other
> > > >> than relatively isolated features, often new features.
> > > >>
> > > >> A lot of the problems that Quantopian experiences with Airflow
> > > >> can't be tackled without either "hacks" on top of Airflow; or deep
> > > >> reworkings of Airflow components. But that kind of rework is very
> > > >> challenging to implement with the current Airflow contribution
> > > >> process.
> > > >>
> > > >> I'm glad that we've recently adopted AIPs, but the way we're using
> > > >> them seems better suited to planning isolated features. The Airflow
> > > >> project does not have a well-maintained roadmap, nor any mechanism
> > > >> to produce one by weighing AIPs based on synergy vs. developer
> > > >> interest vs. user interest.
> > > >>
> > > >> I think that this lack of long-term planning makes it even more
> > > >> challenging to propose larger reworks that might require multiple
> > > >> AIPs to implement, each of which individually might yield little
> > > >> benefit.
> > > >> I worry that we may approve a series of "promising" AIPs that,
> > > >> taken together, don't amount to anything greater than a "pile of
> > > >> new features"; instead of balancing feature improvements with
> > > >> platform improvements that will unlock more fundamental changes to
> > > >> how Airflow can work.
> > > >>
> > > >> I'd like to see some discussion of what it would look like to set
> > > >> long term goals for Airflow. What is Airflow 2 going to look like?
> > > >> How much backwards compat will it break? When should we expect
> > > >> Airflow 3? Are they going to be "business as usual" releases, or
> > > >> will they embrace any new concepts or idioms? Will there be a true
> > > >> container-native, or cloud-native version of Airflow? Will we work
> > > >> to be better for current users, or to embrace new classes of users?
> > > >>
> > > >> I have some thoughts of my own, of course, but I'd like to hear
> > > >> what other people have to say on this topic first!