Some personal thoughts specifically about PR processing speed. I tried to benchmark Airflow against other Apache projects (such as Spark and Kafka) in terms of PR review/merge speed. At the moment there are 400+ open PRs in Spark and 500+ open PRs in Kafka, while Kafka has 26 committers and Spark has 68. Airflow has fewer than 20 committers, and recently the number of open PRs has held at about 200. That is at least ~6 open PRs per committer for Spark, ~19 for Kafka, and more than 10 for Airflow.
(Note: this benchmark is not 100% precise, as I didn't consider the total number of PRs processed per day. But the number of commits per day in Kafka seems roughly close to Airflow's.)

Don't get me wrong: I don't think we have done well enough, and I agree that there is big room for improvement. But to be fair, Airflow's situation here is not that bad. I was nominated as a committer only about a month ago. Earlier, as a PR submitter, I also wondered "why are my PRs processed so slowly?"; but now that I think more about reviewing, approving, and merging, I realize the current pace is fairly good (big thanks to the other committers).

One more suggestion. Currently we committers almost never give a "-1" to PRs. Even when committers disagree with a proposed change, they don't close the PR. I would like to suggest that the PMC discuss whether we can close a PR if it receives a few "-1"s from committers (say 3 or 4). I believe this would help.

Best regards,
XD

On Thu, Apr 11, 2019 at 13:54 airflowuser <airflowu...@protonmail.com.invalid> wrote:
> 1. Getting more contributors is important, but it's also important to give attention to the current contributors.
> I noticed that if a PR has no reviews and drops to page 3 or beyond, it is likely to be forgotten. Take this one for example:
> https://github.com/apache/airflow/pull/4473
> The author is required to rebase again. That's not very welcoming to new contributors, and there are more open PRs like this. One suggestion might be a monthly status check of all open PRs to see if anything was missed?
>
> 2. The committers' attention doesn't always point to what the community needs. Check this one:
> https://github.com/apache/airflow/pull/1936
> It addresses a problem that bugs many people, but there has been no discussion of how to solve it. There have been more than 4 releases since this PR was opened, and the problem it tries to fix was neither addressed nor discussed.
> The author commented that he can update the branch, but he needs committers to be involved.
>
> Again, since everything is volunteer-based, this makes sense and is understandable. However, if the project wishes to get more contributors, it might be easier to start with the PRs we already have rather than putting effort into inviting new contributors.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, April 11, 2019 3:46 AM, Aizhamal Nurmamat kyzy <aizha...@google.com.INVALID> wrote:
>
> > Hello all,
> >
> > The Beam project has also had problems similar to these. One of the things they did was formalize how contributions are tracked. I understand that tracking this sort of information is difficult for the PMC, so if there's interest, I'd be happy to work with the PMC to build tools to track contributions (e.g. a simple spreadsheet tracking contributions on PRs, answering StackOverflow questions, public speaking, documentation, etc.), so that we can streamline the "promotion" of new committers. This may also help incentivize "housekeeping" work, such as triaging JIRA issues, testing, release management, etc.
> >
> > This may also help provide early feedback to people on track to becoming a committer (e.g. private emails along the lines of "Hi X. The Airflow PMC has noticed and appreciates your contributions. We think you could improve by doing Y or Z").
> >
> > Let me know what you all think.
> >
> > Best,
> > Aizhamal
> >
> > On Wed, Apr 10, 2019 at 5:24 PM Gabriel Silk <gs...@dropbox.com.invalid> wrote:
> >
> > > > A lot of the problems that Quantopian experiences with Airflow can't be tackled without either "hacks" on top of Airflow, or deep reworkings of Airflow components. But that kind of rework is very challenging to implement with the current Airflow contribution process.
> > >
> > > Can you elaborate on which problems Quantopian has encountered that would require significant rework of Airflow to address?
> > >
> > > On Wed, Apr 10, 2019 at 8:19 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
> > >
> > > > Hi James,
> > > >
> > > > Addressing your concerns one by one:
> > > >
> > > > - There are a lot of users of Airflow, but their use cases and feature usage are not well described. Something that seems trivial or unnecessary to one user turns out to be what someone else's entire workflow depends on.
> > > >
> > > > I think in general it is all about scheduling stuff. For me, this is also true for many software packages: 80% of the users only use 20% of the functionality. I think it is up to the committers to make sure that we don't remove any functionality too easily and break the workflow for others. However, sometimes this is what you want, for example dropping Python 2 support. I strongly believe that the flexibility offered by Airflow is both a strength and a weakness: it allows you to do virtually everything, but on the other hand, maybe you should not do that :-)
> > > >
> > > > - The Airflow JIRA feels completely unmaintained. Most of the issues I've reported have never even been acknowledged, and it's hard to know what versions an issue applies to. This makes it hard to know what to work on or what would be most impactful to other users.
> > > >
> > > > Keeping track of Jira is a full-time job. Periodically I go through all the tickets, but Jira is also (mis)used for dumping stack traces or any other error. We should be stricter about this as a community. If you're interested in doing this, let me know so I can grant you editor permissions.
> > > >
> > > > - Hacking on Airflow is challenging, especially if you need to run a real workload to examine your changes. (I saw the work for an improved local dev process - great stuff!)
> > > >
> > > > This is a known problem. I think the community is doing an awesome job here, for example Breeze by Polidea (https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by ING/GoDataDriven (https://blog.godatadriven.com/open-source-airflow-local-development).
> > > >
> > > > - Keeping track of what's on master vs. what's in a release is challenging, particularly since so many commits are for operators we'll never use. (I know there's some discussion about breaking operators into their own repos, and I hope that goes through.)
> > > >
> > > > The main job of the committers is to keep the interfaces compatible. The versions are clearly set in Jira when a ticket is being worked on. If the change is compatible with the new minor version, it will be included; otherwise, it will be targeted at the next major version.
> > > >
> > > > - The PMCs are too busy to guarantee timely reviews, and rebasing is extremely costly with how much code reorganization is happening. This strongly discourages putting in time to develop anything other than relatively isolated features, often new features.
> > > >
> > > > The code grew rapidly over time, which required reorganizing a lot of it. This is needed to keep development possible and to make the code more accessible to newcomers. For example, splitting up the infamous models.py (a file with well over 6k lines) was quite a pain because of circular imports. This kind of reorganization is periodically necessary to keep the code organized.
> > > > Please note that reviewing isn't a task only for the PMC; it is also for committers and contributors. If there are any functionalities that you use a lot, please also provide reviews on that topic.
> > > > For me, being a committer and PMC member on the project is just something that I do out of passion for Airflow. It isn't my job and I don't get paid for it. That being said, I do agree with getting more committers on board to strengthen the workforce.
> > > > We're now preparing for Airflow 2.0, including a couple of AIPs. Whether there will be a true container-native or cloud-native version of Airflow is completely up to you and the community. I'm in favor of jumping on the container train, but this requires reworking the Airflow codebase.
> > > > Cheers, Fokko
> > > >
> > > > On Wed, Apr 10, 2019 at 16:56, Szymon Przedwojski <szymon.przedwoj...@polidea.com> wrote:
> > > >
> > > > > I think it is quite clear that Airflow needs more committers. Looking at AIPs, PRs, and this devlist, there are quite a few active people who might be a good fit to become committers.
> > > > > With the community and the project growing, I think it would be natural to increase the number of committers as well. I know a new committer comes along every now and then, but maybe it's still not enough, and maybe Airflow should recruit them more "aggressively"?
> > > > > Szymon Przedwojski
> > > > > Polidea | Software Engineer
> > > > > M: +48 500 330 790
> > > > > E: szymon.przedwoj...@polidea.com
> > > > >
> > > > > > On 10 Apr 2019, at 16:47, airflowuser <airflowu...@protonmail.com.INVALID> wrote:
> > > > > >
> > > > > > The Jira is a mess and it requires committers' time to organize it.
> > > > > > Ideally, users should report issues and committers should tag them with priority, milestone / fix version, and labels (this is how it's done, for example, in https://github.com/pandas-dev/pandas).
> > > > > >
> > > > > > When I have time, I try to compile a list of Jira issues that require committers' attention, and ashb fixes them, but it's progressing slowly.
> > > > > > I think it would at least be great if the version field in Jira were mandatory when a user submits a ticket.
> > > > > >
> > > > > > In the end... committers simply don't have time for this. They don't have enough time for reviewing PRs either, so I doubt anything will change in the near future.
> > > > > >
> > > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > > On Wednesday, April 10, 2019 5:18 PM, James Meickle <jmeic...@quantopian.com.INVALID> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > > I've been following Airflow development fairly actively for over a year. In that time, the company I work at (Quantopian) has gone all-in on Airflow. It's a core part of our business and required for daily operations. However, I've had some concerns over the future of the project. Some of these concerns arise because it's difficult to contribute to Airflow:
> > > > > > >
> > > > > > > - There are a lot of users of Airflow, but their use cases and feature usage are not well described. Something that seems trivial or unnecessary to one user turns out to be what someone else's entire workflow depends on.
> > > > > > > - The Airflow JIRA feels completely unmaintained. Most of the issues I've reported have never even been acknowledged, and it's hard to know what versions an issue applies to. This makes it hard to know what to work on or what would be most impactful to other users.
> > > > > > > - Hacking on Airflow is challenging, especially if you need to run a real workload to examine your changes. (I saw the work for an improved local dev process - great stuff!)
> > > > > > > - Keeping track of what's on master vs. what's in a release is challenging, particularly since so many commits are for operators we'll never use. (I know there's some discussion about breaking operators into their own repos, and I hope that goes through.)
> > > > > > > - The PMCs are too busy to guarantee timely reviews, and rebasing is extremely costly with how much code reorganization is happening. This strongly discourages putting in time to develop anything other than relatively isolated features, often new features.
> > > > > > >
> > > > > > > A lot of the problems that Quantopian experiences with Airflow can't be tackled without either "hacks" on top of Airflow, or deep reworkings of Airflow components. But that kind of rework is very challenging to implement with the current Airflow contribution process.
> > > > > > >
> > > > > > > I'm glad that we've recently adopted AIPs, but the way we're using them seems better suited to planning isolated features. The Airflow project does not have a well-maintained roadmap, nor any mechanism to produce one by weighing AIPs based on synergy vs. developer interest vs. user interest.
> > > > > > >
> > > > > > > I think that this lack of long-term planning makes it even more challenging to propose larger reworks that might require multiple AIPs to implement, each of which individually might yield little benefit. I worry that we may approve a series of "promising" AIPs that, taken together, don't amount to anything greater than a "pile of new features", instead of balancing feature improvements with platform improvements that will unlock more fundamental changes to how Airflow can work.
> > > > > > >
> > > > > > > I'd like to see some discussion of what it would look like to set long-term goals for Airflow. What is Airflow 2 going to look like? How much backwards compatibility will it break? When should we expect Airflow 3? Are they going to be "business as usual" releases, or will they embrace any new concepts or idioms? Will there be a true container-native, or cloud-native, version of Airflow? Will we work to be better for current users, or to embrace new classes of users?
> > > > > > >
> > > > > > > I have some thoughts of my own, of course, but I'd like to hear what other people have to say on this topic first!