Some personal thoughts about the PR processing speed specifically.

I tried to benchmark Airflow against other Apache projects (such as Spark
and Kafka) in terms of PR reviewing/merging speed. At this moment, there
are 400+ open PRs in Spark and 500+ open PRs in Kafka, while Kafka has 26
committers and Spark has 68. Airflow has fewer than 20 committers, and the
number of open PRs has recently stayed at about 200.

(Note: this benchmark is not 100% precise, as I didn't consider the total
number of PRs processed per day. But the number of commits per day in
Kafka seems roughly close to Airflow's.)
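
As a rough illustration of the comparison above, here is a minimal sketch
that divides the open-PR counts by the committer counts. The figures are
the approximate ones quoted above ("400+", "500+" and "about 200" are
taken at face value, and "less than 20" is taken as 20), so the ratios are
only ballpark estimates. The names `projects` and `open_prs_per_committer`
are made up for this example.

```python
# Approximate figures quoted in this email (April 2019); not exact counts.
projects = {
    "Spark": {"open_prs": 400, "committers": 68},
    "Kafka": {"open_prs": 500, "committers": 26},
    # "less than 20 committers" is assumed to be 20 here.
    "Airflow": {"open_prs": 200, "committers": 20},
}


def open_prs_per_committer(open_prs, committers):
    """Return the number of open PRs per committer."""
    return open_prs / committers


ratios = {
    name: open_prs_per_committer(**figures)
    for name, figures in projects.items()
}

# Print projects from the lightest to the heaviest per-committer backlog.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f} open PRs per committer")
```

With these inputs Airflow lands between Spark and Kafka, which matches the
point that the situation is not that bad. It still ignores how many PRs
each project processes per day.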

Don't get me wrong: I don't think we have done well enough, and I do agree
that there is a lot of room for improvement. But to be fair, Airflow's
situation here is not that bad.

I was nominated as a committer only about a month ago. Earlier, as a PR
submitter, I also had the feeling of "why are my PRs processed so slowly?";
but now that I have started to think more about reviewing/approving/merging,
I realize the current pace is fairly good (big thanks to the other
committers).

There is another thing I would like to suggest. Currently we committers
almost never give "-1" to PRs. Even when committers disagree with a
proposed change, they don’t close it. I would like to suggest that the PMC
have this discussion: whether we can close a PR if we have a few "-1"s
from committers (say 3 or 4). I believe this would help.


Best regards,

XD

On Thu, Apr 11, 2019 at 13:54 airflowuser
<airflowu...@protonmail.com.invalid> wrote:

> 1. Getting more contributors is important, but it's also important to
> give attention to the current contributors.
> I noticed that if a PR has had no reviews and it reaches page 3 or
> beyond, it is likely to be forgotten.
> Take this one for example:
> https://github.com/apache/airflow/pull/4473
> The author is required to rebase again. It's not very welcoming to new
> contributors. There are more open PRs like this. One suggestion might be
> a monthly status check of all open PRs to see if something was missed?
>
>
> 2. The attention of committers doesn't always point to what the
> community needs. Check this one:
> https://github.com/apache/airflow/pull/1936 addresses a problem that bugs
> many people, but there is no discussion of how to solve it. There have
> been more than 4 releases since this PR was introduced, and the problem
> it tries to fix has been neither addressed nor discussed. The author
> commented that he can update the branch but he needs committers to be
> involved.
>
> Again, since everything is volunteer-based, this makes sense and is
> understandable. However, if the project wishes to get more contributors,
> it might be easier to start with the PRs that we already have rather
> than putting effort into trying to invite new contributors.
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, April 11, 2019 3:46 AM, Aizhamal Nurmamat kyzy
> <aizha...@google.com.INVALID> wrote:
>
> > Hello all,
> >
> > The Beam project has had similar problems. One of the things they
> > did was formalize how contributions are tracked. I understand that
> > tracking this sort of information is difficult for the PMC, so if
> > there's interest, I'd be happy to work with the PMC to make tools to
> > track contributions (e.g. a simple spreadsheet tracking contributions
> > on PRs, StackOverflow answers, public speaking, documentation, etc.),
> > so that we can streamline the "promotion" of new committers. This may
> > also help incentivize "housekeeping" work, such as triaging JIRA
> > issues, testing, release management, etc.
> >
> > This may also help provide early feedback to people on track to
> > becoming a committer (e.g. private emails of the kind "Hi X. The
> > Airflow PMC has noticed and appreciates your contributions. We think
> > you could improve by doing Y or Z").
> >
> > Let me know what you all think.
> >
> > Best,
> > Aizhamal
> >
> > On Wed, Apr 10, 2019 at 5:24 PM Gabriel Silk gs...@dropbox.com.invalid
> > wrote:
> >
> > > > A lot of the problems that Quantopian experiences with Airflow
> > > > can't be tackled without either "hacks" on top of Airflow; or deep
> > > > reworkings of Airflow components. But that kind of rework is very
> > > > challenging to implement with the current Airflow contribution
> > > > process.
> > >
> > > Can you elaborate on what some of the problems are that Quantopian
> > > has encountered, which would require significant re-work to Airflow
> > > to address?
> > >
> > > On Wed, Apr 10, 2019 at 8:19 AM Driesprong, Fokko fo...@driesprong.frl
> > > wrote:
> > >
> > > > Hi James,
> > > >
> > > > Addressing your concerns one by one:
> > > >
> > > > -   There are a lot of users of Airflow, but their use cases and
> > > >     feature usage are not well described. Something that seems
> > > >     trivial or unnecessary to one user turns out to be what someone
> > > >     else's entire workflow depends on.
> > > >
> > > > I think in general it is all about scheduling stuff. For me, this is
> > > > also true for many software packages: 80% of the users only use 20%
> > > > of the functionality. I think it is up to the committers to make
> > > > sure that we don't remove any functionality too easily and break the
> > > > workflow for others. However, sometimes this is what you want, for
> > > > example when dropping Python 2 support. I strongly believe that the
> > > > flexibility offered by Airflow is both a strength and a weakness: it
> > > > allows you to do virtually everything; on the other hand, maybe you
> > > > should not do that :-)
> > > >
> > > > -   The Airflow JIRA feels completely unmaintained. Most of the
> > > >     issues I've reported have never even been acknowledged, and it's
> > > >     hard to know what versions an issue applies to. This makes it
> > > >     hard to know what to work on or what would be most impactful to
> > > >     other users.
> > > >
> > > > Keeping track of Jira is a full-time job. Periodically I go through
> > > > all the tickets, but it is also (mis)used for dumping stack traces
> > > > or any other error. We should be more strict on this, as a
> > > > community. If you're interested in doing this, let me know so I can
> > > > grant you editor permissions.
> > > >
> > > > -   Hacking on Airflow is challenging, especially if you need to run
> > > >     a real workload to examine your changes. (I saw the work for an
> > > >     improved local dev process - great stuff!)
> > > >
> > > > This is a known problem. I think the community is doing an awesome
> > > > job here. For example, Breeze by Polidea
> > > > (https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by
> > > > ING/GoDataDriven
> > > > (https://blog.godatadriven.com/open-source-airflow-local-development).
> > > >
> > > > -   Keeping track of what's on master vs. what's in a release is
> > > >     challenging, particularly since so many commits are for
> > > >     operators we'll never use. (I know there's some discussion about
> > > >     breaking operators into their own repos, and I hope that goes
> > > >     through.)
> > > >
> > > > The main job of the committers is to keep compatibility on the
> > > > interfaces. The versions are clearly set in Jira when a ticket is
> > > > being worked on. If the change is compatible with the new minor
> > > > version, it will be included; otherwise, it will be set to the next
> > > > major version.
> > > >
> > > > -   The PMCs are too busy to guarantee timely reviews, and rebasing
> > > >     is extremely costly with how much code reorganization is
> > > >     happening. This strongly discourages putting in time to develop
> > > >     anything other than relatively isolated features, often new
> > > >     features.
> > > >
> > > > The code grew rapidly over time, which required reorganizing a lot
> > > > of it. This is needed to keep development possible and make the
> > > > code more accessible to newcomers. For example, splitting up the
> > > > infamous models.py (a file with well over 6k lines) was quite a pain
> > > > because of circular imports. This is periodically necessary to keep
> > > > the code organized. Please note that reviewing isn't a task for the
> > > > PMC only; it is also for the committers and contributors. If there
> > > > are any functionalities that you use a lot, please also provide
> > > > reviews on that topic.
> > > >
> > > > For me, being a committer and PMC member on the project is just
> > > > something that I do out of passion for Airflow. It isn't my job and
> > > > I don't get paid for it. That being said, I do agree with getting
> > > > more committers on board to strengthen the workforce.
> > > >
> > > > We're now preparing for Airflow 2.0, including a couple of AIPs.
> > > > The question of whether there will be a true container-native or
> > > > cloud-native version of Airflow is completely up to you and the
> > > > community. I'm in favor of jumping on the container train, but this
> > > > requires reworking the Airflow codebase.
> > > >
> > > > Cheers, Fokko
> > > >
> > > > On Wed, Apr 10, 2019 at 16:56, Szymon Przedwojski
> > > > <szymon.przedwoj...@polidea.com> wrote:
> > > >
> > > > > I think it is quite clear that Airflow needs more committers.
> > > > > Looking at AIPs, PRs and this devlist, there are quite a few
> > > > > active people who might be a good fit to become committers.
> > > > >
> > > > > With the community and the project growing, I think it should be
> > > > > natural to increase the number of committers as well. I know a new
> > > > > committer comes along every now and then, but maybe it’s still not
> > > > > enough and maybe Airflow should recruit them more “aggressively”?
> > > > >
> > > > > Szymon Przedwojski
> > > > > Polidea | Software Engineer
> > > > > M: +48 500 330 790
> > > > > E: szymon.przedwoj...@polidea.com
> > > > >
> > > > > > On 10 Apr 2019, at 16:47, airflowuser
> > > > > > <airflowu...@protonmail.com.INVALID> wrote:
> > > > >
> > > > > > The Jira is a mess and it requires committers' time to organize
> > > > > > it. Ideally, users should report issues and committers should
> > > > > > tag them with priority, milestone / fix version, and labels
> > > > > > (this is how it's done, for example, at
> > > > > > https://github.com/pandas-dev/pandas ).
> > > > > >
> > > > > > When I have time I try to put together a list of Jira issues
> > > > > > that require committers' attention and ashb fixes them, but it's
> > > > > > progressing slowly. I think it would be great if at least the
> > > > > > version field in Jira were mandatory when a user submits a
> > > > > > ticket.
> > > > > >
> > > > > > In the end... committers simply don't have time for this. They
> > > > > > don't have enough time for reviewing PRs either, so I doubt
> > > > > > something will change in the near future.
> > > > > >
> > > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > > On Wednesday, April 10, 2019 5:18 PM, James Meickle
> > > > > > <jmeic...@quantopian.com.INVALID> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've been following Airflow development fairly actively for
> > > > > > > over a year. In that time, the company I work at (Quantopian)
> > > > > > > has gone all-in on Airflow. It's a core part of our business
> > > > > > > and required for daily operations.
> > > > > > >
> > > > > > > However, I've had some concerns over the future of the
> > > > > > > project. Part of these concerns are because it's difficult to
> > > > > > > contribute to Airflow:
> > > > > > >
> > > > > > > -   There are a lot of users of Airflow, but their use cases
> > > > > > >     and feature usage are not well described. Something that
> > > > > > >     seems trivial or unnecessary to one user turns out to be
> > > > > > >     what someone else's entire workflow depends on.
> > > > > > >
> > > > > > > -   The Airflow JIRA feels completely unmaintained. Most of
> > > > > > >     the issues I've reported have never even been
> > > > > > >     acknowledged, and it's hard to know what versions an
> > > > > > >     issue applies to. This makes it hard to know what to work
> > > > > > >     on or what would be most impactful to other users.
> > > > > > >
> > > > > > > -   Hacking on Airflow is challenging, especially if you need
> > > > > > >     to run a real workload to examine your changes. (I saw
> > > > > > >     the work for an improved local dev process - great
> > > > > > >     stuff!)
> > > > > > >
> > > > > > > -   Keeping track of what's on master vs. what's in a release
> > > > > > >     is challenging, particularly since so many commits are
> > > > > > >     for operators we'll never use. (I know there's some
> > > > > > >     discussion about breaking operators into their own repos,
> > > > > > >     and I hope that goes through.)
> > > > > > >
> > > > > > > -   The PMCs are too busy to guarantee timely reviews, and
> > > > > > >     rebasing is extremely costly with how much code
> > > > > > >     reorganization is happening. This strongly discourages
> > > > > > >     putting in time to develop anything other than relatively
> > > > > > >     isolated features, often new features.
> > > > > > >
> > > > > > > A lot of the problems that Quantopian experiences with
> > > > > > > Airflow can't be tackled without either "hacks" on top of
> > > > > > > Airflow; or deep reworkings of Airflow components. But that
> > > > > > > kind of rework is very challenging to implement with the
> > > > > > > current Airflow contribution process.
> > > > > > >
> > > > > > > I'm glad that we've recently adopted AIPs, but the way we're
> > > > > > > using them seems better suited to planning isolated features.
> > > > > > > The Airflow project does not have a well-maintained roadmap,
> > > > > > > nor any mechanism to produce one by weighing AIPs based on
> > > > > > > synergy vs. developer interest vs. user interest.
> > > > > > >
> > > > > > > I think that this lack of long-term planning makes it even
> > > > > > > more challenging to propose larger reworks that might require
> > > > > > > multiple AIPs to implement, each of which individually might
> > > > > > > yield little benefit. I worry that we may approve a series of
> > > > > > > "promising" AIPs that, taken together, don't amount to
> > > > > > > anything greater than a "pile of new features"; instead of
> > > > > > > balancing feature improvements with platform improvements
> > > > > > > that will unlock more fundamental changes to how Airflow can
> > > > > > > work.
> > > > > > >
> > > > > > > I'd like to see some discussion of what it would look like to
> > > > > > > set long term goals for Airflow. What is Airflow 2 going to
> > > > > > > look like? How much backwards compat will it break? When
> > > > > > > should we expect Airflow 3? Are they going to be "business as
> > > > > > > usual" releases, or will they embrace any new concepts or
> > > > > > > idioms? Will there be a true container-native, or
> > > > > > > cloud-native version of Airflow? Will we work to be better
> > > > > > > for current users, or to embrace new classes of users?
> > > > > > >
> > > > > > > I have some thoughts of my own, of course, but I'd like to
> > > > > > > hear what other people have to say on this topic first!
>
>
>
