[DISCUSS] Using GitHub Pull Requests for contributions and code review

Ismael Juma Thu, 30 Apr 2015 06:14:42 -0700

Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

"Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA."[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

1. Fork the GitHub repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review, and
Review Board makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. The review process happens on GitHub (it's quite handy to be able to
comment at either the commit or the PR level, unlike in Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge and
push the change, and to close the PR and the JIRA issue. The squashed
commit generated by the script includes useful information such as links
to the original commits[5] (in the future, I think it's worth reconsidering
the squashing of commits, but retaining the information in the commit is
already an improvement)
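
As a rough illustration of step 8, the sketch below shows how a squashed
commit message could keep a record of the original commits. It is not the
actual `merge_spark_pr.py` logic: the `Commit` class, the `jira_key` and
`squash_message` helpers, the message layout, and the assumption that the PR
title carries a key of the form KAFKA-<number> are all hypothetical, and the
PR number and issue key in the usage example are made up.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: PR titles contain an issue key such as "KAFKA-1234".
JIRA_KEY = re.compile(r"\b(KAFKA-\d+)\b")


@dataclass
class Commit:
    sha: str
    author: str
    subject: str


def jira_key(pr_title: str) -> Optional[str]:
    """Return the first JIRA issue key found in the PR title, if any."""
    match = JIRA_KEY.search(pr_title)
    return match.group(1) if match else None


def squash_message(pr_number: int, pr_title: str, commits: List[Commit]) -> str:
    """Build a single commit message that lists the commits being squashed."""
    lines = [
        pr_title,
        "",
        f"Closes #{pr_number} and squashes the following commits:",
        "",
    ]
    for commit in commits:
        # Abbreviated hash, author and subject of each original commit.
        lines.append(f"{commit.sha[:7]} [{commit.author}] {commit.subject}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        Commit("0123456789abcdef", "Jane Doe", "Add test"),
        Commit("fedcba9876543210", "Jane Doe", "Fix bug"),
    ]
    # Made-up PR number and issue key, for illustration only.
    print(squash_message(42, "KAFKA-1234; Example change", example))
```

Keeping the abbreviated hashes in the message means the individual commits
can still be looked up on the contributor's fork even though trunk only
receives the single squashed commit.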

Neha has already merged a couple of commits via GitHub and it went smoothly,
although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra;
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_spark_pr.py script and integrating it into the
Kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening them
over the years, but nothing was done about most of them)
5. Other things I may be missing

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best,
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. In any case, the merge script does
quite a few useful things that the merge button does not.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2] https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5] https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
