Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

"Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA."[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

   1. Fork the GitHub repository at http://github.com/apache/kafka (if you
   haven't already)
   2. git checkout -b kafka-XXX
   3. Make one or more commits (smaller commits can be easier to review, and
   Review Board makes that hard)
   4. git push origin kafka-XXX
   5. Create a PR against upstream/trunk (this updates the JIRA issue
   automatically[2] and also sends an email to the dev mailing list)
   6. A CI build will be triggered[3]
   7. The review happens on GitHub (unlike Review Board, it's quite handy to
   be able to comment at both the commit and the PR level)
   8. Once all feedback has been addressed and the build is green, a
   variant of the `merge_spark_pr.py`[4] script is used to squash, merge and
   push the commits, and to close the PR and the JIRA issue. The squashed
   commit generated by the script includes useful information, including
   links to the original commits[5] (in the future, I think it's worth
   reconsidering the squashing of commits, but retaining this information in
   the commit is already an improvement)
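To make step 8 a bit more concrete, here is a rough Python sketch of how a
merge script could assemble the squashed commit message so that the original
commits stay traceable. The function name, its parameters and the exact
message layout are hypothetical and only loosely modelled on the idea behind
`merge_spark_pr.py`; the real script (and any Kafka variant of it) may differ.

```python
from typing import List, Tuple


def build_squash_message(title: str, body: str, pr_number: int,
                         fork: str, branch: str,
                         commits: List[Tuple[str, str]]) -> str:
    """Build the message for a squashed PR commit.

    Hypothetical format for illustration, not merge_spark_pr.py's exact output.
    `fork` is the contributor's repository, e.g. "someuser/kafka".
    `commits` holds (full_sha, subject) pairs for the PR's original commits.
    """
    lines = [title]
    if body:
        lines += ["", body]
    lines += [
        "",
        f"Closes #{pr_number} from {fork}:{branch} and squashes the following commits:",
        "",
    ]
    for sha, subject in commits:
        # Short SHA and subject for readability, plus a link to the original commit.
        lines.append(f"{sha[:7]} {subject} (https://github.com/{fork}/commit/{sha})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; "KAFKA-XXX" stands for the real JIRA key.
    print(build_squash_message(
        title="KAFKA-XXX; Short description of the change",
        body="",
        pr_number=1,
        fork="someuser/kafka",
        branch="kafka-XXX",
        commits=[("0123456789abcdef", "First commit"),
                 ("fedcba9876543210", "Address review feedback")],
    ))
```

Showing a short SHA while linking with the full one keeps the message readable
without losing the pointer back to each original commit, which would also help
if we later revisit squashing.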

Neha has already merged a couple of commits via GitHub and it went smoothly,
although we are still missing a few of the pieces described above:

   1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
   we need to request it for Kafka and provide whatever configuration is
   needed)
   2. Adapting Spark's merge_spark_pr.py script and integrating it into the
   Kafka Git repository
   3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
   the Git repository (this is shown when someone is creating a pull request)
   4. Going through existing GitHub pull requests and closing the ones that
   are no longer relevant (there are quite a few, as people have been opening
   them over the years, but nothing was done about most of them)
   5. Other things I may be missing
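For item 4, a first pass could be to shortlist PRs that have not been touched
for a long time and then go through them by hand, since whether a PR is still
relevant is a judgement call. A minimal Python sketch, assuming the PR data has
already been fetched from GitHub's "list pull requests" REST endpoint (which
returns `number` and `updated_at` fields); the function name and the one-year
default threshold are arbitrary choices for illustration:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_prs(prs: List[Dict], now: datetime, max_idle_days: int = 365) -> List[int]:
    """Return numbers of PRs not updated within `max_idle_days`, oldest first.

    Each dict is assumed to look like an entry from GitHub's list-PRs response,
    with `updated_at` in ISO 8601 UTC form, e.g. "2014-03-01T12:00:00Z".
    `now` must be timezone-aware (UTC).
    """
    cutoff = now - timedelta(days=max_idle_days)
    idle = []
    for pr in prs:
        # GitHub timestamps end in a literal "Z"; parse it, then attach UTC.
        updated = datetime.strptime(pr["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
        updated = updated.replace(tzinfo=timezone.utc)
        if updated < cutoff:
            idle.append((updated, pr["number"]))
    # Sort by last update so the longest-idle PRs are reviewed first.
    return [number for _, number in sorted(idle)]
```

The output is only a candidate list; closing a PR would still be a manual
decision.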

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best,
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. In any case, the merge script does
quite a few useful things that the merge button does not.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2] https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5] https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
