Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Thanks Gwen, will update the wiki then. On Wed, Jul 15, 2015 at 1:03 PM, Gwen Shapira wrote: > The script just runs a git push, it will be with your user (from git > configuration), so you need privileges to push. > > On Wed, Jul 15, 2015 at 12:59 PM, Guozhang Wang > wrote: > > Another questions regarding the "kafka-merge-pr.py" script: once a > > committer gives a LGTM on the PR, could the contributor use the script to > > push to the git repo himself, or it must be executed by the committer? I > > thought it is the former case but seems not. If not we need to make it > > clear on the wiki. > > > > Guozhang > > > > On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma wrote: > > > >> On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao wrote: > >> > >> > I made a couple of changes to the new Jenkins job. Could you try > again? > >> > > >> > >> It's still not working, unfortunately. It may or may not be related to: > >> > >> https://blogs.apache.org/infra/entry/mirroring_to_github_issues > >> > >> For b, if we can't easily change the behavior of pull request bot, we > can > >> > also just flip back and forth btw "In Progress" and "Open". > >> > > >> > >> It turns out that the ASF bot doesn't change the status at all. Spark is > >> using a different mechanism to cause the status change (via a > `sparkuser`). > >> The JIRA ticket status will have to be changed manually for now. I have > >> updated the instructions accordingly. I was told in the INFRA channel to > >> file a JIRA ticket and they would see if auto transition listeners can > be > >> used to automate this (no guarantees though). > >> > >> Ismael > >> > > > > > > > > -- > > -- Guozhang > -- -- Guozhang
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
The script just runs a git push, it will be with your user (from git configuration), so you need privileges to push. On Wed, Jul 15, 2015 at 12:59 PM, Guozhang Wang wrote: > Another questions regarding the "kafka-merge-pr.py" script: once a > committer gives a LGTM on the PR, could the contributor use the script to > push to the git repo himself, or it must be executed by the committer? I > thought it is the former case but seems not. If not we need to make it > clear on the wiki. > > Guozhang > > On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma wrote: > >> On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao wrote: >> >> > I made a couple of changes to the new Jenkins job. Could you try again? >> > >> >> It's still not working, unfortunately. It may or may not be related to: >> >> https://blogs.apache.org/infra/entry/mirroring_to_github_issues >> >> For b, if we can't easily change the behavior of pull request bot, we can >> > also just flip back and forth btw "In Progress" and "Open". >> > >> >> It turns out that the ASF bot doesn't change the status at all. Spark is >> using a different mechanism to cause the status change (via a `sparkuser`). >> The JIRA ticket status will have to be changed manually for now. I have >> updated the instructions accordingly. I was told in the INFRA channel to >> file a JIRA ticket and they would see if auto transition listeners can be >> used to automate this (no guarantees though). >> >> Ismael >> > > > > -- > -- Guozhang
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Another questions regarding the "kafka-merge-pr.py" script: once a committer gives a LGTM on the PR, could the contributor use the script to push to the git repo himself, or it must be executed by the committer? I thought it is the former case but seems not. If not we need to make it clear on the wiki. Guozhang On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma wrote: > On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao wrote: > > > I made a couple of changes to the new Jenkins job. Could you try again? > > > > It's still not working, unfortunately. It may or may not be related to: > > https://blogs.apache.org/infra/entry/mirroring_to_github_issues > > For b, if we can't easily change the behavior of pull request bot, we can > > also just flip back and forth btw "In Progress" and "Open". > > > > It turns out that the ASF bot doesn't change the status at all. Spark is > using a different mechanism to cause the status change (via a `sparkuser`). > The JIRA ticket status will have to be changed manually for now. I have > updated the instructions accordingly. I was told in the INFRA channel to > file a JIRA ticket and they would see if auto transition listeners can be > used to automate this (no guarantees though). > > Ismael > -- -- Guozhang
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao wrote: > I made a couple of changes to the new Jenkins job. Could you try again? > It's still not working, unfortunately. It may or may not be related to: https://blogs.apache.org/infra/entry/mirroring_to_github_issues For b, if we can't easily change the behavior of pull request bot, we can > also just flip back and forth btw "In Progress" and "Open". > It turns out that the ASF bot doesn't change the status at all. Spark is using a different mechanism to cause the status change (via a `sparkuser`). The JIRA ticket status will have to be changed manually for now. I have updated the instructions accordingly. I was told in the INFRA channel to file a JIRA ticket and they would see if auto transition listeners can be used to automate this (no guarantees though). Ismael
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
I made a couple of changes to the new Jenkins job. Could you try again? For b, if we can't easily change the behavior of pull request bot, we can also just flip back and forth btw "In Progress" and "Open". Thanks, Jun On Tue, Jul 14, 2015 at 2:13 AM, Ismael Juma wrote: > Hi Jun, > > On Tue, Jul 14, 2015 at 2:09 AM, Jun Rao wrote: > > > Ismael, > > > > I followed the instructions in KAFKA-2320 and created a new Jenkins job ( > > https://builds.apache.org/job/kafka-trunk-git-pr/). Could you check if > it > > works? > > > > Thanks! It seems to be building when trunk changes as opposed to when PRs > are submitted (eg https://github.com/apache/kafka/pull/74). I'll see if I > can get some help from INFRA. > > As for wiki, I have a couple of minor comments. > > > > a. Could we add the following to the wiki? > > To avoid conflicts, assign a jira to yourself if you plan to work on it. > If > > you are creating a jira and don't plan to work on it, leave the assignee > as > > Unassigned. > > > > Good point, done. > > b. Previously, we mark a jira as "Patch Available" if there is a patch. > > Could we reuse that instead of "In Progress" to be consistent? Also, if a > > patch needs more work after review, the reviewer will mark the jira back > to > > "In Progress". > > > > I added this as an additional step for the user because I don't think we > can customise the behaviour of the pull request bot (which sets the status > to "In Progress"). It's a bit redundant when a PR is created, so a bit > annoying. I'll double-check with INFRA. > > I also mentioned in the review section about the reviewer changing the JIRA > status back to "In Progress". > > Best, > Ismael >
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Hi Jun, On Tue, Jul 14, 2015 at 2:09 AM, Jun Rao wrote: > Ismael, > > I followed the instructions in KAFKA-2320 and created a new Jenkins job ( > https://builds.apache.org/job/kafka-trunk-git-pr/). Could you check if it > works? > Thanks! It seems to be building when trunk changes as opposed to when PRs are submitted (eg https://github.com/apache/kafka/pull/74). I'll see if I can get some help from INFRA. As for wiki, I have a couple of minor comments. > > a. Could we add the following to the wiki? > To avoid conflicts, assign a jira to yourself if you plan to work on it. If > you are creating a jira and don't plan to work on it, leave the assignee as > Unassigned. > Good point, done. b. Previously, we mark a jira as "Patch Available" if there is a patch. > Could we reuse that instead of "In Progress" to be consistent? Also, if a > patch needs more work after review, the reviewer will mark the jira back to > "In Progress". > I added this as an additional step for the user because I don't think we can customise the behaviour of the pull request bot (which sets the status to "In Progress"). It's a bit redundant when a PR is created, so a bit annoying. I'll double-check with INFRA. I also mentioned in the review section about the reviewer changing the JIRA status back to "In Progress". Best, Ismael
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Ismael, I followed the instructions in KAFKA-2320 and created a new Jenkins job ( https://builds.apache.org/job/kafka-trunk-git-pr/). Could you check if it works? As for wiki, I have a couple of minor comments. a. Could we add the following to the wiki? To avoid conflicts, assign a jira to yourself if you plan to work on it. If you are creating a jira and don't plan to work on it, leave the assignee as Unassigned. b. Previously, we mark a jira as "Patch Available" if there is a patch. Could we reuse that instead of "In Progress" to be consistent? Also, if a patch needs more work after review, the reviewer will mark the jira back to "In Progress". Thanks, Jun On Wed, Jul 8, 2015 at 1:39 AM, Ismael Juma wrote: > An update on this. > > On Thu, Apr 30, 2015 at 2:12 PM, Ismael Juma wrote: > > > > >1. CI builds triggered by GitHub PRs (this is supported by Apache > >Infra, we need to request it for Kafka and provide whatever > configuration > >is needed) > > > > Filed https://issues.apache.org/jira/browse/KAFKA-2320, someone with a > Jenkins account needs to follow the instructions there. > > > > >1. Adapting Spark's merge_park_pr script and integrating it into the > >kafka Git repository > > > > https://issues.apache.org/jira/browse/KAFKA-2187 includes a patch that > has received an initial review by Neha. > > > > >1. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md > >to the Git repository (this is shown when someone is creating a pull > >request) > > > > Initial versions (feedback and/or improvements are welcome): > >- > > https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes >- > > https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review#Patchsubmissionandreview-MergingGitHubPullRequests >- https://issues.apache.org/jira/browse/KAFKA-2321 (patch available) > > > >1. Go through existing GitHub pull requests and close the ones that > >are no longer relevant (there are quite a few as people have been > opening > >them over the years, but nothing was done about most of them) > > > > Not done yet. I think this should wait until we have merged a few PRs as > I > would like to invite people to open new PRs if it's still relevant while > pointing them to the documentation on how to go about it. > > > > >1. Other things I may be missing > > > > We also need to update the "Contributing" page on the website. I think > this should also wait until we are happy that the new approach works well > for us. > > Any help moving this forward is appreciated. Aside from reviews, feedback > and merging the changes; testing the new process is particularly useful ( > https://issues.apache.org/jira/browse/KAFKA-2276 is an example). > > Best, > Ismael >
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Commends inlined. On Fri, Jul 10, 2015 at 2:10 PM, Ismael Juma wrote: > Hi Guozhang, > > Comments inline. > > On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang wrote: > > > > I have a couple of comments on the wiki pages / merge scripts: > > > > Thanks, it's important to discuss these things as the current version is > based on what the Spark project does and may not match what we want to do > exactly. > > 1. In the wiki page it mentions "If the change is new, then it usually > > needs a new JIRA. However, trivial changes, where "what should change" is > > virtually the same as "how it should change" do not require a JIRA. > > Example: "Fix typos in Foo scaladoc"." > > > > So it sounds we are going to allow pull requests without a JIRA ticket, > but > > later we are also mentioning: > > > > "The PR title should be of the form [KAFKA-].." which is a bit > > conflicting with the previous statement. Could you clarify it? Today our > > commits are mostly with JIRAs except some trivial ones that are only done > > by committers, I can either extend this to non-committers or not, just > that > > we need to make it clear. > > > > I agree that it's not very clear and it needs to be fixed. What do we want > to do though? It's a trade-off between consistency (always have a JIRA > ticket) and avoiding redundancy (skip the JIRA where it doesn't add > anything). The former is the more conservative option and I am happy to > update the documentation if that's the preferred option. > > Personally I think it is better to not enforcing a JIRA ticket for minor / hotfix commits, for example, we can format the title with [MINOR] [HOTFIX] etc as in Spark: https://github.com/apache/spark/commits/master But I think it we need some broader discussion for this, maybe we could start another email thread regarding this? > 2. The git commit message is a little verbose to me, for example: > > > > -- > > > > commit ee88dbb67f19b787e12ccef37982c9459b78c7b6 > > > > Author: Geoff Anderson > > > > Date: Thu Jul 9 14:58:01 2015 -0700 > > > >KAFKA-2327; broker doesn't start if config defines advertised.host but > > not advertised.port > > > > > > > >Added unit tests as well. These fail without the fix, but pass with > the > > fix. > > > > > > > >Author: Geoff Anderson > > > > > > > >Closes #73 from granders/KAFKA-2327 and squashes the following > commits: > > > > > > > >52a2085 [Geoff Anderson] Cleaned up unecessary toString calls > > > >23b3340 [Geoff Anderson] Fixes KAFKA-2327 > > > > -- > > > > > > The "Author" field is redundant, and we could probably also omit the > github > > commits. What do you think? > > > > Is it harmful to have detailed information? None of it is actually > redundant as far as I can see (but I could be missing something). There is > the squashed commit author, the pull request message, the pull request > author and the information about the individual commits. Even though Geoff > worked on this PR by himself, multiple people can collaborate on a feature > and then it's useful to credit correctly. > > The fact that we are keeping all the information encourages a style of > development where a few small commits (each commit should must make sense > on its own and pass the tests) are used in the pull request, which helps a > lot during code review and when trying to understand changes after the > fact. This is in contrast with the style where there is a single commit per > feature even when the feature is quite large (we have a few of those > ongoing at the moment). I don't know if you noticed, but GitHub actually > links to the original commits, which is quite handy: > > > https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6 > > This approach is a bit of a workaround, but it's easier as the script > handles everything (credit to the Spark project once again, it's their > code). The ideal scenario would be to keep the individual commits by > merging the branch into trunk after a rebase (no fast-forward, so that a > merge commit is maintained). This has a number of nice characteristics > (history is linear, it's clear what commits were part of the merged branch, > easy to bisect, easy to revert, etc.), but it requires a bit more work, > more Git knowledge and ideally a PR builder that builds and tests every > commit in the branch (Typesafe wrote a tool that does this for the Scala > project, for what is worth). In other words, a step too far for us right > now. :) > > What do you think? > > I think I'm sold on your comments, makes sense :) > Best, > Ismael > -- -- Guozhang
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Hi Guozhang, Comments inline. On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang wrote: > > I have a couple of comments on the wiki pages / merge scripts: > Thanks, it's important to discuss these things as the current version is based on what the Spark project does and may not match what we want to do exactly. 1. In the wiki page it mentions "If the change is new, then it usually > needs a new JIRA. However, trivial changes, where "what should change" is > virtually the same as "how it should change" do not require a JIRA. > Example: "Fix typos in Foo scaladoc"." > > So it sounds we are going to allow pull requests without a JIRA ticket, but > later we are also mentioning: > > "The PR title should be of the form [KAFKA-].." which is a bit > conflicting with the previous statement. Could you clarify it? Today our > commits are mostly with JIRAs except some trivial ones that are only done > by committers, I can either extend this to non-committers or not, just that > we need to make it clear. > I agree that it's not very clear and it needs to be fixed. What do we want to do though? It's a trade-off between consistency (always have a JIRA ticket) and avoiding redundancy (skip the JIRA where it doesn't add anything). The former is the more conservative option and I am happy to update the documentation if that's the preferred option. 2. The git commit message is a little verbose to me, for example: > -- > > commit ee88dbb67f19b787e12ccef37982c9459b78c7b6 > > Author: Geoff Anderson > > Date: Thu Jul 9 14:58:01 2015 -0700 > >KAFKA-2327; broker doesn't start if config defines advertised.host but > not advertised.port > > > >Added unit tests as well. These fail without the fix, but pass with the > fix. > > > >Author: Geoff Anderson > > > >Closes #73 from granders/KAFKA-2327 and squashes the following commits: > > > >52a2085 [Geoff Anderson] Cleaned up unecessary toString calls > >23b3340 [Geoff Anderson] Fixes KAFKA-2327 > > -- > > > The "Author" field is redundant, and we could probably also omit the github > commits. What do you think? > Is it harmful to have detailed information? None of it is actually redundant as far as I can see (but I could be missing something). There is the squashed commit author, the pull request message, the pull request author and the information about the individual commits. Even though Geoff worked on this PR by himself, multiple people can collaborate on a feature and then it's useful to credit correctly. The fact that we are keeping all the information encourages a style of development where a few small commits (each commit should must make sense on its own and pass the tests) are used in the pull request, which helps a lot during code review and when trying to understand changes after the fact. This is in contrast with the style where there is a single commit per feature even when the feature is quite large (we have a few of those ongoing at the moment). I don't know if you noticed, but GitHub actually links to the original commits, which is quite handy: https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6 This approach is a bit of a workaround, but it's easier as the script handles everything (credit to the Spark project once again, it's their code). The ideal scenario would be to keep the individual commits by merging the branch into trunk after a rebase (no fast-forward, so that a merge commit is maintained). This has a number of nice characteristics (history is linear, it's clear what commits were part of the merged branch, easy to bisect, easy to revert, etc.), but it requires a bit more work, more Git knowledge and ideally a PR builder that builds and tests every commit in the branch (Typesafe wrote a tool that does this for the Scala project, for what is worth). In other words, a step too far for us right now. :) What do you think? Best, Ismael
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Hi Ismael, I have a couple of comments on the wiki pages / merge scripts: 1. In the wiki page it mentions "If the change is new, then it usually needs a new JIRA. However, trivial changes, where "what should change" is virtually the same as "how it should change" do not require a JIRA. Example: "Fix typos in Foo scaladoc"." So it sounds we are going to allow pull requests without a JIRA ticket, but later we are also mentioning: "The PR title should be of the form [KAFKA-].." which is a bit conflicting with the previous statement. Could you clarify it? Today our commits are mostly with JIRAs except some trivial ones that are only done by committers, I can either extend this to non-committers or not, just that we need to make it clear. 2. The git commit message is a little verbose to me, for example: -- commit ee88dbb67f19b787e12ccef37982c9459b78c7b6 Author: Geoff Anderson Date: Thu Jul 9 14:58:01 2015 -0700 KAFKA-2327; broker doesn't start if config defines advertised.host but not advertised.port Added unit tests as well. These fail without the fix, but pass with the fix. Author: Geoff Anderson Closes #73 from granders/KAFKA-2327 and squashes the following commits: 52a2085 [Geoff Anderson] Cleaned up unecessary toString calls 23b3340 [Geoff Anderson] Fixes KAFKA-2327 -- The "Author" field is redundant, and we could probably also omit the github commits. What do you think? Guozhang
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
An update on this. On Thu, Apr 30, 2015 at 2:12 PM, Ismael Juma wrote: > >1. CI builds triggered by GitHub PRs (this is supported by Apache >Infra, we need to request it for Kafka and provide whatever configuration >is needed) > > Filed https://issues.apache.org/jira/browse/KAFKA-2320, someone with a Jenkins account needs to follow the instructions there. > >1. Adapting Spark's merge_park_pr script and integrating it into the >kafka Git repository > > https://issues.apache.org/jira/browse/KAFKA-2187 includes a patch that has received an initial review by Neha. > >1. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md >to the Git repository (this is shown when someone is creating a pull >request) > > Initial versions (feedback and/or improvements are welcome): - https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes - https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review#Patchsubmissionandreview-MergingGitHubPullRequests - https://issues.apache.org/jira/browse/KAFKA-2321 (patch available) >1. Go through existing GitHub pull requests and close the ones that >are no longer relevant (there are quite a few as people have been opening >them over the years, but nothing was done about most of them) > > Not done yet. I think this should wait until we have merged a few PRs as I would like to invite people to open new PRs if it's still relevant while pointing them to the documentation on how to go about it. > >1. Other things I may be missing > > We also need to update the "Contributing" page on the website. I think this should also wait until we are happy that the new approach works well for us. Any help moving this forward is appreciated. Aside from reviews, feedback and merging the changes; testing the new process is particularly useful ( https://issues.apache.org/jira/browse/KAFKA-2276 is an example). Best, Ismael
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
On Fri, May 1, 2015 at 8:38 AM, Ewen Cheslack-Postava wrote: > One thing I noticed is that when you try to generate a PR it defaults to > the 0.8.2 branch. Can we fix that up to be trunk by default? > I filed an INFRA ticket for this: https://issues.apache.org/jira/browse/INFRA-9680 Best, Ismael
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Yes, it would be great if we can do a final rebase of the branch before merging. Perhaps the squashing part already does that? Thanks, Jun On Tue, May 5, 2015 at 5:53 AM, Ismael Juma wrote: > Hi Jun, > > On Sat, May 2, 2015 at 2:50 PM, Jun Rao wrote: > > > We will also need to figure out if we need CONTRIBUTING.md like the > > following to take care of the Apache licensing stuff. > > > > https://github.com/apache/spark/blob/master/CONTRIBUTING.md > > > Yes indeed. That is in step 3 of the "missing pieces" in the original > email. > > As for merging changes, I think squashing the commits will be ideal during > > merge. > > > The script does squash commits during merge indeed. It also includes > retains a bunch of useful information in the commit message. I think one > loses some of the benefits of git when using a one squashed commit per PR > approach, but that is a separate issue and I am not suggesting a change in > this regard at this point. > > > > Also, it would be great if we can always put the latest changes on > > top during the merge. > > > > Just to make sure I understand correctly, do you mean updating the working > copy with the latest from trunk before merging? Or did I misunderstand? > > Best, > Ismael >
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Hi Jun, On Sat, May 2, 2015 at 2:50 PM, Jun Rao wrote: > We will also need to figure out if we need CONTRIBUTING.md like the > following to take care of the Apache licensing stuff. > > https://github.com/apache/spark/blob/master/CONTRIBUTING.md Yes indeed. That is in step 3 of the "missing pieces" in the original email. As for merging changes, I think squashing the commits will be ideal during > merge. The script does squash commits during merge indeed. It also includes retains a bunch of useful information in the commit message. I think one loses some of the benefits of git when using a one squashed commit per PR approach, but that is a separate issue and I am not suggesting a change in this regard at this point. > Also, it would be great if we can always put the latest changes on > top during the merge. > Just to make sure I understand correctly, do you mean updating the working copy with the latest from trunk before merging? Or did I misunderstand? Best, Ismael
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Hi Ewen, Comments inline. On Fri, May 1, 2015 at 8:38 AM, Ewen Cheslack-Postava wrote: > Also +1. There are some drawbacks to using Github for reviews, e.g. lots of > emails for each review because they don't let you publish your entire > review in one go like RB does, but it drastically lowers the barrier to > contributing for most developers. Also, if you haven't tried it, hub > https://hub.github.com/ makes it really easy to checkout and test PRs. > Good points. One thing I noticed is that when you try to generate a PR it defaults to > the 0.8.2 branch. Can we fix that up to be trunk by default? That's the > most common use case; version branches are really only useful when a > release is being prepared. Do changes to the Github repo require tickets to > the Apache Infra team or is this something committers have control over? > It can be changed to trunk, but it does require a ticket to the Infra team[1]. Sadly, PRs can only be closed via commits, PR creator or a ticket to Infra. This is quite clunky for abandoned PRs (a label can be used to exclude such PRs from view, if needed). On a related note, which perhaps should be discussed on another thread: > > The CI setup is a related issue that we might want to rethink. I did think about Travis as well, but I thought that should be considered as a subsequent and separate discussion indeed. Best, Ismael [1] An example of such a request: https://issues.apache.org/jira/browse/INFRA-9208
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
+1! -Jay On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma wrote: > Hi all, > > Kafka currently uses a combination of Review Board and JIRA for > contributions and code review. In my opinion, this makes contribution and > code review a bit harder than it has to be. > > I think the approach used by Spark would improve the current situation: > > "Generally, Spark uses JIRA to track logical issues, including bugs and > improvements, and uses Github pull requests to manage the review and merge > of specific code changes. That is, JIRAs are used to describe what should > be fixed or changed, and high-level approaches, and pull requests describe > how to implement that change in the project's source code. For example, > major design decisions are discussed in JIRA."[1] > > It's worth reading the wiki page for all the details, but I will summarise > the suggested workflow for code changes: > >1. Fork the Github repository at http://github.com/apache/kafka (if you >haven't already) >2. git checkout -b kafka-XXX >3. Make one or more commits (smaller commits can be easier to review and >reviewboard makes that hard) >4. git push origin kafka-XXX >5. Create PR against upstream/trunk (this will update JIRA >automatically[2] and it will send an email to the dev mailing list too) >6. A CI build will be triggered[3] >7. Review process happens on GitHub (it's quite handy to be able to >comment on both commit or PR-level, unlike Review Board) >8. Once all feedback has been addressed and the build is green, a >variant of the `merge_spark_pr.py`[4] script is used to squash, merge, >push, close the PR and JIRA issue. The squashed commit generated by the >script includes a bunch of useful information including links to the >original commits[5] (in the future, I think it's worth reconsidering the >squashing of commits, but retaining the information in the commit is >already an improvement) > > Neha merged a couple of commits via GitHub already and it went smoothly > although we are still missing a few of the pieces described above: > >1. CI builds triggered by GitHub PRs (this is supported by Apache Infra, >we need to request it for Kafka and provide whatever configuration is >needed) >2. Adapting Spark's merge_park_pr script and integrating it into the >kafka Git repository >3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to >the Git repository (this is shown when someone is creating a pull > request) >4. Go through existing GitHub pull requests and close the ones that are >no longer relevant (there are quite a few as people have been opening > them >over the years, but nothing was done about most of them) >5. Other things I may be missing > > I am volunteering to help with the above if people agree that this is the > right direction for Kafka. Thoughts? > > Best. > Ismael > > P.S. I was told in the Apache Infra HipChat that it's not currently > possible (and there are no plans to change that in the near future) to use > the GitHub merge button to merge PRs. The merge script does quite a few > useful things that the merge button does not in any case. > > [1] > https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > [2] > > https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614 > [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now > [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py > [5] > > https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00 >
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Ismael, We will also need to figure out if we need CONTRIBUTING.md like the following to take care of the Apache licensing stuff. https://github.com/apache/spark/blob/master/CONTRIBUTING.md As for merging changes, I think squashing the commits will be ideal during merge. Also, it would be great if we can always put the latest changes on top during the merge. Thanks, Jun On Thu, Apr 30, 2015 at 8:12 AM, Ismael Juma wrote: > Hi all, > > Kafka currently uses a combination of Review Board and JIRA for > contributions and code review. In my opinion, this makes contribution and > code review a bit harder than it has to be. > > I think the approach used by Spark would improve the current situation: > > "Generally, Spark uses JIRA to track logical issues, including bugs and > improvements, and uses Github pull requests to manage the review and merge > of specific code changes. That is, JIRAs are used to describe what should > be fixed or changed, and high-level approaches, and pull requests describe > how to implement that change in the project's source code. For example, > major design decisions are discussed in JIRA."[1] > > It's worth reading the wiki page for all the details, but I will summarise > the suggested workflow for code changes: > >1. Fork the Github repository at http://github.com/apache/kafka (if you >haven't already) >2. git checkout -b kafka-XXX >3. Make one or more commits (smaller commits can be easier to review and >reviewboard makes that hard) >4. git push origin kafka-XXX >5. Create PR against upstream/trunk (this will update JIRA >automatically[2] and it will send an email to the dev mailing list too) >6. A CI build will be triggered[3] >7. Review process happens on GitHub (it's quite handy to be able to >comment on both commit or PR-level, unlike Review Board) >8. Once all feedback has been addressed and the build is green, a >variant of the `merge_spark_pr.py`[4] script is used to squash, merge, >push, close the PR and JIRA issue. The squashed commit generated by the >script includes a bunch of useful information including links to the >original commits[5] (in the future, I think it's worth reconsidering the >squashing of commits, but retaining the information in the commit is >already an improvement) > > Neha merged a couple of commits via GitHub already and it went smoothly > although we are still missing a few of the pieces described above: > >1. CI builds triggered by GitHub PRs (this is supported by Apache Infra, >we need to request it for Kafka and provide whatever configuration is >needed) >2. Adapting Spark's merge_park_pr script and integrating it into the >kafka Git repository >3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to >the Git repository (this is shown when someone is creating a pull > request) >4. Go through existing GitHub pull requests and close the ones that are >no longer relevant (there are quite a few as people have been opening > them >over the years, but nothing was done about most of them) >5. Other things I may be missing > > I am volunteering to help with the above if people agree that this is the > right direction for Kafka. Thoughts? > > Best. > Ismael > > P.S. I was told in the Apache Infra HipChat that it's not currently > possible (and there are no plans to change that in the near future) to use > the GitHub merge button to merge PRs. The merge script does quite a few > useful things that the merge button does not in any case. > > [1] > https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > [2] > > https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614 > [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now > [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py > [5] > > https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00 >
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
+1. Thanks Parth On 5/1/15, 12:38 AM, "Ewen Cheslack-Postava" wrote: >Also +1. There are some drawbacks to using Github for reviews, e.g. lots >of >emails for each review because they don't let you publish your entire >review in one go like RB does, but it drastically lowers the barrier to >contributing for most developers. Also, if you haven't tried it, hub >https://hub.github.com/ makes it really easy to checkout and test PRs. > >One thing I noticed is that when you try to generate a PR it defaults to >the 0.8.2 branch. Can we fix that up to be trunk by default? That's the >most common use case; version branches are really only useful when a >release is being prepared. Do changes to the Github repo require tickets >to >the Apache Infra team or is this something committers have control over? > > >On a related note, which perhaps should be discussed on another thread: > >The CI setup is a related issue that we might want to rethink. Apache also >supports Travis CI, and now pays for dedicated build slaves: >https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci I >think the setup should be pretty easy since building + running tests is >just a gradle command. Having tests run automatically on PRs (and >promptly!) makes it a lot easier to confidently commit a change, >especially >if the build merges with trunk first. I started looking at this when I was >trying to sort out Confluent's build & test infrastructure. See >https://travis-ci.org/ewencp/kafka/builds/60802386 for an example, the >patch is about 10 lines in a .travis.yml file and the failure in that >example seems to be unrelated to the Travis confg. I think the basic >config >I created is all we'd need. > >Unfortunately, I couldn't easily tell what the delay on builds is, i.e. >would it be an improvement over the delays with the current Jenkins setup. >But having them run on PR creation/update means the results will usually >be >ready by the time a reviewer gets to looking at the PR and would be >reported in the PR so the state is easy to evaluate. (I'm also having >trouble telling exactly how the two ASF Jenkins builders differ since they >both seem to poll trunk, so I can't be certain whether Travis would be >able >to completely replace the current setup. That said, it should be telling >that I have never paid attention to Jenkins output at all since it seems >so >far removed from any of my actions as a contributor.) > >As another alternative, Confluent would also be happy to provide >build/test >infrastructure to help out. AMPLab does this for Spark, so it seems to be >acceptable to ASF. We already have PR builders and trunk/master builders >set up for other projects, so it wouldn't be hard to setup the same for >Kafka with the right access to the repos. Based on the current frequency >of >builds (https://builds.apache.org/view/All/job/Kafka-trunk/ and >https://builds.apache.org/view/All/job/KafkaPreCommit/) I think it'll be >easy for even our current infrastructure to keep up. > > >On Thu, Apr 30, 2015 at 9:41 PM, Jaikiran Pai >wrote: > >> I think this will help a lot in contributions. Some of my local changes >> that I want to contribute back have been pending because I sometimes >>switch >> machines and I then have to go through setting up the Ruby/python and >>other >> stuff for the current review process. Using just github is going to >>help in >> quickly submitting the changes. >> >> -Jaikiran >> >> On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote: >> >>> Hi all, >>> >>> Kafka currently uses a combination of Review Board and JIRA for >>> contributions and code review. In my opinion, this makes contribution >>>and >>> code review a bit harder than it has to be. >>> >>> I think the approach used by Spark would improve the current situation: >>> >>> "Generally, Spark uses JIRA to track logical issues, including bugs and >>> improvements, and uses Github pull requests to manage the review and >>>merge >>> of specific code changes. That is, JIRAs are used to describe what >>>should >>> be fixed or changed, and high-level approaches, and pull requests >>>describe >>> how to implement that change in the project's source code. For example, >>> major design decisions are discussed in JIRA."[1] >>> >>> It's worth reading the wiki page for all the details, but I will >>>summarise >>> the suggested workflow for code changes: >>> >>> 1. Fork the Github repository at http://github.com/apache/kafka (if >>> you >>> haven't already) >>> 2. git checkout -b kafka-XXX >>> 3. Make one or more commits (smaller commits can be easier to >>>review >>> and >>> reviewboard makes that hard) >>> 4. git push origin kafka-XXX >>> 5. Create PR against upstream/trunk (this will update JIRA >>> automatically[2] and it will send an email to the dev mailing list >>> too) >>> 6. A CI build will be triggered[3] >>> 7. Review process happens on GitHub (it's quite handy to be able to >>> comment on both commit
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Also +1. There are some drawbacks to using Github for reviews, e.g. lots of emails for each review because they don't let you publish your entire review in one go like RB does, but it drastically lowers the barrier to contributing for most developers. Also, if you haven't tried it, hub https://hub.github.com/ makes it really easy to checkout and test PRs. One thing I noticed is that when you try to generate a PR it defaults to the 0.8.2 branch. Can we fix that up to be trunk by default? That's the most common use case; version branches are really only useful when a release is being prepared. Do changes to the Github repo require tickets to the Apache Infra team or is this something committers have control over? On a related note, which perhaps should be discussed on another thread: The CI setup is a related issue that we might want to rethink. Apache also supports Travis CI, and now pays for dedicated build slaves: https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci I think the setup should be pretty easy since building + running tests is just a gradle command. Having tests run automatically on PRs (and promptly!) makes it a lot easier to confidently commit a change, especially if the build merges with trunk first. I started looking at this when I was trying to sort out Confluent's build & test infrastructure. See https://travis-ci.org/ewencp/kafka/builds/60802386 for an example, the patch is about 10 lines in a .travis.yml file and the failure in that example seems to be unrelated to the Travis confg. I think the basic config I created is all we'd need. Unfortunately, I couldn't easily tell what the delay on builds is, i.e. would it be an improvement over the delays with the current Jenkins setup. But having them run on PR creation/update means the results will usually be ready by the time a reviewer gets to looking at the PR and would be reported in the PR so the state is easy to evaluate. (I'm also having trouble telling exactly how the two ASF Jenkins builders differ since they both seem to poll trunk, so I can't be certain whether Travis would be able to completely replace the current setup. That said, it should be telling that I have never paid attention to Jenkins output at all since it seems so far removed from any of my actions as a contributor.) As another alternative, Confluent would also be happy to provide build/test infrastructure to help out. AMPLab does this for Spark, so it seems to be acceptable to ASF. We already have PR builders and trunk/master builders set up for other projects, so it wouldn't be hard to setup the same for Kafka with the right access to the repos. Based on the current frequency of builds (https://builds.apache.org/view/All/job/Kafka-trunk/ and https://builds.apache.org/view/All/job/KafkaPreCommit/) I think it'll be easy for even our current infrastructure to keep up. On Thu, Apr 30, 2015 at 9:41 PM, Jaikiran Pai wrote: > I think this will help a lot in contributions. Some of my local changes > that I want to contribute back have been pending because I sometimes switch > machines and I then have to go through setting up the Ruby/python and other > stuff for the current review process. Using just github is going to help in > quickly submitting the changes. > > -Jaikiran > > On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote: > >> Hi all, >> >> Kafka currently uses a combination of Review Board and JIRA for >> contributions and code review. In my opinion, this makes contribution and >> code review a bit harder than it has to be. >> >> I think the approach used by Spark would improve the current situation: >> >> "Generally, Spark uses JIRA to track logical issues, including bugs and >> improvements, and uses Github pull requests to manage the review and merge >> of specific code changes. That is, JIRAs are used to describe what should >> be fixed or changed, and high-level approaches, and pull requests describe >> how to implement that change in the project's source code. For example, >> major design decisions are discussed in JIRA."[1] >> >> It's worth reading the wiki page for all the details, but I will summarise >> the suggested workflow for code changes: >> >> 1. Fork the Github repository at http://github.com/apache/kafka (if >> you >> haven't already) >> 2. git checkout -b kafka-XXX >> 3. Make one or more commits (smaller commits can be easier to review >> and >> reviewboard makes that hard) >> 4. git push origin kafka-XXX >> 5. Create PR against upstream/trunk (this will update JIRA >> automatically[2] and it will send an email to the dev mailing list >> too) >> 6. A CI build will be triggered[3] >> 7. Review process happens on GitHub (it's quite handy to be able to >> comment on both commit or PR-level, unlike Review Board) >> 8. Once all feedback has been addressed and the build is green, a >> variant of the `merge_spark_pr.py`[4] script is used to squash, merge, >> push, clo
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
I think this will help a lot in contributions. Some of my local changes that I want to contribute back have been pending because I sometimes switch machines and I then have to go through setting up the Ruby/python and other stuff for the current review process. Using just github is going to help in quickly submitting the changes. -Jaikiran On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote: Hi all, Kafka currently uses a combination of Review Board and JIRA for contributions and code review. In my opinion, this makes contribution and code review a bit harder than it has to be. I think the approach used by Spark would improve the current situation: "Generally, Spark uses JIRA to track logical issues, including bugs and improvements, and uses Github pull requests to manage the review and merge of specific code changes. That is, JIRAs are used to describe what should be fixed or changed, and high-level approaches, and pull requests describe how to implement that change in the project's source code. For example, major design decisions are discussed in JIRA."[1] It's worth reading the wiki page for all the details, but I will summarise the suggested workflow for code changes: 1. Fork the Github repository at http://github.com/apache/kafka (if you haven't already) 2. git checkout -b kafka-XXX 3. Make one or more commits (smaller commits can be easier to review and reviewboard makes that hard) 4. git push origin kafka-XXX 5. Create PR against upstream/trunk (this will update JIRA automatically[2] and it will send an email to the dev mailing list too) 6. A CI build will be triggered[3] 7. Review process happens on GitHub (it's quite handy to be able to comment on both commit or PR-level, unlike Review Board) 8. Once all feedback has been addressed and the build is green, a variant of the `merge_spark_pr.py`[4] script is used to squash, merge, push, close the PR and JIRA issue. The squashed commit generated by the script includes a bunch of useful information including links to the original commits[5] (in the future, I think it's worth reconsidering the squashing of commits, but retaining the information in the commit is already an improvement) Neha merged a couple of commits via GitHub already and it went smoothly although we are still missing a few of the pieces described above: 1. CI builds triggered by GitHub PRs (this is supported by Apache Infra, we need to request it for Kafka and provide whatever configuration is needed) 2. Adapting Spark's merge_park_pr script and integrating it into the kafka Git repository 3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to the Git repository (this is shown when someone is creating a pull request) 4. Go through existing GitHub pull requests and close the ones that are no longer relevant (there are quite a few as people have been opening them over the years, but nothing was done about most of them) 5. Other things I may be missing I am volunteering to help with the above if people agree that this is the right direction for Kafka. Thoughts? Best. Ismael P.S. I was told in the Apache Infra HipChat that it's not currently possible (and there are no plans to change that in the near future) to use the GitHub merge button to merge PRs. The merge script does quite a few useful things that the merge button does not in any case. [1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark [2] https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614 [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py [5] https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review
Thanks a bunch for taking this up, Ismael! +1, I think it will be much more convenient to sunset RB and move to github. Especially looking forward to the CIs on PRs and also the merge script. Alas, my wonderful patch-review script will be retired :-) On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma wrote: > Hi all, > > Kafka currently uses a combination of Review Board and JIRA for > contributions and code review. In my opinion, this makes contribution and > code review a bit harder than it has to be. > > I think the approach used by Spark would improve the current situation: > > "Generally, Spark uses JIRA to track logical issues, including bugs and > improvements, and uses Github pull requests to manage the review and merge > of specific code changes. That is, JIRAs are used to describe what should > be fixed or changed, and high-level approaches, and pull requests describe > how to implement that change in the project's source code. For example, > major design decisions are discussed in JIRA."[1] > > It's worth reading the wiki page for all the details, but I will summarise > the suggested workflow for code changes: > >1. Fork the Github repository at http://github.com/apache/kafka (if you >haven't already) >2. git checkout -b kafka-XXX >3. Make one or more commits (smaller commits can be easier to review and >reviewboard makes that hard) >4. git push origin kafka-XXX >5. Create PR against upstream/trunk (this will update JIRA >automatically[2] and it will send an email to the dev mailing list too) >6. A CI build will be triggered[3] >7. Review process happens on GitHub (it's quite handy to be able to >comment on both commit or PR-level, unlike Review Board) >8. Once all feedback has been addressed and the build is green, a >variant of the `merge_spark_pr.py`[4] script is used to squash, merge, >push, close the PR and JIRA issue. The squashed commit generated by the >script includes a bunch of useful information including links to the >original commits[5] (in the future, I think it's worth reconsidering the >squashing of commits, but retaining the information in the commit is >already an improvement) > > Neha merged a couple of commits via GitHub already and it went smoothly > although we are still missing a few of the pieces described above: > >1. CI builds triggered by GitHub PRs (this is supported by Apache Infra, >we need to request it for Kafka and provide whatever configuration is >needed) >2. Adapting Spark's merge_park_pr script and integrating it into the >kafka Git repository >3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to >the Git repository (this is shown when someone is creating a pull > request) >4. Go through existing GitHub pull requests and close the ones that are >no longer relevant (there are quite a few as people have been opening > them >over the years, but nothing was done about most of them) >5. Other things I may be missing > > I am volunteering to help with the above if people agree that this is the > right direction for Kafka. Thoughts? > > Best. > Ismael > > P.S. I was told in the Apache Infra HipChat that it's not currently > possible (and there are no plans to change that in the near future) to use > the GitHub merge button to merge PRs. The merge script does quite a few > useful things that the merge button does not in any case. > > [1] > https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > [2] > > https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614 > [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now > [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py > [5] > > https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00 > -- Thanks, Neha
[DISCUSS] Using GitHub Pull Requests for contributions and code review
Hi all, Kafka currently uses a combination of Review Board and JIRA for contributions and code review. In my opinion, this makes contribution and code review a bit harder than it has to be. I think the approach used by Spark would improve the current situation: "Generally, Spark uses JIRA to track logical issues, including bugs and improvements, and uses Github pull requests to manage the review and merge of specific code changes. That is, JIRAs are used to describe what should be fixed or changed, and high-level approaches, and pull requests describe how to implement that change in the project's source code. For example, major design decisions are discussed in JIRA."[1] It's worth reading the wiki page for all the details, but I will summarise the suggested workflow for code changes: 1. Fork the Github repository at http://github.com/apache/kafka (if you haven't already) 2. git checkout -b kafka-XXX 3. Make one or more commits (smaller commits can be easier to review and reviewboard makes that hard) 4. git push origin kafka-XXX 5. Create PR against upstream/trunk (this will update JIRA automatically[2] and it will send an email to the dev mailing list too) 6. A CI build will be triggered[3] 7. Review process happens on GitHub (it's quite handy to be able to comment on both commit or PR-level, unlike Review Board) 8. Once all feedback has been addressed and the build is green, a variant of the `merge_spark_pr.py`[4] script is used to squash, merge, push, close the PR and JIRA issue. The squashed commit generated by the script includes a bunch of useful information including links to the original commits[5] (in the future, I think it's worth reconsidering the squashing of commits, but retaining the information in the commit is already an improvement) Neha merged a couple of commits via GitHub already and it went smoothly although we are still missing a few of the pieces described above: 1. CI builds triggered by GitHub PRs (this is supported by Apache Infra, we need to request it for Kafka and provide whatever configuration is needed) 2. Adapting Spark's merge_park_pr script and integrating it into the kafka Git repository 3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to the Git repository (this is shown when someone is creating a pull request) 4. Go through existing GitHub pull requests and close the ones that are no longer relevant (there are quite a few as people have been opening them over the years, but nothing was done about most of them) 5. Other things I may be missing I am volunteering to help with the above if people agree that this is the right direction for Kafka. Thoughts? Best. Ismael P.S. I was told in the Apache Infra HipChat that it's not currently possible (and there are no plans to change that in the near future) to use the GitHub merge button to merge PRs. The merge script does quite a few useful things that the merge button does not in any case. [1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark [2] https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614 [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py [5] https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00