Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-15 Thread Guozhang Wang
Another question regarding the kafka-merge-pr.py script: once a
committer gives an LGTM on the PR, can the contributor use the script to
push to the git repo themselves, or must it be executed by the committer? I
thought it was the former, but it seems not. If not, we need to make it
clear on the wiki.

Guozhang

On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma ism...@juma.me.uk wrote:

 On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao j...@confluent.io wrote:

  I made a couple of changes to the new Jenkins job. Could you try again?
 

 It's still not working, unfortunately. It may or may not be related to:

 https://blogs.apache.org/infra/entry/mirroring_to_github_issues

 For b, if we can't easily change the behavior of pull request bot, we can
  also just flip back and forth btw In Progress and Open.
 

 It turns out that the ASF bot doesn't change the status at all. Spark is
 using a different mechanism to cause the status change (via a `sparkuser`).
 The JIRA ticket status will have to be changed manually for now. I have
 updated the instructions accordingly. I was told in the INFRA channel to
 file a JIRA ticket and they would see if auto transition listeners can be
 used to automate this (no guarantees though).

 Ismael




-- 
-- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-15 Thread Gwen Shapira
The script just runs a git push; it runs as your user (from your git
configuration), so you need push privileges.
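
In case it helps to see it concretely, here is a minimal sketch of what
that final step amounts to (illustrative only, not the actual
kafka-merge-pr.py code; remote and branch names are placeholders): the
script shells out to git, so the push runs under whatever credentials the
invoking user has configured.

--
# Minimal sketch (not the actual kafka-merge-pr.py code): the merge
# script ultimately just shells out to git, so the final push is
# authenticated as whoever runs the script.
import subprocess

def run_cmd(*cmd):
    # Run a git command and fail loudly if it returns a non-zero status.
    subprocess.check_call(list(cmd))

def push_squashed_merge(target_branch="trunk", remote="apache"):
    # Assumes the squashed merge commit was already created locally by
    # the earlier steps of the script; only a committer's credentials
    # will be accepted for this push.
    run_cmd("git", "push", remote, "HEAD:%s" % target_branch)
--

So a contributor can run everything up to this point, but the push itself
will be rejected without write access to the ASF repository.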

On Wed, Jul 15, 2015 at 12:59 PM, Guozhang Wang wangg...@gmail.com wrote:
 Another question regarding the kafka-merge-pr.py script: once a
 committer gives an LGTM on the PR, can the contributor use the script to
 push to the git repo themselves, or must it be executed by the committer? I
 thought it was the former, but it seems not. If not, we need to make it
 clear on the wiki.

 Guozhang

 On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma ism...@juma.me.uk wrote:

 On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao j...@confluent.io wrote:

  I made a couple of changes to the new Jenkins job. Could you try again?
 

 It's still not working, unfortunately. It may or may not be related to:

 https://blogs.apache.org/infra/entry/mirroring_to_github_issues

 For b, if we can't easily change the behavior of pull request bot, we can
  also just flip back and forth btw In Progress and Open.
 

 It turns out that the ASF bot doesn't change the status at all. Spark is
 using a different mechanism to cause the status change (via a `sparkuser`).
 The JIRA ticket status will have to be changed manually for now. I have
 updated the instructions accordingly. I was told in the INFRA channel to
 file a JIRA ticket and they would see if auto transition listeners can be
 used to automate this (no guarantees though).

 Ismael




 --
 -- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-15 Thread Guozhang Wang
Thanks Gwen, will update the wiki then.

On Wed, Jul 15, 2015 at 1:03 PM, Gwen Shapira gshap...@cloudera.com wrote:

 The script just runs a git push; it runs as your user (from your git
 configuration), so you need push privileges.

 On Wed, Jul 15, 2015 at 12:59 PM, Guozhang Wang wangg...@gmail.com
 wrote:
  Another question regarding the kafka-merge-pr.py script: once a
  committer gives an LGTM on the PR, can the contributor use the script to
  push to the git repo themselves, or must it be executed by the committer? I
  thought it was the former, but it seems not. If not, we need to make it
  clear on the wiki.
 
  Guozhang
 
  On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma ism...@juma.me.uk wrote:
 
  On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao j...@confluent.io wrote:
 
   I made a couple of changes to the new Jenkins job. Could you try
 again?
  
 
  It's still not working, unfortunately. It may or may not be related to:
 
  https://blogs.apache.org/infra/entry/mirroring_to_github_issues
 
  For b, if we can't easily change the behavior of pull request bot, we
 can
   also just flip back and forth btw In Progress and Open.
  
 
  It turns out that the ASF bot doesn't change the status at all. Spark is
  using a different mechanism to cause the status change (via a
 `sparkuser`).
  The JIRA ticket status will have to be changed manually for now. I have
  updated the instructions accordingly. I was told in the INFRA channel to
  file a JIRA ticket and they would see if auto transition listeners can
 be
  used to automate this (no guarantees though).
 
  Ismael
 
 
 
 
  --
  -- Guozhang




-- 
-- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-14 Thread Ismael Juma
Hi Jun,

On Tue, Jul 14, 2015 at 2:09 AM, Jun Rao j...@confluent.io wrote:

 Ismael,

 I followed the instructions in KAFKA-2320 and created a new Jenkins job (
 https://builds.apache.org/job/kafka-trunk-git-pr/).  Could you check if it
 works?


Thanks! It seems to be building when trunk changes as opposed to when PRs
are submitted (eg https://github.com/apache/kafka/pull/74). I'll see if I
can get some help from INFRA.

As for the wiki, I have a couple of minor comments.

 a. Could we add the following to the wiki?
 To avoid conflicts, assign a jira to yourself if you plan to work on it. If
 you are creating a jira and don't plan to work on it, leave the assignee as
 Unassigned.


Good point, done.

b. Previously, we marked a jira as Patch Available when there was a patch.
 Could we reuse that instead of In Progress, to be consistent? Also, if a
 patch needs more work after review, the reviewer will mark the jira back to
 In Progress.


I added this as an additional step for the user because I don't think we
can customise the behaviour of the pull request bot (which sets the status
to In Progress). It's a bit redundant (and thus a bit annoying) when a PR
is created. I'll double-check with INFRA.

I also mentioned in the review section that the reviewer should change the
JIRA status back to In Progress.

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-14 Thread Ismael Juma
On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao j...@confluent.io wrote:

 I made a couple of changes to the new Jenkins job. Could you try again?


It's still not working, unfortunately. It may or may not be related to:

https://blogs.apache.org/infra/entry/mirroring_to_github_issues

For b, if we can't easily change the behavior of pull request bot, we can
 also just flip back and forth btw In Progress and Open.


It turns out that the ASF bot doesn't change the status at all. Spark is
using a different mechanism to cause the status change (via a `sparkuser`).
The JIRA ticket status will have to be changed manually for now. I have
updated the instructions accordingly. I was told in the INFRA channel to
file a JIRA ticket and they would see if auto transition listeners can be
used to automate this (no guarantees though).

Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-13 Thread Jun Rao
Ismael,

I followed the instructions in KAFKA-2320 and created a new Jenkins job (
https://builds.apache.org/job/kafka-trunk-git-pr/).  Could you check if it
works?

As for the wiki, I have a couple of minor comments.

a. Could we add the following to the wiki?
To avoid conflicts, assign a jira to yourself if you plan to work on it. If
you are creating a jira and don't plan to work on it, leave the assignee as
Unassigned.

b. Previously, we marked a jira as Patch Available when there was a patch.
Could we reuse that instead of In Progress, to be consistent? Also, if a
patch needs more work after review, the reviewer will mark the jira back to
In Progress.

Thanks,

Jun




On Wed, Jul 8, 2015 at 1:39 AM, Ismael Juma ism...@juma.me.uk wrote:

 An update on this.

 On Thu, Apr 30, 2015 at 2:12 PM, Ismael Juma ism...@juma.me.uk wrote:

 
 1. CI builds triggered by GitHub PRs (this is supported by Apache
 Infra, we need to request it for Kafka and provide whatever
 configuration
 is needed)
 
  Filed https://issues.apache.org/jira/browse/KAFKA-2320, someone with a
 Jenkins account needs to follow the instructions there.

 
 1. Adapting Spark's merge_spark_pr script and integrating it into the
 kafka Git repository
 
  https://issues.apache.org/jira/browse/KAFKA-2187 includes a patch that
 has received an initial review by Neha.

 
 1. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md
 to the Git repository (this is shown when someone is creating a pull
 request)
 
  Initial versions (feedback and/or improvements are welcome):

-

 https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
-

 https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review#Patchsubmissionandreview-MergingGitHubPullRequests
- https://issues.apache.org/jira/browse/KAFKA-2321 (patch available)


 1. Go through existing GitHub pull requests and close the ones that
 are no longer relevant (there are quite a few as people have been
 opening
 them over the years, but nothing was done about most of them)
 
  Not done yet. I think this should wait until we have merged a few PRs as
 I
 would like to invite people to open new PRs if it's still relevant while
 pointing them to the documentation on how to go about it.

 
 1. Other things I may be missing
 
  We also need to update the Contributing page on the website. I think
 this should also wait until we are happy that the new approach works well
 for us.

 Any help moving this forward is appreciated. Aside from reviews, feedback
 and merging the changes; testing the new process is particularly useful (
 https://issues.apache.org/jira/browse/KAFKA-2276 is an example).

 Best,
 Ismael



Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-10 Thread Guozhang Wang
Hi Ismael,


I have a couple of comments on the wiki pages / merge scripts:


1. The wiki page mentions: "If the change is new, then it usually
needs a new JIRA. However, trivial changes, where what should change is
virtually the same as how it should change, do not require a JIRA.
Example: Fix typos in Foo scaladoc."


So it sounds like we are going to allow pull requests without a JIRA ticket,
but later we also say:


"The PR title should be of the form [KAFKA-]..", which conflicts somewhat
with the previous statement. Could you clarify it? Today our commits mostly
have JIRAs, except for some trivial ones that are only done by committers;
I can either extend this to non-committers or not, we just need to make it
clear.

2. The git commit message is a little verbose to me, for example:


--

commit ee88dbb67f19b787e12ccef37982c9459b78c7b6

Author: Geoff Anderson ge...@confluent.io

Date:   Thu Jul 9 14:58:01 2015 -0700

   KAFKA-2327; broker doesn't start if config defines advertised.host but
not advertised.port



   Added unit tests as well. These fail without the fix, but pass with the
fix.



   Author: Geoff Anderson ge...@confluent.io



   Closes #73 from granders/KAFKA-2327 and squashes the following commits:



   52a2085 [Geoff Anderson] Cleaned up unecessary toString calls

   23b3340 [Geoff Anderson] Fixes KAFKA-2327

--


The Author field is redundant, and we could probably also omit the list of
GitHub commits. What do you think?


Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-10 Thread Ismael Juma
Hi Guozhang,

Comments inline.

On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang wangg...@gmail.com wrote:

 I have a couple of comments on the wiki pages / merge scripts:


Thanks, it's important to discuss these things as the current version is
based on what the Spark project does and may not match what we want to do
exactly.

1. In the wiki page it mentions If the change is new, then it usually
 needs a new JIRA. However, trivial changes, where what should change is
 virtually the same as how it should change do not require a JIRA.
 Example: Fix typos in Foo scaladoc.

 So it sounds we are going to allow pull requests without a JIRA ticket, but
 later we are also mentioning:

 The PR title should be of the form [KAFKA-].. which is a bit
 conflicting with the previous statement. Could you clarify it? Today our
 commits are mostly with JIRAs except some trivial ones that are only done
 by committers, I can either extend this to non-committers or not, just that
 we need to make it clear.


I agree that it's not very clear and it needs to be fixed. What do we want
to do though? It's a trade-off between consistency (always have a JIRA
ticket) and avoiding redundancy (skip the JIRA where it doesn't add
anything). The former is the more conservative option and I am happy to
update the documentation if that's the preferred option.

2. The git commit message is a little verbose to me, for example:


 --

 commit ee88dbb67f19b787e12ccef37982c9459b78c7b6

 Author: Geoff Anderson ge...@confluent.io

 Date:   Thu Jul 9 14:58:01 2015 -0700

KAFKA-2327; broker doesn't start if config defines advertised.host but
 not advertised.port



Added unit tests as well. These fail without the fix, but pass with the
 fix.



Author: Geoff Anderson ge...@confluent.io



Closes #73 from granders/KAFKA-2327 and squashes the following commits:



52a2085 [Geoff Anderson] Cleaned up unecessary toString calls

23b3340 [Geoff Anderson] Fixes KAFKA-2327

 --


 The Author field is redundant, and we could probably also omit the github
 commits. What do you think?


Is it harmful to have detailed information? None of it is actually
redundant as far as I can see (but I could be missing something). There is
the squashed commit author, the pull request message, the pull request
author and the information about the individual commits. Even though Geoff
worked on this PR by himself, multiple people can collaborate on a feature,
and in that case it's useful to credit everyone correctly.
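
To make the structure explicit, here is a rough sketch (helper and field
names are my own, for illustration; this is not the actual
merge_spark_pr.py / kafka-merge-pr.py code) of how a Spark-style merge
script can assemble that message, which is why each piece shows up:

--
# Illustrative sketch of assembling the squashed commit message from PR
# metadata; names are hypothetical, not taken from the real script.
def build_commit_message(title, pr_body, pr_author, pr_number,
                         pr_branch, commits):
    # commits: list of (sha, author, subject) tuples from the PR branch.
    lines = [title, "", pr_body, ""]
    # Record the PR author explicitly in the message body.
    lines.append("Author: %s" % pr_author)
    lines.append("")
    lines.append("Closes #%d from %s and squashes the following commits:"
                 % (pr_number, pr_branch))
    lines.append("")
    # Keep a one-line record of every original commit and its author.
    for sha, author, subject in commits:
        lines.append("%s [%s] %s" % (sha[:7], author, subject))
    return "\n".join(lines)
--

The Author line and the per-commit list are where credit for everyone who
worked on the branch can be preserved after the squash.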

The fact that we are keeping all the information encourages a style of
development where a few small commits (each commit should make sense
on its own and pass the tests) are used in the pull request, which helps a
lot during code review and when trying to understand changes after the
fact. This is in contrast with the style where there is a single commit per
feature even when the feature is quite large (we have a few of those
ongoing at the moment). I don't know if you noticed, but GitHub actually
links to the original commits, which is quite handy:

https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6

This approach is a bit of a workaround, but it's easier as the script
handles everything (credit to the Spark project once again, it's their
code). The ideal scenario would be to keep the individual commits by
merging the branch into trunk after a rebase (with no fast-forward, so that
a merge commit is created). This has a number of nice characteristics
(history is linear, it's clear which commits were part of the merged branch,
it's easy to bisect, easy to revert, etc.), but it requires a bit more work,
more Git knowledge and ideally a PR builder that builds and tests every
commit in the branch (Typesafe wrote a tool that does this for the Scala
project, for what it's worth). In other words, a step too far for us right
now. :)
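
To make that alternative concrete, here is a sketch of the steps (remote
and branch names are placeholders, and this is not something the current
script does):

--
# Sketch of the rebase + no-fast-forward merge described above; remote
# and branch names are placeholders, not an existing workflow.
import subprocess

def run(*cmd):
    subprocess.check_call(list(cmd))

def merge_keeping_commits(pr_branch, target="trunk", remote="apache"):
    run("git", "fetch", remote, target)
    # Replay the PR's commits on top of the latest trunk so that history
    # stays linear.
    run("git", "checkout", pr_branch)
    run("git", "rebase", "%s/%s" % (remote, target))
    # Merge with --no-ff so a merge commit records which commits were
    # part of the branch (easy to bisect, easy to revert as a unit).
    run("git", "checkout", "-B", target, "%s/%s" % (remote, target))
    run("git", "merge", "--no-ff", pr_branch)
    run("git", "push", remote, "%s:%s" % (target, target))
--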

What do you think?

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-10 Thread Guozhang Wang
Comments inline.

On Fri, Jul 10, 2015 at 2:10 PM, Ismael Juma ism...@juma.me.uk wrote:

 Hi Guozhang,

 Comments inline.

 On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang wangg...@gmail.com wrote:
 
  I have a couple of comments on the wiki pages / merge scripts:
 

 Thanks, it's important to discuss these things as the current version is
 based on what the Spark project does and may not match what we want to do
 exactly.

 1. In the wiki page it mentions If the change is new, then it usually
  needs a new JIRA. However, trivial changes, where what should change is
  virtually the same as how it should change do not require a JIRA.
  Example: Fix typos in Foo scaladoc.
 
  So it sounds we are going to allow pull requests without a JIRA ticket,
 but
  later we are also mentioning:
 
  The PR title should be of the form [KAFKA-].. which is a bit
  conflicting with the previous statement. Could you clarify it? Today our
  commits are mostly with JIRAs except some trivial ones that are only done
  by committers, I can either extend this to non-committers or not, just
 that
  we need to make it clear.
 

 I agree that it's not very clear and it needs to be fixed. What do we want
 to do though? It's a trade-off between consistency (always have a JIRA
 ticket) and avoiding redundancy (skip the JIRA where it doesn't add
 anything). The former is the more conservative option and I am happy to
 update the documentation if that's the preferred option.


Personally I think it is better not to enforce a JIRA ticket for minor /
hotfix commits; for example, we can format the title with [MINOR] or
[HOTFIX] etc., as in Spark:

https://github.com/apache/spark/commits/master

But I think we need some broader discussion on this; maybe we could
start another email thread for it?
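
Just to illustrate what making it explicit could look like (the patterns
below are a sketch of my suggestion, not an agreed convention), the wiki
or the merge script could state the accepted title formats along these
lines:

--
# Sketch only: accepted PR title prefixes would be whatever the project
# agrees on; the patterns here are illustrative.
import re

TITLE_RE = re.compile(r"^\[?(KAFKA-\d+|MINOR|HOTFIX)\]?[\s:;-]")

def title_is_acceptable(pr_title):
    # Accept titles tied to a JIRA, or explicitly marked as trivial.
    return bool(TITLE_RE.match(pr_title))
--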


 2. The git commit message is a little verbose to me, for example:


  --
 
  commit ee88dbb67f19b787e12ccef37982c9459b78c7b6
 
  Author: Geoff Anderson ge...@confluent.io
 
  Date:   Thu Jul 9 14:58:01 2015 -0700
 
 KAFKA-2327; broker doesn't start if config defines advertised.host but
  not advertised.port
 
 
 
 Added unit tests as well. These fail without the fix, but pass with
 the
  fix.
 
 
 
 Author: Geoff Anderson ge...@confluent.io
 
 
 
 Closes #73 from granders/KAFKA-2327 and squashes the following
 commits:
 
 
 
 52a2085 [Geoff Anderson] Cleaned up unecessary toString calls
 
 23b3340 [Geoff Anderson] Fixes KAFKA-2327
 
  --
 
 
  The Author field is redundant, and we could probably also omit the
 github
  commits. What do you think?
 

 Is it harmful to have detailed information? None of it is actually
 redundant as far as I can see (but I could be missing something). There is
 the squashed commit author, the pull request message, the pull request
 author and the information about the individual commits. Even though Geoff
 worked on this PR by himself, multiple people can collaborate on a feature
 and then it's useful to credit correctly.

 The fact that we are keeping all the information encourages a style of
 development where a few small commits (each commit should make sense
 on its own and pass the tests) are used in the pull request, which helps a
 lot during code review and when trying to understand changes after the
 fact. This is in contrast with the style where there is a single commit per
 feature even when the feature is quite large (we have a few of those
 ongoing at the moment). I don't know if you noticed, but GitHub actually
 links to the original commits, which is quite handy:


 https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6

 This approach is a bit of a workaround, but it's easier as the script
 handles everything (credit to the Spark project once again, it's their
 code). The ideal scenario would be to keep the individual commits by
 merging the branch into trunk after a rebase (no fast-forward, so that a
 merge commit is maintained). This has a number of nice characteristics
 (history is linear, it's clear what commits were part of the merged branch,
 easy to bisect, easy to revert, etc.), but it requires a bit more work,
 more Git knowledge and ideally a PR builder that builds and tests every
 commit in the branch (Typesafe wrote a tool that does this for the Scala
 project, for what it's worth). In other words, a step too far for us right
 now. :)

 What do you think?


I think I'm sold on your comments, makes sense :)


 Best,
 Ismael




-- 
-- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-08 Thread Ismael Juma
An update on this.

On Thu, Apr 30, 2015 at 2:12 PM, Ismael Juma ism...@juma.me.uk wrote:


1. CI builds triggered by GitHub PRs (this is supported by Apache
Infra, we need to request it for Kafka and provide whatever configuration
is needed)

 Filed https://issues.apache.org/jira/browse/KAFKA-2320, someone with a
Jenkins account needs to follow the instructions there.


1. Adapting Spark's merge_spark_pr script and integrating it into the
kafka Git repository

 https://issues.apache.org/jira/browse/KAFKA-2187 includes a patch that
has received an initial review by Neha.


1. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md
to the Git repository (this is shown when someone is creating a pull
request)

 Initial versions (feedback and/or improvements are welcome):

   -
   https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
   -
   
https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review#Patchsubmissionandreview-MergingGitHubPullRequests
   - https://issues.apache.org/jira/browse/KAFKA-2321 (patch available)


1. Go through existing GitHub pull requests and close the ones that
are no longer relevant (there are quite a few as people have been opening
them over the years, but nothing was done about most of them)

 Not done yet. I think this should wait until we have merged a few PRs, as I
would like to invite people to open a new PR if their change is still
relevant, while pointing them to the documentation on how to go about it.


1. Other things I may be missing

 We also need to update the Contributing page on the website. I think
this should also wait until we are happy that the new approach works well
for us.

Any help moving this forward is appreciated. Aside from reviews, feedback
and merging the changes, testing the new process is particularly useful (
https://issues.apache.org/jira/browse/KAFKA-2276 is an example).

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-21 Thread Ismael Juma
On Fri, May 1, 2015 at 8:38 AM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 One thing I noticed is that when you try to generate a PR it defaults to
 the 0.8.2 branch. Can we fix that up to be trunk by default?


I filed an INFRA ticket for this:

https://issues.apache.org/jira/browse/INFRA-9680

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-05 Thread Ismael Juma
Hi Jun,

On Sat, May 2, 2015 at 2:50 PM, Jun Rao j...@confluent.io wrote:

 We will also need to figure out if we need CONTRIBUTING.md like the
 following to take care of the Apache licensing stuff.

 https://github.com/apache/spark/blob/master/CONTRIBUTING.md


Yes indeed. That is in step 3 of the missing pieces in the original email.

As for merging changes, I think squashing the commits will be ideal during
 merge.


The script does indeed squash commits during the merge. It also retains a
bunch of useful information in the commit message. I think one loses some
of the benefits of git when using a single squashed commit per PR, but that
is a separate issue and I am not suggesting a change in this regard at this
point.


 Also, it would be great if we can always put the latest changes on
 top during the merge.


Just to make sure I understand correctly, do you mean updating the working
copy with the latest from trunk before merging? Or did I misunderstand?

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-05 Thread Ismael Juma
Hi Ewen,

Comments inline.

On Fri, May 1, 2015 at 8:38 AM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 Also +1. There are some drawbacks to using Github for reviews, e.g. lots of
 emails for each review because they don't let you publish your entire
 review in one go like RB does, but it drastically lowers the barrier to
 contributing for most developers. Also, if you haven't tried it, hub
 https://hub.github.com/ makes it really easy to checkout and test PRs.


Good points.

One thing I noticed is that when you try to generate a PR it defaults to
 the 0.8.2 branch. Can we fix that up to be trunk by default? That's the
 most common use case; version branches are really only useful when a
 release is being prepared. Do changes to the Github repo require tickets to
 the Apache Infra team or is this something committers have control over?


It can be changed to trunk, but it does require a ticket to the Infra
team[1]. Sadly, PRs can only be closed via commits, by the PR creator, or
via a ticket to Infra. This is quite clunky for abandoned PRs (a label can
be used to exclude such PRs from view, if needed).

On a related note, which perhaps should be discussed on another thread:

 The CI setup is a related issue that we might want to rethink.


I did think about Travis as well, but I thought that should be considered
as a subsequent and separate discussion indeed.

Best,
Ismael

[1] An example of such a request:
https://issues.apache.org/jira/browse/INFRA-9208


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-02 Thread Jun Rao
Ismael,

We will also need to figure out if we need a CONTRIBUTING.md like the
following to take care of the Apache licensing stuff.

https://github.com/apache/spark/blob/master/CONTRIBUTING.md

As for merging changes, I think squashing the commits will be ideal during
merge. Also, it would be great if we can always put the latest changes on
top during the merge.

Thanks,

Jun

On Thu, Apr 30, 2015 at 8:12 AM, Ismael Juma ism...@juma.me.uk wrote:

 Hi all,

 Kafka currently uses a combination of Review Board and JIRA for
 contributions and code review. In my opinion, this makes contribution and
 code review a bit harder than it has to be.

 I think the approach used by Spark would improve the current situation:

 Generally, Spark uses JIRA to track logical issues, including bugs and
 improvements, and uses Github pull requests to manage the review and merge
 of specific code changes. That is, JIRAs are used to describe what should
 be fixed or changed, and high-level approaches, and pull requests describe
 how to implement that change in the project's source code. For example,
 major design decisions are discussed in JIRA.[1]

 It's worth reading the wiki page for all the details, but I will summarise
 the suggested workflow for code changes:

1. Fork the Github repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review and
reviewboard makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. Review process happens on GitHub (it's quite handy to be able to
comment on both commit or PR-level, unlike Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
push, close the PR and JIRA issue. The squashed commit generated by the
script includes a bunch of useful information including links to the
original commits[5] (in the future, I think it's worth reconsidering the
squashing of commits, but retaining the information in the commit is
already an improvement)

 Neha merged a couple of commits via GitHub already and it went smoothly
 although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_spark_pr script and integrating it into the
kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull
 request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening
 them
over the years, but nothing was done about most of them)
5. Other things I may be missing

 I am volunteering to help with the above if people agree that this is the
 right direction for Kafka. Thoughts?

 Best.
 Ismael

 P.S. I was told in the Apache Infra HipChat that it's not currently
 possible (and there are no plans to change that in the near future) to use
 the GitHub merge button to merge PRs. The merge script does quite a few
 useful things that the merge button does not in any case.

 [1]
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
 [2]

 https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
 [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
 [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
 [5]

 https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00



Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-02 Thread Jay Kreps
+1!

-Jay

On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma ism...@juma.me.uk wrote:

 Hi all,

 Kafka currently uses a combination of Review Board and JIRA for
 contributions and code review. In my opinion, this makes contribution and
 code review a bit harder than it has to be.

 I think the approach used by Spark would improve the current situation:

 Generally, Spark uses JIRA to track logical issues, including bugs and
 improvements, and uses Github pull requests to manage the review and merge
 of specific code changes. That is, JIRAs are used to describe what should
 be fixed or changed, and high-level approaches, and pull requests describe
 how to implement that change in the project's source code. For example,
 major design decisions are discussed in JIRA.[1]

 It's worth reading the wiki page for all the details, but I will summarise
 the suggested workflow for code changes:

1. Fork the Github repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review and
reviewboard makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. Review process happens on GitHub (it's quite handy to be able to
comment on both commit or PR-level, unlike Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
push, close the PR and JIRA issue. The squashed commit generated by the
script includes a bunch of useful information including links to the
original commits[5] (in the future, I think it's worth reconsidering the
squashing of commits, but retaining the information in the commit is
already an improvement)

 Neha merged a couple of commits via GitHub already and it went smoothly
 although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_spark_pr script and integrating it into the
kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull
 request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening
 them
over the years, but nothing was done about most of them)
5. Other things I may be missing

 I am volunteering to help with the above if people agree that this is the
 right direction for Kafka. Thoughts?

 Best.
 Ismael

 P.S. I was told in the Apache Infra HipChat that it's not currently
 possible (and there are no plans to change that in the near future) to use
 the GitHub merge button to merge PRs. The merge script does quite a few
 useful things that the merge button does not in any case.

 [1]
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
 [2]

 https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
 [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
 [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
 [5]

 https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00



Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-01 Thread Parth Brahmbhatt
+1. 

Thanks
Parth

On 5/1/15, 12:38 AM, Ewen Cheslack-Postava e...@confluent.io wrote:

Also +1. There are some drawbacks to using Github for reviews, e.g. lots
of
emails for each review because they don't let you publish your entire
review in one go like RB does, but it drastically lowers the barrier to
contributing for most developers. Also, if you haven't tried it, hub
https://hub.github.com/ makes it really easy to checkout and test PRs.

One thing I noticed is that when you try to generate a PR it defaults to
the 0.8.2 branch. Can we fix that up to be trunk by default? That's the
most common use case; version branches are really only useful when a
release is being prepared. Do changes to the Github repo require tickets
to
the Apache Infra team or is this something committers have control over?


On a related note, which perhaps should be discussed on another thread:

The CI setup is a related issue that we might want to rethink. Apache also
supports Travis CI, and now pays for dedicated build slaves:
https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci I
think the setup should be pretty easy since building + running tests is
just a gradle command. Having tests run automatically on PRs (and
promptly!) makes it a lot easier to confidently commit a change,
especially
if the build merges with trunk first. I started looking at this when I was
trying to sort out Confluent's build & test infrastructure. See
https://travis-ci.org/ewencp/kafka/builds/60802386 for an example, the
patch is about 10 lines in a .travis.yml file and the failure in that
example seems to be unrelated to the Travis config. I think the basic
config
I created is all we'd need.

Unfortunately, I couldn't easily tell what the delay on builds is, i.e.
would it be an improvement over the delays with the current Jenkins setup.
But having them run on PR creation/update means the results will usually
be
ready by the time a reviewer gets to looking at the PR and would be
reported in the PR so the state is easy to evaluate. (I'm also having
trouble telling exactly how the two ASF Jenkins builders differ since they
both seem to poll trunk, so I can't be certain whether Travis would be
able
to completely replace the current setup. That said, it should be telling
that I have never paid attention to Jenkins output at all since it seems
so
far removed from any of my actions as a contributor.)

As another alternative, Confluent would also be happy to provide
build/test
infrastructure to help out. AMPLab does this for Spark, so it seems to be
acceptable to ASF. We already have PR builders and trunk/master builders
set up for other projects, so it wouldn't be hard to setup the same for
Kafka with the right access to the repos. Based on the current frequency
of
builds (https://builds.apache.org/view/All/job/Kafka-trunk/ and
https://builds.apache.org/view/All/job/KafkaPreCommit/) I think it'll be
easy for even our current infrastructure to keep up.


On Thu, Apr 30, 2015 at 9:41 PM, Jaikiran Pai jai.forums2...@gmail.com
wrote:

 I think this will help a lot in contributions. Some of my local changes
 that I want to contribute back have been pending because I sometimes
switch
 machines and I then have to go through setting up the Ruby/python and
other
 stuff for the current review process. Using just github is going to
help in
 quickly submitting the changes.

 -Jaikiran

 On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:

 Hi all,

 Kafka currently uses a combination of Review Board and JIRA for
 contributions and code review. In my opinion, this makes contribution
and
 code review a bit harder than it has to be.

 I think the approach used by Spark would improve the current situation:

 Generally, Spark uses JIRA to track logical issues, including bugs and
 improvements, and uses Github pull requests to manage the review and
merge
 of specific code changes. That is, JIRAs are used to describe what
should
 be fixed or changed, and high-level approaches, and pull requests
describe
 how to implement that change in the project's source code. For example,
 major design decisions are discussed in JIRA.[1]

 It's worth reading the wiki page for all the details, but I will
summarise
 the suggested workflow for code changes:

 1. Fork the Github repository at http://github.com/apache/kafka (if
 you
 haven't already)
 2. git checkout -b kafka-XXX
 3. Make one or more commits (smaller commits can be easier to
review
 and
 reviewboard makes that hard)
 4. git push origin kafka-XXX
 5. Create PR against upstream/trunk (this will update JIRA
 automatically[2] and it will send an email to the dev mailing list
 too)
 6. A CI build will be triggered[3]
 7. Review process happens on GitHub (it's quite handy to be able to
 comment on both commit or PR-level, unlike Review Board)
 8. Once all feedback has been addressed and the build is green, a
 variant of the `merge_spark_pr.py`[4] script is used 

Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Jaikiran Pai
I think this will help a lot with contributions. Some of my local changes
that I want to contribute back have been pending because I sometimes
switch machines, and then I have to go through setting up the Ruby/Python
and other stuff for the current review process. Using just GitHub is
going to help in quickly submitting the changes.


-Jaikiran
On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:

Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA.[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

1. Fork the Github repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review and
reviewboard makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. Review process happens on GitHub (it's quite handy to be able to
comment on both commit or PR-level, unlike Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
push, close the PR and JIRA issue. The squashed commit generated by the
script includes a bunch of useful information including links to the
original commits[5] (in the future, I think it's worth reconsidering the
squashing of commits, but retaining the information in the commit is
already an improvement)

Neha merged a couple of commits via GitHub already and it went smoothly
although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_spark_pr script and integrating it into the
kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening them
over the years, but nothing was done about most of them)
5. Other things I may be missing

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best.
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. The merge script does quite a few
useful things that the merge button does not in any case.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2]
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5]
https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00





Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Neha Narkhede
Thanks a bunch for taking this up, Ismael!

+1, I think it will be much more convenient to sunset RB and move to
GitHub. Especially looking forward to CI on PRs and also the merge
script.

Alas, my wonderful patch-review script will be retired :-)

On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma ism...@juma.me.uk wrote:

 Hi all,

 Kafka currently uses a combination of Review Board and JIRA for
 contributions and code review. In my opinion, this makes contribution and
 code review a bit harder than it has to be.

 I think the approach used by Spark would improve the current situation:

 Generally, Spark uses JIRA to track logical issues, including bugs and
 improvements, and uses Github pull requests to manage the review and merge
 of specific code changes. That is, JIRAs are used to describe what should
 be fixed or changed, and high-level approaches, and pull requests describe
 how to implement that change in the project's source code. For example,
 major design decisions are discussed in JIRA.[1]

 It's worth reading the wiki page for all the details, but I will summarise
 the suggested workflow for code changes:

1. Fork the Github repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review and
reviewboard makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. Review process happens on GitHub (it's quite handy to be able to
comment on both commit or PR-level, unlike Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
push, close the PR and JIRA issue. The squashed commit generated by the
script includes a bunch of useful information including links to the
original commits[5] (in the future, I think it's worth reconsidering the
squashing of commits, but retaining the information in the commit is
already an improvement)

 Neha merged a couple of commits via GitHub already and it went smoothly
 although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_spark_pr script and integrating it into the
kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull
 request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening
 them
over the years, but nothing was done about most of them)
5. Other things I may be missing

 I am volunteering to help with the above if people agree that this is the
 right direction for Kafka. Thoughts?

 Best.
 Ismael

 P.S. I was told in the Apache Infra HipChat that it's not currently
 possible (and there are no plans to change that in the near future) to use
 the GitHub merge button to merge PRs. The merge script does quite a few
 useful things that the merge button does not in any case.

 [1]
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
 [2]

 https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
 [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
 [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
 [5]

 https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00




-- 
Thanks,
Neha