Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-15 Thread Guozhang Wang
Thanks Gwen, will update the wiki then.

On Wed, Jul 15, 2015 at 1:03 PM, Gwen Shapira  wrote:

> The script just runs a git push, it will be with your user (from git
> configuration), so you need privileges to push.
>
> On Wed, Jul 15, 2015 at 12:59 PM, Guozhang Wang 
> wrote:
> > Another questions regarding the "kafka-merge-pr.py" script: once a
> > committer gives a LGTM on the PR, could the contributor use the script to
> > push to the git repo himself, or it must be executed by the committer? I
> > thought it is the former case but seems not. If not we need to make it
> > clear on the wiki.
> >
> > Guozhang
> >
> > On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma  wrote:
> >
> >> On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao  wrote:
> >>
> >> > I made a couple of changes to the new Jenkins job. Could you try
> again?
> >> >
> >>
> >> It's still not working, unfortunately. It may or may not be related to:
> >>
> >> https://blogs.apache.org/infra/entry/mirroring_to_github_issues
> >>
> >> For b, if we can't easily change the behavior of pull request bot, we
> can
> >> > also just flip back and forth btw "In Progress" and "Open".
> >> >
> >>
> >> It turns out that the ASF bot doesn't change the status at all. Spark is
> >> using a different mechanism to cause the status change (via a
> `sparkuser`).
> >> The JIRA ticket status will have to be changed manually for now. I have
> >> updated the instructions accordingly. I was told in the INFRA channel to
> >> file a JIRA ticket and they would see if auto transition listeners can
> be
> >> used to automate this (no guarantees though).
> >>
> >> Ismael
> >>
> >
> >
> >
> > --
> > -- Guozhang
>



-- 
-- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-15 Thread Gwen Shapira
The script just runs a git push, it will be with your user (from git
configuration), so you need privileges to push.

On Wed, Jul 15, 2015 at 12:59 PM, Guozhang Wang  wrote:
> Another questions regarding the "kafka-merge-pr.py" script: once a
> committer gives a LGTM on the PR, could the contributor use the script to
> push to the git repo himself, or it must be executed by the committer? I
> thought it is the former case but seems not. If not we need to make it
> clear on the wiki.
>
> Guozhang
>
> On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma  wrote:
>
>> On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao  wrote:
>>
>> > I made a couple of changes to the new Jenkins job. Could you try again?
>> >
>>
>> It's still not working, unfortunately. It may or may not be related to:
>>
>> https://blogs.apache.org/infra/entry/mirroring_to_github_issues
>>
>> For b, if we can't easily change the behavior of pull request bot, we can
>> > also just flip back and forth btw "In Progress" and "Open".
>> >
>>
>> It turns out that the ASF bot doesn't change the status at all. Spark is
>> using a different mechanism to cause the status change (via a `sparkuser`).
>> The JIRA ticket status will have to be changed manually for now. I have
>> updated the instructions accordingly. I was told in the INFRA channel to
>> file a JIRA ticket and they would see if auto transition listeners can be
>> used to automate this (no guarantees though).
>>
>> Ismael
>>
>
>
>
> --
> -- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-15 Thread Guozhang Wang
Another questions regarding the "kafka-merge-pr.py" script: once a
committer gives a LGTM on the PR, could the contributor use the script to
push to the git repo himself, or it must be executed by the committer? I
thought it is the former case but seems not. If not we need to make it
clear on the wiki.

Guozhang

On Tue, Jul 14, 2015 at 5:44 PM, Ismael Juma  wrote:

> On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao  wrote:
>
> > I made a couple of changes to the new Jenkins job. Could you try again?
> >
>
> It's still not working, unfortunately. It may or may not be related to:
>
> https://blogs.apache.org/infra/entry/mirroring_to_github_issues
>
> For b, if we can't easily change the behavior of pull request bot, we can
> > also just flip back and forth btw "In Progress" and "Open".
> >
>
> It turns out that the ASF bot doesn't change the status at all. Spark is
> using a different mechanism to cause the status change (via a `sparkuser`).
> The JIRA ticket status will have to be changed manually for now. I have
> updated the instructions accordingly. I was told in the INFRA channel to
> file a JIRA ticket and they would see if auto transition listeners can be
> used to automate this (no guarantees though).
>
> Ismael
>



-- 
-- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-14 Thread Ismael Juma
On Tue, Jul 14, 2015 at 6:15 PM, Jun Rao  wrote:

> I made a couple of changes to the new Jenkins job. Could you try again?
>

It's still not working, unfortunately. It may or may not be related to:

https://blogs.apache.org/infra/entry/mirroring_to_github_issues

For b, if we can't easily change the behavior of pull request bot, we can
> also just flip back and forth btw "In Progress" and "Open".
>

It turns out that the ASF bot doesn't change the status at all. Spark is
using a different mechanism to cause the status change (via a `sparkuser`).
The JIRA ticket status will have to be changed manually for now. I have
updated the instructions accordingly. I was told in the INFRA channel to
file a JIRA ticket and they would see if auto transition listeners can be
used to automate this (no guarantees though).

Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-14 Thread Jun Rao
I made a couple of changes to the new Jenkins job. Could you try again?

For b, if we can't easily change the behavior of pull request bot, we can
also just flip back and forth btw "In Progress" and "Open".

Thanks,

Jun

On Tue, Jul 14, 2015 at 2:13 AM, Ismael Juma  wrote:

> Hi Jun,
>
> On Tue, Jul 14, 2015 at 2:09 AM, Jun Rao  wrote:
>
> > Ismael,
> >
> > I followed the instructions in KAFKA-2320 and created a new Jenkins job (
> > https://builds.apache.org/job/kafka-trunk-git-pr/).  Could you check if
> it
> > works?
> >
>
> Thanks! It seems to be building when trunk changes as opposed to when PRs
> are submitted (eg https://github.com/apache/kafka/pull/74). I'll see if I
> can get some help from INFRA.
>
> As for wiki, I have a couple of minor comments.
> >
> > a. Could we add the following to the wiki?
> > To avoid conflicts, assign a jira to yourself if you plan to work on it.
> If
> > you are creating a jira and don't plan to work on it, leave the assignee
> as
> > Unassigned.
> >
>
> Good point, done.
>
> b. Previously, we mark a jira as "Patch Available" if there is a patch.
> > Could we reuse that instead of "In Progress" to be consistent? Also, if a
> > patch needs more work after review, the reviewer will mark the jira back
> to
> > "In Progress".
> >
>
> I added this as an additional step for the user because I don't think we
> can customise the behaviour of the pull request bot (which sets the status
> to "In Progress"). It's a bit redundant when a PR is created, so a bit
> annoying. I'll double-check with INFRA.
>
> I also mentioned in the review section about the reviewer changing the JIRA
> status back to "In Progress".
>
> Best,
> Ismael
>


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-14 Thread Ismael Juma
Hi Jun,

On Tue, Jul 14, 2015 at 2:09 AM, Jun Rao  wrote:

> Ismael,
>
> I followed the instructions in KAFKA-2320 and created a new Jenkins job (
> https://builds.apache.org/job/kafka-trunk-git-pr/).  Could you check if it
> works?
>

Thanks! It seems to be building when trunk changes as opposed to when PRs
are submitted (eg https://github.com/apache/kafka/pull/74). I'll see if I
can get some help from INFRA.

As for wiki, I have a couple of minor comments.
>
> a. Could we add the following to the wiki?
> To avoid conflicts, assign a jira to yourself if you plan to work on it. If
> you are creating a jira and don't plan to work on it, leave the assignee as
> Unassigned.
>

Good point, done.

b. Previously, we mark a jira as "Patch Available" if there is a patch.
> Could we reuse that instead of "In Progress" to be consistent? Also, if a
> patch needs more work after review, the reviewer will mark the jira back to
> "In Progress".
>

I added this as an additional step for the user because I don't think we
can customise the behaviour of the pull request bot (which sets the status
to "In Progress"). It's a bit redundant when a PR is created, so a bit
annoying. I'll double-check with INFRA.

I also mentioned in the review section about the reviewer changing the JIRA
status back to "In Progress".

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-13 Thread Jun Rao
Ismael,

I followed the instructions in KAFKA-2320 and created a new Jenkins job (
https://builds.apache.org/job/kafka-trunk-git-pr/).  Could you check if it
works?

As for wiki, I have a couple of minor comments.

a. Could we add the following to the wiki?
To avoid conflicts, assign a jira to yourself if you plan to work on it. If
you are creating a jira and don't plan to work on it, leave the assignee as
Unassigned.

b. Previously, we mark a jira as "Patch Available" if there is a patch.
Could we reuse that instead of "In Progress" to be consistent? Also, if a
patch needs more work after review, the reviewer will mark the jira back to
"In Progress".

Thanks,

Jun




On Wed, Jul 8, 2015 at 1:39 AM, Ismael Juma  wrote:

> An update on this.
>
> On Thu, Apr 30, 2015 at 2:12 PM, Ismael Juma  wrote:
>
> >
> >1. CI builds triggered by GitHub PRs (this is supported by Apache
> >Infra, we need to request it for Kafka and provide whatever
> configuration
> >is needed)
> >
> > Filed https://issues.apache.org/jira/browse/KAFKA-2320, someone with a
> Jenkins account needs to follow the instructions there.
>
> >
> >1. Adapting Spark's merge_park_pr script and integrating it into the
> >kafka Git repository
> >
> > https://issues.apache.org/jira/browse/KAFKA-2187 includes a patch that
> has received an initial review by Neha.
>
> >
> >1. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md
> >to the Git repository (this is shown when someone is creating a pull
> >request)
> >
> > Initial versions (feedback and/or improvements are welcome):
>
>-
>
> https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
>-
>
> https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review#Patchsubmissionandreview-MergingGitHubPullRequests
>- https://issues.apache.org/jira/browse/KAFKA-2321 (patch available)
>
>
> >1. Go through existing GitHub pull requests and close the ones that
> >are no longer relevant (there are quite a few as people have been
> opening
> >them over the years, but nothing was done about most of them)
> >
> > Not done yet. I think this should wait until we have merged a few PRs as
> I
> would like to invite people to open new PRs if it's still relevant while
> pointing them to the documentation on how to go about it.
>
> >
> >1. Other things I may be missing
> >
> > We also need to update the "Contributing" page on the website. I think
> this should also wait until we are happy that the new approach works well
> for us.
>
> Any help moving this forward is appreciated. Aside from reviews, feedback
> and merging the changes; testing the new process is particularly useful (
> https://issues.apache.org/jira/browse/KAFKA-2276 is an example).
>
> Best,
> Ismael
>


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-10 Thread Guozhang Wang
Commends inlined.

On Fri, Jul 10, 2015 at 2:10 PM, Ismael Juma  wrote:

> Hi Guozhang,
>
> Comments inline.
>
> On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang  wrote:
> >
> > I have a couple of comments on the wiki pages / merge scripts:
> >
>
> Thanks, it's important to discuss these things as the current version is
> based on what the Spark project does and may not match what we want to do
> exactly.
>
> 1. In the wiki page it mentions "If the change is new, then it usually
> > needs a new JIRA. However, trivial changes, where "what should change" is
> > virtually the same as "how it should change" do not require a JIRA.
> > Example: "Fix typos in Foo scaladoc"."
> >
> > So it sounds we are going to allow pull requests without a JIRA ticket,
> but
> > later we are also mentioning:
> >
> > "The PR title should be of the form [KAFKA-].." which is a bit
> > conflicting with the previous statement. Could you clarify it? Today our
> > commits are mostly with JIRAs except some trivial ones that are only done
> > by committers, I can either extend this to non-committers or not, just
> that
> > we need to make it clear.
> >
>
> I agree that it's not very clear and it needs to be fixed. What do we want
> to do though? It's a trade-off between consistency (always have a JIRA
> ticket) and avoiding redundancy (skip the JIRA where it doesn't add
> anything). The former is the more conservative option and I am happy to
> update the documentation if that's the preferred option.
>
>
Personally I think it is better to not enforcing a JIRA ticket for minor /
hotfix commits, for example, we can format the title with [MINOR] [HOTFIX]
etc as in Spark:

https://github.com/apache/spark/commits/master

But I think it we need some broader discussion for this, maybe we could
start another email thread regarding this?


> 2. The git commit message is a little verbose to me, for example:
>
>
> > --
> >
> > commit ee88dbb67f19b787e12ccef37982c9459b78c7b6
> >
> > Author: Geoff Anderson 
> >
> > Date:   Thu Jul 9 14:58:01 2015 -0700
> >
> >KAFKA-2327; broker doesn't start if config defines advertised.host but
> > not advertised.port
> >
> >
> >
> >Added unit tests as well. These fail without the fix, but pass with
> the
> > fix.
> >
> >
> >
> >Author: Geoff Anderson 
> >
> >
> >
> >Closes #73 from granders/KAFKA-2327 and squashes the following
> commits:
> >
> >
> >
> >52a2085 [Geoff Anderson] Cleaned up unecessary toString calls
> >
> >23b3340 [Geoff Anderson] Fixes KAFKA-2327
> >
> > --
> >
> >
> > The "Author" field is redundant, and we could probably also omit the
> github
> > commits. What do you think?
> >
>
> Is it harmful to have detailed information? None of it is actually
> redundant as far as I can see (but I could be missing something). There is
> the squashed commit author, the pull request message, the pull request
> author and the information about the individual commits. Even though Geoff
> worked on this PR by himself, multiple people can collaborate on a feature
> and then it's useful to credit correctly.
>
> The fact that we are keeping all the information encourages a style of
> development where a few small commits (each commit should must make sense
> on its own and pass the tests) are used in the pull request, which helps a
> lot during code review and when trying to understand changes after the
> fact. This is in contrast with the style where there is a single commit per
> feature even when the feature is quite large (we have a few of those
> ongoing at the moment). I don't know if you noticed, but GitHub actually
> links to the original commits, which is quite handy:
>
>
> https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6
>
> This approach is a bit of a workaround, but it's easier as the script
> handles everything (credit to the Spark project once again, it's their
> code). The ideal scenario would be to keep the individual commits by
> merging the branch into trunk after a rebase (no fast-forward, so that a
> merge commit is maintained). This has a number of nice characteristics
> (history is linear, it's clear what commits were part of the merged branch,
> easy to bisect, easy to revert, etc.), but it requires a bit more work,
> more Git knowledge and ideally a PR builder that builds and tests every
> commit in the branch (Typesafe wrote a tool that does this for the Scala
> project, for what is worth). In other words, a step too far for us right
> now. :)
>
> What do you think?
>
>
I think I'm sold on your comments, makes sense :)


> Best,
> Ismael
>



-- 
-- Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-10 Thread Ismael Juma
Hi Guozhang,

Comments inline.

On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang  wrote:
>
> I have a couple of comments on the wiki pages / merge scripts:
>

Thanks, it's important to discuss these things as the current version is
based on what the Spark project does and may not match what we want to do
exactly.

1. In the wiki page it mentions "If the change is new, then it usually
> needs a new JIRA. However, trivial changes, where "what should change" is
> virtually the same as "how it should change" do not require a JIRA.
> Example: "Fix typos in Foo scaladoc"."
>
> So it sounds we are going to allow pull requests without a JIRA ticket, but
> later we are also mentioning:
>
> "The PR title should be of the form [KAFKA-].." which is a bit
> conflicting with the previous statement. Could you clarify it? Today our
> commits are mostly with JIRAs except some trivial ones that are only done
> by committers, I can either extend this to non-committers or not, just that
> we need to make it clear.
>

I agree that it's not very clear and it needs to be fixed. What do we want
to do though? It's a trade-off between consistency (always have a JIRA
ticket) and avoiding redundancy (skip the JIRA where it doesn't add
anything). The former is the more conservative option and I am happy to
update the documentation if that's the preferred option.

2. The git commit message is a little verbose to me, for example:


> --
>
> commit ee88dbb67f19b787e12ccef37982c9459b78c7b6
>
> Author: Geoff Anderson 
>
> Date:   Thu Jul 9 14:58:01 2015 -0700
>
>KAFKA-2327; broker doesn't start if config defines advertised.host but
> not advertised.port
>
>
>
>Added unit tests as well. These fail without the fix, but pass with the
> fix.
>
>
>
>Author: Geoff Anderson 
>
>
>
>Closes #73 from granders/KAFKA-2327 and squashes the following commits:
>
>
>
>52a2085 [Geoff Anderson] Cleaned up unecessary toString calls
>
>23b3340 [Geoff Anderson] Fixes KAFKA-2327
>
> --
>
>
> The "Author" field is redundant, and we could probably also omit the github
> commits. What do you think?
>

Is it harmful to have detailed information? None of it is actually
redundant as far as I can see (but I could be missing something). There is
the squashed commit author, the pull request message, the pull request
author and the information about the individual commits. Even though Geoff
worked on this PR by himself, multiple people can collaborate on a feature
and then it's useful to credit correctly.

The fact that we are keeping all the information encourages a style of
development where a few small commits (each commit should must make sense
on its own and pass the tests) are used in the pull request, which helps a
lot during code review and when trying to understand changes after the
fact. This is in contrast with the style where there is a single commit per
feature even when the feature is quite large (we have a few of those
ongoing at the moment). I don't know if you noticed, but GitHub actually
links to the original commits, which is quite handy:

https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6

This approach is a bit of a workaround, but it's easier as the script
handles everything (credit to the Spark project once again, it's their
code). The ideal scenario would be to keep the individual commits by
merging the branch into trunk after a rebase (no fast-forward, so that a
merge commit is maintained). This has a number of nice characteristics
(history is linear, it's clear what commits were part of the merged branch,
easy to bisect, easy to revert, etc.), but it requires a bit more work,
more Git knowledge and ideally a PR builder that builds and tests every
commit in the branch (Typesafe wrote a tool that does this for the Scala
project, for what is worth). In other words, a step too far for us right
now. :)

What do you think?

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-10 Thread Guozhang Wang
Hi Ismael,


I have a couple of comments on the wiki pages / merge scripts:


1. In the wiki page it mentions "If the change is new, then it usually
needs a new JIRA. However, trivial changes, where "what should change" is
virtually the same as "how it should change" do not require a JIRA.
Example: "Fix typos in Foo scaladoc"."


So it sounds we are going to allow pull requests without a JIRA ticket, but
later we are also mentioning:


"The PR title should be of the form [KAFKA-].." which is a bit
conflicting with the previous statement. Could you clarify it? Today our
commits are mostly with JIRAs except some trivial ones that are only done
by committers, I can either extend this to non-committers or not, just that
we need to make it clear.

2. The git commit message is a little verbose to me, for example:


--

commit ee88dbb67f19b787e12ccef37982c9459b78c7b6

Author: Geoff Anderson 

Date:   Thu Jul 9 14:58:01 2015 -0700

   KAFKA-2327; broker doesn't start if config defines advertised.host but
not advertised.port



   Added unit tests as well. These fail without the fix, but pass with the
fix.



   Author: Geoff Anderson 



   Closes #73 from granders/KAFKA-2327 and squashes the following commits:



   52a2085 [Geoff Anderson] Cleaned up unecessary toString calls

   23b3340 [Geoff Anderson] Fixes KAFKA-2327

--


The "Author" field is redundant, and we could probably also omit the github
commits. What do you think?


Guozhang


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-07-08 Thread Ismael Juma
An update on this.

On Thu, Apr 30, 2015 at 2:12 PM, Ismael Juma  wrote:

>
>1. CI builds triggered by GitHub PRs (this is supported by Apache
>Infra, we need to request it for Kafka and provide whatever configuration
>is needed)
>
> Filed https://issues.apache.org/jira/browse/KAFKA-2320, someone with a
Jenkins account needs to follow the instructions there.

>
>1. Adapting Spark's merge_park_pr script and integrating it into the
>kafka Git repository
>
> https://issues.apache.org/jira/browse/KAFKA-2187 includes a patch that
has received an initial review by Neha.

>
>1. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md
>to the Git repository (this is shown when someone is creating a pull
>request)
>
> Initial versions (feedback and/or improvements are welcome):

   -
   https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes
   -
   
https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review#Patchsubmissionandreview-MergingGitHubPullRequests
   - https://issues.apache.org/jira/browse/KAFKA-2321 (patch available)


>1. Go through existing GitHub pull requests and close the ones that
>are no longer relevant (there are quite a few as people have been opening
>them over the years, but nothing was done about most of them)
>
> Not done yet. I think this should wait until we have merged a few PRs as I
would like to invite people to open new PRs if it's still relevant while
pointing them to the documentation on how to go about it.

>
>1. Other things I may be missing
>
> We also need to update the "Contributing" page on the website. I think
this should also wait until we are happy that the new approach works well
for us.

Any help moving this forward is appreciated. Aside from reviews, feedback
and merging the changes; testing the new process is particularly useful (
https://issues.apache.org/jira/browse/KAFKA-2276 is an example).

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-21 Thread Ismael Juma
On Fri, May 1, 2015 at 8:38 AM, Ewen Cheslack-Postava 
wrote:

> One thing I noticed is that when you try to generate a PR it defaults to
> the 0.8.2 branch. Can we fix that up to be trunk by default?
>

I filed an INFRA ticket for this:

https://issues.apache.org/jira/browse/INFRA-9680

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-05 Thread Jun Rao
Yes, it would be great if we can do a final rebase of the branch before
merging. Perhaps the squashing part already does that?

Thanks,

Jun

On Tue, May 5, 2015 at 5:53 AM, Ismael Juma  wrote:

> Hi Jun,
>
> On Sat, May 2, 2015 at 2:50 PM, Jun Rao  wrote:
>
> > We will also need to figure out if we need CONTRIBUTING.md like the
> > following to take care of the Apache licensing stuff.
> >
> > https://github.com/apache/spark/blob/master/CONTRIBUTING.md
>
>
> Yes indeed. That is in step 3 of the "missing pieces" in the original
> email.
>
> As for merging changes, I think squashing the commits will be ideal during
> > merge.
>
>
> The script does squash commits during merge indeed. It also includes
> retains a bunch of useful information in the commit message. I think one
> loses some of the benefits of git when using a one squashed commit per PR
> approach, but that is a separate issue and I am not suggesting a change in
> this regard at this point.
>
>
> > Also, it would be great if we can always put the latest changes on
> > top during the merge.
> >
>
> Just to make sure I understand correctly, do you mean updating the working
> copy with the latest from trunk before merging? Or did I misunderstand?
>
> Best,
> Ismael
>


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-05 Thread Ismael Juma
Hi Jun,

On Sat, May 2, 2015 at 2:50 PM, Jun Rao  wrote:

> We will also need to figure out if we need CONTRIBUTING.md like the
> following to take care of the Apache licensing stuff.
>
> https://github.com/apache/spark/blob/master/CONTRIBUTING.md


Yes indeed. That is in step 3 of the "missing pieces" in the original email.

As for merging changes, I think squashing the commits will be ideal during
> merge.


The script does squash commits during merge indeed. It also includes
retains a bunch of useful information in the commit message. I think one
loses some of the benefits of git when using a one squashed commit per PR
approach, but that is a separate issue and I am not suggesting a change in
this regard at this point.


> Also, it would be great if we can always put the latest changes on
> top during the merge.
>

Just to make sure I understand correctly, do you mean updating the working
copy with the latest from trunk before merging? Or did I misunderstand?

Best,
Ismael


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-05 Thread Ismael Juma
Hi Ewen,

Comments inline.

On Fri, May 1, 2015 at 8:38 AM, Ewen Cheslack-Postava 
wrote:

> Also +1. There are some drawbacks to using Github for reviews, e.g. lots of
> emails for each review because they don't let you publish your entire
> review in one go like RB does, but it drastically lowers the barrier to
> contributing for most developers. Also, if you haven't tried it, hub
> https://hub.github.com/ makes it really easy to checkout and test PRs.
>

Good points.

One thing I noticed is that when you try to generate a PR it defaults to
> the 0.8.2 branch. Can we fix that up to be trunk by default? That's the
> most common use case; version branches are really only useful when a
> release is being prepared. Do changes to the Github repo require tickets to
> the Apache Infra team or is this something committers have control over?
>

It can be changed to trunk, but it does require a ticket to the Infra
team[1]. Sadly, PRs can only be closed via commits, PR creator or a ticket
to Infra. This is quite clunky for abandoned PRs (a label can be used to
exclude such PRs from view, if needed).

On a related note, which perhaps should be discussed on another thread:
>
> The CI setup is a related issue that we might want to rethink.


I did think about Travis as well, but I thought that should be considered
as a subsequent and separate discussion indeed.

Best,
Ismael

[1] An example of such a request:
https://issues.apache.org/jira/browse/INFRA-9208


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-02 Thread Jay Kreps
+1!

-Jay

On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma  wrote:

> Hi all,
>
> Kafka currently uses a combination of Review Board and JIRA for
> contributions and code review. In my opinion, this makes contribution and
> code review a bit harder than it has to be.
>
> I think the approach used by Spark would improve the current situation:
>
> "Generally, Spark uses JIRA to track logical issues, including bugs and
> improvements, and uses Github pull requests to manage the review and merge
> of specific code changes. That is, JIRAs are used to describe what should
> be fixed or changed, and high-level approaches, and pull requests describe
> how to implement that change in the project's source code. For example,
> major design decisions are discussed in JIRA."[1]
>
> It's worth reading the wiki page for all the details, but I will summarise
> the suggested workflow for code changes:
>
>1. Fork the Github repository at http://github.com/apache/kafka (if you
>haven't already)
>2. git checkout -b kafka-XXX
>3. Make one or more commits (smaller commits can be easier to review and
>reviewboard makes that hard)
>4. git push origin kafka-XXX
>5. Create PR against upstream/trunk (this will update JIRA
>automatically[2] and it will send an email to the dev mailing list too)
>6. A CI build will be triggered[3]
>7. Review process happens on GitHub (it's quite handy to be able to
>comment on both commit or PR-level, unlike Review Board)
>8. Once all feedback has been addressed and the build is green, a
>variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
>push, close the PR and JIRA issue. The squashed commit generated by the
>script includes a bunch of useful information including links to the
>original commits[5] (in the future, I think it's worth reconsidering the
>squashing of commits, but retaining the information in the commit is
>already an improvement)
>
> Neha merged a couple of commits via GitHub already and it went smoothly
> although we are still missing a few of the pieces described above:
>
>1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
>we need to request it for Kafka and provide whatever configuration is
>needed)
>2. Adapting Spark's merge_park_pr script and integrating it into the
>kafka Git repository
>3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
>the Git repository (this is shown when someone is creating a pull
> request)
>4. Go through existing GitHub pull requests and close the ones that are
>no longer relevant (there are quite a few as people have been opening
> them
>over the years, but nothing was done about most of them)
>5. Other things I may be missing
>
> I am volunteering to help with the above if people agree that this is the
> right direction for Kafka. Thoughts?
>
> Best.
> Ismael
>
> P.S. I was told in the Apache Infra HipChat that it's not currently
> possible (and there are no plans to change that in the near future) to use
> the GitHub merge button to merge PRs. The merge script does quite a few
> useful things that the merge button does not in any case.
>
> [1]
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> [2]
>
> https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
> [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
> [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
> [5]
>
> https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
>


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-02 Thread Jun Rao
Ismael,

We will also need to figure out if we need CONTRIBUTING.md like the
following to take care of the Apache licensing stuff.

https://github.com/apache/spark/blob/master/CONTRIBUTING.md

As for merging changes, I think squashing the commits will be ideal during
merge. Also, it would be great if we can always put the latest changes on
top during the merge.

Thanks,

Jun

On Thu, Apr 30, 2015 at 8:12 AM, Ismael Juma  wrote:

> Hi all,
>
> Kafka currently uses a combination of Review Board and JIRA for
> contributions and code review. In my opinion, this makes contribution and
> code review a bit harder than it has to be.
>
> I think the approach used by Spark would improve the current situation:
>
> "Generally, Spark uses JIRA to track logical issues, including bugs and
> improvements, and uses Github pull requests to manage the review and merge
> of specific code changes. That is, JIRAs are used to describe what should
> be fixed or changed, and high-level approaches, and pull requests describe
> how to implement that change in the project's source code. For example,
> major design decisions are discussed in JIRA."[1]
>
> It's worth reading the wiki page for all the details, but I will summarise
> the suggested workflow for code changes:
>
>1. Fork the Github repository at http://github.com/apache/kafka (if you
>haven't already)
>2. git checkout -b kafka-XXX
>3. Make one or more commits (smaller commits can be easier to review and
>reviewboard makes that hard)
>4. git push origin kafka-XXX
>5. Create PR against upstream/trunk (this will update JIRA
>automatically[2] and it will send an email to the dev mailing list too)
>6. A CI build will be triggered[3]
>7. Review process happens on GitHub (it's quite handy to be able to
>comment on both commit or PR-level, unlike Review Board)
>8. Once all feedback has been addressed and the build is green, a
>variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
>push, close the PR and JIRA issue. The squashed commit generated by the
>script includes a bunch of useful information including links to the
>original commits[5] (in the future, I think it's worth reconsidering the
>squashing of commits, but retaining the information in the commit is
>already an improvement)
>
> Neha merged a couple of commits via GitHub already and it went smoothly
> although we are still missing a few of the pieces described above:
>
>1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
>we need to request it for Kafka and provide whatever configuration is
>needed)
>2. Adapting Spark's merge_park_pr script and integrating it into the
>kafka Git repository
>3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
>the Git repository (this is shown when someone is creating a pull
> request)
>4. Go through existing GitHub pull requests and close the ones that are
>no longer relevant (there are quite a few as people have been opening
> them
>over the years, but nothing was done about most of them)
>5. Other things I may be missing
>
> I am volunteering to help with the above if people agree that this is the
> right direction for Kafka. Thoughts?
>
> Best.
> Ismael
>
> P.S. I was told in the Apache Infra HipChat that it's not currently
> possible (and there are no plans to change that in the near future) to use
> the GitHub merge button to merge PRs. The merge script does quite a few
> useful things that the merge button does not in any case.
>
> [1]
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> [2]
>
> https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
> [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
> [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
> [5]
>
> https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
>


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-01 Thread Parth Brahmbhatt
+1. 

Thanks
Parth

On 5/1/15, 12:38 AM, "Ewen Cheslack-Postava"  wrote:

>Also +1. There are some drawbacks to using Github for reviews, e.g. lots
>of
>emails for each review because they don't let you publish your entire
>review in one go like RB does, but it drastically lowers the barrier to
>contributing for most developers. Also, if you haven't tried it, hub
>https://hub.github.com/ makes it really easy to checkout and test PRs.
>
>One thing I noticed is that when you try to generate a PR it defaults to
>the 0.8.2 branch. Can we fix that up to be trunk by default? That's the
>most common use case; version branches are really only useful when a
>release is being prepared. Do changes to the Github repo require tickets
>to
>the Apache Infra team or is this something committers have control over?
>
>
>On a related note, which perhaps should be discussed on another thread:
>
>The CI setup is a related issue that we might want to rethink. Apache also
>supports Travis CI, and now pays for dedicated build slaves:
>https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci I
>think the setup should be pretty easy since building + running tests is
>just a gradle command. Having tests run automatically on PRs (and
>promptly!) makes it a lot easier to confidently commit a change,
>especially
>if the build merges with trunk first. I started looking at this when I was
>trying to sort out Confluent's build & test infrastructure. See
>https://travis-ci.org/ewencp/kafka/builds/60802386 for an example, the
>patch is about 10 lines in a .travis.yml file and the failure in that
>example seems to be unrelated to the Travis confg. I think the basic
>config
>I created is all we'd need.
>
>Unfortunately, I couldn't easily tell what the delay on builds is, i.e.
>would it be an improvement over the delays with the current Jenkins setup.
>But having them run on PR creation/update means the results will usually
>be
>ready by the time a reviewer gets to looking at the PR and would be
>reported in the PR so the state is easy to evaluate. (I'm also having
>trouble telling exactly how the two ASF Jenkins builders differ since they
>both seem to poll trunk, so I can't be certain whether Travis would be
>able
>to completely replace the current setup. That said, it should be telling
>that I have never paid attention to Jenkins output at all since it seems
>so
>far removed from any of my actions as a contributor.)
>
>As another alternative, Confluent would also be happy to provide
>build/test
>infrastructure to help out. AMPLab does this for Spark, so it seems to be
>acceptable to ASF. We already have PR builders and trunk/master builders
>set up for other projects, so it wouldn't be hard to setup the same for
>Kafka with the right access to the repos. Based on the current frequency
>of
>builds (https://builds.apache.org/view/All/job/Kafka-trunk/ and
>https://builds.apache.org/view/All/job/KafkaPreCommit/) I think it'll be
>easy for even our current infrastructure to keep up.
>
>
>On Thu, Apr 30, 2015 at 9:41 PM, Jaikiran Pai 
>wrote:
>
>> I think this will help a lot in contributions. Some of my local changes
>> that I want to contribute back have been pending because I sometimes
>>switch
>> machines and I then have to go through setting up the Ruby/python and
>>other
>> stuff for the current review process. Using just github is going to
>>help in
>> quickly submitting the changes.
>>
>> -Jaikiran
>>
>> On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:
>>
>>> Hi all,
>>>
>>> Kafka currently uses a combination of Review Board and JIRA for
>>> contributions and code review. In my opinion, this makes contribution
>>>and
>>> code review a bit harder than it has to be.
>>>
>>> I think the approach used by Spark would improve the current situation:
>>>
>>> "Generally, Spark uses JIRA to track logical issues, including bugs and
>>> improvements, and uses Github pull requests to manage the review and
>>>merge
>>> of specific code changes. That is, JIRAs are used to describe what
>>>should
>>> be fixed or changed, and high-level approaches, and pull requests
>>>describe
>>> how to implement that change in the project's source code. For example,
>>> major design decisions are discussed in JIRA."[1]
>>>
>>> It's worth reading the wiki page for all the details, but I will
>>>summarise
>>> the suggested workflow for code changes:
>>>
>>> 1. Fork the Github repository at http://github.com/apache/kafka (if
>>> you
>>> haven't already)
>>> 2. git checkout -b kafka-XXX
>>> 3. Make one or more commits (smaller commits can be easier to
>>>review
>>> and
>>> reviewboard makes that hard)
>>> 4. git push origin kafka-XXX
>>> 5. Create PR against upstream/trunk (this will update JIRA
>>> automatically[2] and it will send an email to the dev mailing list
>>> too)
>>> 6. A CI build will be triggered[3]
>>> 7. Review process happens on GitHub (it's quite handy to be able to
>>> comment on both commit 

Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-05-01 Thread Ewen Cheslack-Postava
Also +1. There are some drawbacks to using Github for reviews, e.g. lots of
emails for each review because they don't let you publish your entire
review in one go like RB does, but it drastically lowers the barrier to
contributing for most developers. Also, if you haven't tried it, hub
https://hub.github.com/ makes it really easy to checkout and test PRs.

One thing I noticed is that when you try to generate a PR it defaults to
the 0.8.2 branch. Can we fix that up to be trunk by default? That's the
most common use case; version branches are really only useful when a
release is being prepared. Do changes to the Github repo require tickets to
the Apache Infra team or is this something committers have control over?


On a related note, which perhaps should be discussed on another thread:

The CI setup is a related issue that we might want to rethink. Apache also
supports Travis CI, and now pays for dedicated build slaves:
https://blogs.apache.org/infra/entry/apache_gains_additional_travis_ci I
think the setup should be pretty easy since building + running tests is
just a gradle command. Having tests run automatically on PRs (and
promptly!) makes it a lot easier to confidently commit a change, especially
if the build merges with trunk first. I started looking at this when I was
trying to sort out Confluent's build & test infrastructure. See
https://travis-ci.org/ewencp/kafka/builds/60802386 for an example, the
patch is about 10 lines in a .travis.yml file and the failure in that
example seems to be unrelated to the Travis confg. I think the basic config
I created is all we'd need.

Unfortunately, I couldn't easily tell what the delay on builds is, i.e.
would it be an improvement over the delays with the current Jenkins setup.
But having them run on PR creation/update means the results will usually be
ready by the time a reviewer gets to looking at the PR and would be
reported in the PR so the state is easy to evaluate. (I'm also having
trouble telling exactly how the two ASF Jenkins builders differ since they
both seem to poll trunk, so I can't be certain whether Travis would be able
to completely replace the current setup. That said, it should be telling
that I have never paid attention to Jenkins output at all since it seems so
far removed from any of my actions as a contributor.)

As another alternative, Confluent would also be happy to provide build/test
infrastructure to help out. AMPLab does this for Spark, so it seems to be
acceptable to ASF. We already have PR builders and trunk/master builders
set up for other projects, so it wouldn't be hard to setup the same for
Kafka with the right access to the repos. Based on the current frequency of
builds (https://builds.apache.org/view/All/job/Kafka-trunk/ and
https://builds.apache.org/view/All/job/KafkaPreCommit/) I think it'll be
easy for even our current infrastructure to keep up.


On Thu, Apr 30, 2015 at 9:41 PM, Jaikiran Pai 
wrote:

> I think this will help a lot in contributions. Some of my local changes
> that I want to contribute back have been pending because I sometimes switch
> machines and I then have to go through setting up the Ruby/python and other
> stuff for the current review process. Using just github is going to help in
> quickly submitting the changes.
>
> -Jaikiran
>
> On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:
>
>> Hi all,
>>
>> Kafka currently uses a combination of Review Board and JIRA for
>> contributions and code review. In my opinion, this makes contribution and
>> code review a bit harder than it has to be.
>>
>> I think the approach used by Spark would improve the current situation:
>>
>> "Generally, Spark uses JIRA to track logical issues, including bugs and
>> improvements, and uses Github pull requests to manage the review and merge
>> of specific code changes. That is, JIRAs are used to describe what should
>> be fixed or changed, and high-level approaches, and pull requests describe
>> how to implement that change in the project's source code. For example,
>> major design decisions are discussed in JIRA."[1]
>>
>> It's worth reading the wiki page for all the details, but I will summarise
>> the suggested workflow for code changes:
>>
>> 1. Fork the Github repository at http://github.com/apache/kafka (if
>> you
>> haven't already)
>> 2. git checkout -b kafka-XXX
>> 3. Make one or more commits (smaller commits can be easier to review
>> and
>> reviewboard makes that hard)
>> 4. git push origin kafka-XXX
>> 5. Create PR against upstream/trunk (this will update JIRA
>> automatically[2] and it will send an email to the dev mailing list
>> too)
>> 6. A CI build will be triggered[3]
>> 7. Review process happens on GitHub (it's quite handy to be able to
>> comment on both commit or PR-level, unlike Review Board)
>> 8. Once all feedback has been addressed and the build is green, a
>> variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
>> push, clo

Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Jaikiran Pai
I think this will help a lot in contributions. Some of my local changes 
that I want to contribute back have been pending because I sometimes 
switch machines and I then have to go through setting up the Ruby/python 
and other stuff for the current review process. Using just github is 
going to help in quickly submitting the changes.


-Jaikiran
On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:

Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

"Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA."[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

1. Fork the Github repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review and
reviewboard makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. Review process happens on GitHub (it's quite handy to be able to
comment on both commit or PR-level, unlike Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
push, close the PR and JIRA issue. The squashed commit generated by the
script includes a bunch of useful information including links to the
original commits[5] (in the future, I think it's worth reconsidering the
squashing of commits, but retaining the information in the commit is
already an improvement)

Neha merged a couple of commits via GitHub already and it went smoothly
although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_park_pr script and integrating it into the
kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening them
over the years, but nothing was done about most of them)
5. Other things I may be missing

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best.
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. The merge script does quite a few
useful things that the merge button does not in any case.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2]
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5]
https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00





Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Neha Narkhede
Thanks a bunch for taking this up, Ismael!

+1, I think it will be much more convenient to sunset RB and move to
github. Especially looking forward to the CIs on PRs and also the merge
script.

Alas, my wonderful patch-review script will be retired :-)

On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma  wrote:

> Hi all,
>
> Kafka currently uses a combination of Review Board and JIRA for
> contributions and code review. In my opinion, this makes contribution and
> code review a bit harder than it has to be.
>
> I think the approach used by Spark would improve the current situation:
>
> "Generally, Spark uses JIRA to track logical issues, including bugs and
> improvements, and uses Github pull requests to manage the review and merge
> of specific code changes. That is, JIRAs are used to describe what should
> be fixed or changed, and high-level approaches, and pull requests describe
> how to implement that change in the project's source code. For example,
> major design decisions are discussed in JIRA."[1]
>
> It's worth reading the wiki page for all the details, but I will summarise
> the suggested workflow for code changes:
>
>1. Fork the Github repository at http://github.com/apache/kafka (if you
>haven't already)
>2. git checkout -b kafka-XXX
>3. Make one or more commits (smaller commits can be easier to review and
>reviewboard makes that hard)
>4. git push origin kafka-XXX
>5. Create PR against upstream/trunk (this will update JIRA
>automatically[2] and it will send an email to the dev mailing list too)
>6. A CI build will be triggered[3]
>7. Review process happens on GitHub (it's quite handy to be able to
>comment on both commit or PR-level, unlike Review Board)
>8. Once all feedback has been addressed and the build is green, a
>variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
>push, close the PR and JIRA issue. The squashed commit generated by the
>script includes a bunch of useful information including links to the
>original commits[5] (in the future, I think it's worth reconsidering the
>squashing of commits, but retaining the information in the commit is
>already an improvement)
>
> Neha merged a couple of commits via GitHub already and it went smoothly
> although we are still missing a few of the pieces described above:
>
>1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
>we need to request it for Kafka and provide whatever configuration is
>needed)
>2. Adapting Spark's merge_park_pr script and integrating it into the
>kafka Git repository
>3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
>the Git repository (this is shown when someone is creating a pull
> request)
>4. Go through existing GitHub pull requests and close the ones that are
>no longer relevant (there are quite a few as people have been opening
> them
>over the years, but nothing was done about most of them)
>5. Other things I may be missing
>
> I am volunteering to help with the above if people agree that this is the
> right direction for Kafka. Thoughts?
>
> Best.
> Ismael
>
> P.S. I was told in the Apache Infra HipChat that it's not currently
> possible (and there are no plans to change that in the near future) to use
> the GitHub merge button to merge PRs. The merge script does quite a few
> useful things that the merge button does not in any case.
>
> [1]
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> [2]
>
> https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
> [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
> [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
> [5]
>
> https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
>



-- 
Thanks,
Neha


[DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Ismael Juma
Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

"Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA."[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

   1. Fork the Github repository at http://github.com/apache/kafka (if you
   haven't already)
   2. git checkout -b kafka-XXX
   3. Make one or more commits (smaller commits can be easier to review and
   reviewboard makes that hard)
   4. git push origin kafka-XXX
   5. Create PR against upstream/trunk (this will update JIRA
   automatically[2] and it will send an email to the dev mailing list too)
   6. A CI build will be triggered[3]
   7. Review process happens on GitHub (it's quite handy to be able to
   comment on both commit or PR-level, unlike Review Board)
   8. Once all feedback has been addressed and the build is green, a
   variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
   push, close the PR and JIRA issue. The squashed commit generated by the
   script includes a bunch of useful information including links to the
   original commits[5] (in the future, I think it's worth reconsidering the
   squashing of commits, but retaining the information in the commit is
   already an improvement)

Neha merged a couple of commits via GitHub already and it went smoothly
although we are still missing a few of the pieces described above:

   1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
   we need to request it for Kafka and provide whatever configuration is
   needed)
   2. Adapting Spark's merge_park_pr script and integrating it into the
   kafka Git repository
   3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
   the Git repository (this is shown when someone is creating a pull request)
   4. Go through existing GitHub pull requests and close the ones that are
   no longer relevant (there are quite a few as people have been opening them
   over the years, but nothing was done about most of them)
   5. Other things I may be missing

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best.
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. The merge script does quite a few
useful things that the merge button does not in any case.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2]
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5]
https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00