Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

Guozhang Wang Fri, 10 Jul 2015 15:26:41 -0700

Commends inlined.

On Fri, Jul 10, 2015 at 2:10 PM, Ismael Juma <[email protected]> wrote:


> Hi Guozhang,
>
> Comments inline.
>
> On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang <[email protected]> wrote:
> >
> > I have a couple of comments on the wiki pages / merge scripts:
> >
>
> Thanks, it's important to discuss these things as the current version is
> based on what the Spark project does and may not match what we want to do
> exactly.
>
> 1. In the wiki page it mentions "If the change is new, then it usually
> > needs a new JIRA. However, trivial changes, where "what should change" is
> > virtually the same as "how it should change" do not require a JIRA.
> > Example: "Fix typos in Foo scaladoc"."
> >
> > So it sounds we are going to allow pull requests without a JIRA ticket,
> but
> > later we are also mentioning:
> >
> > "The PR title should be of the form [KAFKA-xxxx].." which is a bit
> > conflicting with the previous statement. Could you clarify it? Today our
> > commits are mostly with JIRAs except some trivial ones that are only done
> > by committers, I can either extend this to non-committers or not, just
> that
> > we need to make it clear.
> >
>
> I agree that it's not very clear and it needs to be fixed. What do we want
> to do though? It's a trade-off between consistency (always have a JIRA
> ticket) and avoiding redundancy (skip the JIRA where it doesn't add
> anything). The former is the more conservative option and I am happy to
> update the documentation if that's the preferred option.
>
>
Personally I think it is better to not enforcing a JIRA ticket for minor /
hotfix commits, for example, we can format the title with [MINOR] [HOTFIX]
etc as in Spark:

https://github.com/apache/spark/commits/master

But I think it we need some broader discussion for this, maybe we could
start another email thread regarding this?


> 2. The git commit message is a little verbose to me, for example:
>
>
> > ------
> >
> > commit ee88dbb67f19b787e12ccef37982c9459b78c7b6
> >
> > Author: Geoff Anderson <[email protected]>
> >
> > Date:   Thu Jul 9 14:58:01 2015 -0700
> >
> >    KAFKA-2327; broker doesn't start if config defines advertised.host but
> > not advertised.port
> >
> >
> >
> >    Added unit tests as well. These fail without the fix, but pass with
> the
> > fix.
> >
> >
> >
> >    Author: Geoff Anderson <[email protected]>
> >
> >
> >
> >    Closes #73 from granders/KAFKA-2327 and squashes the following
> commits:
> >
> >
> >
> >    52a2085 [Geoff Anderson] Cleaned up unecessary toString calls
> >
> >    23b3340 [Geoff Anderson] Fixes KAFKA-2327
> >
> > ------
> >
> >
> > The "Author" field is redundant, and we could probably also omit the
> github
> > commits. What do you think?
> >
>
> Is it harmful to have detailed information? None of it is actually
> redundant as far as I can see (but I could be missing something). There is
> the squashed commit author, the pull request message, the pull request
> author and the information about the individual commits. Even though Geoff
> worked on this PR by himself, multiple people can collaborate on a feature
> and then it's useful to credit correctly.
>
> The fact that we are keeping all the information encourages a style of
> development where a few small commits (each commit should must make sense
> on its own and pass the tests) are used in the pull request, which helps a
> lot during code review and when trying to understand changes after the
> fact. This is in contrast with the style where there is a single commit per
> feature even when the feature is quite large (we have a few of those
> ongoing at the moment). I don't know if you noticed, but GitHub actually
> links to the original commits, which is quite handy:
>
>
> https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6
>
> This approach is a bit of a workaround, but it's easier as the script
> handles everything (credit to the Spark project once again, it's their
> code). The ideal scenario would be to keep the individual commits by
> merging the branch into trunk after a rebase (no fast-forward, so that a
> merge commit is maintained). This has a number of nice characteristics
> (history is linear, it's clear what commits were part of the merged branch,
> easy to bisect, easy to revert, etc.), but it requires a bit more work,
> more Git knowledge and ideally a PR builder that builds and tests every
> commit in the branch (Typesafe wrote a tool that does this for the Scala
> project, for what is worth). In other words, a step too far for us right
> now. :)
>
> What do you think?
>
>
I think I'm sold on your comments, makes sense :)


> Best,
> Ismael
>



-- 
-- Guozhang

Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

Reply via email to