Hi Guozhang,

Comments inline.

On Fri, Jul 10, 2015 at 8:47 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> I have a couple of comments on the wiki pages / merge scripts:
>

Thanks, it's important to discuss these things as the current version is
based on what the Spark project does and may not match what we want to do
exactly.

1. In the wiki page it mentions "If the change is new, then it usually
> needs a new JIRA. However, trivial changes, where "what should change" is
> virtually the same as "how it should change" do not require a JIRA.
> Example: "Fix typos in Foo scaladoc"."
>
> So it sounds we are going to allow pull requests without a JIRA ticket, but
> later we are also mentioning:
>
> "The PR title should be of the form [KAFKA-xxxx].." which is a bit
> conflicting with the previous statement. Could you clarify it? Today our
> commits are mostly with JIRAs except some trivial ones that are only done
> by committers, I can either extend this to non-committers or not, just that
> we need to make it clear.
>

I agree that it's not very clear and it needs to be fixed. What do we want
to do though? It's a trade-off between consistency (always have a JIRA
ticket) and avoiding redundancy (skip the JIRA where it doesn't add
anything). The former is the more conservative option and I am happy to
update the documentation if that's the preferred option.

2. The git commit message is a little verbose to me, for example:


> ------
>
> commit ee88dbb67f19b787e12ccef37982c9459b78c7b6
>
> Author: Geoff Anderson <ge...@confluent.io>
>
> Date:   Thu Jul 9 14:58:01 2015 -0700
>
>    KAFKA-2327; broker doesn't start if config defines advertised.host but
> not advertised.port
>
>
>
>    Added unit tests as well. These fail without the fix, but pass with the
> fix.
>
>
>
>    Author: Geoff Anderson <ge...@confluent.io>
>
>
>
>    Closes #73 from granders/KAFKA-2327 and squashes the following commits:
>
>
>
>    52a2085 [Geoff Anderson] Cleaned up unecessary toString calls
>
>    23b3340 [Geoff Anderson] Fixes KAFKA-2327
>
> ------
>
>
> The "Author" field is redundant, and we could probably also omit the github
> commits. What do you think?
>

Is it harmful to have detailed information? None of it is actually
redundant as far as I can see (but I could be missing something). There is
the squashed commit author, the pull request message, the pull request
author and the information about the individual commits. Even though Geoff
worked on this PR by himself, multiple people can collaborate on a feature
and then it's useful to credit correctly.

The fact that we are keeping all the information encourages a style of
development where a few small commits (each commit should must make sense
on its own and pass the tests) are used in the pull request, which helps a
lot during code review and when trying to understand changes after the
fact. This is in contrast with the style where there is a single commit per
feature even when the feature is quite large (we have a few of those
ongoing at the moment). I don't know if you noticed, but GitHub actually
links to the original commits, which is quite handy:

https://github.com/apache/kafka/commit/ee88dbb67f19b787e12ccef37982c9459b78c7b6

This approach is a bit of a workaround, but it's easier as the script
handles everything (credit to the Spark project once again, it's their
code). The ideal scenario would be to keep the individual commits by
merging the branch into trunk after a rebase (no fast-forward, so that a
merge commit is maintained). This has a number of nice characteristics
(history is linear, it's clear what commits were part of the merged branch,
easy to bisect, easy to revert, etc.), but it requires a bit more work,
more Git knowledge and ideally a PR builder that builds and tests every
commit in the branch (Typesafe wrote a tool that does this for the Scala
project, for what is worth). In other words, a step too far for us right
now. :)

What do you think?

Best,
Ismael

Reply via email to