Hi Stamatis,

I am generally positive for every standardization process. My experience 
thought me that a standardized code is usually better in the long run than any 
super optimized solution where we specialize every codepath for the locally 
best optimum.

First question:
- Is it possible to check/enforce these guidelines somehow?

If we have some tool to enforce this, that would be super helpful.

I agree with most of your points.

There are a few question marks, but I can live with all of the proposed 
requirements if that is the consensus:
- I think having contributor / reviewer / PR number in the commit message is 
helpful (at least for me). I use the `git log --pretty=online|grep` quite 
often, and one usage is when I would like to collect the PR's which were 
authored or reviewed by a contributor
- I am not sure about the 72 character line length - the PRs are most often 
viewed by tools which do the wrapping based on the available screen size. 
Wouldn't we want to relay on that?

Thanks,
Peter

> On Sep 28, 2021, at 11:25, Alessandro Solimando 
> <alessandro.solima...@gmail.com> wrote:
> 
> Hi Stamatis,
> thanks for the suggestions, I think they are reasonable, the project would
> benefit from their adoption.
> 
> Regarding the removal of contributor/reviewers names from the commit
> message favoring the use of git metadata, there has been a similar discussion
> in Calcite ML
> <https://mail-archives.apache.org/mod_mbox/calcite-dev/202109.mbox/%3ccafqnwdzys+gemeeevhreggjedpmjn5bc4bvecdusgrtcykv...@mail.gmail.com%3e>
> which
> led to consensus.
> 
> Best regards,
> Alessandro
> 
> On Fri, 24 Sept 2021 at 16:40, Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
> 
>> Hi all,
>> 
>> I think we all more or less follow some standard pattern when committing in
>> Hive but with some small effort we could make things more uniform and
>> hopefully better.
>> I would like to start a discussion about creating some guidelines, which we
>> could put to the wiki or in contributing.md, to improve the quality of our
>> history (git log).
>> I outline some suggestions below to kick off the discussion. Many things in
>> the list are minor (and maybe even personal preferences) but one thing
>> which is really missing from the project is B3 especially the *why* part.
>> Why is the commit necessary? Why has the change been made?.
>> In some cases the why part is also missing from the JIRA making the code
>> harder to maintain.
>> 
>> Subject line:
>> S1. Start with the Jira id capitalized and followed immediately (no space)
>> by double colon (:)
>> S2. Leave one space after the Jira id, and start the summary with a capital
>> letter
>> S3. Keep it concise (ideally less than 72 characters) and provide a useful
>> description of the change
>> S4. Do not include or end the line with period
>> S5. Do not include the pull request id in the summary
>> S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added
>> a handler …”) or present tense (“Adds a handler …”)
>> S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to
>> describe the bug (“NullPointerException if user is unknown”) and people
>> will correctly surmise that the purpose of your change is to fix the bug.
>> S8. Do not add a contributor's name; the author tag is made exactly for
>> this and can be explored/parsed much more efficiently by tools/people for
>> stats or other purposes
>> S9. Do not add reviewers name; information is present in multiple places
>> (e.g., committer tag, PR, JIRA)
>> 
>> Message body: (Trivial changes may not require a body)
>> B1. Separate subject from body with a blank line
>> B2. Wrap the body at 72 characters
>> B3. Use the body to explain what and why vs. how
>> Example
>> "Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and
>> Converter to avoid executing the fallback method which in many cases
>> returns null and can cause NPE when this value propagates up the call
>> stack."
>> vs.
>> "Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
>> and Converter"
>> B4. If multiple authors include them using the standard GitHub marker,
>> "Co-authored-by:", followed by the name and email of the author (e.g.,
>> Co-authored-by: Marton Bod <m...@cloudera.com>)
>> B5. If the reviewer is different from committer (or merge via GitHub UI)
>> use "Reviewed-by:" followed by the name and email of the reviewer (e.g.,
>> Reviewed-by: Stamatis Zampetakis <zabe...@gmail.com>)
>> B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many
>> times as authors/reviewers.
>> B7. Include the PR id at the end of the message (e.g., Closes #2514);
>> someone can easily navigate back to the PR to check comments, reviewers,
>> etc.
>> 
>> A sample commit message following these guidelines is shown below:
>> 
>> commit de7781f29f82083fe01274b4d436b52920a89173
>> Author: Soumyakanti Das <soumyakanti.das...@gmail.com>
>> Commit: Stamatis Zampetakis <zabe...@gmail.com>
>> 
>>    HIVE-25354: NPE when estimating row count in external JDBC tables
>> 
>>    Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
>>    and Converter to avoid executing the fallback method which in many
>>    cases returns null and can cause NPE when this value propagates up the
>>    call stack.
>> 
>>    Co-authored-by: Krisztian Kasa <kk...@cloudera.com>
>> 
>>    Reviewed-by: Peter Vary <pv...@cloudera.com>
>>    Reviewed-by: Zoltan Haindrich <k...@rxd.hu>
>> 
>>    Closes #2514
>> 
>> Let me know your thoughts.
>> 
>> Best,
>> Stamatis
>> 

Reply via email to