Hi Stamatis, I am generally positive for every standardization process. My experience thought me that a standardized code is usually better in the long run than any super optimized solution where we specialize every codepath for the locally best optimum.
First question: - Is it possible to check/enforce these guidelines somehow? If we have some tool to enforce this, that would be super helpful. I agree with most of your points. There are a few question marks, but I can live with all of the proposed requirements if that is the consensus: - I think having contributor / reviewer / PR number in the commit message is helpful (at least for me). I use the `git log --pretty=online|grep` quite often, and one usage is when I would like to collect the PR's which were authored or reviewed by a contributor - I am not sure about the 72 character line length - the PRs are most often viewed by tools which do the wrapping based on the available screen size. Wouldn't we want to relay on that? Thanks, Peter > On Sep 28, 2021, at 11:25, Alessandro Solimando > <alessandro.solima...@gmail.com> wrote: > > Hi Stamatis, > thanks for the suggestions, I think they are reasonable, the project would > benefit from their adoption. > > Regarding the removal of contributor/reviewers names from the commit > message favoring the use of git metadata, there has been a similar discussion > in Calcite ML > <https://mail-archives.apache.org/mod_mbox/calcite-dev/202109.mbox/%3ccafqnwdzys+gemeeevhreggjedpmjn5bc4bvecdusgrtcykv...@mail.gmail.com%3e> > which > led to consensus. > > Best regards, > Alessandro > > On Fri, 24 Sept 2021 at 16:40, Stamatis Zampetakis <zabe...@gmail.com> > wrote: > >> Hi all, >> >> I think we all more or less follow some standard pattern when committing in >> Hive but with some small effort we could make things more uniform and >> hopefully better. >> I would like to start a discussion about creating some guidelines, which we >> could put to the wiki or in contributing.md, to improve the quality of our >> history (git log). >> I outline some suggestions below to kick off the discussion. Many things in >> the list are minor (and maybe even personal preferences) but one thing >> which is really missing from the project is B3 especially the *why* part. >> Why is the commit necessary? Why has the change been made?. >> In some cases the why part is also missing from the JIRA making the code >> harder to maintain. >> >> Subject line: >> S1. Start with the Jira id capitalized and followed immediately (no space) >> by double colon (:) >> S2. Leave one space after the Jira id, and start the summary with a capital >> letter >> S3. Keep it concise (ideally less than 72 characters) and provide a useful >> description of the change >> S4. Do not include or end the line with period >> S5. Do not include the pull request id in the summary >> S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added >> a handler …”) or present tense (“Adds a handler …”) >> S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to >> describe the bug (“NullPointerException if user is unknown”) and people >> will correctly surmise that the purpose of your change is to fix the bug. >> S8. Do not add a contributor's name; the author tag is made exactly for >> this and can be explored/parsed much more efficiently by tools/people for >> stats or other purposes >> S9. Do not add reviewers name; information is present in multiple places >> (e.g., committer tag, PR, JIRA) >> >> Message body: (Trivial changes may not require a body) >> B1. Separate subject from body with a blank line >> B2. Wrap the body at 72 characters >> B3. Use the body to explain what and why vs. how >> Example >> "Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and >> Converter to avoid executing the fallback method which in many cases >> returns null and can cause NPE when this value propagates up the call >> stack." >> vs. >> "Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan >> and Converter" >> B4. If multiple authors include them using the standard GitHub marker, >> "Co-authored-by:", followed by the name and email of the author (e.g., >> Co-authored-by: Marton Bod <m...@cloudera.com>) >> B5. If the reviewer is different from committer (or merge via GitHub UI) >> use "Reviewed-by:" followed by the name and email of the reviewer (e.g., >> Reviewed-by: Stamatis Zampetakis <zabe...@gmail.com>) >> B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many >> times as authors/reviewers. >> B7. Include the PR id at the end of the message (e.g., Closes #2514); >> someone can easily navigate back to the PR to check comments, reviewers, >> etc. >> >> A sample commit message following these guidelines is shown below: >> >> commit de7781f29f82083fe01274b4d436b52920a89173 >> Author: Soumyakanti Das <soumyakanti.das...@gmail.com> >> Commit: Stamatis Zampetakis <zabe...@gmail.com> >> >> HIVE-25354: NPE when estimating row count in external JDBC tables >> >> Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan >> and Converter to avoid executing the fallback method which in many >> cases returns null and can cause NPE when this value propagates up the >> call stack. >> >> Co-authored-by: Krisztian Kasa <kk...@cloudera.com> >> >> Reviewed-by: Peter Vary <pv...@cloudera.com> >> Reviewed-by: Zoltan Haindrich <k...@rxd.hu> >> >> Closes #2514 >> >> Let me know your thoughts. >> >> Best, >> Stamatis >>