This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push: new 8fe9d94 Travis CI build asf-site 8fe9d94 is described below commit 8fe9d947796418a3e0d2614d733e698b76432ded Author: CI <ci...@hudi.apache.org> AuthorDate: Thu Sep 3 20:05:58 2020 +0000 Travis CI build asf-site --- content/contributing.html | 97 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 92 insertions(+), 5 deletions(-) diff --git a/content/contributing.html b/content/contributing.html index 15ecc845..73fa4bc 100644 --- a/content/contributing.html +++ b/content/contributing.html @@ -201,6 +201,7 @@ <li><a href="#filing-jiras">Filing JIRAs</a></li> <li><a href="#claiming-jiras">Claiming JIRAs</a></li> <li><a href="#contributing-code">Contributing Code</a></li> + <li><a href="#coding-guidelines">Coding guidelines</a></li> <li><a href="#reviewing-coderfcs">Reviewing Code/RFCs</a></li> <li><a href="#suggest-changes">Suggest Changes</a></li> </ul> @@ -361,13 +362,9 @@ so that the community can contribute at large and help implement it much quickly <li>[Optional] Familiarize yourself with internals of Hudi using content on this page, as well as <a href="https://cwiki.apache.org/confluence/display/HUDI">wiki</a></li> <li>Make your code change <ul> - <li>Every source file needs to include the Apache license header. Every new dependency needs to have an -open source license <a href="https://www.apache.org/legal/resolved.html#criteria">compatible</a> with Apache.</li> - <li>If you are re-using code from another apache/open-source project, licensing needs to be compatible and attribution added to <code class="highlighter-rouge">LICENSE</code> file</li> - <li>Please DO NOT copy paste any code from StackOverflow or other online sources, since their license attribution would be unclear. Author them yourself!</li> <li>Get existing tests to pass using <code class="highlighter-rouge">mvn clean install -DskipITs</code></li> <li>Add adequate tests for your new functionality</li> - <li>For involved changes, it’s best to also run the entire integration test suite using <code class="highlighter-rouge">mvn clean install</code></li> + <li>For involved changes, it’s best to test the changes in real production environments and report the results in the PR.</li> <li>For website changes, please build the site locally & test navigation, formatting & links thoroughly</li> <li>If your code change changes some aspect of documentation (e.g new config, default value change), please ensure there is another PR to <a href="https://github.com/apache/hudi/tree/asf-site/README.md">update the docs</a> as well.</li> @@ -387,6 +384,92 @@ where you replace <code class="highlighter-rouge">HUDI-XXX</code> with the appro <li>Finally, once your pull request is merged, make sure to <code class="highlighter-rouge">Close</code> the JIRA.</li> </ul> +<h3 id="coding-guidelines">Coding guidelines</h3> + +<p>Our code can benefit from contributors speaking the same “language” when authoring code. After all, it gets read a lot more than it gets +written. So optimizing for “reads” is a good goal. The list below is a set of guidelines, that contributors strive to upkeep and reflective +of how we want to evolve our code in the future.</p> + +<h4 id="style">Style</h4> + +<ul> + <li><strong>Formatting</strong> We should rely on checkstyle and spotless to auto fix formatting; automate this completely. Where we cannot, + we will err on the side of not taxing contributors with manual effort.</li> + <li><strong>Refactoring</strong> + <ul> + <li>Refactor with purpose; any refactor suggested should be attributable to functionality that now becomes easy to implement.</li> + <li>A class is asking to be refactored, when it has several overloaded responsibilities/have sets of fields/methods which are used more cohesively than others.</li> + <li>Try to name tests using the given-when-then model, that cleans separates preconditions (given), an action (when), and assertions (then).</li> + </ul> + </li> + <li><strong>Naming things</strong> + <ul> + <li>Let’s name uniformly; using the same word to denote the same concept. e.g: bootstrap vs external vs source, when referring to bootstrapped tables. +Maybe they all mean the same, but having one word makes the code lot more easily readable.</li> + <li>Let’s name consistently with Hudi terminology. e.g dataset vs table, base file vs data file.</li> + <li>Class names preferably are nouns (e.g Runner) which reflect their responsibility and methods are verbs (e.g run()).</li> + <li>Avoid filler words, that don’t add value e.g xxxInfo, xxxData, etc.</li> + <li>We name classes in code starting with <code class="highlighter-rouge">Hoodie</code> and not <code class="highlighter-rouge">Hudi</code> and we want to keep it that way for consistency/historical reasons.</li> + </ul> + </li> + <li><strong>Methods</strong> + <ul> + <li>Individual methods should short (~20-30 lines) and have a single purpose; If you feel like it has a secondary purpose, then maybe it needs +to be broken down more.</li> + <li>Lesser the number of arguments, the better;</li> + <li>Place caller methods on top of callee methods, whenever possible.</li> + <li>Avoid “output” arguments e.g passing in a list and filling its values within the method.</li> + <li>Try to limit individual if/else blocks to few lines to aid readability.</li> + <li>Separate logical blocks of code with a newline in between e.g read a file into memory, loop over the lines.</li> + </ul> + </li> + <li><strong>Classes</strong> + <ul> + <li>Like method, each Class should have a single purpose/responsibility.</li> + <li>Try to keep class files to about 200 lines of length, nothing beyond 500.</li> + <li>Avoid stating the obvious in comments; e.g each line does not deserve a comment; Document corner-cases/special perf considerations etc clearly.</li> + <li>Try creating factory methods/builders and interfaces wherever you feel a specific implementation may be changed down the line.</li> + </ul> + </li> +</ul> + +<h4 id="substance">Substance</h4> + +<ul> + <li>Try to avoid large PRs; if unavoidable (many times they are) please separate refactoring with the actual implementation of functionality. +e.g renaming/breaking up a file and then changing code changes, makes the diff very hard to review.</li> + <li><strong>Licensing</strong> + <ul> + <li>Every source file needs to include the Apache license header. Every new dependency needs to have +an open source license <a href="https://www.apache.org/legal/resolved.html#criteria">compatible</a> with Apache.</li> + <li>If you are re-using code from another apache/open-source project, licensing needs to be compatible and attribution added to <code class="highlighter-rouge">LICENSE</code> file</li> + <li>Please DO NOT copy paste any code from StackOverflow or other online sources, since their license attribution would be unclear. Author them yourself!</li> + </ul> + </li> + <li><strong>Code Organization</strong> + <ul> + <li>Anything in <code class="highlighter-rouge">hudi-common</code> cannot depend on a specific engine runtime like Spark.</li> + <li>Any changes to bundles under <code class="highlighter-rouge">packaging</code>, will be reviewed with additional scrutiny to avoid breakages across versions.</li> + </ul> + </li> + <li><strong>Code reuse</strong> + <ul> + <li>Whenever you can, please use/enhance use existing utils classes in code (<code class="highlighter-rouge">CollectionUtils</code>, <code class="highlighter-rouge">ParquetUtils</code>, <code class="highlighter-rouge">HoodieAvroUtils</code>). Search for classes ending in <code class="highlighter-rouge">Utils</code>.</li> + <li>As a complex project, that must integrate with multiple systems, we tend to avoid dependencies like <code class="highlighter-rouge">guava</code>, <code class="highlighter-rouge">apache commons</code> for the sake of easy integration. + Please start a discussion on the mailing list, before attempting to reintroduce them</li> + <li>As a data system, that takes performance seriously, we also write pieces of infrastructure (e.g <code class="highlighter-rouge">ExternalSpillableMap</code>) natively, that are optimized specifically for our scenarios. + Please start with them first, when solving problems.</li> + </ul> + </li> + <li><strong>Breaking changes</strong> + <ul> + <li>Any version changes for dependencies, needs to be ideally vetted across different user environments in the community, to get enough confidence before merging.</li> + <li>Any changes to methods annotated with <code class="highlighter-rouge">PublicAPIMethod</code> or classes annotated with <code class="highlighter-rouge">PublicAPIClass</code> require upfront discussion and potentially an RFC.</li> + <li>Any non-backwards compatible changes similarly need upfront discussion and the functionality needs to implement an upgrade-downgrade path.</li> + </ul> + </li> +</ul> + <h3 id="reviewing-coderfcs">Reviewing Code/RFCs</h3> <ul> @@ -401,6 +484,10 @@ where you replace <code class="highlighter-rouge">HUDI-XXX</code> with the appro <li>Staying humble and eager to learn, goes a long way in ensuring these reviews are smooth.</li> </ul> </li> + <li>Reviewers are expected to uphold the code quality, standards outlined above.</li> + <li>When merging PRs, always make sure you are squashing the commits using the “Squash and Merge” feature in Github</li> + <li>When necessary/appropriate, reviewers could make changes themselves to PR branches, with the intent to get the PR landed sooner. (see <a href="https://cwiki.apache.org/confluence/display/HUDI/Resources#Resources-PushingChangesToPRs">how-to</a>) +Reviewers should seek explicit approval from author, before making large changes to the original PR.</li> </ul> <h3 id="suggest-changes">Suggest Changes</h3>