Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "HowToContribute" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/HowToContribute?action=diff&rev1=118&rev2=119

Comment:
Removing content and leaving link to cwiki where new content resides.

  = How to Contribute to Hadoop =
- This page describes the mechanics of ''how'' to contribute software to Apache 
Hadoop.  For ideas about ''what'' you might contribute, please see the 
ProjectSuggestions page.
+ Content moved to Confluence - 
https://cwiki.apache.org/confluence/display/HADOOP/HowToContribute
  
- <<TableOfContents(5)>>
+ Email [email protected] if you need write access to the cwiki.
  
- == Dev Environment Setup ==
- Here are some things you will need to build and test Hadoop. Be prepared to invest some time to set up a working Hadoop dev environment. Try getting the project to build and test locally first before you start writing code.
- 
- === Get the source code ===
- First of all, you need the Hadoop source code. The official location for Hadoop is the Apache Git repository. See GitAndHadoop.
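- 
- For example, one common way to clone the ASF repository (the https URL here matches the BUILDING.txt link below; see GitAndHadoop for mirrors and alternatives):
- {{{
- git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
- cd hadoop
- }}}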
- 
- === Read BUILDING.txt ===
- Once you have the source code, we strongly recommend reading BUILDING.txt located in the root of the source tree. It has up-to-date information on how to build Hadoop on various platforms, along with workarounds for platform-specific quirks. The latest [[https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=blob;f=BUILDING.txt|BUILDING.txt]] for the current trunk can also be viewed on the web.
- 
- 
- === Integrated Development Environment (IDE) ===
- You are free to use whatever IDE you prefer or your favorite text editor. 
Note that:
-  * Building and testing is often done on the command line or at least via the 
Maven support in the IDEs.
-  * Set up the IDE to follow the source layout rules of the project.
-  * Disable any "reformat" and "strip trailing spaces" features, as they can create extra noise when reviewing patches.
- 
- === Build Tools ===
-  * A Java Development Kit. The Hadoop developers recommend 
[[http://java.com/|Oracle Java 8]]. You may also use 
[[http://openjdk.java.net/|OpenJDK]].
-  * Google Protocol Buffers. Check out the ProtocolBuffers guide for help 
installing protobuf.
-  * [[http://maven.apache.org/|Apache Maven]] version 3 or later (for Hadoop 
0.23+)
-  * The Java API javadocs.
- Ensure these are installed by executing {{{mvn}}}, {{{git}}}, {{{javac}}} and {{{protoc}}} from the command line, as in the check below.
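- 
- A quick sanity check that the tools are on your {{{PATH}}} (the exact version output will vary with your installation):
- {{{
- mvn -version
- git --version
- javac -version
- protoc --version
- }}}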
- 
- As the Hadoop builds use the external Maven repository to download artifacts, 
Maven needs to be set up with the proxy settings needed to make external HTTP 
requests. The first build of every Hadoop project needs internet connectivity 
to download Maven dependencies.
-  1. Be online for that first build, on a good network
-  1. To set the Maven proxy settings, see http://maven.apache.org/guides/mini/guide-proxies.html
-  1. Because Maven doesn't pass proxy settings down to the Ant tasks it runs ([[https://issues.apache.org/jira/browse/HDFS-2381|HDFS-2381]]), some parts of the Hadoop build may fail. The fix is to pass the Ant proxy settings down in the build. Unix: {{{mvn $ANT_OPTS}}}; Windows: {{{mvn %ANT_OPTS%}}}. See the sketch after this list.
-  1. Tomcat is always downloaded, even when building offline.  Setting 
{{{-Dtomcat.download.url}}} to a local copy and {{{-Dtomcat.version}}} to the 
version pointed to by the URL will avoid that download.
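- 
- A minimal sketch of the {{{ANT_OPTS}}} workaround from the third step, on Unix; the proxy host and port are placeholders for your own values:
- {{{
- export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
- mvn clean install $ANT_OPTS
- }}}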
- 
- 
- === Native libraries ===
- On Linux, you need the tools to create the native libraries: LZO headers, zlib headers, gcc, OpenSSL headers, cmake, protobuf dev tools, libtool, and the GNU autotools (automake, autoconf, etc.).
- 
- For RHEL (and hence also CentOS):
- {{{
- yum -y install  lzo-devel  zlib-devel  gcc gcc-c++ autoconf automake libtool 
openssl-devel fuse-devel cmake
- }}}
- 
- For Debian and Ubuntu:
- {{{
- apt-get -y install maven build-essential autoconf automake libtool cmake 
zlib1g-dev pkg-config libssl-dev libfuse-dev
- }}}
- 
- Native libraries are mandatory for Windows. For instructions see 
Hadoop2OnWindows.
- 
- === Hardware Setup ===
-  * Lots of RAM, especially if you are using a modern IDE. ECC RAM is 
recommended in large-RAM systems.
-  * Disk Space. Always handy.
-  * Network Connectivity. Hadoop tests are not guaranteed to all work if a machine does not have a network connection, and especially if it does not know its own name.
-  * Keep your computer's clock up to date via an NTP server, and set up the 
time zone correctly. This is good for avoiding change-log confusion.
- 
- == Making Changes ==
- Before you start, send a message to the [[http://hadoop.apache.org/core/mailing_lists.html|Hadoop developer mailing list]], or file a bug report in [[Jira]].  Describe your proposed changes and check that they fit in with what others are doing and have planned for the project.  Be patient; it may take folks a while to understand your requirements.  If you want to start with pre-existing issues, look for Jiras labeled `newbie`.  You can find them using [[https://issues.apache.org/jira/issues/?filter=12331506|this filter]].
- 
- Modify the source code and add some (very) nice features using your favorite 
IDE.<<BR>>
- 
- But take care of the following points:
- 
-  * All public classes and methods should have informative 
[[http://java.sun.com/j2se/javadoc/writingdoccomments/|Javadoc comments]].
-   * Do not use @author tags.
-  * Code must be formatted according to 
[[http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html|Sun's
 conventions]], with one exception:
-   * Indent two spaces per level, not four.
-  * Contributions must pass existing unit tests.
-   * New unit tests should be provided to demonstrate bugs and fixes.  [[http://www.junit.org|JUnit]] is our test framework:
-   * You must implement a class that uses {{{@Test}}} annotations for all test 
methods. Please note, 
[[http://wiki.apache.org/hadoop/HowToDevelopUnitTests|Hadoop uses JUnit v4]].
-   * Define methods within your class whose names begin with {{{test}}}, and call JUnit's many assert methods to verify conditions; these methods will be executed when you run {{{mvn test}}}. Please add meaningful messages to the assert statements to facilitate diagnostics.
-   * By default, do not let tests write any temporary files to {{{/tmp}}}.  
Instead, the tests should write to the location specified by the 
{{{test.build.data}}} system property.
-   * If an HDFS cluster or a MapReduce/YARN cluster is needed by your test, please use {{{org.apache.hadoop.dfs.MiniDFSCluster}}} and {{{org.apache.hadoop.mapred.MiniMRCluster}}} (or {{{org.apache.hadoop.yarn.server.MiniYARNCluster}}}), respectively.  {{{TestMiniMRLocalFS}}} is an example of a test that uses {{{MiniMRCluster}}}.
-   * Place your class in the {{{src/test}}} tree.
-   * {{{TestFileSystem.java}}} and {{{TestMapRed.java}}} are examples of 
standalone MapReduce-based tests.
-   * {{{TestPath.java}}} is an example of a non-MapReduce-based test.
-   * You can run all the project unit tests with {{{mvn test}}}, or a specific unit test with {{{mvn -Dtest=<class name without package prefix> test}}}. Run these commands from the {{{hadoop-trunk}}} directory; see the example after this list.
-  * If you modify the Unix shell scripts, see the 
UnixShellScriptProgrammingGuide.
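- 
- For example, to run just the {{{TestPath}}} test mentioned above:
- {{{
- cd hadoop-trunk
- mvn -Dtest=TestPath test
- }}}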
- 
- === Generating a patch ===
- ==== Choosing a target branch ====
- Except for the following situations, it is recommended that all patches be based off trunk to take advantage of the Jenkins pre-commit build.
-  1. The patch is targeting a release branch that is not based off trunk, e.g. branch-1, branch-0.23, etc.
-  1. The change is targeting a specific feature branch and is not yet ready 
for merging into trunk.
- 
- If you are unsure of the target branch, then '''trunk''' is usually the best choice. Committers will usually merge the patch to downstream branches, e.g. branch-2, as appropriate.
- 
- ==== Unit Tests ====
- Please make sure that all unit tests succeed before constructing your patch 
and that no new javac compiler warnings are introduced by your patch.
- 
- For building Hadoop with Maven, use the following to run all unit tests and 
build a distribution. The {{{-Ptest-patch}}} profile will check that no new 
compiler warnings have been introduced by your patch.
- 
- {{{
- mvn clean install -Pdist -Dtar -Ptest-patch
- }}}
- 
- Any test failures can be found in the {{{target/surefire-reports}}} directory 
of the relevant module. You can also run this command in one of the 
{{{hadoop-common}}}, {{{hadoop-hdfs}}}, or {{{hadoop-mapreduce}}} directories 
to just test a particular subproject.
- 
- See HowToDevelopUnitTests for unit test development guidelines.
- 
- ==== Javadoc ====
- Please also check the javadoc.
- 
- {{{
- mvn javadoc:javadoc
- firefox target/site/api/index.html
- }}}
- Examine all public classes you've changed to see that documentation is 
complete, informative, and properly formatted.  Your patch must not generate 
any javadoc warnings.
- 
- Jenkins includes a javadoc run on Java 8, which is stricter than Java 7: it will fail if there are unbalanced HTML tags or {{{<p/>}}} clauses (use {{{<p>}}} instead).
- 
- If Jenkins rejects a patch due to Java 8 javadoc failures, it is considered 
an automatic veto for the patch.
- 
- === Provide a patch ===
- 
- There are two ways to provide a patch:
-  * Create and attach a diff in ASF JIRA
-  * Create a pull request in GitHub
- 
- ==== Creating a patch ====
- 
- Check to see what files you have modified with:
- 
- {{{
- git status
- }}}
- Add any new files with:
- 
- {{{
- git add src/.../MyNewClass.java
- git add src/.../TestMyNewClass.java
- }}}
- In order to create a patch, type (from the base directory of hadoop):
- 
- {{{
- git diff trunk...HEAD > HADOOP-1234.patch
- }}}
- This will capture all modifications made to the Hadoop sources on your local disk and save them in the ''HADOOP-1234.patch'' file.  Read the patch file. Make sure it includes ONLY the modifications required to fix a single issue.
- 
- Please do not:
- 
-  * reformat code unrelated to the bug being fixed: formatting changes should 
be separate patches/commits.
-  * comment out code that is now obsolete: just remove it.
-  * insert comments around each change, marking the change: folks can use git 
to figure out what's changed and by whom.
-  * make things public which are not required by end users.
- 
- Please do:
- 
-  * try to adhere to the coding style of files you edit;
-  * comment code whose function or rationale is not obvious;
-  * update documentation (e.g., ''package.html'' files, this wiki, etc.)
- 
- If you need to rename files in your patch:
- 
-  1. Write a shell script that uses 'git mv' to rename the original files.
-  1. Edit files as needed (e.g., to change package names).
-  1. Create a patch file with 'git diff --no-prefix trunk'.
-  1. Submit both the shell script and the patch file.
- 
- This way other developers can preview your change by running the script and 
then applying the patch.
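- 
- A minimal sketch of such a script (the file paths here are hypothetical):
- {{{
- #!/bin/sh
- # rename-files.sh: run this before applying the accompanying patch
- git mv src/java/org/apache/hadoop/util/OldName.java src/java/org/apache/hadoop/util/NewName.java
- }}}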
- 
- ===== Naming your patch =====
- 
- Patches for trunk should be named according to the Jira, with a version 
number: '''<jiraName>.<versionNum>.patch''', e.g. HADOOP-1234.001.patch, 
HDFS-4321.002.patch.
- 
- Patches for a non-trunk branch should be named 
'''<jiraName>-<branchName>.<versionNum>.patch''', e.g. 
HDFS-1234-branch-2.003.patch. The branch name suffix should be the exact name 
of a git branch, such as "branch-2". Jenkins will check the name of the patch 
and detect the appropriate branch for testing.
- 
- Please note that the Jenkins precommit build will not run against branches that use Ant.
- 
- It's also OK to upload a new patch to Jira with the same name as an existing patch. If you select the "Activity>All" tab, then the different versions are linked in the comment stream, providing context. However, many reviewers find it helpful to include a version number in the patch name (a three-digit version number is recommended), '''so including a version number is the preferred style'''.
- 
- NOTE: Our Jenkins configuration uses [[https://yetus.apache.org|Apache 
Yetus]].  More advanced patch file names are documented on their 
[[https://yetus.apache.org/documentation/in-progress/precommit-patchnames/|patch
 names page]].
- 
- ==== Creating a GitHub pull request ====
- 
- Create a pull request in https://github.com/apache/hadoop.
- 
- Set the pull request title so that it starts with the corresponding JIRA issue number (e.g. HADOOP-XXXXX. Fix a typo in YYY.). The Jenkins precommit job will find the corresponding GitHub pull request and apply the diff automatically. If there is a corresponding pull request, you don't need to attach a patch to the issue, because the precommit job always runs against the pull request instead of the attached patch.
- 
- If there is no corresponding issue, please create an issue in ASF JIRA before creating a pull request.
- 
- === Testing your patch ===
- Before submitting your patch, you are encouraged to run the same tools that 
the automated Jenkins patch test system will run on your patch.  This enables 
you to fix problems with your patch before you submit it. The 
{{{dev-support/bin/test-patch}}} script in the trunk directory will run your 
patch through the same checks that Jenkins currently does ''except'' for 
executing the unit tests. (See TestPatchTips for some tricks.)
- 
- Run this command from a clean workspace (i.e. {{{git status}}} shows no modifications or additions) as follows:
- 
- {{{
- dev-support/bin/test-patch [options] patch-file | defect-number
- }}}
- 
- At the end, you should get a message on your console that is similar to the comment added to Jira by Jenkins's automated patch test system, listing +1 and -1 results. Generally you should expect a +1 overall in order to have your patch committed; exceptions will be made for false positives that are unrelated to your patch.  The scratch directory (which defaults to the value of {{{${user.home}/tmp}}}) will contain some output files that are useful for determining the cause of any issues found in the patch.
- 
- Some things to note:
- 
-  * the optional cmd parameters will default to the ones in your {{{PATH}}} 
environment variable
-  * the {{{grep}}} command must support the -o flag (both GNU grep and BSD grep support it)
-  * the {{{patch}}} command must support the -E flag
- 
- Run the same command with no arguments to see the usage options.
- 
- === Applying a patch ===
- To apply a patch that you either generated or found on JIRA, you can issue:
- 
- {{{
- git apply -p0 --verbose cool_patch.patch
- }}}
- 
- If you are an Eclipse user, you can apply a patch by:
- 
-  1. Right click project name in Package Explorer
-  1. Team -> Apply Patch
- 
- === Changes that span projects ===
- You may find that you need to modify both the common project and MapReduce or HDFS. Or perhaps you have changed something in common, and need to verify that these changes do not break the existing unit tests for HDFS and MapReduce. Hadoop's build system integrates with a local Maven repository to support cross-project development. Use this general workflow for your development:
- 
-  * Make your changes in common
-  * Run any unit tests there (e.g. 'mvn test')
-  * ''Publish'' your new common jar to your local mvn repository:<<BR>>
-  {{{
- hadoop-common$ mvn clean install -DskipTests
- }}}
-  . A word of caution: `mvn install` pushes the artifacts into your local Maven repository, which is shared by all your projects.
-  * Switch to the dependent project and make any changes there (e.g., that 
rely on a new API you introduced in hadoop-common).
-  * Finally, create separate patches for your common and hdfs/mapred changes, 
and file them as separate JIRA issues associated with the appropriate projects.
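- 
- Putting the workflow together, a typical session might look like this (the directory names are illustrative and follow the prompt style of the block above):
- {{{
- hadoop-common$ mvn test
- hadoop-common$ mvn clean install -DskipTests
- hadoop-common$ cd ../hadoop-hdfs
- hadoop-hdfs$ mvn test
- }}}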
- 
- == Contributing your work ==
-   1. Patches should be ''attached'' to an issue report in [[http://issues.apache.org/jira/browse/HADOOP|Jira]] via the '''Attach File''' link on the issue's Jira. Please add a comment that asks for a code review following our [[CodeReviewChecklist|code review checklist]]. Please note that the attachment should be licensed to the ASF for inclusion in ASF works (as per the [[http://www.apache.org/licenses/LICENSE-2.0|Apache License]] §5).
-   1. When you believe that your patch is ready to be committed, select the 
'''Submit Patch''' link on the issue's Jira.  Submitted patches will be 
automatically tested against "trunk" by 
[[https://builds.apache.org/view/All/|Jenkins]], the project's continuous 
integration engine.  Upon test completion, Jenkins will add a success ("+1") 
message or failure ("-1") to your issue report in Jira.  If your issue contains 
multiple patch versions, Jenkins tests the last patch uploaded.  It is 
preferable to upload the trunk version last.
-   1. Folks should run {{{mvn clean install javadoc:javadoc 
checkstyle:checkstyle}}} before selecting '''Submit Patch'''.
-     1. Tests must all pass.
-     1. Javadoc should report '''no''' warnings or errors.
-     1. The Javadoc on java 8 must not fail.
-     1. Checkstyle's error count should not exceed that listed at 
lastSuccessfulBuild/artifact/trunk/build/test/checkstyle-errors.html
-   . Jenkins's tests are meant to double-check things, and not be used as a primary patch tester, which would create too much noise on the mailing list and in Jira. Submitting patches that fail Jenkins testing is frowned on (unless the failure is not actually due to the patch).
-   1. If your patch involves performance optimizations, they should be 
validated by benchmarks that demonstrate an improvement.
-   1. If your patch creates an incompatibility with the latest major release, then you must set the '''Incompatible change''' flag on the issue's Jira ''and'' fill in the '''Release Note''' field with an explanation of the impact of the incompatibility and the necessary steps users must take.
-   1. If your patch implements a major feature or improvement, then you must fill in the '''Release Note''' field on the issue's Jira with an explanation of the feature that will be comprehensible to the end user.
- 
- Once a "+1" comment is received from the automated patch testing system and a 
code reviewer has set the '''Reviewed''' flag on the issue's Jira, a committer 
should then evaluate it within a few days and either: commit it; or reject it 
with an explanation.
- 
- Please be patient.  Committers are busy people too.  If no one responds to your patch after a few days, please post a friendly reminder.  Please incorporate others' suggestions into your patch if you think they're reasonable.  Finally, remember that even a patch that is not committed is useful to the community.
- 
- Should your patch receive a "-1" from the Jenkins testing, select the '''Cancel Patch''' link on the issue's Jira, upload a new patch with the necessary fixes, and then select the '''Submit Patch''' link again.
- 
- 
- === Submitting patches against object stores such as Amazon S3, OpenStack 
Swift and Microsoft Azure ===
- 
- The modules {{{hadoop-aws}}}, {{{hadoop-openstack}}} and {{{hadoop-azure}}} 
contain filesystem clients which work with Amazon S3, OpenStack Swift and 
Microsoft Azure storage respectively.
- 
- The test suites for these modules are not executed on Jenkins because they require credentials in order to run.
- 
- '''Having Jenkins +1 any patch against an object store does not mean the patch works: it must be manually tested by the submitter, the committer and any other reviewers who can do so.'''
- 
- If a Yetus patch run says +1 for an object store patch, all it means is "the 
compilation, javadoc and style checks passed". It does not mean the patch 
works, or that it is ready to be committed.
- 
- The details of how to test for these object stores are covered 
[[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/testing.html|in
 the filesystem specification documentation]].
- 
- When submitting a patch, make sure the patch does not include any of your 
secret credentials. The Hadoop {{{.gitignore}}} file is set to ignore specific 
XML test resources for this purpose.
- 
- {{{
- 
hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml
- hadoop-tools/hadoop-openstack/src/test/resources/contract-test-options.xml
- hadoop-tools/hadoop-aws/src/test/resources/auth-keys.xml
- hadoop-tools/hadoop-aws/src/test/resources/contract-test-options.xml
- hadoop-tools/hadoop-azure/src/test/resources/azure-auth-keys.xml
- }}}
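- 
- One quick, if rough, way to double-check is to grep the patch for those resource files before uploading (the patch file name here is illustrative); no output means none of them are touched:
- {{{
- grep -E 'auth-keys|contract-test-options' HADOOP-1234.patch
- }}}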
- 
- Please state which infrastructures you have tested against, including which regions you tested in. If you have not tested the patch yourself, do not expect anyone to look at the patch.
- 
- We welcome anyone who can test these patches: please do so and, again, declare what you have tested against. That includes in-house/proprietary implementations of the APIs as well as public infrastructures.
- 
- == Jira Guidelines ==
- Please comment on issues in Jira, making your concerns known.  Please also vote for issues that are a high priority for you.
- 
- Please refrain from editing descriptions and comments if possible, as edits 
spam the mailing list and clutter Jira's "All" display, which is otherwise very 
useful.  Instead, preview descriptions and comments using the preview button 
(on the right) before posting them.  Keep descriptions brief and save more 
elaborate proposals for comments, since descriptions are included in Jira's 
automatically sent messages.  If you change your mind, note this in a new 
comment, rather than editing an older comment.  The issue should preserve this 
history of the discussion.
- 
- Additionally, do not set the Fix Version. Committers use this field to 
determine which branches have had patches committed.  Instead, use the Affects 
and Target Versions to notify others of the branches that should be considered.
- 
- == Stay involved ==
- Contributors should join the 
[[http://hadoop.apache.org/core/mailing_lists.html|Hadoop mailing lists]].  In 
particular, the commit list (to see changes as they are made), the dev list (to 
join discussions of changes) and the user list (to help others).
- 
- == See Also ==
-  * [[http://www.apache.org/dev/contributors.html|Apache contributor 
documentation]]
-  * [[http://www.apache.org/foundation/voting.html|Apache voting 
documentation]]
- 

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
