Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "GithubIntegration" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/GithubIntegration?action=diff&rev1=1&rev2=2

Comment:
review & cleanup of doc, add best practises section, and use 
'feature/hadoop-xxxx' as path for features

  = Github Setup and Pull Requests (PRs) =
  
- There are several ways to setup Git for committers and contributors. 
Contributors can safely setup Git any way they choose but committers should 
take extra care since they can push new commits to the trunk at Apache and 
various policies there make backing out mistakes problematic. To keep the 
commit history clean take note of the use of --squash below when merging into 
apache/trunk.
+ There are several ways to setup Git for committers and contributors. 
Contributors can safely setup Git any way they choose but committers should 
take extra care since they can push new commits to the trunk at Apache and 
various policies there make backing out mistakes problematic. To keep the 
commit history clean take note of the use of `--squash` below when merging into 
`apache/trunk`.
  
  == Git setup for Committers ==
  
- This describes setup for one local repo and two remotes. It allows you to 
push the code on your machine to either your Github repo or to 
git-wip-us.apache.org. You will want to fork github's apache/hadoop to your own 
account on github, this will enable Pull Requests of your own. Cloning this 
fork locally will set up "origin" to point to your remote fork on github as the 
default remote. So if you perform "git push origin trunk" it will go to github.
+ This describes setup for one local repo and two remotes. It allows you to 
push the code on your machine to either your Github repo or to 
git-wip-us.apache.org. You will want to fork github's apache/hadoop to your own 
account on github, this will enable Pull Requests of your own. Cloning this 
fork locally will set up "origin" to point to your remote fork on github as the 
default remote. So if you perform `git push origin trunk` it will go to github.
+ 
  To attach to the apache git repo do the following:
  
  {{{
@@ -28, +29 @@

  apache    https://git-wip-us.apache.org/repos/asf/hadoop.git (push)
  }}}
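
The remote layout above amounts to one local clone with two remotes. The commands below are a dry-run you can execute safely against throwaway local repos; in real use `origin` is your github fork and `apache` is the git-wip-us.apache.org URL shown above.

```shell
# Dry-run of the two-remote layout. In real use "origin" is your github fork
# (e.g. https://github.com/<you>/hadoop.git) and "apache" is
# https://git-wip-us.apache.org/repos/asf/hadoop.git; here both point at
# throwaway local repos so the commands are safe to execute.
upstream=$(mktemp -d)
git init --bare -q "$upstream"
work=$(mktemp -d)
git clone -q "$upstream" "$work/hadoop"
cd "$work/hadoop"
git remote add apache "$upstream"   # the second remote, named "apache"
git remote -v                       # lists origin and apache, fetch and push
```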
  
- Now if you want to experiment with a branch everything, by default, points to 
your github account because 'origin' is default. You can work as normal using 
only github until you are ready to merge with the apache remote. Some 
conventions will integrate with Apache Jira ticket numbers.
+ Now if you want to experiment with a branch, everything, by default, points to 
your github account because `origin` is the default remote. You can work as 
normal using only github until you are ready to merge with the apache remote. 
Some conventions will integrate with Apache Jira ticket numbers.
  
  {{{
- git checkout -b hadoop-xxxx #xxxx typically is a Jira ticket number
+ git checkout -b feature/hadoop-xxxx #xxxx typically is a Jira ticket number
  #do some work on the branch
  git commit -a -m "doing some work"
- git push origin hadoop-xxxx # notice pushing to **origin** not **apache**
+ git push origin feature/hadoop-xxxx # notice pushing to **origin** not 
**apache**
  }}}
  
  Once you are ready to commit to the apache remote you can merge and push 
directly, or better yet, create a PR.
+ 
+ We recommend creating new branches under `feature/` to help group ongoing 
work, especially now that, as of November 2015, forced updates are disabled on 
ASF branches. We hope to reinstate that ability on feature branches to aid 
development.
  
  == How to create a PR (committers) ==
  
  Push your branch to Github:
  
  {{{
- git checkout hadoop-xxxx
+ git checkout feature/hadoop-xxxx
  git rebase apache/trunk # to make it apply to the current trunk
- git push origin hadoop-xxxx
+ git push origin feature/hadoop-xxxx
  }}}
  
- Go to your hadoop-xxxx branch on Github. Since you forked it from Github's 
apache/hadoop it will default any PR to go to apache/trunk.
+  1. Go to your `feature/hadoop-xxxx` branch on Github. Since you forked it 
from Github's `apache/hadoop` it will default any PR to go to `apache/trunk`.
- Click the green "Compare, review, and create pull request" button.
+  1. Click the green "Compare, review, and create pull request" button.
- You can edit the to and from for the PR if it isn't correct. The "base fork" 
should be apache/hadoop unless you are collaborating separately with one of the 
committers on the list. The "base" will be trunk. Don't submit a PR to one of 
the other branches unless you know what you are doing. The "head fork" will be 
your forked repo and the "compare" will be your hadoop-xxxx branch.
+  1. You can edit the "to" and "from" for the PR if they aren't correct. The 
"base fork" should be `apache/hadoop` unless you are collaborating separately 
with one of the committers on the list. The "base" will be trunk. Don't submit 
a PR to one of the other branches unless you know what you are doing. The "head 
fork" will be your forked repo and the "compare" will be your 
`feature/hadoop-xxxx` branch.
- Click the "Create pull request" button and name the request "HADOOP-XXXX" all 
caps. This will connect the comments of the PR to the mailing list and Jira 
comments.
+  1. Click the "Create pull request" button and name the request "HADOOP-XXXX" 
all caps. This will connect the comments of the PR to the mailing list and Jira 
comments.
- From now on the PR lives on github's apache/hadoop. You use the commenting UI 
there.
+ From now on the PR lives on github's `apache/hadoop` repository. You use the 
commenting UI there.
+ 
- If you are looking for a review or sharing with someone else say so in the 
comments but don't worry about automated merging of your PR--you will have to 
do that later. The PR is tied to your branch so you can respond to comments, 
make fixes, and commit them from your local repo. They will appear on the PR 
page and be mirrored to Jira and the mailing list.
+ If you are looking for a review or want to share with someone else, say so in 
the comments, but don't worry about automated merging of your PR; you will have 
to do that later. The PR is tied to your branch, so you can respond to 
comments, make fixes, and commit them from your local repo. They will appear on 
the PR page and be mirrored to Jira and the mailing list.
  When you are satisfied and want to push it to Apache's remote repo, proceed 
with "Merging a PR" below.
  
  == How to create a PR (contributors) ==
  
  Create pull requests: 
[[https://help.github.com/articles/creating-a-pull-request|GitHub PR docs]].
+ 
  Pull requests are made to the apache/hadoop repository on Github. In the 
Github UI you should pick the trunk branch to target the PR, as described for 
committers. The PR will be reviewed and commented on, so the merge is not 
automatic. This can also be used for discussing a contribution in progress.
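
The contributor flow can be sketched end-to-end. The commands below are a dry-run against a throwaway local repo; in real use `origin` is your github fork of apache/hadoop, and the branch and issue number are illustrative stand-ins.

```shell
# Dry-run of the contributor flow against a throwaway local "fork"; in real
# use, origin is your github fork and the PR is opened in the github UI
# against apache/hadoop trunk.
fork=$(mktemp -d)
git init --bare -q "$fork"
dir=$(mktemp -d)
git clone -q "$fork" "$dir/hadoop"
cd "$dir/hadoop"
git checkout -b feature/hadoop-9999       # one branch per Jira issue (number illustrative)
echo "some change" > change.txt
git add change.txt
git -c user.name=you -c user.email=you@example.com \
    commit -q -m "HADOOP-9999. An example change"
git push -q origin feature/hadoop-9999    # push, then open the PR from this branch
```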
  
  == Merging a PR (yours or contributors) ==
@@ -75, +80 @@

  git pull --squash https://github.com/cuser/hadoop cbranch  # merge to trunk
  }}}
  
- --squash ensures all PR history is squashed into single commit, and allows 
committer to use his/her own message. Read git help for merge or pull for more 
information about --squash option. In this example we assume that the 
contributor's Github handle is "cuser" and the PR branch name is "cbranch". 
Next, resolve conflicts, if any, or ask a contributor to rebase on top of 
trunk, if PR went out of sync.
+ The `--squash` option ensures all the PR history is squashed into a single 
commit, and allows the committer to use their own commit message. Read the git 
help for merge or pull for more information about the `--squash` option. In 
this example we assume that the contributor's Github handle is "cuser" and the 
PR branch name is "cbranch". Next, resolve conflicts, if any, or ask the 
contributor to rebase on top of trunk if the PR has gone out of sync.
+ 
  If you are ready to merge your own (committer's) PR you probably only need to 
merge (not pull), since you have a local copy that you've been working on. This 
is the branch that you used to create the PR.
  
  {{{
  git checkout trunk      # switch to local trunk branch
  git pull apache trunk   # fast-forward to current remote HEAD
- git merge --squash hadoop-xxxx
+ git merge --squash feature/hadoop-xxxx
  }}}
  
- Remember to run regular patch checks, build with tests enabled, and change 
CHANGELOG.
+ Remember to run regular patch checks, build with tests enabled, and change 
CHANGES.TXT for the appropriate part of the project.
+ 
  If everything is fine, you can now commit the squashed request along the 
lines of
+ {{{
  git commit -a -m "HADOOP-XXXX description (cuser via your-apache-id) closes 
apache/hadoop#ZZ"
+ }}}
- HADOOP-XXXX is all caps and where ZZ is the pull request number on 
apache/hadoop repository. Including "closes apache/hadoop#ZZ" will close the PR 
automatically. More information is found at 
[[https://help.github.com/articles/closing-issues-via-commit-messages|GitHub PR 
closing docs]].
+ HADOOP-XXXX is in all caps, and ZZ is the pull request number on the 
apache/hadoop repository. Including `closes apache/hadoop#ZZ` will close the PR 
automatically. More information is found at 
[[https://help.github.com/articles/closing-issues-via-commit-messages|GitHub PR 
closing docs]].
- Next, push to git-wip-us.a.o:
+ Next, push to git-wip-us.apache.org:
+ 
+ {{{
  git push apache trunk
+ }}}
+ 
  (this will require Apache handle credentials).
+ 
- The PR, once pushed, will get mirrored to github. To update your github 
version push there too:
+ The PR, once pushed, will get mirrored to github. To update your personal 
github version push there too:
  
  {{{
  git push origin trunk
@@ -108, +122 @@

  git push apache trunk
  }}}
  
- that should close PR ZZ on github mirror without merging and any code 
modifications in the master repository.
+ That will close PR ZZ on the github mirror without merging it or making any 
code modifications in the master repository.
  
  == Apache/github integration features ==
  
- Read 
[[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and|infra
 blog]]. Comments and PRs with Hadoop issue handles should post to mailing 
lists and Jira. Hadoop issue handles must in the form HADOOP-YYYYY (all 
capitals). Usually it makes sense to file a jira issue first, and then create a 
PR with description
+ Read the 
[[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and|infra
 blog]]. Comments and PRs with Hadoop issue handles should post to the mailing 
lists and Jira. Hadoop issue handles must be in the form `HADOOP-YYYY` (all 
capitals). Usually it makes sense to file a JIRA issue first, and then create a 
PR with the description
+ {{{
  HADOOP-YYYY: <jira-issue-description>
- In this case all subsequent comments will automatically be copied to jira 
without having to mention jira issue explicitly in each comment of the PR.
+ }}}
  
+ In this case all subsequent comments will automatically be copied to JIRA 
without having to mention the JIRA issue explicitly in each comment of the PR.
+ 
+ == Best Practises ==
+ 
+ === Avoiding accidentally committing private branches to the ASF repo ===
+ 
+ It's dangerously easy, especially when using IDEs, to accidentally commit 
changes to the ASF repo, be it directly to `trunk`, `branch-2` or another 
standard branch on which you are developing, or to a private branch you had 
intended to keep on github (or in a private repo).
+ 
+ Committers can avoid this by setting up the directory in which they develop 
code with read-only access to the ASF repository's github mirror, without the 
apache remote added. A separate directory should be set up with write access to 
the ASF repository as well as read access to your other repositories. Merging 
operations and pushes back to the ASF repo are done from this directory, so 
they are isolated from all local development.
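
One way to realise this split is sketched below, with throwaway local repos standing in for the github mirror and the writable git-wip-us.apache.org remote.

```shell
# Sketch: two clones, only one of which knows about the writable ASF remote.
mirror=$(mktemp -d); git init --bare -q "$mirror"   # stands in for the github mirror
asf=$(mktemp -d);    git init --bare -q "$asf"      # stands in for git-wip-us.apache.org
root=$(mktemp -d); cd "$root"
git clone -q "$mirror" hadoop-dev      # day-to-day development: no ASF remote at all
git clone -q "$mirror" hadoop-commit   # used only for merges and pushes to the ASF
git -C hadoop-commit remote add apache "$asf"
git -C hadoop-commit remote -v         # only this tree can push to "apache"
```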
+ 
+ If you accidentally commit a patch to an ASF branch, do not attempt to roll 
back the branch and force-push a new update. Simply commit and push out a new 
patch reverting the change.
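
The reverting patch is an ordinary `git revert` followed by an ordinary push; in real use the push target would be `apache trunk`. Demonstrated on a throwaway repo:

```shell
# Demonstrates revert-not-rollback: history only moves forward.
dir=$(mktemp -d); cd "$dir"; git init -q
git -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m "good commit"
echo oops > mistake.txt
git add mistake.txt
git -c user.name=a -c user.email=a@example.com commit -q -m "accidental commit"
git -c user.name=a -c user.email=a@example.com revert --no-edit HEAD  # new commit undoing it
test ! -e mistake.txt   # the change is gone, but no history was rewritten
```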
+ 
+ If you do accidentally commit a branch to the ASF repo, the infrastructure 
team can delete it —but they cannot stop it propagating to github and 
potentially being visible. Try not to do that.
+ 
+ === Avoiding accidentally committing private keys to Amazon AWS, Microsoft 
Azure or other cloud infrastructures ===
+ 
+ All the cloud integration projects under `hadoop-tools` expect a resource 
file, `resources/auth-keys.xml` to contain the credentials for authenticating 
with cloud infrastructures. These files are explicitly excluded from git 
through entries in `.gitignore`. To avoid running up large bills and/or 
exposing private data, it is critical to keep any of your credentials secret.
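
You can verify the exclusion is doing its job before committing. A sketch on a throwaway repo follows; the ignore pattern and path are illustrative, so check the actual `.gitignore` of the module you are working in.

```shell
# Demonstrates the .gitignore safety net on a throwaway repo; the file name
# matches the one the cloud modules expect, the pattern is illustrative.
dir=$(mktemp -d); cd "$dir"; git init -q
echo "auth-keys.xml" > .gitignore
mkdir -p src/test/resources
echo "secret credentials" > src/test/resources/auth-keys.xml
git check-ignore src/test/resources/auth-keys.xml  # prints the path: it is ignored
git add .                                          # stages .gitignore only
git status --porcelain                             # auth-keys.xml does not appear
```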
+ 
+ For maximum security here, clone your hadoop repository into a separate 
directory for cloud tests, one with read-only access. Create the 
`auth-keys.xml` files there. This guarantees that you cannot commit the 
credentials, albeit with a somewhat more complex workflow, as patches must be 
pushed to a git repository before being pulled into the cloud-enabled directory 
and tested there.
+ 
+ Accidentally committing secret credentials 
[[http://www.devfactor.net/2014/12/30/2375-amazon-mistake/|can be very 
expensive]]. You will not only need to revoke your keys, you will need to kill 
all bitcoin-mining machines created in all EC2 zones, and cancel all 
outstanding spot-price bids for them.
+ 
