http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/configuration.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc index 66fe5dd..174aa80 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -29,7 +29,7 @@ This chapter expands upon the <<getting_started>> chapter to further explain configuration of Apache HBase. Please read this chapter carefully, especially the <<basic.prerequisites,Basic Prerequisites>>
-to ensure that your HBase testing and deployment goes smoothly, and prevent data loss.
+to ensure that your HBase testing and deployment goes smoothly.
Familiarize yourself with <<hbase_supported_tested_definitions>> as well. == Configuration Files
@@ -92,24 +92,42 @@ This section lists required services and some required system configuration. [[java]] .Java
-[cols="1,1,4", options="header"]
+
+The following table summarizes the recommendation of the HBase community with respect to deploying on various Java versions. An entry of "yes" is meant to indicate a base level of testing and willingness to help diagnose and address issues you might run into. Similarly, an entry of "no" or "Not Supported" generally means that should you run into an issue the community is likely to ask you to change the Java environment before proceeding to help. In some cases, specific guidance on limitations (e.g. whether compiling / unit tests work, specific operational issues, etc) will also be noted.
+
+.Long Term Support JDKs are recommended
+[TIP]
+====
+HBase recommends downstream users rely on JDK releases that are marked as Long Term Supported (LTS) either from the OpenJDK project or vendors. As of March 2018 that means Java 8 is the only applicable version and that the next likely version to see testing will be Java 11 near Q3 2018.
+====
+
+.Java support by release line
+[cols="1,1,1,1,1", options="header"]
|===
|HBase Version
|JDK 7
|JDK 8
+|JDK 9
+|JDK 10
|2.0
|link:http://search-hadoop.com/m/YGbbsPxZ723m3as[Not Supported]
|yes
+|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
+|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
|1.3
|yes
|yes
+|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
+|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
|1.2
|yes
|yes
+|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
+|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
|===
@@ -146,9 +164,9 @@ It is recommended to raise the ulimit to at least 10,000, but more likely 10,240
+
For example, assuming that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open `3 * 3 * 100 = 900` file descriptors, not counting open JAR files, configuration files, and others. Opening a file does not take many resources, and the risk of allowing a user to open too many files is minimal.
+
-Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions.
See Jack Levin's major HDFS issues thread on the hbase-users mailing list, from 2011. +Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions. + -Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read setting config on your hadoop cluster is Aaron Kimball's Configuration Parameters: What can you just ignore? +Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. + .`ulimit` Settings on Ubuntu ==== @@ -183,7 +201,8 @@ See link:https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Suppo .Hadoop 2.x is recommended. [TIP] ==== -Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile. +Hadoop 2.x is faster and includes features, such as short-circuit reads (see <<perf.hdfs.configs.localread>>), +which will help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase does not support running with earlier versions of Hadoop. See the table below for requirements specific to different HBase versions. @@ -211,7 +230,9 @@ Use the following legend to interpret this table: |Hadoop-2.8.2 | NT | NT | NT | NT | NT |Hadoop-2.8.3+ | NT | NT | NT | S | S |Hadoop-2.9.0 | X | X | X | X | X -|Hadoop-3.0.0 | NT | NT | NT | NT | NT +|Hadoop-2.9.1+ | NT | NT | NT | NT | NT +|Hadoop-3.0.x | X | X | X | X | X +|Hadoop-3.1.0 | X | X | X | X | X |=== .Hadoop Pre-2.6.1 and JDK 1.8 Kerberos @@ -232,27 +253,35 @@ HBase on top of an HDFS Encryption Zone. Failure to do so will result in cluster data loss. This patch is present in Apache Hadoop releases 2.6.1+. ==== -.Hadoop 2.7.x +.Hadoop 2.y.0 Releases [TIP] ==== -Hadoop version 2.7.0 is not tested or supported as the Hadoop PMC has explicitly labeled that release as not being stable. (reference the link:https://s.apache.org/hadoop-2.7.0-announcement[announcement of Apache Hadoop 2.7.0].) +Starting around the time of Hadoop version 2.7.0, the Hadoop PMC got into the habit of calling out new minor releases on their major version 2 release line as not stable / production ready. As such, HBase expressly advises downstream users to avoid running on top of these releases. Note that additionally the 2.8.1 release was given the same caveat by the Hadoop PMC. 
For reference, see the release announcements for link:https://s.apache.org/hadoop-2.7.0-announcement[Apache Hadoop 2.7.0], link:https://s.apache.org/hadoop-2.8.0-announcement[Apache Hadoop 2.8.0], link:https://s.apache.org/hadoop-2.8.1-announcement[Apache Hadoop 2.8.1], and link:https://s.apache.org/hadoop-2.9.0-announcement[Apache Hadoop 2.9.0]. ==== -.Hadoop 2.8.x +.Hadoop 3.0.x Releases [TIP] ==== -Hadoop version 2.8.0 and 2.8.1 are not tested or supported as the Hadoop PMC has explicitly labeled that releases as not being stable. (reference the link:https://s.apache.org/hadoop-2.8.0-announcement[announcement of Apache Hadoop 2.8.0] and link:https://s.apache.org/hadoop-2.8.1-announcement[announcement of Apache Hadoop 2.8.1].) +Hadoop distributions that include the Application Timeline Service feature may cause unexpected versions of HBase classes to be present in the application classpath. Users planning on running MapReduce applications with HBase should make sure that link:https://issues.apache.org/jira/browse/YARN-7190[YARN-7190] is present in their YARN service (currently fixed in 2.9.1+ and 3.1.0+). +==== + +.Hadoop 3.1.0 Release +[TIP] +==== +The Hadoop PMC called out the 3.1.0 release as not stable / production ready. As such, HBase expressly advises downstream users to avoid running on top of this release. For reference, see the link:https://s.apache.org/hadoop-3.1.0-announcement[release announcement for Hadoop 3.1.0]. ==== .Replace the Hadoop Bundled With HBase! [NOTE] ==== -Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its _lib_ directory. -The bundled jar is ONLY for use in standalone mode. +Because HBase depends on Hadoop, it bundles Hadoop jars under its _lib_ directory. +The bundled jars are ONLY for use in standalone mode. In distributed mode, it is _critical_ that the version of Hadoop that is out on your cluster match what is under HBase. -Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. -Make sure you replace the jar in HBase across your whole cluster. -Hadoop version mismatch issues have various manifestations but often all look like its hung. +Replace the hadoop jars found in the HBase lib directory with the equivalent hadoop jars from the version you are running +on your cluster to avoid version mismatch issues. +Make sure you replace the jars under HBase across your whole cluster. +Hadoop version mismatch issues have various manifestations. Check for mismatch if +HBase appears hung. ==== [[dfs.datanode.max.transfer.threads]] @@ -537,7 +566,6 @@ If you are configuring an IDE to run an HBase client, you should include the _co For Java applications using Maven, including the hbase-shaded-client module is the recommended dependency when connecting to a cluster: [source,xml] ---- - <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-shaded-client</artifactId>
http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/datamodel.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc index 3674566..ba4961a 100644 --- a/src/main/asciidoc/_chapters/datamodel.adoc +++ b/src/main/asciidoc/_chapters/datamodel.adoc @@ -343,6 +343,7 @@ In particular: Below we describe how the version dimension in HBase currently works. See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions. link:https://www.ngdata.com/bending-time-in-hbase/[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase. It has more detail on versioning than is provided here. + As of this writing, the limitation _Overwriting values at existing timestamps_ mentioned in the article no longer holds in HBase. This section is basically a synopsis of this article by Bruno Dumon. @@ -503,8 +504,42 @@ Otherwise, a delete marker with a timestamp in the future is kept until the majo NOTE: This behavior represents a fix for an unexpected change that was introduced in HBase 0.94, and was fixed in link:https://issues.apache.org/jira/browse/HBASE-10118[HBASE-10118]. The change has been backported to HBase 0.94 and newer branches. +[[new.version.behavior]] +=== Optional New Version and Delete behavior in HBase-2.0.0 + +In `hbase-2.0.0`, the operator can specify an alternate version and +delete treatment by setting the column descriptor property +`NEW_VERSION_BEHAVIOR` to true (To set a property on a column family +descriptor, you must first disable the table and then alter the +column family descriptor; see <<cf.keep.deleted>> for an example +of editing an attribute on a column family descriptor). + +The 'new version behavior', undoes the limitations listed below +whereby a `Delete` ALWAYS overshadows a `Put` if at the same +location -- i.e. same row, column family, qualifier and timestamp +-- regardless of which arrived first. Version accounting is also +changed as deleted versions are considered toward total version count. +This is done to ensure results are not changed should a major +compaction intercede. See `HBASE-15968` and linked issues for +discussion. + +Running with this new configuration currently costs; we factor +the Cell MVCC on every compare so we burn more CPU. The slow +down will depend. In testing we've seen between 0% and 25% +degradation. + +If replicating, it is advised that you run with the new +serial replication feature (See `HBASE-9465`; the serial +replication feature did NOT make it into `hbase-2.0.0` but +should arrive in a subsequent hbase-2.x release) as now +the order in which Mutations arrive is a factor. + + === Current Limitations +The below limitations are addressed in hbase-2.0.0. See +the section above, <<new.version.behavior>>. + ==== Deletes mask Puts Deletes mask puts, even puts that happened after the delete was entered. 
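+For illustration, here is a minimal HBase shell sketch of switching a column family to the new version behavior described in <<new.version.behavior>> above, which lifts limitations such as this one. The table `t1` and column family `f1` are hypothetical placeholders; substitute your own names, and note that the table is disabled while the column family descriptor is altered, following the same pattern as the <<cf.keep.deleted>> example:
+
+----
+hbase> disable 't1'
+# set the column family attribute that turns on the new version behavior
+hbase> alter 't1', {NAME => 'f1', NEW_VERSION_BEHAVIOR => true}
+hbase> enable 't1'
+# confirm the attribute now shows in the column family description
+hbase> describe 't1'
+----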
http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/developer.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc index 11ef4ba..6d0a7d1 100644 --- a/src/main/asciidoc/_chapters/developer.adoc +++ b/src/main/asciidoc/_chapters/developer.adoc @@ -773,15 +773,15 @@ To do this, log in to Apache's Nexus at link:https://repository.apache.org[repos Find your artifacts in the staging repository. Click on 'Staging Repositories' and look for a new one ending in "hbase" with a status of 'Open', select it. Use the tree view to expand the list of repository contents and inspect if the artifacts you expect are present. Check the POMs. As long as the staging repo is open you can re-upload if something is missing or built incorrectly. - ++ If something is seriously wrong and you would like to back out the upload, you can use the 'Drop' button to drop and delete the staging repository. Sometimes the upload fails in the middle. This is another reason you might have to 'Drop' the upload from the staging repository. - ++ If it checks out, close the repo using the 'Close' button. The repository must be closed before a public URL to it becomes available. It may take a few minutes for the repository to close. Once complete you'll see a public URL to the repository in the Nexus UI. You may also receive an email with the URL. Provide the URL to the temporary staging repository in the email that announces the release candidate. (Folks will need to add this repo URL to their local poms or to their local _settings.xml_ file to pull the published release candidate artifacts.) - ++ When the release vote concludes successfully, return here and click the 'Release' button to release the artifacts to central. The release process will automatically drop and delete the staging repository. - ++ .hbase-downstreamer [NOTE] ==== @@ -792,15 +792,18 @@ Make sure you are pulling from the repository when tests run and that you are no ==== See link:https://www.apache.org/dev/publishing-maven-artifacts.html[Publishing Maven Artifacts] for some pointers on this maven staging process. - ++ If the HBase version ends in `-SNAPSHOT`, the artifacts go elsewhere. They are put into the Apache snapshots repository directly and are immediately available. Making a SNAPSHOT release, this is what you want to happen. - -At this stage, you have two tarballs in your 'build output directory' and a set of artifacts in a staging area of the maven repository, in the 'closed' state. - ++ +At this stage, you have two tarballs in your 'build output directory' and a set of artifacts +in a staging area of the maven repository, in the 'closed' state. Next sign, fingerprint and then 'stage' your release candiate build output directory via svnpubsub by committing -your directory to link:https://dist.apache.org/repos/dist/dev/hbase/[The 'dev' distribution directory] (See comments on link:https://issues.apache.org/jira/browse/HBASE-10554[HBASE-10554 Please delete old releases from mirroring system] but in essence it is an svn checkout of https://dist.apache.org/repos/dist/dev/hbase -- releases are at https://dist.apache.org/repos/dist/release/hbase). 
In the _version directory_ run the following commands: [source,bourne] ----
@@ -867,6 +870,50 @@ See link:http://search-hadoop.com/m/DHED4dhFaU[HBase, mail # dev - On recent discussion clarifying ASF release policy]. for how we arrived at this process.
+[[hbase.release.announcement]]
+== Announcing Releases
+
+Once an RC has passed successfully and the needed artifacts have been staged for distribution, you'll need to let everyone know about our shiny new release. It's not a requirement, but to make things easier for release managers we have a template you can start with. Be sure you replace \_version_ and other markers with the relevant version numbers. You should manually verify all links before sending.
+
+[source,email]
+----
+The HBase team is happy to announce the immediate availability of HBase _version_.
+
+Apache HBase™ is an open-source, distributed, versioned, non-relational database.
+Apache HBase gives you low latency random access to billions of rows with
+millions of columns atop non-specialized hardware. To learn more about HBase,
+see https://hbase.apache.org/.
+
+HBase _version_ is the _nth_ minor release in the HBase _major_.x line, which aims to
+improve the stability and reliability of HBase. This release includes roughly
+XXX resolved issues not covered by previous _major_.x releases.
+
+Notable new features include:
+- List text descriptions of features that fit on one line
+- Including if JDK or Hadoop support versions change
+- If the "stable" pointer changes, call that out
+- For those with obvious JIRA IDs, include them (HBASE-YYYYY)
+
+The full list of issues can be found in the included CHANGES.md and RELEASENOTES.md,
+or via our issue tracker:
+
+ https://s.apache.org/hbase-_version_-jira
+
+To download please follow the links and instructions on our website:
+
+ https://hbase.apache.org/downloads.html
+
+
+Questions, comments, and problems are always welcome at: [email protected].
+
+Thanks to all who contributed and made this release possible.
+
+Cheers,
+The HBase Dev Team
+----
+
+You should send this message to the following lists: [email protected], [email protected], [email protected]. If you'd like a spot check before sending, feel free to ask via jira or the dev list.
+
[[documentation]] == Generating the HBase Reference Guide
@@ -909,13 +956,21 @@ For any other module, for example `hbase-common`, the tests must be strict unit ==== Testing the HBase Shell The HBase shell and its tests are predominantly written in jruby.
-In order to make these tests run as a part of the standard build, there is a single JUnit test, `TestShell`, that takes care of loading the jruby implemented tests and running them.
+
+In order to make these tests run as a part of the standard build, there are a few JUnit test classes that take care of loading the jruby implemented tests and running them.
+The tests were split into separate classes to accommodate class level timeouts (see <<hbase.unittests>> for specifics).
You can run all of these tests from the top level with: [source,bourne] ---- + mvn clean test -Dtest=Test*Shell +---- + +If you have previously done a `mvn install`, then you can instruct maven to run only the tests in the hbase-shell module with: - mvn clean test -Dtest=TestShell +[source,bourne] +---- + mvn clean test -pl hbase-shell ---- Alternatively, you may limit the shell tests that run using the system variable `shell.test`. @@ -924,8 +979,7 @@ For example, the tests that cover the shell commands for altering tables are con [source,bourne] ---- - - mvn clean test -Dtest=TestShell -Dshell.test=/AdminAlterTableTest/ + mvn clean test -pl hbase-shell -Dshell.test=/AdminAlterTableTest/ ---- You may also use a link:http://docs.ruby-doc.com/docs/ProgrammingRuby/html/language.html#UJ[Ruby Regular Expression @@ -935,14 +989,13 @@ You can run all of the HBase admin related tests, including both the normal admi [source,bourne] ---- - mvn clean test -Dtest=TestShell -Dshell.test=/.*Admin.*Test/ + mvn clean test -pl hbase-shell -Dshell.test=/.*Admin.*Test/ ---- In the event of a test failure, you can see details by examining the XML version of the surefire report results [source,bourne] ---- - vim hbase-shell/target/surefire-reports/TEST-org.apache.hadoop.hbase.client.TestShell.xml ---- @@ -1462,9 +1515,8 @@ HBase ships with several ChaosMonkey policies, available in the [[chaos.monkey.properties]] ==== Configuring Individual ChaosMonkey Actions -Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]), ChaosMonkey integration tests can be configured per test run. -Create a Java properties file in the HBase classpath and pass it to ChaosMonkey using +Create a Java properties file in the HBase CLASSPATH and pass it to ChaosMonkey using the `-monkeyProps` configuration flag. Configurable properties, along with their default values if applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants` class. For properties that have defaults, you can override them by including them @@ -1477,7 +1529,9 @@ The following example uses a properties file called <<monkey.properties,monkey.p $ bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties ---- -The above command will start the integration tests and chaos monkey passing the properties file _monkey.properties_. +The above command will start the integration tests and chaos monkey. It will look for the +properties file _monkey.properties_ on the HBase CLASSPATH; e.g. inside the HBASE _conf_ dir. + Here is an example chaos monkey file: [[monkey.properties]] @@ -1492,6 +1546,8 @@ move.regions.sleep.time=80000 batch.restart.rs.ratio=0.4f ---- +Periods/time are expressed in milliseconds. + HBase 1.0.2 and newer adds the ability to restart HBase's underlying ZooKeeper quorum or HDFS nodes. To use these actions, you need to configure some new properties, which have no reasonable defaults because they are deployment-specific, in your ChaosMonkey @@ -1530,35 +1586,6 @@ We use Git for source code management and latest development happens on `master` branches for past major/minor/maintenance releases and important features and bug fixes are often back-ported to them. -=== Release Managers - -Each maintained release branch has a release manager, who volunteers to coordinate new features and bug fixes are backported to that release. -The release managers are link:https://hbase.apache.org/team-list.html[committers]. 
-If you would like your feature or bug fix to be included in a given release, communicate with that release manager. -If this list goes out of date or you can't reach the listed person, reach out to someone else on the list. - -NOTE: End-of-life releases are not included in this list. - -.Release Managers -[cols="1,1", options="header"] -|=== -| Release -| Release Manager - -| 1.2 -| Sean Busbey - -| 1.3 -| Mikhail Antonov - -| 1.4 -| Andrew Purtell - -| 2.0 -| Michael Stack - -|=== - [[code.standards]] === Code Standards @@ -2186,6 +2213,12 @@ When the amending author is different from the original committer, add notice of - [DISCUSSION] Best practice when amending commits cherry picked from master to branch]. +====== Close related GitHub PRs + +As a project we work to ensure there's a JIRA associated with each change, but we don't mandate any particular tool be used for reviews. Due to implementation details of the ASF's integration between hosted git repositories and GitHub, the PMC has no ability to directly close PRs on our GitHub repo. In the event that a contributor makes a Pull Request on GitHub, either because the contributor finds that easier than attaching a patch to JIRA or because a reviewer prefers that UI for examining changes, it's important to make note of the PR in the commit that goes to the master branch so that PRs are kept up to date. + +To read more about the details of what kinds of commit messages will work with the GitHub "close via keyword in commit" mechanism see link:https://help.github.com/articles/closing-issues-using-keywords/[the GitHub documentation for "Closing issues using keywords"]. In summary, you should include a line with the phrase "closes #XXX", where the XXX is the pull request id. The pull request id is usually given in the GitHub UI in grey at the end of the subject heading. + [[committer.tests]] ====== Committers are responsible for making sure commits do not break the build or tests http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/external_apis.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/external_apis.adoc b/src/main/asciidoc/_chapters/external_apis.adoc index ffb6ee6..8f65c4e 100644 --- a/src/main/asciidoc/_chapters/external_apis.adoc +++ b/src/main/asciidoc/_chapters/external_apis.adoc @@ -186,20 +186,20 @@ creation or mutation, and `DELETE` for deletion. 
|/_table_/schema |POST -|Create a new table, or replace an existing table's schema +|Update an existing table with the provided schema fragment |curl -vi -X POST \ -H "Accept: text/xml" \ -H "Content-Type: text/xml" \ - -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \ + -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \ "http://example.com:8000/users/schema" |/_table_/schema |PUT -|Update an existing table with the provided schema fragment +|Create a new table, or replace an existing table's schema |curl -vi -X PUT \ -H "Accept: text/xml" \ -H "Content-Type: text/xml" \ - -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \ + -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \ "http://example.com:8000/users/schema" |/_table_/schema @@ -851,23 +851,14 @@ println(Bytes.toString(value)) === Setting the Classpath To use Jython with HBase, your CLASSPATH must include HBase's classpath as well as -the Jython JARs required by your code. First, use the following command on a server -running the HBase RegionServer process, to get HBase's classpath. +the Jython JARs required by your code. -[source, bash] ----- -$ ps aux |grep regionserver| awk -F 'java.library.path=' {'print $2'} | awk {'print $1'} - -/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64 ----- - -Set the `$CLASSPATH` environment variable to include the path you found in the previous -step, plus the path to `jython.jar` and each additional Jython-related JAR needed for -your project. +Set the path to directory containing the `jython.jar` and each additional Jython-related JAR needed for +your project. Then export HBASE_CLASSPATH pointing to the $JYTHON_HOME env. variable. [source, bash] ---- -$ export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64:/path/to/jython.jar +$ export HBASE_CLASSPATH=/directory/jython.jar ---- Start a Jython shell with HBase and Hadoop JARs in the classpath: @@ -877,55 +868,52 @@ $ bin/hbase org.python.util.jython .Table Creation, Population, Get, and Delete with Jython ==== -The following Jython code example creates a table, populates it with data, fetches -the data, and deletes the table. +The following Jython code example checks for table, +if it exists, deletes it and then creates it. Then it +populates the table with data and fetches the data. [source,jython] ---- import java.lang -from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, HConstants, TableName -from org.apache.hadoop.hbase.client import HBaseAdmin, HTable, Get -from org.apache.hadoop.hbase.io import Cell, RowResult +from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, TableName +from org.apache.hadoop.hbase.client import Admin, Connection, ConnectionFactory, Get, Put, Result, Table +from org.apache.hadoop.conf import Configuration # First get a conf object. This will read in the configuration # that is out in your hbase-*.xml files such as location of the # hbase master node. 
-conf = HBaseConfiguration() +conf = HBaseConfiguration.create() +connection = ConnectionFactory.createConnection(conf) +admin = connection.getAdmin() -# Create a table named 'test' that has two column families, -# one named 'content, and the other 'anchor'. The colons -# are required for column family names. -tablename = TableName.valueOf("test") +# Create a table named 'test' that has a column family +# named 'content'. +tableName = TableName.valueOf("test") +table = connection.getTable(tableName) -desc = HTableDescriptor(tablename) -desc.addFamily(HColumnDescriptor("content:")) -desc.addFamily(HColumnDescriptor("anchor:")) -admin = HBaseAdmin(conf) +desc = HTableDescriptor(tableName) +desc.addFamily(HColumnDescriptor("content")) # Drop and recreate if it exists -if admin.tableExists(tablename): - admin.disableTable(tablename) - admin.deleteTable(tablename) -admin.createTable(desc) +if admin.tableExists(tableName): + admin.disableTable(tableName) + admin.deleteTable(tableName) -tables = admin.listTables() -table = HTable(conf, tablename) +admin.createTable(desc) # Add content to 'column:' on a row named 'row_x' row = 'row_x' -update = Get(row) -update.put('content:', 'some content') -table.commit(update) +put = Put(row) +put.addColumn("content", "qual", "some content") +table.put(put) # Now fetch the content just added, returns a byte[] -data_row = table.get(row, "content:") -data = java.lang.String(data_row.value, "UTF8") +get = Get(row) -print "The fetched row contains the value '%s'" % data +result = table.get(get) +data = java.lang.String(result.getValue("content", "qual"), "UTF8") -# Delete the table. -admin.disableTable(desc.getName()) -admin.deleteTable(desc.getName()) +print "The fetched row contains the value '%s'" % data ---- ==== @@ -935,24 +923,23 @@ This example scans a table and returns the results that match a given family qua [source, jython] ---- -# Print all rows that are members of a particular column family -# by passing a regex for family qualifier - import java.lang - -from org.apache.hadoop.hbase import HBaseConfiguration -from org.apache.hadoop.hbase.client import HTable - -conf = HBaseConfiguration() - -table = HTable(conf, "wiki") -col = "title:.*$" - -scanner = table.getScanner([col], "") +from org.apache.hadoop.hbase import TableName, HBaseConfiguration +from org.apache.hadoop.hbase.client import Connection, ConnectionFactory, Result, ResultScanner, Table, Admin +from org.apache.hadoop.conf import Configuration +conf = HBaseConfiguration.create() +connection = ConnectionFactory.createConnection(conf) +admin = connection.getAdmin() +tableName = TableName.valueOf('wiki') +table = connection.getTable(tableName) + +cf = "title" +attr = "attr" +scanner = table.getScanner(cf) while 1: result = scanner.next() if not result: - break - print java.lang.String(result.row), java.lang.String(result.get('title:').value) + break + print java.lang.String(result.row), java.lang.String(result.getValue(cf, attr)) ---- ==== http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/getting_started.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc index 1cdc0a2..84ebcaa 100644 --- a/src/main/asciidoc/_chapters/getting_started.adoc +++ b/src/main/asciidoc/_chapters/getting_started.adoc @@ -52,7 +52,7 @@ See <<java,Java>> for information about supported JDK versions. 
=== Get Started with HBase .Procedure: Download, Configure, and Start HBase in Standalone Mode -. Choose a download site from this list of link:https://www.apache.org/dyn/closer.cgi/hbase/[Apache Download Mirrors]. +. Choose a download site from this list of link:https://www.apache.org/dyn/closer.lua/hbase/[Apache Download Mirrors]. Click on the suggested top link. This will take you to a mirror of _HBase Releases_. Click on the folder named _stable_ and then download the binary file that ends in _.tar.gz_ to your local filesystem. @@ -82,7 +82,7 @@ JAVA_HOME=/usr + . Edit _conf/hbase-site.xml_, which is the main HBase configuration file. - At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. + At this time, you need to specify the directory on the local filesystem where HBase and ZooKeeper write data and acknowledge some risks. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere. The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`. @@ -102,6 +102,21 @@ JAVA_HOME=/usr <name>hbase.zookeeper.property.dataDir</name> <value>/home/testuser/zookeeper</value> </property> + <property> + <name>hbase.unsafe.stream.capability.enforce</name> + <value>false</value> + <description> + Controls whether HBase will check for stream capabilities (hflush/hsync). + + Disable this if you intend to run on LocalFileSystem, denoted by a rootdir + with the 'file://' scheme, but be mindful of the NOTE below. + + WARNING: Setting this to false blinds you to potential data loss and + inconsistent system state in the event of process and/or node failures. If + HBase is complaining of an inability to use hsync or hflush it's most + likely not a false positive. + </description> + </property> </configuration> ---- ==== @@ -111,7 +126,14 @@ HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want. + NOTE: The _hbase.rootdir_ in the above example points to a directory -in the _local filesystem_. The 'file:/' prefix is how we denote local filesystem. +in the _local filesystem_. The 'file://' prefix is how we denote local +filesystem. You should take the WARNING present in the configuration example +to heart. In standalone mode HBase makes use of the local filesystem abstraction +from the Apache Hadoop project. That abstraction doesn't provide the durability +promises that HBase needs to operate safely. This is fine for local development +and testing use cases where the cost of cluster failure is well contained. It is +not appropriate for production deployments; eventually you will lose data. + To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to point at a directory up on your instance: e.g. _hdfs://namenode.example.org:8020/hbase_. For more on this variant, see the section below on Standalone HBase over HDFS. @@ -163,7 +185,7 @@ hbase(main):001:0> create 'test', 'cf' . 
List Information About your Table + -Use the `list` command to +Use the `list` command to confirm your table exists + ---- hbase(main):002:0> list 'test' @@ -174,6 +196,22 @@ test => ["test"] ---- ++ +Now use the `describe` command to see details, including configuration defaults ++ +---- +hbase(main):003:0> describe 'test' +Table test is ENABLED +test +COLUMN FAMILIES DESCRIPTION +{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => +'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f +alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE + => '65536'} +1 row(s) +Took 0.9998 seconds +---- + . Put data into your table. + To put data into your table, use the `put` command. @@ -314,7 +352,7 @@ First, add the following property which directs HBase to run in distributed mode ---- + Next, change the `hbase.rootdir` from the local filesystem to the address of your HDFS instance, using the `hdfs:////` URI syntax. -In this example, HDFS is running on the localhost at port 8020. +In this example, HDFS is running on the localhost at port 8020. Be sure to either remove the entry for `hbase.unsafe.stream.capability.enforce` or set it to true. + [source,xml] ---- @@ -371,7 +409,7 @@ The following command starts 3 backup servers using ports 16002/16012, 16003/160 + ---- -$ ./bin/local-master-backup.sh 2 3 5 +$ ./bin/local-master-backup.sh start 2 3 5 ---- + To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like _/tmp/hbase-USER-X-master.pid_. @@ -566,18 +604,14 @@ On each node of the cluster, run the `jps` command and verify that the correct p You may see additional Java processes running on your servers as well, if they are used for other purposes. + .`node-a` `jps` Output -==== ---- - $ jps 20355 Jps 20071 HQuorumPeer 20137 HMaster ---- -==== + .`node-b` `jps` Output -==== ---- $ jps 15930 HRegionServer @@ -585,17 +619,14 @@ $ jps 15838 HQuorumPeer 16010 HMaster ---- -==== + .`node-c` `jps` Output -==== ---- $ jps 13901 Jps 13639 HQuorumPeer 13737 HRegionServer ---- -==== + .ZooKeeper Process Name [NOTE] http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/hbase-default.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc b/src/main/asciidoc/_chapters/hbase-default.adoc index 7798657..f809f28 100644 --- a/src/main/asciidoc/_chapters/hbase-default.adoc +++ b/src/main/asciidoc/_chapters/hbase-default.adoc @@ -150,7 +150,7 @@ A comma-separated list of BaseLogCleanerDelegate invoked by *`hbase.master.logcleaner.ttl`*:: + .Description -Maximum time a WAL can stay in the .oldlogdir directory, +Maximum time a WAL can stay in the oldWALs directory, after which it will be cleaned by a Master thread. 
+ .Default http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/hbase_mob.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbase_mob.adoc b/src/main/asciidoc/_chapters/hbase_mob.adoc index 9730529..8048772 100644 --- a/src/main/asciidoc/_chapters/hbase_mob.adoc +++ b/src/main/asciidoc/_chapters/hbase_mob.adoc @@ -61,12 +61,10 @@ an object is considered to be a MOB. Only `IS_MOB` is required. If you do not specify the `MOB_THRESHOLD`, the default threshold value of 100 KB is used. .Configure a Column for MOB Using HBase Shell -==== ---- hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400} hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400} ---- -==== .Configure a Column for MOB Using the Java API ==== @@ -91,7 +89,6 @@ weekly policy - compact MOB Files for one week into one large MOB file montly policy - compact MOB Files for one month into one large MOB File .Configure MOB compaction policy Using HBase Shell -==== ---- hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, MOB_COMPACT_PARTITION_POLICY => 'daily'} hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, MOB_COMPACT_PARTITION_POLICY => 'weekly'} @@ -101,7 +98,6 @@ hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, MOB_C hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, MOB_COMPACT_PARTITION_POLICY => 'weekly'} hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400, MOB_COMPACT_PARTITION_POLICY => 'monthly'} ---- -==== === Configure MOB Compaction mergeable threshold http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/images ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/images b/src/main/asciidoc/_chapters/images index 1e0c6c1..dc4cd20 120000 --- a/src/main/asciidoc/_chapters/images +++ b/src/main/asciidoc/_chapters/images @@ -1 +1 @@ -../../site/resources/images \ No newline at end of file +../../../site/resources/images/ \ No newline at end of file http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/ops_mgt.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc index c7362ac..10508f4 100644 --- a/src/main/asciidoc/_chapters/ops_mgt.adoc +++ b/src/main/asciidoc/_chapters/ops_mgt.adoc @@ -68,8 +68,12 @@ Some commands take arguments. Pass no args or -h for usage. pe Run PerformanceEvaluation ltt Run LoadTestTool canary Run the Canary tool - regionsplitter Run the RegionSplitter tool version Print the version + backup Backup tables for recovery + restore Restore tables from existing backup image + regionsplitter Run RegionSplitter tool + rowcounter Run RowCounter tool + cellcounter Run CellCounter tool CLASSNAME Run the class named CLASSNAME ---- @@ -79,7 +83,7 @@ Others, such as `hbase shell` (<<shell>>), `hbase upgrade` (<<upgrading>>), and === Canary There is a Canary class can help users to canary-test the HBase cluster status, with every column-family for every regions or RegionServer's granularity. -To see the usage, use the `--help` parameter. +To see the usage, use the `-help` parameter. ---- $ ${HBASE_HOME}/bin/hbase canary -help @@ -108,6 +112,13 @@ Usage: hbase canary [opts] [table1 [table2]...] 
| [regionserver1 [regionserver2] -D<configProperty>=<value> assigning or override the configuration params ---- +[NOTE] +The `Sink` class is instantiated using the `hbase.canary.sink.class` configuration property which +will also determine the used Monitor class. In the absence of this property RegionServerStdOutSink +will be used. You need to use the Sink according to the passed parameters to the _canary_ command. +As an example you have to set `hbase.canary.sink.class` property to +`org.apache.hadoop.hbase.tool.Canary$RegionStdOutSink` for using table parameters. + This tool will return non zero error codes to user for collaborating with other monitoring tools, such as Nagios. The error code definitions are: @@ -192,10 +203,10 @@ This daemon will stop itself and return non-zero error code if any error occurs, $ ${HBASE_HOME}/bin/hbase canary -daemon ---- -Run repeatedly with internal 5 seconds and will not stop itself even if errors occur in the test. +Run repeatedly with 5 second intervals and will not stop itself even if errors occur in the test. ---- -$ ${HBASE_HOME}/bin/hbase canary -daemon -interval 50000 -f false +$ ${HBASE_HOME}/bin/hbase canary -daemon -interval 5 -f false ---- ==== Force timeout if canary test stuck @@ -205,7 +216,7 @@ Because of this we provide a timeout option to kill the canary test and return a This run sets the timeout value to 60 seconds, the default value is 600 seconds. ---- -$ ${HBASE_HOME}/bin/hbase canary -t 600000 +$ ${HBASE_HOME}/bin/hbase canary -t 60000 ---- ==== Enable write sniffing in canary @@ -234,7 +245,7 @@ while returning normal exit code. To treat read / write failure as error, you ca with the `-treatFailureAsError` option. When enabled, read / write failure would result in error exit code. ---- -$ ${HBASE_HOME}/bin/hbase canary --treatFailureAsError +$ ${HBASE_HOME}/bin/hbase canary -treatFailureAsError ---- ==== Running Canary in a Kerberos-enabled Cluster @@ -266,7 +277,7 @@ This example shows each of the properties with valid values. <value>/etc/hbase/conf/keytab.krb5</value> </property> <!-- optional params --> -property> +<property> <name>hbase.client.dns.interface</name> <value>default</value> </property> @@ -381,7 +392,7 @@ directory. You can get a textual dump of a WAL file content by doing the following: ---- - $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 + $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --dump hdfs://example.org:8020/hbase/WALs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 ---- The return code will be non-zero if there are any issues with the file so you can test wholesomeness of file by redirecting `STDOUT` to `/dev/null` and testing the program return. @@ -389,7 +400,7 @@ The return code will be non-zero if there are any issues with the file so you ca Similarly you can force a split of a log file directory by doing: ---- - $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/ + $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --split hdfs://example.org:8020/hbase/WALs/example.org,60020,1283516293161/ ---- [[hlog_tool.prettyprint]] @@ -399,7 +410,7 @@ The `WALPrettyPrinter` is a tool with configurable options to print the contents You can invoke it via the HBase cli with the 'wal' command. 
---- - $ ./bin/hbase wal hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 + $ ./bin/hbase wal hdfs://example.org:8020/hbase/WALs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 ---- .WAL Printing in older versions of HBase @@ -677,6 +688,7 @@ Assuming you're running HDFS with permissions enabled, those permissions will ne For more information about bulk-loading HFiles into HBase, see <<arch.bulk.load,arch.bulk.load>>. +[[walplayer]] === WALPlayer WALPlayer is a utility to replay WAL files into HBase. @@ -701,25 +713,63 @@ $ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1, WALPlayer, by default, runs as a mapreduce job. To NOT run WALPlayer as a mapreduce job on your cluster, force it to run all in the local process by adding the flags `-Dmapreduce.jobtracker.address=local` on the command line. +[[walplayer.options]] +==== WALPlayer Options + +Running `WALPlayer` with no arguments prints brief usage information: + +---- +Usage: WALPlayer [options] <wal inputdir> <tables> [<tableMappings>] +Replay all WAL files into HBase. +<tables> is a comma separated list of tables. +If no tables ("") are specified, all tables are imported. +(Be careful, hbase:meta entries will be imported in this case.) + +WAL entries can be mapped to new set of tables via <tableMappings>. +<tableMappings> is a comma separated list of target tables. +If specified, each table in <tables> must have a mapping. + +By default WALPlayer will load data directly into HBase. +To generate HFiles for a bulk data load instead, pass the following option: + -Dwal.bulk.output=/path/for/output + (Only one table can be specified, and no mapping is allowed!) +Time range options: + -Dwal.start.time=[date|ms] + -Dwal.end.time=[date|ms] + (The start and the end date of timerange. The dates can be expressed + in milliseconds since epoch or in yyyy-MM-dd'T'HH:mm:ss.SS format. + E.g. 1234567890120 or 2009-02-13T23:32:30.12) +Other options: + -Dmapreduce.job.name=jobName + Use the specified mapreduce job name for the wal player +For performance also consider the following options: + -Dmapreduce.map.speculative=false + -Dmapreduce.reduce.speculative=false +---- + [[rowcounter]] -=== RowCounter and CellCounter +=== RowCounter -link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] is a mapreduce job to count all the rows of a table. +link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] is a mapreduce job to count all the rows of a table. This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency. -It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit. It is also possible to limit -the time range of data to be scanned by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags. +It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit. +It is possible to limit the time range of data to be scanned by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags. +The scanned data can be limited based on keys using the `--range=[startKey],[endKey][;[startKey],[endKey]...]` option. ---- -$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...] 
+$ bin/hbase rowcounter [options] <tablename> [--starttime=<start> --endtime=<end>] [--range=[startKey],[endKey][;[startKey],[endKey]...]] [<column1> <column2>...] ---- RowCounter only counts one version per cell. -Note: caching for the input Scan is configured via `hbase.client.scanner.caching` in the job configuration. +For performance consider to use `-Dhbase.client.scanner.caching=100` and `-Dmapreduce.map.speculative=false` options. + +[[cellcounter]] +=== CellCounter HBase ships another diagnostic mapreduce job called link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html[CellCounter]. Like RowCounter, it gathers more fine-grained statistics about your table. -The statistics gathered by RowCounter are more fine-grained and include: +The statistics gathered by CellCounter are more fine-grained and include: * Total number of rows in the table. * Total number of CFs across all rows. @@ -730,12 +780,12 @@ The statistics gathered by RowCounter are more fine-grained and include: The program allows you to limit the scope of the run. Provide a row regex or prefix to limit the rows to analyze. -Specify a time range to scan the table by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags. +Specify a time range to scan the table by using the `--starttime=<starttime>` and `--endtime=<endtime>` flags. Use `hbase.mapreduce.scan.column.family` to specify scanning a single column family. ---- -$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix] +$ bin/hbase cellcounter <tablename> <outputDir> [reportSeparator] [regex or prefix] [--starttime=<starttime> --endtime=<endtime>] ---- Note: just like RowCounter, caching for the input Scan is configured via `hbase.client.scanner.caching` in the job configuration. @@ -743,8 +793,7 @@ Note: just like RowCounter, caching for the input Scan is configured via `hbase. === mlockall It is possible to optionally pin your servers in physical memory making them less likely to be swapped out in oversubscribed environments by having the servers call link:http://linux.die.net/man/2/mlockall[mlockall] on startup. -See link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability to - start RS as root and call mlockall] for how to build the optional library and have it run on startup. +See link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability to start RS as root and call mlockall] for how to build the optional library and have it run on startup. [[compaction.tool]] === Offline Compaction Tool @@ -1024,13 +1073,10 @@ The script requires you to set some environment variables before running it. Examine the script and modify it to suit your needs. ._rolling-restart.sh_ General Usage -==== ---- - $ ./bin/rolling-restart.sh --help Usage: rolling-restart.sh [--config <hbase-confdir>] [--rs-only] [--master-only] [--graceful] [--maxthreads xx] ---- -==== Rolling Restart on RegionServers Only:: To perform a rolling restart on the RegionServers only, use the `--rs-only` option. @@ -2645,8 +2691,10 @@ full implications and have a sufficient background in managing HBase clusters. It was developed by Yahoo! and they run it at scale on their large grid cluster. See link:http://www.slideshare.net/HBaseCon/keynote-apache-hbase-at-yahoo-scale[HBase at Yahoo! Scale]. -RSGroups can be defined and managed with shell commands or corresponding Java -APIs. 
A server can be added to a group with hostname and port pair and tables +RSGroups are defined and managed with shell commands. The shell drives a +Coprocessor Endpoint whose API is marked private given this is an evolving +feature; the Coprocessor API is not for public consumption. +A server can be added to a group with hostname and port pair and tables can be moved to this group so that only regionservers in the same rsgroup can host the regions of the table. RegionServers and tables can only belong to one rsgroup at a time. By default, all tables and regionservers belong to the @@ -2781,6 +2829,48 @@ Viewing the Master log will give you insight on rsgroup operation. If it appears stuck, restart the Master process. +=== Remove RegionServer Grouping +Removing RegionServer Grouping feature from a cluster on which it was enabled involves +more steps in addition to removing the relevant properties from `hbase-site.xml`. This is +to clean the RegionServer grouping related meta data so that if the feature is re-enabled +in the future, the old meta data will not affect the functioning of the cluster. + +- Move all tables in non-default rsgroups to `default` regionserver group +[source,bash] +---- +#Reassigning table t1 from non default group - hbase shell +hbase(main):005:0> move_tables_rsgroup 'default',['t1'] +---- +- Move all regionservers in non-default rsgroups to `default` regionserver group +[source, bash] +---- +#Reassigning all the servers in the non-default rsgroup to default - hbase shell +hbase(main):008:0> move_servers_rsgroup 'default',['rs1.xxx.com:16206','rs2.xxx.com:16202','rs3.xxx.com:16204'] +---- +- Remove all non-default rsgroups. `default` rsgroup created implicitly doesn't have to be removed +[source,bash] +---- +#removing non default rsgroup - hbase shell +hbase(main):009:0> remove_rsgroup 'group2' +---- +- Remove the changes made in `hbase-site.xml` and restart the cluster +- Drop the table `hbase:rsgroup` from `hbase` +[source, bash] +---- +#Through hbase shell drop table hbase:rsgroup +hbase(main):001:0> disable 'hbase:rsgroup' +0 row(s) in 2.6270 seconds + +hbase(main):002:0> drop 'hbase:rsgroup' +0 row(s) in 1.2730 seconds +---- +- Remove znode `rsgroup` from the cluster ZooKeeper using zkCli.sh +[source, bash] +---- +#From ZK remove the node /hbase/rsgroup through zkCli.sh +rmr /hbase/rsgroup +---- + === ACL To enable ACL, add the following to your hbase-site.xml and restart your Master: @@ -2793,3 +2883,141 @@ To enable ACL, add the following to your hbase-site.xml and restart your Master: ---- + +[[normalizer]] +== Region Normalizer + +The Region Normalizer tries to make Regions all in a table about the same in size. +It does this by finding a rough average. Any region that is larger than twice this +size is split. Any region that is much smaller is merged into an adjacent region. +It is good to run the Normalizer on occasion on a down time after the cluster has +been running a while or say after a burst of activity such as a large delete. + +(The bulk of the below detail was copied wholesale from the blog by Romil Choksi at +link:https://community.hortonworks.com/articles/54987/hbase-region-normalizer.html[HBase Region Normalizer]) + +The Region Normalizer is feature available since HBase-1.2. It runs a set of +pre-calculated merge/split actions to resize regions that are either too +large or too small compared to the average region size for a given table. Region +Normalizer when invoked computes a normalization 'plan' for all of the tables in +HBase. 
System tables (such as hbase:meta, hbase:namespace, Phoenix system tables +etc) and user tables with normalization disabled are ignored while computing the +plan. For normalization enabled tables, normalization plan is carried out in +parallel across multiple tables. + +Normalizer can be enabled or disabled globally for the entire cluster using the +ânormalizer_switchâ command in the HBase shell. Normalization can also be +controlled on a per table basis, which is disabled by default when a table is +created. Normalization for a table can be enabled or disabled by setting the +NORMALIZATION_ENABLED table attribute to true or false. + +To check normalizer status and enable/disable normalizer + +[source,bash] +---- +hbase(main):001:0> normalizer_enabled +true +0 row(s) in 0.4870 seconds + +hbase(main):002:0> normalizer_switch false +true +0 row(s) in 0.0640 seconds + +hbase(main):003:0> normalizer_enabled +false +0 row(s) in 0.0120 seconds + +hbase(main):004:0> normalizer_switch true +false +0 row(s) in 0.0200 seconds + +hbase(main):005:0> normalizer_enabled +true +0 row(s) in 0.0090 seconds +---- + +When enabled, Normalizer is invoked in the background every 5 mins (by default), +which can be configured using `hbase.normalization.period` in `hbase-site.xml`. +Normalizer can also be invoked manually/programmatically at will using HBase shellâs +`normalize` command. HBase by default uses `SimpleRegionNormalizer`, but users can +design their own normalizer as long as they implement the RegionNormalizer Interface. +Details about the logic used by `SimpleRegionNormalizer` to compute its normalization +plan can be found link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.html[here]. + +The below example shows a normalization plan being computed for an user table, and +merge action being taken as a result of the normalization plan computed by SimpleRegionNormalizer. + +Consider an user table with some pre-split regions having 3 equally large regions +(about 100K rows) and 1 relatively small region (about 25K rows). Following is the +snippet from an hbase meta table scan showing each of the pre-split regions for +the user table. + +---- +table_p8ddpd6q5z,,1469494305548.68b9892220865cb6048 column=info:regioninfo, timestamp=1469494306375, value={ENCODED => 68b9892220865cb604809c950d1adf48, NAME => 'table_p8ddpd6q5z,,1469494305548.68b989222 09c950d1adf48. 0865cb604809c950d1adf48.', STARTKEY => '', ENDKEY => '1'} +.... +table_p8ddpd6q5z,1,1469494317178.867b77333bdc75a028 column=info:regioninfo, timestamp=1469494317848, value={ENCODED => 867b77333bdc75a028bb4c5e4b235f48, NAME => 'table_p8ddpd6q5z,1,1469494317178.867b7733 bb4c5e4b235f48. 3bdc75a028bb4c5e4b235f48.', STARTKEY => '1', ENDKEY => '3'} +.... +table_p8ddpd6q5z,3,1469494328323.98f019a753425e7977 column=info:regioninfo, timestamp=1469494328486, value={ENCODED => 98f019a753425e7977ab8636e32deeeb, NAME => 'table_p8ddpd6q5z,3,1469494328323.98f019a7 ab8636e32deeeb. 53425e7977ab8636e32deeeb.', STARTKEY => '3', ENDKEY => '7'} +.... +table_p8ddpd6q5z,7,1469494339662.94c64e748979ecbb16 column=info:regioninfo, timestamp=1469494339859, value={ENCODED => 94c64e748979ecbb166f6cc6550e25c6, NAME => 'table_p8ddpd6q5z,7,1469494339662.94c64e74 6f6cc6550e25c6. 8979ecbb166f6cc6550e25c6.', STARTKEY => '7', ENDKEY => '8'} +.... 
+table_p8ddpd6q5z,8,1469494339662.6d2b3f5fd1595ab8e7 column=info:regioninfo, timestamp=1469494339859, value={ENCODED => 6d2b3f5fd1595ab8e7c031876057b1ee, NAME => 'table_p8ddpd6q5z,8,1469494339662.6d2b3f5f c031876057b1ee. d1595ab8e7c031876057b1ee.', STARTKEY => '8', ENDKEY => ''} +---- +Invoking the normalizer using the `normalize` command in the HBase shell, the below log snippet +from the HMaster log shows the normalization plan computed as per the logic defined for +SimpleRegionNormalizer. Since the total region size (in MB) for the adjacent smallest +regions in the table is less than the average region size, the normalizer computes a +plan to merge these two regions. + +---- +2016-07-26 07:08:26,928 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] master.HMaster: Skipping normalization for table: hbase:namespace, as it's either system table or doesn't have auto +normalization turned on +2016-07-26 07:08:26,928 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] master.HMaster: Skipping normalization for table: hbase:backup, as it's either system table or doesn't have auto normalization turned on +2016-07-26 07:08:26,928 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] master.HMaster: Skipping normalization for table: hbase:meta, as it's either system table or doesn't have auto normalization turned on +2016-07-26 07:08:26,928 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] master.HMaster: Skipping normalization for table: table_h2osxu3wat, as it's either system table or doesn't have autonormalization turned on +2016-07-26 07:08:26,928 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] normalizer.SimpleRegionNormalizer: Computing normalization plan for table: table_p8ddpd6q5z, number of regions: 5 +2016-07-26 07:08:26,929 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] normalizer.SimpleRegionNormalizer: Table table_p8ddpd6q5z, total aggregated regions size: 12 +2016-07-26 07:08:26,929 DEBUG [B.fifo.QRpcServer.handler=20,queue=2,port=20000] normalizer.SimpleRegionNormalizer: Table table_p8ddpd6q5z, average region size: 2.4 +2016-07-26 07:08:26,929 INFO [B.fifo.QRpcServer.handler=20,queue=2,port=20000] normalizer.SimpleRegionNormalizer: Table table_p8ddpd6q5z, small region size: 0 plus its neighbor size: 0, less thanthe avg size 2.4, merging them +2016-07-26 07:08:26,971 INFO [B.fifo.QRpcServer.handler=20,queue=2,port=20000] normalizer.MergeNormalizationPlan: Executing merging normalization plan: MergeNormalizationPlan{firstRegion={ENCODED=> d51df2c58e9b525206b1325fd925a971, NAME => 'table_p8ddpd6q5z,,1469514755237.d51df2c58e9b525206b1325fd925a971.', STARTKEY => '', ENDKEY => '1'}, secondRegion={ENCODED => e69c6b25c7b9562d078d9ad3994f5330, NAME => 'table_p8ddpd6q5z,1,1469514767669.e69c6b25c7b9562d078d9ad3994f5330.', +STARTKEY => '1', ENDKEY => '3'}} +---- +The Region Normalizer, as per its computed plan, merged the region with start key '' +and end key '1' with another region having start key '1' and end key '3'. +Now that these regions have been merged, we see a single new region with start key +'' and end key '3'. +---- +table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:mergeA, timestamp=1469516907431, +value=PBUF\x08\xA5\xD9\x9E\xAF\xE2*\x12\x1B\x0A\x07default\x12\x10table_p8ddpd6q5z\x1A\x00"\x011(\x000\x00 ea74d246741ba.
8\x00 +table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:mergeB, timestamp=1469516907431, +value=PBUF\x08\xB5\xBA\x9F\xAF\xE2*\x12\x1B\x0A\x07default\x12\x10table_p8ddpd6q5z\x1A\x011"\x013(\x000\x0 ea74d246741ba. 08\x00 +table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:regioninfo, timestamp=1469516907431, value={ENCODED => e06c9b83c4a252b130eea74d246741ba, NAME => 'table_p8ddpd6q5z,,1469516907210.e06c9b83c ea74d246741ba. 4a252b130eea74d246741ba.', STARTKEY => '', ENDKEY => '3'} +.... +table_p8ddpd6q5z,3,1469514778736.bf024670a847c0adff column=info:regioninfo, timestamp=1469514779417, value={ENCODED => bf024670a847c0adffb74b2e13408b32, NAME => 'table_p8ddpd6q5z,3,1469514778736.bf024670 b74b2e13408b32. a847c0adffb74b2e13408b32.' STARTKEY => '3', ENDKEY => '7'} +.... +table_p8ddpd6q5z,7,1469514790152.7c5a67bc755e649db2 column=info:regioninfo, timestamp=1469514790312, value={ENCODED => 7c5a67bc755e649db22f49af6270f1e1, NAME => 'table_p8ddpd6q5z,7,1469514790152.7c5a67bc 2f49af6270f1e1. 755e649db22f49af6270f1e1.', STARTKEY => '7', ENDKEY => '8'} +.... +table_p8ddpd6q5z,8,1469514790152.58e7503cda69f98f47 column=info:regioninfo, timestamp=1469514790312, value={ENCODED => 58e7503cda69f98f4755178e74288c3a, NAME => 'table_p8ddpd6q5z,8,1469514790152.58e7503c 55178e74288c3a. da69f98f4755178e74288c3a.', STARTKEY => '8', ENDKEY => ''} +---- + +A similar example can be seen for a user table with 3 smaller regions and 1 +relatively large region. For this example, we have a user table with 1 large region containing 100K rows, and 3 relatively smaller regions with about 33K rows each. As seen from the normalization plan, since the larger region is more than twice the average region size, it ends up being split into two regions: one with start key '1' and end key '154717', and the other with start key '154717' and end key '3'. +---- +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] master.HMaster: Skipping normalization for table: hbase:backup, as it's either system table or doesn't have auto normalization turned on +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Computing normalization plan for table: table_p8ddpd6q5z, number of regions: 4 +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Table table_p8ddpd6q5z, total aggregated regions size: 12 +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Table table_p8ddpd6q5z, average region size: 3.0 +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: No normalization needed, regions look good for table: table_p8ddpd6q5z +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Computing normalization plan for table: table_h2osxu3wat, number of regions: 5 +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Table table_h2osxu3wat, total aggregated regions size: 7 +2016-07-26 07:39:45,636 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Table table_h2osxu3wat, average region size: 1.4 +2016-07-26 07:39:45,636 INFO [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SimpleRegionNormalizer: Table table_h2osxu3wat, large region
table_h2osxu3wat,1,1469515926544.27f2fdbb2b6612ea163eb6b40753c3db. has size 4, more than twice avg size, splitting +2016-07-26 07:39:45,640 INFO [B.fifo.QRpcServer.handler=7,queue=1,port=20000] normalizer.SplitNormalizationPlan: Executing splitting normalization plan: SplitNormalizationPlan{regionInfo={ENCODED => 27f2fdbb2b6612ea163eb6b40753c3db, NAME => 'table_h2osxu3wat,1,1469515926544.27f2fdbb2b6612ea163eb6b40753c3db.', STARTKEY => '1', ENDKEY => '3'}, splitPoint=null} +2016-07-26 07:39:45,656 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] master.HMaster: Skipping normalization for table: hbase:namespace, as it's either system table or doesn't have auto normalization turned on +2016-07-26 07:39:45,656 DEBUG [B.fifo.QRpcServer.handler=7,queue=1,port=20000] master.HMaster: Skipping normalization for table: hbase:meta, as it's either system table or doesn't +have auto normalization turned on ..... +2016-07-26 07:39:46,246 DEBUG [AM.ZK.Worker-pool2-t278] master.RegionStates: Onlined 54de97dae764b864504704c1c8d3674a on hbase-test-rc-5.openstacklocal,16020,1469419333913 {ENCODED => 54de97dae764b864504704c1c8d3674a, NAME => 'table_h2osxu3wat,1,1469518785661.54de97dae764b864504704c1c8d3674a.', STARTKEY => '1', ENDKEY => '154717'} +2016-07-26 07:39:46,246 INFO [AM.ZK.Worker-pool2-t278] master.RegionStates: Transition {d6b5625df331cfec84dce4f1122c567f state=SPLITTING_NEW, ts=1469518786246, server=hbase-test-rc-5.openstacklocal,16020,1469419333913} to {d6b5625df331cfec84dce4f1122c567f state=OPEN, ts=1469518786246, +server=hbase-test-rc-5.openstacklocal,16020,1469419333913} +2016-07-26 07:39:46,246 DEBUG [AM.ZK.Worker-pool2-t278] master.RegionStates: Onlined d6b5625df331cfec84dce4f1122c567f on hbase-test-rc-5.openstacklocal,16020,1469419333913 {ENCODED => d6b5625df331cfec84dce4f1122c567f, NAME => 'table_h2osxu3wat,154717,1469518785661.d6b5625df331cfec84dce4f1122c567f.', STARTKEY => '154717', ENDKEY => '3'} +---- http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/performance.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc index c917646..866779c 100644 --- a/src/main/asciidoc/_chapters/performance.adoc +++ b/src/main/asciidoc/_chapters/performance.adoc @@ -188,11 +188,9 @@ It is useful for tuning the IO impact of prefetching versus the time before all To enable prefetching on a given column family, you can use HBase Shell or use the API. .Enable Prefetch Using HBase Shell -==== ---- hbase> create 'MyTable', { NAME => 'myCF', PREFETCH_BLOCKS_ON_OPEN => 'true' } ---- -==== .Enable Prefetch Using the API ==== http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/pv2.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/pv2.adoc b/src/main/asciidoc/_chapters/pv2.adoc new file mode 100644 index 0000000..5ecad3f --- /dev/null +++ b/src/main/asciidoc/_chapters/pv2.adoc @@ -0,0 +1,163 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// +[[pv2]] += Procedure Framework (Pv2): link:https://issues.apache.org/jira/browse/HBASE-12439[HBASE-12439] +:doctype: book +:numbered: +:toc: left +:icons: font +:experimental: + + +_Procedure v2 ...aims to provide a unified way to build...multi-step procedures with a rollback/roll-forward ability in case of failure (e.g. create/delete table) -- Matteo Bertozzi, the author of Pv2._ + +With Pv2 you can build and run state machines. It was built by Matteo to make distributed state transitions in HBase resilient in the face of process failures. Prior to Pv2, state transition handling was spread about the codebase with implementation varying by transition-type and context. Pv2 was inspired by link:https://accumulo.apache.org/1.8/accumulo_user_manual.html#_fault_tolerant_executor_fate[FATE], of Apache Accumulo. + + +Early Pv2 aspects have been shipping in HBase for a good while now, but it has continued to evolve as it takes on more involved scenarios. What we have now is powerful but intricate in operation and incomplete, in need of cleanup and hardening. In this doc we give an overview of the system so you can make use of it (and help with its polishing). + +This system has the awkward name of Pv2 because HBase already had the notion of a Procedure used in snapshots (see hbase-server _org.apache.hadoop.hbase.procedure_ as opposed to hbase-procedure _org.apache.hadoop.hbase.procedure2_). Pv2 supersedes and is to replace Procedure. + +== Procedures + +A Procedure is a transform made on an HBase entity. Examples of HBase entities would be Regions and Tables. + +Procedures are run by a ProcedureExecutor instance. A Procedure's current state is kept in the ProcedureStore. + +The ProcedureExecutor has but a primitive view on what goes on inside a Procedure. From its PoV, Procedures are submitted and then the ProcedureExecutor keeps calling _#execute(Object)_ until the Procedure is done. Execute may be called multiple times in the case of failure or restart, so Procedure code must be idempotent, yielding the same result each time it runs. Procedure code can also implement _rollback_ so steps can be undone in case of failure. A call to _execute()_ can result in one of the following possibilities: + +* _execute()_ returns +** _null_: indicates we are done. +** _this_: indicates there is more to do, so persist the current procedure state and re-_execute()_. +** _Array_ of sub-procedures: indicates a set of procedures that need to be run to completion before we can proceed (after which we expect the framework to call our execute again). +* _execute()_ throws exception +** _suspend_: indicates execution of the procedure is suspended and can be resumed when some external event occurs. The procedure state is persisted. +** _yield_: the procedure is added back to the scheduler. The procedure state is not persisted. +** _interrupted_: currently the same as _yield_. +** Any _exception_ not listed above: the Procedure _state_ is changed to _FAILED_ (after which we expect the framework will attempt rollback).
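To make the return-value convention concrete, here is a small self-contained sketch. It is illustration only and does not use the real org.apache.hadoop.hbase.procedure2 classes: the `ToyProcedure` interface and the driver loop are invented for this doc, and persistence, rollback and the exception cases are omitted. It only mimics the protocol just described (done, more-to-do, or sub-procedures) and the idea of a procedure tracking its own step.

[source,java]
----
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stand-in for the Pv2 contract described above; NOT the real HBase API.
interface ToyProcedure {
  // Returns null when done, an array containing only `this` when there is more to do,
  // or an array of sub-procedures that must finish before this procedure is re-executed.
  ToyProcedure[] execute();
}

// Single-step child procedure: does its work and reports "done" immediately.
class WriteMetadataProcedure implements ToyProcedure {
  @Override
  public ToyProcedure[] execute() {
    System.out.println("child: writing metadata");
    return null;
  }
}

// Two-step parent: first hands work to a sub-procedure, then finishes on a later call.
class CreateThingProcedure implements ToyProcedure {
  private int step = 0; // the "baggage": where we are; the real framework persists this

  @Override
  public ToyProcedure[] execute() {
    switch (step) {
      case 0:
        step = 1;
        return new ToyProcedure[] { new WriteMetadataProcedure() }; // run the child first
      case 1:
        step = 2;
        System.out.println("parent: finishing up");
        return new ToyProcedure[] { this }; // more to do: persist state, execute() again
      default:
        return null; // done
    }
  }
}

// Drastically simplified executor loop: children run to completion before the parent resumes.
public class ToyProcedureExecutor {
  public static void run(ToyProcedure root) {
    Deque<ToyProcedure> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      ToyProcedure current = stack.peek();
      ToyProcedure[] result = current.execute();
      if (result == null) {
        stack.pop(); // current procedure is done
      } else if (result.length == 1 && result[0] == current) {
        // more to do: the real executor would persist the procedure state here
      } else {
        for (ToyProcedure sub : result) {
          stack.push(sub); // children must finish before the parent is re-executed
        }
      }
    }
  }

  public static void main(String[] args) {
    run(new CreateThingProcedure());
  }
}
----

In the real framework the ProcedureExecutor persists each Procedure's state to the ProcedureStore after every step, which is what allows it to replay and resume after a crash.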
The ProcedureExecutor stamps the framework's notion of Procedure State into the Procedure itself; e.g. it marks Procedures as INITIALIZING on submit. It moves the state to RUNNABLE when it goes to execute. When done, a Procedure gets marked FAILED or SUCCESS depending on the outcome. Here is the list of all states as of this writing: + +* *_INITIALIZING_* Procedure in construction, not yet added to the executor +* *_RUNNABLE_* Procedure added to the executor, and ready to be executed. +* *_WAITING_* The procedure is waiting on children (subprocedures) to be completed +* *_WAITING_TIMEOUT_* The procedure is waiting on a timeout or an external event +* *_ROLLEDBACK_* The procedure failed and was rolled back. +* *_SUCCESS_* The procedure execution completed successfully. +* *_FAILED_* The procedure execution failed and may need to be rolled back. + +After each execute, the Procedure state is persisted to the ProcedureStore. Hooks are invoked on Procedures so they can preserve custom state. Post-fault, the ProcedureExecutor re-hydrates its pre-crash state by replaying the content of the ProcedureStore. This makes the Procedure Framework resilient against process failure. + +=== Implementation + +In implementation, Procedures tend to divide transforms into finer-grained tasks and while some of these work items are handed off to sub-procedures, +the bulk are done as processing _steps_ in-Procedure; each invocation of execute performs a single step, after which the Procedure relinquishes control and returns to the framework. The Procedure does its own tracking of where it is in the processing. + +What comprises a sub-task, or _step_, in the execution is up to the Procedure author, but generally it is a small piece of work that cannot be further decomposed and that moves the processing forward toward its end state. Having procedures made of many small steps rather than a few large ones allows the Procedure framework to give insight into where we are in the processing. It also allows the framework to be more fair in its execution. As stated above, each step may be called multiple times (failure/restart) so steps must be implemented to be idempotent. + +It is easy to confuse the state that the Procedure itself is keeping with that of the Framework itself. Try to keep them distinct. + + +=== Rollback + +Rollback is called when the procedure or one of the sub-procedures has failed. The rollback step is supposed to clean up the resources created during the execute() step. In case of failure and restart, rollback() may be called multiple times, so again the code must be idempotent. + +=== Metrics + +There are hooks for collecting metrics on submit of the procedure and on finish. + +* updateMetricsOnSubmit() +* updateMetricsOnFinish() + +Individual procedures can override these methods to collect procedure-specific metrics. The default implementations of these methods try to get an object implementing the ProcedureMetrics interface, which encapsulates the following set of generic metrics: + +* SubmittedCount (Counter): Total number of procedure instances submitted of a type. +* Time (Histogram): Histogram of runtime for procedure instances. +* FailedCount (Counter): Total number of failed procedure instances. + +Individual procedures can implement this object and define this generic set of metrics. + +=== Baggage + +Procedures can carry baggage. One example is the _step_ the procedure last attained (see previous section); procedures persist the enum that marks where they are currently. Other examples might be the Region or Server name the Procedure is currently working on. After each call to execute, Procedure#serializeStateData is called.
Procedures can persist whatever state they need. + +=== Result/State and Queries + +(From Matteo's https://issues.apache.org/jira/secure/attachment/12693273/Procedurev2Notification-Bus.pdf[ProcedureV2 and Notification Bus] doc) + +In the case of asynchronous operations, the result must be kept around until the client asks for it. Once we receive a 'get' of the result we can schedule the delete of the record. For some operations the result may be 'unnecessary', especially in case of failure (e.g. if the create table fails, we can query the operation result or we can just list tables to see if it was created), so in some cases we can schedule the delete after a timeout. On the client side the operation will return a 'Procedure ID'; this ID can be used to wait until the procedure is completed and get the result/exception. + + +[source] +---- +Admin.doOperation() { long procId = master.doOperation(); master.waitCompletion(procId); } + +---- + +If the master goes down while performing the operation, the backup master will pick up the half in-progress operation and complete it. The client will not notice the failure. + +== Subprocedures + +Subprocedures are _Procedure_ instances created and returned by the _#execute(Object)_ method of a procedure instance (the parent procedure). As subprocedures are of type _Procedure_, they can instantiate their own subprocedures. As this is recursive, a procedure stack is maintained by the framework. The framework makes sure that the parent procedure does not proceed until all sub-procedures and their subprocedures in a procedure stack are successfully finished. + +== ProcedureExecutor + +_ProcedureExecutor_ uses _ProcedureStore_ and _ProcedureScheduler_ and executes procedures submitted to it. Some of the basic operations supported are: + +* _abort(procId)_: aborts the specified procedure if it is not finished +* _submit(Procedure)_: submits a procedure for execution +* _retrieve:_ list of get methods to get _Procedure_ instances and results +* _register/ unregister_ listeners: for listening on Procedure-related notifications + +When _ProcedureExecutor_ starts it loads procedure instances persisted in _ProcedureStore_ from the previous run. All unfinished procedures are resumed from the last stored state. + +== Nonces + +You can pass the nonce that came in with the RPC to the Procedure on submit at the executor. This nonce will then be serialized along w/ the Procedure on persist. If there is a crash, on reload the nonce will be put back into a map of nonces to pid in case a client tries to run the same procedure a second time (it will be rejected). See the base Procedure and how the nonce is a base data member. + +== Wait/Wake/Suspend/Yield + +'suspend' means stop processing a procedure because we can make no more progress until a condition changes; i.e. we sent an RPC and need to wait on the response. The way this works is that a Procedure throws a suspend exception from down in its guts as a GOTO to the end of the current processing step. Suspend also puts the Procedure back on the scheduler. A problem is that we do some accounting on our way out even on suspend, so exiting can take time (we have to update state in the WAL). + +RegionTransitionProcedure#reportTransition is called on receipt of a report from a RS. For Assign and Unassign, this event, the response from the server we sent an RPC to, wakes up suspended Assigns/Unassigns. + +== Locking + +Procedure Locks are not about concurrency!
They are about giving a Procedure read/write access to an HBase Entity such as a Table or Region so that it is possible to shut out other Procedures from making modifications to an HBase Entity's state while the current one is running. + +Locking is optional, up to the Procedure implementor, but if an entity is being operated on by a Procedure, all transforms need to be done via Procedures using the same locking scheme, else havoc will ensue. + +Two ProcedureExecutor Worker threads can actually end up both processing the same Procedure instance. If this happens, the threads are meant to be running different parts of the one Procedure -- changes that do not stamp on each other (this gets awkward around the procedure framework's notion of 'suspend'. More on this below). + +Locks optionally may be held for the life of a Procedure. For example, if moving a Region, you probably want to have exclusive access to the HBase Region until the move completes (or fails). This is used in conjunction with {@link #holdLock(Object)}. If {@link #holdLock(Object)} returns true, the procedure executor will call acquireLock() once and thereafter not call {@link #releaseLock(Object)} until the Procedure is done (normally, it calls release/acquire around each invocation of {@link #execute(Object)}). + +Locks also may live the life of a procedure; i.e. once an Assign Procedure starts, we do not want another procedure meddling w/ the region under assignment. Procedures that hold the lock for the life of the procedure set Procedure#holdLock to true. AssignProcedure does this, as do Split and Move (if in the middle of a Region move, you do not want it Splitting). + +Locking can be for the life of a Procedure. + +Some locks have a hierarchy. For example, taking a region lock also takes a (read) lock on its containing table and namespace to prevent another Procedure from obtaining an exclusive lock on the hosting table (or namespace). + +== Procedure Types + +=== StateMachineProcedure + +One can consider each call to the _#execute(Object)_ method as transitioning from one state to another in a state machine. The abstract class _StateMachineProcedure_ is a wrapper around the base _Procedure_ class which provides constructs for implementing a state machine as a _Procedure_. After each state transition the current state is persisted so that, in case of crash/restart, the procedure can be resumed from the state it was in before the crash/restart. Individual procedures need to define initial and terminus states; hooks _executeFromState()_ and _setNextState()_ are provided for state transitions. + +=== RemoteProcedureDispatcher + +A new RemoteProcedureDispatcher (+ subclass RSProcedureDispatcher) primitive takes care of running the Procedure-based Assignments 'remote' component. This dispatcher knows about 'servers'. It aggregates assignments on a time/count basis so it can send procedures in batches rather than one per RPC. Procedure status comes back on the back of the RegionServer heartbeat reporting online/offline regions (no more notifications via ZK). The response is passed to the AMv2 to 'process'. It will check against the in-memory state. If there is a mismatch, it fences out the RegionServer on the assumption that something went wrong on the RS side. Timeouts trigger retries (Not Yet Implemented!).
The Procedure machine ensures only one operation at a time on any one Region/Table using entity _locking_ and smarts about what is serial and what can be run concurrently (locking was zk-based -- you'd put a znode in zk for a table -- but now has been converted to be procedure-based as part of this project). + +== References + +* Matteo had a slide deck on what the Procedure Framework would look like and the problems it addresses, initially link:https://issues.apache.org/jira/secure/attachment/12845124/ProcedureV2b.pdf[attached to the Pv2 issue]. +* link:https://issues.apache.org/jira/secure/attachment/12693273/Procedurev2Notification-Bus.pdf[A good doc by Matteo] on the problem and how Pv2 addresses it, with a roadmap (from the Pv2 JIRA). We should go back to the roadmap to do the Notification Bus, conversion of log splitting to Pv2, etc. http://git-wip-us.apache.org/repos/asf/hbase/blob/073af9b7/src/main/asciidoc/_chapters/schema_design.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc index 4cd7656..b7a6936 100644 --- a/src/main/asciidoc/_chapters/schema_design.adoc +++ b/src/main/asciidoc/_chapters/schema_design.adoc @@ -504,11 +504,9 @@ Deleted cells are still subject to TTL and there will never be more than "maximu A new "raw" scan option returns all deleted rows and the delete markers. .Change the Value of `KEEP_DELETED_CELLS` Using HBase Shell -==== ---- hbase> alter 't1', NAME => 'f1', KEEP_DELETED_CELLS => true ---- -==== .Change the Value of `KEEP_DELETED_CELLS` Using the API ==== @@ -1148,16 +1146,41 @@ Detect regionserver failure as fast as reasonable. Set the following parameters: - `dfs.namenode.avoid.read.stale.datanode = true` - `dfs.namenode.avoid.write.stale.datanode = true` +[[shortcircuit.reads]] === Optimize on the Server Side for Low Latency - -* Skip the network for local blocks. In `hbase-site.xml`, set the following parameters: +Skip the network for local blocks when the RegionServer goes to read from HDFS by exploiting HDFS's +link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html[Short-Circuit Local Reads] facility. +Note how setup must be done both at the datanode and on the dfsclient ends of the connection -- i.e. at the RegionServer -- +and how both ends need to have loaded the hadoop native `.so` library. +After configuring your hadoop setting _dfs.client.read.shortcircuit_ to _true_ and configuring +the _dfs.domain.socket.path_ path for the datanode and dfsclient to share and restarting, next configure +the regionserver/dfsclient side. + +* In `hbase-site.xml`, set the following parameters (collected in the sketch after this list): - `dfs.client.read.shortcircuit = true` -- `dfs.client.read.shortcircuit.buffer.size = 131072` (Important to avoid OOME) +- `dfs.client.read.shortcircuit.skip.checksum = true` so we don't double checksum (HBase does its own checksumming to save on i/os. See <<hbase.regionserver.checksum.verify.performance>> for more on this.) +- `dfs.domain.socket.path` to match what was set for the datanodes. +- `dfs.client.read.shortcircuit.buffer.size = 131072` Important to avoid OOME -- hbase has a default it uses if unset, see `hbase.dfs.client.read.shortcircuit.buffer.size`; its default is 131072. * Ensure data locality. In `hbase-site.xml`, set `hbase.hstore.min.locality.to.skip.major.compact = 0.7` (Meaning that 0.7 \<= n \<= 1) * Make sure DataNodes have enough handlers for block transfers. In `hdfs-site.xml`, set the following parameters: - `dfs.datanode.max.xcievers >= 8192` - `dfs.datanode.handler.count =` number of spindles
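For reference, here is a minimal sketch that gathers the client-side keys from the list above onto a single `Configuration` object. It is only meant to show the key names and values together; in practice you set them in `hbase-site.xml` (and the datanode-side keys in `hdfs-site.xml`) as described above, and the socket path shown is a hypothetical example that must match what your DataNodes use.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShortCircuitReadSettings {
  public static void main(String[] args) {
    // Start from the usual HBase configuration (hbase-default.xml + hbase-site.xml on the classpath).
    Configuration conf = HBaseConfiguration.create();

    // RegionServer/dfsclient-side keys from the list above.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);
    // Example value only; must match the path configured on the DataNodes.
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    conf.setInt("dfs.client.read.shortcircuit.buffer.size", 131072);

    // Data-locality threshold from the list above.
    conf.setFloat("hbase.hstore.min.locality.to.skip.major.compact", 0.7f);

    // Print the effective values to sanity-check what the RegionServer would see.
    System.out.println("dfs.client.read.shortcircuit = "
        + conf.getBoolean("dfs.client.read.shortcircuit", false));
    System.out.println("dfs.client.read.shortcircuit.buffer.size = "
        + conf.getInt("dfs.client.read.shortcircuit.buffer.size", -1));
  }
}
----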
+Check the RegionServer logs after restart. You should only see complaints if there is a misconfiguration. +Otherwise, short-circuit read operates quietly in the background. It does not provide metrics, so +there is no direct visibility into how effective it is, but read latencies should show a marked improvement, especially if you have +good data locality, lots of random reads, and a dataset that is larger than the available cache. + +Other advanced configurations that you might play with, especially if the short-circuit functionality +is complaining in the logs, include `dfs.client.read.shortcircuit.streams.cache.size` and +`dfs.client.socketcache.capacity`. Documentation is sparse on these options. You'll have to +read the source code. + +For more on short-circuit reads, see Colin's old blog on the rollout, +link:http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/[How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop]. +The link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347] issue also makes for an +interesting read showing the HDFS community at its best (caveat a few comments). + === JVM Tuning ==== Tune JVM GC for low collection latencies
