Repository: hbase
Updated Branches:
  refs/heads/branch-2 ada514604 -> d53430b5f


HBASE-18335 configuration guide fixes

Signed-off-by: Chia-Ping Tsai <chia7...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/d53430b5
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/d53430b5
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/d53430b5

Branch: refs/heads/branch-2
Commit: d53430b5f7bfcc6278c79864b47e8cfd4b6e96c6
Parents: ada5146
Author: Artem Ervits <generi...@gmail.com>
Authored: Tue Dec 19 23:33:56 2017 +0800
Committer: Chia-Ping Tsai <chia7...@gmail.com>
Committed: Tue Dec 19 23:35:23 2017 +0800

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/configuration.adoc | 61 ++++++++++-----------
 src/main/asciidoc/_chapters/hbase-default.adoc | 40 +++++++-------
 2 files changed, 50 insertions(+), 51 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/d53430b5/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc 
b/src/main/asciidoc/_chapters/configuration.adoc
index 9b616c5..d2b17a8 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -79,11 +79,10 @@ To check for well-formedness and only print output if 
errors exist, use the comm
 .Keep Configuration In Sync Across the Cluster
 [WARNING]
 ====
-When running in distributed mode, after you make an edit to an HBase 
configuration, make sure you copy the content of the _conf/_ directory to all 
nodes of the cluster.
+When running in distributed mode, after you make an edit to an HBase 
configuration, make sure you copy the contents of the _conf/_ directory to all 
nodes of the cluster.
 HBase will not do this for you.
 Use `rsync`, `scp`, or another secure mechanism for copying the configuration 
files to your nodes.
-For most configuration, a restart is needed for servers to pick up changes An 
exception is dynamic configuration.
-to be described later below.
+For most configurations, a restart is needed for servers to pick up changes. 
Dynamic configuration, described later in this chapter, is an exception.
 ====
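The copy step above can be sketched in shell; the node names and install path here are placeholders, not values from the guide, so adjust them for your cluster:

```shell
# Push the local conf/ directory to every node of a hypothetical cluster.
# Hostnames and the /opt/hbase path are placeholders; adjust as needed.
NODES="node1 node2 node3"
for n in $NODES; do
  # -a preserves permissions and times, -z compresses in transit,
  # --delete removes files that no longer exist locally.
  # echo prints one rsync command per node for review; drop it to actually copy.
  echo rsync -az --delete conf/ "$n:/opt/hbase/conf/"
done
```

Using `echo` makes this a dry run; remove it once the printed commands look right.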
 
 [[basic.prerequisites]]
@@ -131,11 +130,11 @@ DNS::
   HBase uses the local hostname to self-report its IP address. Both forward 
and reverse DNS resolving must work in versions of HBase previous to 0.92.0. 
The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] tool 
can be used to verify DNS is working correctly on the cluster. The project 
`README` file provides detailed instructions on usage.
 
 Loopback IP::
-  Prior to hbase-0.96.0, HBase only used the IP address `127.0.0.1` to refer 
to `localhost`, and this could not be configured.
+  Prior to hbase-0.96.0, HBase only used the IP address `127.0.0.1` to refer 
to `localhost`, and this was not configurable.
   See <<loopback.ip,Loopback IP>> for more details.
 
 NTP::
-  The clocks on cluster nodes should be synchronized. A small amount of 
variation is acceptable, but larger amounts of skew can cause erratic and 
unexpected behavior. Time synchronization is one of the first things to check 
if you see unexplained problems in your cluster. It is recommended that you run 
a Network Time Protocol (NTP) service, or another time-synchronization 
mechanism, on your cluster, and that all nodes look to the same service for 
time synchronization. See the 
link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP 
Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up 
NTP.
+  The clocks on cluster nodes should be synchronized. A small amount of 
variation is acceptable, but larger amounts of skew can cause erratic and 
unexpected behavior. Time synchronization is one of the first things to check 
if you see unexplained problems in your cluster. It is recommended that you run 
a Network Time Protocol (NTP) service, or another time-synchronization 
mechanism on your cluster and that all nodes look to the same service for time 
synchronization. See the 
link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP 
Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up 
NTP.
 
 [[ulimit]]
 Limits on Number of Files and Processes (ulimit)::
@@ -176,8 +175,8 @@ Linux Shell::
   All of the shell scripts that come with HBase rely on the 
link:http://www.gnu.org/software/bash[GNU Bash] shell.
 
 Windows::
-  Prior to HBase 0.96, testing for running HBase on Microsoft Windows was 
limited.
-  Running a on Windows nodes is not recommended for production systems.
+  Prior to HBase 0.96, running HBase on Microsoft Windows was limited to testing purposes.
+  Running production systems on Windows machines is not recommended.
 
 
 [[hadoop]]
@@ -262,8 +261,8 @@ Because HBase depends on Hadoop, it bundles an instance of 
the Hadoop jar under
 The bundled jar is ONLY for use in standalone mode.
 In distributed mode, it is _critical_ that the version of Hadoop that is out 
on your cluster match what is under HBase.
 Replace the hadoop jar found in the HBase lib directory with the hadoop jar 
you are running on your cluster to avoid version mismatch issues.
-Make sure you replace the jar in HBase everywhere on your cluster.
-Hadoop version mismatch issues have various manifestations but often all looks 
like its hung up.
+Make sure you replace the jar in HBase across your whole cluster.
+Hadoop version mismatch issues have various manifestations, but often they all look like a hang.
 ====
 
 [[dfs.datanode.max.transfer.threads]]
@@ -333,7 +332,7 @@ data must persist across node comings and goings. Writing to
 HDFS where data is replicated ensures the latter.
 
 To configure this standalone variant, edit your _hbase-site.xml_
-setting the _hbase.rootdir_ to point at a directory in your
+setting _hbase.rootdir_ to point at a directory in your
 HDFS instance but then set _hbase.cluster.distributed_
 to _false_. For example:
 
@@ -373,18 +372,18 @@ Some of the information that was originally in this 
section has been moved there
 ====
 
 A pseudo-distributed mode is simply a fully-distributed mode run on a single 
host.
-Use this configuration testing and prototyping on HBase.
-Do not use this configuration for production nor for evaluating HBase 
performance.
+Use this HBase configuration for testing and prototyping purposes only.
+Do not use this configuration for production or for performance evaluation.
 
 [[fully_dist]]
 === Fully-distributed
 
 By default, HBase runs in standalone mode.
 Both standalone mode and pseudo-distributed mode are provided for the purposes 
of small-scale testing.
-For a production environment, distributed mode is appropriate.
+For a production environment, distributed mode is advised.
 In distributed mode, multiple instances of HBase daemons run on multiple 
servers in the cluster.
 
-Just as in pseudo-distributed mode, a fully distributed configuration requires 
that you set the `hbase-cluster.distributed` property to `true`.
+Just as in pseudo-distributed mode, a fully distributed configuration requires 
that you set the `hbase.cluster.distributed` property to `true`.
 Typically, the `hbase.rootdir` is configured to point to a highly-available 
HDFS filesystem.
 
 In addition, the cluster is configured so that multiple cluster nodes enlist 
as RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers.
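As a sketch, the two properties named above might be set in _hbase-site.xml_ as follows (the namenode host and port are placeholders, not values from the guide):

[source,xml]
----
<!-- Sketch only; the namenode host/port are hypothetical. -->
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode.example.org:8020/hbase</value>
</property>
----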
@@ -509,7 +508,7 @@ Just as in Hadoop where you add site-specific HDFS 
configuration to the _hdfs-si
 For the list of configurable properties, see 
<<hbase_default_configurations,hbase default configurations>> below or view the 
raw _hbase-default.xml_ source file in the HBase source code at 
_src/main/resources_.
 
 Not all configuration options make it out to _hbase-default.xml_.
-Configuration that it is thought rare anyone would change can exist only in 
code; the only way to turn up such configurations is via a reading of the 
source code itself.
+Some configurations exist only in the source code; the only way to discover them is by reading the source code itself.
 
 Currently, changes here will require a cluster restart for HBase to notice the 
change.
 // hbase/src/main/asciidoc
@@ -544,7 +543,7 @@ If you are running HBase in standalone mode, you don't need 
to configure anythin
 Since the HBase Master may move around, clients bootstrap by looking to 
ZooKeeper for current critical locations.
 ZooKeeper is where all these values are kept.
 Thus clients require the location of the ZooKeeper ensemble before they can do 
anything else.
-Usually this the ensemble location is kept out in the _hbase-site.xml_ and is 
picked up by the client from the `CLASSPATH`.
+Usually this ensemble location is kept out in the _hbase-site.xml_ and is 
picked up by the client from the `CLASSPATH`.
 
 If you are configuring an IDE to run an HBase client, you should include the 
_conf/_ directory on your classpath so _hbase-site.xml_ settings can be found 
(or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
 
@@ -559,7 +558,7 @@ Minimally, an HBase client needs hbase-client module in its 
dependencies when co
 </dependency>
 ----
 
-An example basic _hbase-site.xml_ for client only might look as follows:
+A basic example _hbase-site.xml_ for a client-only setup might look as follows:
 [source,xml]
 ----
 <?xml version="1.0"?>
@@ -595,7 +594,7 @@ If multiple ZooKeeper instances make up your ZooKeeper 
ensemble, they may be spe
 
 === Basic Distributed HBase Install
 
-Here is an example basic configuration for a distributed ten node cluster:
+Here is a basic configuration example for a distributed ten-node cluster:
 * The nodes are named `example0`, `example1`, etc., through node `example9` in 
this example.
 * The HBase Master and the HDFS NameNode are running on the node `example0`.
 * RegionServers run on nodes `example1`-`example9`.
@@ -706,10 +705,10 @@ See 
link:https://issues.apache.org/jira/browse/HBASE-6389[HBASE-6389 Modify the
 ===== `zookeeper.session.timeout`
 
 The default timeout is three minutes (specified in milliseconds). This means 
that if a server crashes, it will be three minutes before the Master notices 
the crash and starts recovery.
-You might like to tune the timeout down to a minute or even less so the Master 
notices failures the sooner.
-Before changing this value, be sure you have your JVM garbage collection 
configuration under control otherwise, a long garbage collection that lasts 
beyond the ZooKeeper session timeout will take out your RegionServer (You might 
be fine with this -- you probably want recovery to start on the server if a 
RegionServer has been in GC for a long period of time).
+You might need to tune the timeout down to a minute or even less so the Master 
notices failures sooner.
+Before changing this value, be sure you have your JVM garbage collection configuration under control; otherwise, a long garbage collection that lasts
beyond the ZooKeeper session timeout will take out your RegionServer. (You 
might be fine with this -- you probably want recovery to start on the server if 
a RegionServer has been in GC for a long period of time).
 
-To change this configuration, edit _hbase-site.xml_, copy the changed file 
around the cluster and restart.
+To change this configuration, edit _hbase-site.xml_, copy the changed file 
across the cluster and restart.
 
 We set this value high to save our having to field questions up on the mailing 
lists asking why a RegionServer went down during a massive import.
 The usual cause is that their JVM is untuned and they are running into long GC 
pauses.
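As an illustration of the tuning discussed above, lowering the timeout to one minute in _hbase-site.xml_ could look like this (60000 ms is an illustrative value, not a recommendation):

[source,xml]
----
<property>
  <name>zookeeper.session.timeout</name>
  <!-- 60000 ms = 1 minute; the default described above is three minutes -->
  <value>60000</value>
</property>
----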
@@ -725,14 +724,14 @@ See <<zookeeper,zookeeper>>.
 ==== HDFS Configurations
 
 [[dfs.datanode.failed.volumes.tolerated]]
-===== dfs.datanode.failed.volumes.tolerated
+===== `dfs.datanode.failed.volumes.tolerated`
 
 This is the "...number of volumes that are allowed to fail before a DataNode 
stops offering service.
 By default any volume failure will cause a datanode to shutdown" from the 
_hdfs-default.xml_ description.
 You might want to set this to about half the amount of your available disks.
 
-[[hbase.regionserver.handler.count_description]]
-==== `hbase.regionserver.handler.count`
+[[hbase.regionserver.handler.count]]
+===== `hbase.regionserver.handler.count`
 
 This setting defines the number of threads that are kept open to answer 
incoming requests to user tables.
 The rule of thumb is to keep this number low when the payload per request 
approaches the MB (big puts, scans using a large cache) and high when the 
payload is small (gets, small puts, ICVs, deletes). The total size of the 
queries in progress is limited by the setting 
`hbase.ipc.server.max.callqueue.size`.
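Following the rule of thumb above, raising the handler count for a small-payload workload might look like the following sketch (the value 60 is hypothetical):

[source,xml]
----
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- Hypothetical value for a workload of many small gets/puts -->
  <value>60</value>
</property>
----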
@@ -748,7 +747,7 @@ You can get a sense of whether you have too little or too 
many handlers by <<rpc
 ==== Configuration for large memory machines
 
 HBase ships with a reasonable, conservative configuration that will work on 
nearly all machine types that people might want to test with.
-If you have larger machines -- HBase has 8G and larger heap -- you might the 
following configuration options helpful.
+If you have larger machines -- HBase has 8G and larger heap -- you might find 
the following configuration options helpful.
 TODO.
 
 [[config.compression]]
@@ -773,10 +772,10 @@ However, as all memstores are not expected to be full all 
the time, less WAL fil
 [[disable.splitting]]
 ==== Managed Splitting
 
-HBase generally handles splitting your regions, based upon the settings in 
your _hbase-default.xml_ and _hbase-site.xml_          configuration files.
+HBase generally handles splitting of your regions based upon the settings in 
your _hbase-default.xml_ and _hbase-site.xml_          configuration files.
 Important settings include `hbase.regionserver.region.split.policy`, 
`hbase.hregion.max.filesize`, `hbase.regionserver.regionSplitLimit`.
 A simplistic view of splitting is that when a region grows to 
`hbase.hregion.max.filesize`, it is split.
-For most use patterns, most of the time, you should use automatic splitting.
+For most usage patterns, you should use automatic splitting.
 See <<manual_region_splitting_decisions,manual region splitting decisions>> 
for more information about manual region splitting.
 
 Instead of allowing HBase to split your regions automatically, you can choose 
to manage the splitting yourself.
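One way to effectively take over splitting, using the settings named above, is to pin a constant-size split policy and raise the split size so splits rarely trigger; a sketch (the size value is illustrative):

[source,xml]
----
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- ~100 GB, an illustrative value large enough that splits rarely trigger -->
  <value>107374182400</value>
</property>
----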
@@ -802,8 +801,8 @@ It is better to err on the side of too few regions and 
perform rolling splits la
 The optimal number of regions depends upon the largest StoreFile in your 
region.
 The size of the largest StoreFile will increase with time if the amount of 
data grows.
 The goal is for the largest region to be just large enough that the compaction 
selection algorithm only compacts it during a timed major compaction.
-Otherwise, the cluster can be prone to compaction storms where a large number 
of regions under compaction at the same time.
-It is important to understand that the data growth causes compaction storms, 
and not the manual split decision.
+Otherwise, the cluster can be prone to compaction storms with a large number 
of regions under compaction at the same time.
+It is important to understand that data growth causes compaction storms, not the manual split decision.
 
 If the regions are split into too many large regions, you can increase the 
major compaction interval by configuring `HConstants.MAJOR_COMPACTION_PERIOD`.
 HBase 0.90 introduced `org.apache.hadoop.hbase.util.RegionSplitter`, which 
provides a network-IO-safe rolling split of all regions.
@@ -863,9 +862,9 @@ You might also see the graphs on the tail of 
link:https://issues.apache.org/jira
 This section is about configurations that will make servers come back faster 
after a fail.
 See the Deveraj Das and Nicolas Liochon blog post 
link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction
 to HBase Mean Time to Recover (MTTR)] for a brief introduction.
 
-The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 
forces Namenode into loop with lease recovery requests] is messy but has a 
bunch of good discussion toward the end on low timeouts and how to effect 
faster recovery including citation of fixes added to HDFS. Read the Varun 
Sharma comments.
+The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 
forces Namenode into loop with lease recovery requests] is messy but has a 
bunch of good discussion toward the end on low timeouts and how to cause faster 
recovery including citation of fixes added to HDFS. Read the Varun Sharma 
comments.
 The below suggested configurations are Varun's suggestions distilled and 
tested.
-Make sure you are running on a late-version HDFS so you have the fixes he 
refers too and himself adds to HDFS that help HBase MTTR (e.g.
+Make sure you are running on a late-version HDFS so you have the fixes he refers to, and those he himself added to HDFS, that help HBase MTTR (e.g.
 HDFS-3703, HDFS-3712, and HDFS-4791 -- Hadoop 2 for sure has them and late 
Hadoop 1 has some). Set the following in the RegionServer.
 
 [source,xml]

http://git-wip-us.apache.org/repos/asf/hbase/blob/d53430b5/src/main/asciidoc/_chapters/hbase-default.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc 
b/src/main/asciidoc/_chapters/hbase-default.adoc
index d66cc0d..9b3cfb7 100644
--- a/src/main/asciidoc/_chapters/hbase-default.adoc
+++ b/src/main/asciidoc/_chapters/hbase-default.adoc
@@ -57,7 +57,7 @@ The directory shared by region servers and into
     HDFS directory '/hbase' where the HDFS instance's namenode is
     running at namenode.example.org on port 9000, set this value to:
     hdfs://namenode.example.org:9000/hbase.  By default, we write
-    to whatever ${hbase.tmp.dir} is set too -- usually /tmp --
+    to whatever ${hbase.tmp.dir} is set to -- usually /tmp --
     so change this configuration or else all data will be lost on
     machine restart.
 +
@@ -72,7 +72,7 @@ The directory shared by region servers and into
 The mode the cluster will be in. Possible values are
       false for standalone mode and true for distributed mode.  If
       false, startup will run all HBase and ZooKeeper daemons together
-      in the one JVM.
+      in one JVM.
 +
 .Default
 `false`
@@ -87,11 +87,11 @@ Comma separated list of servers in the ZooKeeper ensemble
     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
     By default this is set to localhost for local and pseudo-distributed modes
     of operation. For a fully-distributed setup, this should be set to a full
-    list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in 
hbase-env.sh
+    list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in 
hbase-env.sh,
     this is the list of servers which hbase will start/stop ZooKeeper on as
     part of cluster start/stop.  Client-side, we will take this list of
     ensemble members and put it together with the hbase.zookeeper.clientPort
-    config. and pass it into zookeeper constructor as the connectString
+    config and pass it into zookeeper constructor as the connectString
     parameter.
 +
 .Default
@@ -259,7 +259,7 @@ Factor to determine the number of call queues.
 Split the call queues into read and write queues.
       The specified interval (which should be between 0.0 and 1.0)
       will be multiplied by the number of call queues.
-      A value of 0 indicate to not split the call queues, meaning that both 
read and write
+      A value of 0 indicates that the call queues are not split, meaning that both read and write
       requests will be pushed to the same set of queues.
       A value lower than 0.5 means that there will be less read queues than 
write queues.
       A value of 0.5 means there will be the same number of read and write 
queues.
@@ -292,7 +292,7 @@ Given the number of read call queues, calculated from the 
total number
       A value lower than 0.5 means that there will be less long-read queues 
than short-read queues.
       A value of 0.5 means that there will be the same number of short-read 
and long-read queues.
       A value greater than 0.5 means that there will be more long-read queues 
than short-read queues
-      A value of 0 or 1 indicate to use the same set of queues for gets and 
scans.
+      A value of 0 or 1 indicates that the same set of queues is used for gets and scans.
 
       Example: Given the total number of read call queues being 8
       a scan.ratio of 0 or 1 means that: 8 queues will contain both long and 
short read requests.
@@ -395,7 +395,7 @@ Maximum size of all memstores in a region server before new
 .Description
 Maximum size of all memstores in a region server before flushes are forced.
       Defaults to 95% of hbase.regionserver.global.memstore.size.
-      A 100% value for this value causes the minimum possible flushing to 
occur when updates are
+      A 100% value for this property causes the minimum possible flushing to 
occur when updates are
       blocked due to memstore limiting.
 +
 .Default
@@ -688,7 +688,7 @@ The maximum number of concurrent tasks a single HTable 
instance will
 The maximum number of concurrent connections the client will
     maintain to a single Region. That is, if there is already
     hbase.client.max.perregion.tasks writes in progress for this region, new 
puts
-    won't be sent to this region until some writes finishes.
+    won't be sent to this region until some writes finish.
 +
 .Default
 `1`
@@ -748,8 +748,8 @@ Client scanner lease period in milliseconds.
 *`hbase.bulkload.retries.number`*::
 +
 .Description
-Maximum retries.  This is maximum number of iterations
-    to atomic bulk loads are attempted in the face of splitting operations
+Maximum retries. This is the maximum number of iterations
+    atomic bulk loads are attempted in the face of splitting operations;
     0 means never give up.
 +
 .Default
@@ -1305,10 +1305,10 @@ This is for the RPC layer to define how long HBase 
client applications
 *`hbase.rpc.shortoperation.timeout`*::
 +
 .Description
-This is another version of "hbase.rpc.timeout". For those RPC operation
+This is another version of "hbase.rpc.timeout". For those RPC operations
         within cluster, we rely on this configuration to set a short timeout 
limitation
-        for short operation. For example, short rpc timeout for region 
server's trying
-        to report to active master can benefit quicker master failover process.
+        for short operations. For example, a short rpc timeout for a region server
+        trying to report to the active master can result in a quicker master failover.
 +
 .Default
 `10000`
@@ -1749,10 +1749,10 @@ How long we wait on dfs lease recovery in total before 
giving up.
 *`hbase.lease.recovery.dfs.timeout`*::
 +
 .Description
-How long between dfs recover lease invocations. Should be larger than the sum 
of
+How long between dfs lease recovery invocations. Should be larger than the sum of
         the time it takes for the namenode to issue a block recovery command 
as part of
-        datanode; dfs.heartbeat.interval and the time it takes for the primary
-        datanode, performing block recovery to timeout on a dead datanode; 
usually
+        datanode dfs.heartbeat.interval and the time it takes for the primary
+        datanode performing block recovery to time out on a dead datanode, usually
         dfs.client.socket-timeout. See the end of HBASE-8389 for more.
 +
 .Default
@@ -2052,7 +2052,7 @@ A comma-separated list of
       be initialized. Then, the Filter will be applied to all user facing jsp
       and servlet web pages.
       The ordering of the list defines the ordering of the filters.
-      The default StaticUserWebFilter add a user principal as defined by the
+      The default StaticUserWebFilter adds a user principal as defined by the
       hbase.http.staticuser.user property.
 
 +
@@ -2107,8 +2107,8 @@ A comma-separated list of
 +
 .Description
 
-      The user name to filter as, on static web filters
-      while rendering content. An example use is the HDFS
+      The user name to filter as on static web filters
+      while rendering content. For example, the HDFS
       web UI (user to be used for browsing files).
 
 +
@@ -2123,7 +2123,7 @@ A comma-separated list of
 The percent of region server RPC threads failed to abort RS.
     -1 Disable aborting; 0 Abort if even a single handler has died;
     0.x Abort only when this percent of handlers have died;
-    1 Abort only all of the handers have died.
+    1 Abort only if all of the handlers have died.
 +
 .Default
 `0.5`
