http://git-wip-us.apache.org/repos/asf/hbase/blob/c0bcf7cc/src/main/asciidoc/compression.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/compression.adoc 
b/src/main/asciidoc/compression.adoc
deleted file mode 100644
index 7909e17..0000000
--- a/src/main/asciidoc/compression.adoc
+++ /dev/null
@@ -1,461 +0,0 @@
-////
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-////
-
-[appendix]
-[[compression]]
-== Compression and Data Block Encoding In HBase(((Compression,Data 
BlockEncoding)))
-:doctype: book
-:numbered:
-:toc: left
-:icons: font
-:experimental:
-:docinfo1:
-
-NOTE: Codecs mentioned in this section are for encoding and decoding data 
blocks or row keys.
-For information about replication codecs, see 
<<cluster.replication.preserving.tags,cluster.replication.preserving.tags>>.
-
-Some of the information in this section is pulled from a 
link:http://search-hadoop.com/m/lL12B1PFVhp1/v=threaded[discussion] on the 
HBase Development mailing list.
-
-HBase supports several different compression algorithms which can be enabled 
on a ColumnFamily.
-Data block encoding attempts to limit duplication of information in keys, 
taking advantage of some of the fundamental designs and patterns of HBase, such 
as sorted row keys and the schema of a given table.
-Compressors reduce the size of large, opaque byte arrays in cells, and can 
significantly reduce the storage space needed to store uncompressed data.
-
-Compressors and data block encoding can be used together on the same 
ColumnFamily.
-
-.Changes Take Effect Upon Compaction
-If you change compression or encoding for a ColumnFamily, the changes take 
effect during compaction.
-
-Some codecs take advantage of capabilities built into Java, such as GZip 
compression. Others rely on native libraries. Native libraries may be available 
as part of Hadoop, such as LZ4. In this case, HBase only needs access to the 
appropriate shared library.
-
-Other codecs, such as Google Snappy, need to be installed first.
-Some codecs are licensed in ways that conflict with HBase's license and cannot 
be shipped as part of HBase.
-
-This section discusses common codecs that are used and tested with HBase.
-No matter what codec you use, be sure to test that it is installed correctly 
and is available on all nodes in your cluster.
-Extra operational steps may be necessary to be sure that codecs are available 
on newly-deployed nodes.
-You can use the <<compression.test,compression.test>> utility to check that a 
given codec is correctly installed.
-
-To configure HBase to use a compressor, see 
<<compressor.install,compressor.install>>.
-To enable a compressor for a ColumnFamily, see 
<<changing.compression,changing.compression>>.
-To enable data block encoding for a ColumnFamily, see 
<<data.block.encoding.enable,data.block.encoding.enable>>.
-
-.Block Compressors
-* none
-* Snappy
-* LZO
-* LZ4
-* GZ
-
-.Data Block Encoding Types
-Prefix::
-  Often, keys are very similar. Specifically, keys often share a common prefix 
and only differ near the end. For instance, one key might be 
[literal]+RowKey:Family:Qualifier0+ and the next key might be 
[literal]+RowKey:Family:Qualifier1+.
-  +
-In Prefix encoding, an extra column is added which holds the length of the 
prefix shared between the current key and the previous key.
-Assuming the first key here is totally different from the key before, its 
prefix length is 0.
-+
-The second key's prefix length is [literal]+23+, since they have the first 23 
characters in common.
-+
-Obviously if the keys tend to have nothing in common, Prefix will not provide 
much benefit.
-+
-The following image shows a hypothetical ColumnFamily with no data block 
encoding.
-+
-.ColumnFamily with No Encoding
-image::data_block_no_encoding.png[]
-+
-Here is the same data with prefix data encoding.
-+
-.ColumnFamily with Prefix Encoding
-image::data_block_prefix_encoding.png[]
-
-Diff::
-  Diff encoding expands upon Prefix encoding.
-  Instead of considering the key sequentially as a monolithic series of bytes, 
each key field is split so that each part of the key can be compressed more 
efficiently.
-+
-Two new fields are added: timestamp and type.
-+
-If the ColumnFamily is the same as the previous row, it is omitted from the 
current row.
-+
-If the key length, value length or type are the same as the previous row, the 
field is omitted.
-+
-In addition, for increased compression, the timestamp is stored as a Diff from 
the previous row's timestamp, rather than being stored in full.
-Given the two row keys in the Prefix example, and given an exact match on 
timestamp and the same type, neither the value length nor the type needs to be 
stored for the second row, and the timestamp value for the second row is just 
0, rather than a full timestamp.
-+
-Diff encoding is disabled by default because writing and scanning are slower 
but more data is cached.
-+
-This image shows the same ColumnFamily from the previous images, with Diff 
encoding.
-+
-.ColumnFamily with Diff Encoding
-image::data_block_diff_encoding.png[]
-
-Fast Diff::
-  Fast Diff works similarly to Diff, but uses a faster implementation. It also 
adds another field which stores a single bit to track whether the data itself 
is the same as the previous row. If it is, the data is not stored again.
-+
-Fast Diff is the recommended codec to use if you have long keys or many 
columns.
-+
-The data format is nearly identical to Diff encoding, so there is not an image 
to illustrate it.
-
-
-Prefix Tree::
-  Prefix tree encoding was introduced as an experimental feature in HBase 0.96.
-  It provides similar memory savings to the Prefix, Diff, and Fast Diff 
encoders, but provides faster random access at a cost of slower encoding speed.
-+
-Prefix Tree may be appropriate for applications that have high block cache hit 
ratios. It introduces new 'tree' fields for the row and column.
-The row tree field contains a list of offsets/references corresponding to the 
cells in that row. This allows for a good deal of compression.
-For more details about Prefix Tree encoding, see 
link:https://issues.apache.org/jira/browse/HBASE-4676[HBASE-4676].
-+
-It is difficult to graphically illustrate a prefix tree, so no image is 
included. See the Wikipedia article for 
link:http://en.wikipedia.org/wiki/Trie[Trie] for more general information about 
this data structure.
-
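The Prefix and Diff descriptions above can be illustrated with a small sketch. It is not HBase's on-disk format or its encoder implementation; the function names are hypothetical and the keys are the two example keys from the Prefix entry. It only shows how a shared-prefix length and a timestamp difference are derived from consecutive sorted keys.

```python
def shared_prefix_len(prev_key: bytes, key: bytes) -> int:
    """Count the leading bytes that key has in common with prev_key."""
    n = 0
    for a, b in zip(prev_key, key):
        if a != b:
            break
        n += 1
    return n


def prefix_encode(keys):
    """Return (prefix_length, remaining_suffix) pairs for sorted keys.

    The first key has no predecessor, so its prefix length is 0.
    """
    encoded = []
    prev = b""
    for key in keys:
        n = shared_prefix_len(prev, key)
        encoded.append((n, key[n:]))
        prev = key
    return encoded


def timestamp_deltas(timestamps):
    """Express every timestamp after the first as a difference from the previous one."""
    return [ts - prev for prev, ts in zip(timestamps, timestamps[1:])]


keys = [b"RowKey:Family:Qualifier0", b"RowKey:Family:Qualifier1"]
encoded = prefix_encode(keys)
deltas = timestamp_deltas([1407400000000, 1407400000000])
```

With the two example keys, the second entry carries a prefix length of 23 and only the final byte as its suffix, and two identical timestamps yield a stored difference of 0.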
-=== Which Compressor or Data Block Encoder To Use
-
-The compression or codec type to use depends on the characteristics of your 
data. Choosing the wrong type could cause your data to take more space rather 
than less, and can have performance implications.
-
-In general, you need to weigh your options between smaller size and faster 
compression/decompression. Following are some general guidelines, expanded from 
a discussion at link:http://search-hadoop.com/m/lL12B1PFVhp1[Documenting 
Guidance on compression and codecs].
-
-* If you have long keys (compared to the values) or many columns, use a prefix 
encoder.
-  FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.
-* If the values are large (and not precompressed, such as images), use a data 
block compressor.
-* Use GZIP for [firstterm]_cold data_, which is accessed infrequently.
-  GZIP compression uses more CPU resources than Snappy or LZO, but provides a 
higher compression ratio.
-* Use Snappy or LZO for [firstterm]_hot data_, which is accessed frequently.
-  Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high 
of a compression ratio.
-* In most cases, enabling Snappy or LZO by default is a good choice, because 
they have a low performance overhead and provide space savings.
-* Before Google released Snappy in 2011, LZO was the default.
-  Snappy has similar qualities to LZO but has been shown to perform better.
-
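As a rough sketch only, the guidelines above can be written as a small decision helper. The function and parameter names are hypothetical, and the mapping is a simplification of the list (for example, it picks Snappy where the text allows Snappy or LZO); treat it as a starting point for testing against your own data, not a rule.

```python
def suggest_settings(long_keys_or_many_columns: bool, cold_data: bool) -> dict:
    """Map the general guidelines to ColumnFamily attribute values.

    - Long keys (relative to values) or many columns: use FAST_DIFF encoding.
    - Cold, infrequently accessed data: GZ (higher ratio, more CPU).
    - Hot, frequently accessed data: SNAPPY (lower ratio, less CPU).
    """
    encoding = "FAST_DIFF" if long_keys_or_many_columns else "NONE"
    compression = "GZ" if cold_data else "SNAPPY"
    return {"DATA_BLOCK_ENCODING": encoding, "COMPRESSION": compression}


hot_wide_table = suggest_settings(long_keys_or_many_columns=True, cold_data=False)
archive_table = suggest_settings(long_keys_or_many_columns=False, cold_data=True)
```

The returned values use the same attribute names and codec names that appear in the shell examples in this appendix.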
-[[hadoop.native.lib]]
-=== Making use of Hadoop Native Libraries in HBase
-
-The Hadoop shared library provides a number of facilities, including 
compression libraries and fast CRC computation. To make these facilities 
available to HBase, do the following. HBase/Hadoop will fall back to 
alternatives if it cannot find the native library versions, or fail outright 
if you ask for an explicit compressor and no alternative is available.
-
-If you see the following in your HBase logs, you know that HBase was unable to 
locate the Hadoop native libraries: 
-[source]
-----
-2014-08-07 09:26:20,139 WARN  [main] util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
-----      
-If the libraries loaded successfully, the WARN message does not show. 
-
-Let's presume your Hadoop shipped with a native library that suits the platform 
you are running HBase on.
-To check if the Hadoop native library is available to HBase, run the following 
tool (available in Hadoop 2.1 and greater): 
-[source]
-----
-$ ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
-2014-08-26 13:15:38,717 WARN  [main] util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
-Native library checking:
-hadoop: false
-zlib:   false
-snappy: false
-lz4:    false
-bzip2:  false
-2014-08-26 13:15:38,863 INFO  [main] util.ExitUtil: Exiting with status 1
-----
-The output above shows that the native Hadoop library is not available in the HBase context. 
-
-To fix the above, either copy the Hadoop native libraries locally or symlink to 
them if the Hadoop and HBase installs are adjacent in the filesystem.
-You could also point at their location by setting the [var]+LD_LIBRARY_PATH+ 
environment variable.
-
-Where the JVM looks for native libraries is "system dependent" (see 
[class]+java.lang.System#loadLibrary(name)+). On Linux, by default, HBase 
looks in [path]_lib/native/PLATFORM_, where [var]+PLATFORM+ is the label for 
the platform your HBase is installed on.
-On a local Linux machine, it seems to be the concatenation of the Java 
properties [var]+os.name+ and [var]+os.arch+ followed by whether 32 or 64 bit.
-HBase prints all of the Java system properties on startup, so you can find 
os.name and os.arch in the log.
-For example: 
-[source]
-----
-...
-2014-08-06 15:27:22,853 INFO  [main] zookeeper.ZooKeeper: Client 
environment:os.name=Linux
-2014-08-06 15:27:22,853 INFO  [main] zookeeper.ZooKeeper: Client 
environment:os.arch=amd64
-...
-----     
-So in this case, the PLATFORM string is [var]+Linux-amd64-64+.
-Copying the Hadoop native libraries or symlinking at 
[path]_lib/native/Linux-amd64-64_ will ensure they are found.
-Check with the Hadoop [path]_NativeLibraryChecker_.
- 
-
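The PLATFORM label described above can be sketched as a simple string concatenation. This is an assumption drawn from the observation in the text ("it seems to be the concatenation"), not from the HBase launch scripts themselves; the function names and the `/opt/hbase` install path are hypothetical.

```python
import posixpath


def platform_label(os_name: str, os_arch: str, bits: int) -> str:
    """Join os.name, os.arch, and the word size the way the text describes."""
    return f"{os_name}-{os_arch}-{bits}"


def native_lib_dir(hbase_home: str, os_name: str, os_arch: str, bits: int) -> str:
    """Directory under the HBase install where native libraries are expected."""
    return posixpath.join(hbase_home, "lib", "native",
                          platform_label(os_name, os_arch, bits))


# Values taken from the log excerpt above: os.name=Linux, os.arch=amd64.
label = platform_label("Linux", "amd64", 64)
target = native_lib_dir("/opt/hbase", "Linux", "amd64", 64)
```

Confirm the resulting directory against the NativeLibraryChecker output rather than relying on the derived name alone.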
-Here is an example of how to point at the Hadoop libs with the [var]+LD_LIBRARY_PATH+ 
environment variable: 
-[source]
-----
-$ LD_LIBRARY_PATH=~/hadoop-2.5.0-SNAPSHOT/lib/native ./bin/hbase --config 
~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
-2014-08-26 13:42:49,332 INFO  [main] bzip2.Bzip2Factory: Successfully loaded & 
initialized native-bzip2 library system-native
-2014-08-26 13:42:49,337 INFO  [main] zlib.ZlibFactory: Successfully loaded & 
initialized native-zlib library
-Native library checking:
-hadoop: true /home/stack/hadoop-2.5.0-SNAPSHOT/lib/native/libhadoop.so.1.0.0
-zlib:   true /lib64/libz.so.1
-snappy: true /usr/lib64/libsnappy.so.1
-lz4:    true revision:99
-bzip2:  true /lib64/libbz2.so.1
-----
-Set the LD_LIBRARY_PATH environment variable in [path]_hbase-env.sh_ when 
starting your HBase. 
-
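When checking many nodes, it can help to summarize the NativeLibraryChecker report automatically. The following is a minimal sketch that parses the text format shown in the two sample runs above; the parser and its name are hypothetical, and it assumes the report keeps the `name: true|false` layout seen in those samples.

```python
SAMPLE_OUTPUT = """\
Native library checking:
hadoop: false
zlib:   false
snappy: false
lz4:    false
bzip2:  false
2014-08-26 13:15:38,863 INFO  [main] util.ExitUtil: Exiting with status 1
"""


def parse_native_check(output: str) -> dict:
    """Return {library_name: loaded?} from NativeLibraryChecker text output."""
    status = {}
    in_report = False
    for line in output.splitlines():
        line = line.strip()
        if line == "Native library checking:":
            in_report = True
            continue
        if not in_report or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Only accept lines whose first value is a boolean; this skips log lines.
        if fields and fields[0] in ("true", "false"):
            status[name.strip()] = fields[0] == "true"
    return status


status = parse_native_check(SAMPLE_OUTPUT)
missing = sorted(name for name, loaded in status.items() if not loaded)
```

A non-empty `missing` list for a node indicates that its library path still needs attention before the corresponding codecs are enabled.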
-=== Compressor Configuration, Installation, and Use
-
-[[compressor.install]]
-==== Configure HBase For Compressors
-
-Before HBase can use a given compressor, its libraries need to be available.
-Due to licensing issues, only GZ compression is available to HBase (via native 
Java libraries) in a default installation.
-Other compression libraries are available via the shared library bundled with 
your hadoop.
-The hadoop native library needs to be findable when HBase starts.
-See 
-
-.Compressor Support On the Master
-
-A new configuration setting was introduced in HBase 0.95, to check the Master 
to determine which data block encoders are installed and configured on it, and 
assume that the entire cluster is configured the same.
-This option, [code]+hbase.master.check.compression+, defaults to 
[literal]+true+.
-This prevents the situation described in 
link:https://issues.apache.org/jira/browse/HBASE-6370[HBASE-6370], where a 
table is created or modified to support a codec that a region server does not 
support, leading to failures that take a long time to occur and are difficult 
to debug. 
-
-If [code]+hbase.master.check.compression+ is enabled, libraries for all 
desired compressors need to be installed and configured on the Master, even if 
the Master does not run a region server.
-
-.Install GZ Support Via Native Libraries
-
-HBase uses Java's built-in GZip support unless the native Hadoop libraries are 
available on the CLASSPATH.
-The recommended way to add libraries to the CLASSPATH is to set the 
environment variable [var]+HBASE_LIBRARY_PATH+ for the user running HBase.
-If native libraries are not available and Java's GZIP is used, [literal]+Got brand-new compressor+ 
reports will be present in the logs.
-See <<brand.new.compressor,brand.new.compressor>>.
-
-[[lzo.compression]]
-.Install LZO Support
-
-HBase cannot ship with LZO because of incompatibility between HBase, which 
uses an Apache Software License (ASL) and LZO, which uses a GPL license.
-See the link:http://wiki.apache.org/hadoop/UsingLzoCompression[Using LZO Compression] 
wiki page for information on configuring LZO support for HBase. 
-
-If you depend upon LZO compression, consider configuring your RegionServers to 
fail to start if LZO is not available.
-See <<hbase.regionserver.codecs,hbase.regionserver.codecs>>.
-
-[[lz4.compression]]
-.Configure LZ4 Support
-
-LZ4 support is bundled with Hadoop.
-Make sure the hadoop shared library (libhadoop.so) is accessible when you 
start HBase.
-After configuring your platform (see 
<<hbase.native.platform,hbase.native.platform>>), you can make a symbolic link 
from HBase to the native Hadoop libraries.
-This assumes the two software installs are colocated.
-For example, if my 'platform' is Linux-amd64-64: 
-[source,bourne]
-----
-$ cd $HBASE_HOME
-$ mkdir lib/native
-$ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-64
-----            
-Use the compression tool to check that LZ4 is installed on all nodes.
-Start up (or restart) HBase.
-Afterward, you can create and alter tables to enable LZ4 as a compression 
codec: 
-----
-hbase(main):003:0> alter 'TestTable', {NAME => 'info', COMPRESSION => 'LZ4'}
-----          
-
-[[snappy.compression.installation]]
-.Install Snappy Support
-
-HBase does not ship with Snappy support because of licensing issues.
-You can install Snappy binaries (for instance, by using +yum install snappy+ 
on CentOS) or build Snappy from source.
-After installing Snappy, search for the shared library, which will be called 
[path]_libsnappy.so.X_ where X is a number.
-If you built from source, copy the shared library to a known location on your 
system, such as [path]_/opt/snappy/lib/_.
-
-In addition to the Snappy library, HBase also needs access to the Hadoop 
shared library, which will be called something like [path]_libhadoop.so.X.Y_, 
where X and Y are both numbers.
-Make note of the location of the Hadoop library, or copy it to the same 
location as the Snappy library.
-
-[NOTE]
-====
-The Snappy and Hadoop libraries need to be available on each node of your 
cluster.
-See <<compression.test,compression.test>> to find out how to test that this is 
the case.
-
-See <<hbase.regionserver.codecs,hbase.regionserver.codecs>> to configure your 
RegionServers to fail to start if a given compressor is not available.
-====
-
-Each of these library locations needs to be added to the environment variable 
[var]+HBASE_LIBRARY_PATH+ for the operating system user that runs HBase.
-You need to restart the RegionServer for the changes to take effect.
-
-[[compression.test]]
-.CompressionTest
-
-You can use the CompressionTest tool to verify that your compressor is 
available to HBase:
-
-----
-
- $ hbase org.apache.hadoop.hbase.util.CompressionTest 
hdfs://host/path/to/hbase snappy
-----
-
-[[hbase.regionserver.codecs]]
-.Enforce Compression Settings On a RegionServer
-
-You can configure a RegionServer so that it will fail to restart if 
compression is configured incorrectly, by adding the option 
hbase.regionserver.codecs to the [path]_hbase-site.xml_, and setting its value 
to a comma-separated list of codecs that need to be available.
-For example, if you set this property to [literal]+lzo,gz+, the RegionServer 
would fail to start unless both compressors were available.
-This would prevent a new server from being added to the cluster without having 
codecs configured properly.
-
-[[changing.compression]]
-==== Enable Compression On a ColumnFamily
-
-To enable compression for a ColumnFamily, use an [code]+alter+ command.
-You do not need to re-create the table or copy data.
-If you are changing codecs, be sure the old codec is still available until all 
the old StoreFiles have been compacted.
-
-.Enabling Compression on a ColumnFamily of an Existing Table using HBaseShell
-====
-----
-
-hbase> disable 'test'
-hbase> alter 'test', {NAME => 'cf', COMPRESSION => 'GZ'}
-hbase> enable 'test'
-----
-====
-
-.Creating a New Table with Compression On a ColumnFamily
-====
-----
-
-hbase> create 'test2', { NAME => 'cf2', COMPRESSION => 'SNAPPY' }
-----
-====
-
-.Verifying a ColumnFamily's Compression Settings
-====
-----
-
-hbase> describe 'test'
-DESCRIPTION                                          ENABLED
- 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE false
- ', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
- VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS
- => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'fa
- lse', BLOCKSIZE => '65536', IN_MEMORY => 'false', B
- LOCKCACHE => 'true'}
-1 row(s) in 0.1070 seconds
-----
-====
-
-==== Testing Compression Performance
-
-HBase includes a tool called LoadTestTool which provides mechanisms to test 
your compression performance.
-You must specify either [literal]+-write+ or [literal]+-update-read+ as your 
first parameter, and if you do not specify another parameter, usage advice is 
printed for each option.
-
-.+LoadTestTool+ Usage
-====
-----
-
-$ bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -h            
-usage: bin/hbase org.apache.hadoop.hbase.util.LoadTestTool <options>
-Options:
- -batchupdate                 Whether to use batch as opposed to separate
-                              updates for every column in a row
- -bloom <arg>                 Bloom filter type, one of [NONE, ROW, ROWCOL]
- -compression <arg>           Compression type, one of [LZO, GZ, NONE, SNAPPY,
-                              LZ4]
- -data_block_encoding <arg>   Encoding algorithm (e.g. prefix compression) to
-                              use for data blocks in the test column family, 
one
-                              of [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE].
- -encryption <arg>            Enables transparent encryption on the test table,
-                              one of [AES]
- -generator <arg>             The class which generates load for the tool. Any
-                              args for this class can be passed as colon
-                              separated after class name
- -h,--help                    Show usage
- -in_memory                   Tries to keep the HFiles of the CF inmemory as 
far
-                              as possible.  Not guaranteed that reads are 
always
-                              served from inmemory
- -init_only                   Initialize the test table only, don't do any
-                              loading
- -key_window <arg>            The 'key window' to maintain between reads and
-                              writes for concurrent write/read workload. The
-                              default is 0.
- -max_read_errors <arg>       The maximum number of read errors to tolerate
-                              before terminating all reader threads. The 
default
-                              is 10.
- -multiput                    Whether to use multi-puts as opposed to separate
-                              puts for every column in a row
- -num_keys <arg>              The number of keys to read/write
- -num_tables <arg>            A positive integer number. When a number n is
-                              speicfied, load test tool  will load n table
-                              parallely. -tn parameter value becomes table name
-                              prefix. Each table name is in format
-                              <tn>_1...<tn>_n
- -read <arg>                  <verify_percent>[:<#threads=20>]
- -regions_per_server <arg>    A positive integer number. When a number n is
-                              specified, load test tool will create the test
-                              table with n regions per server
- -skip_init                   Skip the initialization; assume test table 
already
-                              exists
- -start_key <arg>             The first key to read/write (a 0-based index). 
The
-                              default value is 0.
- -tn <arg>                    The name of the table to read or write
- -update <arg>                <update_percent>[:<#threads=20>][:<#whether to
-                              ignore nonce collisions=0>]
- -write <arg>                 
<avg_cols_per_key>:<avg_data_size>[:<#threads=20>]
- -zk <arg>                    ZK quorum as comma-separated host names without
-                              port numbers
- -zk_root <arg>               name of parent znode in zookeeper
-----
-====
-
-.Example Usage of LoadTestTool
-====
-----
-
-$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 
1000000
-          -read 100:30 -num_tables 1 -data_block_encoding NONE -tn 
load_test_tool_NONE
-----
-====
-
-[[data.block.encoding.enable]]
-== Enable Data Block Encoding
-
-Codecs are built into HBase so no extra configuration is needed.
-Codecs are enabled on a table by setting the [code]+DATA_BLOCK_ENCODING+ 
property.
-Disable the table before altering its DATA_BLOCK_ENCODING setting.
-Following is an example using HBase Shell:
-
-.Enable Data Block Encoding On a Table
-====
-----
-
-hbase> disable 'test'
-hbase> alter 'test', { NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
-Updating all regions with the new schema...
-0/1 regions updated.
-1/1 regions updated.
-Done.
-0 row(s) in 2.2820 seconds
-hbase> enable 'test'
-0 row(s) in 0.1580 seconds
-----
-====
-
-.Verifying a ColumnFamily's Data Block Encoding
-====
-----
-
-hbase> describe 'test'
-DESCRIPTION                                          ENABLED
- 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST true
- _DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE =>
- '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERS
- IONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =
- > 'false', BLOCKSIZE => '65536', IN_MEMORY => 'fals
- e', BLOCKCACHE => 'true'}
-1 row(s) in 0.0650 seconds
-----
-====
-
-:numbered:
-
-ifdef::backend-docbook[]
-[index]
-== Index
-// Generated automatically by the DocBook toolchain.
-endif::backend-docbook[]

http://git-wip-us.apache.org/repos/asf/hbase/blob/c0bcf7cc/src/main/asciidoc/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/configuration.adoc 
b/src/main/asciidoc/configuration.adoc
deleted file mode 100644
index 7e18307..0000000
--- a/src/main/asciidoc/configuration.adoc
+++ /dev/null
@@ -1,1075 +0,0 @@
-////
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-////
-
-[[configuration]]
-= Apache HBase Configuration
-:doctype: book
-:numbered:
-:toc: left
-:icons: font
-:experimental:
-:docinfo1:
-
-This chapter expands upon the <<getting_started,getting started>> chapter to 
further explain configuration of Apache HBase.
-Please read this chapter carefully, especially 
<<basic.prerequisites,basic.prerequisites>> to ensure that your HBase testing 
and deployment goes smoothly, and prevent data loss.
-
-Apache HBase uses the same configuration system as Apache Hadoop.
-All configuration files are located in the [path]_conf/_ directory, which 
needs to be kept in sync for each node on your cluster.
-
-.HBase Configuration Files
-[path]_backup-masters_::
-  Not present by default.
-  A plain-text file which lists hosts on which the Master should start a 
backup Master process, one host per line.
-
-[path]_hadoop-metrics2-hbase.properties_::
-  Used to connect HBase to Hadoop's Metrics2 framework.
-  See the link:http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] 
for more information on Metrics2.
-  Contains only commented-out examples by default.
-
-[path]_hbase-env.cmd_ and [path]_hbase-env.sh_::
-  Script for Windows and Linux / Unix environments to set up the working 
environment for HBase, including the location of Java, Java options, and other 
environment variables.
-  The file contains many commented-out examples to provide guidance.
-
-[path]_hbase-policy.xml_::
-  The default policy configuration file used by RPC servers to make 
authorization decisions on client requests.
-  Only used if HBase security (<<security,security>>) is enabled.
-
-[path]_hbase-site.xml_::
-  The main HBase configuration file.
-  This file specifies configuration options which override HBase's default 
configuration.
-  You can view (but do not edit) the default configuration file at 
[path]_docs/hbase-default.xml_.
-  You can also view the entire effective configuration for your cluster 
(defaults and overrides) in the [label]#HBase Configuration# tab of the HBase 
Web UI.
-
-[path]_log4j.properties_::
-  Configuration file for HBase logging via [code]+log4j+.
-
-[path]_regionservers_::
-  A plain-text file containing a list of hosts which should run a RegionServer 
in your HBase cluster.
-  By default this file contains the single entry [literal]+localhost+.
-  It should contain a list of hostnames or IP addresses, one per line, and 
should only contain [literal]+localhost+ if each node in your cluster will run 
a RegionServer on its [literal]+localhost+ interface.
-
-.Checking XML Validity
-[TIP]
-====
-When you edit XML, it is a good idea to use an XML-aware editor to be sure 
that your syntax is correct and your XML is well-formed.
-You can also use the +xmllint+ utility to check that your XML is 
well-formed.
-By default, +xmllint+ re-flows and prints the XML to standard output.
-To check for well-formedness and only print output if errors exist, use the 
command +xmllint -noout filename.xml+.
-====
-
-.Keep Configuration In Sync Across the Cluster
-[WARNING]
-====
-When running in distributed mode, after you make an edit to an HBase 
configuration, make sure you copy the content of the [path]_conf/_ directory to 
all nodes of the cluster.
-HBase will not do this for you.
-Use +rsync+, +scp+, or another secure mechanism for copying the configuration 
files to your nodes.
-For most configuration, a restart is needed for servers to pick up changes. An 
exception is dynamic configuration, which is described later.
-====
-
-[[basic.prerequisites]]
-== Basic Prerequisites
-
-This section lists required services and some required system configuration. 
-
-.Java
-[cols="2", options="header"]
-|===
-| Java Version | Support
-| JDK 6 | Not Supported
-| JDK 7 | Running with JDK 8 will work but is not well tested.
-| JDK 8 | Running with JDK 8 works but is not well tested. Building with JDK 8 
would require removal of the deprecated remove() method of the PoolMap class 
and is under consideration. See HBASE-7608 for more information about JDK 8 
support.
-|===
-
-NOTE: In HBase 0.98.5 and newer, you must set [var]+JAVA_HOME+ on each node of 
your cluster. [path]_hbase-env.sh_ provides a handy mechanism to do this.
-
-.Operating System Utilities
-ssh::
-  HBase uses the Secure Shell (ssh) command and utilities extensively to 
communicate between cluster nodes. Each server in the cluster must be running 
+ssh+ so that the Hadoop and HBase daemons can be managed. You must 
be able to connect to all nodes via SSH, including the local node, from the 
Master as well as any backup Master, using a shared key rather than a password. 
You can see the basic methodology for such a set-up in Linux or Unix systems at 
<<passwordless.ssh.quickstart,passwordless.ssh.quickstart>>. If your cluster 
nodes use OS X, see the section, 
link:http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH:
 Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
-
-DNS::
-  HBase uses the local hostname to self-report its IP address. Both forward 
and reverse DNS resolving must work in versions of HBase previous to 0.92.0. 
The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] 
tool can be used to verify DNS is working correctly on the cluster. The 
project README file provides detailed instructions on usage.
-
-Loopback IP::
-  Prior to hbase-0.96.0, HBase only used the IP address 
[systemitem]+127.0.0.1+ to refer to [code]+localhost+, and this could not be 
configured.
-  See <<loopback.ip,loopback.ip>>.
-
-NTP::
-  The clocks on cluster nodes should be synchronized. A small amount of 
variation is acceptable, but larger amounts of skew can cause erratic and 
unexpected behavior. Time synchronization is one of the first things to check 
if you see unexplained problems in your cluster. It is recommended that you run 
a Network Time Protocol (NTP) service, or another time-synchronization 
mechanism, on your cluster, and that all nodes look to the same service for 
time synchronization. See the 
link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP 
Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up 
NTP.
-
-Limits on Number of Files and Processes (ulimit)::
-  Apache HBase is a database. It requires the ability to open a large number 
of files at once. Many Linux distributions limit the number of files a single 
user is allowed to open to [literal]+1024+ (or [literal]+256+ on older versions 
of OS X). You can check this limit on your servers by running the command 
+ulimit -n+ when logged in as the user which runs HBase. See 
<<trouble.rs.runtime.filehandles,trouble.rs.runtime.filehandles>> for some of 
the problems you may experience if the limit is too low. You may also notice 
errors such as the following:
-+
-----
-2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
-2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
-----
-+
-It is recommended to raise the ulimit to at least 10,000, but more likely 
10,240, because the value is usually expressed in multiples of 1024. Each 
ColumnFamily has at least one StoreFile, and possibly more than 6 StoreFiles if 
the region is under load. The number of open files required depends upon the 
number of ColumnFamilies and the number of regions. The following is a rough 
formula for calculating the potential number of open files on a RegionServer.
-+
-.Calculate the Potential Number of Open Files
-----
-(ColumnFamilies per region) x (StoreFiles per ColumnFamily) x (regions per RegionServer)
-----
-+
-For example, assuming that a schema had 3 ColumnFamilies per region with an 
average of 3 StoreFiles per ColumnFamily, and there are 100 regions per 
RegionServer, the JVM will open `3 * 3 * 100 = 900` file descriptors, not 
counting open JAR files, configuration files, and others. Opening a file does 
not take many resources, and the risk of allowing a user to open too many files 
is minimal.
-+
-Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the +ulimit -u+ command. This should not be confused with the +nproc+ command, which reports the number of processing units available. Under load, a +ulimit -u+ value that is too low can cause OutOfMemoryError exceptions. See Jack Levin's "major hdfs issues" thread on the hbase-users mailing list, from 2011.
-+
-Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read on setting configuration on your Hadoop cluster is Aaron Kimball's "Configuration Parameters: What can you just ignore?"
-+ 
-.`ulimit` Settings on Ubuntu 
-====
-To configure ulimit settings on Ubuntu, edit /etc/security/limits.conf, which 
is a space-delimited file with four columns. Refer to the man page for 
limits.conf for details about the format of this file. In the following 
example, the first line sets both soft and hard limits for the number of open 
files (nofile) to 32768 for the operating system user with the username hadoop. 
The second line sets the number of processes to 32000 for the same user.
-----
-hadoop  -       nofile  32768
-hadoop  -       nproc   32000
-----
-The settings are only applied if the Pluggable Authentication Module (PAM) 
environment is directed to use them. To configure PAM to use these limits, be 
sure that the /etc/pam.d/common-session file contains the following line:
-----
-session required  pam_limits.so
-----
-====
-
-Windows::
-  Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
-  Running HBase on Windows nodes is not recommended for production systems.
-
-
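The open-file arithmetic from the ulimit discussion above can be sketched in a few lines of Java. This is only an illustration: the class and method names (`OpenFileEstimate`, `estimateStoreFiles`, `roundUpToMultipleOf1024`) are hypothetical and are not part of HBase, and the estimate covers StoreFiles only, not JAR files, configuration files, or other descriptors.

```java
// Hypothetical helper, not part of HBase: estimates StoreFile descriptors per RegionServer.
class OpenFileEstimate {

    // ColumnFamilies per region x StoreFiles per ColumnFamily x regions per RegionServer.
    static long estimateStoreFiles(int columnFamiliesPerRegion,
                                   int storeFilesPerColumnFamily,
                                   int regionsPerRegionServer) {
        return (long) columnFamiliesPerRegion
                * storeFilesPerColumnFamily
                * regionsPerRegionServer;
    }

    // Rounds a desired limit up to the next multiple of 1024.
    static long roundUpToMultipleOf1024(long desiredLimit) {
        return ((desiredLimit + 1023) / 1024) * 1024;
    }

    public static void main(String[] args) {
        // Schema from the example: 3 ColumnFamilies, 3 StoreFiles each, 100 regions.
        System.out.println(estimateStoreFiles(3, 3, 100));   // prints 900
        // A target of 10,000 rounds up to the 10,240 mentioned in the text.
        System.out.println(roundUpToMultipleOf1024(10_000)); // prints 10240
    }
}
```

Comparing the estimate with the output of +ulimit -n+ for the user that runs HBase gives a quick sense of how much headroom is left.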
-[[hadoop]]
-=== link:http://hadoop.apache.org[Hadoop](((Hadoop)))
-
-The following table summarizes the versions of Hadoop supported with each 
version of HBase.
-Based on the version of HBase, you should select the most appropriate version 
of Hadoop.
-You can use Apache Hadoop, or a vendor's distribution of Hadoop.
-No distinction is made here.
-See link:http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.
-
-.Hadoop 2.x is recommended.
-[TIP]
-====
-Hadoop 2.x is faster and includes features, such as short-circuit reads, which 
will help improve your HBase random read profile.
-Hadoop 2.x also includes important bug fixes that will improve your overall 
HBase experience.
-HBase 0.98 drops support for Hadoop 1.0, deprecates use of Hadoop 1.1+, and 
HBase 1.0 will not support Hadoop 1.x.
-====
-
-Use the following legend to interpret this table: S = supported, X = not supported, NT = not tested.
-
-.Hadoop version support matrix
-[cols="1,1,1,1,1,1", options="header"]
-|===
-| | HBase-0.92.x | HBase-0.94.x | HBase-0.96.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) | HBase-1.0.x (Hadoop 1.x is NOT supported)
-|Hadoop-0.20.205 | S | X | X | X | X
-|Hadoop-0.22.x | S | X | X | X | X
-|Hadoop-1.0.x | X | X | X | X | X
-|Hadoop-1.1.x | NT | S | S | NT | X
-|Hadoop-0.23.x | X | S | NT | X | X
-|Hadoop-2.0.x-alpha | X | NT | X | X | X
-|Hadoop-2.1.0-beta | X | NT | S | X | X
-|Hadoop-2.2.0 | X | NT | S | S | NT
-|Hadoop-2.3.x | X | NT | S | S | NT
-|Hadoop-2.4.x | X | NT | S | S | S
-|Hadoop-2.5.x | X | NT | S | S | S
-|===
-
-.Replace the Hadoop Bundled With HBase!
-[NOTE]
-====
-Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar 
under its [path]_lib_ directory.
-The bundled jar is ONLY for use in standalone mode.
-In distributed mode, it is _critical_ that the version of Hadoop running on your cluster matches the version bundled under HBase.
-Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues.
-Make sure you replace the jar in HBase everywhere on your cluster.
-Hadoop version mismatch issues have various manifestations, but often the cluster simply appears to hang.
-====
-
-[[hadoop2.hbase_0.94]]
-==== Apache HBase 0.94 with Hadoop 2
-
-To get 0.94.x to run on hadoop 2.2.0, you need to change the hadoop 2 and protobuf versions in the [path]_pom.xml_. Here is a diff with the pom.xml changes:
-
-[source]
-----
-$ svn diff pom.xml
-Index: pom.xml
-===================================================================
---- pom.xml     (revision 1545157)
-+++ pom.xml     (working copy)
-@@ -1034,7 +1034,7 @@
-     <slf4j.version>1.4.3</slf4j.version>
-     <log4j.version>1.2.16</log4j.version>
-     <mockito-all.version>1.8.5</mockito-all.version>
--    <protobuf.version>2.4.0a</protobuf.version>
-+    <protobuf.version>2.5.0</protobuf.version>
-     <stax-api.version>1.0.1</stax-api.version>
-     <thrift.version>0.8.0</thrift.version>
-     <zookeeper.version>3.4.5</zookeeper.version>
-@@ -2241,7 +2241,7 @@
-         </property>
-       </activation>
-       <properties>
--        <hadoop.version>2.0.0-alpha</hadoop.version>
-+        <hadoop.version>2.2.0</hadoop.version>
-         <slf4j.version>1.6.1</slf4j.version>
-       </properties>
-       <dependencies>
-----
-
-The next step is to regenerate the Protobuf files, assuming that Protobuf has been installed:
-
-* Go to the hbase root folder, using the command line;
-* Type the following commands:
-+
-
-[source,bourne]
-----
-$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/hbase.proto
-----
-+
-
-[source,bourne]
-----
-$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/ErrorHandling.proto
-----
-
-
-Build against the hadoop 2 profile by running something like the following command:
-
-----
-$  mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests
-----
-
-[[hadoop.hbase_0.94]]
-==== Apache HBase 0.92 and 0.94
-
-HBase 0.92 and 0.94 versions can work with Hadoop versions 0.20.205, 0.22.x, 1.0.x, and 1.1.x.
-HBase-0.94 can additionally work with Hadoop-0.23.x and 2.x, but you may have to recompile the code using the specific maven profile (see the top-level pom.xml).
-
-[[hadoop.hbase_0.96]]
-==== Apache HBase 0.96
-
-As of Apache HBase 0.96.x, at least Apache Hadoop 1.0.x is required.
-Hadoop 2 is strongly encouraged: it is faster and also has fixes that help MTTR. HBase will no longer run properly on older Hadoop versions such as 0.20.205 or branch-0.20-append.
-Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop. See link:http://search-hadoop.com/m/7vFVx4EsUb2[HBase, mail # dev - DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?]
-
-[[hadoop.older.versions]]
-==== Hadoop versions 0.20.x - 1.x
-
-HBase will lose data unless it is running on an HDFS that has a durable 
[code]+sync+ implementation.
-DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, or Hadoop 0.20.204.0, which DO NOT have this attribute.
-Currently only Hadoop versions 0.20.205.x and later releases (this includes hadoop-1.0.0) have a working, durable sync.
-The Cloudera blog post link:http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/[An update on Apache Hadoop 1.0] by Charles Zedlewski has a nice exposition on how all the Hadoop versions relate.
-It's worth checking out if you are having trouble making sense of the Hadoop version morass.
-
-Sync has to be explicitly enabled by setting [var]+dfs.support.append+ equal to true on both the client side, in [path]_hbase-site.xml_, and on the server side, in [path]_hdfs-site.xml_ (the sync facility HBase needs is a subset of the append code path).
-
-[source,xml]
-----
-  
-<property>
-  <name>dfs.support.append</name>
-  <value>true</value>
-</property>
-----
-
-You will have to restart your cluster after making this edit.
-Ignore the chicken-little comment you'll find in the [path]_hdfs-default.xml_ 
in the description for the [var]+dfs.support.append+ configuration. 
-
-[[hadoop.security]]
-==== Apache HBase on Secure Hadoop
-
-Apache HBase will run on any Hadoop 0.20.x that incorporates Hadoop security 
features as long as you do as suggested above and replace the Hadoop jar that 
ships with HBase with the secure version.
-If you want to read more about how to set up Secure HBase, see <<hbase.secure.configuration,hbase.secure.configuration>>.
-
-[[dfs.datanode.max.transfer.threads]]
-==== [var]+dfs.datanode.max.transfer.threads+ (((dfs.datanode.max.transfer.threads)))
-
-An HDFS datanode has an upper bound on the number of files that it will serve 
at any one time.
-Before doing any loading, make sure you have configured Hadoop's 
[path]_conf/hdfs-site.xml_, setting the 
[var]+dfs.datanode.max.transfer.threads+ value to at least the following: 
-
-[source,xml]
-----
-
-<property>
-  <name>dfs.datanode.max.transfer.threads</name>
-  <value>4096</value>
-</property>
-----
-
-Be sure to restart your HDFS after making the above configuration.
-
-Not having this configuration in place makes for strange-looking failures.
-One manifestation is a complaint about missing blocks.
-For example:
-
-----
-10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
-----
-
-See also <<casestudies.max.transfer.threads,casestudies.max.transfer.threads>> and note that this property was previously known as [var]+dfs.datanode.max.xcievers+ (e.g. link:http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html[Hadoop HDFS: Deceived by Xciever]).
-
-[[zookeeper.requirements]]
-=== ZooKeeper Requirements
-
-ZooKeeper 3.4.x is required as of HBase 1.0.0.
-HBase makes use of the [method]+multi+ functionality that is only available since 3.4.0 (+useMulti+ defaults to true in HBase 1.0.0). See link:[HBASE-12241 The crash of regionServer when taking deadserver's replication queue breaks replication] and link:[Use ZK.multi when available for HBASE-6710 0.92/0.94 compatibility fix] for background.
-
-[[standalone_dist]]
-== HBase run modes: Standalone and Distributed
-
-HBase has two run modes: <<standalone,standalone>> and 
<<distributed,distributed>>.
-Out of the box, HBase runs in standalone mode.
-Whatever your mode, you will need to configure HBase by editing files in the HBase [path]_conf_ directory.
-At a minimum, you must edit [code]+conf/hbase-env.sh+ to tell HBase which 
+java+ to use.
-In this file you set HBase environment variables such as the heapsize and 
other options for the +JVM+, the preferred location for log files, etc.
-Set [var]+JAVA_HOME+ to point at the root of your +java+ install.
-
-[[standalone]]
-=== Standalone HBase
-
-This is the default mode.
-Standalone mode is what is described in the <<quickstart,quickstart>> section.
-In standalone mode, HBase does not use HDFS (it uses the local filesystem instead), and it runs all HBase daemons and a local ZooKeeper in the same JVM.
-ZooKeeper binds to a well-known port so clients may talk to HBase.
-
-=== Distributed
-
-Distributed mode can be subdivided into _pseudo-distributed_, where all daemons run on a single node, and _fully-distributed_, where the daemons are spread across all nodes in the cluster.
-The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.
-
-Pseudo-distributed mode can run against the local filesystem or it can run 
against an instance of the _Hadoop Distributed File System_ (HDFS). 
Fully-distributed mode can ONLY run on HDFS.
-See the Hadoop link:http://hadoop.apache.org/common/docs/r1.1.1/api/overview-summary.html#overview_description[requirements and instructions] for how to set up HDFS for Hadoop 1.x.
-A good walk-through for setting up HDFS on Hadoop 2 is at 
link:http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.
-
-Below we describe the different distributed setups.
-Starting, verifying, and exploring your install, whether in a _pseudo-distributed_ or _fully-distributed_ configuration, is described in a later section, <<confirm,confirm>>.
-The same verification script applies to both deploy types.
-
-[[pseudo]]
-==== Pseudo-distributed
-
-.Pseudo-Distributed Quickstart
-[NOTE]
-====
-A quickstart has been added to the <<quickstart,quickstart>> chapter.
-See <<quickstart_pseudo,quickstart-pseudo>>.
-Some of the information that was originally in this section has been moved 
there.
-====
-
-A pseudo-distributed mode is simply a fully-distributed mode run on a single host.
-Use this configuration for testing and prototyping on HBase.
-Do not use this configuration for production or for evaluating HBase performance.
-
-[[fully_dist]]
-=== Fully-distributed
-
-By default, HBase runs in standalone mode.
-Both standalone mode and pseudo-distributed mode are provided for the purposes 
of small-scale testing.
-For a production environment, distributed mode is appropriate.
-In distributed mode, multiple instances of HBase daemons run on multiple 
servers in the cluster.
-
-Just as in pseudo-distributed mode, a fully distributed configuration requires that you set the [code]+hbase.cluster.distributed+ property to [literal]+true+.
-Typically, the [code]+hbase.rootdir+ is configured to point to a 
highly-available HDFS filesystem. 
-
-In addition, the cluster is configured so that multiple cluster nodes enlist 
as RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers.
-These configuration basics are all demonstrated in 
<<quickstart_fully_distributed,quickstart-fully-distributed>>.
-
-.Distributed RegionServers
-Typically, your cluster will contain multiple RegionServers all running on different servers, as well as primary and backup Master and ZooKeeper daemons.
-The [path]_conf/regionservers_ file on the master server contains a list of 
hosts whose RegionServers are associated with this cluster.
-Each host is on a separate line.
-All hosts listed in this file will have their RegionServer processes started 
and stopped when the master server starts or stops.
-
-.ZooKeeper and HBase
-See section <<zookeeper,zookeeper>> for ZooKeeper setup for HBase.
-
-.Example Distributed HBase Cluster
-====
-This is a bare-bones [path]_conf/hbase-site.xml_ for a distributed HBase 
cluster.
-A cluster that is used for real-world work would contain more custom 
configuration parameters.
-Most HBase configuration directives have default values, which are used unless 
the value is overridden in the [path]_hbase-site.xml_.
-See <<config.files,config.files>> for more information.
-
-[source,xml]
-----
-
-<configuration>
-  <property>
-    <name>hbase.rootdir</name>
-    <value>hdfs://namenode.example.org:8020/hbase</value>
-  </property>
-  <property>
-    <name>hbase.cluster.distributed</name>
-    <value>true</value>
-  </property>
-  <property>
-    <name>hbase.zookeeper.quorum</name>
-    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
-  </property>
-</configuration>
-----
-
-This is an example [path]_conf/regionservers_ file, which contains a list of 
each node that should run a RegionServer in the cluster.
-These nodes need HBase installed and they need to use the same contents of the [path]_conf/_ directory as the Master server.
-
-[source]
-----
-
-node-a.example.com
-node-b.example.com
-node-c.example.com
-----
-
-This is an example [path]_conf/backup-masters_ file, which contains a list of 
each node that should run a backup Master instance.
-The backup Master instances will sit idle unless the main Master becomes 
unavailable.
-
-[source]
-----
-
-node-b.example.com
-node-c.example.com
-----
-====
-
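The one-host-per-line layout of [path]_conf/regionservers_ and [path]_conf/backup-masters_ is simple enough to check before starting a cluster. The following Java sketch is an illustration only, not HBase code: `HostListFile` and `parseHosts` are hypothetical names, and trimming whitespace and skipping blank lines are choices made here rather than documented HBase behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
class HostListFile {

    // Returns the non-blank lines of a host-list file, trimmed, in file order.
    static List<String> parseHosts(String fileContent) {
        List<String> hosts = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            String host = line.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Same content as the example regionservers file above.
        String regionservers =
                "node-a.example.com\nnode-b.example.com\nnode-c.example.com\n";
        System.out.println(parseHosts(regionservers));
    }
}
```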
-.Distributed HBase Quickstart
-See <<quickstart_fully_distributed,quickstart-fully-distributed>> for a 
walk-through of a simple three-node cluster configuration with multiple 
ZooKeeper, backup HMaster, and RegionServer instances.
-
-.Procedure: HDFS Client Configuration
-. If you have made HDFS client configuration changes on your Hadoop cluster (configuration directives for HDFS clients, as opposed to server-side configurations), you must use one of the following methods to enable HBase to see and use these configuration changes:
-+
-a. Add a pointer to your [var]+HADOOP_CONF_DIR+ to the [var]+HBASE_CLASSPATH+ environment variable in [path]_hbase-env.sh_, or
-b. Add a copy of [path]_hdfs-site.xml_ (or [path]_hadoop-site.xml_) or, better, symlinks, under [path]_${HBASE_HOME}/conf_, or
-c. If you have only a small set of HDFS client configurations, add them to [path]_hbase-site.xml_.
-
-
-An example of such an HDFS client configuration is [var]+dfs.replication+.
-If, for example, you want to run with a replication factor of 5, HBase will create files with the default of 3 unless you do the above to make the configuration available to HBase.
-
-[[confirm]]
-== Running and Confirming Your Installation
-
-Make sure HDFS is running first.
-Start the Hadoop HDFS daemons by running [path]_bin/start-dfs.sh_ in the [var]+HADOOP_HOME+ directory.
-You can ensure it started properly by testing the +put+ and +get+ of files into the Hadoop filesystem.
-HBase does not normally use the mapreduce daemons.
-These do not need to be started.
-
-_If_ you are managing your own ZooKeeper, start it and confirm it's running; otherwise, HBase will start ZooKeeper for you as part of its start process.
-
-Start HBase with the following command:
-
-----
-bin/start-hbase.sh
-----
-
-Run the above from the [var]+HBASE_HOME+ directory.
-
-You should now have a running HBase instance.
-HBase logs can be found in the [path]_logs_ subdirectory.
-Check them out especially if HBase had trouble starting.
-
-HBase also puts up a UI listing vital attributes.
-By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational http server at 16030). If the Master were running on a host named [var]+master.example.org+ on the default port, to see the Master's homepage you'd point your browser at [path]_http://master.example.org:16010_.
-
-Prior to HBase 0.98, the Master UI was deployed on port 60010 by default, and the HBase RegionServers would listen on port 60020 by default and put up an informational http server at 60030.
-
-Once HBase has started, see the <<shell_exercises,shell exercises>> for how to 
create tables, add data, scan your insertions, and finally disable and drop 
your tables.
-
-To stop HBase after exiting the HBase shell, enter:
-
-----
-$ ./bin/stop-hbase.sh
-stopping hbase...............
-----
-
-Shutdown can take a moment to complete.
-It can take longer if your cluster is comprised of many machines.
-If you are running a distributed operation, be sure to wait until HBase has 
shut down completely before stopping the Hadoop daemons.
-
-[[config.files]]
-== Configuration Files
-
-[[hbase.site]]
-=== [path]_hbase-site.xml_ and [path]_hbase-default.xml_
-
-Just as in Hadoop, where you add site-specific HDFS configuration to the [path]_hdfs-site.xml_ file, for HBase, site-specific customizations go into the file [path]_conf/hbase-site.xml_.
-For the list of configurable properties, see 
<<hbase_default_configurations,hbase default configurations>> below or view the 
raw [path]_hbase-default.xml_ source file in the HBase source code at 
[path]_src/main/resources_. 
-
-Not all configuration options make it out to [path]_hbase-default.xml_.
-Configurations that are thought to be rarely changed may exist only in code; the only way to discover them is by reading the source code itself.
-
-Currently, changes here will require a cluster restart for HBase to notice the 
change. 
-
-include::hbase-default.adoc[]
-
-[[hbase.env.sh]]
-=== [path]_hbase-env.sh_
-
-Set HBase environment variables in this file.
-Examples include options to pass to the JVM on start of an HBase daemon, such as heap size and garbage collector configs.
-You can also set the locations of the HBase configuration and log directories, niceness, ssh options, where to locate process pid files, etc.
-Open the file at [path]_conf/hbase-env.sh_ and peruse its content.
-Each option is fairly well documented.
-Add your own environment variables here if you want them read by HBase daemons 
on startup.
-
-Changes here will require a cluster restart for HBase to notice the change. 
-
-[[log4j]]
-=== [path]_log4j.properties_
-
-Edit this file to change the rate at which HBase log files are rolled and to change the level at which HBase logs messages.
-
-Changes here will require a cluster restart for HBase to notice the change, though log levels can be changed for particular daemons via the HBase UI.
-
-[[client_dependencies]]
-=== Client configuration and dependencies connecting to an HBase cluster
-
-If you are running HBase in standalone mode, you don't need to configure anything for your clients to work, provided that they are all on the same machine.
-
-Since the HBase Master may move around, clients bootstrap by looking to 
ZooKeeper for current critical locations.
-ZooKeeper is where all these values are kept.
-Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
-Usually the ensemble location is kept in [path]_hbase-site.xml_ and is picked up by the client from the [var]+CLASSPATH+.
-
-If you are configuring an IDE to run an HBase client, you should include the [path]_conf/_ directory on your classpath so [path]_hbase-site.xml_ settings can be found (or add [path]_src/test/resources_ to pick up the hbase-site.xml used by tests).
-
-Minimally, a client of HBase needs several libraries in its [var]+CLASSPATH+ 
when connecting to a cluster, including: 
-[source]
-----
-
-commons-configuration (commons-configuration-1.6.jar)
-commons-lang (commons-lang-2.5.jar)
-commons-logging (commons-logging-1.1.1.jar)
-hadoop-core (hadoop-core-1.0.0.jar)
-hbase (hbase-0.92.0.jar)
-log4j (log4j-1.2.16.jar)
-slf4j-api (slf4j-api-1.5.8.jar)
-slf4j-log4j (slf4j-log4j12-1.5.8.jar)
-zookeeper (zookeeper-3.4.2.jar)
-----      
-
-An example basic [path]_hbase-site.xml_ for client only might look as follows: 
-[source,xml]
-----
-
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-  <property>
-    <name>hbase.zookeeper.quorum</name>
-    <value>example1,example2,example3</value>
-    <description>Comma-separated list of servers in the ZooKeeper ensemble.
-    </description>
-  </property>
-</configuration>
-----      
-
-[[java.client.config]]
-==== Java client configuration
-
-The configuration used by a Java client is kept in an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
-
-The factory method on HBaseConfiguration, 
[code]+HBaseConfiguration.create();+, on invocation, will read in the content 
of the first [path]_hbase-site.xml_ found on the client's [var]+CLASSPATH+, if 
one is present (Invocation will also factor in any [path]_hbase-default.xml_ 
found; an hbase-default.xml ships inside the [path]_hbase.X.X.X.jar_). It is 
also possible to specify configuration directly without having to read from a 
[path]_hbase-site.xml_.
-For example, to set the ZooKeeper ensemble for the cluster programmatically, do as follows:
-
-[source,java]
-----
-Configuration config = HBaseConfiguration.create();
-config.set("hbase.zookeeper.quorum", "localhost");  // Here we are running zookeeper locally
-----          
-
-If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the [path]_hbase-site.xml_ file). This populated [class]+Configuration+ instance can then be passed to an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html[HTable], and so on.
-
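When the ensemble has several members, the comma-separated value can be assembled in code before it is handed to `config.set(...)`. The sketch below uses only the JDK so that it runs without the HBase jars; `QuorumString` and `join` are hypothetical names, and the host names are the placeholder hosts used in the examples in this chapter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the HBase client API.
class QuorumString {

    // Builds the comma-separated host list expected by hbase.zookeeper.quorum.
    static String join(List<String> zookeeperHosts) {
        return String.join(",", zookeeperHosts);
    }

    public static void main(String[] args) {
        String quorum = join(Arrays.asList("example1", "example2", "example3"));
        System.out.println(quorum); // prints example1,example2,example3
        // With the HBase client on the classpath, the value would then be applied
        // as in the snippet above: config.set("hbase.zookeeper.quorum", quorum);
    }
}
```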
-[[example_config]]
-== Example Configurations
-
-=== Basic Distributed HBase Install
-
-Here is an example basic configuration for a distributed ten node cluster.
-The nodes are named [var]+example0+, [var]+example1+, etc., through node 
[var]+example9+ in this example.
-The HBase Master and the HDFS namenode are running on the node [var]+example0+.
-RegionServers run on nodes [var]+example1+-[var]+example9+.
-A 3-node ZooKeeper ensemble runs on [var]+example1+, [var]+example2+, and [var]+example3+ on the default ports.
-ZooKeeper data is persisted to the directory [path]_/export/zookeeper_.
-Below we show what the main configuration files ([path]_hbase-site.xml_, [path]_regionservers_, and [path]_hbase-env.sh_) found in the HBase [path]_conf_ directory might look like.
-
-[[hbase_site]]
-==== [path]_hbase-site.xml_
-
-[source,xml]
-----
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-  <property>
-    <name>hbase.zookeeper.quorum</name>
-    <value>example1,example2,example3</value>
-    <description>Comma-separated list of servers in the ZooKeeper ensemble.
-    </description>
-  </property>
-  <property>
-    <name>hbase.zookeeper.property.dataDir</name>
-    <value>/export/zookeeper</value>
-    <description>Property from ZooKeeper config zoo.cfg.
-    The directory where the snapshot is stored.
-    </description>
-  </property>
-  <property>
-    <name>hbase.rootdir</name>
-    <value>hdfs://example0:8020/hbase</value>
-    <description>The directory shared by RegionServers.
-    </description>
-  </property>
-  <property>
-    <name>hbase.cluster.distributed</name>
-    <value>true</value>
-    <description>The mode the cluster will be in. Possible values are
-      false: standalone and pseudo-distributed setups with managed ZooKeeper
-      true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)
-    </description>
-  </property>
-</configuration>
-----
-
-[[regionservers]]
-==== [path]_regionservers_
-
-In this file you list the nodes that will run RegionServers.
-In our case, these nodes are [var]+example1+-[var]+example9+. 
-
-[source]
-----
-
-example1
-example2
-example3
-example4
-example5
-example6
-example7
-example8
-example9
-----
-
-[[hbase_env]]
-==== [path]_hbase-env.sh_
-
-The following lines in the [path]_hbase-env.sh_ file show how to set the 
[var]+JAVA_HOME+ environment variable (required for HBase 0.98.5 and newer) and 
set the heap to 4 GB (rather than the default value of 1 GB). If you copy and 
paste this example, be sure to adjust the [var]+JAVA_HOME+ to suit your 
environment.
-
-----
-
-# The java implementation to use.
-export JAVA_HOME=/usr/java/jdk1.7.0/          
-
-# The maximum amount of heap to use, in MB. Default is 1000.
-export HBASE_HEAPSIZE=4096
-----
-
-Use +rsync+ to copy the content of the [path]_conf_ directory to all nodes of the cluster.
-
-[[important_configurations]]
-== The Important Configurations
-
-Below we list the _important_ configurations.
-We've divided this section into required configurations and worth-a-look recommended configurations.
-
-[[required_configuration]]
-=== Required Configurations
-
-Review the <<os,os>> and <<hadoop,hadoop>> sections. 
-
-[[big.cluster.config]]
-==== Big Cluster Configurations
-
-If you have a cluster with a lot of regions, it is possible that an eager-beaver RegionServer checks in soon after the Master starts while all the rest in the cluster lag behind, and this first server to check in will be assigned all regions.
-If there are lots of regions, this first server could buckle under the load.
-To prevent this scenario, raise [var]+hbase.master.wait.on.regionservers.mintostart+ from its default value of 1.
-See link:https://issues.apache.org/jira/browse/HBASE-6389[HBASE-6389 Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments] for more detail.
-
-[[backup.master.fail.fast]]
-==== If you have a backup Master, make the primary Master fail fast
-
-If the primary Master loses its connection with ZooKeeper, it will fall into a loop where it keeps trying to reconnect.
-Disable this functionality if you are running more than one Master, that is, if you have a backup Master.
-If you fail to do so, the dying Master may continue to receive RPCs even though another Master has assumed the role of primary.
-See the configuration 
<<fail.fast.expired.active.master,fail.fast.expired.active.master>>. 
-
-=== Recommended Configurations
-
-[[recommended_configurations.zk]]
-==== ZooKeeper Configuration
-
-[[sect.zookeeper.session.timeout]]
-===== [var]+zookeeper.session.timeout+
-
-The default timeout is three minutes (specified in milliseconds). This means 
that if a server crashes, it will be three minutes before the Master notices 
the crash and starts recovery.
-You might like to tune the timeout down to a minute or even less so that the Master notices failures sooner.
-Before changing this value, be sure you have your JVM garbage collection configuration under control; otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer. (You might be fine with this: you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time.)
-
-To change this configuration, edit [path]_hbase-site.xml_, copy the changed 
file around the cluster and restart.
-
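Because the property is expressed in milliseconds, it is easy to be off by a factor of 1000 when editing [path]_hbase-site.xml_. A small Java sketch of the conversion, using nothing beyond the JDK (`SessionTimeout` and `minutesToMillis` are hypothetical names, not HBase APIs):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase.
class SessionTimeout {

    // Converts a timeout in minutes to the millisecond value used in hbase-site.xml.
    static long minutesToMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    public static void main(String[] args) {
        System.out.println(minutesToMillis(3)); // prints 180000 (the three-minute default)
        System.out.println(minutesToMillis(1)); // prints 60000 (a one-minute timeout)
    }
}
```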
-We set this value high to save our having to field noob questions up on the 
mailing lists asking why a RegionServer went down during a massive import.
-The usual cause is that their JVM is untuned and they are running into long GC 
pauses.
-Our thinking is that while users are getting familiar with HBase, we'd save 
them having to know all of its intricacies.
-Later when they've built some confidence, then they can play with 
configuration such as this. 
-
-[[zookeeper.instances]]
-===== Number of ZooKeeper Instances
-
-See <<zookeeper,zookeeper>>. 
-
-[[recommended.configurations.hdfs]]
-==== HDFS Configurations
-
-[[dfs.datanode.failed.volumes.tolerated]]
-===== dfs.datanode.failed.volumes.tolerated
-
-This is the "...number of volumes that are allowed to fail before a datanode 
stops offering service.
-By default any volume failure will cause a datanode to shutdown" from the 
[path]_hdfs-default.xml_ description.
-If you have more than three or four disks, you might want to set this to 1 or, if you have many disks, two or more.
-
-[[hbase.regionserver.handler.count_description]]
-==== [var]+hbase.regionserver.handler.count+
-
-This setting defines the number of threads that are kept open to answer 
incoming requests to user tables.
-The rule of thumb is to keep this number low when the payload per request 
approaches the MB (big puts, scans using a large cache) and high when the 
payload is small (gets, small puts, ICVs, deletes). The total size of the 
queries in progress is limited by the setting 
"hbase.ipc.server.max.callqueue.size". 
-
-It is safe to set that number to the maximum number of incoming clients if 
their payload is small, the typical example being a cluster that serves a 
website since puts aren't typically buffered and most of the operations are 
gets. 
-
-The reason why it is dangerous to keep this setting high is that the aggregate 
size of all the puts that are currently happening in a region server may impose 
too much pressure on its memory, or even trigger an OutOfMemoryError.
-A region server running on low memory will trigger its JVM's garbage collector 
to run more frequently up to a point where GC pauses become noticeable (the 
reason being that all the memory used to keep all the requests' payloads cannot 
be trashed, no matter how hard the garbage collector tries). After some time, 
the overall cluster throughput is affected since every request that hits that 
region server will take longer, which exacerbates the problem even more. 
-
-You can get a sense of whether you have too few or too many handlers by enabling <<rpc.logging,rpc.logging>> on an individual RegionServer and then tailing its logs (queued requests consume memory).
-
-[[big_memory]]
-==== Configuration for large memory machines
-
-HBase ships with a reasonable, conservative configuration that will work on 
nearly all machine types that people might want to test with.
-If you have larger machines -- HBase has 8G and larger heap -- you might find the following configuration options helpful.
-TODO. 
-
-[[config.compression]]
-==== Compression
-
-You should consider enabling ColumnFamily compression.
-There are several options that are near-frictionless and in almost all cases boost performance by reducing the size of StoreFiles and thus reducing I/O.
-
-See <<compression,compression>> for more information.
-
-[[config.wals]]
-==== Configuring the size and number of WAL files
-
-HBase uses <<wal,wal>> to recover the memstore data that has not been flushed 
to disk in case of an RS failure.
-These WAL files should be configured to be slightly smaller than the HDFS block size (by default, an HDFS block is 64 MB and a WAL file is ~60 MB).
-
-HBase also has a limit on number of WAL files, designed to ensure there's 
never too much data that needs to be replayed during recovery.
-This limit needs to be set according to memstore configuration, so that all 
the necessary data would fit.
-It is recommended to allocate enough WAL files to store at least that much data (when all memstores are close to full). For example, with a 16 GB RS heap, default memstore settings (0.4), and the default WAL file size (~60 MB), the starting point for the WAL file count is 16 GB * 0.4 / 60 MB, or ~109.
-However, as all memstores are not expected to be full all the time, fewer WAL files can be allocated.
-
-[[disable.splitting]]
-==== Managed Splitting
-
-HBase generally handles splitting your regions, based upon the settings in your [path]_hbase-default.xml_ and [path]_hbase-site.xml_ configuration files.
-Important settings include [var]+hbase.regionserver.region.split.policy+, 
[var]+hbase.hregion.max.filesize+, [var]+hbase.regionserver.regionSplitLimit+.
-A simplistic view of splitting is that when a region grows to 
[var]+hbase.hregion.max.filesize+, it is split.
-For most use patterns, most of the time, you should use automatic splitting.
-See <<manual_region_splitting_decisions,manual region splitting decisions>> 
for more information about manual region splitting.
-
-Instead of allowing HBase to split your regions automatically, you can choose 
to manage the splitting yourself.
-This feature was added in HBase 0.90.0.
-Manually managing splits works if you know your keyspace well; otherwise, let HBase figure out where to split for you.
-Manual splitting can mitigate region creation and movement under load.
-It also makes it so region boundaries are known and invariant (if you disable region splitting). If you use manual splits, it is easier to run staggered, time-based major compactions that spread out your network IO load.
-
-.Disable Automatic Splitting
-To disable automatic splitting, set [var]+hbase.hregion.max.filesize+ to a very large value, such as [literal]+100 GB+. It is not recommended to set it to its absolute maximum value of [literal]+Long.MAX_VALUE+.
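-
-A sketch of that setting in [path]_hbase-site.xml_, assuming the value is given in bytes and that 100 GB is taken as 100 * 1024^3:
-
-[source,xml]
-----
-<property>
-    <name>hbase.hregion.max.filesize</name>
-    <!-- 100 GB expressed in bytes: 100 * 1024^3 = 107374182400 (illustrative value) -->
-    <value>107374182400</value>
-</property>
-----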
-
-.Automatic Splitting Is Recommended
-[NOTE]
-====
-If you disable automatic splits to diagnose a problem or during a period of 
fast data growth, it is recommended to re-enable them when your situation 
becomes more stable.
-The potential benefits of managing region splits yourself are not undisputed.
-====
-
-.Determine the Optimal Number of Pre-Split Regions
-The optimal number of pre-split regions depends on your application and 
environment.
-A good rule of thumb is to start with 10 pre-split regions per server and 
watch as data grows over time.
-It is better to err on the side of too few regions and perform rolling splits 
later.
-The optimal number of regions depends upon the largest StoreFile in your 
region.
-The size of the largest StoreFile will increase with time if the amount of 
data grows.
-The goal is for the largest region to be just large enough that the compaction 
selection algorithm only compacts it during a timed major compaction.
-Otherwise, the cluster can be prone to compaction storms where a large number of regions are under compaction at the same time.
-It is important to understand that the data growth causes compaction storms, 
and not the manual split decision.
-
-If the regions are split into too many large regions, you can increase the 
major compaction interval by configuring 
[var]+HConstants.MAJOR_COMPACTION_PERIOD+.
-HBase 0.90 introduced [class]+org.apache.hadoop.hbase.util.RegionSplitter+, 
which provides a network-IO-safe rolling split of all regions.
-
-[[managed.compactions]]
-==== Managed Compactions
-
-By default, major compactions are scheduled to run once in a 7-day period.
-Prior to HBase 0.96.x, major compactions were scheduled to happen once per day 
by default.
-
-If you need to control exactly when and how often major compaction runs, you 
can disable managed major compactions.
-See the entry for [var]+hbase.hregion.majorcompaction+ in the 
<<compaction.parameters,compaction.parameters>> table for details.
-
-.Do Not Disable Major Compactions
-[WARNING]
-====
-Major compactions are absolutely necessary for StoreFile clean-up.
-Do not disable them altogether.
-You can run major compactions manually via the HBase shell or via the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29[HBaseAdmin API].
-====
-
-For more information about compactions and the compaction file selection process, see <<compaction,compaction>>.
-
-[[spec.ex]]
-==== Speculative Execution
-
-Speculative Execution of MapReduce tasks is on by default, and for HBase 
clusters it is generally advised to turn off Speculative Execution at a 
system-level unless you need it for a specific case, where it can be configured 
per-job.
-Set the properties [var]+mapreduce.map.speculative+ and 
[var]+mapreduce.reduce.speculative+ to false. 
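-
-As a sketch, the system-level setting could look like this. Placing it in [path]_mapred-site.xml_ is an assumption about your Hadoop layout; the same two properties can instead be set on an individual job's configuration.
-
-[source,xml]
-----
-<property>
-    <name>mapreduce.map.speculative</name>
-    <!-- turn off speculative execution of map tasks -->
-    <value>false</value>
-</property>
-<property>
-    <name>mapreduce.reduce.speculative</name>
-    <!-- turn off speculative execution of reduce tasks -->
-    <value>false</value>
-</property>
-----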
-
-[[other_configuration]]
-=== Other Configurations
-
-[[balancer_config]]
-==== Balancer
-
-The balancer is a periodic operation which is run on the master to 
redistribute regions on the cluster.
-It is configured via [var]+hbase.balancer.period+ and defaults to 300000 (5 
minutes). 
-
-See <<master.processes.loadbalancer,master.processes.loadbalancer>> for more 
information on the LoadBalancer. 
-
-[[disabling.blockcache]]
-==== Disabling Blockcache
-
-Do not turn off block cache (You'd do it by setting [var]+hfile.block.cache.size+ to zero). Currently we do not do well if you do this because the regionserver will spend all its time loading hfile indices over and over again.
-If your working set is such that block cache does you no good, at least size the block cache such that hfile indices will stay up in the cache (you can get a rough idea on the size you need by surveying regionserver UIs; you'll see index block size accounted near the top of the webpage).
-
-[[nagles]]
-==== link:http://en.wikipedia.org/wiki/Nagle's_algorithm[Nagle's] or the small 
package problem
-
-If an occasional delay of around 40ms is seen in operations against HBase, try the Nagle's setting.
-For example, see the user mailing list thread, link:http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&subj=Re+Inconsistent+scan+performance+with+caching+set+to+1[Inconsistent scan performance with caching set to 1] and the issue cited therein where setting tcpnodelay improved scan speeds.
-You might also see the graphs on the tail of link:https://issues.apache.org/jira/browse/HBASE-7008[HBASE-7008 Set scanner caching to a better default] where our Lars Hofhansl tries various data sizes w/ Nagle's on and off measuring the effect.
-
-[[mttr]]
-==== Better Mean Time to Recover (MTTR)
-
-This section is about configurations that will make servers come back faster after a failure.
-See the Devaraj Das and Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
-
-The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery, including citation of fixes added to HDFS.
-Read the Varun Sharma comments.
-The suggested configurations below are Varun's suggestions distilled and tested.
-Make sure you are running on a late-version HDFS so you have the fixes he refers to and himself adds to HDFS that help HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and late hadoop 1 has some).
-Set the following in the RegionServer.
-
-[source,xml]
-----
-
-<property>
-    <name>hbase.lease.recovery.dfs.timeout</name>
-    <value>23000</value>
-    <description>How much time we allow elapse between calls to recover lease.
-    Should be larger than the dfs timeout.</description>
-</property>
-<property>
-    <name>dfs.client.socket-timeout</name>
-    <value>10000</value>
-    <description>Down the DFS timeout from 60 to 10 seconds.</description>
-</property>
-----
-
-And on the namenode/datanode side, set the following to enable 'staleness' 
introduced in HDFS-3703, HDFS-3912. 
-
-[source,xml]
-----
-
-<property>
-    <name>dfs.client.socket-timeout</name>
-    <value>10000</value>
-    <description>Down the DFS timeout from 60 to 10 seconds.</description>
-</property>
-<property>
-    <name>dfs.datanode.socket.write.timeout</name>
-    <value>10000</value>
-    <description>Down the DFS timeout from 8 * 60 to 10 seconds.</description>
-</property>
-<property>
-    <name>ipc.client.connect.timeout</name>
-    <value>3000</value>
-    <description>Down from 60 seconds to 3.</description>
-</property>
-<property>
-    <name>ipc.client.connect.max.retries.on.timeouts</name>
-    <value>2</value>
-    <description>Down from 45 retries to 3 (2 == 3 retries).</description>
-</property>
-<property>
-    <name>dfs.namenode.avoid.read.stale.datanode</name>
-    <value>true</value>
-    <description>Enable stale state in hdfs</description>
-</property>
-<property>
-    <name>dfs.namenode.stale.datanode.interval</name>
-    <value>20000</value>
-    <description>Down from default 30 seconds</description>
-</property>
-<property>
-    <name>dfs.namenode.avoid.write.stale.datanode</name>
-    <value>true</value>
-    <description>Enable stale state in hdfs</description>
-</property>
-----
-
-[[jmx_config]]
-==== JMX
-
-JMX (Java Management Extensions) provides built-in instrumentation that enables you to monitor and manage the Java VM.
-To enable monitoring and management from remote systems, you need to set the system property com.sun.management.jmxremote.port (the port number through which you want to enable JMX RMI connections) when you start the Java VM.
-See the link:http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html[official document] for more information.
-Historically, besides the port mentioned above, JMX opens two additional random TCP listening ports, which could lead to port conflicts. (See link:https://issues.apache.org/jira/browse/HBASE-10289[HBASE-10289] for details.)
-
-As an alternative, you can use the coprocessor-based JMX implementation provided by HBase.
-To enable it in 0.99 or above, add the property below to [path]_hbase-site.xml_:
-
-[source,xml]
-----
-<property>
-    <name>hbase.coprocessor.regionserver.classes</name>
-    <value>org.apache.hadoop.hbase.JMXListener</value>
-</property>
-----          
-
-NOTE: DO NOT set com.sun.management.jmxremote.port for Java VM at the same 
time. 
-
-Currently it supports the Master and RegionServer Java VMs.
-The reason why you only configure the coprocessor for 'regionserver' is that, starting from HBase 0.99, a Master IS also a RegionServer.
-(See link:https://issues.apache.org/jira/browse/HBASE-10569[HBASE-10569] for more information.) By default, JMX listens on TCP port 10102; you can further configure the port using the properties below:
-
-[source,xml]
-----
-
-<property>
-    <name>regionserver.rmi.registry.port</name>
-    <value>61130</value>
-</property>
-<property>
-    <name>regionserver.rmi.connector.port</name>
-    <value>61140</value>
-</property>
-----          
-
-The registry port can be shared with connector port in most cases, so you only 
need to configure regionserver.rmi.registry.port.
-However if you want to use SSL communication, the 2 ports must be configured 
to different values. 
-
-By default, password authentication and SSL communication are disabled.
-To enable password authentication, you need to update [path]_hbase-env.sh_ like below:
-----
-export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.authenticate=true \
-                       -Dcom.sun.management.jmxremote.password.file=your_password_file \
-                       -Dcom.sun.management.jmxremote.access.file=your_access_file"
-
-export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE "
-export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE "
-----          
-
-See example password/access file under $JRE_HOME/lib/management. 
-
-To enable SSL communication with password authentication, follow the steps below:
-
-----
-#1. generate a key pair, stored in myKeyStore
-keytool -genkey -alias jconsole -keystore myKeyStore
-
-#2. export it to file jconsole.cert
-keytool -export -alias jconsole -keystore myKeyStore -file jconsole.cert
-
-#3. copy jconsole.cert to jconsole client machine, import it to jconsoleKeyStore
-keytool -import -alias jconsole -keystore jconsoleKeyStore -file jconsole.cert
-----          
-
-And then update [path]_hbase-env.sh_ like below: 
-
-----
-
-export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=true \
-                       -Djavax.net.ssl.keyStore=/home/tianq/myKeyStore \
-                       -Djavax.net.ssl.keyStorePassword=your_password_in_step_1 \
-                       -Dcom.sun.management.jmxremote.authenticate=true \
-                       -Dcom.sun.management.jmxremote.password.file=your_password_file \
-                       -Dcom.sun.management.jmxremote.access.file=your_access_file"
-
-export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE "
-export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE "
-----          
-
-Finally start jconsole on client using the key store: 
-
-----
-jconsole -J-Djavax.net.ssl.trustStore=/home/tianq/jconsoleKeyStore
-----        
-
-NOTE: For HBase 0.98, to enable the HBase JMX implementation on Master, you also need to add the property below to [path]_hbase-site.xml_:
-
-[source,xml]
-----
-<property>
-    <name>hbase.coprocessor.master.classes</name>
-    <value>org.apache.hadoop.hbase.JMXListener</value>
-</property>
-----          
-
-The corresponding properties for port configuration are master.rmi.registry.port (by default 10101) and master.rmi.connector.port (by default the same as registry.port).
-
-[[dyn_config]]
-== Dynamic Configuration
-
-Since HBase 1.0.0, it is possible to change a subset of the configuration 
without requiring a server restart.
-In the hbase shell, there are new operators, +update_config+ and +update_all_config+, that will prompt a server or all servers to reload configuration.
-
-Only a subset of all configurations can currently be changed in the running 
server.
-Here is an incomplete list: +hbase.regionserver.thread.compaction.large+, 
+hbase.regionserver.thread.compaction.small+, 
+hbase.regionserver.thread.split+, +hbase.regionserver.thread.merge+, as well 
as compaction policy and configurations and adjustment to offpeak hours.
-For the full list consult the patch attached to  
link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting 
Online Config Change from 89-fb]. 
-
-ifdef::backend-docbook[]
-[index]
-== Index
-// Generated automatically by the DocBook toolchain.
-endif::backend-docbook[]
