updating docs from master

Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/a82121b2
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/a82121b2
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/a82121b2

Branch: refs/heads/branch-1.1
Commit: a82121b2ee0388cfe19c80719d095b46e72f921e
Parents: bdebdc9
Author: Nick Dimiduk <ndimi...@apache.org>
Authored: Sun May 8 19:22:50 2016 -0700
Committer: Nick Dimiduk <ndimi...@apache.org>
Committed: Sun May 8 19:22:50 2016 -0700

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/architecture.adoc   | 101 +++++++++++++++++++
 src/main/asciidoc/_chapters/configuration.adoc  |  12 +--
 src/main/asciidoc/_chapters/developer.adoc      |  69 +++++++++++++
 src/main/asciidoc/_chapters/external_apis.adoc  |  20 ++--
 .../asciidoc/_chapters/getting_started.adoc     |   4 +-
 src/main/asciidoc/_chapters/hbase-default.adoc  |  77 +-------------
 src/main/asciidoc/_chapters/ops_mgt.adoc        |   9 +-
 src/main/asciidoc/_chapters/performance.adoc    |  27 +++--
 src/main/asciidoc/_chapters/spark.adoc          |  38 ++++++-
 .../asciidoc/_chapters/troubleshooting.adoc     |  11 +-
 src/main/asciidoc/_chapters/zookeeper.adoc      |  30 +++---
 11 files changed, 272 insertions(+), 126 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/architecture.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/architecture.adoc 
b/src/main/asciidoc/_chapters/architecture.adoc
index 7cc20e5..faa1230 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -2060,6 +2060,107 @@ Why?
 
 NOTE: This information is now included in the configuration parameter table in 
<<compaction.parameters>>.
 
+[[ops.date.tiered]]
+===== Date Tiered Compaction
+
+Date tiered compaction is a date-aware store file compaction strategy that is beneficial for time-range scans of time-series data.
+
+[[ops.date.tiered.when]]
+====== When To Use Date Tiered Compactions
+
+Consider using Date Tiered Compaction when your reads target limited time ranges, especially scans of recent data.
+
+Don't use it for:
+
+* random gets without a limited time range
+* frequent deletes and updates
+* frequent out-of-order data writes that create long tails, especially writes with future timestamps
+* frequent bulk loads with heavily overlapping time ranges
+
+.Performance Improvements
+Performance testing has shown that the performance of time-range scans improves greatly for limited time ranges, especially scans of recent data.
+
+[[ops.date.tiered.enable]]
+====== Enabling Date Tiered Compaction
+
+You can enable Date Tiered compaction for a table or a column family by setting its `hbase.hstore.engine.class` to `org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine`.
+
+You also need to set `hbase.hstore.blockingStoreFiles` to a high number, such as 60, if using all default settings, rather than the default value of 12. If you change the other parameters, use 1.5 to 2 times the projected file count, where projected file count = windows per tier x tier count + incoming window min + files older than max age.
+
+You also need to set `hbase.hstore.compaction.max` to the same value as 
`hbase.hstore.blockingStoreFiles` to unblock major compaction.
+
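+For example, assuming three tiers of data with the default tier settings (4 windows per tier, incoming window minimum of 6) and a couple of files older than the maximum age (the figures here are purely illustrative), the projected file count and the resulting settings might work out roughly as:
+
+----
+projected file count = 4 windows per tier x 3 tiers   (= 12)
+                     + 6 incoming window min
+                     + 2 files older than max age
+                     = 20
+
+hbase.hstore.blockingStoreFiles = 1.5 to 2 x 20, i.e. 30 to 40
+hbase.hstore.compaction.max     = same value as hbase.hstore.blockingStoreFiles
+----
+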
+.Procedure: Enable Date Tiered Compaction
+. Run one of the following commands in the HBase shell.
+  Replace the table name `orders_table` with the name of your table.
++
+[source,sql]
+----
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 
'org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine', 
'hbase.hstore.blockingStoreFiles' => '60', 'hbase.hstore.compaction.min'=>'2', 
'hbase.hstore.compaction.max'=>'60'}
+alter 'orders_table', {NAME => 'blobs_cf', CONFIGURATION => 
{'hbase.hstore.engine.class' => 
'org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine', 
'hbase.hstore.blockingStoreFiles' => '60', 'hbase.hstore.compaction.min'=>'2', 
'hbase.hstore.compaction.max'=>'60'}}
+create 'orders_table', 'blobs_cf', CONFIGURATION => 
{'hbase.hstore.engine.class' => 
'org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine', 
'hbase.hstore.blockingStoreFiles' => '60', 'hbase.hstore.compaction.min'=>'2', 
'hbase.hstore.compaction.max'=>'60'}
+----
+
+. Configure other options if needed.
+  See <<ops.date.tiered.config>> for more information.
+
+.Procedure: Disable Date Tiered Compaction
+. Set the `hbase.hstore.engine.class` option to either nil or 
`org.apache.hadoop.hbase.regionserver.DefaultStoreEngine`.
+  Either option has the same effect.
+  Also set any other options you changed back to their original values.
++
+[source,sql]
+----
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.DefaultStoreEngine', 'hbase.hstore.blockingStoreFiles' => '12', 'hbase.hstore.compaction.min'=>'6', 'hbase.hstore.compaction.max'=>'12'}
+----
+
+When you change the store engine either way, a major compaction will likely be 
performed on most regions.
+This is not necessary on new tables.
+
+[[ops.date.tiered.config]]
+====== Configuring Date Tiered Compaction
+
+Each of the settings for date tiered compaction should be configured at the table or column family level, after disabling the table.
+If you use the HBase shell, the general command pattern is as follows:
+
+[source,sql]
+----
+alter 'orders_table', CONFIGURATION => {'key' => 'value', ..., 'key' => 'value'}
+----
+
+[[ops.date.tiered.config.parameters]]
+.Tier Parameters
+
+You can configure your date tiers by changing the settings for the following 
parameters:
+
+.Date Tier Parameters
+[cols="1,1a", frame="all", options="header"]
+|===
+| Setting
+| Notes
+
+|`hbase.hstore.compaction.date.tiered.max.storefile.age.millis`
+|Files with max-timestamp smaller than this will no longer be compacted. Default at Long.MAX_VALUE.
+
+| `hbase.hstore.compaction.date.tiered.base.window.millis`
+| Base window size in milliseconds. Default at 6 hours.
+
+| `hbase.hstore.compaction.date.tiered.windows.per.tier`
+| Number of windows per tier. Default at 4.
+
+| `hbase.hstore.compaction.date.tiered.incoming.window.min`
+| Minimum number of files to compact in the incoming window. Set it to the expected number of files in the window to avoid wasteful compaction. Default at 6.
+
+| `hbase.hstore.compaction.date.tiered.window.policy.class`
+| The policy used to select store files within the same time window. It does not apply to the incoming window. Defaults to exploring compaction, which avoids wasteful compaction.
+|===
+
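+For example, to use a one-hour base window and to stop compacting files older than one week, you might run something like the following on the sample `orders_table` from above (the values are purely illustrative, not recommendations):
+
+[source,sql]
+----
+# illustrative values: 3600000 ms = 1 hour, 604800000 ms = 7 days
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.compaction.date.tiered.base.window.millis' => '3600000', 'hbase.hstore.compaction.date.tiered.max.storefile.age.millis' => '604800000'}
+----
+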
+[[ops.date.tiered.config.compaction.throttler]]
+.Compaction Throttler
+
+With tiered compaction, all servers in the cluster will promote windows to higher tiers at the same time, so using a compaction throttle is recommended:
+set `hbase.regionserver.throughput.controller` to `org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController`.
+
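+A minimal _hbase-site.xml_ sketch for that setting might look like the following (an illustration only, to be merged into your existing RegionServer configuration):
+
+[source,xml]
+----
+<property>
+  <!-- illustrative snippet; controller class name as given above -->
+  <name>hbase.regionserver.throughput.controller</name>
+  <value>org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController</value>
+</property>
+----
+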
+NOTE: For more information about date tiered compaction, please refer to the 
design specification at 
https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8
 [[ops.stripe]]
 ===== Experimental: Stripe Compactions
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc 
b/src/main/asciidoc/_chapters/configuration.adoc
index 49b0e7d..d705db9 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -222,8 +222,8 @@ Use the following legend to interpret this table:
 |Hadoop-0.23.x | S | X | X | X | X
 |Hadoop-2.0.x-alpha | NT | X | X | X | X
 |Hadoop-2.1.0-beta | NT | X | X | X | X
-|Hadoop-2.2.0 | NT | S | NT | NT | NT
-|Hadoop-2.3.x | NT | S | NT | NT | NT
+|Hadoop-2.2.0 | NT | S | NT | NT | X 
+|Hadoop-2.3.x | NT | S | NT | NT | X 
 |Hadoop-2.4.x | NT | S | S | S | S
 |Hadoop-2.5.x | NT | S | S | S | S
 |Hadoop-2.6.0 | X | X | X | X | X
@@ -411,7 +411,7 @@ Set [var]+JAVA_HOME+ to point at the root of your +java+ 
install.
 This is the default mode.
 Standalone mode is what is described in the <<quickstart,quickstart>> section.
 In standalone mode, HBase does not use HDFS -- it uses the local filesystem 
instead -- and it runs all HBase daemons and a local ZooKeeper all up in the 
same JVM.
-Zookeeper binds to a well known port so clients may talk to HBase.
+ZooKeeper binds to a well known port so clients may talk to HBase.
 
 [[distributed]]
 === Distributed
@@ -453,7 +453,7 @@ In addition, the cluster is configured so that multiple 
cluster nodes enlist as
 These configuration basics are all demonstrated in 
<<quickstart_fully_distributed,quickstart-fully-distributed>>.
 
 .Distributed RegionServers
-Typically, your cluster will contain multiple RegionServers all running on 
different servers, as well as primary and backup Master and Zookeeper daemons.
+Typically, your cluster will contain multiple RegionServers all running on 
different servers, as well as primary and backup Master and ZooKeeper daemons.
 The _conf/regionservers_ file on the master server contains a list of hosts 
whose RegionServers are associated with this cluster.
 Each host is on a separate line.
 All hosts listed in this file will have their RegionServer processes started 
and stopped when the master server starts or stops.
@@ -703,8 +703,8 @@ Below we show what the main configuration files -- 
_hbase-site.xml_, _regionserv
     <name>hbase.cluster.distributed</name>
     <value>true</value>
     <description>The mode the cluster will be in. Possible values are
-      false: standalone and pseudo-distributed setups with managed Zookeeper
-      true: fully-distributed with unmanaged Zookeeper Quorum (see 
hbase-env.sh)
+      false: standalone and pseudo-distributed setups with managed ZooKeeper
+      true: fully-distributed with unmanaged ZooKeeper Quorum (see 
hbase-env.sh)
     </description>
   </property>
 </configuration>

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/developer.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/developer.adoc 
b/src/main/asciidoc/_chapters/developer.adoc
index be328b0..0b284bb 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -1077,6 +1077,75 @@ As most as possible, tests should use the default 
settings for the cluster.
 When they don't, they should document it.
 This will allow to share the cluster later.
 
+[[hbase.tests.example.code]]
+==== Tests Skeleton Code
+
+Here is a test skeleton with categorization and a category-based timeout rule, which you can copy, paste, and use as the basis for your test contribution.
+
+[source,java]
+----
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase;
+
+import static org.junit.Assert.*;
+
+import org.apache.hadoop.hbase.testclassification.SmallTests;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.rules.TestName;
+import org.junit.rules.TestRule;
+
+/**
+ * Skeleton HBase test
+ */
+// NOTICE: See how we've 'categorized' this test. All hbase unit tests need to 
be categorized as
+// either 'small', 'medium', or 'large'. See 
http://hbase.apache.org/book.html#hbase.tests
+// for more on these categories.
+@Category(SmallTests.class)
+public class TestExample {
+  // Handy test rule that allows you to subsequently get the name of the current method. See
+  // down in 'test()' where we use it in the 'fail' message.
+  @Rule public TestName testName = new TestName();
+
+  // Rather than put a @Test (timeout=...) on each test to make sure it times out, just use
+  // the CategoryBasedTimeout rule. It applies to each test in this test class the timeout
+  // that goes with the particular test categorization.
+  @Rule public final TestRule timeout = 
CategoryBasedTimeout.builder().withTimeout(this.getClass()).
+        withLookingForStuckThread(true).build();
+
+  @Before
+  public void setUp() throws Exception {
+  }
+
+  @After
+  public void tearDown() throws Exception {
+  }
+
+  @Test
+  public void test() {
+    fail(testName.getMethodName() + " is not yet implemented");
+  }
+}
+----
+
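+Once the skeleton is filled in, the test can be run on its own with Maven in the usual way, for example (`TestExample` being the illustrative class name above):
+
+[source,bourne]
+----
+mvn test -Dtest=TestExample
+----
+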
 [[integration.tests]]
 === Integration Tests
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/external_apis.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/external_apis.adoc 
b/src/main/asciidoc/_chapters/external_apis.adoc
index 43a428a..9a1acdc 100644
--- a/src/main/asciidoc/_chapters/external_apis.adoc
+++ b/src/main/asciidoc/_chapters/external_apis.adoc
@@ -126,7 +126,7 @@ http://example.com:8000/<table>/schema
 .Table Deletion
 To delete a table, use a `DELETE` request with the `/schema` endpoint:
 ----
-http://example.com:8000<table>/schema
+http://example.com:8000/<table>/schema
 ----
 
 .Table Regions
@@ -142,7 +142,7 @@ http://example.com:8000/<table>/regions
 To get a single cell value, use a URL scheme like the following:
 
 ----
-http://example.com:8000<table>/<row>/<column>:<qualifier>/<timestamp>/content:raw
+http://example.com:8000/<table>/<row>/<column>:<qualifier>/<timestamp>/content:raw
 ----
 
 The column qualifier and timestamp are optional. Without them, the whole row 
will
@@ -154,7 +154,7 @@ To get multiple single values, specify multiple 
column:qualifier tuples and/or a
 and end-timestamp. You can also limit the number of versions.
 
 ----
-http://example.com:8000<table>/<row>/<column>:<qualifier>?v=<num-versions>
+http://example.com:8000/<table>/<row>/<column>:<qualifier>?v=<num-versions>
 ----
 
 .Globbing Rows
@@ -162,7 +162,7 @@ To scan a series of rows, you can use a `*` glob
 character on the <row> value to glob together multiple rows.
 
 ----
-http://example.com:8000urls/https|ad.doubleclick.net|*
+http://example.com:8000/urls/https|ad.doubleclick.net|*
 ----
 
 ==== Puts
@@ -173,8 +173,8 @@ For Puts, `PUT` and `POST` are equivalent.
 The column qualifier and the timestamp are optional.
 
 ----
-http://example.com:8000put/<table>/<row>/<column>:<qualifier>/<timestamp>
-http://example.com:8000test/testrow/test:testcolumn
+http://example.com:8000/put/<table>/<row>/<column>:<qualifier>/<timestamp>
+http://example.com:8000/test/testrow/test:testcolumn
 ----
 
 .Put Multiple Values
@@ -195,7 +195,7 @@ success (201) or failure (anything else), and on successful 
scanner creation, th
 URI is returned which should be used to address the scanner.
 
 ----
-http://example.com:8000<table>/scanner
+http://example.com:8000/<table>/scanner
 ----
 
 .Scanner Get Next
@@ -203,14 +203,14 @@ To get the next batch of cells found by the scanner, use 
the `/scanner/<scanner-
 endpoint, using the URI returned by the scanner creation endpoint. If the 
scanner
 is exhausted, HTTP status `204` is returned.
 ----
-http://example.com:8000<table>/scanner/<scanner-id>
+http://example.com:8000/<table>/scanner/<scanner-id>
 ----
 
 .Scanner Deletion
 To delete resources associated with a scanner, send a HTTP `DELETE` request to 
the
 `/scanner/<scanner-id>` endpoint.
 ----
-http://example.com:8000<table>/scanner/<scanner-id>
+http://example.com:8000/<table>/scanner/<scanner-id>
 ----
 
 [[xml_schema]]
@@ -813,4 +813,4 @@ while 1:
         break
     print java.lang.String(result.row), 
java.lang.String(result.get('title:').value)
 ----
-====
\ No newline at end of file
+====

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/getting_started.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/getting_started.adoc 
b/src/main/asciidoc/_chapters/getting_started.adoc
index 7ef91b0..276a908 100644
--- a/src/main/asciidoc/_chapters/getting_started.adoc
+++ b/src/main/asciidoc/_chapters/getting_started.adoc
@@ -288,7 +288,7 @@ $
 === Intermediate - Pseudo-Distributed Local Install
 
 After working your way through <<quickstart,quickstart>>, you can re-configure 
HBase to run in pseudo-distributed mode.
-Pseudo-distributed mode means that HBase still runs completely on a single 
host, but each HBase daemon (HMaster, HRegionServer, and Zookeeper) runs as a 
separate process.
+Pseudo-distributed mode means that HBase still runs completely on a single 
host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a 
separate process.
 By default, unless you configure the `hbase.rootdir` property as described in 
<<quickstart,quickstart>>, your data is still stored in _/tmp/_.
 In this walk-through, we store your data in HDFS instead, assuming you have 
HDFS available.
 You can skip the HDFS configuration to continue storing your data in the local 
filesystem.
@@ -429,7 +429,7 @@ You can stop HBase the same way as in the 
<<quickstart,quickstart>> procedure, u
 
 In reality, you need a fully-distributed configuration to fully test HBase and 
to use it in real-world scenarios.
 In a distributed configuration, the cluster contains multiple nodes, each of 
which runs one or more HBase daemon.
-These include primary and backup Master instances, multiple Zookeeper nodes, 
and multiple RegionServer nodes.
+These include primary and backup Master instances, multiple ZooKeeper nodes, 
and multiple RegionServer nodes.
 
 This advanced quickstart adds two more nodes to your cluster.
 The architecture will be as follows:

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/hbase-default.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc 
b/src/main/asciidoc/_chapters/hbase-default.adoc
index 26929a3..df750e0 100644
--- a/src/main/asciidoc/_chapters/hbase-default.adoc
+++ b/src/main/asciidoc/_chapters/hbase-default.adoc
@@ -173,17 +173,6 @@ A comma-separated list of BaseHFileCleanerDelegate invoked 
by
 `org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner`
 
 
-[[hbase.master.catalog.timeout]]
-*`hbase.master.catalog.timeout`*::
-+
-.Description
-Timeout value for the Catalog Janitor from the master to
-    META.
-+
-.Default
-`600000`
-
-
 [[hbase.master.infoserver.redirect]]
 *`hbase.master.infoserver.redirect`*::
 +
@@ -442,16 +431,6 @@ Maximum size of all memstores in a region server before 
flushes are forced.
 `3600000`
 
 
-[[hbase.regionserver.catalog.timeout]]
-*`hbase.regionserver.catalog.timeout`*::
-+
-.Description
-Timeout value for the Catalog Janitor from the regionserver to META.
-+
-.Default
-`600000`
-
-
 [[hbase.regionserver.dns.interface]]
 *`hbase.regionserver.dns.interface`*::
 +
@@ -522,19 +501,6 @@ Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper
 `/hbase`
 
 
-[[zookeeper.znode.rootserver]]
-*`zookeeper.znode.rootserver`*::
-+
-.Description
-Path to ZNode holding root region location. This is written by
-      the master and read by clients and region servers. If a relative path is
-      given, the parent folder will be ${zookeeper.znode.parent}. By default,
-      this means the root location is stored at /hbase/root-region-server.
-+
-.Default
-`root-region-server`
-
-
 [[zookeeper.znode.acl.parent]]
 *`zookeeper.znode.acl.parent`*::
 +
@@ -1280,8 +1246,8 @@ Used along with bucket cache, this is a float that EITHER 
represents a percentag
 `0` when specified as a float
 
 
-[[hbase.bucketcache.sizes]]
-*`hbase.bucketcache.sizes`*::
+[[hbase.bucketcache.bucket.sizes]]
+*`hbase.bucketcache.bucket.sizes`*::
 +
 .Description
 A comma-separated list of sizes for buckets for the bucketcache
@@ -1691,20 +1657,6 @@ The maximum number of pending Thrift connections waiting 
in the queue. If
 `1000`
 
 
-[[hbase.thrift.htablepool.size.max]]
-*`hbase.thrift.htablepool.size.max`*::
-+
-.Description
-The upper bound for the table pool used in the Thrift gateways server.
-      Since this is per table name, we assume a single table and so with 1000 
default
-      worker threads max this is set to a matching number. For other workloads 
this number
-      can be adjusted as needed.
-
-+
-.Default
-`1000`
-
-
 [[hbase.regionserver.thrift.framed]]
 *`hbase.regionserver.thrift.framed`*::
 +
@@ -1761,31 +1713,6 @@ File permissions that should be used to write data
 `000`
 
 
-[[hbase.metrics.showTableName]]
-*`hbase.metrics.showTableName`*::
-+
-.Description
-Whether to include the prefix "tbl.tablename" in per-column family metrics.
-       If true, for each metric M, per-cf metrics will be reported for 
tbl.T.cf.CF.M, if false,
-       per-cf metrics will be aggregated by column-family across tables, and 
reported for cf.CF.M.
-       In both cases, the aggregated metric M across tables and cfs will be 
reported.
-+
-.Default
-`true`
-
-
-[[hbase.metrics.exposeOperationTimes]]
-*`hbase.metrics.exposeOperationTimes`*::
-+
-.Description
-Whether to report metrics about time taken performing an
-      operation on the region server.  Get, Put, Delete, Increment, and Append 
can all
-      have their times exposed through Hadoop metrics per CF and per region.
-+
-.Default
-`true`
-
-
 [[hbase.snapshot.enabled]]
 *`hbase.snapshot.enabled`*::
 +

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc 
b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 53aee33..4c9c7c5 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -57,7 +57,7 @@ Some commands take arguments. Pass no args or -h for usage.
   upgrade         Upgrade hbase
   master          Run an HBase HMaster node
   regionserver    Run an HBase HRegionServer node
-  zookeeper       Run a Zookeeper server
+  zookeeper       Run a ZooKeeper server
   rest            Run an HBase REST server
   thrift          Run the HBase Thrift server
   thrift2         Run the HBase Thrift2 server
@@ -1361,7 +1361,10 @@ list_peers:: list all replication relationships known by 
this cluster
 enable_peer <ID>::
   Enable a previously-disabled replication relationship
 disable_peer <ID>::
-  Disable a replication relationship. HBase will no longer send edits to that 
peer cluster, but it still keeps track of all the new WALs that it will need to 
replicate if and when it is re-enabled.
+  Disable a replication relationship. HBase will no longer send edits to that
+  peer cluster, but it still keeps track of all the new WALs that it will need
+  to replicate if and when it is re-enabled. WALs are retained when enabling 
or disabling
+  replication as long as peers exist.
 remove_peer <ID>::
   Disable and remove a replication relationship. HBase will no longer send 
edits to that peer cluster or keep track of WALs.
 enable_table_replication <TABLE_NAME>::
@@ -1501,6 +1504,8 @@ The default behavior is augmented so that if a log is 
past its TTL, the cleaning
 If the log is not found in any queues, the log will be deleted.
 The next time the cleaning process needs to look for a log, it starts by using 
its cached list.
 
+NOTE: WALs are saved when replication is enabled or disabled as long as peers 
exist.
+
 [[rs.failover.details]]
 ==== Region Server Failover
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/performance.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/performance.adoc 
b/src/main/asciidoc/_chapters/performance.adoc
index 66dd489..efb6ace 100644
--- a/src/main/asciidoc/_chapters/performance.adoc
+++ b/src/main/asciidoc/_chapters/performance.adoc
@@ -207,6 +207,11 @@ tableDesc.addFamily(cfDesc);
 See the API documentation for
 
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html[CacheConfig].
 
+To see prefetch in operation, enable TRACE level logging on
+`org.apache.hadoop.hbase.io.hfile.HFileReaderImpl` in hbase-2.0+
+or on `org.apache.hadoop.hbase.io.hfile.HFileReaderV2` in earlier (hbase-1.x) versions of HBase.
+
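+For example, with the default log4j setup shipped in _conf/log4j.properties_ on hbase-1.x, a line like the following should enable that output (adjust the class name to match your version):
+
+----
+# illustrative log4j.properties entry; use HFileReaderImpl on hbase-2.0+
+log4j.logger.org.apache.hadoop.hbase.io.hfile.HFileReaderV2=TRACE
+----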
+
 [[perf.rs.memstore.size]]
 === `hbase.regionserver.global.memstore.size`
 
@@ -361,7 +366,7 @@ Bloom filters need to be rebuilt upon deletion, so may not 
be appropriate in env
 
 Bloom filters are enabled on a Column Family.
 You can do this by using the setBloomFilterType method of HColumnDescriptor or 
using the HBase API.
-Valid values are `NONE` (the default), `ROW`, or `ROWCOL`.
+Valid values are `NONE`, `ROW` (default), or `ROWCOL`.
 See <<bloom.filters.when>> for more information on `ROW` versus `ROWCOL`.
 See also the API documentation for 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
 
@@ -382,17 +387,17 @@ You can configure the following settings in the 
_hbase-site.xml_.
 | Default
 | Description
 
-| io.hfile.bloom.enabled
+| io.storefile.bloom.enabled
 | yes
 | Set to no to kill bloom filters server-wide if something goes wrong
 
-| io.hfile.bloom.error.rate
+| io.storefile.bloom.error.rate
 | .01
 | The average false positive rate for bloom filters. Folding is used to
                   maintain the false positive rate. Expressed as a decimal 
representation of a
                   percentage.
 
-| io.hfile.bloom.max.fold
+| io.storefile.bloom.max.fold
 | 7
 | The guaranteed maximum fold rate. Changing this setting should not be
                   necessary and is not recommended.
@@ -406,7 +411,7 @@ You can configure the following settings in the 
_hbase-site.xml_.
 | Master switch to enable Delete Family Bloom filters and store them in the 
StoreFile.
 
 | io.storefile.bloom.block.size
-| 65536
+| 131072
 | Target Bloom block size. Bloom filter blocks of approximately this size
                   are interleaved with data blocks.
 
@@ -713,20 +718,20 @@ Stored in the LRU cache, if it is enabled (It's enabled 
by default).
 [[config.bloom]]
 ==== Bloom Filter Configuration
 
-===== `io.hfile.bloom.enabled` global kill switch
+===== `io.storefile.bloom.enabled` global kill switch
 
-`io.hfile.bloom.enabled` in `Configuration` serves as the kill switch in case 
something goes wrong.
+`io.storefile.bloom.enabled` in `Configuration` serves as the kill switch in 
case something goes wrong.
 Default = `true`.
 
-===== `io.hfile.bloom.error.rate`
+===== `io.storefile.bloom.error.rate`
 
-`io.hfile.bloom.error.rate` = average false positive rate.
+`io.storefile.bloom.error.rate` = average false positive rate.
 Default = 1%. Decrease rate by ½ (e.g.
 to .5%) == +1 bit per bloom entry.
 
-===== `io.hfile.bloom.max.fold`
+===== `io.storefile.bloom.max.fold`
 
-`io.hfile.bloom.max.fold` = guaranteed minimum fold rate.
+`io.storefile.bloom.max.fold` = guaranteed minimum fold rate.
 Most people should leave this alone.
 Default = 7, or can collapse to at least 1/128th of original size.
 See the _Development Process_ section of the document 
link:https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf[BloomFilters
 in HBase] for more on what this option means.

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/spark.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/spark.adoc 
b/src/main/asciidoc/_chapters/spark.adoc
index b1bdb5d..88918aa 100644
--- a/src/main/asciidoc/_chapters/spark.adoc
+++ b/src/main/asciidoc/_chapters/spark.adoc
@@ -395,6 +395,42 @@ The HBase-Spark module includes support for Spark SQL and 
DataFrames, which allo
 you to write SparkSQL directly on HBase tables. In addition the HBase-Spark
 will push down query filtering logic to HBase.
 
+In HBaseSparkConf, four parameters related to timestamps can be set: TIMESTAMP,
+MIN_TIMESTAMP, MAX_TIMESTAMP, and MAX_VERSIONS. Users can query records with a
+specific timestamp using TIMESTAMP, or records within a time range using
+MIN_TIMESTAMP and MAX_TIMESTAMP. In the examples below, substitute concrete
+values for tsSpecified and oldMs.
+
+.Query with different timestamps
+====
+
+The example below shows how to load the df DataFrame with a specific timestamp.
+tsSpecified is specified by the user.
+HBaseTableCatalog defines the schema mapping between HBase and the Spark relation,
+and writeCatalog defines the catalog for that schema mapping.
+----
+val df = sqlContext.read
+      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, 
HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
+      .format("org.apache.hadoop.hbase.spark")
+      .load()
+----
+
+The example below shows how to load the df DataFrame with a time range.
+oldMs is specified by the user.
+----
+val df = sqlContext.read
+      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, 
HBaseSparkConf.MIN_TIMESTAMP -> "0",
+        HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
+      .format("org.apache.hadoop.hbase.spark")
+      .load()
+----
+
+After loading the df DataFrame, users can query the data.
+----
+    df.registerTempTable("table")
+    sqlContext.sql("select count(col1) from table").show
+----
+====
+
 === Predicate Push Down
 
 There are two examples of predicate push down in the HBase-Spark 
implementation.
@@ -515,4 +551,4 @@ The last major point to note in the example is the 
`sqlContext.sql` function, wh
 allows the user to ask their questions in SQL which will be pushed down to the
 DefaultSource code in the HBase-Spark module. The result of this command will 
be
 a DataFrame with the Schema of KEY_FIELD and B_FIELD.
-====
\ No newline at end of file
+====

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc 
b/src/main/asciidoc/_chapters/troubleshooting.adoc
index 8b2011d..fc9aadb 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -859,11 +859,14 @@ Snapshots::
   HBase Shell commands for managing them. For more information, see 
<<ops.snapshots>>.
 
 WAL::
-  Write-ahead logs (WALs) are stored in subdirectories of `/hbase/.logs/`, 
depending
-  on their status. Already-processed WALs are stored in 
`/hbase/.logs/oldWALs/` and
-  corrupt WALs are stored in `/hbase/.logs/.corrupt/` for examination.
-  If the size of any subdirectory of `/hbase/.logs/` is growing, examine the 
HBase
+  Write-ahead logs (WALs) are stored in subdirectories of the HBase root 
directory,
+  typically `/hbase/`, depending on their status. Already-processed WALs are 
stored
+  in `/hbase/oldWALs/` and corrupt WALs are stored in `/hbase/.corrupt/` for 
examination.
+  If the size of one of these subdirectories is growing, examine the HBase
   server logs to find the root cause for why WALs are not being processed 
correctly.
++
+If you use replication and `/hbase/oldWALs/` is using more space than you 
expect,
+remember that WALs are saved when replication is disabled, as long as there 
are peers.
 
 *Do not* manage WALs manually via HDFS.
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/a82121b2/src/main/asciidoc/_chapters/zookeeper.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/zookeeper.adoc 
b/src/main/asciidoc/_chapters/zookeeper.adoc
index 565ef98..ee76d80 100644
--- a/src/main/asciidoc/_chapters/zookeeper.adoc
+++ b/src/main/asciidoc/_chapters/zookeeper.adoc
@@ -108,7 +108,7 @@ If running zookeeper 3.5+, you can ask hbase to make use of 
the new multi operat
 .ZooKeeper Maintenance
 [CAUTION]
 ====
-Be sure to set up the data dir cleaner described under 
link:http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[Zookeeper
+Be sure to set up the data dir cleaner described under 
link:http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper
         Maintenance] else you could have 'interesting' problems a couple of 
months in; i.e.
 zookeeper could start dropping sessions if it has to run through a directory 
of hundreds of thousands of logs which is wont to do around leader reelection 
time -- a process rare but run on occasion whether because a machine is dropped 
or happens to hiccup.
 ====
@@ -120,7 +120,7 @@ To point HBase at an existing ZooKeeper cluster, one that 
is not managed by HBas
 ----
 
   ...
-  # Tell HBase whether it should manage its own instance of Zookeeper or not.
+  # Tell HBase whether it should manage its own instance of ZooKeeper or not.
   export HBASE_MANAGES_ZK=false
 ----
 
@@ -145,10 +145,10 @@ Additionally, see the 
link:http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKee
 [[zk.sasl.auth]]
 == SASL Authentication with ZooKeeper
 
-Newer releases of Apache HBase (>= 0.92) will support connecting to a 
ZooKeeper Quorum that supports SASL authentication (which is available in 
Zookeeper versions 3.4.0 or later).
+Newer releases of Apache HBase (>= 0.92) will support connecting to a 
ZooKeeper Quorum that supports SASL authentication (which is available in 
ZooKeeper versions 3.4.0 or later).
 
 This describes how to set up HBase to mutually authenticate with a ZooKeeper 
Quorum.
-ZooKeeper/HBase mutual authentication 
(link:https://issues.apache.org/jira/browse/HBASE-2418[HBASE-2418]) is required 
as part of a complete secure HBase configuration 
(link:https://issues.apache.org/jira/browse/HBASE-3025[HBASE-3025]). For 
simplicity of explication, this section ignores additional configuration 
required (Secure HDFS and Coprocessor configuration). It's recommended to begin 
with an HBase-managed Zookeeper configuration (as opposed to a standalone 
Zookeeper quorum) for ease of learning.
+ZooKeeper/HBase mutual authentication 
(link:https://issues.apache.org/jira/browse/HBASE-2418[HBASE-2418]) is required 
as part of a complete secure HBase configuration 
(link:https://issues.apache.org/jira/browse/HBASE-3025[HBASE-3025]). For 
simplicity of explication, this section ignores additional configuration 
required (Secure HDFS and Coprocessor configuration). It's recommended to begin 
with an HBase-managed ZooKeeper configuration (as opposed to a standalone 
ZooKeeper quorum) for ease of learning.
 
 === Operating System Prerequisites
 
@@ -165,7 +165,7 @@ Each user who will be an HBase client should also be given 
a Kerberos principal.
 This principal should usually have a password assigned to it (as opposed to, 
as with the HBase servers, a keytab file) which only this user knows.
 The client's principal's `maxrenewlife` should be set so that it can be 
renewed enough so that the user can complete their HBase client processes.
 For example, if a user runs a long-running HBase client process that takes at 
most 3 days, we might create this user's principal within `kadmin` with: 
`addprinc -maxrenewlife 3days`.
-The Zookeeper client and server libraries manage their own ticket refreshment 
by running threads that wake up periodically to do the refreshment.
+The ZooKeeper client and server libraries manage their own ticket refreshment 
by running threads that wake up periodically to do the refreshment.
 
 On each host that will run an HBase client (e.g. `hbase shell`), add the 
following file to the HBase home directory's _conf_ directory:
 
@@ -181,7 +181,7 @@ Client {
 
 We'll refer to this JAAS configuration file as _$CLIENT_CONF_        below.
 
-=== HBase-managed Zookeeper Configuration
+=== HBase-managed ZooKeeper Configuration
 
 On each node that will run a zookeeper, a master, or a regionserver, create a 
link:http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html[JAAS]
        configuration file in the conf directory of the node's _HBASE_HOME_     
   directory that looks like the following:
 
@@ -207,7 +207,7 @@ Client {
 
 where the _$PATH_TO_HBASE_KEYTAB_ and _$PATH_TO_ZOOKEEPER_KEYTAB_ files are 
what you created above, and `$HOST` is the hostname for that node.
 
-The `Server` section will be used by the Zookeeper quorum server, while the 
`Client` section will be used by the HBase master and regionservers.
+The `Server` section will be used by the ZooKeeper quorum server, while the 
`Client` section will be used by the HBase master and regionservers.
 The path to this file should be substituted for the text _$HBASE_SERVER_CONF_ 
in the _hbase-env.sh_ listing below.
 
 The path to this file should be substituted for the text _$CLIENT_CONF_ in the 
_hbase-env.sh_ listing below.
@@ -255,7 +255,7 @@ Modify your _hbase-site.xml_ on each node that will run 
zookeeper, master or reg
 </configuration>
 ----
 
-where `$ZK_NODES` is the comma-separated list of hostnames of the Zookeeper 
Quorum hosts.
+where `$ZK_NODES` is the comma-separated list of hostnames of the ZooKeeper 
Quorum hosts.
 
 Start your hbase cluster by running one or more of the following set of 
commands on the appropriate hosts:
 
@@ -266,7 +266,7 @@ bin/hbase master start
 bin/hbase regionserver start
 ----
 
-=== External Zookeeper Configuration
+=== External ZooKeeper Configuration
 
 Add a JAAS configuration file that looks like:
 
@@ -326,7 +326,7 @@ Modify your _hbase-site.xml_ on each node that will run a 
master or regionserver
 </configuration>
 ----
 
-where `$ZK_NODES` is the comma-separated list of hostnames of the Zookeeper 
Quorum hosts.
+where `$ZK_NODES` is the comma-separated list of hostnames of the ZooKeeper 
Quorum hosts.
 
 Also on each of these hosts, create a JAAS configuration file containing:
 
@@ -346,7 +346,7 @@ Server {
 where `$HOST` is the hostname of each Quorum host.
 We will refer to the full pathname of this file as _$ZK_SERVER_CONF_ below.
 
-Start your Zookeepers on each Zookeeper Quorum host with:
+Start your ZooKeepers on each ZooKeeper Quorum host with:
 
 [source,bourne]
 ----
@@ -362,9 +362,9 @@ bin/hbase master start
 bin/hbase regionserver start
 ----
 
-=== Zookeeper Server Authentication Log Output
+=== ZooKeeper Server Authentication Log Output
 
-If the configuration above is successful, you should see something similar to 
the following in your Zookeeper server logs:
+If the configuration above is successful, you should see something similar to 
the following in your ZooKeeper server logs:
 
 ----
 
@@ -382,9 +382,9 @@ If the configuration above is successful, you should see 
something similar to th
 11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for 
authorizationID: hbase
 ----
 
-=== Zookeeper Client Authentication Log Output
+=== ZooKeeper Client Authentication Log Output
 
-On the Zookeeper client side (HBase master or regionserver), you should see 
something similar to the following:
+On the ZooKeeper client side (HBase master or regionserver), you should see 
something similar to the following:
 
 ----
 
