Repository: hbase
Updated Branches:
  refs/heads/master 1a6eea335 -> bcfc6d65a


HBASE-11781 Document new TableMapReduceUtil scanning options (Misty Stanley-Jones)


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/bcfc6d65
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/bcfc6d65
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/bcfc6d65

Branch: refs/heads/master
Commit: bcfc6d65af7860c83ec147f673cd1ff8970290c4
Parents: 1a6eea3
Author: Jonathan M Hsieh <jmhs...@apache.org>
Authored: Wed Sep 3 16:24:59 2014 -0700
Committer: Jonathan M Hsieh <jmhs...@apache.org>
Committed: Wed Sep 3 16:24:59 2014 -0700

----------------------------------------------------------------------
 src/main/docbkx/book.xml | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/bcfc6d65/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 3d8216b..19dd770 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -1048,6 +1048,42 @@ $ <userinput>HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp
       </caution>
     </section>
 
+    <section>
+      <title>MapReduce Scan Caching</title>
+      <para>TableMapReduceUtil now restores the option to set scanner caching (the number of rows
+        fetched from the server in each call to the scanner and cached on the client) on the Scan object that is
+        passed in. This functionality was lost due to a bug in HBase 0.95 (<link
+          xlink:href="https://issues.apache.org/jira/browse/HBASE-11558">HBASE-11558</link>), which
+        is fixed in HBase 0.98.5 and 0.96.3. The priority order for choosing the scanner caching is
+        as follows:</para>
+      <orderedlist>
+        <listitem>
+          <para>Caching settings that are set on the Scan object.</para>
+        </listitem>
+        <listitem>
+          <para>Caching settings that are specified via the configuration option
+              <option>hbase.client.scanner.caching</option>, which can be set either manually in
+              <filename>hbase-site.xml</filename> or via the helper method
+              <code>TableMapReduceUtil.setScannerCaching()</code>.</para>
+        </listitem>
+        <listitem>
+          <para>The default value <code>HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING</code>, which is set to
+            <literal>100</literal>.</para>
+        </listitem>
+      </orderedlist>
+      <para>Optimizing the caching setting is a balance between the time the client waits for a
+        result and the number of round trips the client needs to make to the server. If the caching setting
+        is too large, the client could end up waiting for a long time, or the request could even time
+        out. If the setting is too small, the scan needs to return results in many small pieces.
+        If you think of the scan as a shovel, a bigger cache setting is analogous to a bigger
+        shovel, and a smaller cache setting is equivalent to more shoveling in order to fill the
+        bucket.</para>
+      <para>The priority order above allows you to set a reasonable default in the configuration and
+        override it on the Scan object for specific operations.</para>
+      <para>See the API documentation for <link
+          xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html"
+          >Scan</link> for more details.</para>
+    </section>
 
     <section>
       <title>Bundled HBase MapReduce Jobs</title>
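The three-level priority order added in the section above can be modeled in a few lines of dependency-free Java. This is a hedged sketch, not HBase source code. The class ScannerCachingPriority and its effectiveCaching method are hypothetical. A plain Map stands in for a Hadoop Configuration, and an OptionalInt stands in for "caching set or not set on the Scan". In a real job, the first level corresponds to Scan.setCaching(int), and the second corresponds to the hbase.client.scanner.caching property, set in hbase-site.xml or through TableMapReduceUtil.setScannerCaching().

```java
import java.util.Map;
import java.util.OptionalInt;

/**
 * Hypothetical, dependency-free model of the documented scanner caching
 * priority order. Not HBase code; names are illustrative only.
 */
final class ScannerCachingPriority {
    // Configuration key named in the documentation.
    static final String CACHING_KEY = "hbase.client.scanner.caching";
    // Mirrors the documented value of HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING.
    static final int DEFAULT_CACHING = 100;

    /**
     * Resolves the effective caching value.
     *
     * @param scanCaching caching set on the Scan object, if any
     * @param conf        stand-in for the job configuration (assumption: a plain Map)
     */
    static int effectiveCaching(OptionalInt scanCaching, Map<String, String> conf) {
        // 1. A value set on the Scan object takes precedence.
        if (scanCaching.isPresent()) {
            return scanCaching.getAsInt();
        }
        // 2. Otherwise use hbase.client.scanner.caching from the configuration.
        String configured = conf.get(CACHING_KEY);
        if (configured != null) {
            return Integer.parseInt(configured.trim());
        }
        // 3. Otherwise fall back to the default.
        return DEFAULT_CACHING;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(CACHING_KEY, "500");
        System.out.println(effectiveCaching(OptionalInt.of(1000), conf));    // 1000: the Scan setting wins
        System.out.println(effectiveCaching(OptionalInt.empty(), conf));     // 500: the configuration applies
        System.out.println(effectiveCaching(OptionalInt.empty(), Map.of())); // 100: the default applies
    }
}
```

Keeping the configuration as the middle level is what lets one job-wide value coexist with per-Scan overrides.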

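The caching trade-off described in the new section is largely arithmetic. Each call to the server returns at most "caching" rows, so a scan over N rows needs roughly ceil(N / caching) round trips. The hypothetical ScanRoundTrips helper below illustrates only that estimate. It ignores region boundaries, any per-call result-size limits, and the extra call a scanner may make to detect the end of the data. Treat its numbers as a rough lower bound, not a measurement.

```java
/**
 * Hypothetical back-of-the-envelope model of scan round trips.
 * Not part of HBase. Ignores region boundaries, result-size limits,
 * and any final call used to detect the end of the data.
 */
final class ScanRoundTrips {
    /** Returns ceil(rows / caching) using integer arithmetic. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        for (int caching : new int[] {1, 100, 1000, 10_000}) {
            System.out.printf("caching=%d -> %d round trips, up to %d rows per response%n",
                    caching, roundTrips(rows, caching), caching);
        }
    }
}
```

Going from a caching value of 1 to 100 cuts the estimated round trips for one million rows from 1,000,000 to 10,000. Each response then carries up to 100 rows, and that larger payload per call is where the longer waits and possible time-outs come from.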