[42/51] [partial] hbase-site git commit: Published site at .

git-site-role Tue, 21 Nov 2017 07:18:12 -0800

http://git-wip-us.apache.org/repos/asf/hbase-site/blob/1a616706/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index 14183f4..300459d 100644
--- a/book.html
+++ b/book.html
@@ -138,148 +138,166 @@
 <li><a href="#hbase_mob">75. Storing Medium-sized Objects (MOB)</a></li>
 </ul>
 </li>
+<li><a href="#casestudies">Backup and Restore</a>
+<ul class="sectlevel1">
+<li><a href="#br.overview">76. Overview</a></li>
+<li><a href="#br.terminology">77. Terminology</a></li>
+<li><a href="#br.planning">78. Planning</a></li>
+<li><a href="#br.initial.setup">79. First-time configuration steps</a></li>
+<li><a href="#_backup_and_restore_commands">80. Backup and Restore 
commands</a></li>
+<li><a href="#br.administration">81. Administration of Backup Images</a></li>
+<li><a href="#br.backup.configuration">82. Configuration keys</a></li>
+<li><a href="#br.best.practices">83. Best Practices</a></li>
+<li><a href="#br.s3.backup.scenario">84. Scenario: Safeguarding Application 
Datasets on Amazon S3</a></li>
+<li><a href="#br.data.security">85. Security of Backup Data</a></li>
+<li><a href="#br.technical.details">86. Technical Details of Incremental 
Backup and Restore</a></li>
+<li><a href="#br.filesystem.growth.warning">87. A Warning on File System 
Growth</a></li>
+<li><a href="#br.backup.capacity.planning">88. Capacity Planning</a></li>
+<li><a href="#br.limitations">89. Limitations of the Backup and Restore 
Utility</a></li>
+</ul>
+</li>
 <li><a href="#hbase_apis">Apache HBase APIs</a>
 <ul class="sectlevel1">
-<li><a href="#_examples">76. Examples</a></li>
+<li><a href="#_examples">90. Examples</a></li>
 </ul>
 </li>
 <li><a href="#external_apis">Apache HBase External APIs</a>
 <ul class="sectlevel1">
-<li><a href="#_rest">77. REST</a></li>
-<li><a href="#_thrift">78. Thrift</a></li>
-<li><a href="#c">79. C/C++ Apache HBase Client</a></li>
-<li><a href="#jdo">80. Using Java Data Objects (JDO) with HBase</a></li>
-<li><a href="#scala">81. Scala</a></li>
-<li><a href="#jython">82. Jython</a></li>
+<li><a href="#_rest">91. REST</a></li>
+<li><a href="#_thrift">92. Thrift</a></li>
+<li><a href="#c">93. C/C++ Apache HBase Client</a></li>
+<li><a href="#jdo">94. Using Java Data Objects (JDO) with HBase</a></li>
+<li><a href="#scala">95. Scala</a></li>
+<li><a href="#jython">96. Jython</a></li>
 </ul>
 </li>
 <li><a href="#thrift">Thrift API and Filter Language</a>
 <ul class="sectlevel1">
-<li><a href="#thrift.filter_language">83. Filter Language</a></li>
+<li><a href="#thrift.filter_language">97. Filter Language</a></li>
 </ul>
 </li>
 <li><a href="#spark">HBase and Spark</a>
 <ul class="sectlevel1">
-<li><a href="#_basic_spark">84. Basic Spark</a></li>
-<li><a href="#_spark_streaming">85. Spark Streaming</a></li>
-<li><a href="#_bulk_load">86. Bulk Load</a></li>
-<li><a href="#_sparksql_dataframes">87. SparkSQL/DataFrames</a></li>
+<li><a href="#_basic_spark">98. Basic Spark</a></li>
+<li><a href="#_spark_streaming">99. Spark Streaming</a></li>
+<li><a href="#_bulk_load">100. Bulk Load</a></li>
+<li><a href="#_sparksql_dataframes">101. SparkSQL/DataFrames</a></li>
 </ul>
 </li>
 <li><a href="#cp">Apache HBase Coprocessors</a>
 <ul class="sectlevel1">
-<li><a href="#_coprocessor_overview">88. Coprocessor Overview</a></li>
-<li><a href="#_types_of_coprocessors">89. Types of Coprocessors</a></li>
-<li><a href="#cp_loading">90. Loading Coprocessors</a></li>
-<li><a href="#cp_example">91. Examples</a></li>
-<li><a href="#_guidelines_for_deploying_a_coprocessor">92. Guidelines For 
Deploying A Coprocessor</a></li>
-<li><a href="#_restricting_coprocessor_usage">93. Restricting Coprocessor 
Usage</a></li>
+<li><a href="#_coprocessor_overview">102. Coprocessor Overview</a></li>
+<li><a href="#_types_of_coprocessors">103. Types of Coprocessors</a></li>
+<li><a href="#cp_loading">104. Loading Coprocessors</a></li>
+<li><a href="#cp_example">105. Examples</a></li>
+<li><a href="#_guidelines_for_deploying_a_coprocessor">106. Guidelines For 
Deploying A Coprocessor</a></li>
+<li><a href="#_restricting_coprocessor_usage">107. Restricting Coprocessor 
Usage</a></li>
 </ul>
 </li>
 <li><a href="#performance">Apache HBase Performance Tuning</a>
 <ul class="sectlevel1">
-<li><a href="#perf.os">94. Operating System</a></li>
-<li><a href="#perf.network">95. Network</a></li>
-<li><a href="#jvm">96. Java</a></li>
-<li><a href="#perf.configurations">97. HBase Configurations</a></li>
-<li><a href="#perf.zookeeper">98. ZooKeeper</a></li>
-<li><a href="#perf.schema">99. Schema Design</a></li>
-<li><a href="#perf.general">100. HBase General Patterns</a></li>
-<li><a href="#perf.writing">101. Writing to HBase</a></li>
-<li><a href="#perf.reading">102. Reading from HBase</a></li>
-<li><a href="#perf.deleting">103. Deleting from HBase</a></li>
-<li><a href="#perf.hdfs">104. HDFS</a></li>
-<li><a href="#perf.ec2">105. Amazon EC2</a></li>
-<li><a href="#perf.hbase.mr.cluster">106. Collocating HBase and 
MapReduce</a></li>
-<li><a href="#perf.casestudy">107. Case Studies</a></li>
+<li><a href="#perf.os">108. Operating System</a></li>
+<li><a href="#perf.network">109. Network</a></li>
+<li><a href="#jvm">110. Java</a></li>
+<li><a href="#perf.configurations">111. HBase Configurations</a></li>
+<li><a href="#perf.zookeeper">112. ZooKeeper</a></li>
+<li><a href="#perf.schema">113. Schema Design</a></li>
+<li><a href="#perf.general">114. HBase General Patterns</a></li>
+<li><a href="#perf.writing">115. Writing to HBase</a></li>
+<li><a href="#perf.reading">116. Reading from HBase</a></li>
+<li><a href="#perf.deleting">117. Deleting from HBase</a></li>
+<li><a href="#perf.hdfs">118. HDFS</a></li>
+<li><a href="#perf.ec2">119. Amazon EC2</a></li>
+<li><a href="#perf.hbase.mr.cluster">120. Collocating HBase and 
MapReduce</a></li>
+<li><a href="#perf.casestudy">121. Case Studies</a></li>
 </ul>
 </li>
 <li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a>
 <ul class="sectlevel1">
-<li><a href="#trouble.general">108. General Guidelines</a></li>
-<li><a href="#trouble.log">109. Logs</a></li>
-<li><a href="#trouble.resources">110. Resources</a></li>
-<li><a href="#trouble.tools">111. Tools</a></li>
-<li><a href="#trouble.client">112. Client</a></li>
-<li><a href="#trouble.mapreduce">113. MapReduce</a></li>
-<li><a href="#trouble.namenode">114. NameNode</a></li>
-<li><a href="#trouble.network">115. Network</a></li>
-<li><a href="#trouble.rs">116. RegionServer</a></li>
-<li><a href="#trouble.master">117. Master</a></li>
-<li><a href="#trouble.zookeeper">118. ZooKeeper</a></li>
-<li><a href="#trouble.ec2">119. Amazon EC2</a></li>
-<li><a href="#trouble.versions">120. HBase and Hadoop version issues</a></li>
-<li><a href="#_ipc_configuration_conflicts_with_hadoop">121. IPC Configuration 
Conflicts with Hadoop</a></li>
-<li><a href="#_hbase_and_hdfs">122. HBase and HDFS</a></li>
-<li><a href="#trouble.tests">123. Running unit or integration tests</a></li>
-<li><a href="#trouble.casestudy">124. Case Studies</a></li>
-<li><a href="#trouble.crypto">125. Cryptographic Features</a></li>
-<li><a href="#_operating_system_specific_issues">126. Operating System 
Specific Issues</a></li>
-<li><a href="#_jdk_issues">127. JDK Issues</a></li>
+<li><a href="#trouble.general">122. General Guidelines</a></li>
+<li><a href="#trouble.log">123. Logs</a></li>
+<li><a href="#trouble.resources">124. Resources</a></li>
+<li><a href="#trouble.tools">125. Tools</a></li>
+<li><a href="#trouble.client">126. Client</a></li>
+<li><a href="#trouble.mapreduce">127. MapReduce</a></li>
+<li><a href="#trouble.namenode">128. NameNode</a></li>
+<li><a href="#trouble.network">129. Network</a></li>
+<li><a href="#trouble.rs">130. RegionServer</a></li>
+<li><a href="#trouble.master">131. Master</a></li>
+<li><a href="#trouble.zookeeper">132. ZooKeeper</a></li>
+<li><a href="#trouble.ec2">133. Amazon EC2</a></li>
+<li><a href="#trouble.versions">134. HBase and Hadoop version issues</a></li>
+<li><a href="#_ipc_configuration_conflicts_with_hadoop">135. IPC Configuration 
Conflicts with Hadoop</a></li>
+<li><a href="#_hbase_and_hdfs">136. HBase and HDFS</a></li>
+<li><a href="#trouble.tests">137. Running unit or integration tests</a></li>
+<li><a href="#trouble.casestudy">138. Case Studies</a></li>
+<li><a href="#trouble.crypto">139. Cryptographic Features</a></li>
+<li><a href="#_operating_system_specific_issues">140. Operating System 
Specific Issues</a></li>
+<li><a href="#_jdk_issues">141. JDK Issues</a></li>
 </ul>
 </li>
 <li><a href="#casestudies">Apache HBase Case Studies</a>
 <ul class="sectlevel1">
-<li><a href="#casestudies.overview">128. Overview</a></li>
-<li><a href="#casestudies.schema">129. Schema Design</a></li>
-<li><a href="#casestudies.perftroub">130. Performance/Troubleshooting</a></li>
+<li><a href="#casestudies.overview">142. Overview</a></li>
+<li><a href="#casestudies.schema">143. Schema Design</a></li>
+<li><a href="#casestudies.perftroub">144. Performance/Troubleshooting</a></li>
 </ul>
 </li>
 <li><a href="#ops_mgt">Apache HBase Operational Management</a>
 <ul class="sectlevel1">
-<li><a href="#tools">131. HBase Tools and Utilities</a></li>
-<li><a href="#ops.regionmgt">132. Region Management</a></li>
-<li><a href="#node.management">133. Node Management</a></li>
-<li><a href="#hbase_metrics">134. HBase Metrics</a></li>
-<li><a href="#ops.monitoring">135. HBase Monitoring</a></li>
-<li><a href="#_cluster_replication">136. Cluster Replication</a></li>
-<li><a href="#_running_multiple_workloads_on_a_single_cluster">137. Running 
Multiple Workloads On a Single Cluster</a></li>
-<li><a href="#ops.backup">138. HBase Backup</a></li>
-<li><a href="#ops.snapshots">139. HBase Snapshots</a></li>
-<li><a href="#snapshots_azure">140. Storing Snapshots in Microsoft Azure Blob 
Storage</a></li>
-<li><a href="#ops.capacity">141. Capacity Planning and Region Sizing</a></li>
-<li><a href="#table.rename">142. Table Rename</a></li>
-<li><a href="#rsgroup">143. RegionServer Grouping</a></li>
+<li><a href="#tools">145. HBase Tools and Utilities</a></li>
+<li><a href="#ops.regionmgt">146. Region Management</a></li>
+<li><a href="#node.management">147. Node Management</a></li>
+<li><a href="#hbase_metrics">148. HBase Metrics</a></li>
+<li><a href="#ops.monitoring">149. HBase Monitoring</a></li>
+<li><a href="#_cluster_replication">150. Cluster Replication</a></li>
+<li><a href="#_running_multiple_workloads_on_a_single_cluster">151. Running 
Multiple Workloads On a Single Cluster</a></li>
+<li><a href="#ops.backup">152. HBase Backup</a></li>
+<li><a href="#ops.snapshots">153. HBase Snapshots</a></li>
+<li><a href="#snapshots_azure">154. Storing Snapshots in Microsoft Azure Blob 
Storage</a></li>
+<li><a href="#ops.capacity">155. Capacity Planning and Region Sizing</a></li>
+<li><a href="#table.rename">156. Table Rename</a></li>
+<li><a href="#rsgroup">157. RegionServer Grouping</a></li>
 </ul>
 </li>
 <li><a href="#developer">Building and Developing Apache HBase</a>
 <ul class="sectlevel1">
-<li><a href="#getting.involved">144. Getting Involved</a></li>
-<li><a href="#repos">145. Apache HBase Repositories</a></li>
-<li><a href="#_ides">146. IDEs</a></li>
-<li><a href="#build">147. Building Apache HBase</a></li>
-<li><a href="#releasing">148. Releasing Apache HBase</a></li>
-<li><a href="#hbase.rc.voting">149. Voting on Release Candidates</a></li>
-<li><a href="#documentation">150. Generating the HBase Reference Guide</a></li>
-<li><a href="#hbase.org">151. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li>
-<li><a href="#hbase.tests">152. Tests</a></li>
-<li><a href="#developing">153. Developer Guidelines</a></li>
+<li><a href="#getting.involved">158. Getting Involved</a></li>
+<li><a href="#repos">159. Apache HBase Repositories</a></li>
+<li><a href="#_ides">160. IDEs</a></li>
+<li><a href="#build">161. Building Apache HBase</a></li>
+<li><a href="#releasing">162. Releasing Apache HBase</a></li>
+<li><a href="#hbase.rc.voting">163. Voting on Release Candidates</a></li>
+<li><a href="#documentation">164. Generating the HBase Reference Guide</a></li>
+<li><a href="#hbase.org">165. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li>
+<li><a href="#hbase.tests">166. Tests</a></li>
+<li><a href="#developing">167. Developer Guidelines</a></li>
 </ul>
 </li>
 <li><a href="#unit.tests">Unit Testing HBase Applications</a>
 <ul class="sectlevel1">
-<li><a href="#_junit">154. JUnit</a></li>
-<li><a href="#mockito">155. Mockito</a></li>
-<li><a href="#_mrunit">156. MRUnit</a></li>
-<li><a href="#_integration_testing_with_an_hbase_mini_cluster">157. 
Integration Testing with an HBase Mini-Cluster</a></li>
+<li><a href="#_junit">168. JUnit</a></li>
+<li><a href="#mockito">169. Mockito</a></li>
+<li><a href="#_mrunit">170. MRUnit</a></li>
+<li><a href="#_integration_testing_with_an_hbase_mini_cluster">171. 
Integration Testing with an HBase Mini-Cluster</a></li>
 </ul>
 </li>
 <li><a href="#protobuf">Protobuf in HBase</a>
 <ul class="sectlevel1">
-<li><a href="#_protobuf">158. Protobuf</a></li>
+<li><a href="#_protobuf">172. Protobuf</a></li>
 </ul>
 </li>
 <li><a href="#zookeeper">ZooKeeper</a>
 <ul class="sectlevel1">
-<li><a href="#_using_existing_zookeeper_ensemble">159. Using existing 
ZooKeeper ensemble</a></li>
-<li><a href="#zk.sasl.auth">160. SASL Authentication with ZooKeeper</a></li>
+<li><a href="#_using_existing_zookeeper_ensemble">173. Using existing 
ZooKeeper ensemble</a></li>
+<li><a href="#zk.sasl.auth">174. SASL Authentication with ZooKeeper</a></li>
 </ul>
 </li>
 <li><a href="#community">Community</a>
 <ul class="sectlevel1">
-<li><a href="#_decisions">161. Decisions</a></li>
-<li><a href="#community.roles">162. Community Roles</a></li>
-<li><a href="#hbase.commit.msg.format">163. Commit Message format</a></li>
+<li><a href="#_decisions">175. Decisions</a></li>
+<li><a href="#community.roles">176. Community Roles</a></li>
+<li><a href="#hbase.commit.msg.format">177. Commit Message format</a></li>
 </ul>
 </li>
 <li><a href="#_appendix">Appendix</a>
@@ -289,7 +307,7 @@
 <li><a href="#hbck.in.depth">Appendix C: hbck In Depth</a></li>
 <li><a href="#appendix_acl_matrix">Appendix D: Access Control Matrix</a></li>
 <li><a href="#compression">Appendix E: Compression and Data Block Encoding In 
HBase</a></li>
-<li><a href="#data.block.encoding.enable">164. Enable Data Block 
Encoding</a></li>
+<li><a href="#data.block.encoding.enable">178. Enable Data Block 
Encoding</a></li>
 <li><a href="#sql">Appendix F: SQL over HBase</a></li>
 <li><a href="#ycsb">Appendix G: YCSB</a></li>
 <li><a href="#_hfile_format_2">Appendix H: HFile format</a></li>
@@ -298,8 +316,8 @@
 <li><a href="#asf">Appendix K: HBase and the Apache Software 
Foundation</a></li>
 <li><a href="#orca">Appendix L: Apache HBase Orca</a></li>
 <li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in 
HBase</a></li>
-<li><a href="#tracing.client.modifications">165. Client Modifications</a></li>
-<li><a href="#tracing.client.shell">166. Tracing from HBase Shell</a></li>
+<li><a href="#tracing.client.modifications">179. Client Modifications</a></li>
+<li><a href="#tracing.client.shell">180. Tracing from HBase Shell</a></li>
 <li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li>
 </ul>
 </li>
@@ -17452,6 +17470,1275 @@ hbase&gt; major_compact 't1', 'c1’, ‘MOB’</pre>
 </div>
 </div>
 </div>
+<h1 id="casestudies" class="sect0"><a class="anchor" 
href="#casestudies"></a>Backup and Restore</h1>
+<div class="sect1">
+<h2 id="br.overview"><a class="anchor" href="#br.overview"></a>76. 
Overview</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Backup and restore is a standard operation provided by many databases. An 
effective backup and restore
+strategy helps ensure that users can recover data in case of unexpected 
failures. The HBase backup and restore
+feature helps ensure that enterprises using HBase as a canonical data 
repository can recover from catastrophic
+failures. Another important feature is the ability to restore the database to 
a particular
+point-in-time, commonly referred to as a snapshot.</p>
+</div>
+<div class="paragraph">
+<p>The HBase backup and restore feature provides the ability to create full 
backups and incremental backups on
+tables in an HBase cluster. The full backup is the foundation on which 
incremental backups are applied
+to build iterative snapshots. Incremental backups can be run on a schedule to 
capture changes over time,
+for example by using a Cron task. Incremental backups are more cost-effective 
than full backups because they only capture
+the changes since the last backup and they also enable administrators to 
restore the database to any prior incremental backup. Furthermore, the
+utilities also enable table-level data backup-and-recovery if you do not want 
to restore the entire dataset
+of the backup.</p>
+</div>
+<div class="paragraph">
+<p>The backup and restore feature supplements the HBase Replication feature. 
While HBase replication is ideal for
+creating "hot" copies of the data (where the replicated data is immediately 
available for query), the backup and
+restore feature is ideal for creating "cold" copies of data (where a manual 
step must be taken to restore the system).
+Previously, users only had the ability to create full backups via the 
ExportSnapshot functionality. The incremental
+backup implementation is the novel improvement over the previous "art" 
provided by ExportSnapshot.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.terminology"><a class="anchor" href="#br.terminology"></a>77. 
Terminology</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The backup and restore feature introduces new terminology which can be used 
to understand how control flows through the
+system.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>A backup</em>: A logical unit of data and metadata which can restore a 
table to its state at a specific point in time.</p>
+</li>
+<li>
+<p><em>Full backup</em>: a type of backup which wholly encapsulates the 
contents of the table at a point in time.</p>
+</li>
+<li>
+<p><em>Incremental backup</em>: a type of backup which contains the changes in 
a table since a full backup.</p>
+</li>
+<li>
+<p><em>Backup set</em>: A user-defined name which references one or more 
tables over which a backup can be executed.</p>
+</li>
+<li>
+<p><em>Backup ID</em>: A unique name that identifies one backup from the rest, e.g. <code>backupId_1467823988425</code></p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.planning"><a class="anchor" href="#br.planning"></a>78. 
Planning</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>There are some common strategies which can be used to implement backup and 
restore in your environment. The following section
+shows how these strategies are implemented and identifies potential tradeoffs 
with each.</p>
+</div>
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+These backup and restore tools have not been tested on Transparent Data Encryption (TDE) enabled HDFS clusters.
+This is related to the open issue <a href="https://issues.apache.org/jira/browse/HBASE-16178">HBASE-16178</a>.
+</td>
+</tr>
+</table>
+</div>
+<div class="sect2">
+<h3 id="br.intracluster.backup"><a class="anchor" 
href="#br.intracluster.backup"></a>78.1. Backup within a cluster</h3>
+<div class="paragraph">
+<p>This strategy stores the backups on the same cluster where the backup was taken. This approach is only appropriate for testing
+because it does not provide any additional safety on top of what the software itself already provides.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/backup-intra-cluster.png" alt="backup intra cluster">
+</div>
+<div class="title">Figure 4. Intra-Cluster Backup</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.dedicated.cluster.backup"><a class="anchor" 
href="#br.dedicated.cluster.backup"></a>78.2. Backup using a dedicated 
cluster</h3>
+<div class="paragraph">
+<p>This strategy provides greater fault tolerance and provides a path towards 
disaster recovery. In this setting, you will
+store the backup on a separate HDFS cluster by supplying the backup destination cluster&#8217;s HDFS URL to the backup utility.
+You should consider backing up to a different physical location, such as a 
different data center.</p>
+</div>
+<div class="paragraph">
+<p>Typically, a backup-dedicated HDFS cluster uses a more economical hardware 
profile to save money.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/backup-dedicated-cluster.png" alt="backup dedicated cluster">
+</div>
+<div class="title">Figure 5. Dedicated HDFS Cluster Backup</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.cloud.or.vendor.backup"><a class="anchor" 
href="#br.cloud.or.vendor.backup"></a>78.3. Backup to the Cloud or a storage 
vendor appliance</h3>
+<div class="paragraph">
+<p>Another approach to safeguarding HBase incremental backups is to store the 
data on provisioned, secure servers that belong
+to third-party vendors and that are located off-site. The vendor can be a 
public cloud provider or a storage vendor who uses
+a Hadoop-compatible file system, such as S3 and other HDFS-compatible 
destinations.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/backup-cloud-appliance.png" alt="backup cloud appliance">
+</div>
+<div class="title">Figure 6. Backup to Cloud or Vendor Storage Solutions</div>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+The HBase backup utility does not support backup to multiple destinations. A 
workaround is to manually create copies
+of the backup files from HDFS or S3.
+</td>
+</tr>
+</table>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.initial.setup"><a class="anchor" href="#br.initial.setup"></a>79. 
First-time configuration steps</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This section contains the necessary configuration changes that must be made 
in order to use the backup and restore feature.
+As this feature makes significant use of YARN&#8217;s MapReduce framework to 
parallelize these I/O heavy operations, configuration
+changes extend outside of just <code>hbase-site.xml</code>.</p>
+</div>
+<div class="sect2">
+<h3 id="_allow_the_hbase_system_user_in_yarn"><a class="anchor" 
href="#_allow_the_hbase_system_user_in_yarn"></a>79.1. Allow the "hbase" system 
user in YARN</h3>
+<div class="paragraph">
+<p>The YARN <strong>container-executor.cfg</strong> configuration file must 
have the following property setting: <em>allowed.system.users=hbase</em>. No 
spaces
+are allowed in entries of this configuration file.</p>
+</div>
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+Skipping this step will result in runtime errors when executing the first 
backup tasks.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p><strong>Example of a valid container-executor.cfg file for backup and 
restore:</strong></p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code 
data-lang="java">yarn.nodemanager.log-dirs=/var/log/hadoop/mapred
+yarn.nodemanager.linux-container-executor.group=yarn
+banned.users=hdfs,yarn,mapred,bin
+allowed.system.users=hbase
+min.user.id=<span class="integer">500</span></code></pre>
+</div>
+</div>
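+<div class="paragraph">
+<p>The two requirements above (no spaces in entries, and <em>hbase</em> listed in <em>allowed.system.users</em>) can be checked before the first backup is attempted.
+The following is a minimal sketch, not a tool shipped with HBase or YARN: the function name <code>check_container_executor_cfg</code> is hypothetical, and the check assumes the file holds only <code>key=value</code> lines, so a comment line containing a space would also be reported.</p>
+</div>

```shell
# Hypothetical helper (not part of HBase or YARN): sanity-check a
# container-executor.cfg file against the two rules stated above.
check_container_executor_cfg() {
  cfg="$1"
  # Rule 1: entries in this file must not contain spaces.
  if grep -q ' ' "$cfg"; then
    echo "entries contain spaces"
    return 1
  fi
  # Rule 2: allowed.system.users is a comma-separated list that includes hbase.
  if ! grep -Eq '^allowed\.system\.users=([^,]+,)*hbase(,.*)?$' "$cfg"; then
    echo "hbase missing from allowed.system.users"
    return 1
  fi
  echo "ok"
}
```

+<div class="paragraph">
+<p>Run against the example file shown above, the function prints <code>ok</code>. The path to pass in depends on where your Hadoop distribution keeps <strong>container-executor.cfg</strong>.</p>
+</div>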
+</div>
+<div class="sect2">
+<h3 id="_hbase_specific_changes"><a class="anchor" 
href="#_hbase_specific_changes"></a>79.2. HBase specific changes</h3>
+<div class="paragraph">
+<p>Add the following properties to hbase-site.xml and restart HBase if it is 
already running.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+The ",&#8230;&#8203;" is an ellipsis meant to imply that this is a 
comma-separated list of values, not literal text which should be added to 
hbase-site.xml.
+</td>
+</tr>
+</table>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java">&lt;property&gt;
+  &lt;name&gt;hbase.backup.enable&lt;/name&gt;
+  &lt;value&gt;<span class="predefined-constant">true</span>&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.master.logcleaner.plugins&lt;/name&gt;
+  &lt;value&gt;org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.procedure.master.classes&lt;/name&gt;
+  &lt;value&gt;org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager,...&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.procedure.regionserver.classes&lt;/name&gt;
+  &lt;value&gt;org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager,...&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.coprocessor.region.classes&lt;/name&gt;
+  &lt;value&gt;org.apache.hadoop.hbase.backup.BackupObserver,...&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.master.hfilecleaner.plugins&lt;/name&gt;
+  &lt;value&gt;org.apache.hadoop.hbase.backup.BackupHFileCleaner,...&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_backup_and_restore_commands"><a class="anchor" 
href="#_backup_and_restore_commands"></a>80. Backup and Restore commands</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This section covers the command-line utilities that administrators run to create, restore, and merge backups. Tools to
+inspect details on specific backup sessions are covered in the next section, <a href="#br.administration">Administration of Backup Images</a>.</p>
+</div>
+<div class="paragraph">
+<p>Run the command <code>hbase backup help &lt;command&gt;</code> to access 
the online help that provides basic information about a command
+and its options. The below information is captured in this help message for 
each command.</p>
+</div>
+<div class="sect2">
+<h3 id="br.creating.complete.backup"><a class="anchor" 
href="#br.creating.complete.backup"></a>80.1. Creating a Backup Image</h3>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>For HBase clusters also using Apache Phoenix: include the SQL system 
catalog tables in the backup. In the event that you
+need to restore the HBase backup, access to the system catalog tables enables you to resume Phoenix interoperability with the
+restored data.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>The first step in running the backup and restore utilities is to perform a 
full backup and to store the data in a separate image
+from the source. At a minimum, you must do this to get a baseline before you 
can rely on incremental backups.</p>
+</div>
+<div class="paragraph">
+<p>Run the following command as HBase superuser:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java">hbase backup create &lt;type&gt; &lt;backup_path&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>After the command finishes running, the console prints a SUCCESS or FAILURE 
status message. The SUCCESS message includes a <em>backup</em> ID.
+The backup ID is the Unix time (also known as Epoch time) that the HBase 
master received the backup request from the client.</p>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>Record the backup ID that appears at the end of a successful backup. In 
case the source cluster fails and you need to recover the
+dataset with a restore operation, having the backup ID readily available can 
save time.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
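+<div class="paragraph">
+<p>Because the backup ID embeds the request time, a recorded ID can be turned back into a timestamp when you are deciding which image to restore.
+The sketch below assumes the ID has the form <code>backupId_&lt;digits&gt;</code>, as in the example <code>backupId_1467823988425</code>, and that the digits are milliseconds since the epoch; the 13-digit length suggests milliseconds, but that is an inference rather than something stated here. The function name <code>backup_id_epoch_seconds</code> is hypothetical.</p>
+</div>

```shell
# Hypothetical helper: extract the epoch time (in seconds) from a backup ID.
backup_id_epoch_seconds() {
  ms="${1#backupId_}"            # strip the "backupId_" prefix if present
  case "$ms" in
    ''|*[!0-9]*)                 # reject anything that is not all digits
      echo "not a backup ID: $1" >&2
      return 1
      ;;
  esac
  echo $((ms / 1000))            # assumption: the suffix is in milliseconds
}

backup_id_epoch_seconds backupId_1467823988425   # prints 1467823988
```

+<div class="paragraph">
+<p>The value 1467823988 corresponds to 2016-07-06 16:53:08 UTC. On systems with GNU coreutils, <code>date -u -d @1467823988</code> renders it in readable form.</p>
+</div>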
+<div class="sect3">
+<h4 id="br.create.positional.cli.arguments"><a class="anchor" 
href="#br.create.positional.cli.arguments"></a>80.1.1. Positional Command-Line 
Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>type</em></dt>
+<dd>
+<p>The type of backup to execute: <em>full</em> or <em>incremental</em>. As a 
reminder, an <em>incremental</em> backup requires a <em>full</em> backup to
+already exist.</p>
+</dd>
+<dt class="hdlist1"><em>backup_path</em></dt>
+<dd>
+<p>The <em>backup_path</em> argument specifies the full filesystem URI of where to store the backup image. Valid prefixes
+are <em>hdfs:</em>, <em>webhdfs:</em>, <em>gpfs:</em>, and <em>s3fs:</em>.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.create.named.cli.arguments"><a class="anchor" 
href="#br.create.named.cli.arguments"></a>80.1.2. Named Command-Line 
Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>-t &lt;table_name[,table_name]&gt;</em></dt>
+<dd>
+<p>A comma-separated list of tables to back up. If no tables are specified, all tables are backed up. No regular-expression or
+wildcard support is present; all table names must be explicitly listed. See <a href="#br.using.backup.sets">Backup Sets</a> for more
+information about performing operations on collections of tables. Mutually exclusive with the <em>-s</em> option; one of these
+named options is required.</p>
+</dd>
+<dt class="hdlist1"><em>-s &lt;backup_set_name&gt;</em></dt>
+<dd>
+<p>Identify tables to back up based on a backup set. See <a href="#br.using.backup.sets">Using Backup Sets</a> for the purpose and usage
+of backup sets. Mutually exclusive with the <em>-t</em> option.</p>
+</dd>
+<dt class="hdlist1"><em>-w &lt;number_workers&gt;</em></dt>
+<dd>
+<p>(Optional) Specifies the number of parallel workers to copy data to backup 
destination. Backups are currently executed by MapReduce jobs
+so this value corresponds to the number of Mappers that will be spawned by the 
job.</p>
+</dd>
+<dt class="hdlist1"><em>-b &lt;bandwidth_per_worker&gt;</em></dt>
+<dd>
+<p>(Optional) Specifies the bandwidth of each worker in MB per second.</p>
+</dd>
+<dt class="hdlist1"><em>-d</em></dt>
+<dd>
+<p>(Optional) Enables "DEBUG" mode which prints additional logging about the 
backup creation.</p>
+</dd>
+<dt class="hdlist1"><em>-q &lt;name&gt;</em></dt>
+<dd>
+<p>(Optional) Specifies the name of the YARN queue in which the MapReduce job that creates the backup is executed. This option
+is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.usage.examples"><a class="anchor" 
href="#br.usage.examples"></a>80.1.3. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup create full hdfs:<span class="comment">//host5:8020/data/backup -t SALES2,SALES3 -w 3</span></code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This command creates a full backup image of two tables, SALES2 and SALES3, in the HDFS instance whose NameNode is host5:8020
+in the path <em>/data/backup</em>. The <em>-w</em> option specifies that no more than three parallel workers complete the operation.</p>
+</div>
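+<div class="paragraph">
+<p>When backups are launched from scripts, it can help to validate the positional arguments before invoking the utility.
+The following sketch assembles the same command line as the example above using only the arguments described in this section. The function name <code>backup_create_cmd</code> is hypothetical, and it only prints the command rather than running it.</p>
+</div>

```shell
# Hypothetical wrapper: validate arguments and print an
# "hbase backup create" command line. It does not execute anything.
backup_create_cmd() {
  btype="$1"; bpath="$2"; tables="$3"; workers="$4"
  # The type must be one of the two documented values.
  case "$btype" in
    full|incremental) ;;
    *) echo "type must be full or incremental" >&2; return 1 ;;
  esac
  # The path must use one of the documented URI prefixes.
  case "$bpath" in
    hdfs:*|webhdfs:*|gpfs:*|s3fs:*) ;;
    *) echo "unsupported backup_path prefix: $bpath" >&2; return 1 ;;
  esac
  cmd="hbase backup create $btype $bpath -t $tables"
  # -w is optional; append it only when a worker count was given.
  if [ -n "$workers" ]; then
    cmd="$cmd -w $workers"
  fi
  echo "$cmd"
}

backup_create_cmd full hdfs://host5:8020/data/backup SALES2,SALES3 3
```

+<div class="paragraph">
+<p>Printing the command instead of running it keeps the sketch safe to try on a workstation; a real script would run the result as the HBase superuser and capture the backup ID from the output.</p>
+</div>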
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.restoring.backup"><a class="anchor" 
href="#br.restoring.backup"></a>80.2. Restoring a Backup Image</h3>
+<div class="paragraph">
+<p>Run the following command as an HBase superuser. You can only restore a 
backup on a running HBase cluster because the data must be
+redistributed to the RegionServers for the operation to complete successfully.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java">hbase restore 
&lt;backup_path&gt; &lt;backup_id&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.restore.positional.args"><a class="anchor" 
href="#br.restore.positional.args"></a>80.2.1. Positional Command-Line 
Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_path</em></dt>
+<dd>
+<p>The <em>backup_path</em> argument specifies the full filesystem URI of 
the location where the backup image is stored. Valid prefixes
+are <em>hdfs:</em>, <em>webhdfs:</em>, <em>gpfs:</em>, and <em>s3fs:</em>.</p>
+</dd>
+<dt class="hdlist1"><em>backup_id</em></dt>
+<dd>
+<p>The backup ID that uniquely identifies the backup image to be restored.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.restore.named.args"><a class="anchor" 
href="#br.restore.named.args"></a>80.2.2. Named Command-Line Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>-t &lt;table_name[,table_name]&gt;</em></dt>
+<dd>
+<p>A comma-separated list of tables to restore. See <a 
href="#br.using.backup.sets">Backup Sets</a> for more
+information about performing operations on collections of tables. Mutually 
exclusive with the <em>-s</em> option; one of these
+named options is required.</p>
+</dd>
+<dt class="hdlist1"><em>-s &lt;backup_set_name&gt;</em></dt>
+<dd>
+<p>Identify tables to restore based on a backup set. See <a 
href="#br.using.backup.sets">Using Backup Sets</a> for the purpose and usage
+of backup sets. Mutually exclusive with the <em>-t</em> option.</p>
+</dd>
+<dt class="hdlist1"><em>-q &lt;name&gt;</em></dt>
+<dd>
+<p>(Optional) Specifies the name of the YARN queue in which the MapReduce job 
that restores the backup runs. This option
+is useful to prevent restore tasks from taking resources away from other, 
higher-priority MapReduce jobs.</p>
+</dd>
+<dt class="hdlist1"><em>-c</em></dt>
+<dd>
+<p>(Optional) Perform a dry-run of the restore. The actions are checked, but 
not executed.</p>
+</dd>
+<dt class="hdlist1"><em>-m &lt;target_tables&gt;</em></dt>
+<dd>
+<p>(Optional) A comma-separated list of tables to restore into. If this option 
is not provided, the original table name is used. When
+this option is provided, there must be an equal number of entries provided in 
the <code>-t</code> option.</p>
+</dd>
+<dt class="hdlist1"><em>-o</em></dt>
+<dd>
+<p>(Optional) Overwrites the target table for the restore if the table already 
exists.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.restore.usage"><a class="anchor" 
href="#br.restore.usage"></a>80.2.3. Example of Usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java">hbase restore 
/tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This command restores two tables of an incremental backup image. In this 
example:
+&#8226; <code>/tmp/backup_incremental</code> is the path to the directory 
containing the backup image.
+&#8226; <code>backupId_1467823988425</code> is the backup ID.
+&#8226; <code>mytable1</code> and <code>mytable2</code> are the names of tables in 
the backup image to be restored.</p>
+</div>
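+<div class="paragraph">
+<p>The <em>-m</em> option requires as many target tables as there are entries in <em>-t</em>. The following shell sketch checks that rule before printing a restore command line that follows the <code>hbase restore</code> syntax given earlier in this section. The helper names <code>count_entries</code> and <code>build_restore</code>, and the target table names, are assumptions made for illustration; the command is printed and never executed.</p>
+</div>

```shell
# Count the entries in a comma-separated list.
# The body runs in a subshell so the IFS change does not leak to the caller.
count_entries() (
  set -f        # disable globbing while the list is split
  IFS=,
  set -- $1
  echo "$#"
)

# Hypothetical helper: prints an "hbase restore" command line without running it.
# Arguments: backup path, backup ID, source tables, optional target tables for -m.
build_restore() {
  br_cmd="hbase restore $1 $2 -t $3"
  if [ -n "$4" ]; then
    if [ "$(count_entries "$3")" -ne "$(count_entries "$4")" ]; then
      echo "error: -t and -m must list the same number of tables" >&2
      return 1
    fi
    br_cmd="$br_cmd -m $4"
  fi
  echo "$br_cmd"
}

# Restore both tables under new (made-up) names instead of the original ones.
build_restore /tmp/backup_incremental backupId_1467823988425 \
  mytable1,mytable2 mytable1_copy,mytable2_copy
```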
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.merge.backup"><a class="anchor" href="#br.merge.backup"></a>80.3. 
Merging Incremental Backup Images</h3>
+<div class="paragraph">
+<p>This command can be used to merge two or more incremental backup images 
into a single incremental
+backup image. This can be used to consolidate multiple, small incremental 
backup images into a single
+larger incremental backup image. This command could be used to merge hourly 
incremental backups
+into a daily incremental backup image, or daily incremental backups into a 
weekly incremental backup.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup merge &lt;backup_ids&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.merge.backup.positional.cli.arguments"><a class="anchor" 
href="#br.merge.backup.positional.cli.arguments"></a>80.3.1. Positional 
Command-Line Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_ids</em></dt>
+<dd>
+<p>A comma-separated list of incremental backup image IDs that are to be 
combined into a single image.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.merge.backup.named.cli.arguments"><a class="anchor" 
href="#br.merge.backup.named.cli.arguments"></a>80.3.2. Named Command-Line 
Arguments</h4>
+<div class="paragraph">
+<p>None.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.merge.backup.example"><a class="anchor" 
href="#br.merge.backup.example"></a>80.3.3. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup merge 
backupId_1467823988425,backupId_1467827588425</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.using.backup.sets"><a class="anchor" 
href="#br.using.backup.sets"></a>80.4. Using Backup Sets</h3>
+<div class="paragraph">
+<p>Backup sets can ease the administration of HBase data backups and restores 
by reducing the amount of repetitive input
+of table names. You can group tables into a named backup set with the 
<code>hbase backup set add</code> command. You can then use
+the <em>-s</em> option to refer to the backup set by name in the <code>hbase backup 
create</code> or <code>hbase restore</code> command rather than listing
+every table in the group individually. You can have multiple backup sets.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Note the difference between the <code>hbase backup set add</code> command 
and the <em>-s</em> option. The <code>hbase backup set add</code>
+command must be run before using the <code>-s</code> option in a different 
command because backup sets must be named and defined
+before they can be used as a shortcut.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>If you run the <code>hbase backup set add</code> command and specify a 
backup set name that does not yet exist on your system, a new set
+is created. If you run the command with the name of an existing backup set, 
then the tables that you specify are added
+to the set.</p>
+</div>
+<div class="paragraph">
+<p>In this command, the backup set name is case-sensitive.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+The metadata of backup sets is stored within HBase. If you do not have access 
to the original HBase cluster with the
+backup set metadata, then you must specify individual table names to restore 
the data.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>To create a backup set, run the following command as the HBase 
superuser:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup set &lt;subcommand&gt; 
&lt;backup_set_name&gt; &lt;tables&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.set.subcommands"><a class="anchor" 
href="#br.set.subcommands"></a>80.4.1. Backup Set Subcommands</h4>
+<div class="paragraph">
+<p>The following list details subcommands of the hbase backup set command.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+You must enter one (and no more than one) of the following subcommands after 
hbase backup set to complete an operation.
+Also, the backup set name is case-sensitive in the command-line utility.
+</td>
+</tr>
+</table>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>add</em></dt>
+<dd>
+<p>Adds table[s] to a backup set. Specify a <em>backup_set_name</em> value 
after this argument to create a backup set.</p>
+</dd>
+<dt class="hdlist1"><em>remove</em></dt>
+<dd>
+<p>Removes tables from the set. Specify the tables to remove in the tables 
argument.</p>
+</dd>
+<dt class="hdlist1"><em>list</em></dt>
+<dd>
+<p>Lists all backup sets.</p>
+</dd>
+<dt class="hdlist1"><em>describe</em></dt>
+<dd>
+<p>Displays a description of a backup set. The information includes whether 
the set has full
+or incremental backups, start and end times of the backups, and a list of the 
tables in the set. Follow this subcommand with
+a valid <em>backup_set_name</em> value.</p>
+</dd>
+<dt class="hdlist1"><em>delete</em></dt>
+<dd>
+<p>Deletes a backup set. Enter the value for the <em>backup_set_name</em> 
option directly after the <code>hbase backup set delete</code> command.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.set.positional.cli.arguments"><a class="anchor" 
href="#br.set.positional.cli.arguments"></a>80.4.2. Positional Command-Line 
Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_set_name</em></dt>
+<dd>
+<p>Use to assign or invoke a backup set name. The backup set name must contain 
only printable characters and cannot have any spaces.</p>
+</dd>
+<dt class="hdlist1"><em>tables</em></dt>
+<dd>
+<p>List of tables (or a single table) to include in the backup set. Enter the 
table names as a comma-separated list. If no tables
+are specified, all tables are included in the set.</p>
+</dd>
+</dl>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+Maintain a log or other record of the case-sensitive backup set names and the 
corresponding tables in each set on a separate
+or remote cluster as part of your backup strategy. This information can help you in case of 
failure on the primary cluster.
+</td>
+</tr>
+</table>
+</div>
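+<div class="paragraph">
+<p>One possible way to keep such a record is sketched below: a shell function that appends a tab-separated line (UTC timestamp, backup set name, tables) to a plain-text file. The function name <code>record_backup_set</code>, the record layout, and the file location are assumptions chosen for illustration; HBase does not provide or require them.</p>
+</div>

```shell
# Hypothetical helper: append one record per backup set definition to a text file.
# Arguments: record file path, backup set name (case-sensitive), comma-separated tables.
# The file should live somewhere other than the primary cluster.
record_backup_set() {
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" >> "$1"
}
```

+<div class="paragraph">
+<p>For example, <code>record_backup_set /path/to/backup-sets.tsv Q1Data TEAM_3,TEAM_4</code> would add one line to the (placeholder) file <code>/path/to/backup-sets.tsv</code>; copying that file to the backup location keeps the set definitions recoverable if the primary cluster is lost.</p>
+</div>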
+</div>
+<div class="sect3">
+<h4 id="br.set.usage"><a class="anchor" href="#br.set.usage"></a>80.4.3. 
Example of Usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup set add Q1Data TEAM_3,TEAM_4</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Depending on the environment, this command results in <em>one</em> of the 
following actions:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>If the <code>Q1Data</code> backup set does not exist, a backup set 
containing tables <code>TEAM_3</code> and <code>TEAM_4</code> is created.</p>
+</li>
+<li>
+<p>If the <code>Q1Data</code> backup set exists already, the tables 
<code>TEAM_3</code> and <code>TEAM_4</code> are added to the 
<code>Q1Data</code> backup set.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.administration"><a class="anchor" href="#br.administration"></a>81. 
Administration of Backup Images</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The <code>hbase backup</code> command has several subcommands that help 
with administering backup images as they accumulate. Most production
+environments require recurring backups, so it is necessary to have utilities 
to help manage the data of the backup repository.
+Some subcommands enable you to find information that can help identify backups 
that are relevant in a search for particular data.
+You can also delete backup images.</p>
+</div>
+<div class="paragraph">
+<p>The following list details each <code>hbase backup</code> subcommand that 
can help administer backups. Run the full command-subcommand line as
+the HBase superuser.</p>
+</div>
+<div class="sect2">
+<h3 id="br.managing.backup.progress"><a class="anchor" 
href="#br.managing.backup.progress"></a>81.1. Managing Backup Progress</h3>
+<div class="paragraph">
+<p>You can monitor a running backup in another terminal session by running the 
<em>hbase backup progress</em> command and specifying the backup ID as an 
argument.</p>
+</div>
+<div class="paragraph">
+<p>For example, run the following command as the HBase superuser to view the 
progress of a backup:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup progress &lt;backup_id&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.progress.positional.cli.arguments"><a class="anchor" 
href="#br.progress.positional.cli.arguments"></a>81.1.1. Positional 
Command-Line Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_id</em></dt>
+<dd>
+<p>Specifies the backup whose progress you want to monitor. The backup ID 
is case-sensitive.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.progress.named.cli.arguments"><a class="anchor" 
href="#br.progress.named.cli.arguments"></a>81.1.2. Named Command-Line 
Arguments</h4>
+<div class="paragraph">
+<p>None.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.progress.example"><a class="anchor" 
href="#br.progress.example"></a>81.1.3. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java">hbase backup progress 
backupId_1467823988425</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.managing.backup.history"><a class="anchor" 
href="#br.managing.backup.history"></a>81.2. Managing Backup History</h3>
+<div class="paragraph">
+<p>This command displays a log of backup sessions. The information for each 
session includes backup ID, type (full or incremental), the tables
+in the backup, status, and start and end time. Specify the number of backup 
sessions to display with the optional -n argument.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup history &lt;backup_id&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.history.positional.cli.arguments"><a class="anchor" 
href="#br.history.positional.cli.arguments"></a>81.2.1. Positional Command-Line 
Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_id</em></dt>
+<dd>
+<p>Specifies the backup that you want to monitor by seeing the progress 
information. The backupId is case-sensitive.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.history.named.cli.arguments"><a class="anchor" 
href="#br.history.named.cli.arguments"></a>81.2.2. Named Command-Line 
Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>-n &lt;num_records&gt;</em></dt>
+<dd>
+<p>(Optional) The maximum number of backup records (Default: 10).</p>
+</dd>
+<dt class="hdlist1"><em>-p &lt;backup_root_path&gt;</em></dt>
+<dd>
+<p>The full filesystem URI of where backup images are stored.</p>
+</dd>
+<dt class="hdlist1"><em>-s &lt;backup_set_name&gt;</em></dt>
+<dd>
+<p>The name of the backup set to obtain history for. Mutually exclusive with 
the <em>-t</em> option.</p>
+</dd>
+<dt class="hdlist1"><em>-t &lt;table_name&gt;</em></dt>
+<dd>
+<p>The name of the table to obtain history for. Mutually exclusive with the 
<em>-s</em> option.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.history.backup.example"><a class="anchor" 
href="#br.history.backup.example"></a>81.2.3. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup history
+<span class="error">$</span> hbase backup history -n <span 
class="integer">20</span>
+<span class="error">$</span> hbase backup history -t 
WebIndexRecords</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.describe.backup"><a class="anchor" 
href="#br.describe.backup"></a>81.3. Describing a Backup Image</h3>
+<div class="paragraph">
+<p>This command can be used to obtain information about a specific backup 
image.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup describe &lt;backup_id&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.describe.backup.positional.cli.arguments"><a class="anchor" 
href="#br.describe.backup.positional.cli.arguments"></a>81.3.1. Positional 
Command-Line Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_id</em></dt>
+<dd>
+<p>The ID of the backup image to describe.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.describe.backup.named.cli.arguments"><a class="anchor" 
href="#br.describe.backup.named.cli.arguments"></a>81.3.2. Named Command-Line 
Arguments</h4>
+<div class="paragraph">
+<p>None.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.describe.backup.example"><a class="anchor" 
href="#br.describe.backup.example"></a>81.3.3. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup describe backupId_1467823988425</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.delete.backup"><a class="anchor" href="#br.delete.backup"></a>81.4. 
Deleting a Backup Image</h3>
+<div class="paragraph">
+<p>This command can be used to delete a backup image that is no longer 
needed.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup delete &lt;backup_id&gt;</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.delete.backup.positional.cli.arguments"><a class="anchor" 
href="#br.delete.backup.positional.cli.arguments"></a>81.4.1. Positional 
Command-Line Arguments</h4>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>backup_id</em></dt>
+<dd>
+<p>The ID of the backup image to delete.</p>
+</dd>
+</dl>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.delete.backup.named.cli.arguments"><a class="anchor" 
href="#br.delete.backup.named.cli.arguments"></a>81.4.2. Named Command-Line 
Arguments</h4>
+<div class="paragraph">
+<p>None.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.delete.backup.example"><a class="anchor" 
href="#br.delete.backup.example"></a>81.4.3. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup delete backupId_1467823988425</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.repair.backup"><a class="anchor" href="#br.repair.backup"></a>81.5. 
Backup Repair Command</h3>
+<div class="paragraph">
+<p>This command attempts to correct any inconsistencies in persisted backup 
metadata that exist as
+a result of software errors or unhandled failure scenarios. While the backup 
implementation tries
+to correct all errors on its own, this tool may be necessary in cases 
where the system cannot
+recover automatically.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup repair</code></pre>
+</div>
+</div>
+<div class="sect3">
+<h4 id="br.repair.backup.positional.cli.arguments"><a class="anchor" 
href="#br.repair.backup.positional.cli.arguments"></a>81.5.1. Positional 
Command-Line Arguments</h4>
+<div class="paragraph">
+<p>None.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="br.repair.backup.named.cli.arguments"><a class="anchor" 
href="#br.repair.backup.named.cli.arguments"></a>81.6. Named Command-Line 
Arguments</h3>
+<div class="paragraph">
+<p>None.</p>
+</div>
+<div class="sect3">
+<h4 id="br.repair.backup.example"><a class="anchor" 
href="#br.repair.backup.example"></a>81.6.1. Example usage</h4>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup repair</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.backup.configuration"><a class="anchor" 
href="#br.backup.configuration"></a>82. Configuration keys</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The backup and restore feature includes both required and optional 
configuration keys.</p>
+</div>
+<div class="sect2">
+<h3 id="_required_properties"><a class="anchor" 
href="#_required_properties"></a>82.1. Required properties</h3>
+<div class="paragraph">
+<p><em>hbase.backup.enable</em>: Controls whether or not the feature is 
enabled (Default: <code>false</code>). Set this value to <code>true</code>.</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.master.logcleaner.plugins</em>: A comma-separated list of classes 
invoked when cleaning logs in the HBase Master. Set
+this value to 
<code>org.apache.hadoop.hbase.backup.master.BackupLogCleaner</code> or append 
it to the current value.</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.procedure.master.classes</em>: A comma-separated list of classes 
invoked with the Procedure framework in the Master. Set
+this value to 
<code>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager</code>
 or append it to the current value.</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.procedure.regionserver.classes</em>: A comma-separated list of 
classes invoked with the Procedure framework in the RegionServer.
+Set this value to 
<code>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager</code>
 or append it to the current value.</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.coprocessor.region.classes</em>: A comma-separated list of 
RegionObservers deployed on tables. Set this value to
+<code>org.apache.hadoop.hbase.backup.BackupObserver</code> or append it to the 
current value.</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.master.hfilecleaner.plugins</em>: A comma-separated list of 
HFileCleaners deployed on the Master. Set this value
+to <code>org.apache.hadoop.hbase.backup.BackupHFileCleaner</code> or append it 
to the current value.</p>
+</div>
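+<div class="paragraph">
+<p>For illustration, the following shell sketch prints the required properties above as <code>&lt;property&gt;</code> elements in the usual Hadoop configuration layout, ready to be reviewed and placed inside the <code>&lt;configuration&gt;</code> element of <code>hbase-site.xml</code>. The function names are hypothetical, and the sketch assumes that none of the list-valued keys already has a value; if one does, append the class name to the existing comma-separated value instead of replacing it.</p>
+</div>

```shell
# Print one Hadoop-style <property> element for the given name and value.
property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Emit the six required backup properties listed in this section.
backup_properties() {
  property hbase.backup.enable true
  property hbase.master.logcleaner.plugins \
    org.apache.hadoop.hbase.backup.master.BackupLogCleaner
  property hbase.procedure.master.classes \
    org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager
  property hbase.procedure.regionserver.classes \
    org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager
  property hbase.coprocessor.region.classes \
    org.apache.hadoop.hbase.backup.BackupObserver
  property hbase.master.hfilecleaner.plugins \
    org.apache.hadoop.hbase.backup.BackupHFileCleaner
}

backup_properties
```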
+</div>
+<div class="sect2">
+<h3 id="_optional_properties"><a class="anchor" 
href="#_optional_properties"></a>82.2. Optional properties</h3>
+<div class="paragraph">
+<p><em>hbase.backup.system.ttl</em>: The time-to-live in seconds of data in 
the <code>hbase:backup</code> table (default: forever). This property
+is only relevant prior to the creation of the <code>hbase:backup</code> table. 
Use the <code>alter</code> command in the HBase shell to modify the TTL
+when this table already exists. See <a 
href="#br.filesystem.growth.warning">A Warning on File System Growth</a> for more details on the 
impact of this
+configuration property.</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.backup.attempts.max</em>: The number of attempts to perform when 
taking HBase table snapshots (default: 10).</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.backup.attempts.pause.ms</em>: The amount of time to wait between 
failed snapshot attempts in milliseconds (default: 10000).</p>
+</div>
+<div class="paragraph">
+<p><em>hbase.backup.logroll.timeout.millis</em>: The amount of time (in 
milliseconds) to wait for RegionServers to execute a WAL roll
+in the Master&#8217;s procedure framework (default: 30000).</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.best.practices"><a class="anchor" href="#br.best.practices"></a>83. 
Best Practices</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_formulate_a_restore_strategy_and_test_it"><a class="anchor" 
href="#_formulate_a_restore_strategy_and_test_it"></a>83.1. Formulate a restore 
strategy and test it.</h3>
+<div class="paragraph">
+<p>Before you rely on a backup and restore strategy for your production 
environment, identify how backups must be performed,
+and more importantly, how restores must be performed. Test the plan to ensure 
that it is workable.
+At a minimum, store backup data from a production cluster on a different 
cluster or server. To further safeguard the data,
+use a backup location that is at a different physical location.</p>
+</div>
+<div class="paragraph">
+<p>If you have an unrecoverable loss of data on your primary production cluster 
as a result of computer system issues, you may
+be able to restore the data from a different cluster or server at the same 
site. However, a disaster that destroys the whole
+site renders locally stored backups useless. Consider storing the backup data 
and necessary resources (both computing capacity
+and operator expertise) to restore the data at a site sufficiently remote from 
the production site. In the case of a catastrophe
+at the whole primary site (fire, earthquake, etc.), the remote backup site can 
be very valuable.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_secure_a_full_backup_image_first"><a class="anchor" 
href="#_secure_a_full_backup_image_first"></a>83.2. Secure a full backup image 
first.</h3>
+<div class="paragraph">
+<p>As a baseline, you must complete a full backup of HBase data at least once 
before you can rely on incremental backups. The full
+backup should be stored outside of the source cluster. To ensure complete 
dataset recovery, you must run the restore utility
+with the option to restore the baseline full backup. The full backup is the 
foundation of your dataset. Incremental backup data
+is applied on top of the full backup during the restore operation to return 
you to the point in time when the backup was last taken.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 
id="_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"><a
 class="anchor" 
href="#_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"></a>83.3.
 Define and use backup sets for groups of tables that are logical subsets of 
the entire dataset.</h3>
+<div class="paragraph">
+<p>You can group tables into an object called a backup set. A backup set can 
save time when you have a particular group of tables
+that you expect to repeatedly back up or restore.</p>
+</div>
+<div class="paragraph">
+<p>When you create a backup set, you type table names to include in the group. 
The backup set includes not only groups of related
+tables, but also retains the HBase backup metadata. Afterwards, you can invoke 
the backup set name to indicate what tables apply
+to the command execution instead of entering all the table names 
individually.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 
id="_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"><a
 class="anchor" 
href="#_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"></a>83.4.
 Document the backup and restore strategy, and ideally log information about 
each backup.</h3>
+<div class="paragraph">
+<p>Document the whole process so that the knowledge base can transfer to new 
administrators after employee turnover. As an extra
+safety precaution, also log the calendar date, time, and other relevant 
details about the data of each backup. This metadata
+can potentially help locate a particular dataset in case of source cluster 
failure or primary site disaster. Maintain duplicate
+copies of all documentation: one copy at the production cluster site and 
another at the backup location or wherever it can be
+accessed by an administrator remotely from the production cluster.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.s3.backup.scenario"><a class="anchor" 
href="#br.s3.backup.scenario"></a>84. Scenario: Safeguarding Application 
Datasets on Amazon S3</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This scenario describes how a hypothetical retail business uses backups to 
safeguard application data and then restore the dataset
+after failure.</p>
+</div>
+<div class="paragraph">
+<p>The HBase administration team uses backup sets to store data from a group 
of tables that have interrelated information for an
+application called green. In this example, one table contains transaction 
records and the other contains customer details. The
+two tables need to be backed up and be recoverable as a group.</p>
+</div>
+<div class="paragraph">
+<p>The admin team also wants to ensure daily backups occur automatically.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/backup-app-components.png" alt="backup app components">
+</div>
+<div class="title">Figure 7. Tables Composing The Backup Set</div>
+</div>
+<div class="paragraph">
+<p>The following is an outline of the steps and examples of commands that are 
used to back up the data for the <em>green</em> application and
+to recover the data later. All commands are run when logged in as the HBase 
superuser.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>A backup set called <em>green_set</em> is created as an alias for both the 
transactions table and the customer table. The backup set can
+be used for all operations to avoid typing each table name. The backup set 
name is case-sensitive and should be formed with only
+printable characters and without spaces.</p>
+</li>
+</ol>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> hbase backup set add green_set transactions
+<span class="error">$</span> hbase backup set add green_set 
customer</code></pre>
+</div>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>The first backup of green_set data must be a full backup. The following 
command example shows how credentials are passed to Amazon
+S3 and specifies the file system with the s3a: prefix.</p>
+</li>
+</ol>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> ACCESS_KEY=ABCDEFGHIJKLMNOPQRST
+<span class="error">$</span> SECRET_KEY=<span 
class="integer">123456789</span>abcdefghijklmnopqrstuvwxyzABCD
+<span class="error">$</span> sudo -u hbase hbase backup create full \
+  s3a:<span class="comment">//$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups 
-s green_set</span></code></pre>
+</div>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Incremental backups should be run according to a schedule that ensures 
essential data recovery in the event of a catastrophe. At
+this retail company, the HBase admin team decides that automated daily backups 
secure the data sufficiently. The team decides that
+they can implement this by modifying an existing cron job that is defined in 
<code>/etc/crontab</code>. Consequently, IT modifies the cron job
+by adding the following line:</p>
+</li>
+</ol>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="annotation">@daily</span> hbase hbase backup create incremental 
s3a:<span class="comment">//$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s 
green_set</span></code></pre>
+</div>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>A catastrophic IT incident disables the production cluster that the green 
application uses. An HBase system administrator of the
+backup cluster must restore the <em>green_set</em> dataset to the point in 
time closest to the recovery objective.</p>
+</li>
+</ol>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+If the administrator of the backup HBase cluster has the backup ID with 
relevant details in accessible records, the following
+search with the <code>hdfs dfs -ls</code> command and manually scanning the 
backup ID list can be bypassed. Consider continuously maintaining
+and protecting a detailed log of backup IDs outside the production cluster in 
your environment.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>The HBase administrator runs the following command on the directory where 
backups are stored to print the list of successful backup
+IDs on the console:</p>
+</div>
+<div class="paragraph">
+<p><code>hdfs dfs -ls -t /prodhbasebackups/backups</code></p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>The admin scans the list to see which backup was created at a date and time 
closest to the recovery objective. To do this, the
+admin converts the calendar timestamp of the recovery point in time to Unix 
time because backup IDs are uniquely identified with
+Unix time. The backup IDs are listed in reverse chronological order, meaning 
the most recent successful backup appears first.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>The admin notices that the following line in the command output corresponds 
with the <em>green_set</em> backup that needs to be restored:</p>
+</div>
+<div class="paragraph">
+<p><code>/prodhbasebackups/backups/backup_1467823988425</code></p>
+</div>
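+<div class="paragraph">
+<p>The timestamp conversion and scan described above can be sketched in a few lines of Python. This is a minimal illustration, not part of HBase: the function names are hypothetical, the second path in the listing is invented for the example, and the sketch assumes that the numeric suffix of a backup ID is Unix time in milliseconds and that the admin wants the newest backup taken at or before the recovery point.</p>
+</div>

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def to_unix_millis(timestamp_utc: str) -> int:
    """Convert a UTC calendar timestamp (YYYY-MM-DDTHH:MM:SS) to Unix milliseconds."""
    parsed = datetime.strptime(timestamp_utc, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp()) * 1000


def backup_millis(path: str) -> int:
    """Extract the numeric suffix from a path ending in backup_<digits>."""
    return int(path.rsplit("backup_", 1)[1])


def pick_backup(paths: Iterable[str], recovery_point_utc: str) -> Optional[str]:
    """Return the newest backup taken at or before the recovery point, if any."""
    target = to_unix_millis(recovery_point_utc)
    eligible = [p for p in paths if backup_millis(p) <= target]
    return max(eligible, key=backup_millis) if eligible else None


listing = [
    "/prodhbasebackups/backups/backup_1467823988425",  # ID from the text above
    "/prodhbasebackups/backups/backup_1467737588425",  # hypothetical earlier backup
]
print(pick_backup(listing, "2016-07-06T17:00:00"))
```

+<div class="paragraph">
+<p>Choosing "at or before" rather than "nearest in either direction" is an assumption of this sketch; it avoids restoring a backup that already contains changes made after the recovery point.</p>
+</div>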
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>The admin restores green_set invoking the backup ID and the -overwrite 
option. The -overwrite option truncates all existing data
+in the destination and populates the tables with data from the backup dataset. 
Without this flag, the backup data is appended to the
+existing data in the destination. In this case, the admin decides to overwrite 
the data because it is corrupted.</p>
+</li>
+</ol>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span 
class="error">$</span> sudo -u hbase hbase restore -s green_set \
+  s3a:<span class="comment">//$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups backup_1467823988425 -overwrite</span></code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.data.security"><a class="anchor" href="#br.data.security"></a>85. 
Security of Backup Data</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Because this feature copies data to remote locations, it&#8217;s important to take a moment to clearly state the procedural
+concerns that exist around data security. Like the HBase replication feature, backup and restore provides the constructs to automatically
+copy data from within a corporate boundary to some system outside of that boundary. When storing sensitive data, it is imperative that with backup and
+restore, as with any feature which extracts data from HBase, the locations to which data is being sent have undergone a security audit to ensure
+that only authenticated users are allowed to access that data.</p>
+</div>
+<div class="paragraph">
+<p>For instance, in the above example of backing up data to S3, it is of the utmost importance that the proper permissions are assigned
+to the S3 bucket to ensure that only a minimum set of authorized users are allowed to access this data. Because the data is no longer
+being accessed via HBase and its authentication and authorization controls, we must ensure that the filesystem storing that data
+provides a comparable level of security. This is a manual step which users <strong>must</strong> implement on their own.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.technical.details"><a class="anchor" 
href="#br.technical.details"></a>86. Technical Details of Incremental Backup 
and Restore</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>HBase incremental backups enable more efficient capture of HBase table 
images than previous attempts at serial backup and restore
+solutions, such as those that only used HBase Export and Import APIs. 
Incremental backups use Write Ahead Logs (WALs) to capture
+the data changes since the previous backup was created. A WAL roll (create new 
WALs) is executed across all RegionServers to track
+the WALs that need to be in the backup.</p>
+</div>
+<div class="paragraph">
+<p>After the incremental backup image is created, the source backup files are usually on the same node as the data source. A process similar
+to the DistCp (distributed copy) tool is used to move the source backup files 
to the target file systems. When a table restore operation
+starts, a two-step process is initiated. First, the full backup is restored 
from the full backup image. Second, all WAL files from
+incremental backups between the last full backup and the incremental backup 
being restored are converted to HFiles, which the HBase
+Bulk Load utility automatically imports as restored data in the table.</p>
+</div>
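+<div class="paragraph">
+<p>The two-step restore described above implies a selection rule: start from the most recent full backup that precedes the target image, then apply every incremental image up to and including the target. The following Python sketch illustrates only that selection logic. The <code>BackupImage</code> type, the <code>restore_chain</code> function, and the sample IDs are hypothetical and do not correspond to HBase classes or real backups.</p>
+</div>

```python
from typing import List, NamedTuple


class BackupImage(NamedTuple):
    backup_id: str   # hypothetical identifier
    kind: str        # "full" or "incremental"
    created_ms: int  # creation time, Unix milliseconds


def restore_chain(images: List[BackupImage], target_id: str) -> List[BackupImage]:
    """Return the images to apply, oldest first, to restore `target_id`."""
    ordered = sorted(images, key=lambda img: img.created_ms)
    ids = [img.backup_id for img in ordered]
    if target_id not in ids:
        raise ValueError(f"unknown backup id: {target_id}")
    end = ids.index(target_id)
    # Walk backwards from the target to the nearest full backup.
    for start in range(end, -1, -1):
        if ordered[start].kind == "full":
            return ordered[start:end + 1]
    raise ValueError("no full backup precedes the requested image")


history = [
    BackupImage("backup_100", "full", 100),
    BackupImage("backup_200", "incremental", 200),
    BackupImage("backup_300", "incremental", 300),
    BackupImage("backup_400", "full", 400),
    BackupImage("backup_500", "incremental", 500),
]
print([img.backup_id for img in restore_chain(history, "backup_300")])
```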
+<div class="paragraph">
+<p>You can only restore on a live HBase cluster because the data must be 
redistributed to complete the restore operation successfully.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.filesystem.growth.warning"><a class="anchor" 
href="#br.filesystem.growth.warning"></a>87. A Warning on File System 
Growth</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>As a reminder, incremental backups are implemented via retaining the 
write-ahead logs which HBase primarily uses for data durability.
+Thus, to ensure that all data needing to be included in a backup is still 
available in the system, the HBase backup and restore feature
+retains all write-ahead logs since the last backup until the next incremental 
backup is executed.</p>
+</div>
+<div class="paragraph">
+<p>Like HBase Snapshots, this can have a large impact on the HDFS usage of HBase for high volume tables. Take care in enabling
+and using the backup and restore feature, specifically with a mind to removing backup sessions when they are not actively being used.</p>
+</div>
+<div class="paragraph">
+<p>The only automated upper bound on retained write-ahead logs for backup and restore is based on the TTL of the <code>hbase:backup</code> system table which,
+as of the time this document is written, is infinite (backup table entries are 
never automatically deleted). This requires that administrators
+perform backups on a schedule whose frequency is relative to the amount of 
available space on HDFS (e.g. less available HDFS space requires
+more aggressive backup merges and deletions). As a reminder, the TTL can be 
altered on the <code>hbase:backup</code> table using the <code>alter</code> 
command
+in the HBase shell. Modifying the configuration property 
<code>hbase.backup.system.ttl</code> in hbase-site.xml after the system table 
exists has no effect.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.backup.capacity.planning"><a class="anchor" 
href="#br.backup.capacity.planning"></a>88. Capacity Planning</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>When designing a distributed system deployment, it is critical that some basic mathematical rigor is applied to ensure sufficient computational
+capacity is available given the data and software requirements of the system. For this feature, the availability of network capacity is the largest
+bottleneck when estimating the performance of some implementation of backup and restore. The second most costly function is the speed at which
+data can be read/written.</p>
+</div>
+<div class="sect2">
+<h3 id="_full_backups"><a class="anchor" href="#_full_backups"></a>88.1. Full 
Backups</h3>
+<div class="paragraph">
+<p>To estimate the duration of a full backup, we have to understand the 
general actions which are invoked:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Write-ahead log roll on each RegionServer: ones to tens of seconds per 
RegionServer in parallel. Relative to the load on each RegionServer.</p>
+</li>
+<li>
+<p>Take an HBase snapshot of the table(s): tens of seconds. Relative to the 
number of regions and files that comprise the table.</p>
+</li>
+<li>
+<p>Export the snapshot to the destination: see below. Relative to the size of 
the data and the network bandwidth to the destination.</p>
+</li>
+</ul>
+</div>
+<div id="br.export.snapshot.cost" class="paragraph">
+<p>To approximate how long the final step will take, we have to make some assumptions on hardware. Be aware that these will <strong>not</strong> be accurate for your
+system; these are numbers that you or your administrator know for your system. Let&#8217;s say the speed of reading data from HDFS on a single node is
+capped at 80MB/s (across all Mappers that run on that host), a modern network interface controller (NIC) supports 10Gb/s, the top-of-rack switch can
+handle 40Gb/s, and the WAN between your clusters is 10Gb/s. This means that you can only ship data to your remote at a speed of 1.25GB/s, meaning
+that 16 nodes (<code>1.25 * 1024 / 80 = 16</code>) participating in the ExportSnapshot should be able to fully saturate the link between clusters. With more
+nodes in the cluster, we can still saturate the network but at a lesser impact on any one node which helps ensure local SLAs are met. If the size
+of the snapshot is 10TB, this full backup would take in the ballpark of 2.3 hours (<code>10 * 1024 / 1.25 / (60 * 60) = 2.28hrs</code>).</p>
+</div>
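+<div class="paragraph">
+<p>The arithmetic above can be reproduced with a short Python sketch so that you can substitute measurements from your own environment. The function names are hypothetical, and the inputs (1.25GB/s WAN, 80MB/s per-node read speed, 10TB snapshot) are the illustrative figures from the paragraph above, not recommendations.</p>
+</div>

```python
import math


def nodes_to_saturate(wan_gb_per_s: float, node_read_mb_per_s: float) -> int:
    """Number of nodes whose combined read speed fills the WAN link."""
    return math.ceil(wan_gb_per_s * 1024 / node_read_mb_per_s)


def full_backup_hours(snapshot_tb: float, wan_gb_per_s: float) -> float:
    """Hours needed to ship a snapshot over a fully saturated WAN link."""
    seconds = snapshot_tb * 1024 / wan_gb_per_s
    return seconds / (60 * 60)


print(nodes_to_saturate(1.25, 80))           # 16
print(round(full_backup_hours(10, 1.25), 2)) # 2.28
```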
+<div class="paragraph">
+<p>As a general statement, it is very likely that the WAN bandwidth between 
your local cluster and the remote storage is the largest
+bottleneck to the speed of a full backup.</p>
+</div>
+<div class="paragraph">
+<p>When the concern is restricting the computational impact of backups to a 
"production system", the above formulas can be reused with the optional
+command-line arguments to <code>hbase backup create</code>: <code>-b</code>, 
<code>-w</code>, <code>-q</code>. The <code>-b</code> option defines the 
bandwidth at which each worker (Mapper) would
+write data. The <code>-w</code> argument limits the number of workers that 
would be spawned in the DistCp job. The <code>-q</code> allows the user to 
specify a YARN
+queue which can limit the specific nodes where the workers will be 
spawned&#8201;&#8212;&#8201;this can quarantine the backup workers performing 
the copy to
+a set of non-critical nodes. Relating the <code>-b</code> and <code>-w</code> 
options to our earlier equations: <code>-b</code> would be used to restrict 
each node from reading
+data at the full 80MB/s and <code>-w</code> is used to limit the job from 
spawning 16 worker tasks.</p>
+</div>
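+<div class="paragraph">
+<p>To see how throttling changes the estimate, the earlier formula can be extended with a worker count and a per-worker bandwidth. This Python sketch is an approximation under stated assumptions: the function name is hypothetical, the per-worker bandwidth is assumed to be expressed in MB/s, and the effective rate is modeled as the smaller of the combined worker bandwidth and the WAN capacity.</p>
+</div>

```python
def throttled_backup_hours(
    snapshot_tb: float,
    workers: int,
    mb_per_s_per_worker: float,
    wan_gb_per_s: float,
) -> float:
    """Estimate copy time when workers and per-worker bandwidth are capped."""
    wan_mb_per_s = wan_gb_per_s * 1024
    # The job cannot move data faster than the WAN, however many workers run.
    effective_mb_per_s = min(workers * mb_per_s_per_worker, wan_mb_per_s)
    snapshot_mb = snapshot_tb * 1024 * 1024
    return snapshot_mb / effective_mb_per_s / (60 * 60)


# 8 workers at 40MB/s each, versus the unthrottled 16 workers at 80MB/s.
print(round(throttled_backup_hours(10, 8, 40, 1.25), 2))
print(round(throttled_backup_hours(10, 16, 80, 1.25), 2))
```

+<div class="paragraph">
+<p>Halving both the worker count and the per-worker bandwidth roughly quadruples the duration in this model, which is the trade-off to weigh against the reduced load on the production cluster.</p>
+</div>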
+</div>
+<div class="sect2">
+<h3 id="_incremental_backup"><a class="anchor" 
href="#_incremental_backup"></a>88.2. Incremental Backup</h3>
+<div class="paragraph">
+<p>Like we did for full backups, we have to understand the incremental backup 
process to approximate its runtime and cost.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Identify new write-ahead logs since the last full or incremental backup: negligible. A priori knowledge from the backup system table(s).</p>
+</li>
+<li>
+<p>Read, filter, and write "minimized" HFiles equivalent to the WALs: 
dominated by the speed of writing data. Relative to write speed of HDFS.</p>
+</li>
+<li>
+<p>DistCp the HFiles to the destination: <a 
href="#br.export.snapshot.cost">see above</a>.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the second step, the dominating cost of this operation would be re-writing the data (under the assumption that a majority of the
+data in the WAL is preserved). In this case, we can assume an aggregate write 
speed of 30MB/s per node. Continuing our 16-node cluster example,
+this would require approximately 15 minutes to perform this step for 50GB of 
data (50 * 1024 / 60 / 60 = 14.2). The amount of time to start the
+DistCp MapReduce job would likely dominate the actual time taken to copy the 
data (50 / 1.25 = 40 seconds) and can be ignored.</p>
+</div>
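+<div class="paragraph">
+<p>The same kind of estimate can be written down for an incremental backup. In this Python sketch the function names are hypothetical and the write rate is left as a parameter, because the result is very sensitive to it. The parenthetical arithmetic above divides 50 * 1024 MB by 60 and then by 60 again, which corresponds to an effective rate of 60MB/s; substitute the aggregate write speed you actually measure.</p>
+</div>

```python
def rewrite_minutes(data_gb: float, write_mb_per_s: float) -> float:
    """Minutes to re-write WAL data as HFiles at the given effective rate."""
    return data_gb * 1024 / write_mb_per_s / 60


def copy_seconds(data_gb: float, wan_gb_per_s: float) -> float:
    """Seconds to ship the resulting HFiles over a saturated WAN link."""
    return data_gb / wan_gb_per_s


print(round(rewrite_minutes(50, 60), 1))  # 14.2
print(copy_seconds(50, 1.25))             # 40.0
```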
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="br.limitations"><a class="anchor" href="#br.limitations"></a>89. 
Limitations of the Backup and Restore Utility</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p><strong>Serial backup operations</strong></p>
+</div>
+<div class="paragraph">
+<p>Backup operations cannot be run concurrently. An operation includes actions like create, delete, restore, and merge. Only one active backup session is supported. <a href="https://issues.apache.org/jira/browse/HBASE-16391">HBASE-16391</a>
+will introduce multiple-backup sessions support.</p>
+</div>
+<div class="paragraph">
+<p><strong>No means to cancel backups</strong></p>
+</div>
+<div class="paragraph">
+<p>Neither backup nor restore operations can be canceled (<a href="https://issues.apache.org/jira/browse/HBASE-15997">HBASE-15997</a>, <a href="https://issues.apache.org/jira/browse/HBASE-15998">HBASE-15998</a>).
+The workaround to cancel a backup would be to kill the client-side backup command (<code>control-C</code>), ensure all relevant MapReduce jobs have exited, and then
+run the <code>hbase backup repair</code> command to ensure the system backup metadata is consistent.</p>
+</div>
+<div class="paragraph">
+<p><strong>Backups can only be saved to a single location</strong></p>
+</div>
+<div class="paragraph">
+<p>Copying backup information to multiple locations is an exercise left to the user. <a href="https://issues.apache.org/jira/browse/HBASE-15476">HBASE-15476</a> will
+introduce the ability to specify multiple-backup destinations intrinsically.</p>
+</div>
+<div class="paragraph">
+<p><strong>HBase superuser access is required</strong></p>
+</div>
+<div class="paragraph">
+<p>Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore, which can pose a problem for shared HBase installations. Current mitigations would require
+coordination with system administrators to build and deploy a backup and restore strategy (<a href="https://issues.apache.org/jira/browse/HBASE-14138">HBASE-14138</a>).</p>
+</div>
+<div class="paragraph">
+<p><strong>Backup restoration is an online operation</strong></p>
+</div>
+<div class="paragraph">
+<p>Restoring from a backup requires that the HBase cluster be online; this is a caveat of the current implementation (<a href="https://issues.apache.org/jira/browse/HBASE-16573">HBASE-16573</a>).</p>
+</div>
+<div class="paragraph">
+<p><strong>Some operations may fail and require re-run</strong></p>
+</div>
+<div class="paragraph">
+<p>The HBase backup feature is primarily client driven. While there is the 
standard HBase retry logic built into the HBase Connection, persistent errors 
in executing operations
+may propagate back to the client (e.g. snapshot failure due to region splits). 
The backup implementation should be moved from client-side into the ProcedureV2 
framework
+in the future which would provide additional robustness around 
transient/retryable failures. The <code>hbase backup repair</code> command is 
meant to correct states which the system
+cannot automatically detect and recover from.</p>
+</div>
+<div class="paragraph">
+<p><strong>Avoidance of declaration of public API</strong></p>
+</div>
+<div class="paragraph">
+<p>While the Java API to interact with this feature exists and its implementation is separated from an interface, insufficient rigor has been applied to determine if
+it is exactly what we intend to ship to users. As such, it is marked for a <code>Private</code> audience with the expectation that, as users begin to try the feature, there
+will be modifications that would necessitate breaking compatibility (<a href="https://issues.apache.org/jira/browse/HBASE-17517">HBASE-17517</a>).</p>
+</div>
+<div class="paragraph">
+<p><strong>Lack of global metrics for backup and restore</strong></p>
+</div>
+<div class="paragraph">
+<p>Individual backup and restore operations contain metrics about the amount of work the operation included, but there is no centralized location (e.g. the Master UI)
+which presents this information for consumption (<a href="https://issues.apache.org/jira/browse/HBASE-16565">HBASE-16565</a>).</p>
+</div>
+</div>
+</div>
 <h1 id="hbase_apis" class="sect0"><a class="anchor" 
href="#hbase_apis"></a>Apache HBase APIs</h1>
 <div class="openblock partintro">
 <div class="content">
@@ -17467,7 +18754,7 @@ See <a href="#external_apis">Apache HBase External 
APIs</a> for more information
 </div>
 </div>
 <div class="sect1">
-<h2 id="_examples"><a class="anchor" href="#_examples"></a>76. Examples</h2>
+<h2 id="_examples"><a class="anchor" href="#_examples"></a>90. Examples</h2>
 <div class="sectionbody">
 <div class="exampleblock">
 <div class="title">Example 40. Create, modify and delete a Table Using 
Java</div>
@@ -17578,7 +18865,7 @@ through custom protocols. For information on using the 
native HBase APIs, refer
 </div>
 </div>
 <div class="sect1">
-<h2 id="_rest"><a class="anchor" href="#_rest"></a>77. REST</h2>
+<h2 id="_rest"><a class="anchor" href="#_rest"></a>91. REST</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Representational State Transfer (REST) was introduced in 2000 in the 
doctoral
@@ -17594,7 +18881,7 @@ There is also a nice series of blogs on
 by Jesse Anderson.</p>
 </div>
 <div class="sect2">
-<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" 
href="#_starting_and_stopping_the_rest_server"></a>77.1. Starting and Stopping 
the REST Server</h3>
+<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" 
href="#_starting_and_stopping_the_rest_server"></a>91.1. Starting and Stopping 
the REST Server</h3>
 <div class="paragraph">
 <p>The included REST server can run as a daemon which starts an embedded Jetty
 servlet container and deploys the servlet into it. Use one of the following 
commands
@@ -17621,7 +18908,7 @@ following command if you were running it in the 
background.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" 
href="#_configuring_the_rest_server_and_client"></a>77.2. Configuring the REST 
Server and Client</h3>
+<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" 
href="#_configuring_the_rest_server_and_client"></a>91.2. Configuring the REST 
Server and Client</h3>
 <div class="paragraph">
 <p>For information about configuring the REST server and client for SSL, as 
well as <code>doAs</code>
 impersonation for the REST server, see <a 
href="#security.gateway.thrift">Configure the Thrift Gateway to Authenticate on 
Behalf of the Client</a> and other portions
@@ -17629,7 +18916,7 @@ of the <a href="#security">Securing Apache HBase</a> 
chapter.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_using_rest_endpoints"><a class="anchor" 
href="#_using_rest_endpoints"></a>77.3. Using REST Endpoints</h3>
+<h3 id="_using_rest_endpoints"><a class="anchor" 
href="#_using_rest_endpoints"></a>91.3. Using REST Endpoints</h3>
 <div class="paragraph">
 <p>The following examples use the placeholder server http://example.com:8000, 
and
 the following commands can all be run using <code>curl</code> or 
<code>wget</code> commands. You can request
@@ -17996,7 +19283,7 @@ curl -vi -X PUT \
 </table>
 </div>
 <div class="sect2">
-<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>77.4. REST XML 
Schema</h3>
+<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>91.4. REST XML 
Schema</h3>
 <div class="listingblock">
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="xml"><span 
class="tag">&lt;schema</span> <span class="attribute-name">xmlns</span>=<span 
class="string"><span class="delimiter">&quot;</span><span 
class="content">http://www.w3.org/2001/XMLSchema</span><span 
class="delimiter">&quot;</span></span> <span 
class="attribute-name">xmlns:tns</span>=<span class="string"><span 
class="delimiter">&quot;</span><span class="content">RESTSchema</span><span 
class="delimiter">&quot;</span></span><span class="tag">&gt;</span>
@@ -18154,7 +19441,7 @@ curl -vi -X PUT \
 </div>
 </div>
 <div class="sect2">
-<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>77.5. 
REST Protobufs Schema</h3>
+<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>91.5. 
REST Protobufs Schema</h3>
 <div class="listingblock">
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="json"><span 
class="error">m</span><span class="error">e</span><span 
class="error">s</span><span class="error">s</span><span 
class="error">a</span><span class="error">g</span><span class="error">e</span> 
<span class="error">V</span><span class="error">e</span><span 
class="error">r</span><span class="error">s</span><span 
class="error">i</span><span class="error">o</span><span class="error">n</span> {
@@ -18262,7 +19549,7 @@ curl -vi -X PUT \
 </div>
 </div>
 <div class="sect1">
-<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>78. Thrift</h2>
+<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>92. Thrift</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Documentation about Thrift has moved to <a href="#thrift">Thrift API and 
Filter Language</a>.</p>
@@ -18270,7 +19557,7 @@ curl -vi -X PUT \
 </div>
 </div>
 <div class="sect1">
-<h2 id="c"><a class="anchor" href="#c"></a>79. C/C++ Apache HBase Client</h2>
+<h2 id="c"><a class="anchor" href="#c"></a>93. C/C++ Apache HBase Client</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>FB&#8217;s Chip Turner wrote a pure C/C++ client.
@@ -18282,7 +19569,7 @@ curl -vi -X PUT \
 </div>
 </div>
 <div class="sect1">
-<h2 id="jdo"><a class="anchor" href="#jdo"></a>80. Using Java Data Objects 
(JDO) with HBase</h2>
+<h2 id="jdo"><a class="anchor" href="#jdo"></a>94. Using Java Data Objects 
(JDO) with HBase</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><a href="https://db.apache.org/jdo/">Java Data Objects (JDO)</a> is a 
standard way to
@@ -18439,10 +19726,10 @@ a row, get a column value, perform a query, and do 
some additional HBase operati
 </div>
 </div>
 <div class="sect1">
-<h2 id="scala"><a class="anchor" href="#scala"></a>81. Scala</h2>
+<h2 id="scala"><a class="anchor" href="#scala"></a>95. Scala</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="_setting_the_classpath"><a class="anchor" 
href="#_setting_the_classpath"></a>81.1. Setting the Classpath</h3>
+<h3 id="_setting_the_classpath"><a class="anchor" 
href="#_setting_the_classpath"></a>95.1. Setting the Classpath</h3>
 <div class="paragraph">
 <p>To use Scala with HBase, your CLASSPATH must include HBase&#8217;s 
classpath as well as
 the Scala JARs required by your code. First, use the following command on a 
server
@@ -18467,7 +19754,7 @@ your project.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>81.2. 
Scala SBT File</h3>
+<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>95.2. 
Scala SBT File</h3>
 <div class="paragraph">
 <p>Your <code>build.sbt</code> file needs the following <code>resolvers</code> 
and <code>libraryDependencies</code> to work
 with HBase.</p>
@@ -18486,7 +19773,7 @@ libraryDependencies ++= Seq(
 </div>
 </div>
 <div class="sect2">
-<h3 id="_example_scala_code"><a class="anchor" 
href="#_example_scala_code"></a>81.3. Example Scala Code</h3>
+<h3 id="_example_scala_code"><a class="anchor" 
href="#_example_scala_code"></a>95.3. Example Scala Code</h3>
 <div class="paragraph">
 <p>This example lists HBase tables, creates a new table, and adds a row to 
it.</p>
 </div>
@@ -18524,10 +19811,10 @@ println(Bytes.toString(value))</code></pre>
 </div>
 </div>
 <div class="sect1">
-<h2 id="jython"><a class="anchor" href="#jython"></a>82. Jython</h2>
+<h2 id="jython"><a class="anchor" href="#jython"></a>96. Jython</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="_setting_the_classpath_2"><a class="anchor" 
href="#_setting_the_classpath_2"></a>82.1. Setting the Classpath</h3>
+<h3 id="_setting_the_classpath_2"><a class="anchor" 
href="#_setting_the_classpath_2"></a>96.1. Setting the Classpath</h3>
 <div class="paragraph">
 <p>To use Jython with HBase, your CLASSPATH must include HBase&#8217;s 
classpath as well as
 the Jython JARs required by your code. First, use the following command on a 
server
@@ -18556,7 +19843,7 @@ $ bin/hbase org.python.util.jython</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_jython_code_examples"><a class="anchor" 
href="#_jython_code_examples"></a>82.2. Jython Code Examples</h3>
+<h3 id="_jython_code_examples"><a class="anchor" 
href="#_jython_code_examples"></a>96.2. Jython Code Examples</h3>
 <div class="exampleblock">
 <div class="title">Example 42. Table Creation, Population, Get, and Delete 
with Jython</div>
 <div class="content">
@@ -18665,7 +19952,7 @@ The Thrift API relies on client and server 
processes.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="thrift.filter_language"><a class="anchor" 
href="#thrift.filter_language"></a>83. Filter Language</h2>
+<h2 id="thrift.filter_language"><a class="anchor" 
href="#thrift.filter_language"></a>97. Filter Language</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Thrift Filter Language was introduced in HBase 0.92.
@@ -18676,7 +19963,7 @@ You can find out more about shell integration by using 
the <code>scan help</code
 <p>You specify a filter as a string, which is parsed on the server to 
construct the filter.</p>
 </div>
 <div class="sect2">
-<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>83.1. 
General Filter String Syntax</h3>
+<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>97.1. 
General Filter Stri


<TRUNCATED>
