http://git-wip-us.apache.org/repos/asf/hbase-site/blob/1a616706/book.html ---------------------------------------------------------------------- diff --git a/book.html b/book.html index 14183f4..300459d 100644 --- a/book.html +++ b/book.html @@ -138,148 +138,166 @@ <li><a href="#hbase_mob">75. Storing Medium-sized Objects (MOB)</a></li> </ul> </li> +<li><a href="#casestudies">Backup and Restore</a> +<ul class="sectlevel1"> +<li><a href="#br.overview">76. Overview</a></li> +<li><a href="#br.terminology">77. Terminology</a></li> +<li><a href="#br.planning">78. Planning</a></li> +<li><a href="#br.initial.setup">79. First-time configuration steps</a></li> +<li><a href="#_backup_and_restore_commands">80. Backup and Restore commands</a></li> +<li><a href="#br.administration">81. Administration of Backup Images</a></li> +<li><a href="#br.backup.configuration">82. Configuration keys</a></li> +<li><a href="#br.best.practices">83. Best Practices</a></li> +<li><a href="#br.s3.backup.scenario">84. Scenario: Safeguarding Application Datasets on Amazon S3</a></li> +<li><a href="#br.data.security">85. Security of Backup Data</a></li> +<li><a href="#br.technical.details">86. Technical Details of Incremental Backup and Restore</a></li> +<li><a href="#br.filesystem.growth.warning">87. A Warning on File System Growth</a></li> +<li><a href="#br.backup.capacity.planning">88. Capacity Planning</a></li> +<li><a href="#br.limitations">89. Limitations of the Backup and Restore Utility</a></li> +</ul> +</li> <li><a href="#hbase_apis">Apache HBase APIs</a> <ul class="sectlevel1"> -<li><a href="#_examples">76. Examples</a></li> +<li><a href="#_examples">90. Examples</a></li> </ul> </li> <li><a href="#external_apis">Apache HBase External APIs</a> <ul class="sectlevel1"> -<li><a href="#_rest">77. REST</a></li> -<li><a href="#_thrift">78. Thrift</a></li> -<li><a href="#c">79. C/C++ Apache HBase Client</a></li> -<li><a href="#jdo">80. 
Using Java Data Objects (JDO) with HBase</a></li> -<li><a href="#scala">81. Scala</a></li> -<li><a href="#jython">82. Jython</a></li> +<li><a href="#_rest">91. REST</a></li> +<li><a href="#_thrift">92. Thrift</a></li> +<li><a href="#c">93. C/C++ Apache HBase Client</a></li> +<li><a href="#jdo">94. Using Java Data Objects (JDO) with HBase</a></li> +<li><a href="#scala">95. Scala</a></li> +<li><a href="#jython">96. Jython</a></li> </ul> </li> <li><a href="#thrift">Thrift API and Filter Language</a> <ul class="sectlevel1"> -<li><a href="#thrift.filter_language">83. Filter Language</a></li> +<li><a href="#thrift.filter_language">97. Filter Language</a></li> </ul> </li> <li><a href="#spark">HBase and Spark</a> <ul class="sectlevel1"> -<li><a href="#_basic_spark">84. Basic Spark</a></li> -<li><a href="#_spark_streaming">85. Spark Streaming</a></li> -<li><a href="#_bulk_load">86. Bulk Load</a></li> -<li><a href="#_sparksql_dataframes">87. SparkSQL/DataFrames</a></li> +<li><a href="#_basic_spark">98. Basic Spark</a></li> +<li><a href="#_spark_streaming">99. Spark Streaming</a></li> +<li><a href="#_bulk_load">100. Bulk Load</a></li> +<li><a href="#_sparksql_dataframes">101. SparkSQL/DataFrames</a></li> </ul> </li> <li><a href="#cp">Apache HBase Coprocessors</a> <ul class="sectlevel1"> -<li><a href="#_coprocessor_overview">88. Coprocessor Overview</a></li> -<li><a href="#_types_of_coprocessors">89. Types of Coprocessors</a></li> -<li><a href="#cp_loading">90. Loading Coprocessors</a></li> -<li><a href="#cp_example">91. Examples</a></li> -<li><a href="#_guidelines_for_deploying_a_coprocessor">92. Guidelines For Deploying A Coprocessor</a></li> -<li><a href="#_restricting_coprocessor_usage">93. Restricting Coprocessor Usage</a></li> +<li><a href="#_coprocessor_overview">102. Coprocessor Overview</a></li> +<li><a href="#_types_of_coprocessors">103. Types of Coprocessors</a></li> +<li><a href="#cp_loading">104. Loading Coprocessors</a></li> +<li><a href="#cp_example">105. 
Examples</a></li> +<li><a href="#_guidelines_for_deploying_a_coprocessor">106. Guidelines For Deploying A Coprocessor</a></li> +<li><a href="#_restricting_coprocessor_usage">107. Restricting Coprocessor Usage</a></li> </ul> </li> <li><a href="#performance">Apache HBase Performance Tuning</a> <ul class="sectlevel1"> -<li><a href="#perf.os">94. Operating System</a></li> -<li><a href="#perf.network">95. Network</a></li> -<li><a href="#jvm">96. Java</a></li> -<li><a href="#perf.configurations">97. HBase Configurations</a></li> -<li><a href="#perf.zookeeper">98. ZooKeeper</a></li> -<li><a href="#perf.schema">99. Schema Design</a></li> -<li><a href="#perf.general">100. HBase General Patterns</a></li> -<li><a href="#perf.writing">101. Writing to HBase</a></li> -<li><a href="#perf.reading">102. Reading from HBase</a></li> -<li><a href="#perf.deleting">103. Deleting from HBase</a></li> -<li><a href="#perf.hdfs">104. HDFS</a></li> -<li><a href="#perf.ec2">105. Amazon EC2</a></li> -<li><a href="#perf.hbase.mr.cluster">106. Collocating HBase and MapReduce</a></li> -<li><a href="#perf.casestudy">107. Case Studies</a></li> +<li><a href="#perf.os">108. Operating System</a></li> +<li><a href="#perf.network">109. Network</a></li> +<li><a href="#jvm">110. Java</a></li> +<li><a href="#perf.configurations">111. HBase Configurations</a></li> +<li><a href="#perf.zookeeper">112. ZooKeeper</a></li> +<li><a href="#perf.schema">113. Schema Design</a></li> +<li><a href="#perf.general">114. HBase General Patterns</a></li> +<li><a href="#perf.writing">115. Writing to HBase</a></li> +<li><a href="#perf.reading">116. Reading from HBase</a></li> +<li><a href="#perf.deleting">117. Deleting from HBase</a></li> +<li><a href="#perf.hdfs">118. HDFS</a></li> +<li><a href="#perf.ec2">119. Amazon EC2</a></li> +<li><a href="#perf.hbase.mr.cluster">120. Collocating HBase and MapReduce</a></li> +<li><a href="#perf.casestudy">121. 
Case Studies</a></li> </ul> </li> <li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a> <ul class="sectlevel1"> -<li><a href="#trouble.general">108. General Guidelines</a></li> -<li><a href="#trouble.log">109. Logs</a></li> -<li><a href="#trouble.resources">110. Resources</a></li> -<li><a href="#trouble.tools">111. Tools</a></li> -<li><a href="#trouble.client">112. Client</a></li> -<li><a href="#trouble.mapreduce">113. MapReduce</a></li> -<li><a href="#trouble.namenode">114. NameNode</a></li> -<li><a href="#trouble.network">115. Network</a></li> -<li><a href="#trouble.rs">116. RegionServer</a></li> -<li><a href="#trouble.master">117. Master</a></li> -<li><a href="#trouble.zookeeper">118. ZooKeeper</a></li> -<li><a href="#trouble.ec2">119. Amazon EC2</a></li> -<li><a href="#trouble.versions">120. HBase and Hadoop version issues</a></li> -<li><a href="#_ipc_configuration_conflicts_with_hadoop">121. IPC Configuration Conflicts with Hadoop</a></li> -<li><a href="#_hbase_and_hdfs">122. HBase and HDFS</a></li> -<li><a href="#trouble.tests">123. Running unit or integration tests</a></li> -<li><a href="#trouble.casestudy">124. Case Studies</a></li> -<li><a href="#trouble.crypto">125. Cryptographic Features</a></li> -<li><a href="#_operating_system_specific_issues">126. Operating System Specific Issues</a></li> -<li><a href="#_jdk_issues">127. JDK Issues</a></li> +<li><a href="#trouble.general">122. General Guidelines</a></li> +<li><a href="#trouble.log">123. Logs</a></li> +<li><a href="#trouble.resources">124. Resources</a></li> +<li><a href="#trouble.tools">125. Tools</a></li> +<li><a href="#trouble.client">126. Client</a></li> +<li><a href="#trouble.mapreduce">127. MapReduce</a></li> +<li><a href="#trouble.namenode">128. NameNode</a></li> +<li><a href="#trouble.network">129. Network</a></li> +<li><a href="#trouble.rs">130. RegionServer</a></li> +<li><a href="#trouble.master">131. Master</a></li> +<li><a href="#trouble.zookeeper">132. 
ZooKeeper</a></li> +<li><a href="#trouble.ec2">133. Amazon EC2</a></li> +<li><a href="#trouble.versions">134. HBase and Hadoop version issues</a></li> +<li><a href="#_ipc_configuration_conflicts_with_hadoop">135. IPC Configuration Conflicts with Hadoop</a></li> +<li><a href="#_hbase_and_hdfs">136. HBase and HDFS</a></li> +<li><a href="#trouble.tests">137. Running unit or integration tests</a></li> +<li><a href="#trouble.casestudy">138. Case Studies</a></li> +<li><a href="#trouble.crypto">139. Cryptographic Features</a></li> +<li><a href="#_operating_system_specific_issues">140. Operating System Specific Issues</a></li> +<li><a href="#_jdk_issues">141. JDK Issues</a></li> </ul> </li> <li><a href="#casestudies">Apache HBase Case Studies</a> <ul class="sectlevel1"> -<li><a href="#casestudies.overview">128. Overview</a></li> -<li><a href="#casestudies.schema">129. Schema Design</a></li> -<li><a href="#casestudies.perftroub">130. Performance/Troubleshooting</a></li> +<li><a href="#casestudies.overview">142. Overview</a></li> +<li><a href="#casestudies.schema">143. Schema Design</a></li> +<li><a href="#casestudies.perftroub">144. Performance/Troubleshooting</a></li> </ul> </li> <li><a href="#ops_mgt">Apache HBase Operational Management</a> <ul class="sectlevel1"> -<li><a href="#tools">131. HBase Tools and Utilities</a></li> -<li><a href="#ops.regionmgt">132. Region Management</a></li> -<li><a href="#node.management">133. Node Management</a></li> -<li><a href="#hbase_metrics">134. HBase Metrics</a></li> -<li><a href="#ops.monitoring">135. HBase Monitoring</a></li> -<li><a href="#_cluster_replication">136. Cluster Replication</a></li> -<li><a href="#_running_multiple_workloads_on_a_single_cluster">137. Running Multiple Workloads On a Single Cluster</a></li> -<li><a href="#ops.backup">138. HBase Backup</a></li> -<li><a href="#ops.snapshots">139. HBase Snapshots</a></li> -<li><a href="#snapshots_azure">140. 
Storing Snapshots in Microsoft Azure Blob Storage</a></li> -<li><a href="#ops.capacity">141. Capacity Planning and Region Sizing</a></li> -<li><a href="#table.rename">142. Table Rename</a></li> -<li><a href="#rsgroup">143. RegionServer Grouping</a></li> +<li><a href="#tools">145. HBase Tools and Utilities</a></li> +<li><a href="#ops.regionmgt">146. Region Management</a></li> +<li><a href="#node.management">147. Node Management</a></li> +<li><a href="#hbase_metrics">148. HBase Metrics</a></li> +<li><a href="#ops.monitoring">149. HBase Monitoring</a></li> +<li><a href="#_cluster_replication">150. Cluster Replication</a></li> +<li><a href="#_running_multiple_workloads_on_a_single_cluster">151. Running Multiple Workloads On a Single Cluster</a></li> +<li><a href="#ops.backup">152. HBase Backup</a></li> +<li><a href="#ops.snapshots">153. HBase Snapshots</a></li> +<li><a href="#snapshots_azure">154. Storing Snapshots in Microsoft Azure Blob Storage</a></li> +<li><a href="#ops.capacity">155. Capacity Planning and Region Sizing</a></li> +<li><a href="#table.rename">156. Table Rename</a></li> +<li><a href="#rsgroup">157. RegionServer Grouping</a></li> </ul> </li> <li><a href="#developer">Building and Developing Apache HBase</a> <ul class="sectlevel1"> -<li><a href="#getting.involved">144. Getting Involved</a></li> -<li><a href="#repos">145. Apache HBase Repositories</a></li> -<li><a href="#_ides">146. IDEs</a></li> -<li><a href="#build">147. Building Apache HBase</a></li> -<li><a href="#releasing">148. Releasing Apache HBase</a></li> -<li><a href="#hbase.rc.voting">149. Voting on Release Candidates</a></li> -<li><a href="#documentation">150. Generating the HBase Reference Guide</a></li> -<li><a href="#hbase.org">151. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li> -<li><a href="#hbase.tests">152. Tests</a></li> -<li><a href="#developing">153. Developer Guidelines</a></li> +<li><a href="#getting.involved">158. 
Getting Involved</a></li> +<li><a href="#repos">159. Apache HBase Repositories</a></li> +<li><a href="#_ides">160. IDEs</a></li> +<li><a href="#build">161. Building Apache HBase</a></li> +<li><a href="#releasing">162. Releasing Apache HBase</a></li> +<li><a href="#hbase.rc.voting">163. Voting on Release Candidates</a></li> +<li><a href="#documentation">164. Generating the HBase Reference Guide</a></li> +<li><a href="#hbase.org">165. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li> +<li><a href="#hbase.tests">166. Tests</a></li> +<li><a href="#developing">167. Developer Guidelines</a></li> </ul> </li> <li><a href="#unit.tests">Unit Testing HBase Applications</a> <ul class="sectlevel1"> -<li><a href="#_junit">154. JUnit</a></li> -<li><a href="#mockito">155. Mockito</a></li> -<li><a href="#_mrunit">156. MRUnit</a></li> -<li><a href="#_integration_testing_with_an_hbase_mini_cluster">157. Integration Testing with an HBase Mini-Cluster</a></li> +<li><a href="#_junit">168. JUnit</a></li> +<li><a href="#mockito">169. Mockito</a></li> +<li><a href="#_mrunit">170. MRUnit</a></li> +<li><a href="#_integration_testing_with_an_hbase_mini_cluster">171. Integration Testing with an HBase Mini-Cluster</a></li> </ul> </li> <li><a href="#protobuf">Protobuf in HBase</a> <ul class="sectlevel1"> -<li><a href="#_protobuf">158. Protobuf</a></li> +<li><a href="#_protobuf">172. Protobuf</a></li> </ul> </li> <li><a href="#zookeeper">ZooKeeper</a> <ul class="sectlevel1"> -<li><a href="#_using_existing_zookeeper_ensemble">159. Using existing ZooKeeper ensemble</a></li> -<li><a href="#zk.sasl.auth">160. SASL Authentication with ZooKeeper</a></li> +<li><a href="#_using_existing_zookeeper_ensemble">173. Using existing ZooKeeper ensemble</a></li> +<li><a href="#zk.sasl.auth">174. SASL Authentication with ZooKeeper</a></li> </ul> </li> <li><a href="#community">Community</a> <ul class="sectlevel1"> -<li><a href="#_decisions">161. 
Decisions</a></li> -<li><a href="#community.roles">162. Community Roles</a></li> -<li><a href="#hbase.commit.msg.format">163. Commit Message format</a></li> +<li><a href="#_decisions">175. Decisions</a></li> +<li><a href="#community.roles">176. Community Roles</a></li> +<li><a href="#hbase.commit.msg.format">177. Commit Message format</a></li> </ul> </li> <li><a href="#_appendix">Appendix</a> @@ -289,7 +307,7 @@ <li><a href="#hbck.in.depth">Appendix C: hbck In Depth</a></li> <li><a href="#appendix_acl_matrix">Appendix D: Access Control Matrix</a></li> <li><a href="#compression">Appendix E: Compression and Data Block Encoding In HBase</a></li> -<li><a href="#data.block.encoding.enable">164. Enable Data Block Encoding</a></li> +<li><a href="#data.block.encoding.enable">178. Enable Data Block Encoding</a></li> <li><a href="#sql">Appendix F: SQL over HBase</a></li> <li><a href="#ycsb">Appendix G: YCSB</a></li> <li><a href="#_hfile_format_2">Appendix H: HFile format</a></li> @@ -298,8 +316,8 @@ <li><a href="#asf">Appendix K: HBase and the Apache Software Foundation</a></li> <li><a href="#orca">Appendix L: Apache HBase Orca</a></li> <li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in HBase</a></li> -<li><a href="#tracing.client.modifications">165. Client Modifications</a></li> -<li><a href="#tracing.client.shell">166. Tracing from HBase Shell</a></li> +<li><a href="#tracing.client.modifications">179. Client Modifications</a></li> +<li><a href="#tracing.client.shell">180. Tracing from HBase Shell</a></li> <li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li> </ul> </li> @@ -17452,6 +17470,1275 @@ hbase> major_compact 't1', 'c1', 'MOB'</pre> </div> </div> </div> +<h1 id="casestudies" class="sect0"><a class="anchor" href="#casestudies"></a>Backup and Restore</h1> +<div class="sect1"> +<h2 id="br.overview"><a class="anchor" href="#br.overview"></a>76. 
Overview</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>Backup and restore is a standard operation provided by many databases. An effective backup and restore +strategy helps ensure that users can recover data in case of unexpected failures. The HBase backup and restore +feature helps ensure that enterprises using HBase as a canonical data repository can recover from catastrophic +failures. Another important feature is the ability to restore the database to a particular +point in time, commonly referred to as a snapshot.</p> +</div> +<div class="paragraph"> +<p>The HBase backup and restore feature provides the ability to create full backups and incremental backups of +tables in an HBase cluster. The full backup is the foundation on which incremental backups are applied +to build iterative snapshots. Incremental backups can be run on a schedule to capture changes over time, +for example by using a cron job. Incremental backups are more cost-effective than full backups because they only capture +the changes since the last backup. They also enable administrators to restore the database to any prior incremental backup. Furthermore, the +utilities enable table-level data backup and recovery if you do not want to restore the entire dataset +of the backup.</p> +</div> +<div class="paragraph"> +<p>The backup and restore feature supplements the HBase Replication feature. While HBase replication is ideal for +creating "hot" copies of the data (where the replicated data is immediately available for query), the backup and +restore feature is ideal for creating "cold" copies of data (where a manual step must be taken to restore the system). +Previously, users could only create full backups via the ExportSnapshot functionality. 
The incremental +backup implementation is the main improvement over what ExportSnapshot previously provided.</p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.terminology"><a class="anchor" href="#br.terminology"></a>77. Terminology</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>The backup and restore feature introduces new terminology which can be used to understand how control flows through the +system.</p> +</div> +<div class="ulist"> +<ul> +<li> +<p><em>A backup</em>: A logical unit of data and metadata which can restore a table to its state at a specific point in time.</p> +</li> +<li> +<p><em>Full backup</em>: A type of backup which wholly encapsulates the contents of the table at a point in time.</p> +</li> +<li> +<p><em>Incremental backup</em>: A type of backup which contains the changes in a table since a full backup.</p> +</li> +<li> +<p><em>Backup set</em>: A user-defined name which references one or more tables over which a backup can be executed.</p> +</li> +<li> +<p><em>Backup ID</em>: A unique name which identifies one backup from the rest, e.g. <code>backupId_1467823988425</code></p> +</li> +</ul> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.planning"><a class="anchor" href="#br.planning"></a>78. Planning</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>There are some common strategies which can be used to implement backup and restore in your environment. The following sections +show how these strategies are implemented and identify the potential tradeoffs of each.</p> +</div> +<div class="admonitionblock warning"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-warning" title="Warning"></i> +</td> +<td class="content"> +The backup and restore tools have not been tested on HDFS clusters with Transparent Data Encryption (TDE) enabled. +This is related to the open issue <a href="https://issues.apache.org/jira/browse/HBASE-16178">HBASE-16178</a>. 
+</td> +</tr> +</table> +</div> +<div class="sect2"> +<h3 id="br.intracluster.backup"><a class="anchor" href="#br.intracluster.backup"></a>78.1. Backup within a cluster</h3> +<div class="paragraph"> +<p>This strategy stores the backups on the same cluster from which the backup was taken. This approach is only appropriate for testing +as it does not provide any additional safety on top of what the software itself already provides.</p> +</div> +<div class="imageblock"> +<div class="content"> +<img src="images/backup-intra-cluster.png" alt="backup intra cluster"> +</div> +<div class="title">Figure 4. Intra-Cluster Backup</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.dedicated.cluster.backup"><a class="anchor" href="#br.dedicated.cluster.backup"></a>78.2. Backup using a dedicated cluster</h3> +<div class="paragraph"> +<p>This strategy provides greater fault tolerance and a path towards disaster recovery. In this setting, you +store the backup on a separate HDFS cluster by supplying the backup destination cluster's HDFS URL to the backup utility. +You should consider backing up to a different physical location, such as a different data center.</p> +</div> +<div class="paragraph"> +<p>Typically, a backup-dedicated HDFS cluster uses a more economical hardware profile to save money.</p> +</div> +<div class="imageblock"> +<div class="content"> +<img src="images/backup-dedicated-cluster.png" alt="backup dedicated cluster"> +</div> +<div class="title">Figure 5. Dedicated HDFS Cluster Backup</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.cloud.or.vendor.backup"><a class="anchor" href="#br.cloud.or.vendor.backup"></a>78.3. Backup to the Cloud or a storage vendor appliance</h3> +<div class="paragraph"> +<p>Another approach to safeguarding HBase incremental backups is to store the data on provisioned, secure servers that belong +to third-party vendors and that are located off-site. 
The vendor can be a public cloud provider or a storage vendor that provides +a Hadoop-compatible file system, such as S3 or another HDFS-compatible destination.</p> +</div> +<div class="imageblock"> +<div class="content"> +<img src="images/backup-cloud-appliance.png" alt="backup cloud appliance"> +</div> +<div class="title">Figure 6. Backup to Cloud or Vendor Storage Solutions</div> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +The HBase backup utility does not support backup to multiple destinations. A workaround is to manually create copies +of the backup files from HDFS or S3. +</td> +</tr> +</table> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.initial.setup"><a class="anchor" href="#br.initial.setup"></a>79. First-time configuration steps</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>This section describes the configuration changes that must be made in order to use the backup and restore feature. +As this feature makes significant use of YARN's MapReduce framework to parallelize these I/O-heavy operations, configuration +changes extend beyond <code>hbase-site.xml</code>.</p> +</div> +<div class="sect2"> +<h3 id="_allow_the_hbase_system_user_in_yarn"><a class="anchor" href="#_allow_the_hbase_system_user_in_yarn"></a>79.1. Allow the "hbase" system user in YARN</h3> +<div class="paragraph"> +<p>The YARN <strong>container-executor.cfg</strong> configuration file must have the following property setting: <em>allowed.system.users=hbase</em>. No spaces +are allowed in entries of this configuration file.</p> +</div> +<div class="admonitionblock warning"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-warning" title="Warning"></i> +</td> +<td class="content"> +Skipping this step will result in runtime errors when executing the first backup tasks. 
+</td> +</tr> +</table> +</div> +<div class="paragraph"> +<p><strong>Example of a valid container-executor.cfg file for backup and restore:</strong></p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java">yarn.nodemanager.log-dirs=/var/log/hadoop/mapred +yarn.nodemanager.linux-container-executor.group=yarn +banned.users=hdfs,yarn,mapred,bin +allowed.system.users=hbase +min.user.id=<span class="integer">500</span></code></pre> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="_hbase_specific_changes"><a class="anchor" href="#_hbase_specific_changes"></a>79.2. HBase specific changes</h3> +<div class="paragraph"> +<p>Add the following properties to hbase-site.xml and restart HBase if it is already running.</p> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +The ",..." is an ellipsis meant to imply that this is a comma-separated list of values, not literal text which should be added to hbase-site.xml. 
+</td> +</tr> +</table> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java">&lt;property&gt; + &lt;name&gt;hbase.backup.enable&lt;/name&gt; + &lt;value&gt;<span class="predefined-constant">true</span>&lt;/value&gt; +&lt;/property&gt; +&lt;property&gt; + &lt;name&gt;hbase.master.logcleaner.plugins&lt;/name&gt; + &lt;value&gt;org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...&lt;/value&gt; +&lt;/property&gt; +&lt;property&gt; + &lt;name&gt;hbase.procedure.master.classes&lt;/name&gt; + &lt;value&gt;org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager,...&lt;/value&gt; +&lt;/property&gt; +&lt;property&gt; + &lt;name&gt;hbase.procedure.regionserver.classes&lt;/name&gt; + &lt;value&gt;org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager,...&lt;/value&gt; +&lt;/property&gt; +&lt;property&gt; + &lt;name&gt;hbase.coprocessor.region.classes&lt;/name&gt; + &lt;value&gt;org.apache.hadoop.hbase.backup.BackupObserver,...&lt;/value&gt; +&lt;/property&gt; +&lt;property&gt; + &lt;name&gt;hbase.master.hfilecleaner.plugins&lt;/name&gt; + &lt;value&gt;org.apache.hadoop.hbase.backup.BackupHFileCleaner,...&lt;/value&gt; +&lt;/property&gt;</code></pre> +</div> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="_backup_and_restore_commands"><a class="anchor" href="#_backup_and_restore_commands"></a>80. Backup and Restore commands</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>This section covers the command-line utilities that administrators run to create, restore, and merge backups. Tools to +inspect details of specific backup sessions are covered in the next section, <a href="#br.administration">Administration of Backup Images</a>.</p> +</div> +<div class="paragraph"> +<p>Run the command <code>hbase backup help &lt;command&gt;</code> to access the online help that provides basic information about a command +and its options. The information below is also captured in this help message for each command.</p> +</div> +<div class="sect2"> +<h3 id="br.creating.complete.backup"><a class="anchor" href="#br.creating.complete.backup"></a>80.1. 
Creating a Backup Image</h3> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +<div class="paragraph"> +<p>For HBase clusters also using Apache Phoenix: include the SQL system catalog tables in the backup. In the event that you +need to restore the HBase backup, access to the system catalog tables enables you to resume Phoenix interoperability with the +restored data.</p> +</div> +</td> +</tr> +</table> +</div> +<div class="paragraph"> +<p>The first step in running the backup and restore utilities is to perform a full backup and to store the data in a separate image +from the source. At a minimum, you must do this to get a baseline before you can rely on incremental backups.</p> +</div> +<div class="paragraph"> +<p>Run the following command as an HBase superuser:</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java">hbase backup create &lt;type&gt; &lt;backup_path&gt;</code></pre> +</div> +</div> +<div class="paragraph"> +<p>After the command finishes running, the console prints a SUCCESS or FAILURE status message. The SUCCESS message includes a <em>backup</em> ID. +The backup ID is the Unix time (also known as Epoch time) at which the HBase master received the backup request from the client.</p> +</div> +<div class="admonitionblock tip"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-tip" title="Tip"></i> +</td> +<td class="content"> +<div class="paragraph"> +<p>Record the backup ID that appears at the end of a successful backup. In case the source cluster fails and you need to recover the +dataset with a restore operation, having the backup ID readily available can save time.</p> +</div> +</td> +</tr> +</table> +</div> +<div class="sect3"> +<h4 id="br.create.positional.cli.arguments"><a class="anchor" href="#br.create.positional.cli.arguments"></a>80.1.1. 
Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>type</em></dt> +<dd> +<p>The type of backup to execute: <em>full</em> or <em>incremental</em>. As a reminder, an <em>incremental</em> backup requires a <em>full</em> backup to +already exist.</p> +</dd> +<dt class="hdlist1"><em>backup_path</em></dt> +<dd> +<p>The <em>backup_path</em> argument specifies the full filesystem URI of where to store the backup image. Valid prefixes +are <em>hdfs:</em>, <em>webhdfs:</em>, <em>gpfs:</em>, and <em>s3fs:</em>.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.create.named.cli.arguments"><a class="anchor" href="#br.create.named.cli.arguments"></a>80.1.2. Named Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>-t &lt;table_name[,table_name]&gt;</em></dt> +<dd> +<p>A comma-separated list of tables to back up. If no tables are specified, all tables are backed up. No regular-expression or +wildcard support is present; all table names must be explicitly listed. See <a href="#br.using.backup.sets">Backup Sets</a> for more +information about performing operations on collections of tables. Mutually exclusive with the <em>-s</em> option; one of these +named options is required.</p> +</dd> +<dt class="hdlist1"><em>-s &lt;backup_set_name&gt;</em></dt> +<dd> +<p>Identify tables to back up based on a backup set. See <a href="#br.using.backup.sets">Using Backup Sets</a> for the purpose and usage +of backup sets. Mutually exclusive with the <em>-t</em> option.</p> +</dd> +<dt class="hdlist1"><em>-w &lt;number_workers&gt;</em></dt> +<dd> +<p>(Optional) Specifies the number of parallel workers to copy data to the backup destination. 
Backups are currently executed by MapReduce jobs, +so this value corresponds to the number of Mappers that will be spawned by the job.</p> +</dd> +<dt class="hdlist1"><em>-b &lt;bandwidth_per_worker&gt;</em></dt> +<dd> +<p>(Optional) Specifies the bandwidth of each worker in MB per second.</p> +</dd> +<dt class="hdlist1"><em>-d</em></dt> +<dd> +<p>(Optional) Enables "DEBUG" mode, which prints additional logging about the backup creation.</p> +</dd> +<dt class="hdlist1"><em>-q &lt;name&gt;</em></dt> +<dd> +<p>(Optional) Specifies the name of the YARN queue in which the MapReduce job that creates the backup should run. This option +is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.usage.examples"><a class="anchor" href="#br.usage.examples"></a>80.1.3. Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup create full hdfs:<span class="comment">//host5:8020/data/backup -t SALES2,SALES3 -w 3</span></code></pre> +</div> +</div> +<div class="paragraph"> +<p>This command creates a full backup image of two tables, SALES2 and SALES3, in the HDFS instance whose NameNode is host5:8020, +in the path <em>/data/backup</em>. The <em>-w</em> option specifies that no more than three parallel workers complete the operation.</p> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.restoring.backup"><a class="anchor" href="#br.restoring.backup"></a>80.2. Restoring a Backup Image</h3> +<div class="paragraph"> +<p>Run the following command as an HBase superuser. 
You can only restore a backup on a running HBase cluster because the data must be +redistributed to the RegionServers for the operation to complete successfully.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java">hbase restore &lt;backup_path&gt; &lt;backup_id&gt;</code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.restore.positional.args"><a class="anchor" href="#br.restore.positional.args"></a>80.2.1. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_path</em></dt> +<dd> +<p>The <em>backup_path</em> argument specifies the full filesystem URI of where the backup image is stored. Valid prefixes +are <em>hdfs:</em>, <em>webhdfs:</em>, <em>gpfs:</em>, and <em>s3fs:</em>.</p> +</dd> +<dt class="hdlist1"><em>backup_id</em></dt> +<dd> +<p>The backup ID that uniquely identifies the backup image to be restored.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.restore.named.args"><a class="anchor" href="#br.restore.named.args"></a>80.2.2. Named Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>-t &lt;table_name[,table_name]&gt;</em></dt> +<dd> +<p>A comma-separated list of tables to restore. See <a href="#br.using.backup.sets">Backup Sets</a> for more +information about performing operations on collections of tables. Mutually exclusive with the <em>-s</em> option; one of these +named options is required.</p> +</dd> +<dt class="hdlist1"><em>-s &lt;backup_set_name&gt;</em></dt> +<dd> +<p>Identify tables to restore based on a backup set. See <a href="#br.using.backup.sets">Using Backup Sets</a> for the purpose and usage +of backup sets. Mutually exclusive with the <em>-t</em> option.</p> +</dd> +<dt class="hdlist1"><em>-q &lt;name&gt;</em></dt> +<dd> +<p>(Optional) Specifies the name of the YARN queue in which the MapReduce job that performs the restore should run. 
This option +is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance.</p> +</dd> +<dt class="hdlist1"><em>-c</em></dt> +<dd> +<p>(Optional) Perform a dry-run of the restore. The actions are checked, but not executed.</p> +</dd> +<dt class="hdlist1"><em>-m <target_tables></em></dt> +<dd> +<p>(Optional) A comma-separated list of tables to restore into. If this option is not provided, the original table name is used. When +this option is provided, there must be an equal number of entries provided in the <code>-t</code> option.</p> +</dd> +<dt class="hdlist1"><em>-o</em></dt> +<dd> +<p>(Optional) Overwrites the target table for the restore if the table already exists.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.restore.usage"><a class="anchor" href="#br.restore.usage"></a>80.2.3. Example of Usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java">hbase backup restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2</code></pre> +</div> +</div> +<div class="paragraph"> +<p>This command restores two tables of an incremental backup image. In this example: +• <code>/tmp/backup_incremental</code> is the path to the directory containing the backup image. +• <code>backupId_1467823988425</code> is the backup ID. +• <code>mytable1</code> and <code>mytable2</code> are the names of tables in the backup image to be restored.</p> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.merge.backup"><a class="anchor" href="#br.merge.backup"></a>80.3. Merging Incremental Backup Images</h3> +<div class="paragraph"> +<p>This command can be used to merge two or more incremental backup images into a single incremental +backup image. This can be used to consolidate multiple, small incremental backup images into a single +larger incremental backup image. 
This command could be used to merge hourly incremental backups +into a daily incremental backup image, or daily incremental backups into a weekly incremental backup.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup merge <backup_ids></code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.merge.backup.positional.cli.arguments"><a class="anchor" href="#br.merge.backup.positional.cli.arguments"></a>80.3.1. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_ids</em></dt> +<dd> +<p>A comma-separated list of incremental backup image IDs that are to be combined into a single image.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.merge.backup.named.cli.arguments"><a class="anchor" href="#br.merge.backup.named.cli.arguments"></a>80.3.2. Named Command-Line Arguments</h4> +<div class="paragraph"> +<p>None.</p> +</div> +</div> +<div class="sect3"> +<h4 id="br.merge.backup.example"><a class="anchor" href="#br.merge.backup.example"></a>80.3.3. Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup merge backupId_1467823988425,backupId_1467827588425</code></pre> +</div> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.using.backup.sets"><a class="anchor" href="#br.using.backup.sets"></a>80.4. Using Backup Sets</h3> +<div class="paragraph"> +<p>Backup sets can ease the administration of HBase data backups and restores by reducing the amount of repetitive input +of table names. You can group tables into a named backup set with the <code>hbase backup set add</code> command. You can then use +the -set option to invoke the name of a backup set in the <code>hbase backup create</code> or <code>hbase backup restore</code> rather than list +individually every table in the group. 
You can have multiple backup sets.</p> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +Note the differentiation between the <code>hbase backup set add</code> command and the <em>-set</em> option. The <code>hbase backup set add</code> +command must be run before using the <code>-set</code> option in a different command because backup sets must be named and defined +before using backup sets as a shortcut. +</td> +</tr> +</table> +</div> +<div class="paragraph"> +<p>If you run the <code>hbase backup set add</code> command and specify a backup set name that does not yet exist on your system, a new set +is created. If you run the command with the name of an existing backup set, then the tables that you specify are added +to the set.</p> +</div> +<div class="paragraph"> +<p>In this command, the backup set name is case-sensitive.</p> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +The metadata of backup sets is stored within HBase. If you do not have access to the original HBase cluster with the +backup set metadata, then you must specify individual table names to restore the data. +</td> +</tr> +</table> +</div> +<div class="paragraph"> +<p>To create a backup set, run the following command as the HBase superuser:</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup set <subcommand> <backup_set_name> <tables></code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.set.subcommands"><a class="anchor" href="#br.set.subcommands"></a>80.4.1. 
Backup Set Subcommands</h4> +<div class="paragraph"> +<p>The following list details subcommands of the hbase backup set command.</p> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +You must enter one (and no more than one) of the following subcommands after hbase backup set to complete an operation. +Also, the backup set name is case-sensitive in the command-line utility. +</td> +</tr> +</table> +</div> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>add</em></dt> +<dd> +<p>Adds table[s] to a backup set. Specify a <em>backup_set_name</em> value after this argument to create a backup set.</p> +</dd> +<dt class="hdlist1"><em>remove</em></dt> +<dd> +<p>Removes tables from the set. Specify the tables to remove in the tables argument.</p> +</dd> +<dt class="hdlist1"><em>list</em></dt> +<dd> +<p>Lists all backup sets.</p> +</dd> +<dt class="hdlist1"><em>describe</em></dt> +<dd> +<p>Displays a description of a backup set. The information includes whether the set has full +or incremental backups, start and end times of the backups, and a list of the tables in the set. This subcommand must precede +a valid value for the <em>backup_set_name</em> value.</p> +</dd> +<dt class="hdlist1"><em>delete</em></dt> +<dd> +<p>Deletes a backup set. Enter the value for the <em>backup_set_name</em> option directly after the <code>hbase backup set delete</code> command.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.set.positional.cli.arguments"><a class="anchor" href="#br.set.positional.cli.arguments"></a>80.4.2. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_set_name</em></dt> +<dd> +<p>Use to assign or invoke a backup set name. 
The backup set name must contain only printable characters and cannot have any spaces.</p> +</dd> +<dt class="hdlist1"><em>tables</em></dt> +<dd> +<p>List of tables (or a single table) to include in the backup set. Enter the table names as a comma-separated list. If no tables +are specified, all tables are included in the set.</p> +</dd> +</dl> +</div> +<div class="admonitionblock tip"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-tip" title="Tip"></i> +</td> +<td class="content"> +Maintain a log or other record of the case-sensitive backup set names and the corresponding tables in each set on a separate +or remote cluster as part of your backup strategy. This information can help you in case of failure on the primary cluster. +</td> +</tr> +</table> +</div> +</div> +<div class="sect3"> +<h4 id="br.set.usage"><a class="anchor" href="#br.set.usage"></a>80.4.3. Example of Usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup set add Q1Data TEAM_3,TEAM_4</code></pre> +</div> +</div> +<div class="paragraph"> +<p>Depending on the environment, this command results in <em>one</em> of the following actions:</p> +</div> +<div class="ulist"> +<ul> +<li> +<p>If the <code>Q1Data</code> backup set does not exist, a backup set containing tables <code>TEAM_3</code> and <code>TEAM_4</code> is created.</p> +</li> +<li> +<p>If the <code>Q1Data</code> backup set exists already, the tables <code>TEAM_3</code> and <code>TEAM_4</code> are added to the <code>Q1Data</code> backup set.</p> +</li> +</ul> +</div> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.administration"><a class="anchor" href="#br.administration"></a>81. Administration of Backup Images</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>The <code>hbase backup</code> command has several subcommands that help with administering backup images as they accumulate. 
Most production +environments require recurring backups, so it is necessary to have utilities to help manage the data of the backup repository. +Some subcommands enable you to find information that can help identify backups that are relevant in a search for particular data. +You can also delete backup images.</p> +</div> +<div class="paragraph"> +<p>The following list details each <code>hbase backup subcommand</code> that can help administer backups. Run the full command-subcommand line as +the HBase superuser.</p> +</div> +<div class="sect2"> +<h3 id="br.managing.backup.progress"><a class="anchor" href="#br.managing.backup.progress"></a>81.1. Managing Backup Progress</h3> +<div class="paragraph"> +<p>You can monitor a running backup in another terminal session by running the <em>hbase backup progress</em> command and specifying the backup ID as an argument.</p> +</div> +<div class="paragraph"> +<p>For example, run the following command as hbase superuser to view the progress of a backup</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup progress <backup_id></code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.progress.positional.cli.arguments"><a class="anchor" href="#br.progress.positional.cli.arguments"></a>81.1.1. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_id</em></dt> +<dd> +<p>Specifies the backup that you want to monitor by seeing the progress information. The backupId is case-sensitive.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.progress.named.cli.arguments"><a class="anchor" href="#br.progress.named.cli.arguments"></a>81.1.2. Named Command-Line Arguments</h4> +<div class="paragraph"> +<p>None.</p> +</div> +</div> +<div class="sect3"> +<h4 id="br.progress.example"><a class="anchor" href="#br.progress.example"></a>81.1.3. 
Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java">hbase backup progress backupId_1467823988425</code></pre> +</div> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.managing.backup.history"><a class="anchor" href="#br.managing.backup.history"></a>81.2. Managing Backup History</h3> +<div class="paragraph"> +<p>This command displays a log of backup sessions. The information for each session includes backup ID, type (full or incremental), the tables +in the backup, status, and start and end time. Specify the number of backup sessions to display with the optional -n argument.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup history <backup_id></code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.history.positional.cli.arguments"><a class="anchor" href="#br.history.positional.cli.arguments"></a>81.2.1. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_id</em></dt> +<dd> +<p>Specifies the backup that you want to monitor by seeing the progress information. The backupId is case-sensitive.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.history.named.cli.arguments"><a class="anchor" href="#br.history.named.cli.arguments"></a>81.2.2. Named Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>-n <num_records></em></dt> +<dd> +<p>(Optional) The maximum number of backup records (Default: 10).</p> +</dd> +<dt class="hdlist1"><em>-p <backup_root_path></em></dt> +<dd> +<p>The full filesystem URI of where backup images are stored.</p> +</dd> +<dt class="hdlist1"><em>-s <backup_set_name></em></dt> +<dd> +<p>The name of the backup set to obtain history for. 
Mutually exclusive with the <em>-t</em> option.</p> +</dd> +<dt class="hdlist1"><em>-t</em> <table_name></dt> +<dd> +<p>The name of table to obtain history for. Mutually exclusive with the <em>-s</em> option.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.history.backup.example"><a class="anchor" href="#br.history.backup.example"></a>81.2.3. Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup history +<span class="error">$</span> hbase backup history -n <span class="integer">20</span> +<span class="error">$</span> hbase backup history -t WebIndexRecords</code></pre> +</div> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.describe.backup"><a class="anchor" href="#br.describe.backup"></a>81.3. Describing a Backup Image</h3> +<div class="paragraph"> +<p>This command can be used to obtain information about a specific backup image.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup describe <backup_id></code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.describe.backup.positional.cli.arguments"><a class="anchor" href="#br.describe.backup.positional.cli.arguments"></a>81.3.1. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_id</em></dt> +<dd> +<p>The ID of the backup image to describe.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.describe.backup.named.cli.arguments"><a class="anchor" href="#br.describe.backup.named.cli.arguments"></a>81.3.2. Named Command-Line Arguments</h4> +<div class="paragraph"> +<p>None.</p> +</div> +</div> +<div class="sect3"> +<h4 id="br.describe.backup.example"><a class="anchor" href="#br.describe.backup.example"></a>81.3.3. 
Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup describe backupId_1467823988425</code></pre> +</div> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.delete.backup"><a class="anchor" href="#br.delete.backup"></a>81.4. Deleting a Backup Image</h3> +<div class="paragraph"> +<p>This command can be used to delete a backup image which is no longer needed.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup delete <backup_id></code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.delete.backup.positional.cli.arguments"><a class="anchor" href="#br.delete.backup.positional.cli.arguments"></a>81.4.1. Positional Command-Line Arguments</h4> +<div class="dlist"> +<dl> +<dt class="hdlist1"><em>backup_id</em></dt> +<dd> +<p>The ID of the backup image which should be deleted.</p> +</dd> +</dl> +</div> +</div> +<div class="sect3"> +<h4 id="br.delete.backup.named.cli.arguments"><a class="anchor" href="#br.delete.backup.named.cli.arguments"></a>81.4.2. Named Command-Line Arguments</h4> +<div class="paragraph"> +<p>None.</p> +</div> +</div> +<div class="sect3"> +<h4 id="br.delete.backup.example"><a class="anchor" href="#br.delete.backup.example"></a>81.4.3. Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup delete backupId_1467823988425</code></pre> +</div> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.repair.backup"><a class="anchor" href="#br.repair.backup"></a>81.5. Backup Repair Command</h3> +<div class="paragraph"> +<p>This command attempts to correct any inconsistencies in persisted backup metadata that exist as +the result of software errors or unhandled failure scenarios. 
While the backup implementation tries +to correct all errors on its own, this tool may be necessary in the cases where the system cannot +automatically recover on its own.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup repair</code></pre> +</div> +</div> +<div class="sect3"> +<h4 id="br.repair.backup.positional.cli.arguments"><a class="anchor" href="#br.repair.backup.positional.cli.arguments"></a>81.5.1. Positional Command-Line Arguments</h4> +<div class="paragraph"> +<p>None.</p> +</div> +</div> +</div> +<div class="sect2"> +<h3 id="br.repair.backup.named.cli.arguments"><a class="anchor" href="#br.repair.backup.named.cli.arguments"></a>81.6. Named Command-Line Arguments</h3> +<div class="paragraph"> +<p>None.</p> +</div> +<div class="sect3"> +<h4 id="br.repair.backup.example"><a class="anchor" href="#br.repair.backup.example"></a>81.6.1. Example usage</h4> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup repair</code></pre> +</div> +</div> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.backup.configuration"><a class="anchor" href="#br.backup.configuration"></a>82. Configuration keys</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>The backup and restore feature includes both required and optional configuration keys.</p> +</div> +<div class="sect2"> +<h3 id="_required_properties"><a class="anchor" href="#_required_properties"></a>82.1. Required properties</h3> +<div class="paragraph"> +<p><em>hbase.backup.enable</em>: Controls whether or not the feature is enabled (Default: <code>false</code>). Set this value to <code>true</code>.</p> +</div> +<div class="paragraph"> +<p><em>hbase.master.logcleaner.plugins</em>: A comma-separated list of classes invoked when cleaning logs in the HBase Master. 
Set +this value to <code>org.apache.hadoop.hbase.backup.master.BackupLogCleaner</code> or append it to the current value.</p> +</div> +<div class="paragraph"> +<p><em>hbase.procedure.master.classes</em>: A comma-separated list of classes invoked with the Procedure framework in the Master. Set +this value to <code>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager</code> or append it to the current value.</p> +</div> +<div class="paragraph"> +<p><em>hbase.procedure.regionserver.classes</em>: A comma-separated list of classes invoked with the Procedure framework in the RegionServer. +Set this value to <code>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager</code> or append it to the current value.</p> +</div> +<div class="paragraph"> +<p><em>hbase.coprocessor.region.classes</em>: A comma-separated list of RegionObservers deployed on tables. Set this value to +<code>org.apache.hadoop.hbase.backup.BackupObserver</code> or append it to the current value.</p> +</div> +<div class="paragraph"> +<p><em>hbase.master.hfilecleaner.plugins</em>: A comma-separated list of HFileCleaners deployed on the Master. Set this value +to <code>org.apache.hadoop.hbase.backup.BackupHFileCleaner</code> or append it to the current value.</p> +</div> +</div> +<div class="sect2"> +<h3 id="_optional_properties"><a class="anchor" href="#_optional_properties"></a>82.2. Optional properties</h3> +<div class="paragraph"> +<p><em>hbase.backup.system.ttl</em>: The time-to-live in seconds of data in the <code>hbase:backup</code> tables (default: forever). This property +is only relevant prior to the creation of the <code>hbase:backup</code> table. Use the <code>alter</code> command in the HBase shell to modify the TTL +when this table already exists. 
See the <a href="#br.filesystem.growth.warning">below section</a> for more details on the impact of this +configuration property.</p> +</div> +<div class="paragraph"> +<p><em>hbase.backup.attempts.max</em>: The number of attempts to perform when taking hbase table snapshots (default: 10).</p> +</div> +<div class="paragraph"> +<p><em>hbase.backup.attempts.pause.ms</em>: The amount of time to wait between failed snapshot attempts in milliseconds (default: 10000).</p> +</div> +<div class="paragraph"> +<p><em>hbase.backup.logroll.timeout.millis</em>: The amount of time (in milliseconds) to wait for RegionServers to execute a WAL rolling +in the Master’s procedure framework (default: 30000).</p> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.best.practices"><a class="anchor" href="#br.best.practices"></a>83. Best Practices</h2> +<div class="sectionbody"> +<div class="sect2"> +<h3 id="_formulate_a_restore_strategy_and_test_it"><a class="anchor" href="#_formulate_a_restore_strategy_and_test_it"></a>83.1. Formulate a restore strategy and test it.</h3> +<div class="paragraph"> +<p>Before you rely on a backup and restore strategy for your production environment, identify how backups must be performed, +and more importantly, how restores must be performed. Test the plan to ensure that it is workable. +At a minimum, store backup data from a production cluster on a different cluster or server. To further safeguard the data, +use a backup location that is at a different physical location.</p> +</div> +<div class="paragraph"> +<p>If you have an unrecoverable loss of data on your primary production cluster as a result of computer system issues, you may +be able to restore the data from a different cluster or server at the same site. However, a disaster that destroys the whole +site renders locally stored backups useless. 
Consider storing the backup data and necessary resources (both computing capacity +and operator expertise) to restore the data at a site sufficiently remote from the production site. In the case of a catastrophe +at the whole primary site (fire, earthquake, etc.), the remote backup site can be very valuable.</p> +</div> +</div> +<div class="sect2"> +<h3 id="_secure_a_full_backup_image_first"><a class="anchor" href="#_secure_a_full_backup_image_first"></a>83.2. Secure a full backup image first.</h3> +<div class="paragraph"> +<p>As a baseline, you must complete a full backup of HBase data at least once before you can rely on incremental backups. The full +backup should be stored outside of the source cluster. To ensure complete dataset recovery, you must run the restore utility +with the option to restore the baseline full backup. The full backup is the foundation of your dataset. Incremental backup data +is applied on top of the full backup during the restore operation to return you to the point in time when the backup was last taken.</p> +</div> +</div> +<div class="sect2"> +<h3 id="_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"><a class="anchor" href="#_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"></a>83.3. Define and use backup sets for groups of tables that are logical subsets of the entire dataset.</h3> +<div class="paragraph"> +<p>You can group tables into an object called a backup set. A backup set can save time when you have a particular group of tables +that you expect to repeatedly back up or restore.</p> +</div> +<div class="paragraph"> +<p>When you create a backup set, you type table names to include in the group. The backup set includes not only groups of related +tables, but also retains the HBase backup metadata. 
Afterwards, you can invoke the backup set name to indicate what tables apply +to the command execution instead of entering all the table names individually.</p> +</div> +</div> +<div class="sect2"> +<h3 id="_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"><a class="anchor" href="#_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"></a>83.4. Document the backup and restore strategy, and ideally log information about each backup.</h3> +<div class="paragraph"> +<p>Document the whole process so that the knowledge base can transfer to new administrators after employee turnover. As an extra +safety precaution, also log the calendar date, time, and other relevant details about the data of each backup. This metadata +can potentially help locate a particular dataset in case of source cluster failure or primary site disaster. Maintain duplicate +copies of all documentation: one copy at the production cluster site and another at the backup location or wherever it can be +accessed by an administrator remotely from the production cluster.</p> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.s3.backup.scenario"><a class="anchor" href="#br.s3.backup.scenario"></a>84. Scenario: Safeguarding Application Datasets on Amazon S3</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>This scenario describes how a hypothetical retail business uses backups to safeguard application data and then restore the dataset +after failure.</p> +</div> +<div class="paragraph"> +<p>The HBase administration team uses backup sets to store data from a group of tables that have interrelated information for an +application called green. In this example, one table contains transaction records and the other contains customer details. 
The +two tables need to be backed up and be recoverable as a group.</p> +</div> +<div class="paragraph"> +<p>The admin team also wants to ensure daily backups occur automatically.</p> +</div> +<div class="imageblock"> +<div class="content"> +<img src="images/backup-app-components.png" alt="backup app components"> +</div> +<div class="title">Figure 7. Tables Composing The Backup Set</div> +</div> +<div class="paragraph"> +<p>The following is an outline of the steps and examples of commands that are used to back up the data for the <em>green</em> application and +to recover the data later. All commands are run when logged in as the HBase superuser.</p> +</div> +<div class="olist arabic"> +<ol class="arabic"> +<li> +<p>A backup set called <em>green_set</em> is created as an alias for both the transactions table and the customer table. The backup set can +be used for all operations to avoid typing each table name. The backup set name is case-sensitive and should be formed with only +printable characters and without spaces.</p> +</li> +</ol> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup set add green_set transactions +<span class="error">$</span> hbase backup set add green_set customer</code></pre> +</div> +</div> +<div class="olist arabic"> +<ol class="arabic"> +<li> +<p>The first backup of green_set data must be a full backup. 
The following command example shows how credentials are passed to Amazon +S3 and specifies the file system with the s3a: prefix.</p> +</li> +</ol> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> ACCESS_KEY=ABCDEFGHIJKLMNOPQRST +<span class="error">$</span> SECRET_KEY=<span class="integer">123456789</span>abcdefghijklmnopqrstuvwxyzABCD +<span class="error">$</span> sudo -u hbase hbase backup create full\ + s3a:<span class="comment">//$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set</span></code></pre> +</div> +</div> +<div class="olist arabic"> +<ol class="arabic"> +<li> +<p>Incremental backups should be run according to a schedule that ensures essential data recovery in the event of a catastrophe. At +this retail company, the HBase admin team decides that automated daily backups secure the data sufficiently. The team decides that +they can implement this by modifying an existing Cron job that is defined in <code>/etc/crontab</code>. Consequently, IT modifies the Cron job +by adding the following line:</p> +</li> +</ol> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="annotation">@daily</span> hbase hbase backup create incremental s3a:<span class="comment">//$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set</span></code></pre> +</div> +</div> +<div class="olist arabic"> +<ol class="arabic"> +<li> +<p>A catastrophic IT incident disables the production cluster that the green application uses. 
An HBase system administrator of the +backup cluster must restore the <em>green_set</em> dataset to the point in time closest to the recovery objective.</p> +</li> +</ol> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +If the administrator of the backup HBase cluster has the backup ID with relevant details in accessible records, the following +search with the <code>hdfs dfs -ls</code> command and manually scanning the backup ID list can be bypassed. Consider continuously maintaining +and protecting a detailed log of backup IDs outside the production cluster in your environment. +</td> +</tr> +</table> +</div> +<div class="paragraph"> +<p>The HBase administrator runs the following command on the directory where backups are stored to print the list of successful backup +IDs on the console:</p> +</div> +<div class="paragraph"> +<p><code>hdfs dfs -ls -t /prodhbasebackups/backups</code></p> +</div> +<div class="olist arabic"> +<ol class="arabic"> +<li> +<p>The admin scans the list to see which backup was created at a date and time closest to the recovery objective. To do this, the +admin converts the calendar timestamp of the recovery point in time to Unix time because backup IDs are uniquely identified with +Unix time. The backup IDs are listed in reverse chronological order, meaning the most recent successful backup appears first.</p> +</li> +</ol> +</div> +<div class="paragraph"> +<p>The admin notices that the following line in the command output corresponds with the <em>green_set</em> backup that needs to be restored:</p> +</div> +<div class="paragraph"> +<p><code>/prodhbasebackups/backups/backup_1467823988425</code></p> +</div> +<div class="olist arabic"> +<ol class="arabic"> +<li> +<p>The admin restores green_set invoking the backup ID and the -overwrite option. 
The -overwrite option truncates all existing data +in the destination and populates the tables with data from the backup dataset. Without this flag, the backup data is appended to the +existing data in the destination. In this case, the admin decides to overwrite the data because it is corrupted.</p> +</li> +</ol> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> sudo -u hbase hbase restore -s green_set \ + s3a:<span class="comment">//$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups backup_1467823988425 -overwrite</span></code></pre> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.data.security"><a class="anchor" href="#br.data.security"></a>85. Security of Backup Data</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>Because this feature copies data to remote locations, it’s important to take a moment to clearly state the procedural +concerns that exist around data security. Like the HBase replication feature, backup and restore provides the constructs to automatically +copy data from within a corporate boundary to some system outside of that boundary. When storing sensitive data, it is imperative that with backup and restore, as with +any feature which extracts data from HBase, the locations to which data is being sent have undergone a security audit to ensure +that only authenticated users are allowed to access that data.</p> +</div> +<div class="paragraph"> +<p>For example, with the above example of backing up data to S3, it is of the utmost importance that the proper permissions are assigned +to the S3 bucket to ensure that only a minimum set of authorized users are allowed to access this data. Because the data is no longer +being accessed via HBase, and its authentication and authorization controls, we must ensure that the filesystem storing that data is +providing a comparable level of security. 
This is a manual step which users <strong>must</strong> implement on their own.</p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.technical.details"><a class="anchor" href="#br.technical.details"></a>86. Technical Details of Incremental Backup and Restore</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore +solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture +the data changes since the previous backup was created. A WAL roll (which creates new WALs) is executed across all RegionServers to track +the WALs that need to be in the backup.</p> +</div> +<div class="paragraph"> +<p>After the incremental backup image is created, the source backup files are usually on the same node as the data source. A process similar +to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation +starts, a two-step process is initiated. First, the full backup is restored from the full backup image. Second, all WAL files from +incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase +Bulk Load utility automatically imports as restored data in the table.</p> +</div> +<div class="paragraph"> +<p>You can only restore on a live HBase cluster because the data must be redistributed to complete the restore operation successfully.</p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.filesystem.growth.warning"><a class="anchor" href="#br.filesystem.growth.warning"></a>87. A Warning on File System Growth</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>As a reminder, incremental backups are implemented by retaining the write-ahead logs which HBase primarily uses for data durability.
+Thus, to ensure that all data needing to be included in a backup is still available in the system, the HBase backup and restore feature +retains all write-ahead logs since the last backup until the next incremental backup is executed.</p> +</div> +<div class="paragraph"> +<p>Like HBase Snapshots, this can have an unexpectedly large impact on the HDFS usage of HBase for high volume tables. Take care in enabling +and using the backup and restore feature, specifically with a mind to removing backup sessions when they are not actively being used.</p> +</div> +<div class="paragraph"> +<p>The only automated upper bound on retained write-ahead logs for backup and restore is based on the TTL of the <code>hbase:backup</code> system table which, +as of the time this document is written, is infinite (backup table entries are never automatically deleted). This requires that administrators +perform backups on a schedule whose frequency is relative to the amount of available space on HDFS (e.g. less available HDFS space requires +more aggressive backup merges and deletions). As a reminder, the TTL can be altered on the <code>hbase:backup</code> table using the <code>alter</code> command +in the HBase shell. Modifying the configuration property <code>hbase.backup.system.ttl</code> in hbase-site.xml after the system table exists has no effect.</p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.backup.capacity.planning"><a class="anchor" href="#br.backup.capacity.planning"></a>88. Capacity Planning</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>When designing a distributed system deployment, it is critical that some basic mathematical rigor is applied to ensure sufficient computational +capacity is available given the data and software requirements of the system. For this feature, the availability of network capacity is the largest +bottleneck when estimating the performance of some implementation of backup and restore.
The second most costly function is the speed at which +data can be read/written.</p> +</div> +<div class="sect2"> +<h3 id="_full_backups"><a class="anchor" href="#_full_backups"></a>88.1. Full Backups</h3> +<div class="paragraph"> +<p>To estimate the duration of a full backup, we have to understand the general actions which are invoked:</p> +</div> +<div class="ulist"> +<ul> +<li> +<p>Write-ahead log roll on each RegionServer: ones to tens of seconds per RegionServer in parallel. Relative to the load on each RegionServer.</p> +</li> +<li> +<p>Take an HBase snapshot of the table(s): tens of seconds. Relative to the number of regions and files that comprise the table.</p> +</li> +<li> +<p>Export the snapshot to the destination: see below. Relative to the size of the data and the network bandwidth to the destination.</p> +</li> +</ul> +</div> +<div id="br.export.snapshot.cost" class="paragraph"> +<p>To approximate how long the final step will take, we have to make some assumptions on hardware. Be aware that these will <strong>not</strong> be accurate for your +system; these are numbers that you or your administrator know for your system. Let’s say the speed of reading data from HDFS on a single node is +capped at 80MB/s (across all Mappers that run on that host), a modern network interface controller (NIC) supports 10Gb/s, the top-of-rack switch can +handle 40Gb/s, and the WAN between your clusters is 10Gb/s. This means that you can only ship data to your remote at a speed of 1.25GB/s, meaning +that 16 nodes (<code>1.25 * 1024 / 80 = 16</code>) participating in the ExportSnapshot should be able to fully saturate the link between clusters. With more +nodes in the cluster, we can still saturate the network but at a lesser impact on any one node which helps ensure local SLAs are met.
If the size +of the snapshot is 10TB, this full backup would take in the ballpark of 2.5 hours (<code>10 * 1024 / 1.25 / (60 * 60) = 2.28hrs</code>).</p> +</div> +<div class="paragraph"> +<p>As a general statement, it is very likely that the WAN bandwidth between your local cluster and the remote storage is the largest +bottleneck to the speed of a full backup.</p> +</div> +<div class="paragraph"> +<p>When the concern is restricting the computational impact of backups to a "production system", the above formulas can be reused with the optional +command-line arguments to <code>hbase backup create</code>: <code>-b</code>, <code>-w</code>, <code>-q</code>. The <code>-b</code> option defines the bandwidth at which each worker (Mapper) would +write data. The <code>-w</code> argument limits the number of workers that would be spawned in the DistCp job. The <code>-q</code> option allows the user to specify a YARN +queue which can limit the specific nodes where the workers will be spawned; this can quarantine the backup workers performing the copy to +a set of non-critical nodes. Relating the <code>-b</code> and <code>-w</code> options to our earlier equations: <code>-b</code> would be used to restrict each node from reading +data at the full 80MB/s and <code>-w</code> is used to limit the job from spawning 16 worker tasks.</p> +</div> +</div> +<div class="sect2"> +<h3 id="_incremental_backup"><a class="anchor" href="#_incremental_backup"></a>88.2. Incremental Backup</h3> +<div class="paragraph"> +<p>Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.</p> +</div> +<div class="ulist"> +<ul> +<li> +<p>Identify new write-ahead logs since last full or incremental backup: negligible. A priori knowledge from the backup system table(s).</p> +</li> +<li> +<p>Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data.
Relative to write speed of HDFS.</p> +</li> +<li> +<p>DistCp the HFiles to the destination: <a href="#br.export.snapshot.cost">see above</a>.</p> +</li> +</ul> +</div> +<div class="paragraph"> +<p>For the second step, the dominating cost of this operation would be re-writing the data (under the assumption that a majority of the +data in the WAL is preserved). In this case, we can assume an aggregate write speed of 30MB/s per node. Continuing our 16-node cluster example, +this would require approximately 15 minutes to perform this step for 50GB of data (50 * 1024 / 60 / 60 = 14.2). The amount of time to start the +DistCp MapReduce job would likely dominate the actual time taken to copy the data (50 / 1.25 = 40 seconds) and can be ignored.</p> +</div> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="br.limitations"><a class="anchor" href="#br.limitations"></a>89. Limitations of the Backup and Restore Utility</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p><strong>Serial backup operations</strong></p> +</div> +<div class="paragraph"> +<p>Backup operations cannot be run concurrently. An operation includes actions like create, delete, restore, and merge. Only one active backup session is supported. <a href="https://issues.apache.org/jira/browse/HBASE-16391">HBASE-16391</a> +will introduce support for multiple backup sessions.</p> +</div> +<div class="paragraph"> +<p><strong>No means to cancel backups</strong></p> +</div> +<div class="paragraph"> +<p>Neither backup nor restore operations can be canceled. (<a href="https://issues.apache.org/jira/browse/HBASE-15997">HBASE-15997</a>, <a href="https://issues.apache.org/jira/browse/HBASE-15998">HBASE-15998</a>).
+The workaround to cancel a backup would be to kill the client-side backup command (<code>control-C</code>), ensure all relevant MapReduce jobs have exited, and then +run the <code>hbase backup repair</code> command to ensure the system backup metadata is consistent.</p> +</div> +<div class="paragraph"> +<p><strong>Backups can only be saved to a single location</strong></p> +</div> +<div class="paragraph"> +<p>Copying backup information to multiple locations is an exercise left to the user. <a href="https://issues.apache.org/jira/browse/HBASE-15476">HBASE-15476</a> will +introduce the ability to specify multiple backup destinations intrinsically.</p> +</div> +<div class="paragraph"> +<p><strong>HBase superuser access is required</strong></p> +</div> +<div class="paragraph"> +<p>Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore, which can pose a problem for shared HBase installations. Current mitigations would require +coordination with system administrators to build and deploy a backup and restore strategy (<a href="https://issues.apache.org/jira/browse/HBASE-14138">HBASE-14138</a>).</p> +</div> +<div class="paragraph"> +<p><strong>Backup restoration is an online operation</strong></p> +</div> +<div class="paragraph"> +<p>Performing a restore from a backup requires that the HBase cluster is online; this is a caveat of the current implementation (<a href="https://issues.apache.org/jira/browse/HBASE-16573">HBASE-16573</a>).</p> +</div> +<div class="paragraph"> +<p><strong>Some operations may fail and require re-run</strong></p> +</div> +<div class="paragraph"> +<p>The HBase backup feature is primarily client driven. While there is the standard HBase retry logic built into the HBase Connection, persistent errors in executing operations +may propagate back to the client (e.g. snapshot failure due to region splits).
The backup implementation should be moved from client-side into the ProcedureV2 framework +in the future which would provide additional robustness around transient/retryable failures. The <code>hbase backup repair</code> command is meant to correct states which the system +cannot automatically detect and recover from.</p> +</div> +<div class="paragraph"> +<p><strong>Avoidance of declaration of public API</strong></p> +</div> +<div class="paragraph"> +<p>While the Java API to interact with this feature exists and its implementation is separated from an interface, insufficient rigor has been applied to determine if +it is exactly what we intend to ship to users. As such, it is marked for a <code>Private</code> audience with the expectation that, as users begin to try the feature, there +will be modifications that would necessitate breaking compatibility (<a href="https://issues.apache.org/jira/browse/HBASE-17517">HBASE-17517</a>).</p> +</div> +<div class="paragraph"> +<p><strong>Lack of global metrics for backup and restore</strong></p> +</div> +<div class="paragraph"> +<p>Individual backup and restore operations contain metrics about the amount of work the operation included, but there is no centralized location (e.g. the Master UI) +which presents this information for consumption (<a href="https://issues.apache.org/jira/browse/HBASE-16565">HBASE-16565</a>).</p> +</div> +</div> +</div> <h1 id="hbase_apis" class="sect0"><a class="anchor" href="#hbase_apis"></a>Apache HBase APIs</h1> <div class="openblock partintro"> <div class="content"> @@ -17467,7 +18754,7 @@ See <a href="#external_apis">Apache HBase External APIs</a> for more information </div> </div> <div class="sect1"> -<h2 id="_examples"><a class="anchor" href="#_examples"></a>76. Examples</h2> +<h2 id="_examples"><a class="anchor" href="#_examples"></a>90. Examples</h2> <div class="sectionbody"> <div class="exampleblock"> <div class="title">Example 40.
Create, modify and delete a Table Using Java</div> @@ -17578,7 +18865,7 @@ through custom protocols. For information on using the native HBase APIs, refer </div> </div> <div class="sect1"> -<h2 id="_rest"><a class="anchor" href="#_rest"></a>77. REST</h2> +<h2 id="_rest"><a class="anchor" href="#_rest"></a>91. REST</h2> <div class="sectionbody"> <div class="paragraph"> <p>Representational State Transfer (REST) was introduced in 2000 in the doctoral @@ -17594,7 +18881,7 @@ There is also a nice series of blogs on by Jesse Anderson.</p> </div> <div class="sect2"> -<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" href="#_starting_and_stopping_the_rest_server"></a>77.1. Starting and Stopping the REST Server</h3> +<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" href="#_starting_and_stopping_the_rest_server"></a>91.1. Starting and Stopping the REST Server</h3> <div class="paragraph"> <p>The included REST server can run as a daemon which starts an embedded Jetty servlet container and deploys the servlet into it. Use one of the following commands @@ -17621,7 +18908,7 @@ following command if you were running it in the background.</p> </div> </div> <div class="sect2"> -<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" href="#_configuring_the_rest_server_and_client"></a>77.2. Configuring the REST Server and Client</h3> +<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" href="#_configuring_the_rest_server_and_client"></a>91.2. 
Configuring the REST Server and Client</h3> <div class="paragraph"> <p>For information about configuring the REST server and client for SSL, as well as <code>doAs</code> impersonation for the REST server, see <a href="#security.gateway.thrift">Configure the Thrift Gateway to Authenticate on Behalf of the Client</a> and other portions @@ -17629,7 +18916,7 @@ of the <a href="#security">Securing Apache HBase</a> chapter.</p> </div> </div> <div class="sect2"> -<h3 id="_using_rest_endpoints"><a class="anchor" href="#_using_rest_endpoints"></a>77.3. Using REST Endpoints</h3> +<h3 id="_using_rest_endpoints"><a class="anchor" href="#_using_rest_endpoints"></a>91.3. Using REST Endpoints</h3> <div class="paragraph"> <p>The following examples use the placeholder server http://example.com:8000, and the following commands can all be run using <code>curl</code> or <code>wget</code> commands. You can request @@ -17996,7 +19283,7 @@ curl -vi -X PUT \ </table> </div> <div class="sect2"> -<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>77.4. REST XML Schema</h3> +<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>91.4. REST XML Schema</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="xml"><span class="tag"><schema</span> <span class="attribute-name">xmlns</span>=<span class="string"><span class="delimiter">"</span><span class="content">http://www.w3.org/2001/XMLSchema</span><span class="delimiter">"</span></span> <span class="attribute-name">xmlns:tns</span>=<span class="string"><span class="delimiter">"</span><span class="content">RESTSchema</span><span class="delimiter">"</span></span><span class="tag">></span> @@ -18154,7 +19441,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect2"> -<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>77.5. REST Protobufs Schema</h3> +<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>91.5. 
REST Protobufs Schema</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="json"><span class="error">m</span><span class="error">e</span><span class="error">s</span><span class="error">s</span><span class="error">a</span><span class="error">g</span><span class="error">e</span> <span class="error">V</span><span class="error">e</span><span class="error">r</span><span class="error">s</span><span class="error">i</span><span class="error">o</span><span class="error">n</span> { @@ -18262,7 +19549,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>78. Thrift</h2> +<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>92. Thrift</h2> <div class="sectionbody"> <div class="paragraph"> <p>Documentation about Thrift has moved to <a href="#thrift">Thrift API and Filter Language</a>.</p> @@ -18270,7 +19557,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="c"><a class="anchor" href="#c"></a>79. C/C++ Apache HBase Client</h2> +<h2 id="c"><a class="anchor" href="#c"></a>93. C/C++ Apache HBase Client</h2> <div class="sectionbody"> <div class="paragraph"> <p>FB’s Chip Turner wrote a pure C/C++ client. @@ -18282,7 +19569,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="jdo"><a class="anchor" href="#jdo"></a>80. Using Java Data Objects (JDO) with HBase</h2> +<h2 id="jdo"><a class="anchor" href="#jdo"></a>94. Using Java Data Objects (JDO) with HBase</h2> <div class="sectionbody"> <div class="paragraph"> <p><a href="https://db.apache.org/jdo/">Java Data Objects (JDO)</a> is a standard way to @@ -18439,10 +19726,10 @@ a row, get a column value, perform a query, and do some additional HBase operati </div> </div> <div class="sect1"> -<h2 id="scala"><a class="anchor" href="#scala"></a>81. Scala</h2> +<h2 id="scala"><a class="anchor" href="#scala"></a>95. 
Scala</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_setting_the_classpath"><a class="anchor" href="#_setting_the_classpath"></a>81.1. Setting the Classpath</h3> +<h3 id="_setting_the_classpath"><a class="anchor" href="#_setting_the_classpath"></a>95.1. Setting the Classpath</h3> <div class="paragraph"> <p>To use Scala with HBase, your CLASSPATH must include HBase’s classpath as well as the Scala JARs required by your code. First, use the following command on a server @@ -18467,7 +19754,7 @@ your project.</p> </div> </div> <div class="sect2"> -<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>81.2. Scala SBT File</h3> +<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>95.2. Scala SBT File</h3> <div class="paragraph"> <p>Your <code>build.sbt</code> file needs the following <code>resolvers</code> and <code>libraryDependencies</code> to work with HBase.</p> @@ -18486,7 +19773,7 @@ libraryDependencies ++= Seq( </div> </div> <div class="sect2"> -<h3 id="_example_scala_code"><a class="anchor" href="#_example_scala_code"></a>81.3. Example Scala Code</h3> +<h3 id="_example_scala_code"><a class="anchor" href="#_example_scala_code"></a>95.3. Example Scala Code</h3> <div class="paragraph"> <p>This example lists HBase tables, creates a new table, and adds a row to it.</p> </div> @@ -18524,10 +19811,10 @@ println(Bytes.toString(value))</code></pre> </div> </div> <div class="sect1"> -<h2 id="jython"><a class="anchor" href="#jython"></a>82. Jython</h2> +<h2 id="jython"><a class="anchor" href="#jython"></a>96. Jython</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_setting_the_classpath_2"><a class="anchor" href="#_setting_the_classpath_2"></a>82.1. Setting the Classpath</h3> +<h3 id="_setting_the_classpath_2"><a class="anchor" href="#_setting_the_classpath_2"></a>96.1. 
Setting the Classpath</h3> <div class="paragraph"> <p>To use Jython with HBase, your CLASSPATH must include HBase’s classpath as well as the Jython JARs required by your code. First, use the following command on a server @@ -18556,7 +19843,7 @@ $ bin/hbase org.python.util.jython</p> </div> </div> <div class="sect2"> -<h3 id="_jython_code_examples"><a class="anchor" href="#_jython_code_examples"></a>82.2. Jython Code Examples</h3> +<h3 id="_jython_code_examples"><a class="anchor" href="#_jython_code_examples"></a>96.2. Jython Code Examples</h3> <div class="exampleblock"> <div class="title">Example 42. Table Creation, Population, Get, and Delete with Jython</div> <div class="content"> @@ -18665,7 +19952,7 @@ The Thrift API relies on client and server processes.</p> </div> </div> <div class="sect1"> -<h2 id="thrift.filter_language"><a class="anchor" href="#thrift.filter_language"></a>83. Filter Language</h2> +<h2 id="thrift.filter_language"><a class="anchor" href="#thrift.filter_language"></a>97. Filter Language</h2> <div class="sectionbody"> <div class="paragraph"> <p>Thrift Filter Language was introduced in HBase 0.92. @@ -18676,7 +19963,7 @@ You can find out more about shell integration by using the <code>scan help</code <p>You specify a filter as a string, which is parsed on the server to construct the filter.</p> </div> <div class="sect2"> -<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>83.1. General Filter String Syntax</h3> +<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>97.1. General Filter Stri
<TRUNCATED>