I believe it's https://aws.amazon.com/blogs/big-data/tips-for-migrating-to-apache-hbase-on-amazon-s3-from-hdfs/
On Wed, Aug 23, 2017 at 5:26 PM Ted Yu <[email protected]> wrote:
> bq. The following diagram summarizes the steps for each option
>
> I don't see diagram.
>
> Is this writing published somewhere ?
>
> On Wed, Aug 23, 2017 at 3:21 AM, RuthEvans <[email protected]> wrote:
> > Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase
> > on Amazon S3. Running HBase on S3 gives you several added benefits,
> > including lower costs, data durability, and easier scalability.
> >
> > HBase provides several options that you can use to migrate and back up
> > HBase tables. The steps to migrate to HBase on S3 are similar to the
> > steps for HBase on the Apache Hadoop Distributed File System (HDFS).
> > However, the migration can be easier if you are aware of some minor
> > differences and a few "gotchas."
> >
> > In this post, I describe how to use some of the common HBase migration
> > options to get started with HBase on S3.
> >
> > HBase migration options
> > Selecting the right migration method and tools is an important step in
> > ensuring a successful HBase table migration. However, choosing the
> > right ones is not always an easy task.
> >
> > The following HBase utilities help you migrate to HBase on S3:
> >
> > Snapshots
> > Export and Import
> > CopyTable
> >
> > The following diagram summarizes the steps for each option.
> >
> > Various factors determine the HBase migration method that you use. For
> > example, EMR offers HBase version 1.2.3 as the earliest version that
> > you can run on S3. Therefore, the HBase version that you're migrating
> > from can be an important factor in helping you decide. For more
> > information about HBase versions and compatibility, see the HBase
> > version number and compatibility documentation in the Apache HBase
> > Reference Guide.
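The diagram itself isn't visible in the archive, but at the command level the three options look roughly like this. This is a sketch only; the table, snapshot, bucket, and ZooKeeper names (mytable, mytable-snap, my-bucket, zk1...) are placeholders, not values from the post:

```shell
# Sketch: one representative command per migration option (placeholder names).

# Option 1: snapshot + ExportSnapshot (whole-table migration)
echo "snapshot 'mytable', 'mytable-snap'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot mytable-snap -copy-to s3://my-bucket/hbase/

# Option 2: Export on the source, Import on the destination
# (handy for partial copies and cross-version testing)
hbase org.apache.hadoop.hbase.mapreduce.Export mytable s3://my-bucket/backup/mytable/
hbase org.apache.hadoop.hbase.mapreduce.Import mytable s3://my-bucket/backup/mytable/

# Option 3: CopyTable, a live copy between wire-compatible clusters
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=zk1,zk2,zk3:2181:/hbase mytable
```

These commands assume a running cluster with the `hbase` CLI on the path; the sections below walk through each option in detail.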
> > If you're migrating from an older version of HBase (for example, HBase
> > 0.94), you should test your application to make sure it's compatible
> > with newer HBase API versions. You don't want to spend several hours
> > migrating a large table only to find out that your application and API
> > have issues with a different HBase version.
> >
> > The good news is that HBase provides utilities that you can use to
> > migrate only part of a table. This lets you test your existing HBase
> > applications without having to fully migrate entire HBase tables. For
> > example, you can use the Export, Import, or CopyTable utilities to
> > migrate a small part of your table to HBase on S3. After you confirm
> > that your application works with newer HBase versions, you can proceed
> > with migrating the entire table using HBase snapshots.
> >
> > Option 1: Migrate to HBase on S3 using snapshots
> > You can create table backups easily by using HBase snapshots. HBase
> > also provides the ExportSnapshot utility, which lets you export
> > snapshots to a different location, like S3. In this section, I discuss
> > how you can combine snapshots with ExportSnapshot to migrate tables to
> > HBase on S3.
> >
> > For details about how you can use HBase snapshots to perform table
> > backups, see Using HBase Snapshots in the Amazon EMR Release Guide and
> > HBase Snapshots in the Apache HBase Reference Guide. These resources
> > provide additional settings and configurations that you can use with
> > snapshots and ExportSnapshot.
> >
> > The following example shows how to use snapshots to migrate HBase
> > tables to HBase on S3.
> >
> > Note: Earlier HBase versions, like HBase 0.94, have a different
> > snapshot structure than HBase 1.x, which is what you're migrating to.
> > If you're migrating from HBase 0.94 using snapshots, you get a
> > TableInfoMissingException error when you try to restore the table.
> > For details about migrating from HBase 0.94 using snapshots, see the
> > Migrating from HBase 0.94 section.
> >
> > 1. From the source HBase cluster, create a snapshot of your table:
> >
> >    $ echo "snapshot '<table_name>', '<snapshot_name>'" | hbase shell
> >
> > 2. Export the snapshot to an S3 bucket:
> >
> >    $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
> >        -snapshot <snapshot_name> -copy-to s3://<HBase_on_S3_root_dir>/
> >
> >    For the -copy-to parameter in the ExportSnapshot utility, specify
> >    the S3 location that you are using for the HBase root directory of
> >    your EMR cluster. If your cluster is already up and running, you can
> >    find its S3 hbase.rootdir value by viewing the cluster's
> >    Configurations in the EMR console, or by using the AWS CLI. Here's
> >    the command to find that value:
> >
> >    $ aws emr describe-cluster --cluster-id <cluster_id> | grep hbase.rootdir
> >
> > 3. Launch an EMR cluster that uses the S3 storage option with HBase
> >    (skip this step if you already have one up and running). For
> >    detailed steps, see Creating a Cluster with HBase Using the Console
> >    in the Amazon EMR Release Guide. When launching the cluster, ensure
> >    that the HBase root directory is set to the same S3 location as your
> >    exported snapshots (that is, the location used in the -copy-to
> >    parameter in the previous step).
> >
> > 4. Restore or clone the HBase table from that snapshot.
> >
> >    To restore the table and keep the same table name as the source
> >    table, use restore_snapshot:
> >
> >    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
> >
> >    To restore the table into a different table name, use clone_snapshot:
> >
> >    $ echo "clone_snapshot '<snapshot_name>', '<table_name>'" | hbase shell
> >
> > Migrating from HBase 0.94 using snapshots
> > If you're migrating from HBase version 0.94 using the snapshot method,
> > you get an error if you try to restore from the snapshot.
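Strung together, the snapshot workflow above looks something like the following for a 1.x-or-later source cluster. A sketch only; the table, snapshot, bucket, and cluster-id values are placeholders, not from the post:

```shell
# Sketch of the full snapshot-based migration (placeholder names).
TABLE="mytable"
SNAPSHOT="${TABLE}-snap"
ROOT_DIR="s3://my-bucket/hbase"   # must equal hbase.rootdir on the EMR cluster

# Source cluster: snapshot the table and export it to the S3 root dir.
echo "snapshot '${TABLE}', '${SNAPSHOT}'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot "${SNAPSHOT}" -copy-to "${ROOT_DIR}/"

# Destination (HBase on S3) cluster: confirm the root dir matches
# (j-XXXXXXXX is a placeholder cluster id), then restore under the
# same name, or clone under a new one.
aws emr describe-cluster --cluster-id j-XXXXXXXX | grep hbase.rootdir
echo "restore_snapshot '${SNAPSHOT}'" | hbase shell
# echo "clone_snapshot '${SNAPSHOT}', '${TABLE}_copy'" | hbase shell
```

Note that restore_snapshot requires the table to be disabled if it already exists on the destination; clone_snapshot sidesteps that by creating a new table.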
> > The error occurs because the structure of a snapshot in HBase 0.94 is
> > different from the snapshot structure in HBase 1.x.
> >
> > The following steps show how to fix an HBase 0.94 snapshot so that it
> > can be restored to an HBase on S3 table.
> >
> > 1. Complete steps 1-3 in the previous example to create and export a
> >    snapshot.
> >
> > 2. From your destination cluster, follow these steps to repair the
> >    snapshot:
> >
> >    a. Use s3-dist-cp to copy the snapshot data (archive) directory
> >       into a new directory. The archive directory contains your
> >       snapshot data. Depending on your table size, it might be large.
> >       Use s3-dist-cp to make this step faster:
> >
> >       $ s3-dist-cp --src s3://<HBase_on_S3_root_dir>/.archive/<table_name> \
> >           --dest s3://<HBase_on_S3_root_dir>/archive/data/default/<table_name>
> >
> >    b. Create and fix the snapshot descriptor file:
> >
> >       $ hdfs dfs -mkdir \
> >           s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
> >
> >       $ hdfs dfs -mv \
> >           s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tableinfo.<*> \
> >           s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
> >
> > 3. Restore the snapshot:
> >
> >    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
> >
> > Option 2: Migrate to HBase on S3 using Export and Import
> > As I discussed in the earlier sections, HBase snapshots and
> > ExportSnapshot are great options for migrating tables. But sometimes
> > you want to migrate only part of a table, so you need a different tool.
> > In this section, I describe how to use the HBase Export and Import
> > utilities.
> >
> > The steps to migrate a table to HBase on S3 using Export and Import are
> > not much different from the steps provided in the HBase documentation.
> > In those docs, you can also find detailed information, including how
> > you can use them to migrate part of a table.
> >
> > The following steps show how you can use Export and Import to migrate
> > a table to HBase on S3.
> > 1. From your source cluster, export the HBase table:
> >
> >    $ hbase org.apache.hadoop.hbase.mapreduce.Export <table_name> \
> >        s3://<table_s3_backup>/<location>/
> >
> > 2. In the destination cluster, create the target table into which to
> >    import data. Ensure that the column families in the target table are
> >    identical to the exported/source table's column families.
> >
> > 3. From the destination cluster, import the table using the Import
> >    utility:
> >
> >    $ hbase org.apache.hadoop.hbase.mapreduce.Import '<table_name>' \
> >        s3://<table_s3_backup>/<location>/
> >
> > HBase snapshots are usually the recommended method to migrate HBase
> > tables. However, the Export and Import utilities can be useful for test
> > use cases in which you migrate only a small part of your table and test
> > your application. It's also handy if you're migrating from an HBase
> > cluster that does not have the HBase snapshots feature.
> >
> > Option 3: Migrate to HBase on S3 using CopyTable
> > Similar to the Export and Import utilities, CopyTable is an HBase
> > utility that you can use to copy part of HBase tables. However, keep in
> > mind that CopyTable doesn't work if you're copying or migrating tables
> > between HBase versions that are not wire compatible (for example,
> > copying from HBase 0.94 to HBase 1.x).
> >
> > For more information and examples, see CopyTable in the HBase
> > documentation.
> >
> > Conclusion
> > In this post, I demonstrated how you can use common HBase backup
> > utilities to migrate your tables easily to HBase on S3. By using HBase
> > snapshots, you can migrate entire tables to HBase on S3. To test HBase
> > on S3 by migrating or copying only part of your tables, you can use the
> > HBase Export, Import, or CopyTable utilities.
> >
> > If you have questions or suggestions, please comment below.
> >
> > --
> > View this message in context: http://apache-hbase.679495.n3.nabble.com/Tips-for-Migrating-to-Apache-HBase-on-Amazon-S3-from-HDFS-tp4089926.html
> > Sent from the HBase Developer mailing list archive at Nabble.com.
