I believe it's https://aws.amazon.com/blogs/big-data/tips-for-migrating-to-apache-hbase-on-amazon-s3-from-hdfs/
On Wed, Aug 23, 2017 at 5:26 PM Ted Yu <[email protected]> wrote:
> bq. The following diagram summarizes the steps for each option
>
> I don't see diagram.
>
> Is this writing published somewhere ?
>
> On Wed, Aug 23, 2017 at 3:21 AM, RuthEvans <[email protected]> wrote:
> > Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase
> > on Amazon S3. Running HBase on S3 gives you several added benefits,
> > including lower costs, data durability, and easier scalability.
> >
> > HBase provides several options that you can use to migrate and back up
> > HBase tables. The steps to migrate to HBase on S3 are similar to the
> > steps for HBase on the Apache Hadoop Distributed File System (HDFS).
> > However, the migration can be easier if you are aware of some minor
> > differences and a few "gotchas."
> >
> > In this post, I describe how to use some of the common HBase migration
> > options to get started with HBase on S3.
> >
> > HBase migration options
> > Selecting the right migration method and tools is an important step in
> > ensuring a successful HBase table migration. However, choosing the
> > right ones is not always an easy task.
> >
> > The following HBase utilities help you migrate to HBase on S3:
> >
> > Snapshots
> > Export and Import
> > CopyTable
> >
> > The following diagram summarizes the steps for each option.
> >
> > Various factors determine the HBase migration method that you use. For
> > example, EMR offers HBase version 1.2.3 as the earliest version that
> > you can run on S3. Therefore, the HBase version that you're migrating
> > from can be an important factor in helping you decide. For more
> > information about HBase versions and compatibility, see the HBase
> > version number and compatibility documentation in the Apache HBase
> > Reference Guide.
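The diagram itself isn't visible in the archive, but at the command level the three options look roughly like this. This is a sketch only; the table, snapshot, bucket, and ZooKeeper names (mytable, mytable-snap, my-bucket, zk1...) are placeholders, not values from the post:

```shell
# Sketch: one representative command per migration option (placeholder names).

# Option 1: snapshot + ExportSnapshot (whole-table migration)
echo "snapshot 'mytable', 'mytable-snap'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot mytable-snap -copy-to s3://my-bucket/hbase/

# Option 2: Export on the source, Import on the destination
# (handy for partial copies and cross-version testing)
hbase org.apache.hadoop.hbase.mapreduce.Export mytable s3://my-bucket/backup/mytable/
hbase org.apache.hadoop.hbase.mapreduce.Import mytable s3://my-bucket/backup/mytable/

# Option 3: CopyTable, a live copy between wire-compatible clusters
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=zk1,zk2,zk3:2181:/hbase mytable
```

These commands assume a running cluster with the `hbase` CLI on the path; the sections below walk through each option in detail.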
> > If you're migrating from an older version of HBase (for example, HBase
> > 0.94), you should test your application to make sure it's compatible
> > with newer HBase API versions. You don't want to spend several hours
> > migrating a large table only to find out that your application and API
> > have issues with a different HBase version.
> >
> > The good news is that HBase provides utilities that you can use to
> > migrate only part of a table. This lets you test your existing HBase
> > applications without having to fully migrate entire HBase tables. For
> > example, you can use the Export, Import, or CopyTable utilities to
> > migrate a small part of your table to HBase on S3. After you confirm
> > that your application works with newer HBase versions, you can proceed
> > with migrating the entire table using HBase snapshots.
> >
> > Option 1: Migrate to HBase on S3 using snapshots
> > You can create table backups easily by using HBase snapshots. HBase
> > also provides the ExportSnapshot utility, which lets you export
> > snapshots to a different location, like S3. In this section, I discuss
> > how you can combine snapshots with ExportSnapshot to migrate tables to
> > HBase on S3.
> >
> > For details about how you can use HBase snapshots to perform table
> > backups, see Using HBase Snapshots in the Amazon EMR Release Guide and
> > HBase Snapshots in the Apache HBase Reference Guide. These resources
> > provide additional settings and configurations that you can use with
> > snapshots and ExportSnapshot.
> >
> > The following example shows how to use snapshots to migrate HBase
> > tables to HBase on S3.
> >
> > Note: Earlier HBase versions, like HBase 0.94, have a different
> > snapshot structure than HBase 1.x, which is what you're migrating to.
> > If you're migrating from HBase 0.94 using snapshots, you get a
> > TableInfoMissingException error when you try to restore the table.
> > For details about migrating from HBase 0.94 using snapshots, see the
> > Migrating from HBase 0.94 section.
> >
> > 1. From the source HBase cluster, create a snapshot of your table:
> >
> >    $ echo "snapshot '<table_name>', '<snapshot_name>'" | hbase shell
> >
> > 2. Export the snapshot to an S3 bucket:
> >
> >    $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
> >        -snapshot <snapshot_name> -copy-to s3://<HBase_on_S3_root_dir>/
> >
> >    For the -copy-to parameter in the ExportSnapshot utility, specify
> >    the S3 location that you are using for the HBase root directory of
> >    your EMR cluster. If your cluster is already up and running, you can
> >    find its S3 hbase.rootdir value by viewing the cluster's
> >    Configurations in the EMR console, or by using the AWS CLI. Here's
> >    the command to find that value:
> >
> >    $ aws emr describe-cluster --cluster-id <cluster_id> | grep hbase.rootdir
> >
> > 3. Launch an EMR cluster that uses the S3 storage option with HBase
> >    (skip this step if you already have one up and running). For
> >    detailed steps, see Creating a Cluster with HBase Using the Console
> >    in the Amazon EMR Release Guide. When launching the cluster, ensure
> >    that the HBase root directory is set to the same S3 location as your
> >    exported snapshots (that is, the location used in the -copy-to
> >    parameter in the previous step).
> >
> > 4. Restore or clone the HBase table from that snapshot.
> >
> >    To restore the table and keep the same table name as the source
> >    table, use restore_snapshot:
> >
> >    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
> >
> >    To restore the table into a different table name, use clone_snapshot:
> >
> >    $ echo "clone_snapshot '<snapshot_name>', '<table_name>'" | hbase shell
> >
> > Migrating from HBase 0.94 using snapshots
> > If you're migrating from HBase version 0.94 using the snapshot method,
> > you get an error if you try to restore from the snapshot.
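Strung together, the snapshot workflow above looks something like the following for a 1.x-or-later source cluster. A sketch only; the table, snapshot, bucket, and cluster-id values are placeholders, not from the post:

```shell
# Sketch of the full snapshot-based migration (placeholder names).
TABLE="mytable"
SNAPSHOT="${TABLE}-snap"
ROOT_DIR="s3://my-bucket/hbase"   # must equal hbase.rootdir on the EMR cluster

# Source cluster: snapshot the table and export it to the S3 root dir.
echo "snapshot '${TABLE}', '${SNAPSHOT}'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot "${SNAPSHOT}" -copy-to "${ROOT_DIR}/"

# Destination (HBase on S3) cluster: confirm the root dir matches
# (j-XXXXXXXX is a placeholder cluster id), then restore under the
# same name, or clone under a new one.
aws emr describe-cluster --cluster-id j-XXXXXXXX | grep hbase.rootdir
echo "restore_snapshot '${SNAPSHOT}'" | hbase shell
# echo "clone_snapshot '${SNAPSHOT}', '${TABLE}_copy'" | hbase shell
```

Note that restore_snapshot requires the table to be disabled if it already exists on the destination; clone_snapshot sidesteps that by creating a new table.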
> > The error occurs because the structure of a snapshot in HBase 0.94 is
> > different from the snapshot structure in HBase 1.x.
> >
> > The following steps show how to fix an HBase 0.94 snapshot so that it
> > can be restored to an HBase on S3 table.
> >
> > 1. Complete steps 1-3 in the previous example to create and export a
> >    snapshot.
> >
> > 2. From your destination cluster, follow these steps to repair the
> >    snapshot:
> >
> >    a. Use s3-dist-cp to copy the snapshot data (archive) directory
> >       into a new directory. The archive directory contains your
> >       snapshot data. Depending on your table size, it might be large.
> >       Use s3-dist-cp to make this step faster:
> >
> >       $ s3-dist-cp --src s3://<HBase_on_S3_root_dir>/.archive/<table_name> \
> >           --dest s3://<HBase_on_S3_root_dir>/archive/data/default/<table_name>
> >
> >    b. Create and fix the snapshot descriptor file:
> >
> >       $ hdfs dfs -mkdir \
> >           s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
> >
> >       $ hdfs dfs -mv \
> >           s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tableinfo.<*> \
> >           s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
> >
> > 3. Restore the snapshot:
> >
> >    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
> >
> > Option 2: Migrate to HBase on S3 using Export and Import
> > As I discussed in the earlier sections, HBase snapshots and
> > ExportSnapshot are great options for migrating tables. But sometimes
> > you want to migrate only part of a table, so you need a different tool.
> > In this section, I describe how to use the HBase Export and Import
> > utilities.
> >
> > The steps to migrate a table to HBase on S3 using Export and Import are
> > not much different from the steps provided in the HBase documentation.
> > In those docs, you can also find detailed information, including how
> > you can use them to migrate part of a table.
> >
> > The following steps show how you can use Export and Import to migrate
> > a table to HBase on S3.
> > 1. From your source cluster, export the HBase table:
> >
> >    $ hbase org.apache.hadoop.hbase.mapreduce.Export <table_name> \
> >        s3://<table_s3_backup>/<location>/
> >
> > 2. In the destination cluster, create the target table into which to
> >    import data. Ensure that the column families in the target table are
> >    identical to the exported/source table's column families.
> >
> > 3. From the destination cluster, import the table using the Import
> >    utility:
> >
> >    $ hbase org.apache.hadoop.hbase.mapreduce.Import '<table_name>' \
> >        s3://<table_s3_backup>/<location>/
> >
> > HBase snapshots are usually the recommended method to migrate HBase
> > tables. However, the Export and Import utilities can be useful for test
> > use cases in which you migrate only a small part of your table and test
> > your application. It's also handy if you're migrating from an HBase
> > cluster that does not have the HBase snapshots feature.
> >
> > Option 3: Migrate to HBase on S3 using CopyTable
> > Similar to the Export and Import utilities, CopyTable is an HBase
> > utility that you can use to copy part of HBase tables. However, keep in
> > mind that CopyTable doesn't work if you're copying or migrating tables
> > between HBase versions that are not wire compatible (for example,
> > copying from HBase 0.94 to HBase 1.x).
> >
> > For more information and examples, see CopyTable in the HBase
> > documentation.
> >
> > Conclusion
> > In this post, I demonstrated how you can use common HBase backup
> > utilities to migrate your tables easily to HBase on S3. By using HBase
> > snapshots, you can migrate entire tables to HBase on S3. To test HBase
> > on S3 by migrating or copying only part of your tables, you can use the
> > HBase Export, Import, or CopyTable utilities.
> >
> > If you have questions or suggestions, please comment below.
> >
> > --
> > View this message in context: http://apache-hbase.679495.n3.nabble.com/Tips-for-Migrating-to-Apache-HBase-on-Amazon-S3-from-HDFS-tp4089926.html
> > Sent from the HBase Developer mailing list archive at Nabble.com.
