Bryan: From the javadoc of Backup.java: "it favors swallowing exceptions and incrementing counters as opposed to failing."
Can you share some experience with how you handled the errors reported by Backup? Thanks.

On Fri, Aug 15, 2014 at 10:38 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

> I agree it would be nice if this were provided by HBase, but it's already
> possible to work directly with the HFiles. All you need is a custom Hadoop
> job. A good starting point is
> https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/hadoop/Backup.java
> which you can modify to your needs. We've used our own modification of this
> job many times for our own cluster migrations. The idea is that it is
> incremental: as HFiles get compacted, deleted, etc., you can just run it
> again and move smaller and smaller amounts of data.
>
> Working at the HDFS level should be faster, as you can use more mappers.
> You will still be taxing the I/O of the source cluster, but not adding load
> to the actual regionserver processes (IPC queue, memory, etc.).
>
> If you upgrade to CDH5 (or the equivalent HDFS version), you can use HDFS
> snapshots to minimize the need to re-run the above Backup job (since you
> are already using replication to keep data up to date).
>
> On Fri, Aug 15, 2014 at 1:11 PM, Esteban Gutierrez <este...@cloudera.com> wrote:
>
> > 1.8 TB in a day is not terribly slow if that number comes from the
> > CopyTable counters and you are moving data across data centers over
> > public networks: that is about 20 MB/sec. Also, CopyTable won't compress
> > anything on the wire, so the network overhead can be significant. If you
> > use anything like Snappy for block compression and/or FAST_DIFF for block
> > encoding of the HFiles, then taking snapshots and exporting them with the
> > ExportSnapshot tool should be the way to go.
> >
> > cheers,
> > esteban.
> >
> > --
> > Cloudera, Inc.
> >
> > On Thu, Aug 14, 2014 at 11:24 PM, tobe <tobeg3oo...@gmail.com> wrote:
> >
> > > Thanks @lars.
> > > We're using HBase 0.94.11 and followed the instructions to run
> > > `./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
> > > --peer.adr=hbase://cluster_name table_name`. We have a namespace service
> > > that resolves "hbase://cluster_name" to the ZooKeeper quorum. The job
> > > ran on a shared YARN cluster.
> > >
> > > The performance is affected by many factors, but we haven't found the
> > > cause. It would be great to hear your suggestions.
> > >
> > > On Fri, Aug 15, 2014 at 1:34 PM, lars hofhansl <la...@apache.org> wrote:
> > >
> > > > What version of HBase? How are you running CopyTable? A day for 1.8 TB
> > > > is not what we would expect.
> > > > You can definitely take a snapshot and then export the snapshot to
> > > > another cluster, which will move the actual files; but CopyTable
> > > > should not be so slow.
> > > >
> > > > -- Lars
> > > >
> > > > ________________________________
> > > > From: tobe <tobeg3oo...@gmail.com>
> > > > To: "u...@hbase.apache.org" <u...@hbase.apache.org>
> > > > Cc: dev@hbase.apache.org
> > > > Sent: Thursday, August 14, 2014 8:18 PM
> > > > Subject: A better way to migrate the whole cluster?
> > > >
> > > > Sometimes our users want to upgrade their servers or move to a new
> > > > datacenter, and then we have to migrate the data out of HBase.
> > > > Currently we enable replication from the old cluster to the new
> > > > cluster, and run CopyTable to move the older data.
> > > >
> > > > It's a little inefficient: it takes more than a day to migrate 1.8 TB
> > > > of data, and more time to verify. Is there a better way to do this,
> > > > such as snapshots or working purely with the HDFS files?
> > > >
> > > > And what is the best practice, or your valuable experience?
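For reference, the snapshot-based path that Esteban and Lars suggest looks roughly like the following on 0.94.x (snapshots must be enabled via `hbase.snapshot.enabled`; the table name, snapshot name, mapper count, and NameNode address below are placeholders, not values from the thread):

```shell
# 1) Take a snapshot of the table (in the hbase shell on the source cluster):
#      hbase> snapshot 'table_name', 'table_name-snapshot'

# 2) Export the snapshot's HFiles directly into the destination cluster's
#    HDFS, bypassing the regionserver IPC path entirely:
./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot table_name-snapshot \
    -copy-to hdfs://dest-namenode:8020/hbase \
    -mappers 16

# 3) On the destination cluster, materialize the snapshot as a live table:
#      hbase> clone_snapshot 'table_name-snapshot', 'table_name'
```

Because ExportSnapshot moves immutable HFiles with plain MapReduce, block compression and encoding (Snappy, FAST_DIFF) are preserved on the wire, and the mapper count is the main throughput knob.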
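If CopyTable stays in the picture, its throughput is often bound by scanner caching and speculative re-reads; a tuned invocation might look like this sketch (the caching value and the ZooKeeper quorum address are illustrative assumptions, not taken from the thread):

```shell
# CopyTable with a larger scan cache, and speculative execution disabled so
# that slow mappers don't re-read the same rows from the source cluster.
./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    -Dhbase.client.scanner.caching=400 \
    -Dmapred.map.tasks.speculative.execution=false \
    --peer.adr=zk1,zk2,zk3:2181:/hbase \
    table_name
```

Even tuned, CopyTable still goes through the regionserver read path on the source cluster, which is why the HDFS-level approaches above tend to scale better.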
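Esteban's 20 MB/sec figure from the thread is easy to sanity-check: 1.8 TB copied in one day, using decimal units (the figures are taken from the numbers quoted above, not from any measurement):

```shell
# Rough sustained throughput for 1.8 TB copied in 24 hours (decimal units):
# 1.8 TB = 1,800,000 MB; one day = 86,400 seconds.
awk 'BEGIN { printf "%.1f MB/sec\n", 1800000 / 86400 }'
# prints "20.8 MB/sec"
```

That is roughly the 20 MB/sec Esteban estimated, which is plausible for an uncompressed cross-datacenter copy over a public network.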