Sorry for the late reply. We are still using CopyTable plus Replication to
migrate data without stopping the clients, but it's still a little
inefficient. I would like to know how you migrate a whole cluster without
affecting the services. Replication + Snapshot?
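For reference, the snapshot-based path I have in mind would look roughly
like this (a sketch only; the table name, snapshot name, destination HDFS
URI and mapper count are placeholders, not our real setup, and on 0.94
snapshots also need hbase.snapshot.enabled set to true if I remember
correctly):

    # 1. take a snapshot on the source cluster (hbase shell)
    snapshot 'table_name', 'table_name-snap'

    # 2. copy the snapshot's HFiles to the destination cluster
    ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
        -snapshot table_name-snap \
        -copy-to hdfs://dst-namenode:8020/hbase \
        -mappers 16

    # 3. materialize the table on the destination cluster (hbase shell)
    clone_snapshot 'table_name-snap', 'table_name'

With replication already enabled, the edits written after the snapshot was
taken should then be covered by the replication stream, so no CopyTable
pass would be needed. Does that match what you do?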
And we have found two bugs: HBASE-10153, about VerifyReplication, and
another one where ZooKeeper still records a peer's peer_id and hlogs after
`remove_peer`. To make matters worse, replication can't keep up with the
write rate of the source cluster, so more and more hlogs are backlogged.
We will take a deep look at it.

On Tue, Aug 19, 2014 at 2:32 PM, lars hofhansl <[email protected]> wrote:

> Tobe,
>
> did any of this fix your problem?
>
> -- Lars
>
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, August 15, 2014 10:57 AM
> Subject: Re: A better way to migrate the whole cluster?
>
>
> Well, 1.8TB in 24h is 1.8/24/3600 TB/s ~ 21MB/s. That seems pretty slow
> to me. :)
>
> My bet is still on scanner caching set to 1 by default in 0.94 (and
> hence each mapper does an RPC for every single row, making this a local
> latency problem, not a bandwidth problem).
>
> As stated in the other email, try adding these two to the CopyTable
> command:
>
> -Dhbase.client.scanner.caching=100
> -Dmapred.map.tasks.speculative.execution=false
>
> -- Lars
>
>
> ________________________________
> From: Esteban Gutierrez <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: lars hofhansl <[email protected]>
> Sent: Friday, August 15, 2014 10:11 AM
> Subject: Re: A better way to migrate the whole cluster?
>
>
> 1.8TB in a day is not terribly slow if that number comes from the
> CopyTable counters and you are moving data across data centers over
> public networks; that should be about 20MB/sec. Also, CopyTable won't
> compress anything on the wire, so the network overhead can be
> significant. If you use anything like snappy for block compression
> and/or fast_diff for block encoding of the HFiles, then taking
> snapshots and exporting them with the ExportSnapshot tool should be the
> way to go.
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Thu, Aug 14, 2014 at 11:24 PM, tobe <[email protected]> wrote:
>
> > Thanks @lars.
> >
> > We're using HBase 0.94.11 and followed the instructions to run
> > `./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
> > --peer.adr=hbase://cluster_name table_name`. We have a namespace
> > service that resolves "hbase://cluster_name" to the ZooKeeper quorum.
> > The job ran on a shared YARN cluster.
> >
> > The performance is affected by many factors, but we haven't found the
> > root cause yet. It would be great to see your suggestions.
> >
> >
> > On Fri, Aug 15, 2014 at 1:34 PM, lars hofhansl <[email protected]> wrote:
> >
> > > What version of HBase? How are you running CopyTable? A day for
> > > 1.8T is not what we would expect.
> > > You can definitely take a snapshot and then export the snapshot to
> > > another cluster, which will move the actual files; but CopyTable
> > > should not be so slow.
> > >
> > > -- Lars
> > >
> > >
> > > ________________________________
> > > From: tobe <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Cc: [email protected]
> > > Sent: Thursday, August 14, 2014 8:18 PM
> > > Subject: A better way to migrate the whole cluster?
> > >
> > >
> > > Sometimes our users want to upgrade their servers or move to a new
> > > datacenter, and then we have to migrate the data in HBase.
> > > Currently we enable replication from the old cluster to the new
> > > cluster, and run CopyTable to move the older data.
> > >
> > > It's a little inefficient. It takes more than a day to migrate 1.8T
> > > of data, and more time to verify. Is there a better way to do that,
> > > like snapshots or pure HDFS files?
> > >
> > > And what's the best practice in your experience?
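P.S. For the leftover-peer issue above, this is roughly how we inspect the
replication state in ZooKeeper (assuming the default
zookeeper.znode.parent of /hbase):

    # open a ZooKeeper shell against the cluster's quorum
    ./bin/hbase zkcli

    # peer ids: an id removed via remove_peer should no longer show up
    ls /hbase/replication/peers

    # per-regionserver replication queues: hlogs referenced here are
    # kept on disk by the log cleaner, so a stale queue means a backlog
    ls /hbase/replication/rs

If a removed peer still has queues under /hbase/replication/rs, that would
explain why the hlogs keep piling up.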
