Try adding these to your CopyTable invocation (a full command is sketched below):

-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
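
For reference, the full invocation would look something like this (I'm just
reusing your peer address and table name as placeholders; adjust as needed):

  ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    -Dhbase.client.scanner.caching=100 \
    -Dmapred.map.tasks.speculative.execution=false \
    --peer.adr=hbase://cluster_name table_name

Raising scanner caching cuts down the number of RPC round trips per map task,
and disabling speculative execution avoids writing the same rows twice.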


Also, as others pointed out, what's the bandwidth between the two clusters?
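
If you want to try the snapshot route Lars mentioned below, a rough sketch
would be the following (assuming snapshots are enabled on your 0.94.11
clusters; the snapshot name, destination NameNode address, and mapper count
are placeholders):

  # on the source cluster
  hbase shell> snapshot 'table_name', 'table_name_snapshot'

  ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot table_name_snapshot \
    -copy-to hdfs://new-cluster-namenode:8020/hbase \
    -mappers 16

  # on the destination cluster
  hbase shell> clone_snapshot 'table_name_snapshot', 'table_name'

ExportSnapshot copies the HFiles at the HDFS level instead of scanning and
re-writing every row, so it is usually much faster than CopyTable for bulk
moves, though you still need replication (or a second incremental pass) to
pick up data written after the snapshot was taken.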



________________________________
 From: tobe <[email protected]>
To: [email protected]; lars hofhansl <[email protected]> 
Sent: Thursday, August 14, 2014 11:24 PM
Subject: Re: A better way to migrate the whole cluster?
 

Thanks @lars.

We're using HBase 0.94.11 and followed the instructions to run `./bin/hbase
org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hbase://cluster_name
table_name`. We have a namespace service that resolves the ZooKeeper quorum
from "hbase://cluster_name". The job ran on a shared YARN cluster.

Performance is affected by many factors, but we haven't identified the
bottleneck yet. It would be great to hear your suggestions.



On Fri, Aug 15, 2014 at 1:34 PM, lars hofhansl <[email protected]> wrote:

> What version of HBase? How are you running CopyTable? A day for 1.8T is
> not what we would expect.
> You can definitely take a snapshot and then export the snapshot to another
> cluster, which will move the actual files; but CopyTable should not be so
> slow.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: tobe <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: [email protected]
> Sent: Thursday, August 14, 2014 8:18 PM
> Subject: A better way to migrate the whole cluster?
>
>
> Sometimes our users want to upgrade their servers or move to a new
> datacenter, and then we have to migrate the data from HBase. Currently we
> enable replication from the old cluster to the new cluster, and run
> CopyTable to move the older data.
>
> It's a little inefficient. It takes more than one day to migrate 1.8T of
> data, and even more time to verify. Is there a better way to do that, like
> snapshots or copying the HDFS files directly?
>
> And what's the best practice, or what has your experience been?
>
