Zheng Hu created HBASE-21642:
--------------------------------

             Summary: CopyTable by reading snapshot and bulkloading will save a lot of time.
                 Key: HBASE-21642
                 URL: https://issues.apache.org/jira/browse/HBASE-21642
             Project: HBase
          Issue Type: Bug
            Reporter: Zheng Hu
            Assignee: Zheng Hu


In our HBase clusters, some users need to merge the data of two different tables 
into one. Currently, CopyTable scans the source table and puts mutations into 
the destination table. 
Although CopyTable with bulkload is much faster than CopyTable with scan and 
put, it still takes a lot of time to scan the source table. Worse, scanning the 
table through the RegionServers (RS) hurts the cluster's availability: the scan 
consumes a lot of RS resources, including CPU, memory, GC stop-the-world pauses, 
RS handlers, disk I/O, network I/O, etc. All of these affect availability. 

So in our clusters, we try to run all scanning jobs against snapshots instead 
of live tables. This at least isolates the CPU, memory, and GC resources of the 
online RS from the scanning job. What's more, snapshot scanning is much faster 
than scanning through the RS, and it is more stable.

So here, I'll make the CopyTable tool support snapshot scanning. 

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
