[ https://issues.apache.org/jira/browse/HBASE-21642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729938#comment-16729938 ]
Hudson commented on HBASE-21642:
--------------------------------

Results for branch branch-2
[build #1579 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1579/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1579//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1579//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1579//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> CopyTable by reading snapshot and bulkloading will save a lot of time.
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21642
>                 URL: https://issues.apache.org/jira/browse/HBASE-21642
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-21642.v1.patch
>
>
> In our HBase clusters, some users need to merge the data of two different
> tables into one. Currently, CopyTable scans the source table and puts
> mutations into the destination table.
> Although CopyTable with bulkload is much faster than CopyTable with scan
> and put, it still takes a lot of time to scan the source table. Worse,
> CopyTable with a table scan hurts the cluster's availability: scanning
> consumes a lot of RegionServer (RS) resources, including CPU, memory,
> GC stop-the-world pauses, RS handlers, disk I/O, and network I/O. All of
> these affect availability.
> So in our clusters, we tried to run all scanning jobs by scanning a
> snapshot instead of scanning the table. This at least isolates CPU,
> memory, and GC resources between the online RS and the scanning job.
> What's more, snapshot scanning is much faster than scanning through the
> RS, and it is more stable.
> So here, I'll make the CopyTable tool support snapshot scanning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)