[ https://issues.apache.org/jira/browse/HBASE-21642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729497#comment-16729497 ]
Zheng Hu commented on HBASE-21642:
----------------------------------

Since HBASE-21514 has been pushed to branch-2 and master, I will push this to branch-2 and master too, because we need this patch to fix the TableSnapshotInputFormatImpl NPE on MOB tables.

> CopyTable by reading snapshot and bulkloading will save a lot of time.
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21642
>                 URL: https://issues.apache.org/jira/browse/HBASE-21642
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: HBASE-21642.v1.patch
>
>
> In our HBase clusters, some users need to merge the data of two different
> tables into one. Currently, CopyTable scans the source table and puts
> mutations into the destination table.
> Although CopyTable with bulkload is much faster than CopyTable with scan
> and put, it still takes a long time to scan the source table. Worse,
> CopyTable with a table scan hurts the cluster's availability: scanning
> consumes a lot of RegionServer (RS) resources, including CPU, memory,
> GC stop-the-world (STW) pauses, RS handlers, disk I/O, and network I/O.
> All of these affect availability.
> So in our clusters, we have tried to run all scanning jobs against a
> snapshot instead of the live table. This at least isolates CPU, memory,
> and GC resources between the online RS and the scanning job. What's more,
> scanning a snapshot is much faster than scanning through the RS, and it
> is more stable.
> So here I will make the CopyTable tool support snapshot scanning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)