[jira] [Commented] (HBASE-21642) CopyTable by reading snapshot and bulkloading will save a lot of time.

Zheng Hu (JIRA) Wed, 26 Dec 2018 01:10:21 -0800


    [ https://issues.apache.org/jira/browse/HBASE-21642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728939#comment-16728939 ]


Zheng Hu commented on HBASE-21642:
----------------------------------

When running CopyTable on a MOB table by scanning its snapshot, I hit the following exception:
{code}
2018-12-26 16:52:51,088 DEBUG [LocalJobRunner Map Task Executor #0] ipc.AbstractRpcClient(483): Stopping rpc client
2018-12-26 16:52:51,095 WARN  [Thread-1048] mapred.LocalJobRunner$Job(560): job_local2134482229_0002
java.lang.Exception: java.lang.NullPointerException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HMobStore.readCell(HMobStore.java:409)
        at org.apache.hadoop.hbase.regionserver.HMobStore.resolve(HMobStore.java:346)
        at org.apache.hadoop.hbase.regionserver.MobStoreScanner.next(MobStoreScanner.java:73)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6631)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6795)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6568)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6554)
        at org.apache.hadoop.hbase.client.ClientSideRegionScanner.next(ClientSideRegionScanner.java:77)
        at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl$RecordReader.nextKeyValue(TableSnapshotInputFormatImpl.java:241)
        at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.nextKeyValue(TableSnapshotInputFormat.java:166)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
This is a bug in scanning a snapshot of a MOB table.

> CopyTable by reading snapshot and bulkloading will save a lot of time.
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21642
>                 URL: https://issues.apache.org/jira/browse/HBASE-21642
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>
> In our HBase clusters, some users need to merge the data of two different
> tables into one. Currently, CopyTable scans the source table and puts
> mutations into the destination table.
> Although CopyTable with bulkload is much faster than CopyTable with scan and
> put, it still takes a lot of time to scan the source table. Worse, scanning
> the table affects the cluster's availability: the scan consumes a lot of
> RegionServer resources, including CPU, memory, GC stop-the-world pauses, RS
> handlers, disk I/O, and network I/O.
> So in our clusters, we tried to run all scanning jobs against snapshots
> instead of the live table. This at least isolates CPU, memory, and GC
> resources between the online RegionServers and the scanning job. What's more,
> scanning a snapshot is much faster and more stable than scanning through the
> RegionServers.
> So here I'll make the CopyTable tool support snapshot scanning.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
