[ https://issues.apache.org/jira/browse/HBASE-21642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728939#comment-16728939 ]
Zheng Hu commented on HBASE-21642:
----------------------------------

When running CopyTable on a MOB table by scanning a snapshot, I found:
{code}
2018-12-26 16:52:51,088 DEBUG [LocalJobRunner Map Task Executor #0] ipc.AbstractRpcClient(483): Stopping rpc client
2018-12-26 16:52:51,095 WARN [Thread-1048] mapred.LocalJobRunner$Job(560): job_local2134482229_0002
java.lang.Exception: java.lang.NullPointerException
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hbase.regionserver.HMobStore.readCell(HMobStore.java:409)
	at org.apache.hadoop.hbase.regionserver.HMobStore.resolve(HMobStore.java:346)
	at org.apache.hadoop.hbase.regionserver.MobStoreScanner.next(MobStoreScanner.java:73)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6631)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6795)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6568)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6554)
	at org.apache.hadoop.hbase.client.ClientSideRegionScanner.next(ClientSideRegionScanner.java:77)
	at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl$RecordReader.nextKeyValue(TableSnapshotInputFormatImpl.java:241)
	at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.nextKeyValue(TableSnapshotInputFormat.java:166)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
It's a bug in scanning a snapshot of a MOB table...

> CopyTable by reading snapshot and bulkloading will save a lot of time.
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21642
>                 URL: https://issues.apache.org/jira/browse/HBASE-21642
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>
> In our HBase clusters, some users need to merge the data of two different
> tables into one. Currently, CopyTable scans the source table and puts
> mutations into the destination table.
> Although CopyTable with bulkload is much faster than CopyTable with scan and
> put, it still takes a lot of time to scan the source table. Worse, scanning
> the source table with CopyTable hurts the cluster's availability: the scan
> consumes a lot of RegionServer resources (CPU, memory, GC stop-the-world
> pauses, RS handlers, disk I/O, network I/O, etc.), and all of these affect
> availability.
> So in our clusters, we run all scanning jobs against snapshots instead of
> live tables. This at least isolates the CPU, memory, and GC resources of the
> online RegionServers from those of the scanning job. What's more, scanning a
> snapshot is much faster than scanning through the RegionServers, and it's
> more stable.
> So here, I'll make the CopyTable tool support snapshot scanning.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)