Xu Cang created HBASE-24028:
-------------------------------

             Summary: MapReduce on snapshot restores and opens all regions in 
each mapper
                 Key: HBASE-24028
                 URL: https://issues.apache.org/jira/browse/HBASE-24028
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.6.0, 2.3.0
            Reporter: Xu Cang


Given this scenario: one MR job scans a table (with many regions). It uses 
'RestoreSnapshotHelper' to restore the snapshot for all regions in each mapper. 

In the code 
[https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java#L183]

There seems to be no way to restore only the regions relevant to a given 
mapper from the snapshot.

This leads to extreme slowness and wasted resources. 
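
To make the cost concrete, here is a minimal sketch of the per-mapper 
filtering I would expect, written in plain Java with no HBase dependencies. 
Everything in it is hypothetical: the class and method names 
(SnapshotRegionFilterSketch, regionsToRestore, totalRestores) do not exist in 
HBase, and it assumes one mapper per region. It is not how 
RestoreSnapshotHelper works today; it only illustrates the difference in work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names exist in HBase.
class SnapshotRegionFilterSketch {

    // Returns only the snapshot regions that match the region of this mapper's split.
    static List<String> regionsToRestore(List<String> snapshotRegions, String splitRegion) {
        List<String> selected = new ArrayList<>();
        for (String encodedName : snapshotRegions) {
            if (encodedName.equals(splitRegion)) {
                selected.add(encodedName);
            }
        }
        return selected;
    }

    // Total region restores across the job when each mapper restores `perMapper` regions.
    static long totalRestores(int mappers, int perMapper) {
        return (long) mappers * perMapper;
    }

    public static void main(String[] args) {
        // Encoded region names taken from the log excerpt in this report.
        List<String> snapshotRegions = Arrays.asList(
            "d7f85b4a9d3fa22a5e7b88bda39f6d50",
            "69dd3fdba3698f827f8883ed911161ef");

        // Assumption: one mapper (input split) per region.
        for (String splitRegion : snapshotRegions) {
            System.out.println("mapper for " + splitRegion + " restores "
                + regionsToRestore(snapshotRegions, splitRegion));
        }

        int n = snapshotRegions.size();
        System.out.println("restore all regions per mapper: " + totalRestores(n, n)
            + " region restores in total");
        System.out.println("restore only the split's region: " + totalRestores(n, 1)
            + " region restores in total");
    }
}
```

Under that assumption, the total restore work grows with the square of the 
region count when every mapper restores every region, and only linearly when 
each mapper restores just its own region.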

Please correct me if I am wrong or have missed anything. Thanks.

 

One quick example is shown below. In my test, the test table has 2 regions, 
and each mapper opens and iterates over both regions. 

2020-03-19 18:58:15,225 INFO [main] mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *d7f85b4a9d3fa22a5e7b88bda39f6d50*
2020-03-19 18:58:15,285 INFO [main] snapshot.RestoreSnapshotHelper - region to add: *69dd3fdba3698f827f8883ed911161ef*
2020-03-19 18:58:15,286 INFO [main] snapshot.RestoreSnapshotHelper - clone region=d7f85b4a9d3fa22a5e7b88bda39f6d50 as d7f85b4a9d3fa22a5e7b88bda39f6d50

 

If I have misunderstood anything, can anyone point me to where this class 
distinguishes which regions each mapper should process? 

 

By the way, the original implementation of MR on snapshots is HBASE-8369; 
there have not been many big changes since then. 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
