[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

Jim Kellerman (JIRA) Sun, 13 Jan 2008 18:41:58 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558486#action_12558486
 ]


Jim Kellerman commented on HADOOP-2587:
---------------------------------------

Updates prevent:
 - cache flushes
 - closing a region (and consequently, splits)
 - final stage of compaction

Scanners prevent
 - final stage of compaction
 - closing a region (and consequently, splits)

During the final stage of compaction
 - no new scanners may be created
 - updates are prohibited

Cache flushes prevent
 - closing a region (and consequently, splits)
 - updates
 - rolling the HLog

Rolling the HLog prevents
 - cache flushes
 - updates

A region split must close the old region. Consequently before it can start it 
must:
 - wait for any compactions or cache flushes to complete
 - lock the region to prevent new updates
 - wait for active scanners to terminate
 - wait for updates in progress to finish

Once a split is in progress, the actual process is quick. However even after 
the region server reports that the split has completed, clients must wait until 
the master assigns the new regions and the region server(s) report to the 
master that
the new regions are being served.
split, , , log roll, , 

> Splits getting blocked by compactions causeing region to be offline for the 
> length of the compaction 10-15 mins
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2587
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2587
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>         Environment: hadoop subversion 611087
>            Reporter: Billy Pearson
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt
>
>
> The below is cut out of one of my region servers logs full log attached
> What is happening is there is one region on a this region server and its is 
> under heave insert load so compaction are back to back one one finishes a new 
> one starts the problem starts when its time to split the region. 
> A compaction starts just millsecs before the split starts blocking the split 
> but the split closes the region before the compaction is finished. Causing 
> the region to be offline until the compaction is done. Once the compaction is 
> done the split finishes and all is returned to normal but this is a big 
> problem for production if the region is offline for 10-15 mins.
> The solution would be not to let the split thread to issue the below line 
> while a compaction on that region is happening.
> 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
> webdata,,1200085987488 closing (Adding to retiringRegions)
> The only time I have seen this bug is when there is only one region on a 
> region server because if more then one then the compaction happens to the 
> other region(s) after the first one is done compaction and the split can do 
> what it needs on the first region with out getting blocked.
> {code}
> 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
> completed on region webdata,,1200085987488. Took 16mins, 10sec
> 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
> HStore webdata,,1200085987488/size needed.
> 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
> 1773667150/size needs compaction
> 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
> compaction on region webdata,,1200085987488
> 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
> compaction of 14 files using 
> /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
>  for webdata,,1200085987488/size
> 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
> memcache flush for region webdata,,1200085987488. Size 31.2m
> 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
> webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
> size is 64.0m
> 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
> webdata,,1200085987488 closing (Adding to retiringRegions)
> ...
> lots of NotServingRegionException's
> ...
> 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
> completed on region webdata,,1200085987488. Took 10mins, 58sec
> ...
> 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
> /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
> 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
> webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
> webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
> took 11mins, 0sec
> 2008-01-11 16:33:02,227 DEBUG 
> org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
> .META.. Doing a find...
> 2008-01-11 16:33:02,283 DEBUG 
> org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
> for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
> startKey: <>, encodedName(70236052) tableDesc: {name: -ROOT-, families: 
> {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
> length: 2147483647, bloom filter: none}}}
> 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
> .META. with region split info
> 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
> Reporting region split to master
> 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
> split, META update, and report to master all successful. Old 
> region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
> webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

Reply via email to