[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558462#action_12558462 ]
Hadoop QA commented on HADOOP-2587:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373061/patch.txt
against trunk revision r611629.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1571/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1571/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1571/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1571/console

This message is automatically generated.

> Splits getting blocked by compactions causing region to be offline for the
> length of the compaction 10-15 mins
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2587
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2587
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>         Environment: hadoop subversion 611087
>            Reporter: Billy Pearson
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: hbase-root-regionserver-PE1750-3.log, patch.txt
>
>
> The log below is cut out of one of my region server's logs; the full log is
> attached.
> What is happening is that there is one region on this region server and it
> is under heavy insert load, so compactions run back to back: as soon as one
> finishes, a new one starts. The problem starts when it is time to split the
> region.
> A compaction starts just milliseconds before the split starts, blocking the
> split, but the split closes the region before the compaction is finished,
> causing the region to be offline until the compaction is done. Once the
> compaction is done the split finishes and everything returns to normal, but
> this is a big problem for production if the region is offline for 10-15 mins.
> The solution would be to not let the split thread issue the line below
> while a compaction on that region is happening.
> 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
> The only time I have seen this bug is when there is only one region on a
> region server. If there is more than one, the compaction moves on to the
> other region(s) after the first one is done compacting, and the split can do
> what it needs on the first region without getting blocked.
> {code}
> 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
> 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
> 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
> 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
> 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
> 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
> 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
> 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
> ...
> lots of NotServingRegionException's
> ...
> 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
> ...
> 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
> 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
> 2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
> 2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: <>, encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
> 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
> 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
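
For illustration only, the remedy proposed in the issue description (do not
let the split thread close a region while a compaction on that region is
running) might be sketched roughly as below. This is not the change in
patch.txt and it does not use any HBase API: the class SplitCompactionGuard
and its methods tryCompact, trySplit and isOnline are hypothetical names
invented for this sketch, and the real region lifecycle is reduced to a
single "online" flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch, not HBase code: one flag per region records whether a
 * compaction or a split currently owns the region.
 */
class SplitCompactionGuard {
    // true while a compaction or a split is running on this region
    private final AtomicBoolean busy = new AtomicBoolean(false);
    // stands in for "region is being served"; an assumption of this sketch
    private volatile boolean online = true;

    /** Runs the compaction unless other work already owns the region. */
    boolean tryCompact(Runnable compaction) {
        if (!busy.compareAndSet(false, true)) {
            return false;
        }
        try {
            compaction.run();
            return true;
        } finally {
            busy.set(false);
        }
    }

    /**
     * Closes the region and splits it only if no compaction is running.
     * Otherwise returns false at once and leaves the region online, so the
     * caller can retry after the compaction completes.
     */
    boolean trySplit(Runnable split) {
        if (!busy.compareAndSet(false, true)) {
            return false; // compaction in progress: do not close the region
        }
        try {
            online = false; // region goes offline only for the split itself
            split.run();
            return true;
        } finally {
            online = true; // simplified: the sketch just reopens the region
            busy.set(false);
        }
    }

    boolean isOnline() {
        return online;
    }

    public static void main(String[] args) {
        SplitCompactionGuard region = new SplitCompactionGuard();
        // A split requested in the middle of a compaction is refused.
        region.tryCompact(() ->
            System.out.println("split during compaction: "
                + region.trySplit(() -> { })));
        // Once the compaction is done, the split goes ahead.
        System.out.println("split after compaction: "
            + region.trySplit(() -> { }));
    }
}
```

With a guard of this kind the region would stay online for the whole
compaction and go offline only for the split itself, at the cost of the
split being delayed until the split thread retries.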