[ https://issues.apache.org/jira/browse/HBASE-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204455#comment-13204455 ]
ramkrishna.s.vasudevan commented on HBASE-5312: ----------------------------------------------- We have tried to reproduce this bug in different ways. But still unclear. May be i will check the defect HBASE-3481 to see if it helps. > Closed parent region present in Hlog.lastSeqWritten > --------------------------------------------------- > > Key: HBASE-5312 > URL: https://issues.apache.org/jira/browse/HBASE-5312 > Project: HBase > Issue Type: Bug > Affects Versions: 0.90.5 > Reporter: ramkrishna.s.vasudevan > Fix For: 0.90.7 > > > This is in reference to the mail sent in the dev mailing list > "Closed parent region present in Hlog.lastSeqWritten". > The sceanrio described is > We had a region that was split into two daughters. When the hlog roll tried > to flush the region there was an entry in the HLog.lastSeqWritten that was > not flushed or removed from the lastSeqWritten during the parent close. > Because this flush was not happening subsequent flushes were getting blocked > {code} > 05:06:44,422 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many > hlogs: logs=122, maxlogs=32; forcing flush of 1 regions(s): > 2acaf8e3acfd2e8a5825a1f6f0aca4a8 > 05:06:44,422 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed > to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, > requester=null > 05:10:48,666 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many > hlogs: logs=123, maxlogs=32; forcing flush of 1 regions(s): > 2acaf8e3acfd2e8a5825a1f6f0aca4a8 > 05:10:48,666 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed > to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, > requester=null > 05:14:46,075 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many > hlogs: logs=124, maxlogs=32; forcing flush of 1 regions(s): > 2acaf8e3acfd2e8a5825a1f6f0aca4a8 > 05:14:46,075 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed > to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, > requester=null > 05:15:41,584 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many > hlogs: logs=125, maxlogs=32; forcing flush of 1 regions(s): > 2acaf8e3acfd2e8a5825a1f6f0aca4a8 > 05:15:41,584 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed > to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null, > {code} > Lets see what happened for the region 2acaf8e3acfd2e8a5825a1f6f0aca4a8 > {code} > 2012-01-06 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: > Renaming flushed file at > hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815 > to > hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123 > 2012-01-06 00:30:58,946 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Instantiated > Htable_UFDR_016,049790700093168-04565200000,1325809837958.0ebe5bd7fcbc09ee074d5600b9d4e062. > 2012-01-06 00:30:59,614 INFO org.apache.hadoop.hbase.regionserver.Store: > Added > hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123, > entries=7537, sequenceid=20312223, memsize=4.2m, filesize=2.9m > 2012-01-06 00:30:59,787 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Finished snapshotting, commencing flushing stores > 2012-01-06 00:30:59,787 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Finished memstore flush of ~133.5m for region > Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in > 21816ms, sequenceid=20312223, compaction requested=true > 2012-01-06 00:30:59,787 DEBUG > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested > for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. > because regionserver20020.cacheFlusher; priority=0, compaction queue size=5840 > {code} > A user triggered split has been issued to this region which can be seen in > the above logs. > The flushing of this region has resulted in a seq id 20312223. > The region has been splitted and the parent region has been closed > {code} > 00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: > Starting split of region > Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. > 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing > Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.: > disabling compactions & flushes > 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates > disabled for region > Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. > 00:31:13,718 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed > Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. > 00:31:39,552 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent > region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. > in META > 00:31:41,374 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded > hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/259d0c620c9105928e52713f4a5a252e/value/239119195230239381.2acaf8e3acfd2e8a5825a1f6f0aca4a8, > isReference=true, isBulkLoadResult=false, seqid=20201181, > majorCompaction=false > 2012-01-06 00:31:42,529 DEBUG org.apache.hadoop.hbase.regionserver.Store: > loaded > hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/52ff3c7c6df6e0337876bbca29cee84a/value/973789709483406123.2acaf8e3acfd2e8a5825a1f6f0aca4a8, > isReference=true, isBulkLoadResult=false, seqid=20312224, > majorCompaction=false > 2012-01-06 00:31:42,532 DEBUG org.apache.hadoop.hbase.regionserver.Store: > loaded > hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/259d0c620c9105928e52713f4a5a252e/value/973789709483406123.2acaf8e3acfd2e8a5825a1f6f0aca4a8, > isReference=true, isBulkLoadResult=false, seqid=20312223, > majorCompaction=false > INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, > META updated, and report to master. > Parent=Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., > new regions: > Htable_UFDR_031,00332,1325809872607.259d0c620c9105928e52713f4a5a252e., > Htable_UFDR_031,003732800093168-03594291912,1325809872607.52ff3c7c6df6e0337876bbca29cee84a.. > Split took 29sec > {code} > In the above logs we can also see that the new daugher regions have the next > seq id as 20312223 and 20312224. > {code} > DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove > out of total 4; oldest outstanding sequenceid is 20312224 from region > 2acaf8e3acfd2e8a5825a1f6f0aca4a8 > {code} > Now we see that the parent region which was clsoed has a seq id 20312224 > which is not flushed. > So further flush are failing as the region is already removed from > onlineRegionList. > The doubt here is before the region could be closed, a put has arrived for > this region. But due to some reason the flush has not happened for that. We > tried to dig this, but not able to get this problem again. > 0.90.5 version + few 0.90.6 patches ( before 0.90.6RC0) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira