[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156356#comment-13156356 ]
Ted Yu commented on HBASE-4853:
-------------------------------

With patch v5, I got the following:
{code}
testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize)  Time elapsed: 11.516 sec  <<< FAILURE!
java.lang.AssertionError: Server=10.246.204.31,62993,1322086547613, i=0 expected:<0> but was:<608>
{code}
Here is the tail of the test output:
{code}
2011-11-23 14:15:55,955 INFO  [main] regionserver.Store(631): Added hdfs://localhost:62971/user/zhihyu/.META./1028785192/info/6d51d01d9498464eb025ca045e696ce4, entries=47, sequenceid=36, filesize=8.4k
2011-11-23 14:15:55,956 INFO  [main] regionserver.HRegion(1396): Finished memstore flush of ~17.2k/17608 for region .META.,,1.1028785192 in 44ms, sequenceid=36, compaction requested=false
2011-11-23 14:15:55,956 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush .META.,,1.1028785192 on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,,1322086555196.e2b7276e785c7f6213a5bdd08a54cf8e. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,c,P\xE3+,1322086555201.2c847584e6af6e64f3bae631bd722934. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,q\x83\xCC\xF1{,1322086555217.f5079469f9fa696de61b9db6364cd6e7. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(101): Post flush on 10.246.204.31,62993,1322086547613
{code}
Basically there was no mention of flush completion for the TestGlobalMemStoreSize table.
I think we should add a log before the assertion so that we know how long we spent waiting in the while loop:
{code}
    assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
      server.getRegionServerAccounting().getGlobalMemstoreSize());
{code}
We should also increase the wait time beyond 3 seconds.

> HBASE-4789 does overzealous pruning of seqids
> ---------------------------------------------
>
>                 Key: HBASE-4853
>                 URL: https://issues.apache.org/jira/browse/HBASE-4853
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>         Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
>
>
> Working w/ J-D on a failing replication test turned up a hole in seqids made by the patch over in HBASE-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs'
>
> At a minimum, these lines need removing:
> {code}
> diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> index 623edbe..a0bbe01 100644
> --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
>        // Cleaning up of lastSeqWritten is in the finally clause because we
>        // don't want to confuse getOldestOutstandingSeqNum()
>        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
> -      Long l = this.lastSeqWritten.remove(encodedRegionName);
> -      if (l != null) {
> -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
> -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
> -      }
>        this.cacheFlushLock.unlock();
>      }
>    }
> {code}
> ... but the above is no good w/o figuring out why WALs are not being rotated off.

--
This message is automatically generated by JIRA.
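As a minimal sketch of the suggestion above (logging elapsed wait time before asserting, with a timeout longer than 3 seconds), the wait loop might look like the following. Names here are invented for illustration: `waitForZero` and the `LongSupplier` stand in for the test's actual polling of `getGlobalMemstoreSize()`; this is not the real TestGlobalMemStoreSize code.

```java
import java.util.function.LongSupplier;

public class FlushWait {
  // Polls the supplied size until it reaches zero or timeoutMs elapses;
  // returns how many milliseconds were actually spent waiting.
  public static long waitForZero(LongSupplier size, long timeoutMs)
      throws InterruptedException {
    long start = System.currentTimeMillis();
    while (size.getAsLong() > 0
        && System.currentTimeMillis() - start < timeoutMs) {
      Thread.sleep(10);
    }
    long waited = System.currentTimeMillis() - start;
    // Log before the assertion so a failure records how long we waited.
    System.out.println("Waited " + waited + "ms; size=" + size.getAsLong());
    return waited;
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulate a memstore of 608 bytes whose flush completes after ~50ms.
    long[] memstore = {608};
    new Thread(() -> {
      try { Thread.sleep(50); } catch (InterruptedException ignored) {}
      memstore[0] = 0;
    }).start();
    long waited = waitForZero(() -> memstore[0], 10000); // timeout well beyond 3s
    if (memstore[0] != 0) {
      throw new AssertionError("memstore not flushed after " + waited + "ms");
    }
  }
}
```

The point of returning and logging `waited` is that when the assertion does fail, the log shows whether the loop gave up at the timeout or exited early for some other reason.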