[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5611:
-------------------------

    Fix Version/s: 0.95.0

> Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5611
>                 URL: https://issues.apache.org/jira/browse/HBASE-5611
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jieshan Bean
>            Priority: Critical
>             Fix For: 0.92.2, 0.94.0, 0.95.0
>
>         Attachments: 5611-94.addendum, 5611-94-v2.txt, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch
>
>
> This bug is fairly easy to hit when the {{TimeoutMonitor}} is on; otherwise I think it's still possible to hit it if a region fails to open for more obscure reasons, like HDFS errors.
> Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing the RS does is read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size, but they are dropped when the {{HRegion}} gets cleaned up.
> The problem is completely invisible until the {{MemStoreFlusher}} needs to force-flush a region and none of the regions has any edits:
> {noformat}
> 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
> 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
> java.lang.IllegalStateException
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that the accounting says I'm over the low barrier, although in fact I'm at 0. It happened yesterday and it's still printing this out.
> To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open.
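
The fix described above can be illustrated with a minimal sketch. This is not the code from the attached patches: the class names (GlobalMemStoreAccounting, RegionSketch), the open() signature, and the simulated failure flag are all hypothetical, chosen only to show the idea of rolling back a region's contribution to a shared counter when the open fails after edits were replayed.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the region server's global MemStore size counter. */
final class GlobalMemStoreAccounting {
    private final AtomicLong globalSize = new AtomicLong();

    /** Adds delta (may be negative) and returns the new global size. */
    long add(long delta) {
        return globalSize.addAndGet(delta);
    }

    long get() {
        return globalSize.get();
    }
}

/** Hypothetical region that replays recovered edits while opening. */
final class RegionSketch {
    private final GlobalMemStoreAccounting accounting;
    private long regionMemStoreSize = 0;

    RegionSketch(GlobalMemStoreAccounting accounting) {
        this.accounting = accounting;
    }

    /** Accounts for one replayed edit both locally and globally. */
    private void replayEdit(long editSize) {
        regionMemStoreSize += editSize;
        accounting.add(editSize);
    }

    /**
     * Replays the recovered edits, then optionally fails (simulating a
     * timeout or HDFS error). On failure, the size this region added to
     * the global counter is subtracted again before the error propagates.
     */
    void open(long[] recoveredEditSizes, boolean failAfterReplay) throws IOException {
        try {
            for (long size : recoveredEditSizes) {
                replayEdit(size);
            }
            if (failAfterReplay) {
                throw new IOException("simulated failure while opening region");
            }
        } catch (IOException e) {
            // Roll back: the edits are about to be dropped with the region,
            // so they must no longer count toward the global size.
            accounting.add(-regionMemStoreSize);
            regionMemStoreSize = 0;
            throw e;
        }
    }
}

final class MemStoreAccountingDemo {
    public static void main(String[] args) {
        GlobalMemStoreAccounting accounting = new GlobalMemStoreAccounting();
        RegionSketch region = new RegionSketch(accounting);
        try {
            region.open(new long[] {100, 250}, true);
        } catch (IOException e) {
            System.out.println("open failed: " + e.getMessage());
        }
        System.out.println("global MemStore size after failed open: " + accounting.get());
    }
}
```

Without the rollback in the catch block, the counter in this sketch would stay at the replayed size even though no region holds those edits, which mirrors the situation in the report where the flusher is woken for global pressure but finds nothing to flush.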