[ https://issues.apache.org/jira/browse/HBASE-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303053#comment-15303053 ]
Hudson commented on HBASE-15837:
--------------------------------

FAILURE: Integrated in HBase-1.2 #636 (See [https://builds.apache.org/job/HBase-1.2/636/])
HBASE-15837 Memstore size accounting is wrong if postBatchMutate() (enis: rev bd6903b9e7bd7707a0c03f30f089b1d31f700411)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Memstore size accounting is wrong if postBatchMutate() throws exception
> -----------------------------------------------------------------------
>
>                 Key: HBASE-15837
>                 URL: https://issues.apache.org/jira/browse/HBASE-15837
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0, 1.3.0, 1.2.2, 0.98.20, 1.1.6
>
>         Attachments: HBASE-15837.001.patch, hbase-15837-v1.patch, hbase-15837.branch-1.patch, hbase-memstore-size-accounting.patch
>
>
> Over in PHOENIX-2883, I've been trying to track down the root cause of an
> issue we were seeing where a negative memstoreSize was ultimately causing an
> RS to abort. The tl;dr version is:
> * Something causes memstoreSize to be negative (not sure what is doing this
> yet).
> * All subsequent flushes short-circuit and don't run because they think there
> is no data to flush.
> * The region is eventually closed (commonly, for a move).
> * A final flush is attempted on each store before closing (which also
> short-circuits for the same reason), leaving unflushed data in each store.
> * The sanity check that each store's size is zero fails and the RS aborts.
> I have a small patch which I think should improve our failure case around
> this: it safely prevents the RS abort (by forcing a flush when memstoreSize is
> negative) and logs a call trace whenever an update to memstoreSize makes it
> negative (to find culprits in the future).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
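The issue description proposes two safeguards: log a call trace whenever an update drives memstoreSize negative, and treat a negative size as "must flush" instead of "nothing to flush". A minimal sketch of that idea follows, assuming the size is tracked in a single `AtomicLong`. The class and method names (`MemstoreAccountingSketch`, `addAndGetMemstoreSize`, `shouldSkipFlush`, `legacyShouldSkipFlush`) are hypothetical and are not the actual `HRegion` change committed in rev bd6903b9e7bd7707a0c03f30f089b1d31f700411.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not HBase's HRegion code) of the safeguards described
 * in HBASE-15837.
 */
public class MemstoreAccountingSketch {
    private static final Logger LOG =
        Logger.getLogger(MemstoreAccountingSketch.class.getName());

    // Running total of bytes believed to be held in the memstore.
    private final AtomicLong memstoreSize = new AtomicLong();

    /** Applies a delta (positive on write, negative on flush or rollback) and returns the new size. */
    public long addAndGetMemstoreSize(long delta) {
        long size = memstoreSize.addAndGet(delta);
        if (size < 0) {
            // Attach an exception so the log records which caller produced the bad update.
            LOG.log(Level.SEVERE, "Asked to modify memstore size by " + delta
                + ", which resulted in a negative size: " + size,
                new Exception("call trace for negative memstore size"));
        }
        return size;
    }

    public long getMemstoreSize() {
        return memstoreSize.get();
    }

    /** Hypothetical "before" check: a negative size looks empty, so the flush is skipped. */
    public boolean legacyShouldSkipFlush() {
        return memstoreSize.get() <= 0;
    }

    /** Safeguard: only an exactly-zero size counts as empty; a negative size forces a flush. */
    public boolean shouldSkipFlush() {
        return memstoreSize.get() == 0;
    }

    public static void main(String[] args) {
        MemstoreAccountingSketch region = new MemstoreAccountingSketch();
        region.addAndGetMemstoreSize(128);  // a write adds 128 bytes
        region.addAndGetMemstoreSize(-256); // an over-large decrement goes negative and logs a trace
        System.out.println("size=" + region.getMemstoreSize()
            + " legacySkip=" + region.legacyShouldSkipFlush()
            + " skip=" + region.shouldSkipFlush());
    }
}
```

Running `main` prints `size=-128 legacySkip=true skip=false`: under the legacy check the negative size suppresses the flush (the first two steps of the failure chain above), while the safeguard still flushes, and the logged exception points at the caller that made the size negative.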