[ https://issues.apache.org/jira/browse/HBASE-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297503#comment-15297503 ]
Josh Elser commented on HBASE-15837:
------------------------------------

bq. I was not able to find the "submit patch" button. Do you see it?

Button clicked :). v1 looks good to me too. Thanks for consolidating!

> Memstore size accounting is wrong if postBatchMutate() throws exception
> -----------------------------------------------------------------------
>
>                 Key: HBASE-15837
>                 URL: https://issues.apache.org/jira/browse/HBASE-15837
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15837.001.patch, hbase-15837-v1.patch,
> hbase-memstore-size-accounting.patch
>
>
> Over in PHOENIX-2883, I've been trying to track down the root cause of an
> issue we were seeing where a negative memstoreSize was ultimately causing
> an RS to abort. The tl;dr version is:
> * Something causes memstoreSize to become negative (not sure what is doing
> this yet).
> * All subsequent flushes short-circuit and don't run because they think
> there is no data to flush.
> * The region is eventually closed (commonly, for a move).
> * A final flush is attempted on each store before closing (which also
> short-circuits for the same reason), leaving unflushed data in each store.
> * The sanity check that each store's size is zero fails and the RS aborts.
> I have a little patch which I think should improve our failure case around
> this: it prevents the RS abort safely (by forcing a flush when memstoreSize
> is negative) and logs a call trace when an update to memstoreSize makes it
> negative (to find culprits in the future).
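
For readers following along, the failure sequence and the proposed mitigation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the class MemstoreSizeSketch and all of its method names are hypothetical and do not correspond to actual HBase code. It shows a size counter that records a call trace when an update drives it negative, and a flush check that treats a negative size as "must flush" instead of "nothing to flush".

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a region's memstore accounting; not HBase code. */
class MemstoreSizeSketch {
    private final AtomicLong memstoreSize = new AtomicLong(0);
    private final AtomicInteger negativeUpdates = new AtomicInteger(0);

    /** Applies a delta; if the total goes negative, records who caused it. */
    long addAndGetMemstoreSize(long delta) {
        long newSize = memstoreSize.addAndGet(delta);
        if (newSize < 0) {
            negativeUpdates.incrementAndGet();
            // Print the caller's stack so the culprit can be found later.
            new Exception("memstoreSize became negative: " + newSize
                + " (delta=" + delta + ")").printStackTrace();
        }
        return newSize;
    }

    /** Problem behaviour: a non-positive size is read as "no data to flush". */
    boolean shouldFlushNaive() {
        return memstoreSize.get() > 0;
    }

    /** Mitigation: a negative size means accounting is broken, so flush anyway. */
    boolean shouldFlushGuarded() {
        return memstoreSize.get() != 0;
    }

    int negativeUpdateCount() {
        return negativeUpdates.get();
    }

    public static void main(String[] args) {
        MemstoreSizeSketch region = new MemstoreSizeSketch();
        region.addAndGetMemstoreSize(100);   // a write is accounted for
        region.addAndGetMemstoreSize(-150);  // a bad decrement: size is now -50
        System.out.println("naive flush?   " + region.shouldFlushNaive());
        System.out.println("guarded flush? " + region.shouldFlushGuarded());
    }
}
```

With the naive check, the flush before region close would be skipped and data would remain in the stores, which is the condition that trips the sanity check. The guarded check forces the flush so the close can proceed.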