[ https://issues.apache.org/jira/browse/HBASE-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287248#comment-15287248 ]

Josh Elser commented on HBASE-15837:
------------------------------------

bq. I was also looking at this to understand what may have happened. The 
memstore size discrepancy happens when the index update fails for Phoenix. I 
also have a patch that should fix the root cause. 

Great findings! Your patch makes sense to me.

> More gracefully handle a negative memstoreSize
> ----------------------------------------------
>
>                 Key: HBASE-15837
>                 URL: https://issues.apache.org/jira/browse/HBASE-15837
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15837.001.patch, 
> hbase-memstore-size-accounting.patch
>
>
> Over in PHOENIX-2883, I've been trying to figure out how to track down the 
> root cause of an issue we were seeing where a negative memstoreSize was 
> ultimately causing an RS to abort. The tl;dr version is
> * Something causes memstoreSize to be negative (not sure what is doing this 
> yet)
> * All subsequent flushes short-circuit and don't run because they think there 
> is no data to flush
> * The region is eventually closed (commonly, for a move).
> * A final flush is attempted on each store before closing (which also 
> short-circuits for the same reason), leaving unflushed data in each store.
> * The sanity check that each store's size is zero fails and the RS aborts.
> I have a little patch which I think should improve our failure case around 
> this, safely preventing the RS abort (by forcing a flush when memstoreSize 
> is negative) and logging a call trace when an update to memstoreSize makes 
> it negative (to find culprits in the future).
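For illustration only, here is a minimal standalone sketch of the two safeguards described above. The class and method names (MemstoreAccounting, addAndGetMemstoreSize, shouldForceFlush) are hypothetical and not actual HBase APIs; this is just a model of the idea, not the attached patch:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative sketch only -- MemstoreAccounting is a hypothetical class,
 * not an HBase API. It models the two safeguards described above: logging
 * a call trace when an update drives the size negative, and signalling
 * that a close-time flush must not short-circuit.
 */
public class MemstoreAccounting {

  private final AtomicLong memstoreSize = new AtomicLong();

  /** Apply a size delta; log a call trace if the result goes negative. */
  public long addAndGetMemstoreSize(long delta) {
    long newSize = memstoreSize.addAndGet(delta);
    if (newSize < 0) {
      // Capture the call site so future culprits can be identified.
      new Exception("memstoreSize became negative: " + newSize)
          .printStackTrace();
    }
    return newSize;
  }

  /**
   * A flush before close should not be skipped just because the counter
   * claims there is nothing to flush; a negative size means the counter
   * is wrong and the stores may still hold unflushed data.
   */
  public boolean shouldForceFlush() {
    return memstoreSize.get() < 0;
  }

  public static void main(String[] args) {
    MemstoreAccounting acct = new MemstoreAccounting();
    acct.addAndGetMemstoreSize(1024);   // normal write
    acct.addAndGetMemstoreSize(-2048);  // buggy rollback double-subtracts
    System.out.println("force flush on close? " + acct.shouldForceFlush());
  }
}
{code}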


