[ 
https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218554#comment-15218554
 ] 

Anoop Sam John commented on HBASE-15436:
----------------------------------------

Thanks Nicholas
bq. iirc, a long time ago, the buffer was attached to the Table object, so the 
policy (or at least the objective :-)) when one of the puts had failed (i.e. 
reached the max retry number) was simple: all the operations currently in the 
buffer were considered as failed as well, even if we had not even tried to send 
them. As a consequence the buffer was empty after the failure of a single put. 
It was then up to the client to continue or not. Maybe we should do the same 
with the buffered mutator, for all cases, close or not? I haven't looked at 
the bufferedMutator code, but I can have a look if you wish [~anoop.hbase].

Both BufferedMutator and the normal Table use the same AsyncProcess path. I 
don't remember our old approach of failing everything once one operation had 
failed (after max retries).
Also, I feel we need to add a closed check in the retry loop. Suppose the user 
has somehow called close on the BufferedMutator. Yes, it has to be a graceful 
close, but the user should not have to wait minutes for it. If we are in the 
middle of an attempt and it fails, then at least before the next retry we need 
to check the close flag.
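
To make the idea concrete, here is a rough sketch of a retry loop that looks at a closed flag before every attempt. This is not the AsyncProcess code; the class {{ClosableRetrier}}, its fields and the method {{callWithRetries}} are hypothetical names used only for illustration, and the real change would have to fit the existing retry logic.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration only; not taken from the HBase client. */
final class ClosableRetrier {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final int maxRetries;
    private final long pauseMillis;

    ClosableRetrier(int maxRetries, long pauseMillis) {
        this.maxRetries = maxRetries;
        this.pauseMillis = pauseMillis;
    }

    /** Marks the retrier as closed; a running loop sees this before its next attempt. */
    void close() {
        closed.set(true);
    }

    /** Runs the attempt, retrying on failure until it succeeds, retries run out, or close() is called. */
    <T> T callWithRetries(Callable<T> attempt) throws Exception {
        Exception last = null;
        for (int tries = 0; tries <= maxRetries; tries++) {
            // Check the flag before every attempt, including each retry.
            if (closed.get()) {
                throw new IllegalStateException(
                    "closed after " + tries + " attempt(s)", last);
            }
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e; // remember the failure and fall through to the next retry
            }
            if (tries < maxRetries) {
                Thread.sleep(pauseMillis); // pause between attempts
            }
        }
        throw new IOException("failed after " + (maxRetries + 1) + " attempts", last);
    }
}
```

With something along these lines, a close that arrives while an attempt is failing costs the caller at most one more pause instead of the full retry budget. A real version would probably also want to cut the pause short on close.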


> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>
>                 Key: HBASE-15436
>                 URL: https://issues.apache.org/jira/browse/HBASE-15436
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.2
>            Reporter: Sangjin Lee
>         Attachments: hbaseException.log, threaddump.log
>
>
> We noticed an instance where the thread that was executing a flush 
> ({{BufferedMutatorImpl.flush()}}) got stuck when the (local one-node) cluster 
> shut down and was unable to get out of that stuck state.
> The setup is a single node HBase cluster, and apparently the cluster went 
> away when the client was executing flush. The flush eventually logged a 
> failure after 30+ minutes of retrying. That is understandable.
> What is unexpected is that thread is stuck in this state (i.e. in the 
> {{flush()}} call). I would have expected the {{flush()}} call to return after 
> the complete failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
