[ 
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584902#comment-13584902
 ] 

Himanshu Vashishtha commented on HBASE-7507:
--------------------------------------------

This patch looks safe to me (it shouldn't introduce any flakiness as such). I
ran it on Jenkins against the current 0.94 and it was green. That said, instead
of retrying the flush operation, why not just check, in a retrying mode,
whether the file system is available? That should be more efficient. Or have
you considered that already?
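
A minimal sketch of the "check the file system in a retrying mode" idea, as
opposed to retrying the whole flush. This is an illustration only and not code
from the patch: the FsProbe interface, the waitForFileSystem method, and the
attempt/sleep parameters are all hypothetical stand-ins for whatever health
check and configuration the region server would actually use.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class FsAvailabilityRetry {

    /** Hypothetical stand-in for a real file system health check. */
    public interface FsProbe {
        // Throws IOException if the file system is currently unreachable.
        void check() throws IOException;
    }

    /**
     * Polls the probe until it succeeds or maxAttempts probes have failed.
     * Returns true if the file system became available, false otherwise.
     */
    public static boolean waitForFileSystem(FsProbe probe, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                probe.check();
                return true; // file system is reachable, caller may proceed
            } catch (IOException e) {
                // Possibly transient (e.g. a namenode switch): back off, then re-check.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return false; // still unavailable, caller decides whether to abort
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated transient outage: the first two probes fail.
        FsProbe flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("namenode failover in progress");
            }
        };
        boolean ok = waitForFileSystem(flaky, 5, 10L);
        System.out.println("available=" + ok + " after " + calls.get() + " probes");
    }
}
```

With a check like this, the caller would only abort once the probe has failed
for the whole retry window, and the expensive operation itself is not repeated
while the file system is known to be down.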

The other candidates I can see in a running cluster are compaction and log
rolling. The former can be made to check the file system health in a retrying
manner (if people agree, I can upload a patch for that).
Log rolling looks a bit trickier because there are two idempotent operations
involved: creating a new HLog writer and closing the existing one. Having a
retry loop around these (especially around creating a new HLog file in the
.logs directory) doesn't look like a good idea, so I would avoid doing that.
More opinions are welcome.
                
> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
>                 Key: HBASE-7507
>                 URL: https://issues.apache.org/jira/browse/HBASE-7507
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch, 
> 7507-trunkv3.patch
>
>
> We abort the regionserver if a memstore flush throws an exception.
> I think we could retry instead, to make the regionserver more stable, because
> the file system may be unavailable for a transient period, e.g. while
> switching namenodes in a NameNode HA environment.
> {code}
> HRegion#internalFlushcache(){
>   ...
>   try {
>     ...
>   } catch (Throwable t) {
>     // Any failure during the flush is wrapped so that callers can detect it.
>     DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
>         Bytes.toStringBinary(getRegionName()));
>     dse.initCause(t);
>     throw dse;
>   }
>   ...
> }
> MemStoreFlusher#flushRegion(){
>   ...
>   try {
>     region.flushcache();
>     ...
>   } catch (DroppedSnapshotException ex) {
>     // Current behavior: abort the regionserver rather than retry.
>     server.abort("Replay of HLog required. Forcing server shutdown", ex);
>   }
>   ...
> }
> {code}

