[ https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584902#comment-13584902 ]

Himanshu Vashishtha commented on HBASE-7507:
--------------------------------------------

This patch looks safe to me (it shouldn't introduce any flakiness). I ran it on Jenkins against the current 0.94 and it was green.

That said, instead of retrying the flush operation, why not just check whether the file system is available in a retrying loop? That should be more efficient. Or have you considered that already?

The other possible candidates in a running cluster that I can see are compaction and log rolling. Compaction can be made to check the file system health in a retrying manner (if people agree, I can upload a patch for that). Log rolling looks a bit tricky because there are two idempotent operations involved: creating a new HLog writer, and closing the existing one. Having a retry loop for these (especially for creating a new HLog file in the .logs directory) doesn't look like a good idea. I would avoid doing that.

Any other opinions?

> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
>                 Key: HBASE-7507
>                 URL: https://issues.apache.org/jira/browse/HBASE-7507
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch,
> 7507-trunkv3.patch
>
>
> We abort the regionserver if a memstore flush throws an exception.
> I think we could retry instead, to make the regionserver more stable, because
> the file system may be unavailable for a short time, e.g. while switching
> namenodes in a NameNode HA environment.
> {code}
> HRegion#internalFlushcache(){
>   ...
>   try {
>     ...
>   }catch(Throwable t){
>     DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
>         Bytes.toStringBinary(getRegionName()));
>     dse.initCause(t);
>     throw dse;
>   }
>   ...
> }
> MemStoreFlusher#flushRegion(){
>   ...
>   try {
>     region.flushcache();
>     ...
>   }catch(DroppedSnapshotException ex){
>     server.abort("Replay of HLog required. Forcing server shutdown", ex);
>   }
>   ...
> }
> {code}
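
For illustration only, the retrying file system check suggested in the comment could be sketched roughly as below. This is not taken from any attached patch. The class `RetryingFsCheck`, the `FsProbe` interface, and the `maxAttempts` and `sleepMillis` parameters are all hypothetical names; `FsProbe` stands in for whatever file system health check the region server would actually call.

```java
import java.io.IOException;

public class RetryingFsCheck {

    /** Hypothetical stand-in for a real file system health check. */
    @FunctionalInterface
    public interface FsProbe {
        void check() throws IOException;
    }

    /**
     * Calls the probe until it succeeds or maxAttempts is used up.
     * Returns true as soon as one check passes, false if every attempt failed.
     */
    public static boolean waitForFileSystem(FsProbe probe, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                probe.check();
                return true; // file system answered, caller may proceed
            } catch (IOException e) {
                // Possibly a transient outage (e.g. namenode failover):
                // pause before the next attempt, but not after the last one.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return false; // still unavailable, caller decides whether to abort
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated probe that fails twice and then succeeds.
        FsProbe flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated namenode failover");
            }
        };
        System.out.println(waitForFileSystem(flaky, 5, 10L)); // prints "true"
    }
}
```

Under this sketch the caller would abort only when `waitForFileSystem` returns false, so a short outage would not force a server shutdown. Whether that is enough for the flush path, where a snapshot may already have been taken, is the open question in this thread.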