[ 
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585985#comment-13585985
 ] 

stack commented on HBASE-7507:
------------------------------

Regarding the 0.94 patch, why are these static?

+  private static final int DEFAULT_FLUSH_RETRIES_NUMBER = 10;
+  private static int flush_retries_number;
+  private static int pauseTime;
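
To make the question concrete, here is a minimal sketch of the non-static alternative. It is not code from the patch: the class name `FlushRetryConfig`, both property keys, and the pause default are hypothetical, and `java.util.Properties` stands in for the real configuration object.

```java
import java.util.Properties;

// Sketch only: FlushRetryConfig and both property keys are hypothetical,
// not names taken from the patch.
public class FlushRetryConfig {
    private static final int DEFAULT_FLUSH_RETRIES_NUMBER = 10;
    private static final int DEFAULT_PAUSE_MILLIS = 1000; // assumed default

    // Instance fields: each owner reads its values from its own configuration,
    // so two instances in one JVM cannot overwrite each other's settings.
    private final int flushRetriesNumber;
    private final int pauseTime;

    public FlushRetryConfig(Properties conf) {
        this.flushRetriesNumber = Integer.parseInt(conf.getProperty(
            "example.flush.retries.number",
            String.valueOf(DEFAULT_FLUSH_RETRIES_NUMBER)));
        this.pauseTime = Integer.parseInt(conf.getProperty(
            "example.flush.pause.millis",
            String.valueOf(DEFAULT_PAUSE_MILLIS)));
    }

    public int getFlushRetriesNumber() {
        return flushRetriesNumber;
    }

    public int getPauseTime() {
        return pauseTime;
    }
}
```

With static fields, the last object constructed would decide the values for every other instance in the same JVM, which matters in tests that start more than one server in a single process.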


Patch looks fine.  It is a pity, as said already, that we have to localize the 
retry rather than put all the retries behind a filesystem facade; we can do 
that for 0.96....
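
A rough sketch of what such a facade might look like, to illustrate the idea only. `RetryingFs` and `IoCall` are hypothetical names, not HBase or HDFS APIs, and the sketch retries on every IOException without judging whether that is safe for a given operation.

```java
import java.io.IOException;

// Sketch only: RetryingFs and IoCall are hypothetical names,
// not HBase or HDFS APIs.
public class RetryingFs {
    /** A filesystem operation that may fail with an IOException. */
    public interface IoCall<T> {
        T call() throws IOException;
    }

    private final int maxRetries;
    private final long pauseMillis;

    public RetryingFs(int maxRetries, long pauseMillis) {
        if (maxRetries < 0 || pauseMillis < 0) {
            throw new IllegalArgumentException("maxRetries and pauseMillis must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.pauseMillis = pauseMillis;
    }

    /**
     * Runs op once, then retries up to maxRetries more times after an
     * IOException, sleeping pauseMillis between attempts. Rethrows the
     * last IOException when all attempts fail.
     */
    public <T> T withRetries(IoCall<T> op) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxRetries) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IOException("Interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

Callers would wrap each filesystem call, for example `fs.withRetries(() -> writeFile(path))` with a hypothetical `writeFile`, so the retry policy lives in one place instead of being repeated in the flush path.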

I wonder what happens when we retry after an IOE.  Is it OK to retry on any 
IOE?  Has the flush path been reviewed to make sure the only IOE it can throw 
is a failed NN op?  Is it possible to get an IOE after the file has been 
successfully written?

Just wondering.  I would say commit, because it helps us work with HA NN (HANN).
                
> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
>                 Key: HBASE-7507
>                 URL: https://issues.apache.org/jira/browse/HBASE-7507
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch, 
> 7507-trunkv3.patch
>
>
> We abort the regionserver if a memstore flush throws an exception.
> I think we could retry to make the regionserver more stable, because the file 
> system may be unavailable for a short time, e.g. while switching namenodes in 
> a NameNode HA environment.
> {code}
> HRegion#internalFlushcache(){
>   ...
>   try {
>     ...
>   } catch (Throwable t) {
>     DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
>         Bytes.toStringBinary(getRegionName()));
>     dse.initCause(t);
>     throw dse;
>   }
>   ...
> }
> MemStoreFlusher#flushRegion(){
>   ...
>   try {
>     region.flushcache();
>     ...
>   } catch (DroppedSnapshotException ex) {
>     server.abort("Replay of HLog required. Forcing server shutdown", ex);
>   }
>   ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
