[jira] [Comment Edited] (PHOENIX-3964) Index.preWALRestore should handle index write failure

chenglei (JIRA) Fri, 23 Jun 2017 07:55:16 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061051#comment-16061051
 ]


chenglei edited comment on PHOENIX-3964 at 6/23/17 2:54 PM:
------------------------------------------------------------

[[email protected]],thank you for review,

bq. IMO, index regions will come up on all regionserver before data region 
during the race.

I just mean when we write  index regions on other RegionServers, we can not 
guarantee the index regions on other RegionServers is online.

bq. No, it will kill the region server because of KillServerOnFailurePolicy. 
And, We will lose the cached edits and not sure we will get a chance to replay 
them again.

the method name  {{IndexWriter.writeAndKillYourselfOnFailure}} is very 
misleading, maybe we can rename it to 
{{IndexWriter.writeAndHandleExceptionOnFailure}},  actually the default 
IndexFailurePolicy used by IndexWriter is PhoenixIndexFailurePolicy, not 
KillServerOnFailurePolicy, PhoenixIndexFailurePolicy will not kill the 
RegionServer if it successes, and even if it would kill the RegionServer in the 
worst case, the region's {{recovered.edits}} file folder is not deleted,and 
surely  we have a chance to replay them again.

bq.I'm now becoming sceptical about this change, cached WAL replay after 
postOpen could overwrite the new index writes, if there are new overlapped 
writes coming to the data table(which eventually also written to index table) 
because now the region is open and available.(May result in data loss)

Cached WALs replay in postOpen would not overwrite the new index writes,because 
when we builded the index updates cells previously, we set the timestamp of 
cells in PhoenixIndexCodec, so even there are new overlapped writes now, 
because the timestamp, it will not cause data overwrite or loss. 


was (Author: comnetwork):
[[email protected]],thank you for review,

bq. IMO, index regions will come up on all regionserver before data region 
during the race.

I just mean when we write  index regions on other RegionServers, we can not 
guarantee the index regions on other RegionServers is online.

bq. No, it will kill the region server because of KillServerOnFailurePolicy. 
And, We will lose the cached edits and not sure we will get a chance to replay 
them again.

the method name  {{IndexWriter.writeAndKillYourselfOnFailure}} is very 
misleading, maybe we can rename it to 
{{IndexWriter.writeAndHandleExceptionOnFailure}},  actually the default 
IndexFailurePolicy is PhoenixIndexFailurePolicy, not KillServerOnFailurePolicy, 
PhoenixIndexFailurePolicy will not kill the RegionServer if it successes, and 
even if it would kill the RegionServer in the worst case, the region's 
{{recovered.edits}} file folder is not deleted,and surely  we have a chance to 
replay them again.

bq.I'm now becoming sceptical about this change, cached WAL replay after 
postOpen could overwrite the new index writes, if there are new overlapped 
writes coming to the data table(which eventually also written to index table) 
because now the region is open and available.(May result in data loss)

Cached WALs replay in postOpen would not overwrite the new index writes,because 
when we builded the index updates cells previously, we set the timestamp of 
cells in PhoenixIndexCodec, so even there are new overlapped writes now, 
because the timestamp, it will not cause data overwrite or loss. 

> Index.preWALRestore should handle index write failure
> -----------------------------------------------------
>
>                 Key: PHOENIX-3964
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3964
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.10.0
>            Reporter: chenglei
>         Attachments: PHOENIX-3964_v1.patch
>
>
> When I restarted my hbase cluster a certain time, I noticed some regions are 
> in FAILED_OPEN state and the RegionServers have some error logs as following:
> {code:java}
> 2017-06-20 12:31:30,493 ERROR [RS_OPEN_REGION-rsync:60020-3] 
> handler.OpenRegionHandler: Failed open of 
> region=BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,0100134e-7ddf-4d13-ab77-6f0d683e5fad_0,1487594358223.57a7be72f9beaeb4285529ac6236f4e5.,
>  starting to roll back the global memstore size.
> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: 
> Failed to write to multiple index tables
>         at 
> org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:221)
>         at 
> org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:185)
>         at 
> org.apache.phoenix.hbase.index.write.RecoveryIndexWriter.write(RecoveryIndexWriter.java:75)
>         at 
> org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:554)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1312)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1517)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1592)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1549)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1308)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3338)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3220)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:823)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:716)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:687)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4596)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4566)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4538)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4494)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4445)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:465)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:139)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> When I look the code of Index.preWALRestore method, I find 
> RecoveryIndexWriter.write method is used to write the indexUpdates in 
> following line 543: 
>    
> {code:java}
>  
> 526  public void preWALRestore(ObserverContext<RegionCoprocessorEnvironment> 
> env, HRegionInfo info,
> 527      HLogKey logKey, WALEdit logEdit) throws IOException {
> 528      if (this.disabled) {
> 529          super.preWALRestore(env, info, logKey, logEdit);
> 530          return;
> 531        }
> 532    // TODO check the regions in transition. If the server on which the 
> region lives is this one,
> 533    // then we should rety that write later in postOpen.
> 534    // we might be able to get even smarter here and pre-split the edits 
> that are server-local
> 535    // into their own recovered.edits file. This then lets us do a 
> straightforward recovery of each
> 536    // region (and more efficiently as we aren't writing quite as 
> hectically from this one place).
> 537
> 538    /*
> 539     * Basically, we let the index regions recover for a little while long 
> before retrying in the
> 540     * hopes they come up before the primary table finishes.
> 541     */
> 542    Collection<Pair<Mutation, byte[]>> indexUpdates = 
> extractIndexUpdate(logEdit);
> 543    recoveryWriter.write(indexUpdates, true);
> 544  }
> {code}
> but the RecoveryIndexWriter.write method is as following, it directly throws 
> Exception except non-existing tables, so RecoveryIndexWriter's 
> failurePolicy(which is StoreFailuresInCachePolicy by default) even has no 
> opportunity to be used,  and it leads to Index.failedIndexEdits which is 
> filled by the StoreFailuresInCachePolicy is always empty.
> {code:java}
>  public void write(Collection<Pair<Mutation, byte[]>> toWrite, boolean 
> allowLocalUpdates) throws IOException {
>         try {
>             write(resolveTableReferences(toWrite), allowLocalUpdates);
>         } catch (MultiIndexWriteFailureException e) {
>             for (HTableInterfaceReference table : e.getFailedTables()) {
>                 if (!admin.tableExists(table.getTableName())) {
>                     LOG.warn("Failure due to non existing table: " + 
> table.getTableName());
>                     nonExistingTablesList.add(table);
>                 } else {
>                     throw e;
>                 }
>             }
>         }
>     }
> {code}
> So the Index.postOpen method seems useless, because the updates Multimap in 
> following 500 line which is got from Index.failedIndexEdits is always empty.
> {code:java}
> 499  public void postOpen(final ObserverContext<RegionCoprocessorEnvironment> 
> c) {
> 500     Multimap<HTableInterfaceReference, Mutation> updates = 
> failedIndexEdits.getEdits(c.getEnvironment().getRegion());
> 501     
> 502     if (this.disabled) {
> 503         super.postOpen(c);
> 504         return;
> 505      }
> 506   
> 507     //if we have no pending edits to complete, then we are done
> 508     if (updates == null || updates.size() == 0) {
> 509       return;
> 510      }
> 511
> 512     LOG.info("Found some outstanding index updates that didn't succeed 
> during"
> 513            + " WAL replay - attempting to replay now.");
> 514    
> 515     // do the usual writer stuff, killing the server again, if we can't 
> manage to make the index
> 516     // writes succeed again
> 517     try {
> 518        writer.writeAndKillYourselfOnFailure(updates, true);
> 519     } catch (IOException e) {
> 520              LOG.error("During WAL replay of outstanding index updates, "
> 521                    + "Exception is thrown instead of killing server 
> during index writing", e);
> 522    }
> 523  }
> {code}
> So  I think in Index.preWALRestore method,  we should use 
> RecoveryWriter.writeAndKillYourselfOnFailure method to write the indexUpdates 
> and handle index write failure, not just use the RecoveryIndexWriter.write 
> method.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Comment Edited] (PHOENIX-3964) Index.preWALRestore should handle index write failure

Reply via email to