[jira] [Commented] (PHOENIX-3964) Index.preWALRestore should handle index write failure

Hudson (JIRA) Fri, 30 Jun 2017 04:15:03 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069927#comment-16069927
 ]


Hudson commented on PHOENIX-3964:
---------------------------------

FAILURE: Integrated in Jenkins build Phoenix-master #1671 (See 
[https://builds.apache.org/job/Phoenix-master/1671/])
PHOENIX-3964 Index.preWALRestore should handle index write failure (chenglei: 
rev 5c43b347a4cd2c3c38b28bee2983e716460e50db)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/Indexer.java


> Index.preWALRestore should handle index write failure
> -----------------------------------------------------
>
>                 Key: PHOENIX-3964
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3964
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.10.0
>            Reporter: chenglei
>            Assignee: chenglei
>             Fix For: 4.12.0, 4.11.1
>
>         Attachments: PHOENIX-3964_v1.patch
>
>
> When I restarted my hbase cluster a certain time, I noticed some regions are 
> in FAILED_OPEN state and the RegionServers have some error logs as following:
> {code:java}
> 2017-06-20 12:31:30,493 ERROR [RS_OPEN_REGION-rsync:60020-3] 
> handler.OpenRegionHandler: Failed open of 
> region=BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,0100134e-7ddf-4d13-ab77-6f0d683e5fad_0,1487594358223.57a7be72f9beaeb4285529ac6236f4e5.,
>  starting to roll back the global memstore size.
> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: 
> Failed to write to multiple index tables
>         at 
> org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:221)
>         at 
> org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:185)
>         at 
> org.apache.phoenix.hbase.index.write.RecoveryIndexWriter.write(RecoveryIndexWriter.java:75)
>         at 
> org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:554)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1312)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1517)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1592)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1549)
>         at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1308)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3338)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3220)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:823)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:716)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:687)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4596)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4566)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4538)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4494)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4445)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:465)
>         at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:139)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> When I look the code of Index.preWALRestore method, I find 
> RecoveryIndexWriter.write method is used to write the indexUpdates in 
> following line 543: 
>    
> {code:java}
>  
> 526  public void preWALRestore(ObserverContext<RegionCoprocessorEnvironment> 
> env, HRegionInfo info,
> 527      HLogKey logKey, WALEdit logEdit) throws IOException {
> 528      if (this.disabled) {
> 529          super.preWALRestore(env, info, logKey, logEdit);
> 530          return;
> 531        }
> 532    // TODO check the regions in transition. If the server on which the 
> region lives is this one,
> 533    // then we should rety that write later in postOpen.
> 534    // we might be able to get even smarter here and pre-split the edits 
> that are server-local
> 535    // into their own recovered.edits file. This then lets us do a 
> straightforward recovery of each
> 536    // region (and more efficiently as we aren't writing quite as 
> hectically from this one place).
> 537
> 538    /*
> 539     * Basically, we let the index regions recover for a little while long 
> before retrying in the
> 540     * hopes they come up before the primary table finishes.
> 541     */
> 542    Collection<Pair<Mutation, byte[]>> indexUpdates = 
> extractIndexUpdate(logEdit);
> 543    recoveryWriter.write(indexUpdates, true);
> 544  }
> {code}
> but the RecoveryIndexWriter.write method is as following, it directly throws 
> Exception except non-existing tables, so RecoveryIndexWriter's 
> failurePolicy(which is StoreFailuresInCachePolicy by default) even has no 
> opportunity to be used,  and it leads to Index.failedIndexEdits which is 
> filled by the StoreFailuresInCachePolicy is always empty.
> {code:java}
>  public void write(Collection<Pair<Mutation, byte[]>> toWrite, boolean 
> allowLocalUpdates) throws IOException {
>         try {
>             write(resolveTableReferences(toWrite), allowLocalUpdates);
>         } catch (MultiIndexWriteFailureException e) {
>             for (HTableInterfaceReference table : e.getFailedTables()) {
>                 if (!admin.tableExists(table.getTableName())) {
>                     LOG.warn("Failure due to non existing table: " + 
> table.getTableName());
>                     nonExistingTablesList.add(table);
>                 } else {
>                     throw e;
>                 }
>             }
>         }
>     }
> {code}
> So the Index.postOpen method seems useless, because the updates Multimap in 
> following 500 line which is got from Index.failedIndexEdits is always empty.
> {code:java}
> 499  public void postOpen(final ObserverContext<RegionCoprocessorEnvironment> 
> c) {
> 500     Multimap<HTableInterfaceReference, Mutation> updates = 
> failedIndexEdits.getEdits(c.getEnvironment().getRegion());
> 501     
> 502     if (this.disabled) {
> 503         super.postOpen(c);
> 504         return;
> 505      }
> 506   
> 507     //if we have no pending edits to complete, then we are done
> 508     if (updates == null || updates.size() == 0) {
> 509       return;
> 510      }
> 511
> 512     LOG.info("Found some outstanding index updates that didn't succeed 
> during"
> 513            + " WAL replay - attempting to replay now.");
> 514    
> 515     // do the usual writer stuff, killing the server again, if we can't 
> manage to make the index
> 516     // writes succeed again
> 517     try {
> 518        writer.writeAndKillYourselfOnFailure(updates, true);
> 519     } catch (IOException e) {
> 520              LOG.error("During WAL replay of outstanding index updates, "
> 521                    + "Exception is thrown instead of killing server 
> during index writing", e);
> 522    }
> 523  }
> {code}
> So  I think in Index.preWALRestore method,  we should use 
> RecoveryWriter.writeAndKillYourselfOnFailure method to write the indexUpdates 
> and handle index write failure, not just use the RecoveryIndexWriter.write 
> method.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (PHOENIX-3964) Index.preWALRestore should handle index write failure

Reply via email to