[ 
https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367958#comment-16367958
 ] 

stack commented on HBASE-20006:
-------------------------------

Ok, this is some read replica mess. I don't want to work on this figuring out 
this filenaming done for read replicas. Will let it to a read replicas person 
-- if any around. And I don't want this messing up our test runs. So for now 
disabling this test.

Other exceptions seen are:

{code}
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File 
does not exist: 
/user/jenkins/test-data/463e63dc-23bb-44ff-a32c-033c390552a6/data/default/testRestoreSnapshotAfterSplittingRegions-1518810548820/1c8eb80ac0831f0f27074b953eb647bb/cf/testRestoreSnapshotAfterSplittingRegions-1518810548820=1c8eb80ac0831f0f27074b953eb647bb-bfe5320da17b47e4b1553a14bacbc532
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1836)
  at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1808)
  at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1723)
  at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
  at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:366)
  at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)

  at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1040)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:903)
  at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:871)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7017)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6974)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6945)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6901)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6852)
  at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
  at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
{code}

.. then this makes for failed assigns.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
> ---------------------------------------------------------
>
>                 Key: HBASE-20006
>                 URL: https://issues.apache.org/jira/browse/HBASE-20006
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20006.branch-2.001.patch
>
>
> Failing 10% of the time. Interestingly, it is below that causes fail. We go 
> to split but it is already split. We will then fail the split with an 
> internal assert which messes up procedures; at a minimum we should just not 
> split (this is in the prepare stage).
> {code}
> 2018-02-15 23:21:42,162 INFO  [PEWorker-12] 
> procedure.MasterProcedureScheduler(571): pid=105, 
> state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure 
> table=testOnlineSnapshotAfterSplittingRegions-1518736887838, 
> parent=3f850cea7d71a7ebd019f2f009efca4d, 
> daughterA=06b5e6366efbef155d70e56cfdf58dc9, 
> daughterB=8c175de1b33765a5683ac1e502edb0bd, 
> table=testOnlineSnapshotAfterSplittingRegions-1518736887838, 
> testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.
> 2018-02-15 23:21:42,162 INFO  [PEWorker-12] 
> assignment.SplitTableRegionProcedure(440): Split of {ENCODED => 
> 3f850cea7d71a7ebd019f2f009efca4d, NAME => 
> 'testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.',
>  STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
> 2018-02-15 23:21:42,163 ERROR [PEWorker-12] 
> procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception: 
> pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure 
> table=testOnlineSnapshotAfterSplittingRegions-1518736887838, 
> parent=3f850cea7d71a7ebd019f2f009efca4d, 
> daughterA=06b5e6366efbef155d70e56cfdf58dc9, 
> daughterB=8c175de1b33765a5683ac1e502edb0bd
> java.lang.AssertionError: split region should have an exception here
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:228)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to