[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367958#comment-16367958 ]
stack commented on HBASE-20006:
-------------------------------

Ok, this is some read replica mess. I don't want to spend time figuring out the file naming done for read replicas. Will leave it to a read replicas person, if any are around. And I don't want this messing up our test runs, so for now I am disabling this test.

Other exceptions seen are:

{code}
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /user/jenkins/test-data/463e63dc-23bb-44ff-a32c-033c390552a6/data/default/testRestoreSnapshotAfterSplittingRegions-1518810548820/1c8eb80ac0831f0f27074b953eb647bb/cf/testRestoreSnapshotAfterSplittingRegions-1518810548820=1c8eb80ac0831f0f27074b953eb647bb-bfe5320da17b47e4b1553a14bacbc532
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1836)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1808)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1723)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:366)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1040)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:903)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:871)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7017)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6974)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6945)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6901)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6852)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

...and then this makes for failed assigns.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
> ---------------------------------------------------------
>
>                 Key: HBASE-20006
>                 URL: https://issues.apache.org/jira/browse/HBASE-20006
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20006.branch-2.001.patch
>
>
> Failing 10% of the time. Interestingly, it is the log below that causes the
> failure. We go to split but the region is already split. We then fail the
> split with an internal assert, which messes up procedures; at a minimum we
> should just not split (this is in the prepare stage).
> {code}
> 2018-02-15 23:21:42,162 INFO [PEWorker-12] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd, table=testOnlineSnapshotAfterSplittingRegions-1518736887838, testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.
> 2018-02-15 23:21:42,162 INFO [PEWorker-12] assignment.SplitTableRegionProcedure(440): Split of {ENCODED => 3f850cea7d71a7ebd019f2f009efca4d, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
> 2018-02-15 23:21:42,163 ERROR [PEWorker-12] procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception: pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd
> java.lang.AssertionError: split region should have an exception here
>     at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:228)
>     at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
> {code}
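
For illustration only, here is a minimal sketch of the "at a minimum we should just not split" idea from the description: a prepare step that reports "skip" for a region that is already SPLIT rather than tripping an assertion. This is plain Java, not HBase code. `SplitPrepareSketch`, `RegionState`, `PrepareResult`, and `prepareSplit` are hypothetical names made up for this sketch, and it makes no claim about how `SplitTableRegionProcedure` is or should be implemented.

```java
public class SplitPrepareSketch {

    /** Hypothetical region states for this sketch; not HBase's real state class. */
    public enum RegionState { OPEN, SPLITTING, SPLIT }

    /** Outcome of the prepare step: either proceed, or skip with a reason. */
    public static final class PrepareResult {
        public final boolean proceed;
        public final String reason;

        private PrepareResult(boolean proceed, String reason) {
            this.proceed = proceed;
            this.reason = reason;
        }

        static PrepareResult ok() {
            return new PrepareResult(true, "");
        }

        static PrepareResult skip(String reason) {
            return new PrepareResult(false, reason);
        }
    }

    /**
     * Prepare-stage check. A region that is not OPEN (for example, one that
     * is already SPLIT) yields a skip result; nothing is asserted or thrown.
     */
    public static PrepareResult prepareSplit(RegionState state) {
        if (state == RegionState.SPLIT) {
            return PrepareResult.skip("state is already SPLIT");
        }
        if (state != RegionState.OPEN) {
            return PrepareResult.skip("region is not OPEN: " + state);
        }
        return PrepareResult.ok();
    }

    public static void main(String[] args) {
        // Walk every state and show whether the sketch would split or skip.
        for (RegionState s : RegionState.values()) {
            PrepareResult r = prepareSplit(s);
            System.out.println(s + " -> " + (r.proceed ? "split" : "skip (" + r.reason + ")"));
        }
    }
}
```

The point of returning a result object is that the caller can end the operation cleanly on "skip" instead of surfacing an uncaught runtime error from the worker thread.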