[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367611#comment-16367611 ]
stack commented on HBASE-20006:
-------------------------------

With the patch in place, we make more progress. We now get the output below:

2018-02-16 14:35:01,027 INFO [PEWorker-15] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518791689780, parent=034c0b19e0cdb4c5788c2d4172fd16d9, daughterA=b5355f606c3f6dae55367b082065b41c, daughterB=094cf44c1d0b3a294f42d2017fd99907, table=testOnlineSnapshotAfterSplittingRegions-1518791689780, testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.
2018-02-16 14:35:01,027 INFO [PEWorker-15] assignment.SplitTableRegionProcedure(439): Split of {ENCODED => 034c0b19e0cdb4c5788c2d4172fd16d9, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT

... but rather than failing we then move on to...

2018-02-16 14:35:01,031 INFO [PEWorker-15] procedure2.ProcedureExecutor(1249): Finished pid=105, state=SUCCESS; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518791689780, parent=034c0b19e0cdb4c5788c2d4172fd16d9, daughterA=b5355f606c3f6dae55367b082065b41c, daughterB=094cf44c1d0b3a294f42d2017fd99907 in 1.0180sec

... which is good in this case at least. Now I'm on to a new failure type....

Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://localhost:55231/user/jenkins/test-data/fe7360bf-946e-44d4-8682-120eae0b7055/data/default/testOnlineSnapshotAfterSplittingRegions-1518791702651/1dda732469ff033fa21cc271586a80b5/cf/testOnlineSnapshotAfterSplittingRegions-1518791689780=034c0b19e0cdb4c5788c2d4172fd16d9-395104433d8d43e7b6710b6ec44d5b85.3cc16fba4ef7fb478d3eb1626a24a661
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:579)
    at org.apache.hadoop.hbase.regionserver.StoreFileReader.<init>(StoreFileReader.java:104)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:108)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:267)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:352)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:460)
    at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:668)
    at org.apache.hadoop.hbase.regionserver.HStore.lambda$openStoreFiles$0(HStore.java:535)
    ... 6 more
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:244)
    at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:401)
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:532)
    ... 14 more

The file name is crazy.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
> ---------------------------------------------------------
>
>                 Key: HBASE-20006
>                 URL: https://issues.apache.org/jira/browse/HBASE-20006
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-20006.branch-2.001.patch
>
>
> Failing 10% of the time. Interestingly, it is below that causes fail. We go
> to split but it is already split. We will then fail the split with an
> internal assert which messes up procedures; at a minimum we should just not
> split (this is in the prepare stage).
> {code}
> 2018-02-15 23:21:42,162 INFO [PEWorker-12] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd, table=testOnlineSnapshotAfterSplittingRegions-1518736887838, testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.
> 2018-02-15 23:21:42,162 INFO [PEWorker-12] assignment.SplitTableRegionProcedure(440): Split of {ENCODED => 3f850cea7d71a7ebd019f2f009efca4d, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
> 2018-02-15 23:21:42,163 ERROR [PEWorker-12] procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception: pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd
> java.lang.AssertionError: split region should have an exception here
>     at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:228)
>     at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
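
A note on the store file name in the CorruptHFileException, in case it helps whoever picks this up. It appears to follow the layout <table>=<region>-<hfile>.<region>, which would be consistent with a reference file (the part after the dot) wrapped around a link to an hfile in another table (the part before the dot). That reading is an assumption, not something confirmed in this issue. The sketch below just takes the name apart with a regex; the class name, field names, and the pattern itself are made up for illustration and are not HBase API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoreFileNameSketch {
    // Assumed layout (hypothetical, inferred from the name in the trace):
    //   <table>=<32-hex region>-<32-hex hfile>.<32-hex region>
    private static final Pattern REF_TO_LINK = Pattern.compile(
        "^(.+)=([0-9a-f]{32})-([0-9a-f]{32})\\.([0-9a-f]{32})$");

    /** The four pieces of a name that matches the assumed layout. */
    public static final class Parts {
        public final String table;            // table the link points into
        public final String linkedRegion;     // region that holds the hfile
        public final String hfile;            // hfile name
        public final String referencedRegion; // region named after the dot

        Parts(String table, String linkedRegion, String hfile, String referencedRegion) {
            this.table = table;
            this.linkedRegion = linkedRegion;
            this.hfile = hfile;
            this.referencedRegion = referencedRegion;
        }
    }

    /** Returns the parsed parts, or null if the name does not match the assumed layout. */
    public static Parts parse(String name) {
        Matcher m = REF_TO_LINK.matcher(name);
        if (!m.matches()) {
            return null;
        }
        return new Parts(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        // File name copied from the CorruptHFileException in the comment above.
        String name = "testOnlineSnapshotAfterSplittingRegions-1518791689780"
            + "=034c0b19e0cdb4c5788c2d4172fd16d9"
            + "-395104433d8d43e7b6710b6ec44d5b85"
            + ".3cc16fba4ef7fb478d3eb1626a24a661";
        Parts p = parse(name);
        System.out.println("table            = " + p.table);
        System.out.println("linkedRegion     = " + p.linkedRegion);
        System.out.println("hfile            = " + p.hfile);
        System.out.println("referencedRegion = " + p.referencedRegion);
    }
}
```

Read this way, the linked region is 034c0b19e0cdb4c5788c2d4172fd16d9, the same region logged as parent= in the split output, and the table is the original ...-1518791689780 rather than the ...-1518791702651 table whose directory the file sits in.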