[ 
https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367611#comment-16367611
 ] 

stack commented on HBASE-20006:
-------------------------------

With the patch in place, we make more progress. We now log the below:

2018-02-16 14:35:01,027 INFO  [PEWorker-15] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518791689780, parent=034c0b19e0cdb4c5788c2d4172fd16d9, daughterA=b5355f606c3f6dae55367b082065b41c, daughterB=094cf44c1d0b3a294f42d2017fd99907, table=testOnlineSnapshotAfterSplittingRegions-1518791689780, testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.
2018-02-16 14:35:01,027 INFO  [PEWorker-15] assignment.SplitTableRegionProcedure(439): Split of {ENCODED => 034c0b19e0cdb4c5788c2d4172fd16d9, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT

... but rather than failing we then move on to...

2018-02-16 14:35:01,031 INFO  [PEWorker-15] procedure2.ProcedureExecutor(1249): Finished pid=105, state=SUCCESS; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518791689780, parent=034c0b19e0cdb4c5788c2d4172fd16d9, daughterA=b5355f606c3f6dae55367b082065b41c, daughterB=094cf44c1d0b3a294f42d2017fd99907 in 1.0180sec

... which is good in this case at least.
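
A minimal sketch of the behavior the two log excerpts above suggest: the prepare step sees that the parent is already SPLIT, logs that the split is skipped, and the procedure then finishes cleanly instead of tripping an assertion. Every class, enum, and method name below is a hypothetical stand-in; this is not the actual SplitTableRegionProcedure code nor the contents of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPrepareSketch {
    // Hypothetical, simplified region states (not the real HBase enum).
    enum RegionState { OPEN, SPLITTING, SPLIT }

    // Hypothetical outcome of the prepare step.
    enum Outcome { CONTINUE_SPLIT, NOTHING_TO_DO }

    // Collected log messages, so the sketch can be inspected without a logger.
    static final List<String> LOG = new ArrayList<>();

    // Returns false when there is nothing to split; no exception is recorded.
    static boolean prepareSplit(String encodedRegionName, RegionState state) {
        if (state == RegionState.SPLIT) {
            LOG.add("Split of " + encodedRegionName + " skipped; state is already SPLIT");
            return false;
        }
        return true;
    }

    // The prepare step treats "already split" as a successful no-op rather
    // than asserting that a failure exception must have been set.
    static Outcome executePrepareStep(String encodedRegionName, RegionState state) {
        if (prepareSplit(encodedRegionName, state)) {
            return Outcome.CONTINUE_SPLIT;
        }
        return Outcome.NOTHING_TO_DO;
    }

    public static void main(String[] args) {
        String parent = "034c0b19e0cdb4c5788c2d4172fd16d9";
        System.out.println(executePrepareStep(parent, RegionState.SPLIT));
        System.out.println(executePrepareStep(parent, RegionState.OPEN));
        System.out.println(LOG);
    }
}
```

Whether finishing with success is always the right outcome for an already-split parent is a separate question; the sketch only mirrors what the log shows for this test.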


Now I'm on to a new failure type....


Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://localhost:55231/user/jenkins/test-data/fe7360bf-946e-44d4-8682-120eae0b7055/data/default/testOnlineSnapshotAfterSplittingRegions-1518791702651/1dda732469ff033fa21cc271586a80b5/cf/testOnlineSnapshotAfterSplittingRegions-1518791689780=034c0b19e0cdb4c5788c2d4172fd16d9-395104433d8d43e7b6710b6ec44d5b85.3cc16fba4ef7fb478d3eb1626a24a661
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
        at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:579)
        at org.apache.hadoop.hbase.regionserver.StoreFileReader.<init>(StoreFileReader.java:104)
        at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:108)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:267)
        at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:352)
        at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:460)
        at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:668)
        at org.apache.hadoop.hbase.regionserver.HStore.lambda$openStoreFiles$0(HStore.java:535)
        ... 6 more
Caused by: java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:244)
        at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:401)
        at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:532)
        ... 14 more

The file name is crazy.
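
One possible way to read that name, offered as an assumption and not something confirmed in this thread: it looks like a reference-style suffix (`.<encoded region>`) appended to a link-style name (`<table>=<encoded region>-<hfile>`), which would fit the HalfStoreFileReader frame in the trace. The sketch below only pulls the name apart under that assumed layout; the regex and class name are hypothetical and are not HBase's own parsing code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoreFileNameSketch {
    // Assumed layout: <table>=<linkedRegion>-<hfile>.<referencedRegion>
    // This pattern is illustrative only, not the one HBase uses.
    static final Pattern REF_TO_LINK =
        Pattern.compile("^(.+)=([0-9a-f]+)-([0-9a-f]+)\\.([0-9a-f]+)$");

    // Returns {table, linkedRegion, hfile, referencedRegion}, or null if the
    // name does not fit the assumed layout.
    static String[] parse(String fileName) {
        Matcher m = REF_TO_LINK.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String name = "testOnlineSnapshotAfterSplittingRegions-1518791689780="
            + "034c0b19e0cdb4c5788c2d4172fd16d9-395104433d8d43e7b6710b6ec44d5b85"
            + ".3cc16fba4ef7fb478d3eb1626a24a661";
        String[] parts = parse(name);
        System.out.println("table            = " + parts[0]);
        System.out.println("linkedRegion     = " + parts[1]);
        System.out.println("hfile            = " + parts[2]);
        System.out.println("referencedRegion = " + parts[3]);
    }
}
```

Under this reading the linked region is 034c0b19e0cdb4c5788c2d4172fd16d9, the split parent from the log above, and the table in the name is the original table rather than the one in the path.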


> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
> ---------------------------------------------------------
>
>                 Key: HBASE-20006
>                 URL: https://issues.apache.org/jira/browse/HBASE-20006
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-20006.branch-2.001.patch
>
>
> Failing 10% of the time. Interestingly, it is the below that causes the
> failure. We go to split a region but it is already split. We then fail the
> split with an internal assert, which messes up procedures; at a minimum we
> should just not split (this is in the prepare stage).
> {code}
> 2018-02-15 23:21:42,162 INFO  [PEWorker-12] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd, table=testOnlineSnapshotAfterSplittingRegions-1518736887838, testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.
> 2018-02-15 23:21:42,162 INFO  [PEWorker-12] assignment.SplitTableRegionProcedure(440): Split of {ENCODED => 3f850cea7d71a7ebd019f2f009efca4d, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
> 2018-02-15 23:21:42,163 ERROR [PEWorker-12] procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception: pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd
> java.lang.AssertionError: split region should have an exception here
>   at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:228)
>   at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89)
>   at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
> {code}



