[jira] [Comment Edited] (PHOENIX-7049) WALRecoveryRegionPostOpenIT very flakey on 5.2 with Hbase 2.5

Istvan Toth (Jira) Thu, 05 Oct 2023 07:45:05 -0700


    [ https://issues.apache.org/jira/browse/PHOENIX-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772113#comment-17772113 ]


Istvan Toth edited comment on PHOENIX-7049 at 10/5/23 2:44 PM:
---------------------------------------------------------------

In the PASSING case, I see:
{noformat}
2023-10-02 11:10:50,404 WARN  
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=41919] 
org.apache.hadoop.hbase.master.MasterRpcServices(605): 
faa1167abeb6,46489,1696270207947 reported a fatal error:
***** ABORTING region server faa1167abeb6,46489,1696270207947: Simulated kill 
*****
2023-10-02 11:10:50,407 INFO  [Listener at localhost/39651] 
org.apache.hadoop.hbase.regionserver.HRegionServer(2428): ***** STOPPING region 
server 'faa1167abeb6,46489,1696270207947' *****
2023-10-02 11:10:50,407 INFO  [Listener at localhost/39651] 
org.apache.hadoop.hbase.regionserver.HRegionServer(2442): STOPPED: Simulated 
kill{noformat}
While in the FAILING case, I see:
{noformat}
2023-10-04 22:54:56,160 INFO  [Listener at localhost/45997] 
org.apache.hadoop.hbase.MiniHBaseCluster(272): Killing 
4a918263d7d2,37633,1696485262361
2023-10-04 22:54:56,166 ERROR [Listener at localhost/45997] 
org.slf4j.helpers.MarkerIgnoringBase(143): ***** ABORTING region server 
4a918263d7d2,37633,1696485262361: Simulated kill *****
2023-10-04 22:54:56,166 ERROR [Listener at localhost/45997] 
org.slf4j.helpers.MarkerIgnoringBase(143): RegionServer abort: loaded 
coprocessors are: <deleted>
2023-10-04 22:57:59,923 WARN  
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=46399] 
org.apache.hadoop.hbase.master.MasterRpcServices(605): 
4a918263d7d2,37195,1696485262252 reported a fatal error:
***** ABORTING region server 4a918263d7d2,37195,1696485262252: The coprocessor 
org.apache.phoenix.hbase.index.Indexer threw 
org.apache.phoenix.hbase.index.builder.FatalIndexBuildingFailureException: 
Could not update the index table, killing server region because couldn't write 
to an index table *****
Cause:
org.apache.phoenix.hbase.index.builder.FatalIndexBuildingFailureException: 
Could not update the index table, killing server region because couldn't write 
to an index table
    at 
org.apache.phoenix.hbase.index.write.KillServerOnFailurePolicy.handleFailure(KillServerOnFailurePolicy.java:68){noformat}
So in the failing case, the test kills one RS, but then the Indexer 
coprocessor aborts the other one. In the passing case, only one RS is killed.
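
To make the comparison above repeatable on other runs, one could scan the test output for the abort marker and list the distinct region servers that logged it. The following is only a sketch: the class and method names (AbortedServers, abortedServers) are made up for illustration and are not part of Phoenix or HBase, and the regex assumes the "ABORTING region server <host>,<port>,<startcode>:" format seen in the excerpts above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Phoenix or HBase.
public class AbortedServers {
    // Assumed marker format, taken from the log excerpts in this comment:
    // "***** ABORTING region server <host>,<port>,<startcode>: <reason> *****"
    private static final Pattern ABORT = Pattern.compile(
        "ABORTING\\s+region\\s+server\\s+([^,\\s]+,\\d+,\\d+):");

    /** Returns the distinct server names that logged an abort, in order of first appearance. */
    public static Set<String> abortedServers(List<String> logLines) {
        // Join the lines first so that a marker wrapped across a line break still matches.
        String joined = String.join(" ", logLines);
        Set<String> servers = new LinkedHashSet<>();
        Matcher m = ABORT.matcher(joined);
        while (m.find()) {
            servers.add(m.group(1));
        }
        return servers;
    }

    public static void main(String[] args) {
        // Lines copied from the FAILING excerpt above, including the mail wrapping.
        List<String> failing = List.of(
            "org.slf4j.helpers.MarkerIgnoringBase(143): ***** ABORTING region server ",
            "4a918263d7d2,37633,1696485262361: Simulated kill *****",
            "***** ABORTING region server 4a918263d7d2,37195,1696485262252: The coprocessor ",
            "org.apache.phoenix.hbase.index.Indexer threw ");
        for (String server : abortedServers(failing)) {
            System.out.println(server);
        }
    }
}
```

A run in which more than one distinct server shows up would match the failing pattern described above. The same server can log the marker more than once (for example, once locally and once via the master), which is why the sketch collects names into a set instead of counting lines.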



> WALRecoveryRegionPostOpenIT very flakey on 5.2 with Hbase 2.5
> -------------------------------------------------------------
>
>                 Key: PHOENIX-7049
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7049
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Major
>              Labels: test
>         Attachments: 
> PASS-org.apache.hadoop.hbase.regionserver.wal.WALRecoveryRegionPostOpenIT-output.txt.gz,
>  
> org.apache.hadoop.hbase.regionserver.wal.WALRecoveryRegionPostOpenIT-output.txt.gz,
>  org.apache.hadoop.hbase.regionserver.wal.WALRecoveryRegionPostOpenIT.txt.gz
>
>
> {noformat}
> org.apache.hadoop.hbase.regionserver.wal.WALRecoveryRegionPostOpenIT.testRecoveryRegionPostOpen
>   Time elapsed: 265.264 s  <<< FAILURE!
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at 
> org.apache.hadoop.hbase.regionserver.wal.WALRecoveryRegionPostOpenIT.testRecoveryRegionPostOpen(WALRecoveryRegionPostOpenIT.java:245){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
