[ 
https://issues.apache.org/jira/browse/HBASE-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated HBASE-13061:
-------------------------------------
    Description: 
A test failed in HBASE-13017. It seems that with ZK, nodes were ordered in one 
way and the test didn't trigger the error, but with the new meta rows ordered 
differently the test became flaky.

That ordering leads to an interesting sequence of offline/online region 
transitions and triggers a bug and an NPE in AM (seen in TestZKLessAMOnCluster).

This can happen if a region was moved from RS1 to another region server, RS2, 
and RS2 then failed. The region remains in PENDING_OPEN. SSH will offline it 
from RS1 (without removing it from oldAssignments, because the table is 
disabled). When AssignmentManager then assigns the region, it removes the 
region's oldAssignment from serverHoldings, and that old assignment happens to 
be RS1, the server the region was just assigned to.

A small excerpt of the logs follows. The last 3 lines are the most interesting: 
region b73fe9f1185361e846b0e1ceb7d6d64e is added to the server and immediately 
removed from it. Later that triggers an NPE in the disable table handler.
{code}
2015-02-18 01:21:18,338 INFO  [Thread-436] master.RegionStates(1109): 
Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=PENDING_OPEN, 
ts=1424222478324, server=octobook.home,65370,1424222474885} to 
{b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, 
server=octobook.home,65370,1424222474885}
2015-02-18 01:21:18,339 INFO  [Thread-436] master.RegionStateStore(218): 
Updating row 
testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.
 with state=OFFLINE
2015-02-18 01:21:18,340 DEBUG [Thread-436] master.RegionStates(591): Old server 
name for {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 
'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
 STARTKEY => 'I', ENDKEY => 'Q'} is null
2015-02-18 01:21:18,340 INFO  [Thread-436] master.RegionStates(1109): 
Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, 
server=octobook.home,65370,1424222474885} to {b73fe9f1185361e846b0e1ceb7d6d64e 
state=OPEN, ts=1424222478340, server=octobook.home,65359,1424222474743}
2015-02-18 01:21:18,341 INFO  [Thread-436] master.RegionStateStore(218): 
Updating row 
testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.
 with state=OPEN&sn=octobook.home,65359,1424222474743
2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(457): Onlined 
b73fe9f1185361e846b0e1ceb7d6d64e on octobook.home,65359,1424222474743 {ENCODED 
=> b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 
'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
 STARTKEY => 'I', ENDKEY => 'Q'}
2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(481): Adding  
b73fe9f1185361e846b0e1ceb7d6d64e to server octobook.home,65359,1424222474743
2015-02-18 01:21:18,342 INFO  [Thread-436] master.RegionStates(467): Offlined 
b73fe9f1185361e846b0e1ceb7d6d64e from octobook.home,65359,1424222474743
2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(496): Removing 
b73fe9f1185361e846b0e1ceb7d6d64e from server octobook.home,65359,1424222474743
{code}
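
The stale-entry scenario described above can be modeled with a minimal sketch. 
This is a simplified, hypothetical model and not the actual RegionStates code: 
the class ServerHoldingsSketch, its method names, the guardSameServer flag and 
the use of plain strings for servers and regions are all assumptions made for 
illustration. It shows how a leftover oldAssignments entry that points at the 
newly assigned server gets the region dropped from that server's holdings, and 
how skipping the removal when old and new server are the same would avoid it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model for illustration; not HBase's RegionStates.
class ServerHoldingsSketch {
    private final Map<String, Set<String>> serverHoldings = new HashMap<>();
    private final Map<String, String> regionAssignments = new HashMap<>();
    private final Map<String, String> oldAssignments = new HashMap<>();
    private final boolean guardSameServer;

    ServerHoldingsSketch(boolean guardSameServer) {
        this.guardSameServer = guardSameServer;
    }

    // Models the leftover entry that was not cleaned up for a disabled table.
    void seedStaleOldAssignment(String region, String server) {
        oldAssignments.put(region, server);
    }

    void regionOnline(String region, String server) {
        String oldServer = regionAssignments.put(region, server);
        if (server.equals(oldServer)) {
            return; // already online on this server, nothing to update
        }
        serverHoldings.computeIfAbsent(server, s -> new HashSet<>()).add(region);
        if (oldServer == null) {
            oldServer = oldAssignments.remove(region);
        }
        boolean sameServer = server.equals(oldServer);
        // Without the guard, a stale old assignment equal to the new server
        // removes the region that was added two statements earlier.
        if (oldServer != null && !(guardSameServer && sameServer)) {
            Set<String> held = serverHoldings.get(oldServer);
            if (held != null) {
                held.remove(region);
            }
        }
    }

    boolean holds(String server, String region) {
        Set<String> held = serverHoldings.get(server);
        return held != null && held.contains(region);
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            ServerHoldingsSketch s = new ServerHoldingsSketch(guard);
            s.seedStaleOldAssignment("b73fe9f1", "RS1");
            s.regionOnline("b73fe9f1", "RS1");
            System.out.println("guard=" + guard + " RS1 holds region: "
                + s.holds("RS1", "b73fe9f1"));
        }
    }
}
```

In this model the unguarded run reports that RS1 no longer holds the region 
right after it was onlined there, which mirrors the "Adding ... to server" and 
"Removing ... from server" pair at the end of the log.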

  was:
A test failed in HBASE-13017. It seems that with ZK, nodes were ordered in one 
way and the test didn't trigger the error, but with the new meta rows ordered 
differently the test became flaky.

That ordering leads to an interesting sequence of offline/online region 
transitions and triggers a bug and an NPE in AM (seen in TestZKLessAMOnCluster).

This can happen if a region was moved from RS1 to another region server, RS2, 
and RS2 then failed. The region remains in PENDING_OPEN. SSH will offline it 
from RS1 (without removing it from oldAssignments, because the table is 
disabled). When AssignmentManager then assigns the region, it removes the 
region's oldAssignment from serverHoldings, and that old assignment happens to 
be RS1, the server the region was just assigned to.

A small excerpt of the logs:
{code}
2015-02-18 01:21:18,338 INFO  [Thread-436] master.RegionStates(1109): 
Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=PENDING_OPEN, 
ts=1424222478324, server=octobook.home,65370,1424222474885} to 
{b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, 
server=octobook.home,65370,1424222474885}
2015-02-18 01:21:18,339 INFO  [Thread-436] master.RegionStateStore(218): 
Updating row 
testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.
 with state=OFFLINE
2015-02-18 01:21:18,340 DEBUG [Thread-436] master.RegionStates(591): Old server 
name for {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 
'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
 STARTKEY => 'I', ENDKEY => 'Q'} is null
2015-02-18 01:21:18,340 INFO  [Thread-436] master.RegionStates(1109): 
Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, 
server=octobook.home,65370,1424222474885} to {b73fe9f1185361e846b0e1ceb7d6d64e 
state=OPEN, ts=1424222478340, server=octobook.home,65359,1424222474743}
2015-02-18 01:21:18,341 INFO  [Thread-436] master.RegionStateStore(218): 
Updating row 
testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.
 with state=OPEN&sn=octobook.home,65359,1424222474743
2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(457): Onlined 
b73fe9f1185361e846b0e1ceb7d6d64e on octobook.home,65359,1424222474743 {ENCODED 
=> b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 
'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
 STARTKEY => 'I', ENDKEY => 'Q'}
2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(481): Adding  
b73fe9f1185361e846b0e1ceb7d6d64e to server octobook.home,65359,1424222474743
2015-02-18 01:21:18,342 INFO  [Thread-436] master.RegionStates(467): Offlined 
b73fe9f1185361e846b0e1ceb7d6d64e from octobook.home,65359,1424222474743
2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(496): Removing 
b73fe9f1185361e846b0e1ceb7d6d64e from server octobook.home,65359,1424222474743
2015-02-18 01:21:18,347 INFO  [Thread-436] hbase.MetaTableAccessor(1437): 
Updated table testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState state 
to DISABLED in META
{code}


> RegionStates can remove wrong region from server holdings
> ---------------------------------------------------------
>
>                 Key: HBASE-13061
>                 URL: https://issues.apache.org/jira/browse/HBASE-13061
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Andrey Stepachev
>            Assignee: Andrey Stepachev
>         Attachments: HBASE-13061.patch
>
>
> A test failed in HBASE-13017. It seems that with ZK, nodes were ordered in 
> one way and the test didn't trigger the error, but with the new meta rows 
> ordered differently the test became flaky.
> That ordering leads to an interesting sequence of offline/online region 
> transitions and triggers a bug and an NPE in AM (seen in 
> TestZKLessAMOnCluster).
> This can happen if a region was moved from RS1 to another region server, RS2, 
> and RS2 then failed. The region remains in PENDING_OPEN. SSH will offline it 
> from RS1 (without removing it from oldAssignments, because the table is 
> disabled). When AssignmentManager then assigns the region, it removes the 
> region's oldAssignment from serverHoldings, and that old assignment happens 
> to be RS1, the server the region was just assigned to.
> A small excerpt of the logs follows. The last 3 lines are the most 
> interesting: region b73fe9f1185361e846b0e1ceb7d6d64e is added to the server 
> and immediately removed from it. Later that triggers an NPE in the disable 
> table handler.
> {code}
> 2015-02-18 01:21:18,338 INFO  [Thread-436] master.RegionStates(1109): 
> Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=PENDING_OPEN, 
> ts=1424222478324, server=octobook.home,65370,1424222474885} to 
> {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, 
> server=octobook.home,65370,1424222474885}
> 2015-02-18 01:21:18,339 INFO  [Thread-436] master.RegionStateStore(218): 
> Updating row 
> testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.
>  with state=OFFLINE
> 2015-02-18 01:21:18,340 DEBUG [Thread-436] master.RegionStates(591): Old 
> server name for {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 
> 'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
>  STARTKEY => 'I', ENDKEY => 'Q'} is null
> 2015-02-18 01:21:18,340 INFO  [Thread-436] master.RegionStates(1109): 
> Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, 
> server=octobook.home,65370,1424222474885} to 
> {b73fe9f1185361e846b0e1ceb7d6d64e state=OPEN, ts=1424222478340, 
> server=octobook.home,65359,1424222474743}
> 2015-02-18 01:21:18,341 INFO  [Thread-436] master.RegionStateStore(218): 
> Updating row 
> testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.
>  with state=OPEN&sn=octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(457): Onlined 
> b73fe9f1185361e846b0e1ceb7d6d64e on octobook.home,65359,1424222474743 
> {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 
> 'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
>  STARTKEY => 'I', ENDKEY => 'Q'}
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(481): Adding  
> b73fe9f1185361e846b0e1ceb7d6d64e to server octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 INFO  [Thread-436] master.RegionStates(467): Offlined 
> b73fe9f1185361e846b0e1ceb7d6d64e from octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(496): Removing 
> b73fe9f1185361e846b0e1ceb7d6d64e from server octobook.home,65359,1424222474743
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
