[jira] [Updated] (HBASE-9721) meta assignment did not timeout

2013-12-17 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9721:
---------------------------------

Attachment: hbase-9721_v0.patch

Here is a patch that adds the intended ServerName to the open and close region 
RPCs. A region server rejects the request if it is not the server the request 
was intended for. 
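
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in hbase-9721_v0.patch: `ServerId`, `WrongServerException`, and `checkIntendedServer` are hypothetical names, and accepting a request that carries no intended server is an assumption. The host name in `main` is a placeholder; the two start codes are those of the old and the restarted region server in the logs quoted below.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration only; not the classes changed by the patch.
final class IntendedServerCheck {

    /** Server identity in the form used in the logs: hostname,port,startcode. */
    static final class ServerId {
        final String hostname;
        final int port;
        final long startcode;

        ServerId(String hostname, int port, long startcode) {
            this.hostname = hostname;
            this.port = port;
            this.startcode = startcode;
        }

        /** Parses a "host,port,startcode" string. */
        static ServerId parse(String s) {
            String[] parts = s.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected host,port,startcode: " + s);
            }
            return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ServerId)) {
                return false;
            }
            ServerId other = (ServerId) o;
            return hostname.equals(other.hostname)
                && port == other.port
                && startcode == other.startcode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(hostname, port, startcode);
        }

        @Override
        public String toString() {
            return hostname + "," + port + "," + startcode;
        }
    }

    /** Thrown when a request reaches a server other than the one it was meant for. */
    static final class WrongServerException extends Exception {
        WrongServerException(String message) {
            super(message);
        }
    }

    /**
     * Rejects an open/close request whose intended server differs from this server.
     * Assumption: a null intended server (field not sent) is accepted.
     */
    static void checkIntendedServer(ServerId self, ServerId intended)
            throws WrongServerException {
        if (intended != null && !intended.equals(self)) {
            throw new WrongServerException(
                "request was intended for " + intended + " but this server is " + self);
        }
    }

    public static void main(String[] args) {
        // Same host and port, different start code: the server restarted in between.
        ServerId oldRs = ServerId.parse("rs-5.example.internal,60020,1380842900820");
        ServerId restartedRs = ServerId.parse("rs-5.example.internal,60020,1380843006362");
        try {
            checkIntendedServer(restartedRs, oldRs);
            System.out.println("accepted");
        } catch (WrongServerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Comparing the full identity, start code included, is the point of the sketch: host and port are identical across the restart, so only the start code distinguishes the dead server from its replacement.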

Will try to come up with a unit test tomorrow. 

> meta assignment did not timeout
> -------------------------------
>
> Key: HBASE-9721
> URL: https://issues.apache.org/jira/browse/HBASE-9721
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0, 0.96.0
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Attachments: hbase-9721_v0.patch
>
>
> On a test cluster, the following events happened with ITBLL and CM, leading 
> to meta being unavailable until the master was restarted. 
> An RS carrying meta died, and the master assigned the region to one of the RSs. 
> {code}
> 2013-10-03 23:30:06,611 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
> master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
> 2013-10-03 23:30:06,611 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
> master.RegionStates: Transitioned {1588230740 state=OFFLINE, 
> ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, 
> ts=1380843006611, 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
> 2013-10-03 23:30:06,611 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
> master.ServerManager: New admin connection to 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
> {code}
> At the same time, the RS to which meta had just been assigned also died (due 
> to CM) and restarted: 
> {code}
> 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] 
> master.ServerManager: REPORT: Server 
> gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came 
> back up, removed it from the dead servers list
> 2013-10-03 23:30:08,769 INFO  [RpcServer.handler=18,port=6] 
> master.ServerManager: Triggering server recovery; existingServer 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks 
> stale, new 
> server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
> 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] 
> master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
>  
> current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820,
>  matches=true
> 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] 
> master.ServerManager: 
> Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 
> to dead servers, submitted shutdown handler to be executed meta=true
> 2013-10-03 23:30:08,771 INFO  [RpcServer.handler=18,port=6] 
> master.ServerManager: Registering 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
> 2013-10-03 23:30:08,772 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> handler.MetaServerShutdownHandler: Splitting hbase:meta logs for 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
> {code}
> AM/SSH sees that the RS that died was carrying meta, but the assignment RPC 
> request had still not been sent:
> {code}
> 2013-10-03 23:30:08,791 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
>  
> current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820,
>  matches=true
> 2013-10-03 23:30:08,791 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> handler.MetaServerShutdownHandler: Server 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was 
> carrying META. Trying to assign.
> 2013-10-03 23:30:08,791 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, 
> expected state=OFFLINE/SPLITTING/MERGING
> 2013-10-03 23:30:08,791 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, 
> ts=1380843006611, 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
>  to {1588230740 state=OFFLINE, ts=1380843008791, server=null}
> 2013-10-03 23:30:09,809 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380

[jira] [Updated] (HBASE-9721) meta assignment did not timeout

2013-12-01 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-9721:
----------------------------------

Affects Version/s: 0.98.0
   0.96.0
Fix Version/s: (was: 0.96.1)
   (was: 0.98.0)

> meta assignment did not timeout
> -------------------------------
>
> Key: HBASE-9721
> URL: https://issues.apache.org/jira/browse/HBASE-9721
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0, 0.96.0
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
>
> On a test cluster, the following events happened with ITBLL and CM, leading 
> to meta being unavailable until the master was restarted. 
> An RS carrying meta died, and the master assigned the region to one of the RSs. 
> {code}
> 2013-10-03 23:30:06,611 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
> master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
> 2013-10-03 23:30:06,611 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
> master.RegionStates: Transitioned {1588230740 state=OFFLINE, 
> ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, 
> ts=1380843006611, 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
> 2013-10-03 23:30:06,611 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
> master.ServerManager: New admin connection to 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
> {code}
> At the same time, the RS to which meta had just been assigned also died (due 
> to CM) and restarted: 
> {code}
> 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] 
> master.ServerManager: REPORT: Server 
> gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came 
> back up, removed it from the dead servers list
> 2013-10-03 23:30:08,769 INFO  [RpcServer.handler=18,port=6] 
> master.ServerManager: Triggering server recovery; existingServer 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks 
> stale, new 
> server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
> 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] 
> master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
>  
> current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820,
>  matches=true
> 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] 
> master.ServerManager: 
> Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 
> to dead servers, submitted shutdown handler to be executed meta=true
> 2013-10-03 23:30:08,771 INFO  [RpcServer.handler=18,port=6] 
> master.ServerManager: Registering 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
> 2013-10-03 23:30:08,772 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> handler.MetaServerShutdownHandler: Splitting hbase:meta logs for 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
> {code}
> AM/SSH sees that the RS that died was carrying meta, but the assignment RPC 
> request had still not been sent:
> {code}
> 2013-10-03 23:30:08,791 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
>  
> current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820,
>  matches=true
> 2013-10-03 23:30:08,791 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> handler.MetaServerShutdownHandler: Server 
> gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was 
> carrying META. Trying to assign.
> 2013-10-03 23:30:08,791 DEBUG 
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, 
> expected state=OFFLINE/SPLITTING/MERGING
> 2013-10-03 23:30:08,791 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, 
> ts=1380843006611, 
> server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
>  to {1588230740 state=OFFLINE, ts=1380843008791, server=null}
> 2013-10-03 23:30:09,809 INFO  
> [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
> zookeeper.ZooKeeperNodeTracker: Unsetting hbase:meta region location in 
> ZooKeeper
> {code}
> Our first attempt at the assig