[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-09-15 Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14207:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to 0.98

> Region was hijacked and remained in transition when RS failed to open a 
> region and later regionplan changed to new RS on retry
> --
>
> Key: HBASE-14207
> URL: https://issues.apache.org/jira/browse/HBASE-14207
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.98.6
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 0.98.15
>
> Attachments: HBASE-14207-0.98-V2.patch, HBASE-14207-0.98-V2.patch, 
> HBASE-14207-0.98.patch
>
>
> In a production environment, the following events happened:
> 1. The master tried to assign a region to an RS, but the RS failed to open
> the region because of KeeperException$SessionExpiredException.
>   The RS log showed multiple WARN messages related to
> KeeperException$SessionExpiredException:
>   > KeeperErrorCode = Session expired for /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
>   > Unable to get data of znode /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> 2. The master retried the assignment on the same RS, but the RS failed again.
> 3. On the second retry a new plan was formed with a different destination RS,
> so the master sent the open request to the new RS. The new RS also failed to
> open the region, because the server name recorded in the znode did not match
> the expected current server name.
> Log snippet:
> {noformat}
> HM
> 2015-07-14 03:50:29,759 | INFO  | master:T101PC03VM13:21300 | Processing 08f1935d652e5dbdac09b423b8f9401b in state: M_ZK_REGION_OFFLINE | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:644)
> 2015-07-14 03:50:29,759 | INFO  | master:T101PC03VM13:21300 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817029679, server=null} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817029759, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:29,760 | INFO  | master:T101PC03VM13:21300 | Processed region 08f1935d652e5dbdac09b423b8f9401b in state M_ZK_REGION_OFFLINE, on server: T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:768)
> 2015-07-14 03:50:29,800 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:29,801 | WARN  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=1 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:29,802 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Trying to re-assign INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to the same failed server. | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2123)
> 2015-07-14 03:50:31,804 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,806 | WARN  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=2 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031804, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM14,21302,1436816997967 | 
> 
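
The failure sequence in the report can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not HBase code: the class ZnodeMismatchModel, the map standing in for /hbase/region-in-transition, and the methods writeOfflineZnode, openRegion and retryAssign are all hypothetical. It assumes, as the report describes, that the region's znode records the server the region is destined for, and that a region server refuses to open the region when that recorded name is not its own.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of ZK-based region assignment (not HBase code).
 * The master records the planned destination server in the region's znode;
 * a region server opens the region only if that name matches its own.
 */
public class ZnodeMismatchModel {

  /** Stand-in for /hbase/region-in-transition: encoded region name -> destination server. */
  static final Map<String, String> regionsInTransition = new HashMap<>();

  /** Master side (hypothetical): mark the region OFFLINE with the planned destination. */
  static void writeOfflineZnode(String region, String destination) {
    regionsInTransition.put(region, destination);
  }

  /** Region server side (hypothetical): succeed only if the znode names this server. */
  static boolean openRegion(String region, String thisServer) {
    return thisServer.equals(regionsInTransition.get(region));
  }

  /**
   * Master retry step (hypothetical). If resetZnode is false, the znode keeps
   * naming the previous destination even though the plan has moved.
   */
  static boolean retryAssign(String region, String newDest, boolean resetZnode) {
    if (resetZnode) {
      writeOfflineZnode(region, newDest);
    }
    return openRegion(region, newDest);
  }

  public static void main(String[] args) {
    String region = "08f1935d652e5dbdac09b423b8f9401b";
    String rs1 = "T101PC03VM13,21302,1436816690692";
    String rs2 = "T101PC03VM14,21302,1436816997967";

    writeOfflineZnode(region, rs1);
    // Plan moved to rs2 but the znode still names rs1: the open is refused.
    System.out.println(retryAssign(region, rs2, false)); // false
    // Znode rewritten for the new destination: the open succeeds.
    System.out.println(retryAssign(region, rs2, true)); // true
  }
}
```

In this model, leaving the znode pointing at T101PC03VM13 makes the open on T101PC03VM14 fail, which mirrors step 3 of the report; rewriting the znode for the new destination before retrying lets the open go through.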

[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-18 Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14207:
---
Attachment: HBASE-14207-0.98-V2.patch


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-18 Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14207:
---
Status: Patch Available  (was: Open)

Reattach to kick HadoopQA


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-18 Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14207:
---
Status: Open  (was: Patch Available)


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-14 stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14207:
--
Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-14 Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-14207:
-
Attachment: HBASE-14207-0.98-V2.patch

Modified the patch with the following change:
{code}
if (useZKForAssignment) {
 setOfflineInZK = true;
}
{code}
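The V2 change forces setOfflineInZK only when useZKForAssignment is true, presumably because under ZK-less assignment (the mode exercised by TestZKLessAMOnCluster, which failed against the first patch) there is no region-in-transition znode to rewrite. The sketch below isolates that decision. It is a hypothetical illustration, not the AssignmentManager code: the class RetryPlanSketch, the method shouldSetOfflineInZK, and the "destination changed" trigger are assumptions, since only the guard itself appears in the snippet above.

```java
/**
 * Hypothetical sketch (not HBase's AssignmentManager): decides whether the
 * next assign attempt should (re)write the region's OFFLINE znode.
 */
public class RetryPlanSketch {

  static boolean shouldSetOfflineInZK(boolean useZKForAssignment,
                                      boolean setOfflineInZK,
                                      String previousDest,
                                      String newDest) {
    boolean result = setOfflineInZK;
    // Assumed trigger: the retry picked a different destination server.
    boolean planChanged = previousDest != null && !previousDest.equals(newDest);
    if (planChanged && useZKForAssignment) {
      // Mirrors the V2 guard: only force the ZooKeeper write under ZK-based assignment.
      result = true;
    }
    return result;
  }

  public static void main(String[] args) {
    String rs1 = "T101PC03VM13,21302,1436816690692";
    String rs2 = "T101PC03VM14,21302,1436816997967";
    // ZK-based assignment, plan moved to a new RS: rewrite the znode.
    System.out.println(shouldSetOfflineInZK(true, false, rs1, rs2)); // true
    // ZK-less assignment: nothing in ZooKeeper to rewrite.
    System.out.println(shouldSetOfflineInZK(false, false, rs1, rs2)); // false
    // Same destination as before: keep the caller's original choice.
    System.out.println(shouldSetOfflineInZK(true, false, rs1, rs1)); // false
  }
}
```

Keeping the caller's original setOfflineInZK value when no override applies means the guard can only add a ZooKeeper write, never suppress one the caller already requested.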


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-12 Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14207:
---
Fix Version/s: (was: 0.98.14)
   Status: Open  (was: Patch Available)

bq. org.apache.hadoop.hbase.master.TestZKLessAMOnCluster

This looks like a relevant test failure.


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-12 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-14207:
-
Attachment: HBASE-14207-0.98.patch

Attached a patch for 0.98, but I think this bug may exist wherever ZK-based assignment is used.
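This thread does not show what the attached patch changes. The Java sketch below illustrates one way a master could avoid sending an open request to a server that the znode does not name. Before the retry, it re-points the OFFLINE region-in-transition znode at the new destination. The write is version-checked in the style of ZooKeeper's `setData(path, data, version)`, which rejects a write whose expected version is stale. `FakeZk`, `Node`, and `retargetOffline` are hypothetical in-memory stand-ins, not HBase or ZooKeeper classes, and this is not the contents of HBASE-14207-0.98.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the HBASE-14207 patch: an in-memory znode store with
// ZooKeeper-style versioned writes, used to re-point an OFFLINE region znode at a new destination.
public class RetargetOfflineZnodeSketch {

    // Snapshot of a znode: assignment state, target server, and data version.
    static final class Node {
        final String state;
        final String server;
        final int version;

        Node(String state, String server, int version) {
            this.state = state;
            this.server = server;
            this.version = version;
        }
    }

    // Minimal stand-in for ZooKeeper: a write succeeds only if the caller's expected version is current.
    static final class FakeZk {
        private final Map<String, Node> nodes = new HashMap<>();

        void create(String path, String state, String server) {
            nodes.put(path, new Node(state, server, 0));
        }

        Node get(String path) {
            return nodes.get(path);
        }

        boolean setData(String path, String state, String server, int expectedVersion) {
            Node cur = nodes.get(path);
            if (cur == null || cur.version != expectedVersion) {
                return false; // someone else changed the node since we read it
            }
            nodes.put(path, new Node(state, server, cur.version + 1));
            return true;
        }
    }

    // Hypothetical master-side step run before sending an open request to newDest.
    // Returns true only if the znode now names newDest, so the open request is safe to send.
    static boolean retargetOffline(FakeZk zk, String path, Node seen, String newDest) {
        if (seen == null || !"M_ZK_REGION_OFFLINE".equals(seen.state)) {
            return false; // an RS has already moved the region on; do not hijack it
        }
        if (seen.server.equals(newDest)) {
            return true; // plan unchanged, znode already correct
        }
        return zk.setData(path, "M_ZK_REGION_OFFLINE", newDest, seen.version);
    }

    public static void main(String[] args) {
        String path = "/hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b";
        FakeZk zk = new FakeZk();
        zk.create(path, "M_ZK_REGION_OFFLINE", "VM13");

        Node seen = zk.get(path);
        System.out.println(retargetOffline(zk, path, seen, "VM14")); // true
        System.out.println(zk.get(path).server);                      // VM14

        // Re-using the stale snapshot fails because the version has moved on.
        System.out.println(retargetOffline(zk, path, seen, "VM15")); // false
    }
}
```

The version check matters because a region server may transition the znode between the master's read and its write. A stale write is then rejected instead of silently overwriting that progress.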

 Region was hijacked and remained in transition when RS failed to open a 
 region and later regionplan changed to new RS on retry
 --

 Key: HBASE-14207
 URL: https://issues.apache.org/jira/browse/HBASE-14207
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.98.6
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Critical
 Attachments: HBASE-14207-0.98.patch


 

[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-12 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-14207:
-
Fix Version/s: 0.98.15
   0.98.14
   Status: Patch Available  (was: Open)

 Region was hijacked and remained in transition when RS failed to open a 
 region and later regionplan changed to new RS on retry
 --

 Key: HBASE-14207
 URL: https://issues.apache.org/jira/browse/HBASE-14207
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.98.6
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Critical
 Fix For: 0.98.14, 0.98.15

 Attachments: HBASE-14207-0.98.patch



[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-11 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-14207:
-
Affects Version/s: 0.98.6

 Region was hijacked and remained in transition when RS failed to open a 
 region and later regionplan changed to new RS on retry
 --

 Key: HBASE-14207
 URL: https://issues.apache.org/jira/browse/HBASE-14207
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.98.6
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Critical

 

[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-11 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-14207:
-
Priority: Critical  (was: Major)

 Region was hijacked and remained in transition when RS failed to open a 
 region and later regionplan changed to new RS on retry
 --

 Key: HBASE-14207
 URL: https://issues.apache.org/jira/browse/HBASE-14207
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Critical


[jira] [Updated] (HBASE-14207) Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry

2015-08-11 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-14207:
-
Description: 
In a production environment, the following events happened:
1. The master tried to assign a region to an RS, but the RS failed to open the 
region because of KeeperException$SessionExpiredException.
The RS log showed multiple WARN messages related to 
KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for 
/hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
 Unable to get data of znode 
/hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
2. The master retried assigning the region to the same RS, but the RS failed again.
3. On the second retry, a new plan was formed with a different destination RS, 
so the master sent the open request to the new RS. However, the new RS failed to 
open the region because the server name in the ZNODE did not match the 
expected current server name. 
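The following deliberately simplified Java model makes step 3 concrete. It is a sketch under stated assumptions, not HBase code. `RitZnode`, `sendOpen`, `assign`, and the `rewriteZnodeOnNewPlan` flag are hypothetical. `VM13`/`VM14` abbreviate the two region servers in the log, and `reachable == false` stands for an RS whose open request fails (here, because its ZK session had expired). With the flag off, the znode keeps naming the first server, so the new server refuses to open the region and the master is left waiting in PENDING_OPEN. With the flag on, the master rewrites the znode for the new destination and the open succeeds. That behaviour is a hypothetical remedy and not necessarily what the attached patch does.

```java
import java.util.List;

// Simplified, hypothetical model of the failure described above.
// None of these names are HBase APIs.
public class RegionHijackSketch {

    enum OpenResult { RPC_FAILED, ABORTED_ON_RS, OPENED }

    // Stand-in for the region-in-transition znode: which server the master intends to open the region.
    static final class RitZnode {
        final String state;
        final String serverName;

        RitZnode(String state, String serverName) {
            this.state = state;
            this.serverName = serverName;
        }
    }

    // Simulated open request. reachable == false models the RS whose ZK session had expired.
    static OpenResult sendOpen(String dest, boolean reachable, RitZnode znode) {
        if (!reachable) {
            return OpenResult.RPC_FAILED;
        }
        // The RS accepts the request but refuses to open if the znode names a different server.
        if (!znode.serverName.equals(dest)) {
            return OpenResult.ABORTED_ON_RS;
        }
        return OpenResult.OPENED;
    }

    // Returns the master's final view of the region after trying each planned destination in order.
    static String assign(List<String> plans, List<Boolean> reachable, boolean rewriteZnodeOnNewPlan) {
        // The OFFLINE znode is written for the first destination only.
        RitZnode znode = new RitZnode("M_ZK_REGION_OFFLINE", plans.get(0));
        for (int i = 0; i < plans.size(); i++) {
            String dest = plans.get(i);
            if (rewriteZnodeOnNewPlan && !znode.serverName.equals(dest)) {
                znode = new RitZnode("M_ZK_REGION_OFFLINE", dest); // hypothetical remedy
            }
            OpenResult result = sendOpen(dest, reachable.get(i), znode);
            if (result == OpenResult.RPC_FAILED) {
                continue; // "trying to assign elsewhere instead"
            }
            if (result == OpenResult.OPENED) {
                return "OPEN on " + dest;
            }
            // ABORTED_ON_RS: the RS never moves the znode forward, so the master keeps waiting.
            return "PENDING_OPEN on " + dest;
        }
        return "FAILED_OPEN";
    }

    public static void main(String[] args) {
        List<String> plans = List.of("VM13", "VM13", "VM14");
        List<Boolean> reachable = List.of(false, false, true);
        System.out.println(assign(plans, reachable, false)); // PENDING_OPEN on VM14
        System.out.println(assign(plans, reachable, true));  // OPEN on VM14
    }
}
```

The final state with the flag off mirrors the end of the log below: the master skips the region because it is "in transition on other server" in state PENDING_OPEN.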

Logs Snippet:

{noformat}
HM

2015-07-14 03:50:29,759 | INFO  | master:T101PC03VM13:21300 | Processing 08f1935d652e5dbdac09b423b8f9401b in state: M_ZK_REGION_OFFLINE | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:644)
2015-07-14 03:50:29,759 | INFO  | master:T101PC03VM13:21300 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817029679, server=null} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817029759, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
2015-07-14 03:50:29,760 | INFO  | master:T101PC03VM13:21300 | Processed region 08f1935d652e5dbdac09b423b8f9401b in state M_ZK_REGION_OFFLINE, on server: T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:768)
2015-07-14 03:50:29,800 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
2015-07-14 03:50:29,801 | WARN  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=1 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
2015-07-14 03:50:29,802 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Trying to re-assign INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to the same failed server. | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2123)
2015-07-14 03:50:31,804 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
2015-07-14 03:50:31,806 | WARN  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=2 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031804, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM14,21302,1436816997967 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
2015-07-14 03:50:31,807 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031807, server=T101PC03VM14,21302,1436816997967} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
2015-07-14 03:51:09,501 | INFO  | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-4 | Skip assigning region in transition on other server{08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031807, server=T101PC03VM14,21302,1436816997967} |