[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-10 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Fix Version/s: 0.94.5

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch


 1. A full GC occurs on the regionserver hosting ROOT.
 2. The ZK session times out; the master expires the regionserver and submits 
 it to ServerShutdownHandler.
 3. The regionserver completes the full GC.
 4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns 
 true, since the regionserver has resumed and answers the check.
 5. ServerShutdownHandler therefore skips assigning the ROOT region.
 6. The regionserver aborts itself because it receives YouAreDeadException 
 after a regionserver report.
 7. ROOT is now offline and will not be assigned again unless the master is 
 restarted.
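
 The sequence above can be replayed as a small simulation. This is a minimal sketch, not HBase code: `RegionServer`, `Master`, `handle_shutdown`, `report` and `replay` are hypothetical stand-ins for the components named in the steps, and the liveness-only check inside `handle_shutdown` is an assumption about why verifyRootRegionLocation returns true.

```python
from dataclasses import dataclass, field


@dataclass
class RegionServer:
    name: str
    paused: bool = False   # True while the process is stalled in a full GC
    aborted: bool = False

    def responds(self) -> bool:
        # A stalled or aborted process cannot answer the master.
        return not (self.paused or self.aborted)


@dataclass
class Master:
    root_holder: str                      # server believed to hold ROOT
    dead_servers: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def expire(self, rs: RegionServer) -> None:
        # Step 2: the ZK session timed out, so the server is marked dead.
        self.dead_servers.add(rs.name)

    def handle_shutdown(self, rs: RegionServer, spare: RegionServer) -> None:
        # Steps 4-5: assumed liveness-only check of the current ROOT location.
        if rs.responds():
            self.log.append("skip assigning ROOT")
        else:
            self.root_holder = spare.name
            self.log.append("assign ROOT to " + spare.name)

    def report(self, rs: RegionServer) -> None:
        # Step 6: a report from a server being processed as dead is
        # rejected, and the server aborts itself.
        if rs.name in self.dead_servers:
            rs.aborted = True


def replay(gc_ends_before_handler: bool):
    rs, spare = RegionServer("rs1"), RegionServer("rs2")
    servers = {s.name: s for s in (rs, spare)}
    master = Master(root_holder=rs.name)

    rs.paused = True                     # 1. full GC begins
    master.expire(rs)                    # 2. session timeout
    if gc_ends_before_handler:
        rs.paused = False                # 3. full GC completes first
    master.handle_shutdown(rs, spare)    # 4-5. handler checks ROOT
    rs.paused = False                    # the GC has finished by now either way
    master.report(rs)                    # 6. report rejected, server aborts
    root_online = servers[master.root_holder].responds()
    return master.log, root_online


if __name__ == "__main__":
    print(replay(gc_ends_before_handler=True))
    print(replay(gc_ends_before_handler=False))
```

 With `gc_ends_before_handler=True` the handler skips the assignment and ROOT ends up on an aborted server; with `False` the handler reassigns ROOT before the old server aborts. The only difference between the two runs is the ordering of steps 3 and 4.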
 Master Log:
 {code}
 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown handler to be executed, root=true, meta=false
 2012-10-31 19:51:39,045 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for dw88.kgb.sqa.cm4,60020,1351671478752
 2012-10-31 19:51:50,113 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
 2012-10-31 19:52:15,945 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log splitting for dw88.kgb.sqa.cm4,60020,1351671478752
 {code}
 The master log shows no subsequent assignment of ROOT.
 Regionserver log:
 {code}
 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 229128ms instead of 10ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 {code}
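
 The timestamps in the two log excerpts above can be lined up with a little arithmetic. The sketch below uses only values copied from those excerpts; treating the start of the 229128 ms sleep as the approximate start of the GC pause is an assumption, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S,%f"   # log4j-style timestamp with milliseconds


def ts(text: str) -> datetime:
    return datetime.strptime(text, FMT)


# Values taken from the regionserver log.
woke = ts("2012-10-31 19:52:15,923")
slept = timedelta(milliseconds=229128)
pause_start = woke - slept            # assumed start of the GC pause

# Values taken from the master log.
expired = ts("2012-10-31 19:51:39,043")      # added to dead servers
root_check = ts("2012-10-31 19:51:50,113")   # "was carrying ROOT. Trying to assign."
rejected = ts("2012-10-31 19:52:15,939")     # "Server REPORT rejected"

expiry_after_pause_start = (expired - pause_start).total_seconds()
check_to_wake = (woke - root_check).total_seconds()
wake_to_reject = (rejected - woke).total_seconds()

if __name__ == "__main__":
    print("pause started at about", pause_start.time())
    print("expired after", expiry_after_pause_start, "s of pause")
    print("ROOT check logged", check_to_wake, "s before the server woke")
    print("report rejected", wake_to_reject, "s after the server woke")
```

 The "Trying to assign" entry precedes the regionserver's wake-up, and the rejected report follows the wake-up almost immediately. That is consistent with the ROOT check still being in progress when the regionserver resumed, although the logs alone do not prove it.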

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-10 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: 7504-94.patch

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-94.patch, 7504-trunk v1.patch, 7504-trunk v2.patch




[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-10 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7504-94.patch, 7504-trunk v1.patch, 7504-trunk v2.patch




[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: 7504-trunk v1.patch

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch




[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: 7504-trunk v1.patch

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch




[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: (was: 7504-trunk v1.patch)

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch




[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Description: 
1. A full GC occurs on the regionserver hosting ROOT.
2. The ZK session times out; the master expires the regionserver and submits 
it to ServerShutdownHandler.
3. The regionserver completes the full GC.
4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns 
true, since the regionserver has resumed and answers the check.
5. ServerShutdownHandler therefore skips assigning the ROOT region.
6. The regionserver aborts itself because it receives YouAreDeadException 
after a regionserver report.
7. ROOT is now offline and will not be assigned again unless the master is 
restarted.



Master Log:
{code}
2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown 
handler to be executed, root=true, meta=false
2012-10-31 19:51:39,045 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for dw88.kgb.sqa.cm4,60020,1351671478752
2012-10-31 19:51:50,113 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server 
dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Server REPORT rejected; currently processing 
dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
2012-10-31 19:52:15,945 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log 
splitting for dw88.kgb.sqa.cm4,60020,1351671478752
{code}

The master log shows no subsequent assignment of ROOT.

Regionserver log:
{code}
2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
229128ms instead of 10ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
{code}




  was:
1. A full GC occurs on the regionserver hosting ROOT.
2. The ZK session times out; the master expires the regionserver and submits 
it to ServerShutdownHandler.
3. The regionserver completes the full GC.
4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns 
true, since the regionserver has resumed and answers the check.
5. ServerShutdownHandler therefore skips assigning the -ROOT- region.
6. The regionserver aborts itself because it receives YouAreDeadException 
after a regionserver report.
7. -ROO- is now offline and will not be assigned again unless the master is 
restarted.



Master Log:
{code}
2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown 
handler to be executed, root=true, meta=false
2012-10-31 19:51:39,045 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for dw88.kgb.sqa.cm4,60020,1351671478752
2012-10-31 19:51:50,113 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server 
dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Server REPORT rejected; currently processing 
dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
2012-10-31 19:52:15,945 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log 
splitting for dw88.kgb.sqa.cm4,60020,1351671478752
{code}

The master log shows no subsequent assignment of -ROOT-.

Regionserver log:
{code}
2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
229128ms instead of 10ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
{code}





 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch



[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: 7504-trunk v2.patch

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch




[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch

