I think there is a problem in 0.90.6.  Rolling restart seems broken.

By mistake I had the previous RC deployed on the cluster and had only
updated the master.

My cluster would not start.  The master would assign out -ROOT- but it
would fail to open on the regionserver with this:

2012-02-27 20:16:09,559 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Processing open of -ROOT-,,0.70236052
2012-02-27 20:16:09,561 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x135c07495b70002 Attempting to transition node
70236052/-ROOT- from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x135c07495b70002 Attempt to transition the
unassigned node for 70236052 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING failed, the node existed but was in the state
M_SERVER_SHUTDOWN set by the server sv4r11s38:7001
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
transition from OFFLINE to OPENING for region=70236052
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Region
was hijacked? It no longer exists, encodedName=70236052

See how it's reading a state of M_ZK_REGION_OFFLINE as
M_SERVER_SHUTDOWN?

This seems to be because of this commit:

------------------------------------------------------------------------
r1244137 | tedyu | 2012-02-14 09:54:23 -0800 (Tue, 14 Feb 2012) | 3 lines

HBASE-5379  Backport HBASE-4287 to 0.90 - If region opening fails, try
            to transition region back to "offline" in ZK (Ram)


It does this:

Index: src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java    (revision 1090348)
+++ src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java    (working copy)
@@ -107,6 +107,7 @@
     RS_ZK_REGION_CLOSED       (2),   // RS has finished closing a region
     RS_ZK_REGION_OPENING      (3),   // RS is in process of opening a region
     RS_ZK_REGION_OPENED       (4),   // RS has finished opening a region
+    RS_ZK_REGION_FAILED_OPEN  (5),   // RS failed to open a region

     // Messages originating from Master to RS
     M_RS_OPEN_REGION          (20),  // Master asking RS to open a region

If you look at EventType in EventHandler, the constructor does nothing
with the passed value.  That's a problem.  It means the enum is using
its default ordinal, and inserting the new value into the middle of the
enum shifts every later constant up by one; M_ZK_REGION_OFFLINE is just
before M_SERVER_SHUTDOWN.
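
To make the shift concrete, here is a rough standalone sketch.  It is
not the HBase code: the two enums are trimmed-down stand-ins for
EventType as the old regionserver and the new master would see it, the
codes 50 and 70 are values I assume for illustration, and
decodeByOrdinal/decodeByCode are hypothetical helpers.  It only shows
how an ordinal written by the new side is misread by the old side,
while a lookup on the explicit code would not be.

```java
public class OrdinalShiftDemo {
    // Stand-in for EventType on the OLD regionserver (no FAILED_OPEN).
    enum OldEventType {
        RS_ZK_REGION_CLOSED(2),
        RS_ZK_REGION_OPENING(3),
        RS_ZK_REGION_OPENED(4),
        M_RS_OPEN_REGION(20),
        M_ZK_REGION_OFFLINE(50),   // code assumed for illustration
        M_SERVER_SHUTDOWN(70);     // code assumed for illustration

        final int code;
        OldEventType(int code) { this.code = code; }
    }

    // Stand-in for EventType on the NEW master (FAILED_OPEN inserted).
    enum NewEventType {
        RS_ZK_REGION_CLOSED(2),
        RS_ZK_REGION_OPENING(3),
        RS_ZK_REGION_OPENED(4),
        RS_ZK_REGION_FAILED_OPEN(5),
        M_RS_OPEN_REGION(20),
        M_ZK_REGION_OFFLINE(50),   // code assumed for illustration
        M_SERVER_SHUTDOWN(70);     // code assumed for illustration

        final int code;
        NewEventType(int code) { this.code = code; }
    }

    // Old side decoding a value that was written as an ordinal.
    static OldEventType decodeByOrdinal(int ordinal) {
        return OldEventType.values()[ordinal];
    }

    // Old side decoding a value that was written as the explicit code.
    // Returns null for a code this side does not know about.
    static OldEventType decodeByCode(int code) {
        for (OldEventType t : OldEventType.values()) {
            if (t.code == code) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NewEventType sent = NewEventType.M_ZK_REGION_OFFLINE;
        // Ordinal 5 on the new side lands on M_SERVER_SHUTDOWN on the old side.
        System.out.println("by ordinal: " + decodeByOrdinal(sent.ordinal()));
        // The explicit code still maps to M_ZK_REGION_OFFLINE.
        System.out.println("by code:    " + decodeByCode(sent.code));
    }
}
```

With these stand-ins the ordinal path gives M_SERVER_SHUTDOWN for what
the master wrote as M_ZK_REGION_OFFLINE, which matches the WARN in the
log above.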

It looks like we need to back out HBASE-5379 from the 0.90 branch and
cut a new RC.

Does rolling restart work for you, Ram?

St.Ack


On Sat, Feb 18, 2012 at 11:25 PM, rama krishna <[email protected]> wrote:
>
> Hi Devs
> The download of 0.90.6RC4 is available at
> http://people.apache.org/~ramkrishna/0.90.6RC4/
> The release has been signed by Stack as my key is not yet registered with
> web of trust.
> Regarding the new issues added to 0.90 after RC3 are
>   HBASE-5377  Fix licenses on the 0.90 branch.
>   HBASE-5379  Backport HBASE-4287 to 0.90 - If region opening fails, try to
>               transition region back to "offline" in ZK
>   HBASE-5396  Handle the regions in regionPlans while processing
>               ServerShutdownHandler (Jieshan)
> Improvements
>   HBASE-5327  Print a message when an invalid hbase.rootdir is passed
>               (Jimmy Xiang)
>   HBASE-5197  [replication] Handle socket timeouts in ReplicationSource
>               to prevent DDOS
>   HBASE-5395  CopyTable needs to use GenericOptionsParser
> I would like to freeze the check ins to 0.90 till this RC goes out of
> release.
> Please provide your votes on the release.  The voting closes on 25th Feb.
> Hope to release out 0.90.6 before Feb ends.
> Thanks to all who contributed and looking forward for your support.
> Regards
> Ram
>
>
>
