Re: CDH4.4 and HBASE-8912 issue

2013-10-21 Thread Boris Emelyanov

> Boris, what does hbck say?
>
> We have had this issue a couple of times before. To fix it I had to stop the
> cluster, run the offline meta repair tool, and
> delete the zk-store on each zk quorum node.
> The offline meta repair tool will not work if there are inconsistencies in
> HBase; you had better try hbase hbck -fixAll first.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@...


hbck reports 0 inconsistencies detected.
I stopped the HBase cluster, deleted the zk-database on all quorum nodes, ran
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair,
and got "INFO util.HBaseFsck: Success! .META. table rebuilt."
After that, the cluster continued crashing during automatic load balancing.


--
Best regards,

Boris Emelyanov.



Re: CDH4.4 and HBASE-8912 issue

2013-10-21 Thread Samir Ahmic
Hi, Boris

Did you check the RS logs? There should be an exception explaining why the
assignment failed. Can you paste that exception?

Cheers :)






Re: CDH4.4 and HBASE-8912 issue

2013-10-21 Thread Boris Emelyanov


Hi, Samir! Thank you for your answers!

Actually, as far as I can tell, the assignment did not fail.
Here are my logs (the clocks on the two hosts may be slightly out of sync):

on master:

2013-10-21 12:27:51,541 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. destination server is testhadoop-102.example.com,60020,1382339032897
2013-10-21 12:27:51,541 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.; plan=hri=mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b., src=, dest=testhadoop-102.example.com,60020,1382339032897
2013-10-21 12:27:51,541 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. to testhadoop-102.example.com,60020,1382339032897
2013-10-21 12:27:51,576 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-10-21 12:27:51,577 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. state=PENDING_OPEN, ts=1382344071576, server=testhadoop-102.example.com,60020,1382339032897 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. state=PENDING_OPEN, ts=1382344071576, server=testhadoop-102.example.com,60020,1382339032897 .. Cannot transit it to OFFLINE.
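
For readers triaging a similar abort, the fields of the FATAL record above can be pulled apart mechanically. What follows is a minimal Python sketch and not part of the original thread: the helper names (parse_unexpected_state, epoch_ms_to_utc) and the regular expression are illustrative assumptions written against the log text quoted above, not an HBase API. It assumes the ts value and the server start code are epoch milliseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Regex written against the "Unexpected state" record quoted above (an
# assumption, not an official HBase log grammar).
# Server names have the form host,port,startcode.
STATE_RE = re.compile(
    r"Unexpected state : (?P<region>\S+) "
    r"state=(?P<state>\w+), ts=(?P<ts>\d+), "
    r"server=(?P<host>[^,\s]+),(?P<port>\d+),(?P<startcode>\d+)"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(millis))


def parse_unexpected_state(log_text):
    """Return the fields of the first 'Unexpected state' record, or None."""
    # Mail clients hard-wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    match = STATE_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    # The encoded region name is the hash after the last interior dot.
    fields["encoded_name"] = fields["region"].rstrip(".").rsplit(".", 1)[1]
    fields["state_changed_utc"] = epoch_ms_to_utc(fields["ts"])
    fields["server_started_utc"] = epoch_ms_to_utc(fields["startcode"])
    return fields


# The FATAL record from the master log above, hard-wrapped as in the mail.
SAMPLE = """
2013-10-21 12:27:51,577 FATAL org.apache.hadoop.hbase.master.HMaster:
Unexpected state :
mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
state=PENDING_OPEN, ts=1382344071576,
server=testhadoop-102.example.com,60020,1382339032897 .. Cannot transit
it to OFFLINE.
"""

if __name__ == "__main__":
    info = parse_unexpected_state(SAMPLE)
    print(info["state"], info["encoded_name"])
    print(info["state_changed_utc"].isoformat())
    print(info["server_started_utc"].isoformat())
```

On the sample, ts=1382344071576 converts to 2013-10-21T08:27:51.576 UTC. Compared with the 12:27:51,577 stamp on the log line, that suggests the master writes its log in a UTC+4 local time zone, which is worth keeping in mind when lining up master and regionserver logs.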


on affected regionserver:

2013-10-21 12:27:52,561 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.. Version of ZK closing node:0
2013-10-21 12:27:52,562 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
2013-10-21 12:27:52,563 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.: disabling compactions & flushes
2013-10-21 12:27:52,563 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
2013-10-21 12:27:52,564 INFO org.apache.hadoop.hbase.regionserver.Store: Closed a
2013-10-21 12:27:52,564 INFO org.apache.hadoop.hbase.regionserver.Store: Closed b
2013-10-21 12:27:52,565 INFO org.apache.hadoop.hbase.regionserver.Store: Closed c
2013-10-21 12:27:52,566 INFO org.apache.hadoop.hbase.regionserver.Store: Closed d
2013-10-21 12:27:52,566 INFO org.apache.hadoop.hbase.regionserver.Store: Closed e
2013-10-21 12:27:52,567 INFO org.apache.hadoop.hbase.regionserver.Store: Closed f
2013-10-21 12:27:52,567 INFO org.apache.hadoop.hbase.regionserver.Store: Closed g
2013-10-21 12:27:52,567 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
2013-10-21 12:27:52,567 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x241bc934d55039b Attempting to transition node 45e518b477eeac50872de5a73d74f05b from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2013-10-21 12:27:52,600 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:

Re: CDH4.4 and HBASE-8912 issue

2013-10-21 Thread Samir Ahmic
I can't see anything wrong in your logs, but the fact that you trigger this
issue by running the balancer makes me think that some of your RSs may have a
problem. Here is what I would do in this situation:

1. Make sure that system time, OS configuration, and Hadoop/HBase configuration
are in sync on all servers.
2. Try to isolate the issue: first start the HMaster, then add regionservers
one by one to determine whether a particular regionserver causes it.
3. Check what Hadoop says about the HBase data (hadoop fsck /hbase -files
-locations -blocks).
4. Try to determine whether some of your regions have issues (hbase hbck
-details).
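
Steps 3 and 4 of the checklist above can be run in one pass. This is a minimal Python sketch, not something from the original thread: the two command lines are copied from the checklist, while the names DIAGNOSTICS and run_diagnostics are hypothetical helpers introduced here for illustration. It only reports exit codes; interpreting the output of fsck and hbck is still a manual step.

```python
import shutil
import subprocess

# Command lines quoted from steps 3 and 4 above; nothing cluster-specific.
DIAGNOSTICS = [
    ["hadoop", "fsck", "/hbase", "-files", "-locations", "-blocks"],
    ["hbase", "hbck", "-details"],
]


def run_diagnostics(commands=DIAGNOSTICS, runner=subprocess.run):
    """Run each command whose binary is on PATH.

    Returns a dict mapping the command line to its exit code, or to None
    when the binary is not installed on this host.
    """
    results = {}
    for argv in commands:
        line = " ".join(argv)
        if shutil.which(argv[0]) is None:
            results[line] = None  # binary not found; skip instead of failing
            continue
        completed = runner(argv, capture_output=True, text=True)
        results[line] = completed.returncode
    return results


if __name__ == "__main__":
    for line, code in run_diagnostics().items():
        print(f"{line}: {'not installed' if code is None else code}")
```

Running it on each node in turn gives a quick view of which hosts are missing the client tools or return a non-zero exit code.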

Good luck :)



CDH4.4 and HBASE-8912 issue

2013-10-17 Thread Boris Emelyanov
Hello! I've just upgraded my Hadoop test cluster from CDH3 to CDH4.4
with hbase-0.94.6 and have run into the
https://issues.apache.org/jira/browse/HBASE-8912 issue.


The suggested solution was to update HBase to version 0.94.13, which is
absent from the Cloudera distribution.


Is it possible to run pure HBase on top of Cloudera Hadoop?

Or how can I find out whether this bug is present in previous versions of CDH?

--
Best regards,

Boris Emelyanov.



Re: CDH4.4 and HBASE-8912 issue

2013-10-17 Thread Ted Yu
If I read Lars' comment on the JIRA correctly, HBASE-8912's target was moved to
0.94.13.

It is still open, meaning that if there is no patch, the target may move to the
next release.

Cheers



Re: CDH4.4 and HBASE-8912 issue

2013-10-17 Thread Harsh J
Moving to u...@hbase.apache.org.

HBASE-8912 is still unresolved, and 0.94.13 is just the targeted
version presently.

Are you certain this is the exact issue you're hitting? I believe you
can work around this by removing the specific bad znode in ZK or so.
When starting up after your major upgrade, did you make sure to clean
your ZK /hbase znode?




-- 
Harsh J


Re: CDH4.4 and HBASE-8912 issue

2013-10-17 Thread Boris Emelyanov




> Are you certain this is the exact issue you're hitting?

I guess so. The description fits exactly!

> I believe you can workaround this by removing the specific bad znode
> in ZK or so.

When I disable the load balancer just after master startup (balance_switch
false), the cluster works fine.
But when the load balancer is enabled and runs, it always fails with the
exception described in the issue, each time on different regions.

> When starting up after your major upgrade, did you ensure cleaning
> your ZK /hbase znode?

Yes, of course.

--
Best regards,

Boris Emelyanov.