Re: CDH4.4 and HBASE-8912 issue
> Boris, what does hbck say?
>
> We have had this issue a couple of times before. To fix it I had to stop the cluster, run the offline meta repair tool, and delete the zk-store on each ZK quorum node.
>
> Offline Meta repair tool will not work if there are inconsistencies in HBase - you'd better try hbase hbck -fixAll first.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@...

Hbck says 0 inconsistencies detected. I stopped the HBase cluster, deleted the ZK database on all quorum nodes, ran hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair, and got "INFO util.HBaseFsck: Success! .META. table rebuilt." After that, the cluster continued crashing during auto-loadbalancing.

--
Best regards, Boris Emelyanov.
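Vladimir's advice boils down to a precondition: look at plain hbck output before reaching for OfflineMetaRepair. A minimal Python sketch of that check, assuming hbck prints a summary line of the form "N inconsistencies detected." (the form Boris quotes) and that the hbase launcher is on PATH. The helper names are hypothetical, and the sketch never runs a repair itself.

```python
import re
import subprocess
from typing import Optional

# Matches the hbck summary line quoted in this thread, e.g. "0 inconsistencies detected."
_INCONSISTENCY_RE = re.compile(r"(\d+) inconsistencies detected")


def count_inconsistencies(hbck_output: str) -> Optional[int]:
    """Return the inconsistency count reported by hbck, or None if no summary line is present."""
    match = _INCONSISTENCY_RE.search(hbck_output)
    return int(match.group(1)) if match else None


def offline_meta_repair_reasonable(hbck_output: str) -> bool:
    """Following the advice above: consider OfflineMetaRepair only after hbck reports 0 inconsistencies."""
    return count_inconsistencies(hbck_output) == 0


def run_hbck() -> str:
    """Run a report-only `hbase hbck` (no -fix options) and return its combined output."""
    result = subprocess.run(["hbase", "hbck"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, offline_meta_repair_reasonable(run_hbck()) returns True only when the summary line reports 0. The offline repair itself would still be run by hand with the cluster stopped, as Boris did.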
Re: CDH4.4 and HBASE-8912 issue
Hi Boris,

Did you check the RS logs? There should be an exception explaining why the assignment failed. Can you paste that exception?

Cheers :)

On Mon, Oct 21, 2013 at 9:53 AM, Boris Emelyanov emelya...@post.km.ru wrote:
> [...]
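Samir's question can be answered by scanning the regionserver log for entries about the affected region. A minimal Python sketch, assuming the one-entry-per-line layout seen in this thread and filtering on the region's encoded name; problem_lines is a hypothetical helper, and stack-trace continuation lines are skipped rather than attached to their entry.

```python
import re
from typing import Iterable, List

# Entry layout seen in this thread:
# "2013-10-21 12:27:51,576 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: ..."
_ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[^\s:]+): (?P<msg>.*)$"
)
_PROBLEM_LEVELS = {"WARN", "ERROR", "FATAL"}


def problem_lines(lines: Iterable[str], encoded_region: str) -> List[str]:
    """Return log entries mentioning the region that are WARN or worse or name an exception."""
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = _ENTRY_RE.match(line)
        # Skip stack-trace continuation lines and entries about other regions.
        if match is None or encoded_region not in line:
            continue
        if match.group("level") in _PROBLEM_LEVELS or "Exception" in match.group("msg"):
            hits.append(line)
    return hits


# Example (the log path is a placeholder for your actual RS log):
# with open("regionserver.log") as log:
#     for entry in problem_lines(log, "45e518b477eeac50872de5a73d74f05b"):
#         print(entry)
```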
Re: CDH4.4 and HBASE-8912 issue
On 21.10.2013 12:17, Samir Ahmic wrote:
> [...]

Hi Samir! Thank you for your answers!

Actually, as far as I can tell, the assignment did not fail. Here are my logs (the clocks may be slightly out of sync).

On the master:

    2013-10-21 12:27:51,541 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. destination server is testhadoop-102.example.com,60020,1382339032897
    2013-10-21 12:27:51,541 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.; plan=hri=mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b., src=, dest=testhadoop-102.example.com,60020,1382339032897
    2013-10-21 12:27:51,541 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. to testhadoop-102.example.com,60020,1382339032897
    2013-10-21 12:27:51,576 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
    2013-10-21 12:27:51,577 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. state=PENDING_OPEN, ts=1382344071576, server=testhadoop-102.example.com,60020,1382339032897 .. Cannot transit it to OFFLINE.
    java.lang.IllegalStateException: Unexpected state : mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b. state=PENDING_OPEN, ts=1382344071576, server=testhadoop-102.example.com,60020,1382339032897 .. Cannot transit it to OFFLINE.

On the affected regionserver:

    2013-10-21 12:27:52,561 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.. Version of ZK closing node:0
    2013-10-21 12:27:52,562 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
    2013-10-21 12:27:52,563 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.: disabling compactions & flushes
    2013-10-21 12:27:52,563 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
    2013-10-21 12:27:52,564 INFO org.apache.hadoop.hbase.regionserver.Store: Closed a
    2013-10-21 12:27:52,564 INFO org.apache.hadoop.hbase.regionserver.Store: Closed b
    2013-10-21 12:27:52,565 INFO org.apache.hadoop.hbase.regionserver.Store: Closed c
    2013-10-21 12:27:52,566 INFO org.apache.hadoop.hbase.regionserver.Store: Closed d
    2013-10-21 12:27:52,566 INFO org.apache.hadoop.hbase.regionserver.Store: Closed e
    2013-10-21 12:27:52,567 INFO org.apache.hadoop.hbase.regionserver.Store: Closed f
    2013-10-21 12:27:52,567 INFO org.apache.hadoop.hbase.regionserver.Store: Closed g
    2013-10-21 12:27:52,567 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
    2013-10-21 12:27:52,567 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x241bc934d55039b Attempting to transition node 45e518b477eeac50872de5a73d74f05b from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
    2013-10-21 12:27:52,600 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
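The master's FATAL entry already names the region, its in-memory state (PENDING_OPEN), the target server and the state the master tried to move it to (OFFLINE). A minimal Python sketch that extracts those fields, assuming the exact message wording shown above; parse_unexpected_state is a hypothetical helper. Converting ts to UTC makes it easier to line the entry up with regionserver logs whose clocks, as noted, may be slightly off.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Wording of the master's FATAL message in the log above.
_UNEXPECTED_RE = re.compile(
    r"Unexpected state : (?P<region>\S+) state=(?P<state>[A-Z_]+), ts=(?P<ts>\d+), "
    r"server=(?P<server>\S+) \.\. Cannot transit it to (?P<target>[A-Z_]+)\."
)
# The encoded region name is the 32-hex-digit suffix between the last two dots.
_ENCODED_RE = re.compile(r"\.([0-9a-f]{32})\.$")


def parse_unexpected_state(line: str) -> Optional[dict]:
    """Extract region, encoded name, state, UTC timestamp, server and target state, or None."""
    match = _UNEXPECTED_RE.search(line)
    if match is None:
        return None
    region = match.group("region")
    encoded = _ENCODED_RE.search(region)
    ts_ms = int(match.group("ts"))
    # Convert the millisecond epoch timestamp exactly, avoiding float rounding.
    ts = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ts_ms % 1000)
    return {
        "region": region,
        "encoded": encoded.group(1) if encoded else None,
        "state": match.group("state"),
        "ts": ts,
        "server": match.group("server"),
        "target": match.group("target"),
    }
```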
Re: CDH4.4 and HBASE-8912 issue
I can't see anything wrong in your logs, but the fact that you trigger this issue by running the balancer makes me think that some of your RSs may have a problem. Here is what I would do in this situation:

1. Make sure that the system time, OS configuration, and Hadoop/HBase configuration are synced on all servers.
2. Try to isolate the issue: first start the HMaster, then add regionservers one by one, to determine whether one of them causes the issue.
3. Check what Hadoop says about the HBase data (hadoop fsck /hbase -files -locations -blocks).
4. Try to determine whether some of your regions have issues (hbase hbck -details).

Good luck :)

On Mon, Oct 21, 2013 at 11:24 AM, Boris Emelyanov emelya...@post.km.ru wrote:
> [...]
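Step 1 (synced system time) is easy to check mechanically once you have each server's clock reading. A minimal Python sketch, assuming the readings are already collected as epoch milliseconds (collection, e.g. over ssh, is out of scope) and using 30000 ms as the threshold, which as far as I know is the 0.94 default for hbase.master.maxclockskew; check your own configuration. skewed_hosts is a hypothetical helper.

```python
from typing import Dict, List

# Assumed default of hbase.master.maxclockskew in 0.94 (milliseconds); verify in hbase-site.xml.
DEFAULT_MAX_SKEW_MS = 30000


def skewed_hosts(clocks_ms: Dict[str, int], reference_host: str,
                 max_skew_ms: int = DEFAULT_MAX_SKEW_MS) -> List[str]:
    """Return hosts whose clock differs from the reference host by more than max_skew_ms."""
    reference = clocks_ms[reference_host]
    return sorted(
        host for host, now_ms in clocks_ms.items()
        if abs(now_ms - reference) > max_skew_ms
    )
```

A much lower max_skew_ms is useful when the goal is simply to correlate master and regionserver log timestamps, as in the logs quoted earlier in this thread.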
CDH4.4 and HBASE-8912 issue
Hello!

I've just upgraded my Hadoop test cluster from CDH3 to CDH4.4 with hbase-0.94.6 and have run into the https://issues.apache.org/jira/browse/HBASE-8912 issue. The suggested solution was to update HBase to version 0.94.13, which is absent from the Cloudera distribution. Is it possible to run pure (Apache) HBase over Cloudera Hadoop? Or how can I find out whether this bug is present in previous versions of CDH?

--
Best regards, Boris Emelyanov.
Re: CDH4.4 and HBASE-8912 issue
If I read Lars' comment on the JIRA correctly, HBASE-8912's target was moved to 0.94.13. It is still open, meaning that if there is no patch, the target may move to the next release.

Cheers

On Oct 17, 2013, at 2:25 AM, Boris Emelyanov emelya...@post.km.ru wrote:
> [...]
Re: CDH4.4 and HBASE-8912 issue
Moving to u...@hbase.apache.org.

HBASE-8912 is still unresolved, and 0.94.13 is just the targeted version presently. Are you certain this is the exact issue you're hitting? I believe you can work around this by removing the specific bad znode in ZK or so. When starting up after your major upgrade, did you ensure cleaning your ZK /hbase znode?

On Thu, Oct 17, 2013 at 2:55 PM, Boris Emelyanov emelya...@post.km.ru wrote:
> [...]

--
Harsh J
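To locate "the specific bad znode" for a region, you need its encoded name, the 32-hex-digit suffix of the full region name (the same value the regionserver log uses as the ZK node name). A minimal Python sketch, assuming the default zookeeper.znode.parent of /hbase and the 0.94 layout where regions in transition live under /hbase/unassigned/<encoded name>; encoded_name and unassigned_znode are hypothetical helpers.

```python
import re

# Full region names look like "<table>,<start key>,<region id>.<encoded name>."
_ENCODED_RE = re.compile(r"\.([0-9a-f]{32})\.$")


def encoded_name(region_name: str) -> str:
    """Extract the 32-hex-digit encoded name from a full region name."""
    match = _ENCODED_RE.search(region_name)
    if match is None:
        raise ValueError(f"no encoded name in {region_name!r}")
    return match.group(1)


def unassigned_znode(region_name: str, znode_parent: str = "/hbase") -> str:
    """Build the assumed 0.94 path of the region's in-transition znode."""
    return f"{znode_parent.rstrip('/')}/unassigned/{encoded_name(region_name)}"
```

For the region in Boris's logs this gives /hbase/unassigned/45e518b477eeac50872de5a73d74f05b. It is worth inspecting that node (for example with get inside hbase zkcli) before deleting anything.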
Re: CDH4.4 and HBASE-8912 issue
On 17.10.2013 13:43, Harsh J wrote:
> Are you certain this is the exact issue you're hitting?

I guess so. The description fits just fine!

> I believe you can work around this by removing the specific bad znode in ZK or so.

When I disable the load balancer just after master startup (balance_switch false), the cluster works fine. But when the load balancer is enabled and starts, it always fails with the exception described in the issue, each time on a different region.

> When starting up after your major upgrade, did you ensure cleaning your ZK /hbase znode?

Yes, of course.

--
Best regards, Boris Emelyanov.
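Boris's workaround, switching the balancer off right after master startup, can be scripted. A minimal Python sketch that feeds balance_switch false to hbase shell on stdin, assuming the hbase launcher is on PATH and that the shell exits at end of input; balancer_command and set_balancer are hypothetical helpers, and dry_run defaults to True so nothing runs by accident.

```python
import subprocess
from typing import List, Optional, Tuple


def balancer_command(enabled: bool) -> Tuple[List[str], str]:
    """Return the argv and the stdin script that toggle the balancer via the HBase shell."""
    state = "true" if enabled else "false"
    return ["hbase", "shell"], f"balance_switch {state}\n"


def set_balancer(enabled: bool, dry_run: bool = True) -> Optional[str]:
    """Toggle the balancer; in dry-run mode only print what would be executed."""
    argv, script = balancer_command(enabled)
    if dry_run:
        print(f"would run: echo {script.strip()!r} | {' '.join(argv)}")
        return None
    result = subprocess.run(argv, input=script, capture_output=True, text=True, check=True)
    # The shell output should include the previous balancer state reported by balance_switch.
    return result.stdout
```

Running set_balancer(False, dry_run=False) after each master restart mirrors what Boris does by hand; it hides the symptom rather than fixing HBASE-8912.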