Hi Wellington,

Indeed, the META is now deployed.  I found the namespace region encoded
name using hdfs dfs -ls -R /hbase/data/hbase/namespace and it gives
me 7f4a480f47f98300185d1ae2ff663295. But here again, HBCK doesn't want to
do anything because the master is initializing :( I tried with ad without
the maintenant flag and I get the same result.

On HBCK2 side: PleaseHoldException: Master is initializing
On the master side, it just stoped after 5 minutes trying to assign
namespace :(

JMS


Le mer. 13 mars 2019 à 12:04, Wellington Chevreuil <
[email protected]> a écrit :

> "1588230740" would be the meta region name, not namespace. It seems meta is
> already online, per below log:
> ...
> 2019-03-12 20:53:41,037 INFO  [master/node2:60000:becomeActiveMaster]
> master.HMaster: hbase:meta {1588230740 state=OPEN, ts=1552438420570,
> server=
> node7.distparser.com,16020,1552421510124}
> ...
>
> The maintenance mode I suggested before was to have master doing minimum
> required stuff while attempting to getting meta/namespace online, but I
> guess it wouldn't be able to avoid such timeouts. Below message also means
> AM could read meta table, giving another indication meta is fine:
> ...
> 2019-03-12 20:53:45,942 INFO  [master/node2:60000:becomeActiveMaster]
> assignment.AssignmentManager: Joined the cluster in 308msec
> ...
>
> Now issue is namespace table. For some reason, AM is not able to kick APs
> before the 5 minutes timeout exceeds, and that's probably why namespace
> table never comes available:
> ...
> 2019-03-12 20:53:45,942 INFO  [master/node2:60000:becomeActiveMaster]
> assignment.AssignmentManager: Joined the cluster in 308msec
> 2019-03-12 20:54:45,725 INFO
> [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0]
> zookeeper.ZooKeeper: Session: 0x16911bd542a02a2 closed
> 2019-03-12 20:54:45,725 INFO
> [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0-EventThread]
> zookeeper.ClientCnxn: EventThread shut down for session: 0x16911bd542a02a2
> 2019-03-12 20:58:46,603 ERROR [master/node2:60000:becomeActiveMaster]
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
> ...
>
> You may be able to force namespace region coming online with hbck2 assigns
> command. You would need to find out the namespace region name first, you
> can either scan meta table or check the region dir name in hdfs with "hdfs
> dfs -ls -R /hbase | grep namespace", in order to pass it as a param for
>
> Em qua, 13 de mar de 2019 às 13:00, Jean-Marc Spaggiari <
> [email protected]> escreveu:
>
> > Hi Sean,
> >
> > I tried. I looked-up the region name for base:namespace like this:
> >
> > hdfs dfs -ls /hbase/data/hbase/meta/
> >
> > And found the region to be 1588230740.
> >
> > The master dies after 5 minutes, so I start the master, wait 2 minutes to
> > be sure it's up, and run the following command:
> >
> > bin/hbase hbck -j
> >
> test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
> > assigns 1588230740
> >
> > But HBCK2 doesn't like it:
> > 08:57:35.273 [main] INFO
> > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception,
> > tries=9, retries=16, started=29322 ms ago, cancelled=false,
> > msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3057)
> > at
> >
> >
> org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:942)
> > at
> >
> >
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> > at
> > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> > at
> > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> >
> >
> > It keeps retrying and after 16 times it stopped saying the master is not
> > initialized.
> >
> > On the WebUI I can see that there is a single region assigned, the META
> > region.
> >
> > Also, here is the HDFS structure of my META table. Sounds like some parts
> > got lost in the process (The info content).
> >
> > hbase@node2:~/hbase-2.2.0$ hdfs dfs -ls -R /hbase/data/hbase/meta/
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:42
> > /hbase/data/hbase/meta/.tabledesc
> > -rw-r--r--   3 hbase supergroup       1447 2019-03-12 15:42
> > /hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:42
> > /hbase/data/hbase/meta/.tmp
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:49
> > /hbase/data/hbase/meta/1588230740
> > -rw-r--r--   3 hbase supergroup         32 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/.regioninfo
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/info
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/recovered.edits
> > -rw-r--r--   3 hbase supergroup          0 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/recovered.edits/2.seqid
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:42
> > /hbase/data/hbase/meta/1588230740/rep_barrier
> > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:47
> > /hbase/data/hbase/meta/1588230740/table
> > -rw-r--r--   3 hbase supergroup       5454 2019-03-12 15:47
> > /hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
> >
> > What will be the next best step?
> >
> > Thanks,
> >
> > JMS
> >
> >
> >
> > Le mer. 13 mars 2019 à 08:45, Sean Busbey <[email protected]> a écrit :
> >
> > > Okay so master thinks hbase:namespace is already enabled, but no RS
> > > believes it should be hosting the regions.
> > >
> > > Can you find the region name for the hbase:namespace region and issue
> > > an hbck2 assigns command for it?
> > >
> > > On Tue, Mar 12, 2019 at 8:26 PM Jean-Marc Spaggiari
> > > <[email protected]> wrote:
> > > >
> > > > It doesn't say that much :(
> > > >
> > > > hbase@node2:~/hbase-2.2.0$ cat logs/hbase-hbase-master-node2.log  |
> > > grep -i
> > > > namespace
> > > > Caused by: java.io.IOException: Timedout 300000ms waiting for
> namespace
> > > > table to be assigned and enabled: tableName=hbase:namespace,
> > > state=ENABLED
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > > > Caused by: java.io.IOException: Timedout 300000ms waiting for
> namespace
> > > > table to be assigned and enabled: tableName=hbase:namespace,
> > > state=ENABLED
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > > >
> > > > I cleared the logs before restarting the instance. That all what it
> > says
> > > > about namespace.
> > > >
> > > > Full logs are available there: https://pastebin.com/9j2Rzdcg
> > > >
> > > > Le mar. 12 mars 2019 à 20:47, Sean Busbey <[email protected]> a
> > > écrit :
> > > >
> > > > > okay so the master spent ~5 minutes waiting to see if it could get
> > the
> > > > > namespace table working. when it couldn't it aborted.
> > > > >
> > > > > can you look back over that 5 minutes and see what the master had
> to
> > > > > say about the namespace table? did the master think some particular
> > > > > server should have it open already? was it waiting for someone to
> > > > > finish opening or closing it?
> > > > >
> > > > > On Tue, Mar 12, 2019 at 6:39 PM Jean-Marc Spaggiari
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Le mar. 12 mars 2019 à 19:25, Sean Busbey <[email protected]> a
> > > écrit :
> > > > > >
> > > > > > > your command above points at the wrong jar from the hbck2 repo.
> > > it's
> > > > > > >
> > > > > > pointing at the one where you need to manually assemble all the
> > > > > > > dependencies it has.
> > > > > > >
> > > > > > > You want the one that does not say "original" in the name.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Ha!!! That's why! Way easier ;)
> > > > > >
> > > > > > indeed, this works even without removing all environment
> variables:
> > > > > >  bin/hbase hbck -j
> > > > > >
> > > > >
> > >
> >
> test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > Can you confirm it's the one in
> > > > > > > > > the bin tarball? what does the version command output? What
> > > does
> > > > > the
> > > > > > > > > mapredcp command output? What does the cli help for the
> hbase
> > > > > command
> > > > > > > > > show?
> > > > > > > > >
> > > > > > > >
> > > > > > > >  hbase@node2:~/hbase-2.2.0$ hbase mapredcp
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> /home/hbase/hbase-2.2.0/bin/../lib/shaded-clients/hbase-shaded-mapreduce-2.2.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar
> > > > > > > >
> > > > > > >
> > > > > > > that looks great now.
> > > > > > >
> > > > > > >
> > > > > > > Once you correct the hbck2 jar above I think you'll be good for
> > > > > invoking
> > > > > > > HBCK2.
> > > > > > >
> > > > > > > Next, what does the initializing master say it's doing? It
> should
> > > be
> > > > > > > on the master UI near the bottom. If it hasn't made progress
> > since
> > > > > > > your last update it'll be waiting for the hbase:namespace
> table.
> > > If it
> > > > > > > is, find the region and see what the last few messages in the
> > > master
> > > > > > > log are about that region.
> > > > > > >
> > > > > >
> > > > > > The master died some times ago. It dies after 5 minutes.
> > > > > >
> > > > > > 2019-03-12 19:35:58,568 ERROR
> > [master/node2:60000:becomeActiveMaster]
> > > > > > master.HMaster: Failed to become active master
> > > > > > java.lang.IllegalStateException: Expected the service
> > > > > > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service
> > has
> > > > > FAILED
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1341)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1119)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2347)
> > > > > > at
> > > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:595)
> > > > > > at java.lang.Thread.run(Thread.java:748)
> > > > > > Caused by: java.io.IOException: Timedout 300000ms waiting for
> > > namespace
> > > > > > table to be assigned and enabled: tableName=hbase:namespace,
> > > > > state=ENABLED
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> > > > > > at
> > > > > >
> > > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1339)
> > > > > > ... 4 more
> > > > > >
> > > > > >
> > > > > > I just restarted it. I can see the meta table being assigned. I
> can
> > > > > access
> > > > > > the WebUI and I don't see any initializing information. On the
> > table
> > > > > > section, I don't see anything, in any tab. However, when doing
> > > "list" on
> > > > > > the shell, I can see my tables. But I can not scan them. Scanning
> > any
> > > > > table
> > > > > > gives :
> > > > > > hbase(main):001:0> scan 'hbase:namespace'
> > > > > > ROW                                   COLUMN+CELL
> > > > > >
> > > > > >
> > > > > > ERROR: Unknown table hbase:namespace!
> > > > > >
> > > > > > For usage try 'help "scan"'
> > > > > >
> > > > > > Took 1.0395 seconds
> > > > > >
> > > > > >
> > > > > >
> > > > > > JMS
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sean
> > > > >
> > >
> >
>

Reply via email to