Yeah, as I suspected in my previous comment, for this type of timeouts, the
maintenance mode wouldn't give any help. It's weird that AM starts but
apparently does nothing until the namespace 5 mins timeout is reached:
...
2019-03-12 20:53:45,942 INFO  [master/node2:60000:becomeActiveMaster]
assignment.AssignmentManager: Joined the cluster in 308msec
2019-03-12 20:54:45,725 INFO
[ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0]
zookeeper.ZooKeeper: Session: 0x16911bd542a02a2 closed
2019-03-12 20:54:45,725 INFO
[ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0-EventThread]
zookeeper.ClientCnxn: EventThread shut down for session: 0x16911bd542a02a2
2019-03-12 20:58:46,603 ERROR [master/node2:60000:becomeActiveMaster]
master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service
ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
...

I would expect namespace region to be processed by this call
<https://github.com/apache/hbase/blob/2.2.0-RC0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1088>,
if
namespace region is offline, as the timeout suggests. Also odd is that we
don;'t see any logs suggesting offlined regions are getting assigned. Maybe
all regions are already online on RSes? But then master should had figured
that out. Have you already tried restart all RSes? That could kick some
reassignments.

Em qui, 14 de mar de 2019 às 02:21, Jean-Marc Spaggiari <
[email protected]> escreveu:

> Hi Wellington,
>
> Indeed, the META is now deployed.  I found the namespace region encoded
> name using hdfs dfs -ls -R /hbase/data/hbase/namespace and it gives
> me 7f4a480f47f98300185d1ae2ff663295. But here again, HBCK doesn't want to
> do anything because the master is initializing :( I tried with ad without
> the maintenant flag and I get the same result.
>
> On HBCK2 side: PleaseHoldException: Master is initializing
> On the master side, it just stoped after 5 minutes trying to assign
> namespace :(
>
> JMS
>
>
> Le mer. 13 mars 2019 à 12:04, Wellington Chevreuil <
> [email protected]> a écrit :
>
> > "1588230740" would be the meta region name, not namespace. It seems meta
> is
> > already online, per below log:
> > ...
> > 2019-03-12 20:53:41,037 INFO  [master/node2:60000:becomeActiveMaster]
> > master.HMaster: hbase:meta {1588230740 state=OPEN, ts=1552438420570,
> > server=
> > node7.distparser.com,16020,1552421510124}
> > ...
> >
> > The maintenance mode I suggested before was to have master doing minimum
> > required stuff while attempting to getting meta/namespace online, but I
> > guess it wouldn't be able to avoid such timeouts. Below message also
> means
> > AM could read meta table, giving another indication meta is fine:
> > ...
> > 2019-03-12 20:53:45,942 INFO  [master/node2:60000:becomeActiveMaster]
> > assignment.AssignmentManager: Joined the cluster in 308msec
> > ...
> >
> > Now issue is namespace table. For some reason, AM is not able to kick APs
> > before the 5 minutes timeout exceeds, and that's probably why namespace
> > table never comes available:
> > ...
> > 2019-03-12 20:53:45,942 INFO  [master/node2:60000:becomeActiveMaster]
> > assignment.AssignmentManager: Joined the cluster in 308msec
> > 2019-03-12 20:54:45,725 INFO
> > [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0]
> > zookeeper.ZooKeeper: Session: 0x16911bd542a02a2 closed
> > 2019-03-12 20:54:45,725 INFO
> > [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0-EventThread]
> > zookeeper.ClientCnxn: EventThread shut down for session:
> 0x16911bd542a02a2
> > 2019-03-12 20:58:46,603 ERROR [master/node2:60000:becomeActiveMaster]
> > master.HMaster: Failed to become active master
> > java.lang.IllegalStateException: Expected the service
> > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has
> FAILED
> > ...
> >
> > You may be able to force namespace region coming online with hbck2
> assigns
> > command. You would need to find out the namespace region name first, you
> > can either scan meta table or check the region dir name in hdfs with
> "hdfs
> > dfs -ls -R /hbase | grep namespace", in order to pass it as a param for
> >
> > Em qua, 13 de mar de 2019 às 13:00, Jean-Marc Spaggiari <
> > [email protected]> escreveu:
> >
> > > Hi Sean,
> > >
> > > I tried. I looked-up the region name for base:namespace like this:
> > >
> > > hdfs dfs -ls /hbase/data/hbase/meta/
> > >
> > > And found the region to be 1588230740.
> > >
> > > The master dies after 5 minutes, so I start the master, wait 2 minutes
> to
> > > be sure it's up, and run the following command:
> > >
> > > bin/hbase hbck -j
> > >
> >
> test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
> > > assigns 1588230740
> > >
> > > But HBCK2 doesn't like it:
> > > 08:57:35.273 [main] INFO
> > > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception,
> > > tries=9, retries=16, started=29322 ms ago, cancelled=false,
> > > msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
> > > at
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3057)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:942)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> > > at
> > >
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> > > at
> > >
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> > >
> > >
> > > It keeps retrying and after 16 times it stopped saying the master is
> not
> > > initialized.
> > >
> > > On the WebUI I can see that there is a single region assigned, the META
> > > region.
> > >
> > > Also, here is the HDFS structure of my META table. Sounds like some
> parts
> > > got lost in the process (The info content).
> > >
> > > hbase@node2:~/hbase-2.2.0$ hdfs dfs -ls -R /hbase/data/hbase/meta/
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:42
> > > /hbase/data/hbase/meta/.tabledesc
> > > -rw-r--r--   3 hbase supergroup       1447 2019-03-12 15:42
> > > /hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:42
> > > /hbase/data/hbase/meta/.tmp
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:49
> > > /hbase/data/hbase/meta/1588230740
> > > -rw-r--r--   3 hbase supergroup         32 2019-03-12 15:40
> > > /hbase/data/hbase/meta/1588230740/.regioninfo
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:40
> > > /hbase/data/hbase/meta/1588230740/info
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:40
> > > /hbase/data/hbase/meta/1588230740/recovered.edits
> > > -rw-r--r--   3 hbase supergroup          0 2019-03-12 15:40
> > > /hbase/data/hbase/meta/1588230740/recovered.edits/2.seqid
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:42
> > > /hbase/data/hbase/meta/1588230740/rep_barrier
> > > drwxr-xr-x   - hbase supergroup          0 2019-03-12 15:47
> > > /hbase/data/hbase/meta/1588230740/table
> > > -rw-r--r--   3 hbase supergroup       5454 2019-03-12 15:47
> > >
> /hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
> > >
> > > What will be the next best step?
> > >
> > > Thanks,
> > >
> > > JMS
> > >
> > >
> > >
> > > Le mer. 13 mars 2019 à 08:45, Sean Busbey <[email protected]> a écrit
> :
> > >
> > > > Okay so master thinks hbase:namespace is already enabled, but no RS
> > > > believes it should be hosting the regions.
> > > >
> > > > Can you find the region name for the hbase:namespace region and issue
> > > > an hbck2 assigns command for it?
> > > >
> > > > On Tue, Mar 12, 2019 at 8:26 PM Jean-Marc Spaggiari
> > > > <[email protected]> wrote:
> > > > >
> > > > > It doesn't say that much :(
> > > > >
> > > > > hbase@node2:~/hbase-2.2.0$ cat logs/hbase-hbase-master-node2.log
> |
> > > > grep -i
> > > > > namespace
> > > > > Caused by: java.io.IOException: Timedout 300000ms waiting for
> > namespace
> > > > > table to be assigned and enabled: tableName=hbase:namespace,
> > > > state=ENABLED
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > > > > Caused by: java.io.IOException: Timedout 300000ms waiting for
> > namespace
> > > > > table to be assigned and enabled: tableName=hbase:namespace,
> > > > state=ENABLED
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > > > >
> > > > > I cleared the logs before restarting the instance. That all what it
> > > says
> > > > > about namespace.
> > > > >
> > > > > Full logs are available there: https://pastebin.com/9j2Rzdcg
> > > > >
> > > > > Le mar. 12 mars 2019 à 20:47, Sean Busbey <[email protected]>
> a
> > > > écrit :
> > > > >
> > > > > > okay so the master spent ~5 minutes waiting to see if it could
> get
> > > the
> > > > > > namespace table working. when it couldn't it aborted.
> > > > > >
> > > > > > can you look back over that 5 minutes and see what the master had
> > to
> > > > > > say about the namespace table? did the master think some
> particular
> > > > > > server should have it open already? was it waiting for someone to
> > > > > > finish opening or closing it?
> > > > > >
> > > > > > On Tue, Mar 12, 2019 at 6:39 PM Jean-Marc Spaggiari
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Le mar. 12 mars 2019 à 19:25, Sean Busbey <[email protected]>
> a
> > > > écrit :
> > > > > > >
> > > > > > > > your command above points at the wrong jar from the hbck2
> repo.
> > > > it's
> > > > > > > >
> > > > > > > pointing at the one where you need to manually assemble all the
> > > > > > > > dependencies it has.
> > > > > > > >
> > > > > > > > You want the one that does not say "original" in the name.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Ha!!! That's why! Way easier ;)
> > > > > > >
> > > > > > > indeed, this works even without removing all environment
> > variables:
> > > > > > >  bin/hbase hbck -j
> > > > > > >
> > > > > >
> > > >
> > >
> >
> test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Can you confirm it's the one in
> > > > > > > > > > the bin tarball? what does the version command output?
> What
> > > > does
> > > > > > the
> > > > > > > > > > mapredcp command output? What does the cli help for the
> > hbase
> > > > > > command
> > > > > > > > > > show?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >  hbase@node2:~/hbase-2.2.0$ hbase mapredcp
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
> /home/hbase/hbase-2.2.0/bin/../lib/shaded-clients/hbase-shaded-mapreduce-2.2.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar
> > > > > > > > >
> > > > > > > >
> > > > > > > > that looks great now.
> > > > > > > >
> > > > > > > >
> > > > > > > > Once you correct the hbck2 jar above I think you'll be good
> for
> > > > > > invoking
> > > > > > > > HBCK2.
> > > > > > > >
> > > > > > > > Next, what does the initializing master say it's doing? It
> > should
> > > > be
> > > > > > > > on the master UI near the bottom. If it hasn't made progress
> > > since
> > > > > > > > your last update it'll be waiting for the hbase:namespace
> > table.
> > > > If it
> > > > > > > > is, find the region and see what the last few messages in the
> > > > master
> > > > > > > > log are about that region.
> > > > > > > >
> > > > > > >
> > > > > > > The master died some times ago. It dies after 5 minutes.
> > > > > > >
> > > > > > > 2019-03-12 19:35:58,568 ERROR
> > > [master/node2:60000:becomeActiveMaster]
> > > > > > > master.HMaster: Failed to become active master
> > > > > > > java.lang.IllegalStateException: Expected the service
> > > > > > > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the
> service
> > > has
> > > > > > FAILED
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1341)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1119)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2347)
> > > > > > > at
> > > > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:595)
> > > > > > > at java.lang.Thread.run(Thread.java:748)
> > > > > > > Caused by: java.io.IOException: Timedout 300000ms waiting for
> > > > namespace
> > > > > > > table to be assigned and enabled: tableName=hbase:namespace,
> > > > > > state=ENABLED
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> > > > > > > at
> > > > > > >
> > > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1339)
> > > > > > > ... 4 more
> > > > > > >
> > > > > > >
> > > > > > > I just restarted it. I can see the meta table being assigned. I
> > can
> > > > > > access
> > > > > > > the WebUI and I don't see any initializing information. On the
> > > table
> > > > > > > section, I don't see anything, in any tab. However, when doing
> > > > "list" on
> > > > > > > the shell, I can see my tables. But I can not scan them.
> Scanning
> > > any
> > > > > > table
> > > > > > > gives :
> > > > > > > hbase(main):001:0> scan 'hbase:namespace'
> > > > > > > ROW                                   COLUMN+CELL
> > > > > > >
> > > > > > >
> > > > > > > ERROR: Unknown table hbase:namespace!
> > > > > > >
> > > > > > > For usage try 'help "scan"'
> > > > > > >
> > > > > > > Took 1.0395 seconds
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > JMS
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sean
> > > > > >
> > > >
> > >
> >
>

Reply via email to