yabinmeng opened a new issue #91:
URL: https://github.com/apache/pulsar-helm-chart/issues/91
**Describe the bug**
When using the official Helm chart to deploy a K8s-based Pulsar cluster for
version 2.6.x, the **broker** pod is stuck in the "wait-bookkeeper-ready" check
although the bookie pod is up and running without any issue. See the pod list below:
```
$ kubectl -n pulsar get pod
NAME                                             READY   STATUS      RESTARTS   AGE
mytest-pulsar1-bookie-0                          1/1     Running     0          7m14s
mytest-pulsar1-bookie-init-z8gtb                 0/1     Completed   0          7m14s
mytest-pulsar1-broker-0                          0/1     Init:1/2    0          7m14s
mytest-pulsar1-grafana-7bcb854cf4-lmbmj          1/1     Running     0          7m15s
mytest-pulsar1-prometheus-6f79d5c86c-2fdvt       1/1     Running     0          7m15s
mytest-pulsar1-proxy-0                           0/1     Init:1/2    0          7m14s
mytest-pulsar1-pulsar-init-hln7v                 0/1     Completed   0          7m14s
mytest-pulsar1-pulsar-manager-6959fb64d4-tl65f   1/1     Running     0          7m15s
mytest-pulsar1-recovery-0                        1/1     Running     0          7m15s
mytest-pulsar1-toolset-0                         1/1     Running     0          7m15s
mytest-pulsar1-zookeeper-0                       1/1     Running     0          7m14s
```
It looks like it is stuck in the following check of the "wait-bookkeeper-ready"
init container:
```
until bin/bookkeeper shell whatisinstanceid; do
  echo "bookkeeper cluster is not initialized yet. backoff for 3 seconds ...";
  sleep 3;
done;
```
When I manually run the command "bin/bookkeeper shell whatisinstanceid" in the
"wait-bookkeeper-ready" init container, the result is as below, and it looks
fine to me.
```
...
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=5.4.0-1029-gke
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/pulsar
20:53:30.932 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=899MB
20:53:30.933 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=1024MB
20:53:30.933 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=1024MB
20:53:30.940 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=mytest-pulsar1-zookeeper:2181 sessionTimeout=30000 watcher=org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase@5e4bd84a
20:53:30.948 [main] INFO org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
20:53:30.957 [main] INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
20:53:30.967 [main] INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
20:53:30.986 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server mytest-pulsar1-zookeeper/10.100.1.36:2181. Will not attempt to authenticate using SASL (unknown error)
20:53:30.994 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.100.0.168:55470, server: mytest-pulsar1-zookeeper/10.100.1.36:2181
20:53:31.007 [main-SendThread(mytest-pulsar1-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server mytest-pulsar1-zookeeper/10.100.1.36:2181, sessionid = 0x10000b712950011, negotiated timeout = 30000
20:53:31.011 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
20:53:31.040 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookies.InstanceIdCommand - Metadata Service Uri: zk+null://mytest-pulsar1-zookeeper:2181/ledgers InstanceId: 0b091500-6750-479f-b419-25d957d5a4e0
20:53:31.147 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000b712950011
20:53:31.148 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x10000b712950011 closed
```
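The `until` loop in the init container is driven by the exit status of the command, not by what it logs, so clean-looking output alone does not show whether the check passes. A minimal sketch for inspecting that status; `check_exit_status` is a hypothetical helper name used here for illustration and is not part of the chart or of BookKeeper:

```shell
# Hypothetical helper (not part of the Helm chart): run a command,
# discard its output, and print only its exit status.
check_exit_status() {
  if "$@" >/dev/null 2>&1; then
    status=0
  else
    status=$?
  fi
  echo "exit status: ${status}"
}

# Assumed usage inside the "wait-bookkeeper-ready" init container:
#   check_exit_status bin/bookkeeper shell whatisinstanceid
# A non-zero status would keep the `until` loop retrying even when
# the log output looks healthy.
```

If the reported status is non-zero on 2.6.x but zero on 2.7.0, that would narrow the problem down to the command's return code rather than to the cluster state.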
**To Reproduce**
Follow the official procedure, but specify an older Pulsar version in
the values.yaml file, as below:
```
images:
  zookeeper:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  bookie:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  autorecovery:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  broker:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  proxy:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
    pullPolicy: IfNotPresent
  functions:
    repository: apachepulsar/pulsar-all
    tag: 2.6.1
```
I also tested on version 2.6.2, and this time both the bookie and broker Pods
got stuck in the same check.
The chart works with 2.7.0 without any issue, though.
**Expected behavior**
All Pulsar Pods should be up and running, as demonstrated with
version 2.7.0.
**Desktop (please complete the following information):**
- GKE (Ubuntu); K8s version: 1.17.14-gke.1600