saffronjam opened a new issue, #7829:
URL: https://github.com/apache/cloudstack/issues/7829
##### ISSUE TYPE
* Bug Report
##### COMPONENT NAME
~~~
CKS
~~~
##### CLOUDSTACK VERSION
~~~
4.18
~~~
##### CONFIGURATION
CKS 1.24.0 and 1.27.3
(I experienced the same problems with 1.26.0, but I did not include it in
this bug report.)
I used 6 different setups using 2 service offerings with the 2 CKS versions
specified above.
Every setup had 1 controller node and the default Kubernetes isolated
network offering.
SO1 (small):
- 1 CPU core
- 2 GB RAM
- 8 GB root disk
SO2 (big):
- 4 CPU cores
- 16 GB RAM
- 64 GB root disk
```
test-124-small
1 worker
SO 1
CKS 1.24.0
```
```
test-124-small-many
10 workers
SO 1
CKS 1.24.0
```
```
test-1273-small
1 worker
SO 1
CKS 1.27.3
```
```
test-1273-small-many
10 workers
SO 1
CKS 1.27.3
```
```
test-1273-big
1 worker
SO 2
CKS 1.27.3
```
```
test-1273-big-many
10 workers
SO 2
CKS 1.27.3
```
##### OS / ENVIRONMENT
Ubuntu 22.04 nodes running KVM
##### SUMMARY
Creating Kubernetes clusters fails some of the time. It appears to be more
common when creating a cluster with many nodes rather than just a few.
I can access the nodes in the failed clusters through ssh just fine and I am
able to look at the logs, such as /var/log/daemon.log.
In _test-1273-big-many-node1_, the entries in daemon.log indicate that
the issue is related to some binary ISO not being attached.
```
Aug 8 09:30:21 systemvm cloud-init[1102]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 15 seconds, attempt: 99
Aug 8 09:30:36 systemvm cloud-init[1102]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 15 seconds, attempt: 100
Aug 8 09:30:51 systemvm cloud-init[1102]: Warning: Offline install timed out!
```
which I assume could cause the following entry:
```
Aug 8 09:30:52 systemvm deploy-kube-system[1420]: /opt/bin/deploy-kube-system: line 19: kubeadm: command not found
```
But that is not the case for every failed cluster. In _test-small-many_,
for example, the install actually succeeds on some nodes:
```
Aug 8 09:43:10 systemvm cloud-init[1070]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 8
Aug 8 09:43:41 systemvm cloud-init[1070]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 9
Aug 8 09:44:11 systemvm cloud-init[1070]: Installing binaries from /mnt/k8sdisk/
```
But it fails on the 5th node out of 10 total with the same error:
```
Aug 8 10:00:16 systemvm cloud-init[1072]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 39
Aug 8 10:00:46 systemvm cloud-init[1072]: Waiting for Binaries directory /mnt/k8sdisk/ to be available, sleeping for 30 seconds, attempt: 40
Aug 8 10:01:16 systemvm cloud-init[1072]: Warning: Offline install timed out!
```
with the kubectl output:
```
➜ ~ kubectl get nodes
NAME                                       STATUS   ROLES           AGE    VERSION
test-1273-small-many-control-189d4833e43   Ready    control-plane   3h7m   v1.27.3
test-1273-small-many-node-189d483a95b      Ready    <none>          3h7m   v1.27.3
test-1273-small-many-node-189d484150e      Ready    <none>          3h7m   v1.27.3
test-1273-small-many-node-189d4847014      Ready    <none>          3h7m   v1.27.3
test-1273-small-many-node-189d484b721      Ready    <none>          3h7m   v1.27.3
....
```
There should be 6 more nodes!
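To compare how long each node waited before giving up, the wait messages in daemon.log can be summarized with a small script. This is only a sketch: the function name `summarize_binaries_wait` is hypothetical, and the pattern assumes the message wording seen in the excerpts in this issue, not anything confirmed from the CKS scripts themselves.

```python
import re

# Pattern is an assumption based on the daemon.log excerpts in this issue.
WAIT_RE = re.compile(
    r"Waiting for Binaries directory (\S+) to be available, "
    r"sleeping for (\d+) seconds, attempt: (\d+)"
)


def summarize_binaries_wait(log_text):
    """Summarize how long a node waited for the binaries directory."""
    # Collapse whitespace so entries wrapped across lines still match.
    text = " ".join(log_text.split())
    last_attempt = 0
    sleep_seconds = 0
    for match in WAIT_RE.finditer(text):
        sleep_seconds = int(match.group(2))
        last_attempt = max(last_attempt, int(match.group(3)))
    if "Offline install timed out" in text:
        outcome = "timed_out"
    elif "Installing binaries from" in text:
        outcome = "installed"
    else:
        outcome = "unknown"
    return {
        "outcome": outcome,
        "last_attempt": last_attempt,
        "sleep_seconds": sleep_seconds,
        # Rough estimate: number of attempts times the sleep interval.
        "approx_wait_seconds": last_attempt * sleep_seconds,
    }
```

By this estimate, the node that reached attempt 100 with 15-second sleeps waited roughly 25 minutes, and the node that reached attempt 40 with 30-second sleeps waited roughly 20 minutes, before the offline install timed out.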
All logs can be found at the bottom of this issue.
##### STEPS TO REPRODUCE
~~~
1. Create a cluster with one of the configurations above
2. Wait for it to be created
3. Check if it worked or not
~~~
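For step 3, one way to check from the Kubernetes side is to count the worker nodes that report Ready and compare against the number requested. The sketch below parses the plain-text output of `kubectl get nodes`; the helper name `count_ready_workers` is hypothetical, and it assumes the default column order (NAME, STATUS, ROLES, ...).

```python
def count_ready_workers(kubectl_output):
    """Count non-control-plane nodes reported as Ready in `kubectl get nodes` text."""
    ready = 0
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated rows
        status, roles = fields[1], fields[2]
        # STATUS may carry extra flags, e.g. "Ready,SchedulingDisabled".
        if status.split(",")[0] == "Ready" and "control-plane" not in roles:
            ready += 1
    return ready


def missing_workers(kubectl_output, expected_workers):
    """Return how many of the requested workers have not joined as Ready."""
    return max(expected_workers - count_ready_workers(kubectl_output), 0)
```

For a cluster requested with 10 workers, a result greater than zero from `missing_workers(output, 10)` would indicate that some nodes never joined.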
##### EXPECTED RESULTS
~~~
Cluster ends up in Running state
~~~
##### ACTUAL RESULTS
~~~
Cluster ends up in Error state
~~~
##### LOGS
I supplied logs from /var/log/daemon.log. I added logs from the controller
and the first worker.
For the clusters with more than 1 worker, I added the logs for the first
worker that failed as well.
**SUCCEEDED**
[test-124-small-controller.txt](https://github.com/apache/cloudstack/files/12291211/test-124-small-controller.txt)
[test-124-small-node1.txt](https://github.com/apache/cloudstack/files/12291215/test-124-small-node1.txt)
**FAILED**
[test-124-small-many-controller.txt](https://github.com/apache/cloudstack/files/12291212/test-124-small-many-controller.txt)
[test-124-small-many-node1.txt](https://github.com/apache/cloudstack/files/12291214/test-124-small-many-node1.txt)
[test-124-small-many-node2.txt](https://github.com/apache/cloudstack/files/12291652/test-124-small-many-node2.txt)
**FAILED**
[test-1273-small-controller.txt](https://github.com/apache/cloudstack/files/12291221/test-1273-small-controller.txt)
[test-1273-small-node1.txt](https://github.com/apache/cloudstack/files/12291224/test-1273-small-node1.txt)
**FAILED**
[test-1273-small-many-controller.txt](https://github.com/apache/cloudstack/files/12291222/test-1273-small-many-controller.txt)
[test-1273-small-many-node1.txt](https://github.com/apache/cloudstack/files/12291223/test-1273-small-many-node1.txt)
[test-1273-small-many-node5.txt](https://github.com/apache/cloudstack/files/12291554/test-1273-small-many-node5.txt)
**SUCCEEDED**
[test-1273-big-controller.txt](https://github.com/apache/cloudstack/files/12291216/test-1273-big-controller.txt)
[test-1273-big-node1.txt](https://github.com/apache/cloudstack/files/12291219/test-1273-big-node1.txt)
**FAILED**
[test-1273-big-many-controller.txt](https://github.com/apache/cloudstack/files/12291217/test-1273-big-many-controller.txt)
[test-1273-big-many-node1.txt](https://github.com/apache/cloudstack/files/12291218/test-1273-big-many-node1.txt)
(failed on the first node)
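
The outcomes listed above can be tallied by worker count to see how they line up with the observation in the summary. This is a small illustrative sketch; the data is copied from this report, and the function name `failures_by_worker_count` is hypothetical.

```python
from collections import defaultdict

# (cluster name, worker count, outcome) as reported in this issue.
RESULTS = [
    ("test-124-small", 1, "SUCCEEDED"),
    ("test-124-small-many", 10, "FAILED"),
    ("test-1273-small", 1, "FAILED"),
    ("test-1273-small-many", 10, "FAILED"),
    ("test-1273-big", 1, "SUCCEEDED"),
    ("test-1273-big-many", 10, "FAILED"),
]


def failures_by_worker_count(results):
    """Map worker count to a (failed, total) tuple."""
    tally = defaultdict(lambda: [0, 0])
    for _name, workers, outcome in results:
        tally[workers][1] += 1
        if outcome == "FAILED":
            tally[workers][0] += 1
    return {workers: tuple(counts) for workers, counts in tally.items()}


print(failures_by_worker_count(RESULTS))  # {1: (1, 3), 10: (3, 3)}
```

In this sample, all three 10-worker clusters failed, against one of the three single-worker clusters. Six runs is too few to draw firm conclusions.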