devops-42 opened a new issue, #9959:
URL: https://github.com/apache/cloudstack/issues/9959
##### ISSUE TYPE
* Bug Report
##### COMPONENT NAME
~~~
API / Backend
~~~
##### CLOUDSTACK VERSION
~~~
4.19.1.x
~~~
##### CONFIGURATION
The issue occurs with both basic and advanced networking. The configuration is
shown below.
##### OS / ENVIRONMENT
Used setup:
Management + KVM host:
* OS: Ubuntu (jammy)
* Network: both use the same CIDR
Management:
* serves MySQL
* serves NFS shares for primary and secondary storage
KVM host:
* cloudbr0 configured
* Agent installed
* Joined via public ssh key (root)
##### SUMMARY
When setting up a zone (basic or advanced), the KVM host joins the cluster,
but the SSVM and the CPVM remain stuck in "Starting". The SSVM log file shows
an SSL error:
```
2024-11-20 23:54:22,618 WARN [cloud.agent.Agent] (main:null) NIO Connection
Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake
failed while connecting to host: **.**.**.** port: 8250
```
The same log indicates that the cloud agent on the SSVM was not able to
detect the keystore:
```
2024-11-20 23:54:21,927 WARN [utils.nio.Link] (main:null) Failed to load
keystore, using trust all manager
```
After some experimenting, I found that the cloud agent expects a keystore
`cloud.jks` in the `/usr/local/cloud/systemvm/conf` directory, which is
populated from the `/etc/cloudstack` directory. Unfortunately,
`/etc/cloudstack` is empty on the VM.
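The keystore observation above can be checked with a small script. This is a minimal sketch: the file name `cloud.jks` and the two directories come from this report, while the function name `check_keystore` is hypothetical and only used for illustration.

```shell
# Report, for each directory given, whether a non-empty cloud.jks exists in it.
# check_keystore is a hypothetical helper name, not part of CloudStack.
check_keystore() {
    for dir in "$@"; do
        if [ -s "$dir/cloud.jks" ]; then
            echo "present: $dir/cloud.jks"
        else
            echo "missing: $dir/cloud.jks"
        fi
    done
}
```

On the system VM it could be called as `check_keystore /etc/cloudstack /usr/local/cloud/systemvm/conf`.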
I already tried to work around this by setting the global configuration
parameter `ca.plugin.root.auth.strictness` to `false`. This did not really
work for me and produced unexpected results:
* The agent state of the system VMs immediately turned to `Up`, while the
overall state remained `Starting`.
* After restarting the CloudStack management server and/or the CloudStack
agent, both states were `Up`.
* Creating a compute instance with a guest network is not possible; the
virtual router instance aborts with an error state (possibly due to the same
keystore issue).
##### STEPS TO REPRODUCE
Setup management server:
~~~
apt-get install -y \
apt-transport-https \
bridge-utils \
ca-certificates \
curl \
chrony \
gnupg \
lsb-release \
mysql-server \
net-tools \
nfs-kernel-server \
quota \
software-properties-common \
unattended-upgrades
cat <<'EOF' > /etc/mysql/mysql.conf.d/cloudstack.cnf
[mysqld]
server-id=1
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=350
log-bin=mysql-bin
binlog-format = 'ROW'
EOF
systemctl restart mysql
wget -O - https://download.cloudstack.org/release.asc | tee /etc/apt/trusted.gpg.d/cloudstack.asc
echo "deb https://download.cloudstack.org/ubuntu noble 4.19" | tee /etc/apt/sources.list.d/cloudstack.list
apt-get update
apt-get install -y cloudstack-management
mkdir -p /export/primary /export/secondary
echo "/export *(rw,async,no_root_squash,no_subtree_check,insecure)" >> /etc/exports
exportfs -a
sed -i -e 's/^RPCMOUNTDOPTS="--manage-gids"$/RPCMOUNTDOPTS="-p 892 --manage-gids"/g' /etc/default/nfs-kernel-server
sed -i -e 's/^STATDOPTS=$/STATDOPTS="--port 662 --outgoing-port 2020"/g' /etc/default/nfs-common
echo "NEED_STATD=yes" >> /etc/default/nfs-common
sed -i -e 's/^RPCRQUOTADOPTS=$/RPCRQUOTADOPTS="-p 875"/g' /etc/default/quota
service nfs-kernel-server restart
cloudstack-setup-databases ***:***@localhost --deploy-as=root -i 127.0.0.1
cloudstack-setup-management
~~~
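The environment section lists Ubuntu jammy, while the repository line in the steps names `noble`. Whether this mismatch is related to the problem is not established here, but it can be checked quickly. The sketch below prints the codename from an APT source file so it can be compared with the host's release; the helper name `repo_codename` is hypothetical.

```shell
# Print the distribution codename (third field) of the first "deb" line
# in an APT source file. repo_codename is a hypothetical helper name.
repo_codename() {
    awk '$1 == "deb" { print $3; exit }' "$1"
}
```

For example, `repo_codename /etc/apt/sources.list.d/cloudstack.list` can be compared with the output of `lsb_release -cs` on the same host.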
Setup KVM host:
~~~
apt-get install -y \
apt-transport-https \
bridge-utils \
ca-certificates \
curl \
chrony \
gnupg \
lsb-release \
net-tools \
quota \
software-properties-common \
unattended-upgrades
cat <<'EOM' > /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0: {}
  bridges:
    cloudbr0:
      addresses:
        - **.**.**.**/**
      nameservers:
        addresses:
          - **.**.**.**
      routes:
        - to: default
          via: **.**.**.**
          metric: 100
      interfaces: [eth0]
EOM
chmod 600 /etc/netplan/01-netcfg.yaml
mv /etc/netplan/50-cloud-init.yaml /etc/netplan/50-cloud-init.yaml.dist
netplan generate && netplan apply
wget -O - https://download.cloudstack.org/release.asc | tee /etc/apt/trusted.gpg.d/cloudstack.asc
echo "deb https://download.cloudstack.org/ubuntu noble 4.19" | tee /etc/apt/sources.list.d/cloudstack.list
apt-get update
apt-get install -y qemu-kvm cloudstack-agent
sed -i -e 's/\#vnc_listen.*$/vnc_listen = "0.0.0.0"/g' /etc/libvirt/qemu.conf
systemctl mask libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket libvirtd-tls.socket libvirtd-tcp.socket
systemctl restart libvirtd
mv /etc/libvirt/libvirtd.conf /etc/libvirt/libvirtd.conf.dist
cat <<'EOM' > /etc/libvirt/libvirtd.conf
listen_tls=0
listen_tcp=0
tcp_port = "16509"
mdns_adv = 0
auth_tcp = "none"
EOM
systemctl restart libvirtd
modprobe br_netfilter
echo 'net.bridge.bridge-nf-call-arptables = 0' >> /etc/sysctl.conf
echo 'net.bridge.bridge-nf-call-iptables = 0' >> /etc/sysctl.conf
echo 'net.bridge.bridge-nf-call-ip6tables = 0' >> /etc/sysctl.conf
sysctl -p
~~~
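To confirm that the three bridge settings from the KVM host steps actually ended up in the configuration file, a check along these lines could be used. It is a sketch only: the helper name `check_bridge_sysctl` is hypothetical, and it inspects the file contents, not the values currently loaded in the kernel.

```shell
# For each bridge netfilter key, report whether the given sysctl file
# (default: /etc/sysctl.conf) sets it to 0.
# check_bridge_sysctl is a hypothetical helper name.
check_bridge_sysctl() {
    conf=${1:-/etc/sysctl.conf}
    for suffix in arptables iptables ip6tables; do
        key="net.bridge.bridge-nf-call-$suffix"
        if grep -Eq "^net\.bridge\.bridge-nf-call-$suffix *= *0\$" "$conf"; then
            echo "ok: $key"
        else
            echo "missing: $key"
        fi
    done
}
```

The live value of a single key can be read with `sysctl -n net.bridge.bridge-nf-call-iptables` once the `br_netfilter` module is loaded.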
* Copy the SSH key of the CloudStack management server to the KVM host.
* Wait until the management server is ready.
* Log in and create an advanced zone using the gateway from the CloudStack
subnet, assigning reserved IP ranges to the pod and to public traffic.
* Create primary/secondary storage and join the host using the SSH key and
the root account.
* Enable the zone.
* Navigate to System VMs under the Infrastructure menu and see 2 VMs in the
"Starting" state.
##### EXPECTED RESULTS
The SSVM and CPVM start up and the cloud agent is running. Creating compute
instances that use a virtual router (isolated guest network) is possible.
##### ACTUAL RESULTS
Here is the (hopefully) relevant log snippet.
~~~
2024-11-20 23:54:21,734 INFO [cloud.agent.Agent] (main:null) Agent [id =
new : type = PremiumSecondaryStorageResource : zone = 1 : pod = 1 : workers = 5
: host = **.**.**.** : port = 8250
2024-11-20 23:54:21,809 INFO [utils.nio.NioClient] (main:null) Connecting
to **.**.**.**:8250
2024-11-20 23:54:21,828 INFO [utils.nio.Link] (main:null) Conf file found:
/usr/local/cloud/systemvm/conf/agent.properties
2024-11-20 23:54:21,927 WARN [utils.nio.Link] (main:null) Failed to load
keystore, using trust all manager
2024-11-20 23:54:22,597 ERROR [utils.nio.Link] (main:null) SSL error caught
during unwrap data: Received fatal alert: bad_certificate, for local
address=/**.**.**.**:43322, remote address=/**.**.**.**:8250. The client may
have invalid ca-certificates.
2024-11-20 23:54:22,602 ERROR [utils.nio.NioClient] (main:null) SSL
Handshake failed while connecting to host: **.**.**.** port: 8250
2024-11-20 23:54:22,604 ERROR [utils.nio.NioConnection] (main:null) Unable
to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host:
**.**.**.** port: 8250
at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
at com.cloud.agent.Agent.start(Agent.java:286)
at com.cloud.agent.AgentShell.launchNewAgent(AgentShell.java:454)
at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:431)
at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:415)
at com.cloud.agent.AgentShell.start(AgentShell.java:511)
at com.cloud.agent.AgentShell.main(AgentShell.java:541)
2024-11-20 23:54:22,618 WARN [cloud.agent.Agent] (main:null) NIO Connection
Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake
failed while connecting to host: **.**.**.** port: 8250
2024-11-20 23:54:22,618 INFO [cloud.agent.Agent] (main:null) Attempted to
connect to the server, but received an unexpected exception, trying again...
~~~
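To pull only the warnings and errors out of a longer agent log, a filter like the following may help. It is a sketch that assumes the timestamp and level layout seen in the snippet above; the helper name `agent_problems` is hypothetical, and the log file path has to be supplied by the caller.

```shell
# Print only WARN and ERROR entries from an agent log.
# Reads the files given as arguments, or standard input if none are given.
# agent_problems is a hypothetical helper name.
agent_problems() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ +(WARN|ERROR) ' "$@"
}
```

Continuation lines such as stack trace frames do not start with a timestamp and are therefore dropped; they still need to be read in the full log.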
Thanks for looking at it✌️