OmCheeLin opened a new pull request, #882:
URL: https://github.com/apache/skywalking-banyandb/pull/882
# Internal TLS Certificate Dynamic Reloading Test Guide
This document describes how to test the dynamic reloading of certificates
for internal TLS communication between liaison nodes and data nodes.
## Overview
The internal TLS dynamic reloading feature allows certificates used for
internal TLS communication to be updated without restarting the service:
- **Liaison nodes**: CA certificate (`--data-client-ca-cert`) used to verify
data node server certificates, and server certificates (`--cert-file`,
`--key-file`) used for serving clients
- **Data nodes**: Server certificates (`--cert-file`, `--key-file`) used for
serving liaison node clients
This is similar to the external TLS certificate dynamic reloading feature
implemented in #12862.
## Test Scenario
### Prerequisites
1. Build the BanyanDB server:
```bash
make build
```
2. Prepare test certificates directory:
```bash
mkdir -p ~/test-internal-certs && cd ~/test-internal-certs
```
### Step 1: Generate Initial CA and Server Certificates
**Important**: The server certificate's Common Name (CN) or Subject
Alternative Name (SAN) must match the hostname used when nodes register to
etcd. If your nodes register with a hostname like `master01` or `node1`, the
certificate must include that hostname. If you use `localhost`, ensure all
connections use `localhost` as well.
**Note**: Modern TLS implementations (including Go's TLS library used by
gRPC) require Subject Alternative Name (SAN) extensions for proper hostname
verification. The certificate generation steps below include SAN extensions.
```bash
cd ~/test-internal-certs
# Generate CA certificate
openssl req -x509 -newkey rsa:2048 -keyout ca_key1.pem -out ca_cert1.pem \
-days 365 -nodes -subj "/CN=TestCA1"
cp ca_cert1.pem ca_cert.pem
# Generate server private key
openssl genrsa -out server_key1.pem 2048
# Create certificate configuration file (for SAN extension)
# Replace 'master01' with your actual hostname
cat > server_cert.conf <<EOF
[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no
[req_distinguished_name]
CN = master01
[v3_req]
subjectAltName = @alt_names
[alt_names]
DNS.1 = master01
DNS.2 = localhost
IP.1 = 127.0.0.1
EOF
# Generate certificate signing request (with SAN extension)
openssl req -new -key server_key1.pem -out server_csr1.pem \
-subj "/CN=master01" \
-config server_cert.conf
# Sign certificate with CA (includes SAN extension)
openssl x509 -req -in server_csr1.pem -CA ca_cert1.pem -CAkey ca_key1.pem \
-CAcreateserial -out server_cert1.pem -days 365 \
-extensions v3_req -extfile server_cert.conf
# Copy to the filenames used by the servers
cp server_cert1.pem server_cert.pem
cp server_key1.pem server_key.pem
# Verify the certificate
echo "=== Check certificate Subject ==="
openssl x509 -in server_cert.pem -text -noout | grep -A 2 "Subject:"
echo "=== Check SAN extension ==="
openssl x509 -in server_cert.pem -text -noout | grep -A 5 "Subject Alternative Name"
echo "=== Verify CA can verify server certificate ==="
openssl verify -CAfile ca_cert.pem server_cert.pem
```
The verification should show:
- Subject: CN = master01
- Subject Alternative Name: DNS:master01, DNS:localhost, IP Address:127.0.0.1
- Verification result: `server_cert.pem: OK`
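
As a scripted variant of the SAN check above, the following sketch uses `openssl x509 -checkhost` to confirm that a certificate covers a given hostname. The helper name `cert_covers_host` is hypothetical, and the hostnames in the usage comment are simply the ones assumed in this guide; substitute the names your nodes actually register with.

```bash
# Hypothetical helper (not part of BanyanDB): succeeds only if the
# certificate's SAN/CN covers the given hostname.
cert_covers_host() {
  local cert="$1" host="$2"
  # openssl prints "Hostname <h> does match certificate" on success and
  # "Hostname <h> does NOT match certificate" otherwise.
  openssl x509 -in "$cert" -noout -checkhost "$host" 2>/dev/null \
    | grep -q "does match certificate"
}

# Example usage (run inside the certificate directory):
#   for h in master01 localhost; do
#     cert_covers_host server_cert.pem "$h" && echo "$h: covered" || echo "$h: MISSING"
#   done
```

A hostname reported as missing here would be expected to fail hostname verification when the liaison connects using that name.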
### Step 2: Start Data Node with TLS
```bash
CERTS_DIR="$HOME/test-internal-certs"
./banyand/build/bin/dev/banyand-server data \
--etcd-endpoints=http://127.0.0.1:2379 \
--tls=true \
--cert-file=$CERTS_DIR/server_cert.pem \
--key-file=$CERTS_DIR/server_key.pem \
--grpc-port=17912 \
--http-port=17913 \
--measure-root-path=/tmp/test-data-measure \
--stream-root-path=/tmp/test-data-stream
```
### Step 3: Start Liaison Node with Internal TLS
```bash
CERTS_DIR="$HOME/test-internal-certs"
./banyand/build/bin/dev/banyand-server liaison \
--etcd-endpoints=http://127.0.0.1:2379 \
--data-client-tls \
--data-client-ca-cert=$CERTS_DIR/ca_cert.pem \
--grpc-port=17914 \
--http-port=17915
```
**Note**: The flag names are `--data-client-tls` and
`--data-client-ca-cert`. The prefix "data" comes from the role of the target
nodes (data nodes).
### Step 4: Verify Initial Connection
Check the liaison node logs to verify that it successfully connected to the
data node:
```bash
# Look for messages like:
# "new node is healthy, add it to active queue"
# "Started CA certificate file monitoring"
# "TLS file watcher loop started"
```
If you see `"new node is healthy, add it to active queue"` in the logs, the
connection is successful. If you see `"node is unhealthy"` or `"failed to
re-connect to grpc server"`, check the troubleshooting section below.
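
If the liaison output is redirected to a file, the log check can be scripted. The sketch below polls a log file for a fixed string until a timeout expires. The helper name `wait_for_log` and the file name `liaison.log` are assumptions for illustration; the guide itself does not say where the logs are written.

```bash
# Hypothetical helper: wait up to <timeout> seconds for a fixed string
# to appear in a log file. Returns 0 if found, 1 on timeout.
wait_for_log() {
  local file="$1" pattern="$2" timeout="${3:-10}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -F treats the pattern as a literal string, not a regex.
    if grep -qF -- "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example usage (assumes the liaison was started with "> liaison.log 2>&1"):
#   wait_for_log liaison.log "new node is healthy, add it to active queue" 15 \
#     && echo "connected" || echo "no healthy node seen yet"
```

The same helper can be reused later with the reload messages listed in Step 6.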
### Step 5: Generate New Certificates
When testing certificate reloading, generate new certificates with SAN
extensions:
```bash
cd ~/test-internal-certs
# Generate new CA certificate
openssl req -x509 -newkey rsa:2048 -keyout ca_key2.pem -out ca_cert2.pem \
-days 365 -nodes -subj "/CN=TestCA2"
# Generate new server private key
openssl genrsa -out server_key2.pem 2048
# Create certificate configuration file for new certificate
cat > server_cert2.conf <<EOF
[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no
[req_distinguished_name]
CN = master01
[v3_req]
subjectAltName = @alt_names
[alt_names]
DNS.1 = master01
DNS.2 = localhost
IP.1 = 127.0.0.1
EOF
# Generate certificate signing request
openssl req -new -key server_key2.pem -out server_csr2.pem \
-subj "/CN=master01" \
-config server_cert2.conf
# Sign certificate with new CA
openssl x509 -req -in server_csr2.pem -CA ca_cert2.pem -CAkey ca_key2.pem \
-CAcreateserial -out server_cert2.pem -days 365 \
-extensions v3_req -extfile server_cert2.conf
# IMPORTANT: Update certificates in the correct order to avoid connection
# failures:
# 1. First, update server certificates on data nodes (so they use
#    certificates signed by the new CA)
# 2. Then, update the CA certificate on liaison nodes (so they can verify the
#    new server certificates)
# Step 1: Update server cert and key on data node first
# This should trigger reload on data node
cp server_cert2.pem server_cert.pem && cp server_key2.pem server_key.pem
# Wait a few seconds for data node to reload the server certificate
sleep 2
# Step 2: Update CA cert file on liaison node
# This should trigger reload and reconnection on liaison for client
# connections
cp ca_cert2.pem ca_cert.pem
```
**Note**:
- **Certificate Update Order**: When updating both CA and server
certificates, you must update the server certificates **first**, then update
the CA certificate. This ensures that when the liaison node reconnects with the
new CA certificate, the data node is already using a server certificate signed
by that new CA. If you update the CA certificate first, the liaison will try to
reconnect but fail because the data node is still using a certificate signed by
the old CA.
- When updating server certificates, you need to update them on both liaison
and data nodes if they are using the same certificate files. The reloader will
automatically detect the changes and reload the certificates.
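
Because the reloader picks up whatever lands in the watched files, it may be worth checking a new certificate/key pair before copying it into place. The sketch below compares the public key embedded in the certificate with the one derived from the private key. The helper name `key_matches_cert` is hypothetical and not part of BanyanDB.

```bash
# Hypothetical pre-flight check: succeeds only if the private key
# belongs to the certificate (their public keys are identical).
key_matches_cert() {
  local cert="$1" key="$2" cert_pub key_pub
  # Extract the public key from each file; fail if either is unreadable.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example usage before the rotation in Step 5:
#   key_matches_cert server_cert2.pem server_key2.pem \
#     && openssl verify -CAfile ca_cert2.pem server_cert2.pem \
#     && echo "safe to roll out"
```

Running `openssl verify` against the new CA in the same pre-flight confirms that the liaison will be able to validate the new server certificate once its CA file is swapped.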
### Step 6: Verify Certificate Reload
Wait a few seconds (the reloader has a 500ms debounce), then check the logs:
**Liaison node logs:**
```bash
# For CA certificate reload:
# "CA certificate updated, reconnecting clients"
# "successfully reconnected client after CA certificate update"
# For server certificate reload:
# "Successfully updated TLS certificate after content change"
# "TLS certificate updated in memory"
# "Starting TLS file monitoring"
```
**Data node logs:**
```bash
# Look for messages like:
# "Successfully updated TLS certificate after content change"
# "TLS certificate updated in memory"
# "Started TLS file monitoring for queue server"
```
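
In addition to reading the logs, the reload can be cross-checked from the outside by comparing the fingerprint of the certificate the data node presents with the fingerprint of the file on disk. This is a sketch under assumptions: the helper names are hypothetical, the address `localhost:17912` is the data node gRPC port from Step 2, and `-alpn h2` is included on the assumption that the gRPC server expects HTTP/2 to be negotiated.

```bash
# Hypothetical helpers for an out-of-band reload check.

# Print the TLS handshake output (including the leaf certificate in PEM)
# presented by the server at host:port.
served_cert_pem() {
  local addr="$1"
  openssl s_client -connect "$addr" -alpn h2 </dev/null 2>/dev/null
}

# Read a PEM certificate from stdin and print its SHA-256 fingerprint.
fingerprint_of() {
  openssl x509 -noout -fingerprint -sha256 2>/dev/null
}

# Example usage after Step 5 (run inside the certificate directory):
#   served=$(served_cert_pem localhost:17912 | fingerprint_of)
#   on_disk=$(fingerprint_of < server_cert.pem)
#   [ "$served" = "$on_disk" ] && echo "reloaded" || echo "still serving old cert"
```

If the two fingerprints differ well after the debounce interval, the data node is most likely still serving the previous certificate.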