Copilot commented on code in PR #882:
URL: 
https://github.com/apache/skywalking-banyandb/pull/882#discussion_r2592965536


##########
banyand/queue/pub/pub.go:
##########
@@ -441,13 +486,55 @@ func isFailoverError(err error) bool {
 }
 
 func (p *pub) getClientTransportCredentials() ([]grpc.DialOption, error) {
+       if !p.tlsEnabled {
+               return grpchelper.SecureOptions(nil, false, false, "")
+       }
+
+       // Use reloader if available (for dynamic reloading)
+       if p.caCertReloader != nil {
+               // Extract server name from the connection (we'll use a default 
for now)
+               // The actual server name will be validated by the TLS handshake

Review Comment:
   The comment on lines 495-496 states "Extract server name from the connection 
(we'll use a default for now)" and "The actual server name will be validated by 
the TLS handshake", but the code passes an empty string `""` as the server 
name. 
   
   This comment is misleading because:
   1. No extraction is actually happening - an empty string is hardcoded
   2. When `ServerName` is empty, gRPC uses the target hostname from the dial 
address for validation (this is the standard behavior)
   
   Consider clarifying the comment:
   
   ```go
   // Use reloader if available (for dynamic reloading)
   // Pass empty server name - gRPC will extract it from the dial target address
   tlsConfig, err := p.caCertReloader.GetClientTLSConfig("")
   ```
   ```suggestion
                // Pass empty server name - gRPC will use the dial target 
address for TLS validation.
                // No extraction is performed; TLS handshake will validate 
using the dial target.
   ```



##########
banyand/queue/pub/pub.go:
##########
@@ -441,13 +486,55 @@ func isFailoverError(err error) bool {
 }
 
 func (p *pub) getClientTransportCredentials() ([]grpc.DialOption, error) {
+       if !p.tlsEnabled {
+               return grpchelper.SecureOptions(nil, false, false, "")
+       }
+
+       // Use reloader if available (for dynamic reloading)
+       if p.caCertReloader != nil {
+               // Extract server name from the connection (we'll use a default 
for now)
+               // The actual server name will be validated by the TLS handshake
+               tlsConfig, err := p.caCertReloader.GetClientTLSConfig("")
+               if err != nil {
+                       return nil, fmt.Errorf("failed to get TLS config from 
reloader: %w", err)
+               }
+               creds := credentials.NewTLS(tlsConfig)
+               return []grpc.DialOption{grpc.WithTransportCredentials(creds)}, 
nil
+       }
+
+       // Fallback to static file reading if reloader is not available
        opts, err := grpchelper.SecureOptions(nil, p.tlsEnabled, false, 
p.caCertPath)
        if err != nil {
                return nil, fmt.Errorf("failed to load TLS config: %w", err)
        }
        return opts, nil
 }
 
+// reconnectAllClients reconnects all active clients when CA certificate is 
updated.
+func (p *pub) reconnectAllClients() {
+       // Collect nodes from p.register with lock
+       p.mu.Lock()
+       nodesToReconnect := make([]schema.Metadata, 0, len(p.registered))
+       for _, node := range p.registered {
+               md := schema.Metadata{
+                       TypeMeta: schema.TypeMeta{
+                               Kind: schema.KindNode,
+                       },
+                       Spec: node,
+               }
+               nodesToReconnect = append(nodesToReconnect, md)
+       }
+       p.mu.Unlock()
+
+       // Reconnect all nodes using OnDelete and OnAddOrUpdate
+       for _, md := range nodesToReconnect {
+               // Call OnDelete to properly clean up the old connection
+               p.OnDelete(md)
+               // Call OnAddOrUpdate to reconnect with new certificate

Review Comment:
   The `reconnectAllClients` function has a logic error that prevents actual 
reconnection. When `OnDelete` is called followed immediately by `OnAddOrUpdate`:
   
   1. `OnDelete` (line 532) removes the node from `p.registered` 
(client.go:235) but the connection remains in `p.active`
   2. `OnAddOrUpdate` (line 534) re-adds the node to `p.registered` 
(client.go:196)
   3. However, `OnAddOrUpdate` checks if the node is already in `p.active` 
(client.go:161-162) and returns early without creating a new connection
   4. The old connection with the old CA certificate remains active
   
   As a result, no reconnection actually occurs, and clients continue using the 
old CA certificate.
   
   To fix this, you need to ensure the connection is actually closed and 
removed from `p.active` before calling `OnAddOrUpdate`. Consider this approach:
   
   ```go
   func (p *pub) reconnectAllClients() {
       // Collect nodes and close connections
       p.mu.Lock()
       nodesToReconnect := make([]schema.Metadata, 0, len(p.registered))
       for name, node := range p.registered {
           if client, ok := p.active[name]; ok {
               _ = client.conn.Close()
               delete(p.active, name)
               p.deleteClient(client.md)
           }
           md := schema.Metadata{
               TypeMeta: schema.TypeMeta{Kind: schema.KindNode},
               Spec:     node,
           }
           nodesToReconnect = append(nodesToReconnect, md)
       }
       p.mu.Unlock()
   
       // Reconnect with new credentials
       for _, md := range nodesToReconnect {
           p.OnAddOrUpdate(md)
       }
   }
   ```
   ```suggestion
        // Collect nodes and close connections
        p.mu.Lock()
        nodesToReconnect := make([]schema.Metadata, 0, len(p.registered))
        for name, node := range p.registered {
                if client, ok := p.active[name]; ok {
                        _ = client.conn.Close()
                        delete(p.active, name)
                        p.deleteClient(client.md)
                }
                md := schema.Metadata{
                        TypeMeta: schema.TypeMeta{Kind: schema.KindNode},
                        Spec:     node,
                }
                nodesToReconnect = append(nodesToReconnect, md)
        }
        p.mu.Unlock()
   
        // Reconnect with new credentials
        for _, md := range nodesToReconnect {
   ```



##########
docs/operation/security.md:
##########
@@ -123,19 +123,33 @@ BanyanDB supports enabling TLS for the internal gRPC 
queue between liaison and d
 
 The following flags are used to configure internal TLS:
 
-- `--internal-tls`: Enable TLS on the internal queue client inside Liaison; if 
false, the queue uses plain TCP.
-- `--internal-ca-cert <path>`: PEM‑encoded CA (or bundle) that the queue 
client uses to verify Data‑Node server certificates.
+- `--data-client-tls`: Enable TLS on the internal queue client inside Liaison; 
if false, the queue uses plain TCP.
+- `--data-client-ca-cert`: PEM‑encoded CA (or bundle) that the queue client 
uses to verify Data‑Node server certificates.
 
 Each Liaison/Data process still advertises its certificate with the public 
flags (`--tls`, `--cert-file`, `--key-file`). The same certificate/key pair can 
be reused for both external traffic and the internal queue.
 
 **Example: Enable internal TLS between liaison and data nodes**
 
 ```shell
-banyand liaison --internal-tls=true --internal-ca-cert=ca.crt --tls=true 
--cert-file=server.crt --key-file=server.key
+banyand liaison --data-client-tls=true --data-client-ca-cert=ca.crt --tls=true 
--cert-file=server.crt --key-file=server.key
 banyand data --tls=true --cert-file=server.crt --key-file=server.key
 ```
 
-> Note: The `--internal-ca-cert` should point to the CA certificate used to 
sign the data node's server certificate.
+> Note: 
+> - The `--data-client-ca-cert` should point to the CA certificate used to 
sign the data node's server certificate.
+> - Data nodes act as servers and do not need a CA certificate to connect to 
liaison nodes (liaison nodes connect to data nodes, not vice versa).
+> - The flag names use the prefix "data" because liaison nodes connect to data 
nodes. The actual flag names are `--data-client-tls` and 
`--data-client-ca-cert`.

Review Comment:
   [nitpick] Line 141 contains redundant information. The bullet point explains 
that "The flag names use the prefix 'data' because liaison nodes connect to 
data nodes" and then adds "The actual flag names are `--data-client-tls` and 
`--data-client-ca-cert`", but these flag names were already documented in lines 
126-127 above.
   
   Consider removing the redundant part or making it more meaningful:
   
   ```markdown
   > - The flag names use the prefix "data" because liaison nodes connect to 
data nodes as clients.
   ```
   
   Or restructure the note to flow better without repetition.
   ```suggestion
   > - The flag names use the prefix "data" because liaison nodes connect to 
data nodes.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to