Hi everyone,

Recently we have dealt with several CVEs and bug reports in which maliciously crafted packets caused crashes or undefined behavior on unencrypted transports. To close off most of this unauthenticated remote attack surface, I am proposing a fundamental shift to a "secure by default" posture for Corosync.

I have opened a PR to enforce encryption at compile time for default builds:
https://github.com/corosync/corosync/pull/821

What this PR does:
In a default build, the code to handle unencrypted traffic is completely omitted. Unencrypted configurations will be rejected, and the legacy totemudp and udpu transports will be entirely unavailable. Encryption is strictly mandatory.

The Escape Hatch:
To be clear, this does not break highly specific edge cases or legacy systems; it just shifts the burden of choice. Package maintainers or users compiling from source who absolutely need the old behavior can consciously opt in using two new configure flags:

--enable-unencrypted: Allows crypto_cipher and crypto_hash to be set to 'none'.

--enable-udpu: Restores the legacy totemudp and udpu transports (strictly requires --enable-unencrypted).

To ensure administrators can easily audit their binaries, corosync -v will now explicitly display enforce_encryption and without_udpu for standard builds, or unencrypted and udpu if the legacy flags were used.
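As a rough illustration of how such an audit could be scripted, here is a minimal Python sketch. It assumes the four feature names appear as whitespace-separated tokens somewhere in the `corosync -v` output; the exact layout of that output is an assumption here, and the helper names (classify_build, read_version_output) and the sample string are hypothetical.

```python
import subprocess


def classify_build(version_output: str) -> dict:
    """Report which of the four feature names occur in `corosync -v` output.

    Assumption: each name shows up as its own whitespace-separated token.
    """
    tokens = {tok.strip(",;:") for tok in version_output.split()}
    return {
        "encryption_enforced": "enforce_encryption" in tokens,
        "unencrypted_allowed": "unencrypted" in tokens,
        "udpu_available": "udpu" in tokens,
        "udpu_removed": "without_udpu" in tokens,
    }


def read_version_output() -> str:
    """Run `corosync -v` and return its stdout (needs corosync installed)."""
    result = subprocess.run(
        ["corosync", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample line, used only to exercise the parser.
SAMPLE_DEFAULT_BUILD = "Built-in features: systemd enforce_encryption without_udpu"

print(classify_build(SAMPLE_DEFAULT_BUILD))
```

On a real node you would pass read_version_output() to classify_build() instead of the sample string. Matching whole tokens rather than substrings keeps "udpu" from being confused with "without_udpu".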

Anticipated concerns:
I want to proactively address a few valid arguments against this change, and explain why the CVE risk and "secure by default" philosophy still take precedence:

- Private/Isolated Networks:
Many clusters run on private VLANs or dedicated backend cluster networks. That is a valid setup, but relying solely on the network layer for security is risky and runs against the principle of defense in depth: a misconfigured VLAN or a compromised adjacent machine shatters that isolation. Furthermore, every time a fuzzer finds a flaw in the legacy UDP transport, we have to drop everything for a CVE. Enforcing encryption neutralizes this class of unauthenticated remote attacks. We cannot leave standard deployments vulnerable just to save isolated power users from adding a compile flag.

- Setup Complexity and Maintenance:
Setting up an unencrypted cluster is undeniably easier because you don't have to generate and distribute an authkey. Unencrypted traffic is also easier to inspect with tcpdump when troubleshooting. However, we cannot sacrifice baseline production security for configuration and debugging convenience. Distributing an authkey is a standard, one-time operation easily handled by modern automation, and debugging practices must evolve to match secure standards.

- Lower Memory Footprint of UDP(U):
Legacy UDP and UDPU transports have a smaller memory footprint than KNET, which is sometimes preferred on embedded devices or in heavily constrained edge environments. If you are running an embedded system where KNET's memory usage is a dealbreaker, the default build is simply not for you; the --enable-udpu flag (together with --enable-unencrypted, which it requires) exists specifically for this hardware profile.

- Performance and Latency Overhead:
Some might argue that forcing crypto adds latency. Modern CPUs handle AES very efficiently (especially via AES-NI), making the overhead negligible for the vast majority of workloads. For the rare cases (like ultra-low-latency HPC) where every microsecond is critical, the compile-time escape hatch allows you to bypass it.

- Upgrade Path and Distro Breakage:
Existing unencrypted clusters will fail to start if they are upgraded to a default build of this new version. While we take backwards compatibility seriously, security must eventually take priority over legacy convenience. Distro maintainers can choose to compile with the escape hatch flags if they absolutely must maintain seamless upgrades for a specific release cycle.
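To make the upgrade concern concrete, here is a minimal Python sketch of a pre-upgrade check that scans the totem section of a corosync.conf for the settings discussed above. It is only an illustration: the function names and sample configuration are hypothetical, the parser handles just the plain `key: value` and `section { }` syntax, and how a default build treats crypto options that are not set at all is not covered in this mail, so the sketch merely reports them.

```python
def parse_totem_options(conf_text: str) -> dict:
    """Collect 'key: value' pairs that sit directly inside totem { }."""
    options = {}
    stack = []  # names of the sections we are currently inside
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif ":" in line and stack == ["totem"]:
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
    return options


def upgrade_findings(conf_text: str) -> list:
    """List settings that deserve attention before moving to a default build."""
    opts = parse_totem_options(conf_text)
    findings = []
    for key in ("crypto_cipher", "crypto_hash"):
        value = opts.get(key)
        if value is None:
            findings.append(f"{key} is not set explicitly")
        elif value.lower() == "none":
            findings.append(f"{key} is 'none' (needs --enable-unencrypted)")
    transport = opts.get("transport", "")
    if transport.lower() in ("udp", "udpu"):
        findings.append(f"transport '{transport}' needs --enable-udpu")
    return findings


# Hypothetical legacy configuration, used only to exercise the check.
LEGACY_CONF = """
totem {
    version: 2
    cluster_name: demo
    transport: udpu
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
    }
}
"""

for finding in upgrade_findings(LEGACY_CONF):
    print(finding)
```

Running something like this against each node's configuration before the package upgrade would show which clusters need an authkey and a transport change first, and which would have to stay on a build with the escape hatch flags.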

Feedback requested:
Before moving forward, I would like to get the community's eyes on this patchset. I am specifically looking for technical and architectural feedback.

Because the compile-time escape hatch exists, niche use cases are fully covered. I'm not looking for "we've always done it this way" feedback, but if there is a fundamental, fact-based technical reason why a standard, modern Corosync deployment cannot enforce encryption by default that I haven't considered above, I definitely want to hear it.

Please take a look at the PR and drop your reviews, ACKs, or technical concerns either here on the list or directly on GitHub.

Thanks,
  Honza

