On Tue, Jan 13, 2026 at 01:28:59PM +0100, Thomas Weißschuh wrote:
> The current signature-based module integrity checking has some drawbacks
> in combination with reproducible builds. Either the module signing key
> is generated at build time, which makes the build unreproducible, or a
> static signing key is used, which precludes rebuilds by third parties
> and makes the whole build and packaging process much more complicated.
I think this actually undersells the feature. It's also much simpler
than the signature-based module authentication. The latter relies on
PKCS#7, X.509, ASN.1, OID registry, crypto_sig API, etc in addition to
the implementations of the actual signature algorithm (RSA / ECDSA /
ML-DSA) and at least one hash algorithm.
I've even seen a case where the vmlinux size decreases by almost 200KB
just by disabling module signing. That's not even counting the
signatures themselves, which ML-DSA has increased to 2-5 KB each.
The hashes are much simpler, even accounting for the Merkle tree proofs
that make them scalable. They're less likely to have vulnerabilities
like the PKCS#7 bugs the kernel has had historically. They also
eliminate the dependency on a lot of userspace libcrypto functionality
that has been causing portability problems, such as the CMS functions.
So I think this is how module authentication should have been done
originally, and I'm glad to see this is finally being fixed.
> +struct module_hashes_proof {
> + __be32 pos;
> + u8 hash_sigs[][MODULE_HASHES_HASH_SIZE];
> +} __packed;
Is the choice of big endian for consistency with struct
module_signature? Little endian is the usual choice in new code.
> diff --git a/include/linux/module_signature.h
> b/include/linux/module_signature.h
> index a45ce3b24403..3b510651830d 100644
> --- a/include/linux/module_signature.h
> +++ b/include/linux/module_signature.h
> @@ -18,6 +18,7 @@ enum pkey_id_type {
> PKEY_ID_PGP, /* OpenPGP generated key ID */
> PKEY_ID_X509, /* X.509 arbitrary subjectKeyIdentifier */
> PKEY_ID_PKCS7, /* Signature in PKCS#7 message */
> + PKEY_ID_MERKLE, /* Merkle proof for modules */
I recommend making the hash algorithm explicit:
PKEY_ID_MERKLE_SHA256, /* SHA-256 merkle proof for modules */
While I wouldn't encourage the addition of another hash algorithm
(specifying one good algorithm for now is absolutely the right choice),
if someone ever does need to add another one, we'd want them to be
guided to simply introduce a new value of this enum rather than hack it
in some other way.
> +static void hash_entry(const void *left, const void *right, void *out)
Byte arrays should use u8 instead of void
> diff --git a/scripts/modules-merkle-tree.c b/scripts/modules-merkle-tree.c
[...]
> +struct file_entry {
> + char *name;
> + unsigned int pos;
> + unsigned char hash[EVP_MAX_MD_SIZE];
Considering that the hash algorithm is fixed, EVP_MAX_MD_SIZE can be
replaced with a tighter local definition:
#define MAX_HASH_SIZE 32
> +static struct file_entry *fh_list;
> +static size_t num_files;
> +
> +struct leaf_hash {
> + unsigned char hash[EVP_MAX_MD_SIZE];
> +};
> +
> +struct mtree {
> + struct leaf_hash **l;
> + unsigned int *entries;
> + unsigned int levels;
> +};
'struct leaf_hash' is confusing because it's actually used for the
hashes of internal nodes, not leaf nodes.
Maybe rename it to 'struct hash' and use it for both the hashes and leaf
nodes and internal nodes.
Also, clearer naming would improve readability, e.g.:
struct merkle_tree {
struct hash **level_hashes;
unsigned int level_size;
unsigned int num_levels;
};
> +static void hash_data(void *p, unsigned int pos, size_t size, void *ret_hash)
static void hash_data(const uint8_t *data, unsigned int pos,
size_t size, struct hash *ret_hash)
> + unsigned char magic = 0x01;
uint8_t
Also, when defining these magic numbers, maybe explicitly document that
they are domain separation prefixes:
uint8_t magic = 0x01; /* domain separation prefix */
> + unsigned int pos_be;
uint32_t
> +static void hash_entry(void *left, void *right, void *ret_hash)
Could use stronger typing:
static void hash_entry(const struct hash *left, const struct hash *right,
struct hash *ret_hash)
> +static struct mtree *build_merkle(struct file_entry *fh, size_t num)
Could use clearer names, and constify the file_entry array:
static struct merkle_tree *build_merkle(const struct file_entry *files,
size_t num_files)
> + /* First level of pairs */
> + for (unsigned int i = 0; i < num; i += 2) {
> + if (i == num - 1) {
> + /* Odd number of files, no pair. Hash with itself */
> + hash_entry(fh[i].hash, fh[i].hash, mt->l[0][i /
> 2].hash);
> + } else {
> + hash_entry(fh[i].hash, fh[i + 1].hash, mt->l[0][i /
> 2].hash);
> + }
> + }
> + for (unsigned int i = 1; i < mt->levels; i++) {
> + int odd = 0;
> +
> + if (le & 1) {
> + le++;
> + odd++;
> + }
> +
> + mt->entries[i] = le / 2;
> + mt->l[i] = xcalloc(sizeof(**mt->l), le);
> +
> + for (unsigned int n = 0; n < le; n += 2) {
> + if (n == le - 2 && odd) {
> + /* Odd number of pairs, no pair. Hash with
> itself */
> + hash_entry(mt->l[i - 1][n].hash, mt->l[i -
> 1][n].hash,
> + mt->l[i][n / 2].hash);
> + } else {
> + hash_entry(mt->l[i - 1][n].hash, mt->l[i - 1][n
> + 1].hash,
> + mt->l[i][n / 2].hash);
> + }
> + }
> + le = mt->entries[i];
> + }
There should be an assertion at the end that we ended up with exactly 1
hash in the root level.
It might also be possible to refactor this code such that the leaf nodes
and internal nodes are handled in the same loop, rather than handling
the leaf nodes as a special case.
> +static void write_merkle_root(struct mtree *mt, const char *fp)
fp => filename since it's a string, not e.g. a 'FILE *'.
> +{
> + char buf[1024];
> + unsigned int levels;
> + unsigned char *h;
> + FILE *f;
> +
> + if (mt) {
> + levels = mt->levels;
> + h = mt->l[mt->levels - 1][0].hash;
> + } else {
> + levels = 0;
> + h = xcalloc(1, hash_size);
> + }
> +
> + f = fopen(fp, "w");
> + if (!f)
> + err(1, "Failed to create %s", buf);
Above should log the name of the file. 'buf' should be removed.
> +static char *xstrdup_replace_suffix(const char *str, const char *new_suffix)
> +{
> + const char *current_suffix;
> + size_t base_len;
> +
> + current_suffix = strchr(str, '.');
> + if (!current_suffix)
> + errx(1, "No existing suffix in '%s'", str);
This doesn't handle base names that contain a period. strrchr() would
work if the old suffix always contains exactly one period. Otherwise
another solution would be needed to identify the old suffix.
> +static __attribute__((noreturn))
> +void format(void)
usage()
> +{
> + fprintf(stderr,
> + "Usage: scripts/modules-merkle-tree <root definition>\n");
> + exit(2);
This should show both parameters, <root hash> <new suffix>
But they probably should be flipped to put the output second.
Though, is <new suffix> needed at all? It looks like it doesn't
actually affect the output.
> + hash_evp = EVP_get_digestbyname("sha256");
EVP_sha256()
> + hash_size = EVP_MD_get_size(hash_evp);
The old name 'EVP_MD_size()' would have wider compatibility.
> + ERR(hash_size <= 0, "EVP_get_digestbyname");
Log message should say EVP_MD_size
> + for (unsigned int i = 0; i < num_files; i++) {
size_t, for consistency with the type of num_files
> + fd = open(signame, O_WRONLY | O_CREAT | O_TRUNC, 0644);
> + if (fd < 0)
> + err(1, "Can't create %s", signame);
> +
> + build_proof(mt, i, fd);
> + append_module_signature_magic(fd, lseek(fd, 0, SEEK_CUR));
Maybe build_and_append_proof()?
- Eric