On Tue, 2020-05-05 at 19:44 +0200, Christian Brauner wrote:
> Jan reported an issue where an interaction between sign-extending clone's
> flag argument on ppc64le and the new CLONE_INTO_CGROUP feature causes
> clone() to consistently fail with EBADF.
[]
> Let's fix this by always capping the upper 32 bits for the legacy clone()
> syscall. This ensures that we can't reach clone3() only features by
> accident via legacy clone as with the sign extension case and also that
> legacy clone() works exactly like before, i.e. ignoring any unknown flags.
> This solution risks no regressions and is also pretty clean.
>
> I've chosen u32 and not unsigned int to visually indicate that we're
> capping this to 32 bits.

Perhaps use the lower_32_bits macro?

> diff --git a/kernel/fork.c b/kernel/fork.c
[]
> @@ -2569,12 +2569,21 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags,
> unsigned long, newsp,
> unsigned long, tls)
> #endif
> {
> + /*
> + * On 64 bit unsigned long can be used by userspace to
> + * pass flag values only useable with clone3(). So cap
> + * the flag argument to the lower 32 bits. This is fine,
> + * since legacy clone() has traditionally ignored unknown
> + * flag values. So don't break userspace workloads that
> + * (on accident or on purpose) rely on this.
> + */
> + u32 flags = (u32)clone_flags;
> struct kernel_clone_args args = {
> - .flags = (clone_flags & ~CSIGNAL),
> + .flags = (flags & ~CSIGNAL),

so:
	.flags = lower_32_bits(clone_flags) & ~CSIGNAL,

> .pidfd = parent_tidptr,
> .child_tid = child_tidptr,
> .parent_tid = parent_tidptr,
> - .exit_signal = (clone_flags & CSIGNAL),
> + .exit_signal = (flags & CSIGNAL),

	.exit_signal = lower_32_bits(clone_flags) & CSIGNAL,

> .stack = newsp,
> .tls = tls,
> };
>
> base-commit: 0e698dfa282211e414076f9dc7e83c1c288314fd