On Mon, 5 Jan 2026 11:05:18 +0000 Ryan Roberts <[email protected]> wrote:
> On 04/01/2026 23:01, David Laight wrote: > > On Fri, 2 Jan 2026 13:11:54 +0000 > > Ryan Roberts <[email protected]> wrote: > > > >> Previously different architectures were using random sources of > >> differing strength and cost to decide the random kstack offset. A number > >> of architectures (loongarch, powerpc, s390, x86) were using their > >> timestamp counter, at whatever the frequency happened to be. Other > >> arches (arm64, riscv) were using entropy from the crng via > >> get_random_u16(). > >> > >> There have been concerns that in some cases the timestamp counters may > >> be too weak, because they can be easily guessed or influenced by user > >> space. And get_random_u16() has been shown to be too costly for the > >> level of protection kstack offset randomization provides. > >> > >> So let's use a common, architecture-agnostic source of entropy; a > >> per-cpu prng, seeded at boot-time from the crng. This has a few > >> benefits: > >> > >> - We can remove choose_random_kstack_offset(); That was only there to > >> try to make the timestamp counter value a bit harder to influence > >> from user space. > >> > >> - The architecture code is simplified. All it has to do now is call > >> add_random_kstack_offset() in the syscall path. > >> > >> - The strength of the randomness can be reasoned about independently > >> of the architecture. > >> > >> - Arches previously using get_random_u16() now have much faster > >> syscall paths, see below results. > >> > >> There have been some claims that a prng may be less strong than the > >> timestamp counter if not regularly reseeded. But the prng has a period > >> of about 2^113. So as long as the prng state remains secret, it should > >> not be possible to guess. If the prng state can be accessed, we have > >> bigger problems. > > > > If you have 128 bits of output from consecutive outputs I think you > > can trivially determine the full state using (almost) 'school boy' maths > > that could be done on pencil and paper. > > (Most of the work only has to be done once.) > > > > The underlying problem is that the TAUSWORTHE() transformation is 'linear' > > So that TAUSWORTHE(x ^ y) == TAUSWORTHE(x) ^ TAUSWORTHE(y). > > (This is true of a LFSR/CRC and TOUSWORTH() is doing some subset of CRCs.) > > This means that each output bit is the 'xor' of some of the input bits. > > The four new 'state' values are just xor of the the bits of the old ones. > > The final xor of the four states gives a 32bit value with each bit just > > an xor of some of the 128 state bits. > > Get four consecutive 32 bit values and you can solve the 128 simultaneous > > equations (by trivial substitution) and get the initial state. > > The solution gives you the 128 128bit constants for: > > u128 state = 0; > > u128 val = 'value returned from 4 calls'; > > for (int i = 0; i < 128; i++) > > state |= parity(const128[i] ^ val) << i; > > What is const128[] here? Some values you prepared earlier :-) > > You done need all 32bits, just accumulate 128 bits. > > So if you can get the 5bit stack offset from 26 system calls you know the > > value that will be used for all the subsequent calls. > > It's not immediately obvious to me how user space would do this, but I'll take > it on faith that it may be possible. It shouldn't be possible, but anything that leaks a stack address would give it away. It is also pretty much why you care about the cycle length of the PRNG. (If the length is short a rogue application can remember all the values.) > > > > Simply changing the final line to use + not ^ makes the output non-linear > > and solving the equations a lot harder. > > There has been pushback on introducing new primitives [1] but I don't think > that's a reason not to considder it. That is a more general issue with the PRNG. ISTR it was true for the previous version that explicitly used four CRC. Jason should know more about whether the xor are a good idea. > > [1] https://lore.kernel.org/all/[email protected]/ > > > > > I might sit down tomorrow and see if I can actually code it... > > Thanks for the analysis! I look forward to seeing your conclusion... although > I'm not sure I'll be qualified to evaluate it mathematically. I need to drag out the brian cells from when I learnt about CRC (actually relating to burst error correction) over 40 years ago... > FWIW, I previously had a go at various schemes using siphash to calculate some > random bits. I found it to be significantly slower than this prng. I've so far > taken the view that 6 bits of randomness is not much of a defence against > brute > force so we really shouldn't be spending too many cycles to generate the bits. > If we can get to approach to work, I think that's best. Indeed. A single 32bit CRC using (crc + (crc >> 16)) & 0x3f could be 'good enough'. Especially if the value is 'perturbed' during (say) context switch. The '16' might need adjusting for the actual CRC, especially if TAUSWORTHE() is used - you don't want the value to match one of the shifts it uses. prandom_u32_state() is defined as: #define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b) state->s1 = TAUSWORTHE(state->s1, 6U, 13U, 4294967294U, 18U); state->s2 = TAUSWORTHE(state->s2, 2U, 27U, 4294967288U, 2U); state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U, 7U); state->s4 = TAUSWORTHE(state->s4, 3U, 12U, 4294967168U, 13U); return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4); This is equivalent to: #define TAUSWORTHE(s, a, b, c, d) ((s & ~c) << d) ^ (s >> a) ^ (s >> b) state->s1 = TAUSWORTHE(state->s1, 7, 13, 1, 18); state->s2 = TAUSWORTHE(state->s2, 25, 27, 7, 2); state->s3 = TAUSWORTHE(state->s3, 8, 21, 15, 7); state->s4 = TAUSWORTHE(state->s4, 9, 12, 127, 13); which makes it clear that some low bits of each 's' get discarded reducing the length of each CRC to (I think) 31, 29, 28 and 25. Since 'b + d' matches the bits discarded by 'c', two of those shifts are actually just a rotate, so there isn't really much 'bit stirring' going on. By comparison CRC-16 (for hdlc comms like x25, isdn and ss7) reduces to: u32 crc_step(u32 crc, u8 byte_val) { u8 t = crc ^ byte_val; t = (t ^ t << 4); return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4; } Much more 'stirring'. David
