On 02/01/2026 13:39, Jason A. Donenfeld wrote:
> Hi Ryan,
> 
> On Fri, Jan 2, 2026 at 2:12 PM Ryan Roberts <[email protected]> wrote:
>> context. Given the function is just a handful of operations and doesn't
> 
> How many? What's this looking like in terms of assembly? 

25 instructions on arm64:

0000000000000000 <prandom_u32_state>:
   0:   29401403        ldp     w3, w5, [x0]
   4:   aa0003e1        mov     x1, x0
   8:   29410002        ldp     w2, w0, [x0, #8]
   c:   531e74a4        lsl     w4, w5, #2
  10:   530e3468        lsl     w8, w3, #18
  14:   4a0400a5        eor     w5, w5, w4
  18:   4a031863        eor     w3, w3, w3, lsl #6
  1c:   53196047        lsl     w7, w2, #7
  20:   53134806        lsl     w6, w0, #13
  24:   4a023442        eor     w2, w2, w2, lsl #13
  28:   4a000c00        eor     w0, w0, w0, lsl #3
  2c:   121b6884        and     w4, w4, #0xffffffe0
  30:   120d3108        and     w8, w8, #0xfff80000
  34:   121550e7        and     w7, w7, #0xfffff800
  38:   120c2cc6        and     w6, w6, #0xfff00000
  3c:   2a456c85        orr     w5, w4, w5, lsr #27
  40:   2a433504        orr     w4, w8, w3, lsr #13
  44:   2a4254e3        orr     w3, w7, w2, lsr #21
  48:   2a4030c2        orr     w2, w6, w0, lsr #12
  4c:   4a020066        eor     w6, w3, w2
  50:   4a050080        eor     w0, w4, w5
  54:   4a0000c0        eor     w0, w6, w0
  58:   29001424        stp     w4, w5, [x1]
  5c:   29010823        stp     w3, w2, [x1, #8]
  60:   d65f03c0        ret

> It'd also be
> nice to have some brief analysis of other call sites to have
> confirmation this isn't blowing up other users.

I compiled defconfig before and after this patch on arm64 and compared the text
sizes:

$ ./scripts/bloat-o-meter -t vmlinux.before vmlinux.after
add/remove: 3/4 grow/shrink: 4/1 up/down: 836/-128 (708)
Function                                     old     new   delta
prandom_seed_full_state                      364     932    +568
pick_next_task_fair                         1940    2036     +96
bpf_user_rnd_u32                             104     196     +92
prandom_bytes_state                          204     260     +56
e843419@0f2b_00012d69_e34                      -       8      +8
e843419@0db7_00010ec3_23ec                     -       8      +8
e843419@02cb_00003767_25c                      -       8      +8
bpf_prog_select_runtime                      448     444      -4
e843419@0aa3_0000cfd1_1580                     8       -      -8
e843419@0aa2_0000cfba_147c                     8       -      -8
e843419@075f_00008d8c_184                      8       -      -8
prandom_u32_state                            100       -    -100
Total: Before=19078072, After=19078780, chg +0.00%

So 708 bytes more after inlining. The main cost is prandom_seed_full_state(),
which calls prandom_u32_state() 10 times (via prandom_warmup()). I expect we
could turn that into a loop to reduce ~450 bytes overall.

I'm not really sure if 708 is good or bad...

> 
>> +static __always_inline u32 prandom_u32_state(struct rnd_state *state)
> 
> Why not just normal `inline`? Is gcc disagreeing with the inlinability
> of this function?

Given this needs to be called from a noinstr function, I didn't want to give the
compiler the opportunity to decide not to inline it, since in that case, some
instrumentation might end up being applied to the function body which would blow
up when called in the noinstr context.

I think the other 2 options are to keep prandom_u32_state() in the c file but
mark it noinstr or rearrange all the users so that thay don't call it until
instrumentation is allowable. The latter is something I was trying to avoid.

There is some previous discussion of this at [1].

[1] https://lore.kernel.org/all/aS65LFUfdgRPKv1l@J2N7QTR9R3/

Perhaps keeping prandom_u32_state() in the c file and making it noinstr is the
best compromise?

Thanks,
Ryan

> 
> Jason


Reply via email to