On Wed, Aug 19, 2020 at 03:33:20PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-08-19 15:15:07 [+0200], pet...@infradead.org wrote:

> If you want to optimize further, we could move PF_IO_WORKER to a lower
> bit. x86 can test for both via
> (gcc-10)
> |         testl   $536870944, 44(%rbp)    #, _11->flags
> |         jne     .L1635  #,
> 
> (clang-9)
> |         testl   $536870944, 44(%rbx)    # imm = 0x20000020
> |         je      .LBB112_6
> 
> 
> but ARM can't and does
> |         ldr     r1, [r5, #16]   @ tsk_3->flags, tsk_3->flags
> |         mov     r2, #32 @ tmp157,
> |         movt    r2, 8192        @ tmp157,
> |         tst     r2, r1  @ tmp157, tsk_3->flags
> |         beq     .L998           @,
> 
> same on ARM64:
> |         ldr     w0, [x20, 60]   //, _11->flags
> |         and     w0, w0, 1073741792      // tmp117, _11->flags,
> |         and     w0, w0, -536870849      // tmp117, tmp117,
> |         cbnz    w0, .L453       // tmp117,
> 
> using 0x10 for PF_IO_WORKER instead will turn this into:
> |         ldr     w0, [x20, 60]   //, _11->flags
> |         tst     w0, 48  // _11->flags,
> |         bne     .L453           //,
> 
> ARM:
> |         ldr     r2, [r5, #16]   @ tsk_3->flags, tsk_3->flags
> |         tst     r2, #48 @ tsk_3->flags,
> |         beq     .L998           @,

Good point. AFAICT there are a number of low bits still open (and we can
shuffle if we have to), so sure, put a patch in to that effect while
you're at it.
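
For reference, a rough sketch of why the combined mask matters on 32-bit ARM.
An A32 data-processing immediate is an 8-bit value rotated right by an even
amount, so 0x20000020 (bits 5 and 29) cannot be encoded in a single tst, while
48 (0x30) can. The flag name PF_WQ_WORKER and its value 0x20 are assumptions
inferred from the masks quoted above; the _OLD/_NEW suffixes and the helper
names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Assumed values: 0x20 and 0x20000000 are inferred from the 0x20000020 mask
 * in the compiler output; 0x10 is the proposed new PF_IO_WORKER bit. */
#define PF_WQ_WORKER      0x00000020u
#define PF_IO_WORKER_OLD  0x20000000u
#define PF_IO_WORKER_NEW  0x00000010u

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
static uint32_t rol32(uint32_t v, unsigned int n)
{
	n &= 31;
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Hypothetical helper: returns 1 if v fits an A32 modified immediate,
 * i.e. v is an 8-bit constant rotated right by an even amount. */
static int arm_a32_imm_ok(uint32_t v)
{
	unsigned int rot;

	for (rot = 0; rot < 32; rot += 2)
		if (rol32(v, rot) <= 0xffu)
			return 1;
	return 0;
}

/* The check under discussion: is either worker flag set in flags? */
static int is_worker(uint32_t flags, uint32_t io_worker_bit)
{
	return (flags & (PF_WQ_WORKER | io_worker_bit)) != 0;
}
```

With these definitions, arm_a32_imm_ok(PF_WQ_WORKER | PF_IO_WORKER_OLD) is 0
and arm_a32_imm_ok(PF_WQ_WORKER | PF_IO_WORKER_NEW) is 1, which matches the
mov/movt/tst sequence versus the single tst in the listings above.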

