Hi all,

This series adds an API for 128-bit memory I/O access and enables it for ARM64.
The original motivation for the 128-bit API came from a new Cavium network
device driver. The hardware requires 128-bit accesses to work. See the
description in patch 3 for details.

Also, starting from ARMv8.4, the stp and ldp instructions become atomic, and
an API for 128-bit access would be helpful in core arm64 code.

This series is an RFC. I'd like to collect opinions on the idea and the
implementation details.
* I didn't implement all the 128-bit operations that exist for 64-bit variables
and other types (__swab128p etc). Do we need them all right now, or can we
add them when they are actually needed?
* The u128 name is already used in crypto code, so here I use __uint128_t,
which comes from GCC, for 128-bit types. Should I rename the existing type in
crypto and make the core code for 128-bit variables consistent with u64, u32
etc? (I think yes, but would like to ask the crypto people about it.)
* Some compilers don't support __uint128_t, so I protected all generic code
with the config option HAVE_128BIT_ACCESS. I think it's OK, but...
* For the 128-bit read/write functions I use the suffix 'o', which means
read/write the octet of bytes. Is this name OK?
* My mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
don't have another BE setup at hand, so the BE case is formally not tested.
The BE code for arm64 looks fine, though.
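
To illustrate the kind of helper the first bullet refers to, a 128-bit
byteswap can be composed from two 64-bit swaps. This is only a minimal
userspace sketch, not code from the series: swab128_sketch is a hypothetical
name, and it assumes a GCC- or Clang-compatible compiler that defines
__SIZEOF_INT128__ and provides __builtin_bswap64.

```c
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch needs a compiler with __uint128_t support"
#endif

/* Hypothetical helper for illustration; not the name used in the series. */
static inline __uint128_t swab128_sketch(__uint128_t x)
{
        uint64_t lo = (uint64_t)x;
        uint64_t hi = (uint64_t)(x >> 64);

        /* Swap the bytes within each half, then exchange the halves. */
        return ((__uint128_t)__builtin_bswap64(lo) << 64) |
               __builtin_bswap64(hi);
}
```

Applying the helper twice returns the original value, which is a cheap sanity
check for any real implementation as well.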

With all that, this example code:

static int __init 128bit_test(void)
{
        __uint128_t v;
        __uint128_t addr;
        __uint128_t val = (__uint128_t) 0x1234567890abc;

        val |= ((__uint128_t) 0xdeadbeaf) << 64;

        writeo(val, &addr);
        v = reado(&addr);

        /* Zero-pad the low half so the two halves concatenate correctly. */
        pr_err("%llx%016llx\n", (u64) (val >> 64), (u64) val);
        pr_err("%llx%016llx\n", (u64) (v >> 64), (u64) v);
        return v != val;
}

generates this listing for arm64-le:

0000000000000000 <128bit_test>:
   0:   a9bb7bfd        stp     x29, x30, [sp, #-80]!
   4:   910003fd        mov     x29, sp
   8:   a90153f3        stp     x19, x20, [sp, #16]
   c:   a9025bf5        stp     x21, x22, [sp, #32]
  10:   f9001bf7        str     x23, [sp, #48]
  14:   d5033e9f        dsb     st
  18:   d2815797        mov     x23, #0xabc                     // #2748
  1c:   d297d5f6        mov     x22, #0xbeaf                    // #48815
  20:   f2acf137        movk    x23, #0x6789, lsl #16
  24:   f2bbd5b6        movk    x22, #0xdead, lsl #16
  28:   f2c468b7        movk    x23, #0x2345, lsl #32
  2c:   f2e00037        movk    x23, #0x1, lsl #48
  30:   a9045bb7        stp     x23, x22, [x29, #64]
  34:   a94453b3        ldp     x19, x20, [x29, #64]
  38:   d5033d9f        dsb     ld
  3c:   90000015        adrp    x21, 0 <128bit_test>
  40:   910002b5        add     x21, x21, #0x0
  44:   aa1703e2        mov     x2, x23
  48:   aa1603e1        mov     x1, x22
  4c:   aa1503e0        mov     x0, x21
  50:   94000000        bl      0 <printk>
  54:   aa1303e2        mov     x2, x19
  58:   aa1403e1        mov     x1, x20
  5c:   ca170273        eor     x19, x19, x23
  60:   ca160294        eor     x20, x20, x22
  64:   aa1503e0        mov     x0, x21
  68:   aa140273        orr     x19, x19, x20
  6c:   94000000        bl      0 <printk>
  70:   f9401bf7        ldr     x23, [sp, #48]
  74:   f100027f        cmp     x19, #0x0
  78:   a94153f3        ldp     x19, x20, [sp, #16]
  7c:   1a9f07e0        cset    w0, ne  // ne = any
  80:   a9425bf5        ldp     x21, x22, [sp, #32]
  84:   a8c57bfd        ldp     x29, x30, [sp], #80
  88:   d65f03c0        ret

And for arm64-be:

0000000000000000 <128bit_test>:
   0:   a9bb7bfd        stp     x29, x30, [sp, #-80]!
   4:   910003fd        mov     x29, sp
   8:   a90153f3        stp     x19, x20, [sp, #16]
   c:   a9025bf5        stp     x21, x22, [sp, #32]
  10:   f9001bf7        str     x23, [sp, #48]
  14:   d5033e9f        dsb     st
  18:   d2802001        mov     x1, #0x100                      // #256
  1c:   d2d5bbc0        mov     x0, #0xadde00000000             // #191168994344960
  20:   f2a8a461        movk    x1, #0x4523, lsl #16
  24:   f2f5f7c0        movk    x0, #0xafbe, lsl #48
  28:   f2d12ce1        movk    x1, #0x8967, lsl #32
  2c:   f2f78141        movk    x1, #0xbc0a, lsl #48
  30:   a90407a0        stp     x0, x1, [x29, #64]
  34:   a94453b3        ldp     x19, x20, [x29, #64]
  38:   dac00e73        rev     x19, x19
  3c:   dac00e94        rev     x20, x20
  40:   d5033d9f        dsb     ld
  44:   d2815796        mov     x22, #0xabc                     // #2748
  48:   90000015        adrp    x21, 0 <128bit_test>
  4c:   f2acf136        movk    x22, #0x6789, lsl #16
  50:   910002b5        add     x21, x21, #0x0
  54:   f2c468b6        movk    x22, #0x2345, lsl #32
  58:   d297d5f7        mov     x23, #0xbeaf                    // #48815
  5c:   f2e00036        movk    x22, #0x1, lsl #48
  60:   f2bbd5b7        movk    x23, #0xdead, lsl #16
  64:   aa1603e2        mov     x2, x22
  68:   aa1703e1        mov     x1, x23
  6c:   aa1503e0        mov     x0, x21
  70:   94000000        bl      0 <printk>
  74:   aa1403e2        mov     x2, x20
  78:   aa1303e1        mov     x1, x19
  7c:   ca160294        eor     x20, x20, x22
  80:   ca170273        eor     x19, x19, x23
  84:   aa1503e0        mov     x0, x21
  88:   aa140273        orr     x19, x19, x20
  8c:   94000000        bl      0 <printk>
  90:   f9401bf7        ldr     x23, [sp, #48]
  94:   f100027f        cmp     x19, #0x0
  98:   a94153f3        ldp     x19, x20, [sp, #16]
  9c:   1a9f07e0        cset    w0, ne  // ne = any
  a0:   a9425bf5        ldp     x21, x22, [sp, #32]
  a4:   a8c57bfd        ldp     x29, x30, [sp], #80
  a8:   d65f03c0        ret

I tested the LE kernel with this, and it works OK for me. The BE version adds
a few extra instructions to swap bytes, but the generated code looks reasonable.
We can avoid the byteswapping, when it is not needed, by using __raw_reado() and
__raw_writeo().
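
The difference between the two flavours can be modelled in userspace roughly
as follows. This is a sketch under assumptions, not the code from patch 2: the
*_sketch names are hypothetical, it assumes reado() follows the same
convention as the existing readq() (device data is little-endian and is
converted to CPU order), it relies on the GCC/Clang __BYTE_ORDER__ macros, and
it leaves out the barriers and volatile accesses that real accessors need.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a raw accessor: plain load, no byteswap. */
static inline __uint128_t raw_reado_sketch(const void *addr)
{
        __uint128_t v;

        memcpy(&v, addr, sizeof(v));
        return v;
}

/* Convert a little-endian 128-bit value to CPU byte order. */
static inline __uint128_t le128_to_cpu_sketch(__uint128_t x)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint64_t lo = (uint64_t)x;
        uint64_t hi = (uint64_t)(x >> 64);

        /* On BE, swap bytes in each half and exchange the halves. */
        return ((__uint128_t)__builtin_bswap64(lo) << 64) |
               __builtin_bswap64(hi);
#else
        /* On LE, memory order already matches CPU order. */
        return x;
#endif
}

/* Hypothetical stand-in for the byteswapping accessor. */
static inline __uint128_t reado_sketch(const void *addr)
{
        return le128_to_cpu_sketch(raw_reado_sketch(addr));
}
```

On a little-endian build the two functions return the same value; on a
big-endian build only the raw variant skips the swap, which matches the extra
rev instructions in the BE listing above.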

Yury Norov (3):
  UAPI: Introduce 128-bit types and byteswap operations
  asm-generic/io.h: API for 128-bit I/O accessors
  arm64: enable 128-bit memory read/write support

 arch/Kconfig                                 |   7 ++
 arch/arm64/include/asm/io.h                  |  31 ++++++
 include/asm-generic/io.h                     | 147 +++++++++++++++++++++++++++
 include/linux/byteorder/generic.h            |   4 +
 include/uapi/asm-generic/int-ll64.h          |   8 ++
 include/uapi/linux/byteorder/big_endian.h    |   2 +
 include/uapi/linux/byteorder/little_endian.h |   4 +
 include/uapi/linux/swab.h                    |  22 ++++
 include/uapi/linux/types.h                   |   4 +
 9 files changed, 229 insertions(+)

-- 
2.11.0
