On 25/09/2023 20:04, Clément Léger wrote:
> 
> 
> On 25/09/2023 18:04, Beau Belgrave wrote:
>> On Mon, Sep 25, 2023 at 09:53:16AM +0200, Clément Léger wrote:
>>>
>>>
>>> On 22/09/2023 21:22, Beau Belgrave wrote:
>>>> On Tue, Sep 19, 2023 at 02:59:12PM +0200, Clément Léger wrote:
>>>>>
>>>>>
>>>>> On 14/09/2023 19:29, Steven Rostedt wrote:
>>>>>> On Thu, 14 Sep 2023 13:17:00 -0400
>>>>>> Steven Rostedt <rost...@goodmis.org> wrote:
>>>>>>
>>>>>>> Now lets look at big endian layout:
>>>>>>>
>>>>>>>  uaddr = 0xbeef0004
>>>>>>>  enabler = 1;
>>>>>>>
>>>>>>>  memory at 0xbeef0000:  00 00 00 00 00 00 00 02
>>>>>>>                                     ^
>>>>>>>                                     addr: 0xbeef0004
>>>>>>>
>>>>>>>                                 (enabler is set )
>>>>>>>
>>>>>>>         bitoffset = uaddr & (sizeof(unsigned long) - 1); bitoffset = 4
>>>>>>>         bit_offset *= 8;                                 bitoffset = 32
>>>>>>>         uaddr &= ~(sizeof(unsigned long) - 1);           uaddr = 
>>>>>>> 0xbeef0000
>>>>>>>
>>>>>>>         ptr = kaddr + (uaddr & ~PAGE_MASK);
>>>>>>>
>>>>>>>         clear_bit(1 + 32, ptr);
>>>>>>>
>>>>>>>  memory at 0xbeef0000:  00 00 00 00 00 00 00 02
>>>>>>>                                   ^
>>>>>>>                                 bit 33 of 0xbeef0000
>>>>>>>
>>>>>>> I don't think that's what you expected!
>>>>>>
>>>>>> I believe the above can be fixed with:
>>>>>>
>>>>>>  bit_offset = uaddr & (sizeof(unsigned long) - 1);
>>>>>>  if (bit_offset) {
>>>>>> #ifdef CONFIG_CPU_BIG_ENDIAN
>>>>>>          bit_offest = 0;
>>>>>> #else
>>>>>>          bit_offset *= BITS_PER_BYTE;
>>>>>> #endif
>>>>>>          uaddr &= ~(sizeof(unsigned long) - 1);
>>>>>>  }
>>>>>>
>>>>>> -- Steve
>>>>>
>>>>>
>>>>> Actually, after looking more in depth at that, it seems like there are
>>>>> actually 2 problems that can happen.
>>>>>
>>>>> First one is atomic access misalignment due to enable_size == 4 and
>>>>> uaddr not being aligned on a (long) boundary on 64 bits architecture.
>>>>> This can generate misaligned exceptions on various architectures. This
>>>>> can be fixed in a more general way according to Masami snippet.
>>>>>
>>>>> Second one that I can see is on 64 bits, big endian architectures with
>>>>> enable_size == 4. In that case, the bit provided by the userspace won't
>>>>> be correctly set since this code kind of assume that the atomic are done
>>>>> on 32bits value. Since the kernel assume long sized atomic operation, on
>>>>> big endian 64 bits architecture, the updated bit will actually be in the
>>>>> next 32 bits word.
>>>>>
>>>>> Can someone confirm my understanding ?
>>>>>
>>>>
>>>> I have a ppc 64bit BE VM I've been validating this on. If we do the
>>>> shifting within user_events (vs a generic set_bit_aligned approach)
>>>> 64bit BE does not need additional bit manipulation. However, if we were
>>>> to blindly pass the address and bit as is to set_bit_aligned() it
>>>> assumes the bit number is for a long, not a 32 bit word. So for that
>>>> approach we would need to offset the bit in the unaligned case.
>>>>
>>>> Here's a patch I have that I've validated on ppc64 BE, aarch64 LE, and
>>>> x86_64 LE. I personally feel more comfortable with this approach than
>>>> the generic set_bit_aligned() one.
>>>>
>>>> Thanks,
>>>> -Beau
>>>>
>>>> diff --git a/kernel/trace/trace_events_user.c 
>>>> b/kernel/trace/trace_events_user.c
>>>> index e3f2b8d72e01..ae854374d0b7 100644
>>>> --- a/kernel/trace/trace_events_user.c
>>>> +++ b/kernel/trace/trace_events_user.c
>>>> @@ -162,6 +162,23 @@ struct user_event_validator {
>>>>    int                     flags;
>>>>  };
>>>>  
>>>> +static inline void align_addr_bit(unsigned long *addr, int *bit)
>>>> +{
>>>> +  if (IS_ALIGNED(*addr, sizeof(long)))
>>>> +          return;
>>>> +
>>>> +  *addr = ALIGN_DOWN(*addr, sizeof(long));
>>>> +
>>>> +  /*
>>>> +   * We only support 32 and 64 bit values. The only time we need
>>>> +   * to align is a 32 bit value on a 64 bit kernel, which on LE
>>>> +   * is always 32 bits, and on BE requires no change.
>>>> +   */
>>>> +#ifdef __LITTLE_ENDIAN
>>>> +  *bit += 32;
>>>> +#endif
>>>
>>> Hi Beau, except the specific alignment that is basically what I ended up
>>> with for the BE 64bit case (ie just bit += 32). Regarding the generic
>>> alignment, depends on what the maintainers wishes (generic of user_event
>>> specific). I also feel like this shoulmd be handle specifically for
>>> user_events which uses set_bit in some non standard way. Any suggestion ?
>>>
>>
>> Looking at this deeper, user_events needs to track some of this
>> alignment requirements within each enabler. This is needed because when
>> enablers are disabled/removed they do not have the actual size. The size
>> is needed to know if we need to update the bits, etc. (IE: If originally
>> a 32-bit value was used and it's aligned and it's on BE, it needs the
>> bits shifted.)
>>
>> My final version that I played around with looked like this for just the
>> alignment requirements:
>> +static inline void align_addr_bit(unsigned long *addr, int *bit,
>> +                                 unsigned long *flags)
>> +{
>> +       if (IS_ALIGNED(*addr, sizeof(long))) {
>> +#ifdef __BIG_ENDIAN
>> +               if (test_bit(ENABLE_VAL_32_BIT, flags))
>> +                       *bit += 32;
>> +#endif
>> +               return;
>> +       }
>> +
>> +       *addr = ALIGN_DOWN(*addr, sizeof(long));
>> +
>> +       /*
>> +        * We only support 32 and 64 bit values. The only time we need
>> +        * to align is a 32 bit value on a 64 bit kernel, which on LE
>> +        * is always 32 bits, and on BE requires no change.
>> +        */
>> +#ifdef __LITTLE_ENDIAN
>> +       *bit += 32;
>> +#endif
>> +}
>>
>> Below is the full patch, which is currently appearing to pass everything
>> on BE (Please note, the abi_test in selftests needs fixes that I plan to
>> push out. I'd like to get Steven to take those changes along with yours
>> together from tracing to ensure we can test BE with these changes
>> properly).
>>
>> As you'll see, it requires a bit more work than just a generic unaligned
>> solution due to the bits having to move for 32-bit on BE and the
>> tracking requirement on when to do so during delete/clear.
> 
> I actually had the same kind of handling (using a size field rather than
> a flag though). However, the generic set_bit/clear bit requires a
> different handling of the 32 bits on BE 64 bits which does not sounds
> quite right, ie unconditionally add 32 bits to fixup the offset which is
>  implicit in case you are not aligned with your code (ie shifting the
> address actually give the correct bit on BE 64 bits).
> 
> Since you already wrote the code, I think we can proceed with your
> version as I actually thinks its more clearer to understand.

FWIW, here is my series, as I said, the generic set/clear bit makes it
less clear in the BE 64 bits case:


diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..b247c24695b9 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -290,6 +290,65 @@ static __always_inline void __assign_bit(long nr,
volatile unsigned long *addr,
                __clear_bit(nr, addr);
 }

+#define LONG_ALIGN_DIFF(p)     ((unsigned long)(p)) & (sizeof(long) -1)
+#define LONG_ALIGNED_PTR(p) \
+               (unsigned long *)(((unsigned long)(p)) & ~(sizeof(long) - 1))
+
+static inline unsigned long *__fix_unaligned(int *bit, void *ptr)
+{
+       int offs = LONG_ALIGN_DIFF(ptr) * BITS_PER_BYTE;
+       unsigned long *uptr;
+
+       if (offs == 0)
+               return uptr;
+
+       uptr = LONG_ALIGNED_PTR(ptr);
+
+#ifdef __BIG_ENDIAN
+       if (*bit >= offs)
+               *bit -= offs;
+       else
+               *bit += BITS_PER_LONG + offs;
+#else
+       *bit += offs;
+#endif
+
+       return uptr;
+}
+
+/**
+ * set_bit_unaligned - Atomically set a bit in memory starting from an
unaligned
+ *                    pointer
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * This is a relaxed atomic operation (no implied memory barriers).
+ *
+ * Note that @nr may be almost arbitrarily large; this function is not
+ * restricted to acting on a single-word quantity.
+ */
+static inline void set_bit_unaligned(int bit, void *ptr)
+{
+       unsigned long *uptr = __fix_unaligned(&bit, ptr);
+
+       set_bit(bit, uptr);
+}
+
+/**
+ * clear_bit_unaligned - Clears a bit in memory starting from an unaligned
+ *                      pointer
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ *
+ * This is a relaxed atomic operation (no implied memory barriers).
+ */
+static inline void clear_bit_unaligned(int bit, unsigned long *ptr)
+{
+       unsigned long *uptr = __fix_unaligned(&bit, ptr);
+
+       clear_bit(bit, uptr);
+}
+
 /**
  * __ptr_set_bit - Set bit in a pointer's value
  * @nr: the bit to set
diff --git a/kernel/trace/trace_events_user.c
b/kernel/trace/trace_events_user.c
index 6f046650e527..641ab6dedb30 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -113,6 +113,7 @@ struct user_event_enabler {
        struct list_head        mm_enablers_link;
        struct user_event       *event;
        unsigned long           addr;
+       u8                      size;

        /* Track enable bit, flags, etc. Aligned for bitops. */
        unsigned long           values;
@@ -481,9 +482,15 @@ static int user_event_enabler_write(struct
user_event_mm *mm,
        unsigned long uaddr = enabler->addr;
        unsigned long *ptr;
        struct page *page;
+       int bit = ENABLER_BIT(enabler);
        void *kaddr;
        int ret;

+#if defined(CONFIG_CPU_BIG_ENDIAN) && defined(CONFIG_64BIT)
+       if (enabler->size == 4)
+               bit += 32;
+#endif
+
        lockdep_assert_held(&event_mutex);
        mmap_assert_locked(mm->mm);

@@ -515,9 +522,9 @@ static int user_event_enabler_write(struct
user_event_mm *mm,

        /* Update bit atomically, user tracers must be atomic as well */
        if (enabler->event && enabler->event->status)
-               set_bit(ENABLE_BIT(enabler), ptr);
+               set_bit_unaligned(bit, ptr);
        else
-               clear_bit(ENABLE_BIT(enabler), ptr);
+               clear_bit_unaligned(bit, ptr);

        kunmap_local(kaddr);
        unpin_user_pages_dirty_lock(&page, 1, true);
@@ -849,6 +856,7 @@ static struct user_event_enabler
        enabler->event = user;
        enabler->addr = uaddr;
        enabler->values = reg->enable_bit;
+       enabler->size = reg->enable_size;
 retry:
        /* Prevents state changes from racing with new enablers */
        mutex_lock(&event_mutex);
@@ -2377,7 +2385,8 @@ static long user_unreg_get(struct user_unreg
__user *ureg,
 }

 static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
-                                  unsigned long uaddr, unsigned char bit)
+                                  unsigned long uaddr, unsigned char bit,
+                                  u8 size)
 {
        struct user_event_enabler enabler;
        int result;
@@ -2386,6 +2395,7 @@ static int user_event_mm_clear_bit(struct
user_event_mm *user_mm,
        memset(&enabler, 0, sizeof(enabler));
        enabler.addr = uaddr;
        enabler.values = bit;
+       enabler.size = size;
 retry:
        /* Prevents state changes from racing with new enablers */
        mutex_lock(&event_mutex);
@@ -2416,6 +2426,7 @@ static long user_events_ioctl_unreg(unsigned long
uarg)
        struct user_event_enabler *enabler, *next;
        struct user_unreg reg;
        long ret;
+       u8 size;

        ret = user_unreg_get(ureg, &reg);

@@ -2441,6 +2452,8 @@ static long user_events_ioctl_unreg(unsigned long
uarg)
                    ENABLE_BIT(enabler) == reg.disable_bit) {
                        set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));

+                       size = enabler->size;
+
                        if (!test_bit(ENABLE_VAL_FAULTING_BIT, 
ENABLE_BITOPS(enabler)))
                                user_event_enabler_destroy(enabler, true);

@@ -2454,7 +2467,7 @@ static long user_events_ioctl_unreg(unsigned long
uarg)
        /* Ensure bit is now cleared for user, regardless of event status */
        if (!ret)
                ret = user_event_mm_clear_bit(mm, reg.disable_addr,
-                                             reg.disable_bit);
+                                             reg.disable_bit, size);

        return ret;
 }

--

Clément

> 
> Thanks,
> 
> Clément
> 
>>
>> Thanks,
>> -Beau
>>
>> diff --git a/kernel/trace/trace_events_user.c 
>> b/kernel/trace/trace_events_user.c
>> index 6f046650e527..b05db15eaea6 100644
>> --- a/kernel/trace/trace_events_user.c
>> +++ b/kernel/trace/trace_events_user.c
>> @@ -127,8 +127,11 @@ struct user_event_enabler {
>>  /* Bit 7 is for freeing status of enablement */
>>  #define ENABLE_VAL_FREEING_BIT 7
>>  
>> +/* Bit 8 is for marking 32-bit on 64-bit */
>> +#define ENABLE_VAL_32_BIT 8
>> +
>>  /* Only duplicate the bit value */
>> -#define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK
>> +#define ENABLE_VAL_DUP_MASK (ENABLE_VAL_BIT_MASK | 1 << ENABLE_VAL_32_BIT)
>>  
>>  #define ENABLE_BITOPS(e) (&(e)->values)
>>  
>> @@ -174,6 +177,29 @@ struct user_event_validator {
>>      int                     flags;
>>  };
>>  
>> +static inline void align_addr_bit(unsigned long *addr, int *bit,
>> +                              unsigned long *flags)
>> +{
>> +    if (IS_ALIGNED(*addr, sizeof(long))) {
>> +#ifdef __BIG_ENDIAN
>> +            if (test_bit(ENABLE_VAL_32_BIT, flags))
>> +                    *bit += 32;
>> +#endif
>> +            return;
>> +    }
>> +
>> +    *addr = ALIGN_DOWN(*addr, sizeof(long));
>> +
>> +    /*
>> +     * We only support 32 and 64 bit values. The only time we need
>> +     * to align is a 32 bit value on a 64 bit kernel, which on LE
>> +     * is always 32 bits, and on BE requires no change.
>> +     */
>> +#ifdef __LITTLE_ENDIAN
>> +    *bit += 32;
>> +#endif
>> +}
>> +
>>  typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter 
>> *i,
>>                                 void *tpdata, bool *faulted);
>>  
>> @@ -482,6 +508,7 @@ static int user_event_enabler_write(struct user_event_mm 
>> *mm,
>>      unsigned long *ptr;
>>      struct page *page;
>>      void *kaddr;
>> +    int bit = ENABLE_BIT(enabler);
>>      int ret;
>>  
>>      lockdep_assert_held(&event_mutex);
>> @@ -497,6 +524,8 @@ static int user_event_enabler_write(struct user_event_mm 
>> *mm,
>>                   test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))))
>>              return -EBUSY;
>>  
>> +    align_addr_bit(&uaddr, &bit, ENABLE_BITOPS(enabler));
>> +
>>      ret = pin_user_pages_remote(mm->mm, uaddr, 1, FOLL_WRITE | FOLL_NOFAULT,
>>                                  &page, NULL);
>>  
>> @@ -515,9 +544,9 @@ static int user_event_enabler_write(struct user_event_mm 
>> *mm,
>>  
>>      /* Update bit atomically, user tracers must be atomic as well */
>>      if (enabler->event && enabler->event->status)
>> -            set_bit(ENABLE_BIT(enabler), ptr);
>> +            set_bit(bit, ptr);
>>      else
>> -            clear_bit(ENABLE_BIT(enabler), ptr);
>> +            clear_bit(bit, ptr);
>>  
>>      kunmap_local(kaddr);
>>      unpin_user_pages_dirty_lock(&page, 1, true);
>> @@ -849,6 +878,12 @@ static struct user_event_enabler
>>      enabler->event = user;
>>      enabler->addr = uaddr;
>>      enabler->values = reg->enable_bit;
>> +
>> +#if BITS_PER_LONG >= 64
>> +    if (reg->enable_size == 4)
>> +            set_bit(ENABLE_VAL_32_BIT, ENABLE_BITOPS(enabler));
>> +#endif
>> +
>>  retry:
>>      /* Prevents state changes from racing with new enablers */
>>      mutex_lock(&event_mutex);
>> @@ -2377,7 +2412,8 @@ static long user_unreg_get(struct user_unreg __user 
>> *ureg,
>>  }
>>  
>>  static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
>> -                               unsigned long uaddr, unsigned char bit)
>> +                               unsigned long uaddr, unsigned char bit,
>> +                               unsigned long flags)
>>  {
>>      struct user_event_enabler enabler;
>>      int result;
>> @@ -2385,7 +2421,7 @@ static int user_event_mm_clear_bit(struct 
>> user_event_mm *user_mm,
>>  
>>      memset(&enabler, 0, sizeof(enabler));
>>      enabler.addr = uaddr;
>> -    enabler.values = bit;
>> +    enabler.values = bit | flags;
>>  retry:
>>      /* Prevents state changes from racing with new enablers */
>>      mutex_lock(&event_mutex);
>> @@ -2415,6 +2451,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>      struct user_event_mm *mm = current->user_event_mm;
>>      struct user_event_enabler *enabler, *next;
>>      struct user_unreg reg;
>> +    unsigned long flags;
>>      long ret;
>>  
>>      ret = user_unreg_get(ureg, &reg);
>> @@ -2425,6 +2462,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>      if (!mm)
>>              return -ENOENT;
>>  
>> +    flags = 0;
>>      ret = -ENOENT;
>>  
>>      /*
>> @@ -2441,6 +2479,10 @@ static long user_events_ioctl_unreg(unsigned long 
>> uarg)
>>                  ENABLE_BIT(enabler) == reg.disable_bit) {
>>                      set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));
>>  
>> +                    /* We must keep compat flags for the clear */
>> +                    if (test_bit(ENABLE_VAL_32_BIT, ENABLE_BITOPS(enabler)))
>> +                            flags |= 1 << ENABLE_VAL_32_BIT;
>> +
>>                      if (!test_bit(ENABLE_VAL_FAULTING_BIT, 
>> ENABLE_BITOPS(enabler)))
>>                              user_event_enabler_destroy(enabler, true);
>>  
>> @@ -2454,7 +2496,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>      /* Ensure bit is now cleared for user, regardless of event status */
>>      if (!ret)
>>              ret = user_event_mm_clear_bit(mm, reg.disable_addr,
>> -                                          reg.disable_bit);
>> +                                          reg.disable_bit, flags);
>>  
>>      return ret;
>>  }

Reply via email to