On 19.02.2024 15:00, Oleksii wrote:
> On Sun, 2024-02-18 at 19:00 +0000, Julien Grall wrote:
>> On 05/02/2024 15:32, Oleksii Kurochko wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/include/asm/cmpxchg.h
>>> @@ -0,0 +1,237 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/* Copyright (C) 2014 Regents of the University of California */
>>> +
>>> +#ifndef _ASM_RISCV_CMPXCHG_H
>>> +#define _ASM_RISCV_CMPXCHG_H
>>> +
>>> +#include <xen/compiler.h>
>>> +#include <xen/lib.h>
>>> +
>>> +#include <asm/fence.h>
>>> +#include <asm/io.h>
>>> +#include <asm/system.h>
>>> +
>>> +#define ALIGN_DOWN(addr, size)  ((addr) & (~((size) - 1)))
>>> +
>>> +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, acquire_barrier) \
>>> +({ \
>>> +    asm volatile( \
>>> +        release_barrier \
>>> +        " amoswap" sfx " %0, %2, %1\n" \
>>> +        acquire_barrier \
>>> +        : "=r" (ret), "+A" (*ptr) \
>>> +        : "r" (new) \
>>> +        : "memory" ); \
>>> +})
>>> +
>>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, acquire_barrier) \
>>> +({ \
>>> +    uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned long)ptr, 4); \
>>> +    uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) * BITS_PER_BYTE; \
>>> +    uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \
>>> +    uint8_t mask_h = mask_l + mask_size - 1; \
>>> +    unsigned long mask = GENMASK(mask_h, mask_l); \
>>> +    unsigned long new_ = (unsigned long)(new) << mask_l; \
>>> +    unsigned long ret_; \
>>> +    unsigned long rc; \
>>> +    \
>>> +    asm volatile( \
>>> +        release_barrier \
>>> +        "0: lr.d %0, %2\n" \
>>
>> I was going to ask why this is lr.d rather than lr.w. But I see Jan 
>> already asked. I agree with him that it should probably be a lr.w and
>> ...
>>
>>> +        "   and  %1, %0, %z4\n" \
>>> +        "   or   %1, %1, %z3\n" \
>>> +        "   sc.d %1, %1, %2\n" \
>>
>> ... respectively sc.w. The same applies for cmpxchg.
> 
> I agree that it would be better, and my initial attempt was to handle a
> 2-byte access that crosses a 4-byte or 8-byte boundary:
> 
> 0 1 2 3 4 5 6 7 8
> X X X 1 1 X X X X
> 
> In this case, if I align address 3 down to address 0 and then read 4
> bytes instead of 8, I will not process the byte at address 4. This was
> the reason why I started to read 8 bytes.
> 
> I also acknowledge that there could be an issue in the following case:
> 
> X  4094 4095 4096
> X    1   1    X
> In this situation, when I read 8 bytes, there could be an issue where
> the next page (which starts at 4096) is not mapped. It seems correct in
> this case to check that the variable is within one page and to read 4
> bytes instead of 8.
> 
> One more thing I am uncertain about: if we change everything to read 4
> bytes with 4-byte alignment, what should be done with the first case?
> Should we panic? (I am not sure whether this is an option.)

Counter question (raised elsewhere already): What if a 4-byte access
crosses a word / cache line / page boundary? Ideally exactly the
same would happen for a 2-byte access crossing a respective boundary.
(Which you can achieve relatively easily by masking off only address
bit 1, keeping address bit 0 unaltered.)

> Should we
> perform the operation twice for addresses 0x0 and 0x4?

That wouldn't be atomic then.

Jan
