On 19.02.2024 15:29, Oleksii wrote:
> On Mon, 2024-02-19 at 12:22 +0100, Jan Beulich wrote:
>> On 15.02.2024 14:41, Oleksii wrote:
>>> As I mentioned above with 4-byte alignment and then reading and
>>> working
>>> with 8-byte then crossing a word or double-word boundary shouldn't
>>> be
>>> an issue.
>>>
>>> I am not sure that I know how to check that we are crossing cache
>>> line
>>> boundary.
>>>
>>> Regarding page boundary, if the next page is mapped then all should
>>> work fine, otherwise it will be an exception.
>>
>> Are you sure lr.d / sc.d are happy to access across such a boundary,
>> when both pages are mapped?
> If they are mapped, my expectation that lr.d and sc.d should be happy.

How does this expectation of yours fit with the A extension doc having
this:

"For LR and SC, the A extension requires that the address held in rs1 be
 naturally aligned to the size of the operand (i.e., eight-byte aligned
 for 64-bit words and four-byte aligned for 32-bit words). If the
 address is not naturally aligned, an address-misaligned exception or an
 access-fault exception will be generated."

It doesn't even say "may"; it says "will".

>> To me it seems pretty clear that for atomic accesses you want to
>> demand natural alignment, i.e. 2-byte alignment for 2-byte accesses.
>> This way you can be sure no potentially problematic boundaries will
>> be crossed.
> It makes sense, but I am not sure that I can guarantee that a user of
> macros will always have 2-byte alignment (except during a panic) in the
> future.
> 
> Even now, I am uncertain that everyone will be willing to add
> __alignment(...) to struct vcpu->is_urgent
> (xen/include/xen/sched.h:218) and other possible cases to accommodate
> RISC-V requirements.

->is_urgent is bool, i.e. 1 byte and hence okay at any address. For all
normal variables and fields the compiler will guarantee suitable
(natural) alignment. What you prohibit by requiring aligned items is
use of fields of e.g. packed structures.

>>> As 1- and 2-byte cases are emulated I decided that is not to
>>> provide
>>> sfx argument for emulation macros as it will not have to much
>>> affect on
>>> emulated types and just consume more performance on acquire and
>>> release
>>> version of sc/ld instructions.
>>
>> Question is whether the common case (4- and 8-byte accesses)
>> shouldn't
>> be valued higher, with 1- and 2-byte emulation being there just to
>> allow things to not break altogether.
> If I understand you correctly, it would make sense to add the 'sfx'
> argument for the 1/2-byte access case, ensuring that all options are
> available for 1/2-byte access case as well.

That's one of the possibilities. As said, I'm not overly worried about
the emulated cases. For the initial implementation I'd recommend going
with what is easiest there, yielding the best possible result for the
4- and 8-byte cases. If later it turns out repeated acquire/release
accesses are a problem in the emulation loop, things can be changed
to explicit barriers, without touching the 4- and 8-byte cases.

Jan

Reply via email to