melver added a comment.

In D103958#2809145 <https://reviews.llvm.org/D103958#2809145>, @efriedma wrote:

> In D103958#2808966 <https://reviews.llvm.org/D103958#2808966>, @melver wrote:
>
>> In D103958#2808861 <https://reviews.llvm.org/D103958#2808861>, @efriedma 
>> wrote:
>>
>>> I don't like using metadata like this.  Dropping metadata should generally 
>>> preserve the semantics of the code.
>>
>> Anything better for this without introducing new instructions? Would an 
>> argument be reasonable?
>
> If we really want to make it part of the branch, maybe add an intrinsic that 
> can be used with callbr.  Not something we've done before, but the 
> infrastructure should be mostly there.
>
> That said, I'm not sure this is the best approach.  Alternative proposal:
>
> We could add a regular intrinsic.  Just ignore the control flow at the IR 
> level, and come up with a straight-line blob that just does the right thing.  
> I think we'd want to actually perform the load as part of the intrinsic, to 
> avoid worrying about the consume dependency.  So we'd have an intrinsic 
> "__builtin_load_with_control_dependency()". It would lower to something along 
> the lines of  `asm("ldr %0, [%1]; cbnz %0, .+4":"=r"(dest):"r"(x):"memory");` 
> on AArch64.  The differences between the intrinsic and just using the asm I 
> wrote:
>
> 1. We weaken the "memory" clobber to something that more accurately matches 
> what we need.
> 2. We add a compiler transform to check if the branch is redundant, late in 
> the optimization pipeline, and remove it if it is.
>
> I think this produces the code you want, and it should be easier to 
> understand and maintain.

Interesting, but probably doesn't quite work if I understood right -- however, 
perhaps it could solve something related (not part of this work, see below 
[footnote]).

Not every `READ_ONCE()` the kernel has needs to be a 
`load_with_control_dependency()`, which if I read it right, would happen if we 
just do, e.g.:

  #define READ_ONCE __builtin_load_with_control_dependency
  int x;
  void foo(void) { return READ_ONCE(x); }

And they really want to avoid introducing another set of primitives, like 
`READ_ONCE_CTRL()`, because if we did that, I think it'd be reasonable to 
upgrade all `READ_ONCE_CTRL()` to acquires and we're done (suggested by Will 
Deacon in [1]). Yet upgrading all `READ_ONCE()` to acquire is not acceptable in 
general (do note, it's not just AArch64, also POWER and Armv7).  For now, it'd 
be good to avoid this -- in particular, existing code like the following would 
become less clear or less optimal:

  x = READ_ONCE(..);
  y  = READ_ONCE(..);
  ... lots of other code ...
  if (y) { ... do other stuff ... } // <--- no control dependency here
  if (x && y) { WRITE_ONCE(..) } // <-- only want control  dependency here

Which is why the kernel folks probably wouldn't be too happy with more 
primitives as it likely penalizes more than with just marking the branch 
itself. Per [1] new load-primitives are probably a last resort assuming the 
compiler can deliver a nice mechanism to ensure control-dependencies remain 
(this work here).

Thanks.

----

[1] https://lore.kernel.org/linux-arch/20210607160252.GA7580@willie-the-truck/

[footnote] Re the "memory" clobber, Linus asks for more fine-grained asm 
clobber: 
https://lore.kernel.org/linux-arch/CAHk-=wjwxs5+sozgtaz0bp9nsoa+pymacge4cbdvx3edguc...@mail.gmail.com/
If you see a way to support this, I think it'd help in other places (e.g. 
kernel's one-directional memory barriers).

Tangentially, per this presentation:
https://github.com/ClangBuiltLinux/plumbers-2020-slides/blob/master/LPC_2020_--_Dependency_ordering.pdf
there is another problem, which are address dependencies aka 
`memory_order_consume`. In reality the kernel wants every `READ_ONCE()` be 
something very close to `memory_order_consume`, with the compiler figuring out 
the optimal thing to do. Unfortunately, this is not the reality today. Paul 
McKenney et al. has been exploring this in: 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html -- but 
since addr-deps are much less likely to be optimized away, I think the kernel 
will do nothing about it in the near term. ctrl-deps on the other hand are more 
of a worry to the Linux kernel community right now.

I can't summarize this well enough here, so I would strongly recommend: 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0124r6.html#Control%20Dependencies


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103958/new/

https://reviews.llvm.org/D103958

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to