arsenm added a comment.

In D79744#2040731 <https://reviews.llvm.org/D79744#2040731>, @rjmccall wrote:

> In D79744#2040434 <https://reviews.llvm.org/D79744#2040434>, @jdoerfert wrote:
>
> > In D79744#2040380 <https://reviews.llvm.org/D79744#2040380>, @rjmccall 
> > wrote:
> >
> > > In D79744#2040348 <https://reviews.llvm.org/D79744#2040348>, @jdoerfert 
> > > wrote:
> > >
> > > > Drive by, I haven't read the entire history.
> > > >
> > > > In D79744#2040208 <https://reviews.llvm.org/D79744#2040208>, @rjmccall 
> > > > wrote:
> > > >
> > > > > I don't understand why `noalias` is even a concern if the whole 
> > > > > buffer passed to the kernel is read-only.  `noalias` is primarily 
> > > > > about proving non-interference, but if you can tell IR that the 
> > > > > buffer is read-only, that's a much more powerful statement.
> > > >
> > > >
> > > > The problem is that it is a "per-pointer" attribute and not 
> > > > "per-object". Given two argument pointers, where one is marked 
> > > > `readonly`, may still alias. Similarly, an access to a global, indirect 
> > > > accesses, ... can write the "readonly" memory.
> > >
> > >
> > > Oh, do we really not have a way to mark that memory is known to be truly 
> > > immutable for a time?  That seems like a really bad limitation.  It 
> > > should be doable with a custom alias analysis at least.
> >
> >
> > `noalias` + `readone` on an argument basically implies immutable for the 
> > function scope. I think we have invariant intrinsics that could do the 
> > trick as well, though I'm not too familiar with those. I was eventually 
> > hoping for paired/scoped `llvm.assumes` which would allow `noalias` + 
> > `readnone` again. Then there is `invariant` which can be placed on a load 
> > instruction. Finally, TBAA has a "constant memory" tag (or something like 
> > that), but again it is a per-access thing. That are all the in-tree ways I 
> > can think of right now.
> >
> > Custom alias analysis can probably be used to some degree but except 
> > address spaces I'm unsure we have much that you can attach to a pointer and 
> > that "really sticks".
> >
> > >>> Regardless, if you do need `noalias`, you should be able to emit the 
> > >>> same IR that we'd emit if pointers to all the fields were assigned into 
> > >>> `restrict` local variables and then only those variables were 
> > >>> subsequently used.
> > >> 
> > >> We are still debating the local restrict pointer support. Once we merge 
> > >> that in it should be usable here.
> > > 
> > > I thought that was finished a few years ago; is it really not considered 
> > > usable yet?  Or does "we" not just mean LLVM here?
> >
> > Yes, I meant "we" = LLVM here. Maybe we talk about different things. I was 
> > referring to local restrict qualified variables, e.g., `double * __restrict 
> > Ptr = ...`. The proposal to not just ignore the restrict (see 
> > https://godbolt.org/z/jLzjR3) came last year, it was a big one and progress 
> > unfortunately stalled (partly my fault). Now we are just about to see a 
> > second push to get it done.
> >  Is that what you meant too?
>
>
> I thought I remembered Hal doing a lot of work on local restrict a few years 
> ago.  I'm probably just misremembering, or I didn't realize that the work 
> never landed or got pulled out later.
>
> Okay.  So where we're at is that you'd like to add a new argument-passing 
> convention that's basically "passed indirect but immutable", implying that 
> the frontend has to copy it in order to create the mutable local parameter.  
> That's not actually a totally ridiculous convention in principle, although it 
> has poor worst-case behavior (copies on both sides), and that happens to be 
> what the frontend will often have to conservatively emit.  I would still 
> prefer not to add new argument-passing conventions just to satisfy short-term 
> optimization needs, though.  Are there any other reasonable options?


For the purpose here, only the callee exists. This is essentially a 
freestanding function, the entry point to the program. There is no caller 
function, and in the future I would like to make a call to amdgpu_kernel an IR 
verifier error (technically OpenCL device enqueue is an exception to this, but 
we don't treat this as a call. Instead there's a lot of library magic to invoke 
the kernel. From the perspective of the callee nothing changes, since it's 
still not allowed to modify the incoming argument buffer or aware it was called 
this way).

The load-from-constant nature needs to be exposed earlier, so I think this 
necessarily involves changing the convention lowering in some way and it's just 
a question of what it looks like. To summarize the 2.5 options I've come up 
with are

1. Use constant byval parameters, as this patch does. This requires the least 
implementation effort but doesn't exactly fit in with byval as defined.
2. Replace all IR argument uses with loads from a constant offset from an 
intrinsic call. This still needs to leave the IR arguments in place because we 
do need to know the original argument sizes and offsets, but they would never 
have a use (or I would need to invent some other method of tracking this 
information)
3. Keep clang IR generation unchanged, but move the pass that lowers arguments 
to loads earlier and hack out aggregate IR loads before SROA makes things 
worse. This is really just a kludgier version of option 2. We do ultimately do 
this late in the backend to enable vectorization, but it does seem to make the 
middle end optimizer unhappy




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79744/new/

https://reviews.llvm.org/D79744



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to