[PATCH] D79744: clang: Add address space to indirect abi info and use it for kernels

John McCall via Phabricator via cfe-commits Thu, 21 May 2020 23:21:32 -0700

rjmccall added a comment.

In D79744#2047482 <https://reviews.llvm.org/D79744#2047482>, @arsenm wrote:


> In D79744#2040731 <https://reviews.llvm.org/D79744#2040731>, @rjmccall wrote:
>
> > In D79744#2040434 <https://reviews.llvm.org/D79744#2040434>, @jdoerfert 
> > wrote:
> >
> > > In D79744#2040380 <https://reviews.llvm.org/D79744#2040380>, @rjmccall 
> > > wrote:
> > >
> > > > In D79744#2040348 <https://reviews.llvm.org/D79744#2040348>, @jdoerfert 
> > > > wrote:
> > > >
> > > > > Drive by, I haven't read the entire history.
> > > > >
> > > > > In D79744#2040208 <https://reviews.llvm.org/D79744#2040208>, 
> > > > > @rjmccall wrote:
> > > > >
> > > > > > I don't understand why `noalias` is even a concern if the whole 
> > > > > > buffer passed to the kernel is read-only.  `noalias` is primarily 
> > > > > > about proving non-interference, but if you can tell IR that the 
> > > > > > buffer is read-only, that's a much more powerful statement.
> > > > >
> > > > >
> > > > > The problem is that it is a "per-pointer" attribute and not 
> > > > > "per-object". Given two argument pointers, where one is marked 
> > > > > `readonly`, may still alias. Similarly, an access to a global, 
> > > > > indirect accesses, ... can write the "readonly" memory.
> > > >
> > > >
> > > > Oh, do we really not have a way to mark that memory is known to be 
> > > > truly immutable for a time?  That seems like a really bad limitation.  
> > > > It should be doable with a custom alias analysis at least.
> > >
> > >
> > > `noalias` + `readone` on an argument basically implies immutable for the 
> > > function scope. I think we have invariant intrinsics that could do the 
> > > trick as well, though I'm not too familiar with those. I was eventually 
> > > hoping for paired/scoped `llvm.assumes` which would allow `noalias` + 
> > > `readnone` again. Then there is `invariant` which can be placed on a load 
> > > instruction. Finally, TBAA has a "constant memory" tag (or something like 
> > > that), but again it is a per-access thing. That are all the in-tree ways 
> > > I can think of right now.
> > >
> > > Custom alias analysis can probably be used to some degree but except 
> > > address spaces I'm unsure we have much that you can attach to a pointer 
> > > and that "really sticks".
> > >
> > > >>> Regardless, if you do need `noalias`, you should be able to emit the 
> > > >>> same IR that we'd emit if pointers to all the fields were assigned 
> > > >>> into `restrict` local variables and then only those variables were 
> > > >>> subsequently used.
> > > >> 
> > > >> We are still debating the local restrict pointer support. Once we 
> > > >> merge that in it should be usable here.
> > > > 
> > > > I thought that was finished a few years ago; is it really not 
> > > > considered usable yet?  Or does "we" not just mean LLVM here?
> > >
> > > Yes, I meant "we" = LLVM here. Maybe we talk about different things. I 
> > > was referring to local restrict qualified variables, e.g., `double * 
> > > __restrict Ptr = ...`. The proposal to not just ignore the restrict (see 
> > > https://godbolt.org/z/jLzjR3) came last year, it was a big one and 
> > > progress unfortunately stalled (partly my fault). Now we are just about 
> > > to see a second push to get it done.
> > >  Is that what you meant too?
> >
> >
> > I thought I remembered Hal doing a lot of work on local restrict a few 
> > years ago.  I'm probably just misremembering, or I didn't realize that the 
> > work never landed or got pulled out later.
> >
> > Okay.  So where we're at is that you'd like to add a new argument-passing 
> > convention that's basically "passed indirect but immutable", implying that 
> > the frontend has to copy it in order to create the mutable local parameter. 
> >  That's not actually a totally ridiculous convention in principle, although 
> > it has poor worst-case behavior (copies on both sides), and that happens to 
> > be what the frontend will often have to conservatively emit.  I would still 
> > prefer not to add new argument-passing conventions just to satisfy 
> > short-term optimization needs, though.  Are there any other reasonable 
> > options?
>
>
> For the purpose here, only the callee exists. This is essentially a 
> freestanding function, the entry point to the program.


I'm definitely not going to let you add a new "generic" argument-passing 
convention and then only implement exactly the parts you need; that's just 
adding technical debt.

> The load-from-constant nature needs to be exposed earlier, so I think this 
> necessarily involves changing the convention lowering in some way and it's 
> just a question of what it looks like. To summarize the 2.5 options I've come 
> up with are
> 
> 1. Use constant byval parameters, as this patch does. This requires the least 
> implementation effort but doesn't exactly fit in with byval as defined.

I assume you generate a `byval` caller in some later pass, which will then 
implicitly copy.  Or do you lower `byval` in a nonstandard way in your backend?

> 1. Replace all IR argument uses with loads from a constant offset from an 
> intrinsic call. This still needs to leave the IR arguments in place because 
> we do need to know the original argument sizes and offsets, but they would 
> never have a use (or I would need to invent some other method of tracking 
> this information)

I guess passing aggregate arguments using a normal indirect convention has this 
same tracking problem.  You'd also have to copy on the caller side to maintain 
semantics, which is probably hard to eliminate, but I would guess `byval` has 
the same problem?

> 1. Keep clang IR generation unchanged, but move the pass that lowers 
> arguments to loads earlier and hack out aggregate IR loads before SROA makes 
> things worse. This is really just a kludgier version of option 2. We do 
> ultimately do this late in the backend to enable vectorization, but it does 
> seem to make the middle end optimizer unhappy

Yeah, Clang tries to avoid first-class aggregates for exactly this reason, LLVM 
does a terrible job generating code for them.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79744/new/

https://reviews.llvm.org/D79744



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D79744: clang: Add address space to indirect abi info and use it for kernels

Reply via email to