----- Original Message -----
> From: "Richard Smith via cfe-commits" <cfe-commits@lists.llvm.org>
> To: "Matt Arsenault" <arse...@gmail.com>
> Cc: "Clang Commits" <cfe-commits@lists.llvm.org>
> Sent: Monday, May 9, 2016 4:45:04 PM
> Subject: Re: [Clang] Convergent Attribute
>
> On Mon, May 9, 2016 at 2:43 PM, Richard Smith <rich...@metafoo.co.uk>
> wrote:
> > On Sun, May 8, 2016 at 12:43 PM, Matt Arsenault via cfe-commits
> > <cfe-commits@lists.llvm.org> wrote:
> > > > On May 6, 2016, at 18:12, Richard Smith via cfe-commits
> > > > <cfe-commits@lists.llvm.org> wrote:
> > > >
> > > > On Fri, May 6, 2016 at 4:20 PM, Matt Arsenault via cfe-commits
> > > > <cfe-commits@lists.llvm.org> wrote:
> > > > > On 05/06/2016 02:42 PM, David Majnemer via cfe-commits wrote:
> > > > > > This example looks wrong to me. It doesn't seem meaningful
> > > > > > for a function to be both readonly and convergent, because
> > > > > > convergent means the call has some side-effect visible to
> > > > > > other threads and readonly means the call has no
> > > > > > side-effects visible outside the function.
> > > > >
> > > > > This is not correct. It is valid for convergent operations to
> > > > > be readonly/readnone. Barriers are a common case which do have
> > > > > side effects, but there are also classes of GPU instructions
> > > > > which do not access memory and still need the convergent
> > > > > semantics.
> > > >
> > > > Can you give an example? It's not clear to me how a function
> > > > could be both convergent and satisfy the readnone requirement
> > > > that it not "access[...] any mutable state (e.g. memory, control
> > > > registers, etc) visible to caller functions". Synchronizing with
> > > > other threads seems like it would cause such a state change in
> > > > an abstract sense. Is the critical distinction here that the
> > > > state mutation is visible to the code that spawned the gang of
> > > > threads, but not to other threads within the gang?
> > > > (This seems like a bug in the definition of readonly if so,
> > > > because it means that a readonly call whose result is unused
> > > > cannot be deleted.)
> > > >
> > > > I care about this because Clang maps __attribute__((pure)) to
> > > > LLVM readonly, and -- irrespective of the LLVM semantics -- a
> > > > call to a function marked pure is permitted to be deleted if the
> > > > return value is unused, or to have multiple calls CSE'd. As a
> > > > result, inside Clang, we use that attribute to determine whether
> > > > an expression has side effects, and Clang's reasoning about
> > > > these things may also lead to miscompiles if a call to a
> > > > function marked __attribute__((pure, convergent)) actually can
> > > > have a side effect.
> > > > _______________________________________________
> > > > cfe-commits mailing list
> > > > cfe-commits@lists.llvm.org
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
> > >
> > > These are communication operations between lanes that do not
> > > require synchronization within the wavefront. These are mostly
> > > cross lane communication instructions. An example would be the
> > > amdgcn.mov.dpp instruction, which reads a register from a
> > > neighboring lane, or the CUDA warp vote functions.
> >
> > Those both appear to technically fail to satisfy the requirements
> > of an __attribute__((pure)) function. If I understand correctly, the
> > DPP function effectively stores a value into some state that is
> > shared with another lane (from Clang and LLVM's perspectives, state
> > that is visible to a function evaluation outside the current one),
> > and then reads a value from another such shared storage location.
> > The CUDA warp vote functions effectively store a value into some
> > state that is shared with all other threads in the warp and then
> > read some summary information about the values stored by all the
> > threads. In both cases, the function mutates state that is visible
> > to other functions running on other threads, and so is not
> > __attribute__((pure)) / readonly, as far as I can see.
>
> (And just to be clear, the fact that no actual storage is used for
> this is irrelevant to the notional semantics of the operation. Note
> that the definition of the pure attribute also covers "control
> registers, etc".)
>
> > It seems to me that this change weakens the definition of these
> > attributes when combined with the convergent attribute to mean that
> > the function *is* still allowed to store to mutable state that's
> > shared with other lanes / other threads in the same warp, but only
> > via convergent combined store/load primitives. That makes some
> > sense, given that the behavior of the *execution* model does not
> > (necessarily) treat each notional lane as a separate thread, and
> > from that perspective the instruction can be viewed as operating on
> > a vector and communicating only with itself, but it doesn't match
> > the current definitions of the semantics of these attributes (which
> > are specified in terms of the *source* model, in which each notional
> > lane is a separate invocation of the function). So I'd like at least
> > for some documentation to be added for our "pure" and "const"
> > attributes, saying something like "if this is combined with the
> > "convergent" attribute, the function may still communicate with
> > other lanes through convergent operations, even though such
> > communication notionally involves modification of mutable state
> > visible to the other lanes". I'd suggest a similar change also be
> > made to LLVM's LangRef.

+1

It seems like we need to be explicit, however, that the modified state
is only accessible via the return value of the function.

 -Hal

> > I've checked through how clang is using the "pure" attribute, and
> > it seems like it should mostly do the right thing in this case.
> > There are a few places where (using your amdgcn.mov.dpp example) we
> > would cause a dpp instruction to be emitted where the source code
> > called the relevant operation from within an operand that we do not
> > notionally evaluate (for instance, the operand of a __assume or
> > __builtin_object_size). Contrived example:
> >
> > > int arr[N];
> > > int dpp(int n) __attribute__((convergent, pure));
> > > void f(int id) {
> > >   int x = 0;
> > >   if (id % 2) x = __builtin_object_size(&arr[dpp(id)], 0);
> > > }
> >
> > We'll emit code to call the dpp function here, because we believe
> > it has no side-effects. However, in both cases where we do this, we
> > require the relevant expression to have defined behaviour (even
> > though we say we won't perform any side-effects contained within it)
> > so it wouldn't be valid to call a convergent function except from a
> > convergent point in the surrounding function. So I think the worst
> > effect of this would be that we would emit extra convergent
> > operations; the resulting code should still be correct.
> >
> > > There is no synchronization required, and there is no other way
> > > for the same item to access that information private to the other
> > > workitem. There’s no observable global state from the perspective
> > > of a single lane. The individual registers changed aren’t visible
> > > to the spawning host program (perhaps with the exception of some
> > > debug hardware inspecting all of the individual registers).
> > > Deleting these would be perfectly acceptable if the result is
> > > unused.
> > >
> > > -Matt

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
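
For reference, a minimal C sketch of the property Richard relies on in
the thread: a call to a function marked __attribute__((pure)) may be
deleted when its result is unused, and repeated identical calls may be
merged. The names sum_ints and twice are hypothetical, chosen only for
illustration; this is not code from the patch under discussion, and it
deliberately leaves out the convergent attribute.

```c
#include <stddef.h>

/* Hypothetical helper: reads memory through v but writes no state,
   so it meets the documented requirements of "pure". */
__attribute__((pure)) static int sum_ints(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Hypothetical caller showing the two transformations from the thread. */
static int twice(const int *v, size_t n) {
    (void)sum_ints(v, n);                    /* result unused: call may be deleted */
    return sum_ints(v, n) + sum_ints(v, n);  /* identical calls: may be merged */
}
```

Because sum_ints has no side effects, twice returns the same value
whether or not the compiler performs either transformation. The
question raised in the thread is whether that still holds for a call
that also exchanges data with other lanes.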