[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Alex Voicu via cfe-commits Wed, 18 Jun 2025 11:48:19 -0700

================
@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
     QualType FromType = OnlyArg->getType();
+    // __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+    // type, although this is almost always an error and we advise against it
----------------
AlexVlx wrote:


> > The __has_builtin counter-example actually does not work and cannot work, 
> > please see: https://gcc.godbolt.org/z/7G5Y1d85b.
> 
> I cannot imagine a situation in which this isn't indicative of a bug, but 
> perhaps this situation is the same one that necessitated [this 
> PR](https://github.com/llvm/llvm-project/pull/126324#issuecomment-2706655366) 
> which eventually concluded that we should change the behavior of 
> `__has_builtin` rather than introduce a new builtin.
> 

This is not actually a bug, it's intended behaviour. To obtain what you expect 
the `b` would have to be `constexpr`, and then the `if` itself would have to be 
`if constexpr`. Otherwise there's no binding commitment to evaluate this at 
compile time (and, in effect, if this gets trivially evaluated / removed in the 
FE, it induces dependence on optimisation level).

> > furthermore, the ...later... bit is pretty important: what happens on that 
> > path?
> 
> Anything in the world besides changing the value of `has_builtin` to 
> something other than what `__has_builtin` returned.
> 
> > if you do any of those, your distant has_builtin variable no longer 
> > reflects the predicate, which is the issue;
> 
> Why is that an issue? If the variable no longer reflects the predicate, 
> that's not on the compiler to figure out how to deal with, that's "play silly 
> games, win silly prizes".
>

It is a difficult conversation to have and not exactly what users want to hear, 
so making it as hard as possible to end up in an exchange where you have to say 
"welp, that was a silly game" cannot hurt. If anything, it's compassionate 
behaviour!

> Backing up a step.. my expectation is that this eventually lowers down to a 
> test and jump which jumps past the target code if the test fails. e.g.,
> 
> ```
>   %0 = load i8, ptr %b, align 1
>   %loadedv = trunc i8 %0 to i1
>   br i1 %loadedv, label %if.then, label %if.end
> 
> if.then:
>   # the target-specific instructions live here
>   br label %if.end
> 
> if.end:
>   ret void
> ```
> 
> So we'd be generating instructions for the target which may be invalid if the 
> test lies. If something did change that value so it no longer represents the 
> predicate, I think that's UB (and we could help users catch that UB via a 
> sanitizer check if we wanted to, rather than try to make the backend have to 
> try to figure it out at compile time).
> 

This cannot work reliably (e.g. there are instructions that would simply fail 
at ISEL, and a run time jump doesn't mean that you do not lower to ISA the 
jumped around block), and introducing dependence on sanitizers seems not ideal. 
Furthermore, a run time jump isn't free, which is a concern for us, and we also 
already have a mechanism for that case (`__attribute__((target))`).  Note that 
these can also control e.g. resource allocation, so actually generating both 
might lead to arbitrary exhaustion of a limited resource, and spurious 
compilation failures, consider e.g. (I'll use CUDA/HIP syntax):

```cpp
// This is a bit odd, and technically a race because multiple lanes write to 
shared_buf
void foo() {
  __shared__ int* shared_buf;
  if (__builtin_amdgcn_processor_is("gfx950") {
    __shared__ int buf[70 * 1024];
    shared_buf = buf;
  } else {
    __shared__ int buf[60 * 1024];
    shared_buf = buf;
  }

  __syncthreads();
  // use shared_buf
```

If we tried to lower that we'd exhaust LDS, and spuriously fail to compile. 
This would have originated from perfectly valid uses of `#if 
defined(__gfx950__) #else`. We'd like these to work, so we must unambiguously 
do the fold ourselves.

> > if for a chain from point of cast to final point of use folding fails 
> > (because you passed your value to an
> > opaque function, modified it based on a run time value etc.), you get an 
> > error and a diagnostic.
> 
> I was thinking you would not get a diagnostic; you'd get the behavior you 
> asked for, which may be utter nonsense.
> 

One of the difficulties here (ignoring that the utter nonsense behaviour at run 
time might be nasal demons - GPUs aren't always as polite as to issue a 
`SIGILL` and graciously die:)) is that not all constructs / IR sequences / ASM 
uses lower into ISA, so what the user is more likely to get is an ICE with an 
error that makes no sense unless they work on LLVM. That's fairly grim user 
experience, IMHO, and one that we have the ability to prevent.

> Am I missing something still? If so, maybe it would be quicker for us to hop 
> in a telecon call? I'm going to be out of the office until Monday, but I'm 
> happy to meet with you if that's more productive.

I would be absolutely happy to if you think it'd help. I regret not coming to 
the Sofia meeting, we could've probably sorted this out directly with a laptop:)



https://github.com/llvm/llvm-project/pull/134016
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

Reply via email to