michaelselehov wrote:

> Device libs should _only_ be used with a build from the freshly built 
> compiler. This is user error

@arsenm I wish it were that easy. Unfortunately, even fresh AOMP and TheRock 
builds (where device-libs and clang come out of the same CI invocation) keep 
the old `half(half)` signature in device-libs, and the test still fails. We hit 
this in the nightly TheRock build originally; that's how we got here.

The reason a rebuild doesn't change anything: this PR touches the 
classification of *aggregates* on the caller side, but `__ockl_wfred_*_f16` in 
`rocm-device-libs` is sourced as a flat `half` parameter -- not an aggregate. 
Rebuilding `ockl.bc` with a clang that has this PR applied still emits `define 
half @__ockl_wfred_min_f16(half ...)`. I diff-compared the function signature 
in `ockl.bc` coming from /opt/rocm, /COD/2026-04-19, and TheRock #1390 -- 
byte-identical for this function.

The mismatch is structural, between the HIP header type and the device-libs 
source type:

* HIP declares `__half = struct { union { _Float16 } }` (aggregate).
* device-libs' `__ockl_wfred_*_f16` takes plain `half` (scalar).

Pre-#185083 the aggregate was coerced to a small integer / scalar shape that 
the inliner could happily bitcast through, so the wrapper was inlined into the 
kernel and its DPP / ds_bpermute ran under the caller's full-wave EXEC. 
Post-#185083 the aggregate is kept as `{half}` all the way to the call site; 
the inliner won't rewrite `{half}` -> `half` SSA, the wrapper stays 
out-of-line, and the now-divergent call serializes EXEC. The DPP intrinsics 
then read from neighbour lanes that never entered the callee and return 
whatever stale data sits in those VGPRs -- which is exactly the failure pattern 
we see (`min` collapses to 0, `max` picks up garbage neighbours).

https://github.com/llvm/llvm-project/pull/185083
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to