[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Lower than native alignment is legal in AMDGPU hardware and it's possible to 
work around in the `printf` implementation, closing.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

Ah yes, libc code doing the equivalent of va_arg assuming natural alignment 
when the underlying buffer is a packed struct with fields padded to four bytes 
would not work. That would be "fixed" by changing the compiler to match the 
assumption made by libc, but it seems much better for libc to do the misaligned 
load instead. Then there's no wavesize*align-padding stack space burned at 
runtime.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Patch should not land. Need to know what bug this was trying to address to 
> guess at what the right fix would be.

My understanding was that the variadics did lowering to a struct with a minimum 
alignment of four. This currently *doesn't* do that, hence my confusion. The 
current lowering provides no padding, which I now see is a deliberate choice to 
save on stack presumably. The issue I had was that I laid out my `printf` code 
with the assumption that the buffer was a struct, so now it won't work when 
copied to another target via RPC because it's not the correct alignment.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Jon Chesterfield via cfe-commits

https://github.com/JonChesterfield requested changes to this pull request.

Patch should not land. Need to know what bug this was trying to address to 
guess at what the right fix would be.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-07-01 Thread Jon Chesterfield via cfe-commits

JonChesterfield wrote:

This PR is invalid.

First, the alignment on the eight byte pointer is supposed to be four. 
Increasing it to 8 makes things worse.

Second, I can't see any support for the claim that the code is incrementing by 
the alignment of the value, as opposed to the size.

The frame is setup as a struct instance with explicit padding by Int8Tys and 
the calculation there is correct.

The va_arg increment is done in CodeGen:emitVoidPtrVAArg, where DirectSize is 
ValueInfo.Width, aligned to the 4 byte slot size, then stored. It does not 
increment the iterator by the alignment of the type.

The lowering pass is doing exactly what was intended.


https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Hm, that's what I'm doing in the `printf` implementation and it doesn't 
> > work without that patch. When I look at the varargs struct it didn't have 
> > any padding, which explained why `alignTo(ptr + size, align)` was wrong. 
> > So, I was trying to do the following, `printf("%d%ld", 1, 1l)`. With this 
> > patch I get the following,
> 
> For what IR? Is the small struct getting expanded into individual scalar 
> pieces?

The implementation intentionally states that the alignment is always `4` in 
this case, which results in there being no padding between the four byte and 
eight byte values when put into a `void *` buffer for varargs. You should be 
able to see it in the changes to the tests. Unfortunately, @JonChesterfield is 
on vacation so I can't ask about this as it seems to be a deliberate choice, 
but I don't know how it's correct.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> > > > Incrementing by align is just a bug, of course the size is the real 
> > > > value. Whether we want to continue wasting space is another 
> > > > not-correctness discussion
> > > 
> > > 
> > > Struct padding is pretty universal, AMDGPU seems the odd one out here. I 
> > > wouldn't mind it so much if it didn't require me to know which vendor I 
> > > was dealing with in the RPC implementation, but I suppose I could store 
> > > that information somewhere if we want to use a compressed option and we 
> > > know it works.
> > 
> > 
> > It's not about struct padding, but the base alignment. Any pointer 
> > increment should be alignTo(ptr + size, align), not += align. The += align 
> > won't even work for large structs
> 
> Hm, that's what I'm doing in the `printf` implementation and it doesn't work 
> without that patch. When I look at the varargs struct it didn't have any 
> padding, which explained why `alignTo(ptr + size, align)` was wrong. So, I 
> was trying to do the following, `printf("%d%ld", 1, 1l)`. With this patch I 
> get the following,

For what IR? Is the small struct getting expanded into individual scalar 
pieces? 

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > > Incrementing by align is just a bug, of course the size is the real 
> > > value. Whether we want to continue wasting space is another 
> > > not-correctness discussion
> > 
> > 
> > Struct padding is pretty universal, AMDGPU seems the odd one out here. I 
> > wouldn't mind it so much if it didn't require me to know which vendor I was 
> > dealing with in the RPC implementation, but I suppose I could store that 
> > information somewhere if we want to use a compressed option and we know it 
> > works.
> 
> It's not about struct padding, but the base alignment. Any pointer increment 
> should be alignTo(ptr + size, align), not += align. The += align won't even 
> work for large structs

Hm, that's what I'm doing in the `printf` implementation and it doesn't work 
without that patch. When I look at the varargs struct it didn't have any 
padding, which explained why `alignTo(ptr + size, align)` was wrong. So, I was 
trying to do the following, `printf("%d%ld", 1, 1l)`. With this patch I get the 
following,
```
0xbebebebe0001
0x0001
```
Without this patch, I get this. As you can see there's no struct padding so the 
8 byte value is right next to the 4 byte one.
```
0x00010001
0xbebebebe
```

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> > Incrementing by align is just a bug, of course the size is the real value. 
> > Whether we want to continue wasting space is another not-correctness 
> > discussion
> 
> Struct padding is pretty universal, AMDGPU seems the odd one out here. I 
> wouldn't mind it so much if it didn't require me to know which vendor I was 
> dealing with in the RPC implementation, but I suppose I could store that 
> information somewhere if we want to use a compressed option and we know it 
> works.

It's not about struct padding, but the base alignment. Any pointer increment 
should be alignTo(ptr + size, align), not += align. The += align won't even 
work for large structs 

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> Incrementing by align is just a bug, of course the size is the real value. 
> Whether we want to continue wasting space is another not-correctness 
> discussion

Struct padding is pretty universal, AMDGPU seems the odd one out here. I 
wouldn't mind it so much if it didn't require me to know which vendor I was 
dealing with in the RPC implementation, but I suppose I could store that 
information somewhere if we want to use a compressed option and we know it 
works.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits

arsenm wrote:

Incrementing by align is just a bug, of course the size is the real value. 
Whether we want to continue wasting space is another not-correctness discussion 

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Here, because the minimum alignment is 4, we will only increment the
> > buffer by 4,
> 
> It should be incrementing by the size? 4 byte aligned access of 8 byte type 
> should work fine

Guess that's an AMD thing, so I'm going to assume that @JonChesterfield wrote 
this intentionally to save on stack space? I suppose the issue I'm having with 
my `printf` implementation is that we then want to copy this struct and because 
it doesn't follow natural alignment the person printing it doesn't know where 
these are stored in a common sense. I suppose I could change the code to just 
be `ptr += sizeof(T)` instead of doing the alignment, but I feel like some 
architectures require strict alignment for these and it wouldn't work in the 
general case.

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> Here, because the minimum alignment is 4, we will only increment the
buffer by 4, 

It should be incrementing by the size? 4 byte aligned access of 8 byte type 
should work fine

https://github.com/llvm/llvm-project/pull/96370
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-21 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Joseph Huber (jhuber6)


Changes

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what  the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.


---

Patch is 56.64 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/96370.diff


4 Files Affected:

- (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+3-8) 
- (modified) clang/test/CodeGen/amdgpu-variadic-call.c (+12-20) 
- (modified) llvm/lib/Transforms/IPO/ExpandVariadics.cpp (+5-1) 
- (modified) llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll (+296-278) 


``diff
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 4d3275e17c386..a169a7d920456 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo ) const {
 RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
 QualType Ty, AggValueSlot Slot) const {
   const bool IsIndirect = false;
-  const bool AllowHigherAlign = false;
+  const bool AllowHigherAlign = true;
   return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect,
   getContext().getTypeInfoInChars(Ty),
   CharUnits::fromQuantity(4), AllowHigherAlign, Slot);
@@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType 
Ty, bool Variadic,
 
   Ty = useFirstFieldIfTransparentUnion(Ty);
 
-  if (Variadic) {
-return ABIArgInfo::getDirect(/*T=*/nullptr,
- /*Offset=*/0,
- /*Padding=*/nullptr,
- /*CanBeFlattened=*/false,
- /*Align=*/0);
-  }
+  if (Variadic)
+return ABIArgInfo::getDirect();
 
   if (isAggregateTypeForABI(Ty)) {
 // Records with non-trivial destructors/copy-constructors should not be
diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c 
b/clang/test/CodeGen/amdgpu-variadic-call.c
index 17eda215211a2..0529d6b3171c8 100644
--- a/clang/test/CodeGen/amdgpu-variadic-call.c
+++ b/clang/test/CodeGen/amdgpu-variadic-call.c
@@ -1,4 +1,3 @@
-// REQUIRES: amdgpu-registered-target
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
 // RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s 
-o - | FileCheck %s
 
@@ -179,11 +178,9 @@ typedef struct
 // CHECK-LABEL: define {{[^@]+}}@one_pair_f64
 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double 
[[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue 
[[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0
-// CHECK-NEXT:[[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] 
[[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1
-// CHECK-NEXT:tail call void (...) @sink_0([[STRUCT_PAIR_F64]] 
[[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], 
[[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (...) @sink_0(double [[V0_COERCE0]], double 
[[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double 
[[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) 
#[[ATTR2]]
 // CHECK-NEXT:ret void
 //
 void one_pair_f64(int f0, double f1, pair_f64 v0)
@@ -220,10 +217,9 @@ typedef union
 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], i32 
[[V0_COERCE:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[TMP0:%.*]] = bitcast i32 [[V0_COERCE]] to float
-// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue 
[[UNION_UNION_F32_I32:%.*]] poison, float [[TMP0]], 0
-// CHECK-NEXT:tail call void (...) @sink_0([[UNION_UNION_F32_I32]] 
[[DOTFCA_0_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], 
[[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-21 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/96370

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what  the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.


>From 5ee5bccb5dd4bd1d78dc04ead3c334d88b86f4fd Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Fri, 21 Jun 2024 19:17:42 -0500
Subject: [PATCH] [LLVM] Fix incorrect alignment on AMDGPU variadics

Summary:
The variadics lowering for AMDGPU puts all the arguments into a void
pointer struct. The current logic dictates that the minimum alignment is
four regardless of what  the underlying type is. This is incorrect in
the following case.

```c
void foo(int, ...);

void bar() {
  int x;
  void *p;
  foo(0, x, p);
}
```
Here, because the minimum alignment is 4, we will only increment the
buffer by 4, resulting in an incorrect alignment when we then try to
access the void pointer. We need to set a minimum of 4, but increase it
to 8 in cases like this.
---
 clang/lib/CodeGen/Targets/AMDGPU.cpp  |  11 +-
 clang/test/CodeGen/amdgpu-variadic-call.c |  32 +-
 llvm/lib/Transforms/IPO/ExpandVariadics.cpp   |   6 +-
 .../CodeGen/AMDGPU/expand-variadic-call.ll| 574 +-
 4 files changed, 316 insertions(+), 307 deletions(-)

diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp 
b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 4d3275e17c386..a169a7d920456 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo ) const {
 RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr,
 QualType Ty, AggValueSlot Slot) const {
   const bool IsIndirect = false;
-  const bool AllowHigherAlign = false;
+  const bool AllowHigherAlign = true;
   return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect,
   getContext().getTypeInfoInChars(Ty),
   CharUnits::fromQuantity(4), AllowHigherAlign, Slot);
@@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType 
Ty, bool Variadic,
 
   Ty = useFirstFieldIfTransparentUnion(Ty);
 
-  if (Variadic) {
-return ABIArgInfo::getDirect(/*T=*/nullptr,
- /*Offset=*/0,
- /*Padding=*/nullptr,
- /*CanBeFlattened=*/false,
- /*Align=*/0);
-  }
+  if (Variadic)
+return ABIArgInfo::getDirect();
 
   if (isAggregateTypeForABI(Ty)) {
 // Records with non-trivial destructors/copy-constructors should not be
diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c 
b/clang/test/CodeGen/amdgpu-variadic-call.c
index 17eda215211a2..0529d6b3171c8 100644
--- a/clang/test/CodeGen/amdgpu-variadic-call.c
+++ b/clang/test/CodeGen/amdgpu-variadic-call.c
@@ -1,4 +1,3 @@
-// REQUIRES: amdgpu-registered-target
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --function-signature
 // RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s 
-o - | FileCheck %s
 
@@ -179,11 +178,9 @@ typedef struct
 // CHECK-LABEL: define {{[^@]+}}@one_pair_f64
 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double 
[[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue 
[[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0
-// CHECK-NEXT:[[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] 
[[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1
-// CHECK-NEXT:tail call void (...) @sink_0([[STRUCT_PAIR_F64]] 
[[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], 
[[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
-// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (...) @sink_0(double [[V0_COERCE0]], double 
[[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double 
[[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]]
+// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef 
[[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) 
#[[ATTR2]]
 // CHECK-NEXT:ret void
 //
 void one_pair_f64(int f0, double f1, pair_f64 v0)
@@ -220,10 +217,9 @@ typedef union
 // CHECK-SAME: (i32 noundef