[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
jhuber6 wrote: Lower than native alignment is legal in AMDGPU hardware and it's possible to work around in the `printf` implementation, closing. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
JonChesterfield wrote: Ah yes, libc code doing the equivalent of va_arg assuming natural alignment when the underlying buffer is a packed struct with fields padded to four bytes would not work. That would be "fixed" by changing the compiler to match the assumption made by libc, but it seems much better for libc to do the misaligned load instead. Then there's no wavesize*align-padding stack space burned at runtime. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
jhuber6 wrote: > Patch should not land. Need to know what bug this was trying to address to > guess at what the right fix would be. My understanding was that the variadics did lowering to a struct with a minimum alignment of four. This currently *doesn't* do that, hence my confusion. The current lowering provides no padding, which I now see is a deliberate choice to save on stack presumably. The issue I had was that I laid out my `printf` code with the assumption that the buffer was a struct, so now it won't work when copied to another target via RPC because it's not the correct alignment. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
https://github.com/JonChesterfield requested changes to this pull request. Patch should not land. Need to know what bug this was trying to address to guess at what the right fix would be. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
JonChesterfield wrote: This PR is invalid. First, the alignment on the eight byte pointer is supposed to be four. Increasing it to 8 makes things worse. Second, I can't see any support for the claim that the code is incrementing by the alignment of the value, as opposed to the size. The frame is setup as a struct instance with explicit padding by Int8Tys and the calculation there is correct. The va_arg increment is done in CodeGen:emitVoidPtrVAArg, where DirectSize is ValueInfo.Width, aligned to the 4 byte slot size, then stored. It does not increment the iterator by the alignment of the type. The lowering pass is doing exactly what was intended. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
jhuber6 wrote: > > Hm, that's what I'm doing in the `printf` implementation and it doesn't > > work without that patch. When I look at the varargs struct it didn't have > > any padding, which explained why `alignTo(ptr + size, align)` was wrong. > > So, I was trying to do the following, `printf("%d%ld", 1, 1l)`. With this > > patch I get the following, > > For what IR? Is the small struct getting expanded into individual scalar > pieces? The implementation intentionally states that the alignment is always `4` in this case, which results in there being no padding between the four byte and eight byte values when put into a `void *` buffer for varargs. You should be able to see it in the changes to the tests. Unfortunately, @JonChesterfield is on vacation so I can't ask about this as it seems to be a deliberate choice, but I don't know how it's correct. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
arsenm wrote: > > > > Incrementing by align is just a bug, of course the size is the real > > > > value. Whether we want to continue wasting space is another > > > > not-correctness discussion > > > > > > > > > Struct padding is pretty universal, AMDGPU seems the odd one out here. I > > > wouldn't mind it so much if it didn't require me to know which vendor I > > > was dealing with in the RPC implementation, but I suppose I could store > > > that information somewhere if we want to use a compressed option and we > > > know it works. > > > > > > It's not about struct padding, but the base alignment. Any pointer > > increment should be alignTo(ptr + size, align), not += align. The += align > > won't even work for large structs > > Hm, that's what I'm doing in the `printf` implementation and it doesn't work > without that patch. When I look at the varargs struct it didn't have any > padding, which explained why `alignTo(ptr + size, align)` was wrong. So, I > was trying to do the following, `printf("%d%ld", 1, 1l)`. With this patch I > get the following, For what IR? Is the small struct getting expanded into individual scalar pieces? https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
jhuber6 wrote: > > > Incrementing by align is just a bug, of course the size is the real > > > value. Whether we want to continue wasting space is another > > > not-correctness discussion > > > > > > Struct padding is pretty universal, AMDGPU seems the odd one out here. I > > wouldn't mind it so much if it didn't require me to know which vendor I was > > dealing with in the RPC implementation, but I suppose I could store that > > information somewhere if we want to use a compressed option and we know it > > works. > > It's not about struct padding, but the base alignment. Any pointer increment > should be alignTo(ptr + size, align), not += align. The += align won't even > work for large structs Hm, that's what I'm doing in the `printf` implementation and it doesn't work without that patch. When I look at the varargs struct it didn't have any padding, which explained why `alignTo(ptr + size, align)` was wrong. So, I was trying to do the following, `printf("%d%ld", 1, 1l)`. With this patch I get the following, ``` 0xbebebebe0001 0x0001 ``` Without this patch, I get this. As you can see there's no struct padding so the 8 byte value is right next to the 4 byte one. ``` 0x00010001 0xbebebebe ``` https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
arsenm wrote: > > Incrementing by align is just a bug, of course the size is the real value. > > Whether we want to continue wasting space is another not-correctness > > discussion > > Struct padding is pretty universal, AMDGPU seems the odd one out here. I > wouldn't mind it so much if it didn't require me to know which vendor I was > dealing with in the RPC implementation, but I suppose I could store that > information somewhere if we want to use a compressed option and we know it > works. It's not about struct padding, but the base alignment. Any pointer increment should be alignTo(ptr + size, align), not += align. The += align won't even work for large structs https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
jhuber6 wrote: > Incrementing by align is just a bug, of course the size is the real value. > Whether we want to continue wasting space is another not-correctness > discussion Struct padding is pretty universal, AMDGPU seems the odd one out here. I wouldn't mind it so much if it didn't require me to know which vendor I was dealing with in the RPC implementation, but I suppose I could store that information somewhere if we want to use a compressed option and we know it works. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
arsenm wrote: Incrementing by align is just a bug, of course the size is the real value. Whether we want to continue wasting space is another not-correctness discussion https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
jhuber6 wrote: > > Here, because the minimum alignment is 4, we will only increment the > > buffer by 4, > > It should be incrementing by the size? 4 byte aligned access of 8 byte type > should work fine Guess that's an AMD thing, so I'm going to assume that @JonChesterfield wrote this intentionally to save on stack space? I suppose the issue I'm having with my `printf` implementation is that we then want to copy this struct and because it doesn't follow natural alignment the person printing it doesn't know where these are stored in a common sense. I suppose I could change the code to just be `ptr += sizeof(T)` instead of doing the alignment, but I feel like some architectures require strict alignment for these and it wouldn't work in the general case. https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
arsenm wrote: > Here, because the minimum alignment is 4, we will only increment the buffer by 4, It should be incrementing by the size? 4 byte aligned access of 8 byte type should work fine https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
llvmbot wrote: @llvm/pr-subscribers-clang-codegen Author: Joseph Huber (jhuber6) Changes Summary: The variadics lowering for AMDGPU puts all the arguments into a void pointer struct. The current logic dictates that the minimum alignment is four regardless of what the underlying type is. This is incorrect in the following case. ```c void foo(int, ...); void bar() { int x; void *p; foo(0, x, p); } ``` Here, because the minimum alignment is 4, we will only increment the buffer by 4, resulting in an incorrect alignment when we then try to access the void pointer. We need to set a minimum of 4, but increase it to 8 in cases like this. --- Patch is 56.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96370.diff 4 Files Affected: - (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+3-8) - (modified) clang/test/CodeGen/amdgpu-variadic-call.c (+12-20) - (modified) llvm/lib/Transforms/IPO/ExpandVariadics.cpp (+5-1) - (modified) llvm/test/CodeGen/AMDGPU/expand-variadic-call.ll (+296-278) ``diff diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp index 4d3275e17c386..a169a7d920456 100644 --- a/clang/lib/CodeGen/Targets/AMDGPU.cpp +++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp @@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo ) const { RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr, QualType Ty, AggValueSlot Slot) const { const bool IsIndirect = false; - const bool AllowHigherAlign = false; + const bool AllowHigherAlign = true; return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect, getContext().getTypeInfoInChars(Ty), CharUnits::fromQuantity(4), AllowHigherAlign, Slot); @@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty, bool Variadic, Ty = useFirstFieldIfTransparentUnion(Ty); - if (Variadic) { -return ABIArgInfo::getDirect(/*T=*/nullptr, - /*Offset=*/0, - /*Padding=*/nullptr, - /*CanBeFlattened=*/false, - /*Align=*/0); - } + if (Variadic) +return ABIArgInfo::getDirect(); if (isAggregateTypeForABI(Ty)) { // Records with non-trivial destructors/copy-constructors should not be diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c b/clang/test/CodeGen/amdgpu-variadic-call.c index 17eda215211a2..0529d6b3171c8 100644 --- a/clang/test/CodeGen/amdgpu-variadic-call.c +++ b/clang/test/CodeGen/amdgpu-variadic-call.c @@ -1,4 +1,3 @@ -// REQUIRES: amdgpu-registered-target // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature // RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s @@ -179,11 +178,9 @@ typedef struct // CHECK-LABEL: define {{[^@]+}}@one_pair_f64 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double [[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] { // CHECK-NEXT: entry: -// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0 -// CHECK-NEXT:[[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] [[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1 -// CHECK-NEXT:tail call void (...) @sink_0([[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]] -// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]] -// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]] +// CHECK-NEXT:tail call void (...) @sink_0(double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]] +// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]] +// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]] // CHECK-NEXT:ret void // void one_pair_f64(int f0, double f1, pair_f64 v0) @@ -220,10 +217,9 @@ typedef union // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], i32 [[V0_COERCE:%.*]]) local_unnamed_addr #[[ATTR0]] { // CHECK-NEXT: entry: // CHECK-NEXT:[[TMP0:%.*]] = bitcast i32 [[V0_COERCE]] to float -// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue [[UNION_UNION_F32_I32:%.*]] poison, float [[TMP0]], 0 -// CHECK-NEXT:tail call void (...) @sink_0([[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]] -// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[UNION_UNION_F32_I32]] [[DOTFCA_0_INSERT]]) #[[ATTR2]] -// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32
[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/96370 Summary: The variadics lowering for AMDGPU puts all the arguments into a void pointer struct. The current logic dictates that the minimum alignment is four regardless of what the underlying type is. This is incorrect in the following case. ```c void foo(int, ...); void bar() { int x; void *p; foo(0, x, p); } ``` Here, because the minimum alignment is 4, we will only increment the buffer by 4, resulting in an incorrect alignment when we then try to access the void pointer. We need to set a minimum of 4, but increase it to 8 in cases like this. >From 5ee5bccb5dd4bd1d78dc04ead3c334d88b86f4fd Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 21 Jun 2024 19:17:42 -0500 Subject: [PATCH] [LLVM] Fix incorrect alignment on AMDGPU variadics Summary: The variadics lowering for AMDGPU puts all the arguments into a void pointer struct. The current logic dictates that the minimum alignment is four regardless of what the underlying type is. This is incorrect in the following case. ```c void foo(int, ...); void bar() { int x; void *p; foo(0, x, p); } ``` Here, because the minimum alignment is 4, we will only increment the buffer by 4, resulting in an incorrect alignment when we then try to access the void pointer. We need to set a minimum of 4, but increase it to 8 in cases like this. --- clang/lib/CodeGen/Targets/AMDGPU.cpp | 11 +- clang/test/CodeGen/amdgpu-variadic-call.c | 32 +- llvm/lib/Transforms/IPO/ExpandVariadics.cpp | 6 +- .../CodeGen/AMDGPU/expand-variadic-call.ll| 574 +- 4 files changed, 316 insertions(+), 307 deletions(-) diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp index 4d3275e17c386..a169a7d920456 100644 --- a/clang/lib/CodeGen/Targets/AMDGPU.cpp +++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp @@ -121,7 +121,7 @@ void AMDGPUABIInfo::computeInfo(CGFunctionInfo ) const { RValue AMDGPUABIInfo::EmitVAArg(CodeGenFunction , Address VAListAddr, QualType Ty, AggValueSlot Slot) const { const bool IsIndirect = false; - const bool AllowHigherAlign = false; + const bool AllowHigherAlign = true; return emitVoidPtrVAArg(CGF, VAListAddr, Ty, IsIndirect, getContext().getTypeInfoInChars(Ty), CharUnits::fromQuantity(4), AllowHigherAlign, Slot); @@ -212,13 +212,8 @@ ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty, bool Variadic, Ty = useFirstFieldIfTransparentUnion(Ty); - if (Variadic) { -return ABIArgInfo::getDirect(/*T=*/nullptr, - /*Offset=*/0, - /*Padding=*/nullptr, - /*CanBeFlattened=*/false, - /*Align=*/0); - } + if (Variadic) +return ABIArgInfo::getDirect(); if (isAggregateTypeForABI(Ty)) { // Records with non-trivial destructors/copy-constructors should not be diff --git a/clang/test/CodeGen/amdgpu-variadic-call.c b/clang/test/CodeGen/amdgpu-variadic-call.c index 17eda215211a2..0529d6b3171c8 100644 --- a/clang/test/CodeGen/amdgpu-variadic-call.c +++ b/clang/test/CodeGen/amdgpu-variadic-call.c @@ -1,4 +1,3 @@ -// REQUIRES: amdgpu-registered-target // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature // RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s @@ -179,11 +178,9 @@ typedef struct // CHECK-LABEL: define {{[^@]+}}@one_pair_f64 // CHECK-SAME: (i32 noundef [[F0:%.*]], double noundef [[F1:%.*]], double [[V0_COERCE0:%.*]], double [[V0_COERCE1:%.*]]) local_unnamed_addr #[[ATTR0]] { // CHECK-NEXT: entry: -// CHECK-NEXT:[[DOTFCA_0_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64:%.*]] poison, double [[V0_COERCE0]], 0 -// CHECK-NEXT:[[DOTFCA_1_INSERT:%.*]] = insertvalue [[STRUCT_PAIR_F64]] [[DOTFCA_0_INSERT]], double [[V0_COERCE1]], 1 -// CHECK-NEXT:tail call void (...) @sink_0([[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]] -// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]] -// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], [[STRUCT_PAIR_F64]] [[DOTFCA_1_INSERT]]) #[[ATTR2]] +// CHECK-NEXT:tail call void (...) @sink_0(double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]] +// CHECK-NEXT:tail call void (i32, ...) @sink_1(i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]] +// CHECK-NEXT:tail call void (double, i32, ...) @sink_2(double noundef [[F1]], i32 noundef [[F0]], double [[V0_COERCE0]], double [[V0_COERCE1]]) #[[ATTR2]] // CHECK-NEXT:ret void // void one_pair_f64(int f0, double f1, pair_f64 v0) @@ -220,10 +217,9 @@ typedef union // CHECK-SAME: (i32 noundef