[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rnk added a comment. By the way, I went ahead and reverted this in r333958. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rnk added a comment. In https://reviews.llvm.org/D46042#1121674, @rjmccall wrote: > > I think we should revert this for now. Adding the alignment attribute to > > all Intel vector typedefs is a bigger change than it seems. > > Ugh. That is just an awful language rule. Would it be reasonable to > restrict it to only attributes spelled with `__declspec(align(N))` rather > than `__attribute__((aligned(N)))`, or is that too invasive in the alignment > computation? When we were working on the record layout code, I didn't want to do that because users often structure their portability headers to check for `__clang__` first because clang also defines `_MSC_VER` and `__GNUC__`. I felt it would be best if the alignment attributes were as interchangeable as possible. They are very common. Maybe checking the spelling of the packing attribute would work better. The GCC `__attribute__` spelling would ignore what we called "required alignment", meaning alignment required by explicit attributes and not the normal `alignof`. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rjmccall added a comment. In https://reviews.llvm.org/D46042#1121648, @rnk wrote: > It's the typedef alignment changes that are causing problems for us, not the > MaxVectorAlign changes. That makes more sense. The new alignment attribute > breaks our implementation of `_mm256_loadu_ps`, because the packed struct > ends up with a 32-byte alignment. Here's the implementation: > > static __inline __m256 __DEFAULT_FN_ATTRS > _mm256_loadu_ps(float const *__p) > { > struct __loadu_ps { > __m256 __v; > } __attribute__((__packed__, __may_alias__)); > return ((struct __loadu_ps*)__p)->__v; > } > > > And clang's -fdump-record-layouts says: > > *** Dumping AST Record Layout >0 | struct __loadu_ps >0 | __m256 __v > | [sizeof=32, align=32] > > > I think the problem is that `__attribute__((aligned(N)))` beats > `__attribute__((packed))` on Windows to match MSVC's behavior with > `__declspec(align(N))`. > > I think we should revert this for now. Adding the alignment attribute to all > Intel vector typedefs is a bigger change than it seems. Ugh. That is just an awful language rule. Would it be reasonable to restrict it to only attributes spelled with `__declspec(align(N))` rather than `__attribute__((aligned(N)))`, or is that too invasive in the alignment computation? Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rnk added a comment. It's the typedef alignment changes that are causing problems for us, not the MaxVectorAlign changes. That makes more sense. The new alignment attribute breaks our implementation of `_mm256_loadu_ps`, because the packed struct ends up with a 32-byte alignment. Here's the implementation: static __inline __m256 __DEFAULT_FN_ATTRS _mm256_loadu_ps(float const *__p) { struct __loadu_ps { __m256 __v; } __attribute__((__packed__, __may_alias__)); return ((struct __loadu_ps*)__p)->__v; } And clang's -fdump-record-layouts says: *** Dumping AST Record Layout 0 | struct __loadu_ps 0 | __m256 __v | [sizeof=32, align=32] I think the problem is that `__attribute__((aligned(N)))` beats `__attribute__((packed))` on Windows to match MSVC's behavior with `__declspec(align(N))`. I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rnk added a comment. This change appears to have caused some blink vector math unit tests to fail on Windows. We are tracking it at https://crbug.com/849251. It has a pretty small reproducer: #include __m256 loadit(__m256 *p) { return _mm256_loadu_ps((const float *)p); } Compile for x86_64-windows-msvc with -mavx, and before this change we got this IR: `%0 = load <8 x float>, <8 x float>* %p, align 1` After we get this IR: `%0 = load <8 x float>, <8 x float>* %p, align 32` This is surprising. I'll keep debugging. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rjmccall closed this revision. rjmccall added a comment. Landed as r333791. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rjmccall added a comment. In https://reviews.llvm.org/D46042#1088049, @scanon wrote: > In https://reviews.llvm.org/D46042#1088044, @ab wrote: > > > So, this makes sense to me, but on x86, should we also be worried about the > > fact that the calling convention is based on which features are available? > > (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if > > available). Presumably swift is also affected, no? > > > Swift's calling conventions (will?) always divide larger vectors into 128b > pieces. When interacting with C conventions, yes, this is still an issue. Right, this is just a C ABI issue. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rjmccall added a comment. In https://reviews.llvm.org/D46042#1088044, @ab wrote: > So, this makes sense to me, but on x86, should we also be worried about the > fact that the calling convention is based on which features are available? > (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). > Presumably swift is also affected, no? I'd forgotten about that. I think there's a strong argument that we're required to pass at least the Intel intrinsic vector types that way, yeah. But if we want a stable ABI for other vector types, we really can't. The root problem here is that the Intel ABI seems to imagine that these vector types only exist when they're supported directly by hardware. (And the Intel intrinsic headers do define those types even when AVX is disabled!) So I don't know that we can make a good ABI story for that. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
scanon added a comment. In https://reviews.llvm.org/D46042#1088044, @ab wrote: > So, this makes sense to me, but on x86, should we also be worried about the > fact that the calling convention is based on which features are available? > (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). > Presumably swift is also affected, no? Swift's calling conventions (will?) always divide larger vectors into 128b pieces. When interacting with C conventions, yes, this is still an issue. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
ab added a comment. So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no? Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rjmccall added a comment. I think we should seriously consider making alignment attributes on typedefs (and maybe some other attributes like may_alias) actual type qualifiers that are preserved in the canonical type, mangled, and so on. It would be an ABI break, but it'd also solve a lot of problems. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
ahatanak added a comment. > Note that this sort of attribute is stripped from template arguments in > template substitution, so there's a possibility that code templated over > vectors will produce inadequately-aligned objects. I was wondering whether there is a warning clang issues when the aligned attribute is stripped. If it doesn't warn, should it? I recently came across a case where a 16-byte vector annotated with a 4-byte alignment was passed to std::swap, which caused a crash because the alignment was stripped and the x86 backend decided to emit an 16-byte aligned load to load an unaligned vector. Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
ahatanak accepted this revision. ahatanak added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D46042 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46042: Cap vector alignment at 16 for all Darwin platforms
rjmccall created this revision. Herald added a reviewer: javed.absar. Herald added subscribers: cfe-commits, kristof.beyls. This fixes two major problems: - We were not capping vector alignment as desired on 32-bit ARM. - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI. This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel). Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. Repository: rC Clang https://reviews.llvm.org/D46042 Files: lib/Basic/Targets/OSTargets.h lib/Basic/Targets/X86.h lib/CodeGen/CGBuiltin.cpp lib/Headers/avx512fintrin.h lib/Headers/avxintrin.h test/CodeGen/arm-swiftcall.c test/CodeGen/vector-alignment.c test/CodeGenCXX/align-avx-complete-objects.cpp Index: test/CodeGenCXX/align-avx-complete-objects.cpp === --- test/CodeGenCXX/align-avx-complete-objects.cpp +++ test/CodeGenCXX/align-avx-complete-objects.cpp @@ -12,7 +12,7 @@ return r[0]; } -// CHECK: [[R:%.*]] = alloca <8 x float>, align 32 +// CHECK: [[R:%.*]] = alloca <8 x float>, align 16 // CHECK-NEXT: [[CALL:%.*]] = call i8* @_Znwm(i64 32) // CHECK-NEXT: [[ZERO:%.*]] = bitcast i8* [[CALL]] to <8 x float>* // CHECK-NEXT: store <8 x float>* [[ZERO]], <8 x float>** [[P:%.*]], align 8 @@ -22,8 +22,8 @@ // CHECK-NEXT: store volatile <8 x float> [[TWO]], <8 x float>* [[THREE]], align 16 // CHECK-NEXT: [[FOUR:%.*]] = load <8 x float>*, <8 x float>** [[P]], align 8 // CHECK-NEXT: [[FIVE:%.*]] = load volatile <8 x float>, <8 x float>* [[FOUR]], align 16 -// CHECK-NEXT: store <8 x float> [[FIVE]], <8 x float>* [[R]], align 32 -// CHECK-NEXT: [[SIX:%.*]] = load <8 x float>, <8 x float>* [[R]], align 32 +// CHECK-NEXT: store <8 x float> [[FIVE]], <8 x float>* [[R]], align 16 +// CHECK-NEXT: [[SIX:%.*]] = load <8 x float>, <8 x float>* [[R]], align 16 // CHECK-NEXT: [[VECEXT:%.*]] = extractelement <8 x float> [[SIX]], i32 0 // CHECK-NEXT: ret float [[VECEXT]] Index: test/CodeGen/vector-alignment.c === --- test/CodeGen/vector-alignment.c +++ test/CodeGen/vector-alignment.c @@ -1,38 +1,68 @@ // RUN: %clang_cc1 -w -triple x86_64-apple-darwin10 \ -// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=SSE +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_SSE // RUN: %clang_cc1 -w -triple i386-apple-darwin10 \ -// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=SSE +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_SSE // RUN: %clang_cc1 -w -triple x86_64-apple-darwin10 -target-feature +avx \ -// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX // RUN: %clang_cc1 -w -triple i386-apple-darwin10 -target-feature +avx \ -// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX // RUN: %clang_cc1 -w -triple x86_64-apple-darwin10 -target-feature +avx512f \ -// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX512 +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX512 // RUN: %clang_cc1 -w -triple i386-apple-darwin10 -target-feature +avx512f \ -// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX512 +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX512 +// RUN: %clang_cc1 -w -triple armv7-apple-ios10 \ +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_ARM32 +// RUN: %clang_cc1 -w -triple arm64-apple-ios10 \ +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_ARM64 + +// RUN: %clang_cc1 -w -triple x86_64-pc-linux \ +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC +// RUN: %clang_cc1 -w -triple i386-pc-linux \ +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC +// RUN: %clang_cc1 -w -triple x86_64-pc-linux -target-feature +avx \ +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC +// RUN: %clang_cc1 -w -triple i386-pc-linux -target-feature +avx \ +// RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC +// RUN: %cl