[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
arsenm wrote:
> Think it would be useful to put that on functions in the wrapper headers that definitely aren't convergent? E.g. getting a thread id.

You could, but it's trivially inferable in those cases anyway.

https://github.com/llvm/llvm-project/pull/111076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
https://github.com/arsenm approved this pull request.

https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
@@ -4106,9 +4106,10 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
   Opts.Blocks = Args.hasArg(OPT_fblocks) ||
                 (Opts.OpenCL && Opts.OpenCLVersion == 200);
-  Opts.ConvergentFunctions = Args.hasArg(OPT_fconvergent_functions) ||
-                             Opts.OpenCL || (Opts.CUDA && Opts.CUDAIsDevice) ||
-                             Opts.SYCLIsDevice || Opts.HLSL;
+  Opts.ConvergentFunctions = Args.hasFlag(
+      OPT_fconvergent_functions, OPT_fno_convergent_functions,
+      Opts.OpenMPIsTargetDevice || T.isAMDGPU() || T.isNVPTX() || Opts.OpenCL ||
+      Opts.CUDAIsDevice || Opts.SYCLIsDevice || Opts.HLSL);

arsenm wrote:
Sort all the language checks together, before the target list. We probably should have a hasConvergentOperations() predicate somewhere.

https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Automatically enable `-fconvergent-functions` on GPU targets (PR #111076)
arsenm wrote:
> -fno-convergent-functions to opt-out if you want to test broken behavior.

You may legitimately know there are no convergent functions in the TU. We also have the noconvergent source attribute now for this.

https://github.com/llvm/llvm-project/pull/111076
[clang] [Clang] Implement resource directory headers for common GPU intrinsics (PR #110179)
@@ -0,0 +1,187 @@
+//===-- amdgpuintrin.h - AMDGPU intrinsic functions -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __AMDGPUINTRIN_H
+#define __AMDGPUINTRIN_H
+
+#ifndef __AMDGPU__
+#error "This file is intended for AMDGPU targets or offloading to AMDGPU"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device)) __attribute__((always_inline))
+#else
+#define _DEFAULT_ATTRS __attribute__((always_inline))
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(amdgcn)})
+
+// Type aliases to the address spaces used by the AMDGPU backend.
+#define _private __attribute__((opencl_private))
+#define _constant __attribute__((opencl_constant))
+#define _local __attribute__((opencl_local))
+#define _global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _kernel __attribute__((amdgpu_kernel, visibility("protected")))
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_x() {
+  return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_y() {
+  return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t _get_num_blocks_z() {
+  return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workgroups in the grid.
+_DEFAULT_ATTRS static inline uint64_t _get_num_blocks() {
+  return _get_num_blocks_x() * _get_num_blocks_y() * _get_num_blocks_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t _get_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the absolute id of the AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_block_id() {
+  return _get_block_id_x() + _get_num_blocks_x() * _get_block_id_y() +
+         _get_num_blocks_x() * _get_num_blocks_y() * _get_block_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t _get_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the total number of workitems in the workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_num_threads() {
+  return _get_num_threads_x() * _get_num_threads_y() * _get_num_threads_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t _get_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the absolute id of the thread in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint64_t _get_thread_id() {
+  return _get_thread_id_x() + _get_num_threads_x() * _get_thread_id_y() +
+         _get_num_threads_x() * _get_num_threads_y() * _get_thread_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t _get_lane_size() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {
+  return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
+}
+
+// Returns the bit-mask of active threads in
[clang] [Clang] Implement resource directory headers for common GPU intrinsics (PR #110179)
@@ -0,0 +1,187 @@
[same amdgpuintrin.h hunk as quoted above]
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {

arsenm wrote:
We should really just rip out the convergent source attribute. We should only have noconvergent.
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
   setRequiresStructuredCFG(false);
 }
 
+enum AddressSpace {
+  Function = storageClassToAddressSpace(SPIRV::StorageClass::Function),
+  CrossWorkgroup =
+      storageClassToAddressSpace(SPIRV::StorageClass::CrossWorkgroup),
+  UniformConstant =
+      storageClassToAddressSpace(SPIRV::StorageClass::UniformConstant),
+  Workgroup = storageClassToAddressSpace(SPIRV::StorageClass::Workgroup),
+  Generic = storageClassToAddressSpace(SPIRV::StorageClass::Generic)
+};
+
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {
+  const auto *LD = dyn_cast<LoadInst>(V);
+  if (!LD)
+    return UINT32_MAX;
+
+  // It must be a load from a pointer to Generic.
+  assert(V->getType()->isPointerTy() &&
+         V->getType()->getPointerAddressSpace() == AddressSpace::Generic);
+
+  const auto *Ptr = LD->getPointerOperand();
+  if (Ptr->getType()->getPointerAddressSpace() != AddressSpace::UniformConstant)
+    return UINT32_MAX;
+  // For a load from a pointer to UniformConstant, we can infer CrossWorkgroup
+  // storage, as this could only have been legally initialised with a
+  // CrossWorkgroup (aka device) constant pointer.
+  return AddressSpace::CrossWorkgroup;
+}
+
+std::pair<const Value *, unsigned>
+SPIRVTargetMachine::getPredicatedAddrSpace(const Value *V) const {
+  using namespace PatternMatch;
+
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::amdgcn_is_shared:
+      return std::pair(II->getArgOperand(0), AddressSpace::Workgroup);
+    case Intrinsic::amdgcn_is_private:
+      return std::pair(II->getArgOperand(0), AddressSpace::Function);
+    default:
+      break;
+    }
+    return std::pair(nullptr, UINT32_MAX);
+  }
+  // Check the global pointer predication based on
+  // (!is_shared(p) && !is_private(p)). Note that logic 'and' is commutative and
+  // the order of 'is_shared' and 'is_private' is not significant.
+  Value *Ptr;
+  if (getTargetTriple().getVendor() == Triple::VendorType::AMD &&
+      match(
+          const_cast<Value *>(V),
+          m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+                  m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                      m_Deferred(Ptr))))))
+    return std::pair(Ptr, AddressSpace::CrossWorkgroup);
+
+  return std::pair(nullptr, UINT32_MAX);
+}

arsenm wrote:
This is the fancy stuff that should go into a follow-up patch to add assume support.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
[same hunk as quoted above]
+unsigned SPIRVTargetMachine::getAssumedAddrSpace(const Value *V) const {

arsenm wrote:
Move to separate change; not sure this is necessarily valid for spirv.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -178,6 +266,9 @@ void SPIRVPassConfig::addIRPasses() {
     addPass(createSPIRVStructurizerPass());
   }
 
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+    addPass(createInferAddressSpacesPass(AddressSpace::Generic));

arsenm wrote:
Not sure why this is both a pass parameter to InferAddressSpaces and a TTI hook.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
[same hunk as quoted above]
+      m_c_And(m_Not(m_Intrinsic<Intrinsic::amdgcn_is_shared>(m_Value(Ptr))),
+              m_Not(m_Intrinsic<Intrinsic::amdgcn_is_private>(
+                  m_Deferred(Ptr))))))

arsenm wrote:
Shouldn't be looking at the amdgcn intrinsics? Surely spirv has its own operations for this?

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -91,6 +97,88 @@ SPIRVTargetMachine::SPIRVTargetMachine(const Target &T, const Triple &TT,
[same hunk as quoted above]
+bool SPIRVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
+                                             unsigned DestAS) const {
+  if (SrcAS != AddressSpace::Generic && SrcAS != AddressSpace::CrossWorkgroup)
+    return false;
+  return DestAS == AddressSpace::Generic ||
+         DestAS == AddressSpace::CrossWorkgroup;
+}

arsenm wrote:
This is separate; I don't think InferAddressSpaces relies on this.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [llvm][opt][Transforms][SPIR-V] Enable `InferAddressSpaces` for SPIR-V (PR #110897)
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

arsenm wrote:
You don't need to duplicate all of these tests. You just need some basic samples showing that the target is implemented; the full set is testing pass mechanics, which can be done on any target.

https://github.com/llvm/llvm-project/pull/110897
[clang] [llvm] [mlir] [NFC][TableGen] Change `Record::getSuperClasses` to use const Record* (PR #110845)
https://github.com/arsenm approved this pull request.

https://github.com/llvm/llvm-project/pull/110845
[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)
https://github.com/arsenm approved this pull request.

https://github.com/llvm/llvm-project/pull/110747
[clang] [llvm] [mlir] [TableGen] Change `DefInit::Def` to a const Record pointer (PR #110747)
@@ -1660,7 +1660,7 @@ class Record {
   // this record.
   SmallVector Locs;
   SmallVector ForwardDeclarationLocs;
-  SmallVector ReferenceLocs;
+  mutable SmallVector ReferenceLocs;

arsenm wrote:
You have the const_cast on the addition, so this is unnecessary?

https://github.com/llvm/llvm-project/pull/110747
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:
The codegen prepare behavior is still backend code to be tested. You can just run codegenprepare as a standalone pass too (usually such a test would have separate llc and opt RUN lines).

https://github.com/llvm/llvm-project/pull/110695
[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)
@@ -0,0 +1,102 @@
+//===-- LLVMTargetMachineC.cpp --------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the LLVM-C part of TargetMachine.h that directly
+// depends on the CodeGen library.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm-c/Core.h"
+#include "llvm-c/TargetMachine.h"
+#include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/IR/LegacyPassManager.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetMachine.h"
+
+using namespace llvm;
+
+static TargetMachine *unwrap(LLVMTargetMachineRef P) {
+  return reinterpret_cast<TargetMachine *>(P);
+}
+
+static Target *unwrap(LLVMTargetRef P) {
+  return reinterpret_cast<Target *>(P);
+}
+
+static LLVMTargetMachineRef wrap(const TargetMachine *P) {
+  return reinterpret_cast<LLVMTargetMachineRef>(const_cast<TargetMachine *>(P));
+}
+
+static LLVMTargetRef wrap(const Target *P) {
+  return reinterpret_cast<LLVMTargetRef>(const_cast<Target *>(P));
+}
+
+static LLVMBool LLVMTargetMachineEmit(LLVMTargetMachineRef T, LLVMModuleRef M,
+                                      raw_pwrite_stream &OS,
+                                      LLVMCodeGenFileType codegen,
+                                      char **ErrorMessage) {
+  TargetMachine *TM = unwrap(T);
+  Module *Mod = unwrap(M);
+
+  legacy::PassManager pass;
+  MachineModuleInfo MMI(static_cast<LLVMTargetMachine *>(TM));
+
+  std::string error;
+
+  Mod->setDataLayout(TM->createDataLayout());
+
+  CodeGenFileType ft;
+  switch (codegen) {
+  case LLVMAssemblyFile:
+    ft = CodeGenFileType::AssemblyFile;
+    break;
+  default:
+    ft = CodeGenFileType::ObjectFile;
+    break;
+  }
+  if (TM->addPassesToEmitFile(pass, MMI, OS, nullptr, ft)) {
+    error = "TargetMachine can't emit a file of this type";
+    *ErrorMessage = strdup(error.c_str());
+    return true;
+  }
+
+  pass.run(*Mod);
+
+  OS.flush();
+  return false;
+}
+
+LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
+                                     const char *Filename,
+                                     LLVMCodeGenFileType codegen,
+                                     char **ErrorMessage) {
+  std::error_code EC;
+  raw_fd_ostream dest(Filename, EC, sys::fs::OF_None);
+  if (EC) {
+    *ErrorMessage = strdup(EC.message().c_str());
+    return true;
+  }
+  bool Result = LLVMTargetMachineEmit(T, M, dest, codegen, ErrorMessage);
+  dest.flush();
+  return Result;
+}
+
+LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T,
+                                             LLVMModuleRef M,
+                                             LLVMCodeGenFileType codegen,
+                                             char **ErrorMessage,
+                                             LLVMMemoryBufferRef *OutMemBuf) {
+  SmallString<0> CodeString;
+  raw_svector_ostream OStream(CodeString);
+  bool Result = LLVMTargetMachineEmit(T, M, OStream, codegen, ErrorMessage);
+
+  StringRef Data = OStream.str();
+  *OutMemBuf =
+      LLVMCreateMemoryBufferWithMemoryRangeCopy(Data.data(), Data.size(), "");
+  return Result;
+}

arsenm wrote:
Missing newline at end of file.

https://github.com/llvm/llvm-project/pull/110443
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote:
> with the PR pulled in (on top of LLVM's HEAD [aadfba9](https://github.com/llvm/llvm-project/commit/aadfba9b2aa107f9cada2fd9bcbe612cbf560650)), the compilation command is: `clang++ -cl-std=CL2.0 -emit-llvm -c -x cl -g0 --target=spir -Xclang -finclude-default-header -O2 test.cl`. The output LLVM IR after the optimizations is:

You want spirv, not spir.

> note bitcast to i128 with the following truncation to i96 - those types aren't part of the datalayout, yet some optimization generated them. So something has to be done with it and changing the datalayout is not enough.

Any pass is allowed to introduce any IR type. This field is a pure optimization hint. It is not required to do anything, and places no restrictions on any pass.

> > This does not mean arbitrary integer bitwidths do not work. The n field is weird; it's more of an optimization hint.
>
> And I can imagine that we would want to not only be able to emit 4-bit integers in the frontend, but also allow optimization passes to emit them.

Just because there's an extension doesn't mean it's desirable to use them. On real targets, they'll end up codegenning in wider types anyway.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
arsenm wrote:
> 1. Usually (or at least AFAIK) optimization passes won't consider datalayout automatically,

The datalayout is a widely used global constant. There's no option of "not considering it".

> Do you plan to go over LLVM passes adding this check?

There's nothing new to do here. This has always existed.

> 2. Some existing and future extensions might allow extra bit widths for integers.

This does not mean arbitrary integer bitwidths do not work. The n field is weird; it's more of an optimization hint.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:
> Right but it's relying on a non-guaranteed maybe-optimisation firing, as far as I can tell.

The point is to test that the optimization does work. The codegen pipeline is a bunch of intertwined IR passes on top of core codegen, and they need to cooperate.

https://github.com/llvm/llvm-project/pull/110695
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote:
> > I would like to avoid adding additional special properties to AS0, or defining the flat concept.
>
> How can we add a new specification w/o defining it?

By not defining it in terms of flat addressing. Just make it the undesirable address space.

https://github.com/llvm/llvm-project/pull/108786
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) {
   // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't
   // mean anything.
   if (Arch == Triple::spirv32)
-    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-"
+           "v256:256-v512:512-v1024:1024-n8:16:32:64-G1";
   if (TT.getVendor() == Triple::VendorType::AMD &&
       TT.getOS() == Triple::OSType::AMDHSA)
-    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-           "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0";
-  return "e-i64:64-v16:16-v24:32-v32:32-v48:64-"
-         "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1";
+    return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
+           "v512:512-v1024:1024-n32:64-S32-G1-P4-A0";

arsenm wrote:
AMDGPU sets S32 now, which isn't wrong. But the rest of codegen assumes 16-byte alignment by default.

https://github.com/llvm/llvm-project/pull/110695
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@
-; This test aims to check ability to support "Arithmetic with Overflow" intrinsics

arsenm wrote:
That is not the nature of this kind of test.

https://github.com/llvm/llvm-project/pull/110695
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110506 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@ -; This test aims to check ability to support "Arithmetic with Overflow" intrinsics arsenm wrote: This one is testing codegenprepare as part of the normal codegen pipeline, so this one is fine. The other case was a full optimization pipeline + codegen, which are more far removed https://github.com/llvm/llvm-project/pull/110695 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -1,56 +0,0 @@ -; This test aims to check ability to support "Arithmetic with Overflow" intrinsics arsenm wrote: Not sure what the problem is with this test, but it's already covered by another? https://github.com/llvm/llvm-project/pull/110695 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V (PR #110695)
@@ -54,14 +54,14 @@ static std::string computeDataLayout(const Triple &TT) { // memory model used for graphics: PhysicalStorageBuffer64. But it shouldn't // mean anything. if (Arch == Triple::spirv32) -return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-" - "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1"; +return "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-" + "v256:256-v512:512-v1024:1024-n8:16:32:64-G1"; if (TT.getVendor() == Triple::VendorType::AMD && TT.getOS() == Triple::OSType::AMDHSA) -return "e-i64:64-v16:16-v24:32-v32:32-v48:64-" - "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0"; - return "e-i64:64-v16:16-v24:32-v32:32-v48:64-" - "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1"; +return "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-" + "v512:512-v1024:1024-n32:64-S32-G1-P4-A0"; arsenm wrote: The stack alignment should be 16 bytes (S128), but that's not mentioned in the description. Do this separately? I'm pretty sure this is wrong for the amdgcn triples too https://github.com/llvm/llvm-project/pull/110695 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [IR] Allow fast math flags on calls with homogeneous FP struct types (PR #110506)
@@ -1122,6 +1122,26 @@ define void @fastMathFlagsForArrayCalls([2 x float] %f, [2 x double] %d1, [2 x < ret void } +declare { float, float } @fmf_struct_f32() +declare { double, double } @fmf_struct_f64() +declare { <4 x double>, <4 x double> } @fmf_struct_v4f64() + +; CHECK-LABEL: fastMathFlagsForStructCalls( +define void @fastMathFlagsForStructCalls({ float, float } %f, { double, double } %d1, { <4 x double>, <4 x double> } %d2) { + %call.fast = call fast { float, float } @fmf_struct_f32() + ; CHECK: %call.fast = call fast { float, float } @fmf_struct_f32() + + ; Throw in some other attributes to make sure those stay in the right places. + + %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64() + ; CHECK: %call.nsz.arcp = notail call nsz arcp { double, double } @fmf_struct_f64() + + %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64() + ; CHECK: %call.nnan.ninf = tail call nnan ninf fastcc { <4 x double>, <4 x double> } @fmf_struct_v4f64() + arsenm wrote: Can you also add a test with nofpclass attributes on the return / argument? The intent was it would be allowed for the same types as FPMathOperator https://github.com/llvm/llvm-project/pull/110506 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [mlir] Make Ownership of MachineModuleInfo in Its Wrapper Pass External (PR #110443)
arsenm wrote: > * Move the MC emission functions in `TargetMachine` to `LLVMTargetMachine`. > With the changes in this PR, we explicitly assume in both > `addPassesToEmitFile` and `addPassesToEmitMC` that the `TargetMachine` is an > `LLVMTargetMachine`; Hence it does not make sense for these functions to be > present in the `TargetMachine` interface. Was this already implicitly assumed? IIRC there was some layering reason why this is the way it was. There were previous attempts to merge these before, which were abandoned: https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html https://reviews.llvm.org/D38482 https://reviews.llvm.org/D38489 https://github.com/llvm/llvm-project/pull/110443 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/110032 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: > With the constrained intrinsics the default is safe because optimizations > don't recognize the constrained intrinsic and thus don't know how to optimize > it. If we instead rely on the strictfp attribute then we'll need possibly > thousands of checks for this attribute, we'll need everyone going forward to > remember to check for it, and we'll have no way to verify that this rule is > being followed. The current state already requires you to check this for any library calls. Not sure any wide audit of those ever happened. I don't see a better alternative to cover those, plus the full set of target intrinsics. https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [LLVM][TableGen] Change SeachableTableEmitter to use const RecordKeeper (PR #110032)
@@ -1556,7 +1557,7 @@ class RecordVal { bool IsUsed = false; /// Reference locations to this record value. - SmallVector<SMLoc> ReferenceLocs; + mutable SmallVector<SMLoc> ReferenceLocs; arsenm wrote: Is this removed in later patches? https://github.com/llvm/llvm-project/pull/110032 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
@@ -273,6 +273,74 @@ void test_builtin_elementwise_min(int i, short s, double d, float4 v, int3 iv, u // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was '_Complex float')}} } +void test_builtin_elementwise_maximum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) { + i = __builtin_elementwise_maximum(p, d); + // expected-error@-1 {{arguments are of different types ('int *' vs 'double')}} + + struct Foo foo = __builtin_elementwise_maximum(d, d); + // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}} + + i = __builtin_elementwise_maximum(i); + // expected-error@-1 {{too few arguments to function call, expected 2, have 1}} + + i = __builtin_elementwise_maximum(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} + + i = __builtin_elementwise_maximum(i, i, i); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} + + i = __builtin_elementwise_maximum(v, iv); + // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}} + + i = __builtin_elementwise_maximum(uv, iv); + // expected-error@-1 {{arguments are of different types ('unsigned3' (vector of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}} + + d = __builtin_elementwise_maximum(d, f); + + v = __builtin_elementwise_maximum(v, v); + + int A[10]; + A = __builtin_elementwise_maximum(A, A); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was 'int *')}} + + _Complex float c1, c2; + c1 = __builtin_elementwise_maximum(c1, c2); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was '_Complex float')}} +} + +void test_builtin_elementwise_minimum(int i, short s, float f, double d, float4 v, int3 iv, unsigned3 uv, int *p) { + i = __builtin_elementwise_minimum(p, d); + // expected-error@-1 {{arguments are of different types ('int *' vs 'double')}}
+ + struct Foo foo = __builtin_elementwise_minimum(d, d); + // expected-error@-1 {{initializing 'struct Foo' with an expression of incompatible type 'double'}} + + i = __builtin_elementwise_minimum(i); + // expected-error@-1 {{too few arguments to function call, expected 2, have 1}} + + i = __builtin_elementwise_minimum(); + // expected-error@-1 {{too few arguments to function call, expected 2, have 0}} + + i = __builtin_elementwise_minimum(i, i, i); + // expected-error@-1 {{too many arguments to function call, expected 2, have 3}} + + i = __builtin_elementwise_minimum(v, iv); + // expected-error@-1 {{arguments are of different types ('float4' (vector of 4 'float' values) vs 'int3' (vector of 3 'int' values))}} + + i = __builtin_elementwise_minimum(uv, iv); + // expected-error@-1 {{arguments are of different types ('unsigned3' (vector of 3 'unsigned int' values) vs 'int3' (vector of 3 'int' values))}} + + d = __builtin_elementwise_minimum(f, d); + + int A[10]; + A = __builtin_elementwise_minimum(A, A); + // expected-error@-1 {{1st argument must be a vector, integer or floating point type (was 'int *')}} arsenm wrote: The codegen assumes this is only floating point, so the integer part of the message is wrong. Also missing tests using 2 arguments with only integer / vector of integer https://github.com/llvm/llvm-project/pull/110198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Add __builtin_(elementwise|reduce)_(max|min)imum (PR #110198)
@@ -706,6 +706,12 @@ Unless specified otherwise operation(±0) = ±0 and operation(±infinity) = ±infinity representable values for the signed/unsigned integer type. T __builtin_elementwise_sub_sat(T x, T y) return the difference of x and y, clamped to the range of representable values for the signed/unsigned integer type. [supported element types column: integer types] + T __builtin_elementwise_maximum(T x, T y) return x or y, whichever is larger. If exactly one argument is + a NaN, return the other argument. If both arguments are NaNs, [supported element types column: integer and floating point types] arsenm wrote: This doesn't fully explain the semantics, and I'd like to avoid trying to re-explain all the details in every instance of this. Can you just point this to some other description of the semantics? https://github.com/llvm/llvm-project/pull/110198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [cuda][[HIP] `__constant__` should imply constant (PR #110182)
arsenm wrote: If it's not legal for it to be marked as constant, it's also not legal to use constant address space https://github.com/llvm/llvm-project/pull/110182 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > Both in InferAddressSpaces, and in Attributor, you don't really care about > whether a flat address-space exists. Right, this is more of an undesirable address space. Optimizations don't need to know anything about its behavior beyond that. > In reply to your question above re whether this is a DL or a Target property, > I don't have a strong opinion there (it appears @shiltian and @arsenm might). I don't really like putting this in the DataLayout. My original idea was to move it to TargetMachine, but we want to avoid the dependence on CodeGen. The DataLayout is just the other place we have that defines module level target information. The simple solution is just have a switch over the target architecture in Attributor. > I do believe that this is a necessary bit of query-able information, > especially from a Clang, for correctness reasons (more on that below). I don't think this buys frontends much. Clang still needs to understand the full language address space -> target address space mapping. This would just allow populating one entry generically > Ah, this is part of the challenge - we do indeed assume that 0 is flat, but > Targets aren't bound by LangRef to use 0 to denote flat (and some, like SPIR > / SPIR-V) do not As I mentioned above, SPIRV can just work its way out of this problem for its IR. SPIR's only reason for existence is bitcode compatibility, so doing anything with there will be quite a lot of work which will never realistically happen. > I'm fine with adding the enforcement in LLVM that AS0 needs to be the flat > AS, if a target has it, but the definition of a flat AS still needs to be > set. If we do that, how will SPIR/SPIR-V work? > This is the most generic wording I can come up with so far. Happy to hear > more feedbacks. I would like to avoid adding additional special properties to AS0, or defining the flat concept. 
https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
@@ -579,7 +579,7 @@ static StringRef computeDataLayout(const Triple &TT) { "-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-" "v32:32-v48:64-v96:" "128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-" - "G1-ni:7:8:9"; + "G1-ni:7:8:9-T0"; arsenm wrote: No, but yes. We probably should just define 0 to be the flat address space and take the same numbers as amdgcn. Flat will just be unsupported in codegen (but theoretically someone could go implement software tagged pointers) https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > Just to clarify, does this mean any two non-flat address space pointers > _cannot_ alias? This should change nothing about aliasing. The IR assumption is any address space may alias any other https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
arsenm wrote: > There are targets that use a different integer to denote flat (e.g. see SPIR > & SPIR-V). Whilst I know that there are objections to that, the fact remains > that they had historical reason (wanted to make legacy OCL convention that > the default is private work, and given that IR defaults to 0 this was an > easy, if possibly costly, way out; The SPIRV IR would be better off changing its numbers around like we did in AMDGPU ages ago. The only concern would be bitcode compatibility, but given it's still an "experimental target" that shouldn't be an issue. > AMDGPU also borks this for legacy OCL reasons, which has been a source of > pain). This is only a broken in-clang hack, the backend IR always uses the correct address space https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [mlir] [IR] Introduce `T` to `DataLayout` to represent flat address space if a target supports it (PR #108786)
@@ -66,12 +66,12 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple, HasFloat16 = true; if (TargetPointerWidth == 32) -resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64"); +resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0"); arsenm wrote: It is https://github.com/llvm/llvm-project/pull/108786 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
https://github.com/arsenm approved this pull request. I think we need more thought about how the ABI for this will work, but we need to start somewhere https://github.com/llvm/llvm-project/pull/102913 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: > If we can't keep the constrained semantics and near-100% guarantee that no > new exceptions will be introduced then operand bundles are not a replacement > for the constrained intrinsics. We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes { indicatePessimisticFixpoint(); return; } + +for (Instruction &I : instructions(F)) { + if (isa<AddrSpaceCastInst>(I) && arsenm wrote: Simple example, where the cast is still directly the operand. It could be further nested inside another constant expression https://github.com/llvm/llvm-project/pull/94647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes { indicatePessimisticFixpoint(); return; } + +for (Instruction &I : instructions(F)) { + if (isa<AddrSpaceCastInst>(I) && arsenm wrote: 5->3 is an illegal address space cast, but the round trip cast can fold away. You don't want the cast back to the original address space. https://github.com/llvm/llvm-project/pull/94647 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
arsenm wrote: Also it's silly that we need to do bitcode autoupgrade of "experimental" intrinsics, but x86 started shipping with strictfp enabled in production before they graduated. We might as well drop the experimental bit then https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Implement operand bundles for floating-point operations (PR #109798)
@@ -357,6 +357,9 @@ class IRBuilderBase { void setConstrainedFPCallAttr(CallBase *I) { I->addFnAttr(Attribute::StrictFP); +MemoryEffects ME = MemoryEffects::inaccessibleMemOnly(); arsenm wrote: It shouldn't be necessary to touch the attributes. The set of intrinsic attributes is fixed (callsite attributes are another thing, but generally should be droppable here) https://github.com/llvm/llvm-project/pull/109798 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -78,15 +78,15 @@ void MCResourceInfo::finalize(MCContext &OutContext) { } MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) { - return OutContext.getOrCreateSymbol("max_num_vgpr"); + return OutContext.getOrCreateSymbol("amdgcn.max_num_vgpr"); arsenm wrote: We're usually using amdgpu instead of amdgcn in new fields https://github.com/llvm/llvm-project/pull/102913 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Use std::optional::value_or (NFC) (PR #109894)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/109894 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
arsenm wrote: Superseded by #108853 https://github.com/llvm/llvm-project/pull/107598 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/107598 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][codegen] Don't mark "int" TBAA on FP libcalls with indirect args (PR #108853)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/108853 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
arsenm wrote: > If we already have per-function metadata, I'm wondering how difficult it > would be to put this handling in the linker. AFAIK there's already handling > for `call-graph-profile` which can inform the linker of the call-graph, so we > could potentially just walk that graph, find the diameter of the register > usage and then emit it in the final HSA metadata. There would still be the > issue of LDS usage, but we could probably just state that LDS used by a > kernel outside the current TU doesn't work for starters. That would be the ultimate goal. We need to think harder about what the final ABI looks like, instead of creating a new symbol for every individual field https://github.com/llvm/llvm-project/pull/102913 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][codegen] Don't mark "int" TBAA on FP libcalls with indirect args (PR #108853)
https://github.com/arsenm approved this pull request. LGTM, but like I mentioned on #107598, it would be good if there was a test that requires the argument check, and the return check isn't sufficient https://github.com/llvm/llvm-project/pull/108853 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
arsenm wrote: > So it should be built along with the core of LLVM? Also, we package LLVM per > version per subproject. Yes, it should be built along with the core (but doesn't need to ship in the same package as the core). https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
arsenm wrote: > Nixpkgs has no intention of moving away from standalone builds. I encourage you to acquire that intention. IMO libclc should not support the standalone build, and this should be version locked to the exact compiler commit. It's compiler data, not a real library https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
https://github.com/arsenm commented: The nix build should probably migrate to using the non-standalone build https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
@@ -55,7 +55,7 @@ if( LIBCLC_STANDALONE_BUILD OR CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DI # Import required tools if( NOT EXISTS ${LIBCLC_CUSTOM_LLVM_TOOLS_BINARY_DIR} ) foreach( tool IN ITEMS clang llvm-as llvm-link opt ) - find_program( LLVM_TOOL_${tool} ${tool} PATHS ${LLVM_TOOLS_BINARY_DIR} NO_DEFAULT_PATH ) +find_program( LLVM_TOOL_${tool} ${tool} PATHS ${LLVM_TOOLS_BINARY_DIR} ) arsenm wrote: Why does this need to find any binary? Can't it just use the imported targets from the find_package(LLVM) above? https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[libclc] [libclc] use default paths with find_program when possible (PR #105969)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/105969 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [mlir] Make MMIWP not have ownership over MMI + Make MMI Only Use an External MCContext (PR #105541)
arsenm wrote: > @aeubanks @arsenm after looking into this in more detail, I realized that the > `getContext` method of `MMI` is heavily used in the `AsmPrinter` to create > symbols. Also not having it makes it harder for the `MMI` to create machine > functions using `getOrCreateMachineFunction`. The AsmPrinter is just an ordinary ModulePass. The initialization can just set a MMI member? https://github.com/llvm/llvm-project/pull/105541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Support] Add scaling support in `indent` (PR #109478)
@@ -774,18 +774,27 @@ class buffer_unique_ostream : public raw_svector_ostream { // you can use // OS << indent(6) << "more stuff"; // which has better ergonomics (and clang-formats better as well). +// +// If indentation is always in increments of a fixed value, you can use Scale +// to set that value once. So indent(1, 2) will add 2 spaces and +// indent(1, 2) + 1 will add 4 spaces. struct indent { - unsigned NumSpaces; - - explicit indent(unsigned NumSpaces) : NumSpaces(NumSpaces) {} - void operator+=(unsigned N) { NumSpaces += N; } - void operator-=(unsigned N) { NumSpaces -= N; } - indent operator+(unsigned N) const { return indent(NumSpaces + N); } - indent operator-(unsigned N) const { return indent(NumSpaces - N); } + // Indentation is represented as `NumIndents` steps of size `Scale` each. + unsigned NumIndents; + unsigned Scale; + + explicit indent(unsigned NumIndents, unsigned Scale = 1) : NumIndents(NumIndents), Scale(Scale) {} + + // These arithmetic operators preserve scale. + void operator+=(unsigned N) { NumIndents += N; } + void operator-=(unsigned N) { NumIndents -= N; } + indent operator+(unsigned N) const { return indent(NumIndents + N, Scale); } + indent operator-(unsigned N) const { return indent(NumIndents - N, Scale); } arsenm wrote: I'm surprised there's no guard against underflow here https://github.com/llvm/llvm-project/pull/109478 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [Support] Add scaling support in `indent` (PR #109478)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/109478 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [mlir] Make MMIWP not have ownership over MMI + Make MMI Only Use an External MCContext (PR #105541)
arsenm wrote: > @aeubanks It's not impossible to separate them completely. `MCContext` is > needed during initialization and finalization of the > `MachineModuleInfoWrapperPass` (and its new pass manager variant) to set the > diagnostics handler. > > In theory, you can just pass the context to the wrapper pass instead. @arsenm > any thoughts on this? The MachineModuleInfo is the container for all the MachineFunctions (which do hold a reference to the MCContext), so it kind of makes sense to keep it there. But it does look like it should be simple to remove the reference here. So I would say it's better to just remove it https://github.com/llvm/llvm-project/pull/105541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lldb] [llvm] [mlir] [APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (PR #80309)
@@ -1806,7 +1806,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, // instructions to perform VALU adds with immediates or inline literals. unsigned NumLiterals = !TII->isInlineConstant(APInt(32, COffsetVal & 0xffffffff)) + - !TII->isInlineConstant(APInt(32, COffsetVal >> 32)); + !TII->isInlineConstant(APInt(32, uint64_t(COffsetVal) >> 32)); arsenm wrote: These should probably just use Lo_32/Hi_32 https://github.com/llvm/llvm-project/pull/80309 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lldb] [llvm] [mlir] [APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (PR #80309)
@@ -4377,7 +4377,7 @@ AMDGPUInstructionSelector::selectGlobalSAddr(MachineOperand &Root) const { // instructions to perform VALU adds with immediates or inline literals. unsigned NumLiterals = !TII.isInlineConstant(APInt(32, ConstOffset & 0xffffffff)) + -!TII.isInlineConstant(APInt(32, ConstOffset >> 32)); +!TII.isInlineConstant(APInt(32, uint64_t(ConstOffset) >> 32)); arsenm wrote: These should probably just use Lo_32/Hi_32 https://github.com/llvm/llvm-project/pull/80309 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][codegen] Don't mark "int" TBAA on FP libcalls with indirect args (PR #108853)
@@ -690,23 +690,46 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD, const CallExpr *E, llvm::Constant *calleeValue) { CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E); CGCallee callee = CGCallee::forDirect(calleeValue, GlobalDecl(FD)); + llvm::CallBase *callOrInvoke = nullptr; + CGFunctionInfo const *FnInfo = nullptr; RValue Call = - CGF.EmitCall(E->getCallee()->getType(), callee, E, ReturnValueSlot()); + CGF.EmitCall(E->getCallee()->getType(), callee, E, ReturnValueSlot(), + /*Chain=*/nullptr, &callOrInvoke, &FnInfo); if (unsigned BuiltinID = FD->getBuiltinID()) { // Check whether a FP math builtin function, such as BI__builtin_expf ASTContext &Context = CGF.getContext(); bool ConstWithoutErrnoAndExceptions = Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID); + +auto isDirectOrIgnore = [&](ABIArgInfo const &info) { + // For a non-aggregate types direct/extend means the type will be used + // directly (or a sign/zero extension of it) on the call (not a + // input/output pointer). + return info.isDirect() || info.isExtend() || info.isIgnore(); +}; + +// Before annotating this libcall with "int" TBAA metadata check all +// arguments/results are passed directly. On some targets, types such as +// "long double" are passed indirectly via a pointer, and annotating the +// call with "int" TBAA metadata will lead to set up for those arguments +// being incorrectly optimized out. +bool ReturnAndAllArgumentsDirect = +isDirectOrIgnore(FnInfo->getReturnInfo()) && +llvm::all_of(FnInfo->arguments(), + [&](CGFunctionInfoArgInfo const &ArgInfo) { + return isDirectOrIgnore(ArgInfo.info); + }); arsenm wrote: Make a predicate function and call at the end of the if expression below https://github.com/llvm/llvm-project/pull/108853 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -678,6 +690,37 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this,
                                               UsedAssumedInformation);
   }
+
+  // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is
+  // not to be set.
+  bool needFlatScratchInit(Attributor &A) {
+    assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set
+
+    // This is called on each callee; false means callee shouldn't have
+    // no-flat-scratch-init.
+    auto CheckForNoFlatScratchInit = [&](Instruction &I) {
+      const auto &CB = cast<CallBase>(I);

arsenm wrote:

I would hope checkForAllCallLikeInstructions would have a CallBase typed argument to begin with

https://github.com/llvm/llvm-project/pull/94647
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -678,6 +690,37 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
     return !A.checkForAllCallLikeInstructions(DoesNotRetrieve, *this,
                                               UsedAssumedInformation);
   }
+
+  // Returns true if FlatScratchInit is needed, i.e., no-flat-scratch-init is
+  // not to be set.
+  bool needFlatScratchInit(Attributor &A) {
+    assert(isAssumed(FLAT_SCRATCH_INIT)); // only called if the bit is still set
+
+    // This is called on each callee; false means callee shouldn't have
+    // no-flat-scratch-init.
+    auto CheckForNoFlatScratchInit = [&](Instruction &I) {
+      const auto &CB = cast<CallBase>(I);
+      const Function *Callee = CB.getCalledFunction();
+
+      // Callee == 0 for inline asm or indirect call with known callees.
+      // In the latter case, updateImpl() already checked the callees and we
+      // know their FLAT_SCRATCH_INIT bit is set.
+      // If function has indirect call with unknown callees, the bit is
+      // already removed in updateImpl() and execution won't reach here.
+      if (!Callee)
+        return true;
+      else

arsenm wrote:

No else after return

https://github.com/llvm/llvm-project/pull/94647
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (PR #94647)
@@ -434,6 +434,15 @@ struct AAAMDAttributesFunction : public AAAMDAttributes {
       indicatePessimisticFixpoint();
       return;
     }
+
+    for (Instruction &I : instructions(F)) {
+      if (isa<AddrSpaceCastInst>(I) &&

arsenm wrote:

For a nightmare of an edge case, addrspacecasts from private to flat can exist somewhere in constant expressions. For now, as long as addrspace(5) globals are forbidden, this would only be valid with literal addresses. I'm not sure how defined we should consider that case.

But if you follow along with the queue pointer handling, it will work. It already has to handle the 3->0 case in constant expressions

https://github.com/llvm/llvm-project/pull/94647
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)
arsenm wrote: Replaced by #108596 https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -40,12 +42,19 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
   AMDGPUResourceUsageAnalysis *ResourceUsage;

+  MCResourceInfo RI;
+
   SIProgramInfo CurrentProgramInfo;

   std::unique_ptr<AMDGPU::HSAMD::MetadataStreamer> HSAMetadataStream;

   MCCodeEmitter *DumpCodeInstEmitter = nullptr;

+  // validateMCResourceInfo cannot recompute parts of the occupancy as it does
+  // for other metadata to validate (e.g., NumSGPRs) so a map is necessary if we
+  // really want to track and validate the occupancy.

arsenm wrote:

I don't see why this is the case

https://github.com/llvm/llvm-project/pull/102913
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -0,0 +1,225 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK,
+                                    MCContext &OutContext) {
+  auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
+    return OutContext.getOrCreateSymbol(FuncName + Twine(Suffix));
+  };
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return GOCS(".num_vgpr");
+  case RIK_NumAGPR:
+    return GOCS(".num_agpr");
+  case RIK_NumSGPR:
+    return GOCS(".numbered_sgpr");
+  case RIK_PrivateSegSize:
+    return GOCS(".private_seg_size");
+  case RIK_UsesVCC:
+    return GOCS(".uses_vcc");
+  case RIK_UsesFlatScratch:
+    return GOCS(".uses_flat_scratch");
+  case RIK_HasDynSizedStack:
+    return GOCS(".has_dyn_sized_stack");
+  case RIK_HasRecursion:
+    return GOCS(".has_recursion");
+  case RIK_HasIndirectCall:
+    return GOCS(".has_indirect_call");
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK, Ctx), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs(MCContext &OutContext) {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol(OutContext);
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol(OutContext);
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol(OutContext);
+
+  auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::finalize(MCContext &OutContext) {
+  assert(!Finalized && "Cannot finalize ResourceInfo again.");
+  Finalized = true;
+  assignMaxRegs(OutContext);
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}
+
+void MCResourceInfo::assignResourceInfoExpr(
+    int64_t LocalValue, ResourceInfoKind RIK, AMDGPUMCExpr::VariantKind Kind,
+    const MachineFunction &MF, const SmallVectorImpl<const Function *> &Callees,
+    MCContext &OutContext) {
+  const MCConstantExpr *LocalConstExpr =
+      MCConstantExpr::create(LocalValue, OutContext);
+  const MCExpr *SymVal = LocalConstExpr;
+  if (!Callees.empty()) {
+    SmallVector<const MCExpr *, 8> ArgExprs;
+    // Avoid recursive symbol assignment.
+    SmallPtrSet<const Function *, 8> Seen;
+    ArgExprs.push_back(LocalConstExpr);
+    const Function &F = MF.getFunction();
+    Seen.insert(&F);
+
+    for (const Function *Callee : Callees) {
+      if (Seen.contains(Callee))
+        continue;
+      Seen.insert(Callee);

arsenm wrote:

Should combine these by seeing if the insert succeeded

https://github.com/llvm/llvm-project/pull/102913
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (PR #102913)
@@ -0,0 +1,225 @@
+//===- AMDGPUMCResourceInfo.cpp --- MC Resource Info ----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// \brief MC infrastructure to propagate the function level resource usage
+/// info.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMCResourceInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCSymbol.h"
+
+using namespace llvm;
+
+MCSymbol *MCResourceInfo::getSymbol(StringRef FuncName, ResourceInfoKind RIK,
+                                    MCContext &OutContext) {
+  auto GOCS = [this, FuncName, &OutContext](StringRef Suffix) {
+    return OutContext.getOrCreateSymbol(FuncName + Twine(Suffix));
+  };
+  switch (RIK) {
+  case RIK_NumVGPR:
+    return GOCS(".num_vgpr");
+  case RIK_NumAGPR:
+    return GOCS(".num_agpr");
+  case RIK_NumSGPR:
+    return GOCS(".numbered_sgpr");
+  case RIK_PrivateSegSize:
+    return GOCS(".private_seg_size");
+  case RIK_UsesVCC:
+    return GOCS(".uses_vcc");
+  case RIK_UsesFlatScratch:
+    return GOCS(".uses_flat_scratch");
+  case RIK_HasDynSizedStack:
+    return GOCS(".has_dyn_sized_stack");
+  case RIK_HasRecursion:
+    return GOCS(".has_recursion");
+  case RIK_HasIndirectCall:
+    return GOCS(".has_indirect_call");
+  }
+  llvm_unreachable("Unexpected ResourceInfoKind.");
+}
+
+const MCExpr *MCResourceInfo::getSymRefExpr(StringRef FuncName,
+                                            ResourceInfoKind RIK,
+                                            MCContext &Ctx) {
+  return MCSymbolRefExpr::create(getSymbol(FuncName, RIK, Ctx), Ctx);
+}
+
+void MCResourceInfo::assignMaxRegs(MCContext &OutContext) {
+  // Assign expression to get the max register use to the max_num_Xgpr symbol.
+  MCSymbol *MaxVGPRSym = getMaxVGPRSymbol(OutContext);
+  MCSymbol *MaxAGPRSym = getMaxAGPRSymbol(OutContext);
+  MCSymbol *MaxSGPRSym = getMaxSGPRSymbol(OutContext);
+
+  auto assignMaxRegSym = [this, &OutContext](MCSymbol *Sym, int32_t RegCount) {
+    const MCExpr *MaxExpr = MCConstantExpr::create(RegCount, OutContext);
+    Sym->setVariableValue(MaxExpr);
+  };
+
+  assignMaxRegSym(MaxVGPRSym, MaxVGPR);
+  assignMaxRegSym(MaxAGPRSym, MaxAGPR);
+  assignMaxRegSym(MaxSGPRSym, MaxSGPR);
+}
+
+void MCResourceInfo::finalize(MCContext &OutContext) {
+  assert(!Finalized && "Cannot finalize ResourceInfo again.");
+  Finalized = true;
+  assignMaxRegs(OutContext);
+}
+
+MCSymbol *MCResourceInfo::getMaxVGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_vgpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxAGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_agpr");
+}
+
+MCSymbol *MCResourceInfo::getMaxSGPRSymbol(MCContext &OutContext) {
+  return OutContext.getOrCreateSymbol("max_num_sgpr");
+}

arsenm wrote:

I wonder if these should be placed in a custom section. In any case, we will eventually need custom linker logic to deal with this

https://github.com/llvm/llvm-project/pull/102913
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -0,0 +1,31 @@
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple x86_64-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple x86_64-pc-win64 -o - %s | FileCheck %s -check-prefixes=CHECK
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple i686-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple powerpc-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK-PPC
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple armv7-none-linux-gnueabi -o - %s | FileCheck %s -check-prefixes=CHECK-TBAA,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple armv7-none-linux-gnueabihf -o - %s | FileCheck %s -check-prefixes=CHECK-ARM,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple thumbv7k-apple-watchos2.0 -o - -target-abi aapcs16 %s | FileCheck %s -check-prefixes=CHECK-THUMB,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple aarch64-unknown-unknown -o - %s | FileCheck %s -check-prefixes=CHECK-AARCH,TBAA
+// RUN: %clang_cc1 %s -O3 -fmath-errno -emit-llvm -triple spir -o - %s | FileCheck %s -check-prefixes=CHECK-SPIR
+
+_Complex long double foo() {
+  _Complex long double cld;
+  long double v2 = __builtin_cargl(cld);
+  _Complex long double tmp = v2 * cld;
+  return tmp;
+}
+// CHECK: tail call x86_fp80 @cargl(ptr noundef nonnull byval({ {{.*}}, {{.*}} })

arsenm wrote:

Should also test a case with a complex return type, and 0 complex arguments? Is there such a builtin?

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -686,6 +686,20 @@ static Value *EmitSignBit(CodeGenFunction &CGF, Value *V) {
   return CGF.Builder.CreateICmpSLT(V, Zero);
 }

+static bool hasPointerArgsOrPointerReturnType(const Value *V) {
+  if (const CallBase *CB = dyn_cast<CallBase>(V)) {
+    for (const Value *A : CB->args()) {

arsenm wrote:

could use any_of

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -699,9 +713,12 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
     bool ConstWithoutErrnoAndExceptions =
         Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
     // Restrict to target with errno, for example, MacOS doesn't set errno.
-    // TODO: Support builtin function with complex type returned, eg: cacosh
+    bool CallWithPointerArgsOrPointerReturnType =
+        Call.isScalar() && Call.getScalarVal() &&
+        hasPointerArgsOrPointerReturnType(Call.getScalarVal());

arsenm wrote:

Predicate function called in the if?

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -686,6 +686,20 @@ static Value *EmitSignBit(CodeGenFunction &CGF, Value *V) {
   return CGF.Builder.CreateICmpSLT(V, Zero);
 }

+static bool hasPointerArgsOrPointerReturnType(const Value *V) {
+  if (const CallBase *CB = dyn_cast<CallBase>(V)) {
+    for (const Value *A : CB->args()) {
+      if (A->getType()->isPointerTy()) {
+        return true;
+      }
+    }
+    if (CB->getFunctionType()->getReturnType()->isPointerTy()) {
+      return true;
+    }

arsenm wrote:

Don't need braces

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -251,6 +251,24 @@ SPIRV::MemorySemantics::MemorySemantics getMemSemantics(AtomicOrdering Ord) {
   llvm_unreachable(nullptr);
 }

+SPIRV::Scope::Scope getMemScope(const LLVMContext &Ctx, SyncScope::ID ID) {
+  SmallVector<StringRef> SSNs;
+  Ctx.getSyncScopeNames(SSNs);
+
+  StringRef MemScope = SSNs[ID];
+  if (MemScope.empty() || MemScope == "all_svm_devices")

arsenm wrote:

Hard disagree, we do not want aliases. System = "" = all_svm_devices

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
     bool ConstWithoutErrnoAndExceptions =
         Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
     // Restrict to target with errno, for example, MacOS doesn't set errno.
-    // TODO: Support builtin function with complex type returned, eg: cacosh
+    bool CallWithPointerArgsOrPointerReturnType = false;
+    if (Call.isScalar() && Call.getScalarVal()) {
+      if (CallBase *CB = dyn_cast<CallBase>(Call.getScalarVal())) {
+        for (Value *A : CB->args())
+          if (A->getType()->isPointerTy())
+            CallWithPointerArgsOrPointerReturnType = true;
+        CallWithPointerArgsOrPointerReturnType =
+            CallWithPointerArgsOrPointerReturnType ||
+            CB->getFunctionType()->getReturnType()->isPointerTy();
+      }
+    }

arsenm wrote:

Turn this into a predicate function

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Don't emit int TBAA metadata on more complex FP math libcalls. (PR #107598)
@@ -699,9 +699,20 @@ static RValue emitLibraryCall(CodeGenFunction &CGF, const FunctionDecl *FD,
     bool ConstWithoutErrnoAndExceptions =
         Context.BuiltinInfo.isConstWithoutErrnoAndExceptions(BuiltinID);
     // Restrict to target with errno, for example, MacOS doesn't set errno.
-    // TODO: Support builtin function with complex type returned, eg: cacosh
+    bool CallWithPointerArgsOrPointerReturnType = false;
+    if (Call.isScalar() && Call.getScalarVal()) {
+      if (CallBase *CB = dyn_cast<CallBase>(Call.getScalarVal())) {
+        for (Value *A : CB->args())
+          if (A->getType()->isPointerTy())
+            CallWithPointerArgsOrPointerReturnType = true;

arsenm wrote:

Should probably be looking at the source signature, not the IR?

https://github.com/llvm/llvm-project/pull/107598
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -251,6 +251,24 @@ SPIRV::MemorySemantics::MemorySemantics getMemSemantics(AtomicOrdering Ord) {
   llvm_unreachable(nullptr);
 }

+SPIRV::Scope::Scope getMemScope(const LLVMContext &Ctx, SyncScope::ID ID) {
+  SmallVector<StringRef> SSNs;
+  Ctx.getSyncScopeNames(SSNs);
+
+  StringRef MemScope = SSNs[ID];
+  if (MemScope.empty() || MemScope == "all_svm_devices")

arsenm wrote:

Just avoid all_svm_devices altogether? It's the same as the default / ""

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -58,7 +58,35 @@ class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo {
   SPIRVTargetCodeGenInfo(CodeGen::CodeGenTypes &CGT)
       : CommonSPIRTargetCodeGenInfo(std::make_unique<SPIRVABIInfo>(CGT)) {}
   void setCUDAKernelCallingConvention(const FunctionType *&FT) const override;
+  llvm::SyncScope::ID getLLVMSyncScopeID(const LangOptions &LangOpts,
+                                         SyncScope Scope,
+                                         llvm::AtomicOrdering Ordering,
+                                         llvm::LLVMContext &Ctx) const override;
 };
+
+inline StringRef mapClangSyncScopeToLLVM(SyncScope Scope) {
+  switch (Scope) {
+  case SyncScope::HIPSingleThread:
+  case SyncScope::SingleScope:
+    return "singlethread";
+  case SyncScope::HIPWavefront:
+  case SyncScope::OpenCLSubGroup:
+  case SyncScope::WavefrontScope:
+    return "subgroup";
+  case SyncScope::HIPWorkgroup:
+  case SyncScope::OpenCLWorkGroup:
+  case SyncScope::WorkgroupScope:
+    return "workgroup";
+  case SyncScope::HIPAgent:
+  case SyncScope::OpenCLDevice:
+  case SyncScope::DeviceScope:
+    return "device";
+  case SyncScope::SystemScope:
+  case SyncScope::HIPSystem:
+  case SyncScope::OpenCLAllSVMDevices:
+    return "all_svm_devices";

arsenm wrote:

On the naming point, this preferably would use the names directly from the SPIRV spec

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)
@@ -766,8 +766,19 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
   // LLVM atomic instructions always have synch scope. If clang atomic
   // expression has no scope operand, use default LLVM synch scope.
   if (!ScopeModel) {
+    llvm::SyncScope::ID SS;
+    if (CGF.getLangOpts().OpenCL)
+      // OpenCL approach is: "The functions that do not have memory_scope
+      // argument have the same semantics as the corresponding functions with
+      // the memory_scope argument set to memory_scope_device." See ref.:
+      // https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions
+      SS = CGF.getTargetHooks().getLLVMSyncScopeID(CGF.getLangOpts(),
+                                                   SyncScope::OpenCLDevice,
+                                                   Order, CGF.getLLVMContext());
+    else
+      SS = CGF.getLLVMContext().getOrInsertSyncScopeID("");

arsenm wrote:

Don't need to query this, this can just be llvm::SyncScope::System

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [TBAA] Emit "omnipotent char" for intrinsics with type cast (PR #107793)
arsenm wrote:

> Hi, @paulwalker-arm, ACLE allows users to do instruction-level development, but mixing intrinsic and regular C code may break some of the rules set by the compiler.

The rules are still there. You can always use a union or copy to avoid violating the rules. I don't think it makes sense to special case any intrinsics

https://github.com/llvm/llvm-project/pull/107793
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [TableGen] Change SetTheory set/vec to use const Record * (PR #107692)
https://github.com/arsenm approved this pull request. lgtm assuming the const_cast goes away in a subsequent change https://github.com/llvm/llvm-project/pull/107692 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [TBAA] Emit "omnipotent char" for intrinsics with type cast (PR #107793)
arsenm wrote: I don't understand this. The code is a strict aliasing violation, so why should clang work around it? https://github.com/llvm/llvm-project/pull/107793 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Remove 3-element vector load and store special handling (PR #104661)
@@ -45,7 +45,7 @@ void test3(packedfloat3 *p) {
   *p = (packedfloat3) { 3.2f, 2.3f, 0.1f };
 }
 // CHECK: @test3(
-// CHECK: store <4 x float> {{.*}}, align 4
+// CHECK: store <3 x float> {{.*}}, align 4

arsenm wrote:

I'd expect this to be in terms of type, not size but yes.

https://github.com/llvm/llvm-project/pull/104661
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] a291fe5 - clang/AMDGPU: Update test message order
Author: Matt Arsenault
Date: 2024-09-06T21:18:41+04:00
New Revision: a291fe5ed44fa37493d038c78ff4d73135fd85a9

URL: https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9
DIFF: https://github.com/llvm/llvm-project/commit/a291fe5ed44fa37493d038c78ff4d73135fd85a9.diff

LOG: clang/AMDGPU: Update test message order

Order of atomic expansion remarks is backwards since
100d9b89947bb1d42af20010bb594fa4c02542fc

Added: 

Modified: 
    clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
    clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl

Removed: 

diff --git a/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
index d23005e018f359..72027eda4571da 100644
--- a/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl
@@ -26,10 +26,11 @@ typedef enum memory_scope {
 #endif
 } memory_scope;

-// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand]
-// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand]
-// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand]
 // REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at wavefront-one-as memory scope [-Rpass=atomic-expand]
+// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at one-as memory scope [-Rpass=atomic-expand]
+// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at agent-one-as memory scope [-Rpass=atomic-expand]
+// REMARK: remark: A compare and swap loop was generated for an atomic fadd operation at workgroup-one-as memory scope [-Rpass=atomic-expand]
+
 // GFX90A-CAS-LABEL: @atomic_cas
 // GFX90A-CAS: atomicrmw fadd ptr addrspace(1) {{.*}} syncscope("workgroup-one-as") monotonic
 // GFX90A-CAS: atomicrmw fadd ptr addrspace(1) {{.*}} syncscope("agent-one-as") monotonic

diff --git a/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl b/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl
index 80ad9b4df8f64f..7d684bc185a58d 100644
--- a/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl
@@ -27,9 +27,10 @@ typedef enum memory_scope {
 #endif
 } memory_scope;

-// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope workgroup-one-as due to an unsafe request. [-Rpass=si-lower]
-// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope agent-one-as due to an unsafe request. [-Rpass=si-lower]
 // GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope wavefront-one-as due to an unsafe request. [-Rpass=si-lower]
+// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope agent-one-as due to an unsafe request. [-Rpass=si-lower]
+// GFX90A-HW-REMARK: Hardware instruction generated for atomic fadd operation at memory scope workgroup-one-as due to an unsafe request. [-Rpass=si-lower]
+
 // GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc
 // GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc
 // GFX90A-HW-REMARK: global_atomic_add_f32 v0, v[0:1], v2, off glc

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
arsenm wrote:

> > The vector tests should still be added
>
> sorry. if i remove the change of the vector. i have to remove the testcase. because, for the current code convert between vector type of half and bfloat16, it has a bug. And it will be Assert "Invalid cast!""

OK, LGTM with the else before return fixed. Can you handle the vector case in a follow up?

https://github.com/llvm/llvm-project/pull/89051
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx1200 < %s | FileCheck --check-prefix=GCN %s
+; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx1200 < %s | FileCheck --check-prefix=GCN %s
+
+declare void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) %rsrc, i32 %offset, i32 %len)
+
+define amdgpu_ps void @buffer_prefetch_data_imm_offset_sgpr_len(ptr addrspace(8) inreg %rsrc, i32 inreg %len) {
+; GCN-LABEL: buffer_prefetch_data_imm_offset_sgpr_len:
+; GCN:       ; %bb.0: ; %entry
+; GCN-NEXT:    s_buffer_prefetch_data s[0:3], 0x80, s4, 0
+; GCN-NEXT:    s_endpgm
+entry:
+  tail call void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) inreg %rsrc, i32 128, i32 %len)
+  ret void
+}
+
+define amdgpu_ps void @buffer_prefetch_data_imm_offset_imm_len(ptr addrspace(8) inreg %rsrc) {
+; GCN-LABEL: buffer_prefetch_data_imm_offset_imm_len:
+; GCN:       ; %bb.0: ; %entry
+; GCN-NEXT:    s_buffer_prefetch_data s[0:3], 0x0, null, 31
+; GCN-NEXT:    s_endpgm
+entry:
+  tail call void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) inreg %rsrc, i32 0, i32 31)
+  ret void
+}
+
+define amdgpu_ps void @buffer_prefetch_data_imm_offset_vgpr_len(ptr addrspace(8) inreg %rsrc, i32 %len) {
+; GCN-LABEL: buffer_prefetch_data_imm_offset_vgpr_len:
+; GCN:       ; %bb.0: ; %entry
+; GCN-NEXT:    v_readfirstlane_b32 s4, v0
+; GCN-NEXT:    s_buffer_prefetch_data s[0:3], 0x80, s4, 0
+; GCN-NEXT:    s_endpgm
+entry:
+  tail call void @llvm.amdgcn.s.buffer.prefetch.data(ptr addrspace(8) inreg %rsrc, i32 128, i32 %len)
+  ret void
+}

arsenm wrote:

Test the behavior with VGPR rsrc

https://github.com/llvm/llvm-project/pull/107293
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
https://github.com/arsenm approved this pull request. I think the parent needs some revision for global/flat/infer handling https://github.com/llvm/llvm-project/pull/107293 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
     auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops);
     return SDValue(NewMI, 0);
   }
+  case Intrinsic::amdgcn_s_prefetch_data: {
+    // For non-global address space preserve the chain and remove the call.
+    if (!AMDGPU::isFlatGlobalAddrSpace(cast<MemSDNode>(Op)->getAddressSpace()))
+      return Op.getOperand(0);
+    return Op;

arsenm wrote:

Infer can just not do the change if the resulting address space isn't global

https://github.com/llvm/llvm-project/pull/107293
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (PR #107293)
@@ -9934,6 +9934,12 @@ SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
     auto NewMI = DAG.getMachineNode(Opc, DL, Op->getVTList(), Ops);
     return SDValue(NewMI, 0);
   }
+  case Intrinsic::amdgcn_s_prefetch_data: {
+    // For non-global address space preserve the chain and remove the call.
+    if (!AMDGPU::isFlatGlobalAddrSpace(cast<MemSDNode>(Op)->getAddressSpace()))
+      return Op.getOperand(0);
+    return Op;

arsenm wrote:

I'd expect the private/local cases to be an error. Also collectFlatAddressOperands should handle this, to get the flat->global optimization in InferAddressSpaces

https://github.com/llvm/llvm-project/pull/107293
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
https://github.com/arsenm commented: The vector tests should still be added https://github.com/llvm/llvm-project/pull/89051 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
@@ -1431,9 +1431,13 @@ Value *ScalarExprEmitter::EmitScalarCast(Value *Src, QualType SrcType,
     return Builder.CreateFPToUI(Src, DstTy, "conv");
   }

-  if (DstElementTy->getTypeID() < SrcElementTy->getTypeID())
+  if ((DstElementTy->is16bitFPTy() && SrcElementTy->is16bitFPTy())) {
+    Value *FloatVal = Builder.CreateFPExt(Src, Builder.getFloatTy(), "conv");
+    return Builder.CreateFPTrunc(FloatVal, DstTy, "conv");
+  } else if (DstElementTy->getTypeID() < SrcElementTy->getTypeID())

arsenm wrote:

Still needs to be resolved. Also should probably get a comment

https://github.com/llvm/llvm-project/pull/89051
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] fix half && bfloat16 convert node expr codegen (PR #89051)
arsenm wrote:

> ok, you mean, i remove the vector testcase for this patch. and just save the scalar testcase?

No, keep the tests. Only keep the scalar behavior change. The previous revision was essentially correct and minimal

https://github.com/llvm/llvm-project/pull/89051
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits