[Lldb-commits] [clang] [lld] [libcxx] [clang-tools-extra] [lldb] [mlir] [openmp] [libc] [llvm] [flang] [SLP]Add support for strided loads. (PR #80310)

2024-02-06 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80310

>From 92950afd39034c0184a3c807f8062e0053eead5c Mon Sep 17 00:00:00 2001
From: Alexey Bataev 
Date: Thu, 1 Feb 2024 17:22:34 +
Subject: [PATCH 1/2] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20in?=
 =?UTF-8?q?itial=20version?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.5
---
 .../llvm/Analysis/TargetTransformInfo.h   |  34 ++
 .../llvm/Analysis/TargetTransformInfoImpl.h   |  13 +
 llvm/lib/Analysis/TargetTransformInfo.cpp |  14 +
 .../Target/RISCV/RISCVTargetTransformInfo.cpp |  23 +
 .../Target/RISCV/RISCVTargetTransformInfo.h   |  23 +
 .../Transforms/Vectorize/SLPVectorizer.cpp| 397 --
 .../SLPVectorizer/RISCV/complex-loads.ll  | 132 +++---
 .../RISCV/strided-loads-vectorized.ll | 209 +
 .../strided-loads-with-external-use-ptr.ll|   4 +-
 .../SLPVectorizer/RISCV/strided-loads.ll  |  13 +-
 .../X86/gep-nodes-with-non-gep-inst.ll|   2 +-
 .../X86/remark_gather-load-redux-cost.ll  |   2 +-
 12 files changed, 478 insertions(+), 388 deletions(-)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h 
b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 3b615bc700bbb..b0b6dab03fa38 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -781,6 +781,9 @@ class TargetTransformInfo {
   /// Return true if the target supports masked expand load.
   bool isLegalMaskedExpandLoad(Type *DataType) const;
 
+  /// Return true if the target supports strided load.
+  bool isLegalStridedLoad(Type *DataType, Align Alignment) const;
+
   /// Return true if this is an alternating opcode pattern that can be lowered
   /// to a single instruction on the target. In X86 this is for the addsub
   /// instruction which corresponds to a Shuffle + Fadd + FSub pattern in IR.
@@ -1412,6 +1415,20 @@ class TargetTransformInfo {
   Align Alignment, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
   const Instruction *I = nullptr) const;
 
+  /// \return The cost of strided memory operations.
+  /// \p Opcode - the memory operation code (Load or Store)
+  /// \p DataTy - the vector type of the data to be loaded or stored
+  /// \p Ptr - the pointer (or vector of pointers) addressing memory
+  /// \p VariableMask - true when the memory access is predicated with a mask
+  ///   that is not a compile-time constant
+  /// \p Alignment - alignment of a single element
+  /// \p I - the optional original context instruction, if one exists, e.g. the
+  ///load/store to transform or the call to the gather/scatter intrinsic
+  InstructionCost getStridedMemoryOpCost(
+  unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
+  Align Alignment, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
+  const Instruction *I = nullptr) const;
+
   /// \return The cost of the interleaved memory operation.
   /// \p Opcode is the memory operation code
   /// \p VecTy is the vector type of the interleaved access.
@@ -1848,6 +1865,7 @@ class TargetTransformInfo::Concept {
Align Alignment) = 0;
   virtual bool isLegalMaskedCompressStore(Type *DataType) = 0;
   virtual bool isLegalMaskedExpandLoad(Type *DataType) = 0;
+  virtual bool isLegalStridedLoad(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0,
unsigned Opcode1,
const SmallBitVector &OpcodeMask) const = 0;
@@ -2023,6 +2041,11 @@ class TargetTransformInfo::Concept {
  bool VariableMask, Align Alignment,
  TTI::TargetCostKind CostKind,
  const Instruction *I = nullptr) = 0;
+  virtual InstructionCost
+  getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
+ bool VariableMask, Align Alignment,
+ TTI::TargetCostKind CostKind,
+ const Instruction *I = nullptr) = 0;
 
   virtual InstructionCost getInterleavedMemoryOpCost(
   unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
@@ -2341,6 +2364,9 @@ class TargetTransformInfo::Model final : public 
TargetTransformInfo::Concept {
   bool isLegalMaskedExpandLoad(Type *DataType) override {
 return Impl.isLegalMaskedExpandLoad(DataType);
   }
+  bool isLegalStridedLoad(Type *DataType, Align Alignment) override {
+return Impl.isLegalStridedLoad(DataType, Alignment);
+  }
   bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
const SmallBitVector &OpcodeMask) const override {
 return Impl.isLegalAltInstr(VecTy, Opcode0, Opcode1, OpcodeMask);
@@ -2671,6 +2697,14 @@ class TargetTransformInfo::Model final : public 
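As a reading aid for the two hooks introduced above, here is a minimal,
self-contained sketch of how a client pass might consult them. The two TTI
entry points are the ones declared in this patch; the surrounding decision
logic (comparing against a gather of the same shape) is only an assumed
illustration, not the actual SLP change.

// Sketch only: ties together isLegalStridedLoad() and
// getStridedMemoryOpCost() from this patch. The cost comparison against a
// gather is an assumption for illustration, not the SLP vectorizer's logic.
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

static bool preferStridedLoad(const TargetTransformInfo &TTI,
                              FixedVectorType *VecTy, Align Alignment,
                              const Value *Ptr, const Instruction *I) {
  // Legality first: can the target lower a strided load of this type?
  if (!TTI.isLegalStridedLoad(VecTy, Alignment))
    return false;
  // Then profitability: compare against a gather of the same width.
  InstructionCost Strided = TTI.getStridedMemoryOpCost(
      Instruction::Load, VecTy, Ptr, /*VariableMask=*/false, Alignment,
      TargetTransformInfo::TCK_RecipThroughput, I);
  InstructionCost Gather = TTI.getGatherScatterOpCost(
      Instruction::Load, VecTy, Ptr, /*VariableMask=*/false, Alignment,
      TargetTransformInfo::TCK_RecipThroughput, I);
  return Strided < Gather;
}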

[Lldb-commits] [libcxxabi] [lldb] [clang] [flang] [compiler-rt] [lld] [libc] [clang-tools-extra] [libcxx] [llvm] [SLP]Improve findReusedOrderedScalars and graph rotation. (PR #77529)

2024-02-05 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/77529

>From 7440ee8ba235fd871af0999f66d5d6130456400b Mon Sep 17 00:00:00 2001
From: Alexey Bataev 
Date: Tue, 9 Jan 2024 21:43:31 +
Subject: [PATCH] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20initia?=
 =?UTF-8?q?l=20version?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.5
---
 .../Transforms/Vectorize/SLPVectorizer.cpp| 476 ++
 .../AArch64/extractelements-to-shuffle.ll |  16 +-
 .../AArch64/reorder-fmuladd-crash.ll  |   7 +-
 .../SLPVectorizer/AArch64/tsc-s116.ll |  22 +-
 .../Transforms/SLPVectorizer/X86/pr35497.ll   |  16 +-
 .../SLPVectorizer/X86/reduction-transpose.ll  |  16 +-
 .../X86/reorder-clustered-node.ll |  11 +-
 .../X86/reorder-reused-masked-gather.ll   |   7 +-
 .../SLPVectorizer/X86/reorder-vf-to-resize.ll |   2 +-
 .../X86/scatter-vectorize-reorder.ll  |  17 +-
 .../X86/shrink_after_reorder2.ll  |  11 +-
 11 files changed, 445 insertions(+), 156 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp 
b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 8e22b54f002d1c..4765cef290b9df 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -858,7 +858,7 @@ static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask,
 /// values 3 and 7 respectively:
 /// before:  6 9 5 4 9 2 1 0
 /// after:   6 3 5 4 7 2 1 0
-static void fixupOrderingIndices(SmallVectorImpl<unsigned> &Order) {
+static void fixupOrderingIndices(MutableArrayRef<unsigned> Order) {
   const unsigned Sz = Order.size();
   SmallBitVector UnusedIndices(Sz, /*t=*/true);
   SmallBitVector MaskedIndices(Sz);
@@ -2418,7 +2418,8 @@ class BoUpSLP {
  std::optional<TargetTransformInfo::ShuffleKind>
  isGatherShuffledSingleRegisterEntry(
      const TreeEntry *TE, ArrayRef<Value *> VL, MutableArrayRef<int> Mask,
-      SmallVectorImpl<const TreeEntry *> &Entries, unsigned Part);
+      SmallVectorImpl<const TreeEntry *> &Entries, unsigned Part,
+      bool ForOrder);
 
   /// Checks if the gathered \p VL can be represented as multi-register
   /// shuffle(s) of previous tree entries.
@@ -2432,7 +2433,7 @@ class BoUpSLP {
  isGatherShuffledEntry(
      const TreeEntry *TE, ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask,
      SmallVectorImpl<SmallVector<const TreeEntry *>> &Entries,
-      unsigned NumParts);
+      unsigned NumParts, bool ForOrder = false);
 
   /// \returns the scalarization cost for this list of values. Assuming that
   /// this subtree gets vectorized, we may need to extract the values from the
@@ -3756,65 +3757,169 @@ static void reorderOrder(SmallVectorImpl<unsigned> &Order, ArrayRef<int> Mask) {
 std::optional<BoUpSLP::OrdersType>
 BoUpSLP::findReusedOrderedScalars(const BoUpSLP::TreeEntry &TE) {
   assert(TE.State == TreeEntry::NeedToGather && "Expected gather node only.");
-  unsigned NumScalars = TE.Scalars.size();
+  // Try to find subvector extract/insert patterns and reorder only such
+  // patterns.
+  SmallVector<Value *> GatheredScalars(TE.Scalars.begin(), TE.Scalars.end());
+  Type *ScalarTy = GatheredScalars.front()->getType();
+  int NumScalars = GatheredScalars.size();
+  if (!isValidElementType(ScalarTy))
+return std::nullopt;
+  auto *VecTy = FixedVectorType::get(ScalarTy, NumScalars);
+  int NumParts = TTI->getNumberOfParts(VecTy);
+  if (NumParts == 0 || NumParts >= NumScalars)
+NumParts = 1;
+  SmallVector<int> ExtractMask;
+  SmallVector<int> Mask;
+  SmallVector<SmallVector<const TreeEntry *>> Entries;
+  SmallVector<std::optional<TTI::ShuffleKind>> ExtractShuffles =
+      tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);
+  SmallVector<std::optional<TTI::ShuffleKind>> GatherShuffles =
+      isGatherShuffledEntry(&TE, GatheredScalars, Mask, Entries, NumParts,
+                            /*ForOrder=*/true);
+  // No shuffled operands - ignore.
+  if (GatherShuffles.empty() && ExtractShuffles.empty())
+return std::nullopt;
   OrdersType CurrentOrder(NumScalars, NumScalars);
-  SmallVector<int> Positions;
-  SmallBitVector UsedPositions(NumScalars);
-  const TreeEntry *STE = nullptr;
-  // Try to find all gathered scalars that are vectorized in another
-  // vectorize node. Here we can have only one single tree vector node to
-  // correctly identify order of the gathered scalars.
-  for (unsigned I = 0; I < NumScalars; ++I) {
-Value *V = TE.Scalars[I];
-if (!isa(V))
-  continue;
-if (const auto *LocalSTE = getTreeEntry(V)) {
-  if (!STE)
-STE = LocalSTE;
-  else if (STE != LocalSTE)
-// Take the order only from the single vector node.
-return std::nullopt;
-  unsigned Lane =
-  std::distance(STE->Scalars.begin(), find(STE->Scalars, V));
-  if (Lane >= NumScalars)
-return std::nullopt;
-  if (CurrentOrder[Lane] != NumScalars) {
-if (Lane != I)
+  if (GatherShuffles.size() == 1 &&
+  *GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&
+  Entries.front().front()->isSame(TE.Scalars)) {
+// Exclude nodes for strided geps from analysis, better to reorder them.
+if 
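For reference, the fixupOrderingIndices() behavior documented in the first
hunk (before: 6 9 5 4 9 2 1 0, after: 6 3 5 4 7 2 1 0) can be reproduced with
the standalone sketch below. It mirrors the documented effect (out-of-range
and repeated entries are rewritten, left to right, with the indices that do
not occur yet) but is not the LLVM implementation itself.

#include <cstdio>
#include <vector>

static void fixupOrderingIndicesSketch(std::vector<unsigned> &Order) {
  const unsigned Sz = Order.size();
  std::vector<bool> Seen(Sz, false);
  std::vector<bool> Keep(Sz, false);
  // First pass: keep the first occurrence of each in-range index.
  for (unsigned I = 0; I < Sz; ++I)
    if (Order[I] < Sz && !Seen[Order[I]]) {
      Seen[Order[I]] = true;
      Keep[I] = true;
    }
  // Second pass: rewrite every other slot with the next unused index.
  unsigned Next = 0;
  for (unsigned I = 0; I < Sz; ++I) {
    if (Keep[I])
      continue;
    while (Seen[Next])
      ++Next;
    Order[I] = Next;
    Seen[Next] = true;
  }
}

int main() {
  std::vector<unsigned> Order = {6, 9, 5, 4, 9, 2, 1, 0};
  fixupOrderingIndicesSketch(Order);
  for (unsigned V : Order)
    std::printf("%u ", V); // prints: 6 3 5 4 7 2 1 0
}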

[Lldb-commits] [lldb] [clang] [clang-tools-extra] [libcxxabi] [libc] [flang] [llvm] [lld] [libcxx] [compiler-rt] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-05 Thread Alexey Bataev via lldb-commits


@@ -326,6 +326,48 @@ InstructionCost 
RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 switch (Kind) {
 default:
   break;
+case TTI::SK_ExtractSubvector:
+  if (isa<FixedVectorType>(SubTp) &&
+  LT.second.getVectorElementType() != MVT::i1) {
+unsigned TpRegs = getRegUsageForType(Tp);
+unsigned NumElems =
+divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+// Whole vector extract - just the vector itself + (possible) vsetvli.
+// TODO: consider adding the cost for vsetvli.
+if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&
+   NumElems * LT.second.getScalarSizeInBits() ==
+   ST->getRealMinVLen() &&
+   Index % NumElems == 0))

alexey-bataev wrote:

Added a couple, hope it works
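For readers of the quoted hunk: the Index % NumElems == 0 test is a
register-boundary check. Below is a standalone sketch with assumed numbers (a
fixed vector split across two vector registers), not the RISCV TTI code.

#include <cassert>

// NumSrcElems / SrcRegs mirrors divideCeil(ElementCount, getRegUsageForType).
static bool startsAtRegisterBoundary(unsigned NumSrcElems, unsigned SrcRegs,
                                     unsigned Index) {
  unsigned ElemsPerReg = (NumSrcElems + SrcRegs - 1) / SrcRegs;
  return Index % ElemsPerReg == 0;
}

int main() {
  // A <16 x i32> source occupying 2 registers gives 8 elements per register.
  assert(startsAtRegisterBoundary(16, 2, 0));  // whole-register extract
  assert(startsAtRegisterBoundary(16, 2, 8));  // starts at second register
  assert(!startsAtRegisterBoundary(16, 2, 4)); // mid-register slice
}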

https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [lldb] [clang] [clang-tools-extra] [libcxxabi] [libc] [flang] [llvm] [lld] [libcxx] [compiler-rt] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-05 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80164

>From cfd0dcfa1f5fabd12cf4d7bf8d5a10bd324ace0a Mon Sep 17 00:00:00 2001
From: Alexey Bataev 
Date: Wed, 31 Jan 2024 16:47:49 +
Subject: [PATCH] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20initia?=
 =?UTF-8?q?l=20version?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.5
---
 .../Target/RISCV/RISCVTargetTransformInfo.cpp |  42 +
 .../RISCV/shuffle-extract_subvector.ll| 174 +-
 .../RISCV/shuffle-insert_subvector.ll |  42 ++---
 .../CostModel/RISCV/shuffle-interleave.ll |   6 +-
 4 files changed, 153 insertions(+), 111 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp 
b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index fe1cdb2dfa423..465a05b6497a2 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -326,6 +326,48 @@ InstructionCost 
RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 switch (Kind) {
 default:
   break;
+case TTI::SK_ExtractSubvector:
+  if (isa<FixedVectorType>(SubTp)) {
+unsigned TpRegs = getRegUsageForType(Tp);
+unsigned NumElems =
+divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+// Whole vector extract - just the vector itself + (possible) vsetvli.
+// TODO: consider adding the cost for vsetvli.
+if (Index % NumElems == 0) {
+  std::pair<InstructionCost, MVT> SubLT =
+      getTypeLegalizationCost(SubTp);
+  return Index == 0
+ ? TTI::TCC_Free
+ : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+ SubLT.second,
+ CostKind);
+}
+  }
+  break;
+case TTI::SK_InsertSubvector:
+  if (auto *FSubTy = dyn_cast<FixedVectorType>(SubTp)) {
+unsigned TpRegs = getRegUsageForType(Tp);
+unsigned SubTpRegs = getRegUsageForType(SubTp);
+unsigned NextSubTpRegs = getRegUsageForType(FixedVectorType::get(
+Tp->getElementType(), FSubTy->getNumElements() + 1));
+unsigned NumElems =
+divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+// Whole vector insert - just the vector itself + (possible) vsetvli.
+// TODO: consider adding the cost for vsetvli.
+if (Index % NumElems == 0 &&
+(any_of(Args, UndefValue::classof) ||
+ (SubTpRegs != 0 && SubTpRegs != NextSubTpRegs &&
+  TpRegs / SubTpRegs > 1))) {
+  std::pair<InstructionCost, MVT> SubLT =
+      getTypeLegalizationCost(SubTp);
+  return Index == 0
+ ? TTI::TCC_Free
+ : SubLT.first * getRISCVInstructionCost(RISCV::VMV_V_V,
+ SubLT.second,
+ CostKind);
+}
+  }
+  break;
 case TTI::SK_PermuteSingleSrc: {
   if (Mask.size() >= 2 && LT.second.isFixedLengthVector()) {
 MVT EltTp = LT.second.getVectorElementType();
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll 
b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
index 76cb1955a2b37..901d66e1124d8 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-extract_subvector.ll
@@ -9,15 +9,15 @@
 
 define void @test_vXf64(<4 x double> %src256, <8 x double> %src512) {
 ; CHECK-LABEL: 'test_vXf64'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%V256_01 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%V256_23 = shufflevector <4 x double> %src256, <4 x double> undef, <2 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: 
%V512_01 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: 
%V512_23 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: 
%V512_45 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: 
%V512_67 = shufflevector <8 x double> %src512, <8 x double> undef, <2 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: 
%V512_0123 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> 

-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: 
%V512_2345 = shufflevector <8 x double> %src512, <8 x double> undef, <4 x i32> 

-; CHECK-NEXT:  Cost Model: Found an 
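A back-of-envelope reading of the cost returned by the quoted
SK_ExtractSubvector/SK_InsertSubvector cases, with assumed numbers
(SubLT.first legalized registers for the subvector, one vmv.v.v per
register). Sketch only, not the RISCV TTI implementation.

#include <cstdio>

static unsigned wholeRegExtractCost(unsigned Index, unsigned SubRegCount,
                                    unsigned VmvCost /*per register*/) {
  // Index == 0 is modeled as free (TCC_Free): the value is already in place.
  // Otherwise: one whole-register move per legalized register.
  return Index == 0 ? 0 : SubRegCount * VmvCost;
}

int main() {
  std::printf("%u\n", wholeRegExtractCost(/*Index=*/0, 2, 1)); // 0 (TCC_Free)
  std::printf("%u\n", wholeRegExtractCost(/*Index=*/8, 2, 1)); // 2
}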

[Lldb-commits] [lldb] [libcxx] [clang] [libc] [llvm] [libcxxabi] [flang] [lld] [compiler-rt] [clang-tools-extra] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-05 Thread Alexey Bataev via lldb-commits


@@ -326,6 +326,48 @@ InstructionCost 
RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 switch (Kind) {
 default:
   break;
+case TTI::SK_ExtractSubvector:
+  if (isa<FixedVectorType>(SubTp) &&
+  LT.second.getVectorElementType() != MVT::i1) {
+unsigned TpRegs = getRegUsageForType(Tp);
+unsigned NumElems =
+divideCeil(Tp->getElementCount().getFixedValue(), TpRegs);
+// Whole vector extract - just the vector itself + (possible) vsetvli.
+// TODO: consider adding the cost for vsetvli.
+if (Index == 0 || (ST->getRealMaxVLen() == ST->getRealMinVLen() &&
+   NumElems * LT.second.getScalarSizeInBits() ==

alexey-bataev wrote:

Fixed, thanks!

https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [compiler-rt] [libc] [libcxx] [libcxxabi] [llvm] [clang-tools-extra] [lld] [lldb] [clang] [flang] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-02 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -3930,30 +4065,68 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
   std::optional<int> Diff =
       getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
   // Check that the sorted loads are consecutive.
-  if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+  if (static_cast<unsigned>(*Diff) == Sz - 1)
     return LoadsState::Vectorize;
   // Simple check if not a strided access - clear order.
-  IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+  bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+  // Try to generate strided load node if:
+  // 1. Target with strided load support is detected.
+  // 2. The number of loads is greater than MinProfitableStridedLoads,
+  // or the potential stride <= MaxProfitableLoadStride and the
+  // potential stride is power-of-2 (to avoid perf regressions for the very
+  // small number of loads) and max distance > number of loads, or potential
+  // stride is -1.
+  // 3. The loads are ordered, or number of unordered loads <=
+  // MaxProfitableUnorderedLoads, or loads are in reversed order.
+  // (this check is to avoid extra costs for very expensive shuffles).
+  if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+                              (static_cast<unsigned>(std::abs(*Diff)) <=
+                                   MaxProfitableLoadStride * Sz &&
+                               isPowerOf2_32(std::abs(*Diff)))) &&
+                             static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+                            *Diff == -(static_cast<int>(Sz) - 1))) {
+    int Stride = *Diff / static_cast<int>(Sz - 1);

alexey-bataev wrote:

This is the stride in "scalar elements"; the size in bytes is not important
here, since getPointersDiff() handles pointers with different types (sizes).
We are just checking that the pointers have proportional constant distances
between them.
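To make the element-vs-byte point concrete, a tiny standalone example with
hypothetical numbers: four loads whose pointers sort to element offsets
0, 4, 8 and 12 give Diff = 12 and therefore a stride of 4 elements,
regardless of the element size. Not the SLP code itself.

#include <cassert>

int main() {
  int Diff = 12;   // distance in elements between first and last load
  unsigned Sz = 4; // number of loads
  int Stride = Diff / static_cast<int>(Sz - 1);
  assert(Stride == 4);
  // For float elements this is a 16-byte stride; for i64, a 32-byte stride.
  assert(Diff == Stride * static_cast<int>(Sz - 1)); // divides exactly
}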

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -3930,30 +4065,68 @@ static LoadsState canVectorizeLoads(ArrayRef<Value *> VL, const Value *VL0,
   std::optional<int> Diff =
       getPointersDiff(ScalarTy, Ptr0, ScalarTy, PtrN, DL, SE);
   // Check that the sorted loads are consecutive.
-  if (static_cast<unsigned>(*Diff) == VL.size() - 1)
+  if (static_cast<unsigned>(*Diff) == Sz - 1)
     return LoadsState::Vectorize;
   // Simple check if not a strided access - clear order.
-  IsPossibleStrided = *Diff % (VL.size() - 1) == 0;
+  bool IsPossibleStrided = *Diff % (Sz - 1) == 0;
+  // Try to generate strided load node if:
+  // 1. Target with strided load support is detected.
+  // 2. The number of loads is greater than MinProfitableStridedLoads,
+  // or the potential stride <= MaxProfitableLoadStride and the
+  // potential stride is power-of-2 (to avoid perf regressions for the very
+  // small number of loads) and max distance > number of loads, or potential
+  // stride is -1.
+  // 3. The loads are ordered, or number of unordered loads <=
+  // MaxProfitableUnorderedLoads, or loads are in reversed order.
+  // (this check is to avoid extra costs for very expensive shuffles).
+  if (IsPossibleStrided && (((Sz > MinProfitableStridedLoads ||
+                              (static_cast<unsigned>(std::abs(*Diff)) <=
+                                   MaxProfitableLoadStride * Sz &&
+                               isPowerOf2_32(std::abs(*Diff)))) &&
+                             static_cast<unsigned>(std::abs(*Diff)) > Sz) ||
+                            *Diff == -(static_cast<int>(Sz) - 1))) {
+    int Stride = *Diff / static_cast<int>(Sz - 1);
+    if (*Diff == Stride * static_cast<int>(Sz - 1)) {
+      if (TTI.isTypeLegal(VecTy) &&

alexey-bataev wrote:

Removed

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -3878,6 +3883,130 @@ static Align computeCommonAlignment(ArrayRef<Value *> VL) {
   return CommonAlignment;
 }
 
+/// Check if \p Order represents reverse order.
+static bool isReverseOrder(ArrayRef<unsigned> Order) {
+  unsigned Sz = Order.size();
+  return !Order.empty() && all_of(enumerate(Order), [&](const auto &Pair) {
+    return Pair.value() == Sz || Sz - Pair.index() - 1 == Pair.value();
+  });
+}
+
+/// Checks if the provided list of pointers \p Pointers represents the strided
+/// pointers for type ElemTy. If they are not, std::nullopt is returned.
+/// Otherwise, if \p Inst is not specified, just initialized optional value is
+/// returned to show that the pointers represent strided pointers. If \p Inst
+/// specified, the runtime stride is materialized before the given \p Inst.
+/// \returns std::nullopt if the pointers are not pointers with the runtime
+/// stride, nullptr or actual stride value, otherwise.
+static std::optional<Value *>
+calculateRtStride(ArrayRef<Value *> PointerOps, Type *ElemTy,
+                  const DataLayout &DL, ScalarEvolution &SE,
+                  SmallVectorImpl<unsigned> &SortedIndices,
+                  Instruction *Inst = nullptr) {
+  SmallVector<const SCEV *> SCEVs;

alexey-bataev wrote:

Constant strides are covered separately; this one checks for non-constant
strides. It does not care about the input order, since it sorts the pointers
properly.
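A conceptual model of that sorting, with SCEV expressions replaced by plain
integers: each pointer's offset from the base is Coeff[i] * Stride for one
runtime Stride, and the pointers qualify exactly when the coefficients are a
permutation of 0..n-1. Illustration only; the real code reasons about SCEVs.

#include <cstddef>
#include <optional>
#include <vector>

static std::optional<std::vector<unsigned>>
sortByRtStrideCoeffs(const std::vector<long> &Coeffs) {
  size_t N = Coeffs.size();
  std::vector<unsigned> Order(N, static_cast<unsigned>(N));
  for (size_t I = 0; I < N; ++I) {
    long C = Coeffs[I];
    // Reject coefficients outside 0..N-1 or used twice: not a simple
    // runtime-strided sequence.
    if (C < 0 || C >= static_cast<long>(N) || Order[C] != N)
      return std::nullopt;
    Order[C] = static_cast<unsigned>(I); // pointer I is C-th in stride order
  }
  return Order;
}

For coefficients {2, 0, 1} this returns {1, 2, 0}: entry k names the pointer
that is k-th in stride order.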

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -397,27 +241,12 @@ define void @test3([48 x float]* %p, float* noalias %s) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P:%.*]], i64 0, i64 0
 ; CHECK-NEXT:[[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr 
[[S:%.*]], i64 0
-; CHECK-NEXT:[[ARRAYIDX4:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 4
-; CHECK-NEXT:[[ARRAYIDX11:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 8
-; CHECK-NEXT:[[ARRAYIDX18:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 12
-; CHECK-NEXT:[[ARRAYIDX25:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 16
-; CHECK-NEXT:[[ARRAYIDX32:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 20
-; CHECK-NEXT:[[ARRAYIDX39:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 24
-; CHECK-NEXT:[[ARRAYIDX46:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 28
 ; CHECK-NEXT:[[ARRAYIDX48:%.*]] = getelementptr inbounds [48 x float], ptr 
[[P]], i64 0, i64 23
-; CHECK-NEXT:[[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr 
[[ARRAYIDX]], i32 0
-; CHECK-NEXT:[[TMP1:%.*]] = insertelement <8 x ptr> [[TMP0]], ptr 
[[ARRAYIDX4]], i32 1
-; CHECK-NEXT:[[TMP2:%.*]] = insertelement <8 x ptr> [[TMP1]], ptr 
[[ARRAYIDX11]], i32 2
-; CHECK-NEXT:[[TMP3:%.*]] = insertelement <8 x ptr> [[TMP2]], ptr 
[[ARRAYIDX18]], i32 3
-; CHECK-NEXT:[[TMP4:%.*]] = insertelement <8 x ptr> [[TMP3]], ptr 
[[ARRAYIDX25]], i32 4
-; CHECK-NEXT:[[TMP5:%.*]] = insertelement <8 x ptr> [[TMP4]], ptr 
[[ARRAYIDX32]], i32 5
-; CHECK-NEXT:[[TMP6:%.*]] = insertelement <8 x ptr> [[TMP5]], ptr 
[[ARRAYIDX39]], i32 6
-; CHECK-NEXT:[[TMP7:%.*]] = insertelement <8 x ptr> [[TMP6]], ptr 
[[ARRAYIDX46]], i32 7
-; CHECK-NEXT:[[TMP8:%.*]] = call <8 x float> 
@llvm.masked.gather.v8f32.v8p0(<8 x ptr> [[TMP7]], i32 4, <8 x i1> , <8 x float> poison)
-; CHECK-NEXT:[[TMP9:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
-; CHECK-NEXT:[[TMP10:%.*]] = shufflevector <8 x float> [[TMP9]], <8 x 
float> poison, <8 x i32> 
-; CHECK-NEXT:[[TMP11:%.*]] = fsub fast <8 x float> [[TMP10]], [[TMP8]]
-; CHECK-NEXT:store <8 x float> [[TMP11]], ptr [[ARRAYIDX2]], align 4
+; CHECK-NEXT:[[TMP0:%.*]] = call <8 x float> 
@llvm.experimental.vp.strided.load.v8f32.p0.i64(ptr align 4 [[ARRAYIDX]], i64 
16, <8 x i1> , i32 8)
+; CHECK-NEXT:[[TMP1:%.*]] = load <8 x float>, ptr [[ARRAYIDX48]], align 4
+; CHECK-NEXT:[[TMP2:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x 
float> poison, <8 x i32> 

alexey-bataev wrote:

It can; that is planned for the next patch(es). Cannot put all the stuff in a
single patch.
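For readers unfamiliar with the intrinsic: below is a scalar model of the
@llvm.experimental.vp.strided.load call in the new CHECK lines, where the
second operand (16) is a stride in bytes, so 8 floats are read from every 4th
float slot, matching the original GEPs at element offsets 0, 4, ..., 28.
Sketch of the semantics only, not LLVM code.

#include <cstring>

static void stridedLoadF32(const char *Base, long StrideBytes,
                           const bool *Mask, int EVL, float *Out) {
  for (int I = 0; I < EVL; ++I)
    if (Mask[I]) // lanes disabled by the mask are unspecified in the result
      std::memcpy(&Out[I], Base + I * StrideBytes, sizeof(float));
}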

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -7,7 +7,7 @@ define i32 @test(ptr noalias %p, ptr noalias %addr) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[TMP0:%.*]] = insertelement <8 x ptr> poison, ptr 
[[ADDR:%.*]], i32 0
 ; CHECK-NEXT:[[TMP1:%.*]] = shufflevector <8 x ptr> [[TMP0]], <8 x ptr> 
poison, <8 x i32> zeroinitializer
-; CHECK-NEXT:[[TMP2:%.*]] = getelementptr i32, <8 x ptr> [[TMP1]], <8 x 
i32> 
+; CHECK-NEXT:[[TMP2:%.*]] = getelementptr i32, <8 x ptr> [[TMP1]], <8 x 
i32> 

alexey-bataev wrote:

Same, TTI for X86 does not support strided loads, so the order is not important

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -30,7 +30,7 @@ define void @test() {
 ; CHECK-SLP-THRESHOLD:   bb:
 ; CHECK-SLP-THRESHOLD-NEXT:[[TMP0:%.*]] = insertelement <4 x ptr> poison, 
ptr [[COND_IN_V]], i32 0
 ; CHECK-SLP-THRESHOLD-NEXT:[[TMP1:%.*]] = shufflevector <4 x ptr> 
[[TMP0]], <4 x ptr> poison, <4 x i32> zeroinitializer
-; CHECK-SLP-THRESHOLD-NEXT:[[TMP2:%.*]] = getelementptr i64, <4 x ptr> 
[[TMP1]], <4 x i64> 
+; CHECK-SLP-THRESHOLD-NEXT:[[TMP2:%.*]] = getelementptr i64, <4 x ptr> 
[[TMP1]], <4 x i64> 

alexey-bataev wrote:

For the X86 target it is currently treated as unsupported, so it just produces
a masked gather and the order is not important.

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [libcxx] [lldb] [clang] [lld] [libc] [llvm] [mlir] [flang] [openmp] [clang-tools-extra] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits


@@ -17,7 +17,7 @@ define i16 @test() {
 ; CHECK-NEXT:[[TMP4:%.*]] = call <2 x i16> 
@llvm.masked.gather.v2i16.v2p0(<2 x ptr> [[TMP3]], i32 2, <2 x i1> , <2 x i16> poison)
 ; CHECK-NEXT:[[TMP5:%.*]] = extractelement <2 x i16> [[TMP4]], i32 0
 ; CHECK-NEXT:[[TMP6:%.*]] = extractelement <2 x i16> [[TMP4]], i32 1
-; CHECK-NEXT:[[CMP_I178:%.*]] = icmp ult i16 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:[[CMP_I178:%.*]] = icmp ult i16 [[TMP5]], [[TMP6]]
 ; CHECK-NEXT:br label [[WHILE_BODY_I]]
 ;
 entry:

alexey-bataev wrote:

For the SLP vectorizer it is not important.
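A toy model of why the swapped icmp operands can be tolerated here: assuming
the reordering also swapped which gathered lane holds which load (the
defining lines are outside this quote), the predicate still compares the same
two memory values. Illustration under that assumption only.

#include <cassert>

int main() {
  short A = 1, B = 2; // the two loaded scalars
  // Before: lanes = {A, B}, compare lane1 < lane0  ->  B < A
  short Before[2] = {A, B};
  bool CmpBefore = Before[1] < Before[0];
  // After reordering: lanes = {B, A}, compare lane0 < lane1  ->  B < A
  short After[2] = {B, A};
  bool CmpAfter = After[0] < After[1];
  assert(CmpBefore == CmpAfter); // same predicate on the same values
}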

https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [clang] [flang] [openmp] [libcxx] [clang-tools-extra] [mlir] [lldb] [llvm] [lld] [libc] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev edited 
https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [llvm] [libcxx] [libc] [clang-tools-extra] [mlir] [lld] [lldb] [openmp] [clang] [flang] [SLP]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev edited 
https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [llvm] [libcxx] [libc] [clang-tools-extra] [mlir] [lld] [lldb] [openmp] [clang] [flang] [SLP][TTI]Add support for strided loads. (PR #80310)

2024-02-01 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80310


[Lldb-commits] [clang-tools-extra] [libcxx] [lldb] [lld] [llvm] [compiler-rt] [libc] [libcxxabi] [flang] [clang] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-02-01 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [lldb] [clang-tools-extra] [libcxx] [lld] [libcxxabi] [clang] [libc] [llvm] [compiler-rt] [flang] [SLP]Improve findReusedOrderedScalars and graph rotation. (PR #77529)

2024-02-01 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/77529


[Lldb-commits] [lld] [llvm] [libc] [lldb] [compiler-rt] [clang-tools-extra] [flang] [libcxx] [clang] [TTI][RISCV]Improve costs for fixed vector whole reg extract/insert. (PR #80164)

2024-01-31 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/80164


[Lldb-commits] [llvm] [clang-tools-extra] [flang] [libc] [mlir] [lldb] [libcxx] [compiler-rt] [clang] [OpenMP] atomic compare weak : Parser & AST support (PR #79475)

2024-01-31 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev closed 
https://github.com/llvm/llvm-project/pull/79475


[Lldb-commits] [lldb] [clang] [llvm] [libc] [compiler-rt] [lld] [libcxx] [clang-tools-extra] [libcxxabi] [flang] [SLP]Improve findReusedOrderedScalars and graph rotation. (PR #77529)

2024-01-30 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev updated 
https://github.com/llvm/llvm-project/pull/77529

>From 7440ee8ba235fd871af0999f66d5d6130456400b Mon Sep 17 00:00:00 2001
From: Alexey Bataev 
Date: Tue, 9 Jan 2024 21:43:31 +
Subject: [PATCH] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20initia?=
 =?UTF-8?q?l=20version?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.5
---
 .../Transforms/Vectorize/SLPVectorizer.cpp| 476 ++
 .../AArch64/extractelements-to-shuffle.ll |  16 +-
 .../AArch64/reorder-fmuladd-crash.ll  |   7 +-
 .../SLPVectorizer/AArch64/tsc-s116.ll |  22 +-
 .../Transforms/SLPVectorizer/X86/pr35497.ll   |  16 +-
 .../SLPVectorizer/X86/reduction-transpose.ll  |  16 +-
 .../X86/reorder-clustered-node.ll |  11 +-
 .../X86/reorder-reused-masked-gather.ll   |   7 +-
 .../SLPVectorizer/X86/reorder-vf-to-resize.ll |   2 +-
 .../X86/scatter-vectorize-reorder.ll  |  17 +-
 .../X86/shrink_after_reorder2.ll  |  11 +-
 11 files changed, 445 insertions(+), 156 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 8e22b54f002d1..4765cef290b9d 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -858,7 +858,7 @@ static void addMask(SmallVectorImpl<int> &Mask, ArrayRef<int> SubMask,
 /// values 3 and 7 respectively:
 /// before:  6 9 5 4 9 2 1 0
 /// after:   6 3 5 4 7 2 1 0
-static void fixupOrderingIndices(SmallVectorImpl<unsigned> &Order) {
+static void fixupOrderingIndices(MutableArrayRef<unsigned> Order) {
   const unsigned Sz = Order.size();
   SmallBitVector UnusedIndices(Sz, /*t=*/true);
   SmallBitVector MaskedIndices(Sz);
@@ -2418,7 +2418,8 @@ class BoUpSLP {
   std::optional<TargetTransformInfo::ShuffleKind>
   isGatherShuffledSingleRegisterEntry(
       const TreeEntry *TE, ArrayRef<Value *> VL, MutableArrayRef<int> Mask,
-      SmallVectorImpl<const TreeEntry *> &Entries, unsigned Part);
+      SmallVectorImpl<const TreeEntry *> &Entries, unsigned Part,
+      bool ForOrder);
 
   /// Checks if the gathered \p VL can be represented as multi-register
   /// shuffle(s) of previous tree entries.
@@ -2432,7 +2433,7 @@ class BoUpSLP {
   isGatherShuffledEntry(
       const TreeEntry *TE, ArrayRef<Value *> VL, SmallVectorImpl<int> &Mask,
       SmallVectorImpl<SmallVector<const TreeEntry *>> &Entries,
-      unsigned NumParts);
+      unsigned NumParts, bool ForOrder = false);
 
   /// \returns the scalarization cost for this list of values. Assuming that
   /// this subtree gets vectorized, we may need to extract the values from the
@@ -3756,65 +3757,169 @@ static void reorderOrder(SmallVectorImpl<unsigned> &Order, ArrayRef<int> Mask) {
 std::optional<BoUpSLP::OrdersType>
 BoUpSLP::findReusedOrderedScalars(const BoUpSLP::TreeEntry &TE) {
   assert(TE.State == TreeEntry::NeedToGather && "Expected gather node only.");
-  unsigned NumScalars = TE.Scalars.size();
+  // Try to find subvector extract/insert patterns and reorder only such
+  // patterns.
+  SmallVector<Value *> GatheredScalars(TE.Scalars.begin(), TE.Scalars.end());
+  Type *ScalarTy = GatheredScalars.front()->getType();
+  int NumScalars = GatheredScalars.size();
+  if (!isValidElementType(ScalarTy))
+    return std::nullopt;
+  auto *VecTy = FixedVectorType::get(ScalarTy, NumScalars);
+  int NumParts = TTI->getNumberOfParts(VecTy);
+  if (NumParts == 0 || NumParts >= NumScalars)
+    NumParts = 1;
+  SmallVector<int> ExtractMask;
+  SmallVector<int> Mask;
+  SmallVector<SmallVector<const TreeEntry *>> Entries;
+  SmallVector<std::optional<TTI::ShuffleKind>> ExtractShuffles =
+      tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);
+  SmallVector<std::optional<TTI::ShuffleKind>> GatherShuffles =
+      isGatherShuffledEntry(&TE, GatheredScalars, Mask, Entries, NumParts,
+                            /*ForOrder=*/true);
+  // No shuffled operands - ignore.
+  if (GatherShuffles.empty() && ExtractShuffles.empty())
+    return std::nullopt;
   OrdersType CurrentOrder(NumScalars, NumScalars);
-  SmallVector<int> Positions;
-  SmallBitVector UsedPositions(NumScalars);
-  const TreeEntry *STE = nullptr;
-  // Try to find all gathered scalars that get vectorized in other
-  // vectorized nodes. Here we can have only one single tree vector node to
-  // correctly identify order of the gathered scalars.
-  for (unsigned I = 0; I < NumScalars; ++I) {
-    Value *V = TE.Scalars[I];
-    if (!isa<Instruction>(V))
-      continue;
-    if (const auto *LocalSTE = getTreeEntry(V)) {
-      if (!STE)
-        STE = LocalSTE;
-      else if (STE != LocalSTE)
-        // Take the order only from the single vector node.
-        return std::nullopt;
-      unsigned Lane =
-          std::distance(STE->Scalars.begin(), find(STE->Scalars, V));
-      if (Lane >= NumScalars)
-        return std::nullopt;
-      if (CurrentOrder[Lane] != NumScalars) {
-        if (Lane != I)
+  if (GatherShuffles.size() == 1 &&
+      *GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&
+      Entries.front().front()->isSame(TE.Scalars)) {
+    // Exclude nodes for strided geps from analysis, better to reorder them.
+    if
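The bookkeeping being replaced above mapped each gathered scalar to its lane in the single vectorized node it came from. A heavily hedged sketch of that idea (names and the simplified mapping are illustrative, not the exact SLP implementation):

```cpp
#include <vector>

// Lanes[I] is the lane of gathered scalar I inside the one vectorized tree
// entry it was found in, or NumScalars if it was not found. The result maps
// lane -> gather position, with NumScalars meaning "unset", mirroring how
// CurrentOrder is seeded in the code above.
std::vector<unsigned> buildCurrentOrder(const std::vector<unsigned> &Lanes,
                                        unsigned NumScalars) {
  std::vector<unsigned> CurrentOrder(NumScalars, NumScalars);
  for (unsigned I = 0; I < Lanes.size(); ++I)
    if (Lanes[I] < NumScalars)
      CurrentOrder[Lanes[I]] = I;
  return CurrentOrder;
}
```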

[Lldb-commits] [clang] [flang] [mlir] [libc] [llvm] [clang-tools-extra] [compiler-rt] [lldb] [OpenMP] atomic compare weak : Parser & AST support (PR #79475)

2024-01-29 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev approved this pull request.

LG

https://github.com/llvm/llvm-project/pull/79475


[Lldb-commits] [libc] [clang-tools-extra] [openmp] [flang] [compiler-rt] [polly] [llvm] [libcxxabi] [mlir] [clang] [lldb] [libcxx] [OpenMP] Patch for Support to loop bind clause : Checking Parent Regi

2024-01-08 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev approved this pull request.

LG

https://github.com/llvm/llvm-project/pull/76938


[Lldb-commits] [libcxxabi] [libc] [flang] [llvm] [lldb] [mlir] [compiler-rt] [clang-tools-extra] [libcxx] [openmp] [clang] [OpenMP] Patch for Support to loop bind clause : Checking Parent Region (PR #

2024-01-08 Thread Alexey Bataev via lldb-commits


@@ -6124,35 +6136,39 @@ processImplicitMapsWithDefaultMappers(Sema &S, DSAStackTy *Stack,
 
 bool Sema::mapLoopConstruct(llvm::SmallVector<OMPClause *> &ClausesWithoutBind,
                             ArrayRef<OMPClause *> Clauses,
-                            OpenMPBindClauseKind BindKind,
+                            OpenMPBindClauseKind &BindKind,
                             OpenMPDirectiveKind &Kind,
-                            OpenMPDirectiveKind &PrevMappedDirective) {
+                            OpenMPDirectiveKind &PrevMappedDirective,
+                            SourceLocation StartLoc, SourceLocation EndLoc,
+                            const DeclarationNameInfo &DirName,
+                            OpenMPDirectiveKind CancelRegion) {
 
   bool UseClausesWithoutBind = false;
 
   // Restricting to "#pragma omp loop bind"
   if (getLangOpts().OpenMP >= 50 && Kind == OMPD_loop) {
+
+    const OpenMPDirectiveKind ParentDirective = DSAStack->getParentDirective();
+
     if (BindKind == OMPC_BIND_unknown) {
       // Setting the enclosing teams or parallel construct for the loop
       // directive without bind clause.
       BindKind = OMPC_BIND_thread; // Default bind(thread) if binding is unknown
 
-      const OpenMPDirectiveKind ParentDirective =
-          DSAStack->getParentDirective();
       if (ParentDirective == OMPD_unknown) {
         Diag(DSAStack->getDefaultDSALocation(),
              diag::err_omp_bind_required_on_loop);
       } else if (ParentDirective == OMPD_parallel ||
-                 ParentDirective == OMPD_target_parallel) {
+                 ParentDirective == OMPD_target_parallel)
         BindKind = OMPC_BIND_parallel;
-      } else if (ParentDirective == OMPD_teams ||
-                 ParentDirective == OMPD_target_teams) {
-        BindKind = OMPC_BIND_teams;
-      }
+    } else if (ParentDirective == OMPD_teams ||
+               ParentDirective == OMPD_target_teams) {
+      BindKind = OMPC_BIND_teams;

alexey-bataev wrote:

Please restore original code here

https://github.com/llvm/llvm-project/pull/76938
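For background on the hunk quoted above, the defaulting rules it implements look roughly like this in user code. This is a hedged sketch (directive spellings per OpenMP 5.x; not taken from the patch's tests):

```cpp
// When '#pragma omp loop' has no bind clause, the enclosing construct
// determines the binding, per the logic under review.
void f() {
#pragma omp parallel
#pragma omp loop // parent is 'parallel' -> treated as bind(parallel)
  for (int i = 0; i < 64; ++i)
    ;

#pragma omp teams
#pragma omp loop // parent is 'teams' -> treated as bind(teams)
  for (int i = 0; i < 64; ++i)
    ;
} // With no enclosing construct at all, the quoted code diagnoses
  // err_omp_bind_required_on_loop rather than silently defaulting.
```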


[Lldb-commits] [lldb] [llvm] [clang] [OpenACC] Implement initial parsing for `parallel` construct (PR #72661)

2023-11-17 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev approved this pull request.


https://github.com/llvm/llvm-project/pull/72661
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [clang] [llvm] [OpenACC] Implement initial parsing for Construct/Directive Names (PR #72661)

2023-11-17 Thread Alexey Bataev via lldb-commits


@@ -0,0 +1,72 @@
+//===--- OpenACCKinds.h - OpenACC Enums -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// Defines some OpenACC-specific enums and functions.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_BASIC_OPENACCKINDS_H
+#define LLVM_CLANG_BASIC_OPENACCKINDS_H
+
+namespace clang {
+// Represents the Construct/Directive kind of a pragma directive. Note the
+// OpenACC standard is inconsistent between calling these Construct vs
+// Directive, but we're calling it a Directive to be consistent with OpenMP.
+enum class OpenACCDirectiveKind {
+  // Compute Constructs.
+  Parallel,

alexey-bataev wrote:

No, not necessary. The first patch, which lands the main infrastructure, is better off with only a small amount of parsing functionality; future patch(es) may include parsing of the other constructs, if each is small enough.

https://github.com/llvm/llvm-project/pull/72661


[Lldb-commits] [clang] [lldb] [llvm] [OpenACC] Implement initial parsing for Construct/Directive Names (PR #72661)

2023-11-17 Thread Alexey Bataev via lldb-commits


@@ -10,18 +10,240 @@
 //
 
 //===----------------------------------------------------------------------===//
 
+#include "clang/Basic/OpenACCKinds.h"
 #include "clang/Parse/ParseDiagnostic.h"
 #include "clang/Parse/Parser.h"
+#include "clang/Parse/RAIIObjectsForParser.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
 
 using namespace clang;
+using namespace llvm;
+
+namespace {
+// An enum that contains the extended 'partial' parsed variants. This type
+// should never escape the initial parse functionality, but is useful for
+// simplifying the implementation.
+enum class OpenACCDirectiveKindEx {
+  Invalid = static_cast<int>(OpenACCDirectiveKind::Invalid),
+  // 'enter data' and 'exit data'
+  Enter,
+  Exit,
+  // 'atomic read', 'atomic write', 'atomic update', and 'atomic capture'.
+  Atomic,
+};
+
+// Translate single-token string representations to the OpenACC Directive Kind.
+// This doesn't completely comprehend 'Compound Constructs' (as it just
+// identifies the first token), and doesn't fully handle 'enter data', 'exit
+// data', nor any of the 'atomic' variants, just the first token of each.  So
+// this should only be used by `ParseOpenACCDirectiveKind`.
+OpenACCDirectiveKindEx GetOpenACCDirectiveKind(StringRef Name) {
+  OpenACCDirectiveKind DirKind =
+      llvm::StringSwitch<OpenACCDirectiveKind>(Name)
+          .Case("parallel", OpenACCDirectiveKind::Parallel)
+          .Case("serial", OpenACCDirectiveKind::Serial)
+          .Case("kernels", OpenACCDirectiveKind::Kernels)
+          .Case("data", OpenACCDirectiveKind::Data)
+          .Case("host_data", OpenACCDirectiveKind::HostData)
+          .Case("loop", OpenACCDirectiveKind::Loop)
+          .Case("cache", OpenACCDirectiveKind::Cache)
+          .Case("declare", OpenACCDirectiveKind::Declare)
+          .Case("init", OpenACCDirectiveKind::Init)
+          .Case("shutdown", OpenACCDirectiveKind::Shutdown)
+          .Case("set", OpenACCDirectiveKind::Set)
+          .Case("update", OpenACCDirectiveKind::Update)
+          .Case("wait", OpenACCDirectiveKind::Wait)
+          .Case("routine", OpenACCDirectiveKind::Routine)
+          .Default(OpenACCDirectiveKind::Invalid);
+
+  if (DirKind != OpenACCDirectiveKind::Invalid)
+    return static_cast<OpenACCDirectiveKindEx>(DirKind);
+
+  return llvm::StringSwitch<OpenACCDirectiveKindEx>(Name)
+      .Case("enter", OpenACCDirectiveKindEx::Enter)
+      .Case("exit", OpenACCDirectiveKindEx::Exit)
+      .Case("atomic", OpenACCDirectiveKindEx::Atomic)
+      .Default(OpenACCDirectiveKindEx::Invalid);
+}
+
+// "enter data" and "exit data" are permitted as their own constructs. Handle
+// these, knowing the previous token is either 'enter' or 'exit'. The current
+// token should be the one after the "enter" or "exit".
+OpenACCDirectiveKind
+ParseOpenACCEnterExitDataDirective(Parser &P, Token FirstTok,
+                                   StringRef FirstTokSpelling,
+                                   OpenACCDirectiveKindEx ExtDirKind) {
+  Token SecondTok = P.getCurToken();
+  std::string SecondTokSpelling = P.getPreprocessor().getSpelling(SecondTok);
+
+  if (SecondTokSpelling != "data") {
+P.Diag(FirstTok, diag::err_acc_invalid_directive)
+<< 1 << FirstTokSpelling << SecondTokSpelling;
+return OpenACCDirectiveKind::Invalid;
+  }
+
+  P.ConsumeToken();
+  return ExtDirKind == OpenACCDirectiveKindEx::Enter
+ ? OpenACCDirectiveKind::EnterData
+ : OpenACCDirectiveKind::ExitData;
+}
+
+OpenACCDirectiveKind ParseOpenACCAtomicDirective(Parser &P) {
+  Token AtomicClauseToken = P.getCurToken();
+  std::string AtomicClauseSpelling =
+  P.getPreprocessor().getSpelling(AtomicClauseToken);
+
+  OpenACCDirectiveKind DirKind =
+      llvm::StringSwitch<OpenACCDirectiveKind>(AtomicClauseSpelling)
+  .Case("read", OpenACCDirectiveKind::AtomicRead)
+  .Case("write", OpenACCDirectiveKind::AtomicWrite)
+  .Case("update", OpenACCDirectiveKind::AtomicUpdate)
+  .Case("capture", OpenACCDirectiveKind::AtomicCapture)
+  .Default(OpenACCDirectiveKind::Invalid);
+
+  if (DirKind == OpenACCDirectiveKind::Invalid)
+P.Diag(AtomicClauseToken, diag::err_acc_invalid_atomic_clause)
+<< AtomicClauseSpelling;
+
+  P.ConsumeToken();
+  return DirKind;
+}
+
+// Parse and consume the tokens for OpenACC Directive/Construct kinds.
+OpenACCDirectiveKind ParseOpenACCDirectiveKind(Parser &P) {
+  Token FirstTok = P.getCurToken();
+  P.ConsumeToken();
+  std::string FirstTokSpelling = P.getPreprocessor().getSpelling(FirstTok);
+
+  OpenACCDirectiveKindEx ExDirKind = GetOpenACCDirectiveKind(FirstTokSpelling);
+
+  Token SecondTok = P.getCurToken();
+  // Go through the Extended kinds to see if we can convert this to the
+  // non-Extended kinds, and handle invalid.
+  switch (ExDirKind) {
+  case OpenACCDirectiveKindEx::Invalid:
+P.Diag(FirstTok, diag::err_acc_invalid_directive) << 0 << FirstTokSpelling;
+return OpenACCDirectiveKind::Invalid;
+  case OpenACCDirectiveKindEx::Enter:
+  


[Lldb-commits] [flang] [lld] [libcxx] [clang] [libc] [lldb] [llvm] [clang-tools-extra] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-13 Thread Alexey Bataev via lldb-commits

https://github.com/alexey-bataev approved this pull request.


https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [flang] [lld] [libcxx] [clang] [libc] [lldb] [llvm] [clang-tools-extra] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-13 Thread Alexey Bataev via lldb-commits


@@ -3633,6 +3633,22 @@ static void RenderHLSLOptions(const ArgList &Args, ArgStringList &CmdArgs,
     CmdArgs.push_back("-finclude-default-header");
 }
 
+static void RenderOpenACCOptions(const Driver &D, const ArgList &Args,
+                                 ArgStringList &CmdArgs, types::ID InputType) {
+  if (!Args.hasArg(options::OPT_fopenacc))
+    return;
+
+  CmdArgs.push_back("-fopenacc");
+
+  if (Arg *A = Args.getLastArg(options::OPT_openacc_macro_override)) {
+    StringRef Value = A->getValue();
+    if (llvm::find_if_not(Value, isdigit) == Value.end())
+      A->renderAsInput(Args, CmdArgs);
+    else
+      D.Diag(diag::err_drv_clang_unsupported) << Value;

alexey-bataev wrote:

```suggestion
unsigned Version;
if (!Value.getAsInteger(10, Version))
  A->renderAsInput(Args, CmdArgs);
else
  D.Diag(diag::err_drv_clang_unsupported) << Value;
```
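(`StringRef::getAsInteger` returns true on failure, so the negated form accepts the flag only when the entire value parses as a base-10 integer; this is the same condition the `find_if_not(Value, isdigit)` check expresses.)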

https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [flang] [llvm] [clang] [lld] [libcxx] [lldb] [clang-tools-extra] [libc] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-13 Thread Alexey Bataev via lldb-commits


@@ -4001,6 +4008,14 @@ bool CompilerInvocation::ParseLangArgs(LangOptions &Opts, ArgList &Args,
         (T.isNVPTX() || T.isAMDGCN()) &&
         Args.hasArg(options::OPT_fopenmp_cuda_mode);
 
+  // OpenACC Configuration.
+  if (Args.hasArg(options::OPT_fopenacc)) {
+    Opts.OpenACC = true;

alexey-bataev wrote:

Maybe keep the OpenACC version in this value instead of having a separate string field, just like OpenMP does? 0 means the support is disabled; a non-zero value is the supported version number.
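A minimal sketch of that encoding (field and struct names are illustrative, not the actual LangOptions definition):

```cpp
// One unsigned carries both "is OpenACC enabled?" and the version, the way
// LangOptions handles OpenMP: 0 = disabled, otherwise the supported version
// number (e.g. 202211).
struct LangOptsSketch {
  unsigned OpenACC = 0;
  bool hasOpenACC() const { return OpenACC != 0; }
};
```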

https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [clang] [libc] [flang] [clang-tools-extra] [llvm] [libcxx] [lld] [lldb] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-13 Thread Alexey Bataev via lldb-commits


@@ -1349,6 +1349,19 @@ def fno_hip_emit_relocatable : Flag<["-"], "fno-hip-emit-relocatable">,
   HelpText<"Do not override toolchain to compile HIP source to relocatable">;
 }
 
+// Clang specific/exclusive options for OpenACC.
+def openacc_macro_override

alexey-bataev wrote:

Why do you need this form? Isn't the EQ form enough?

https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [flang] [llvm] [clang] [lld] [libcxx] [lldb] [clang-tools-extra] [libc] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-13 Thread Alexey Bataev via lldb-commits


@@ -229,6 +230,9 @@ class Parser : public CodeCompletionHandler {
   /// Parsing OpenMP directive mode.
   bool OpenMPDirectiveParsing = false;
 
+  /// Parsing OpenACC directive mode.
+  bool OpenACCDirectiveParsing = false;

alexey-bataev wrote:

I think it's better to introduce this in the patch that actually relies on it; for now, better to remove it.

https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [flang] [llvm] [clang] [lld] [libcxx] [lldb] [clang-tools-extra] [libc] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-13 Thread Alexey Bataev via lldb-commits


@@ -229,6 +230,9 @@ class Parser : public CodeCompletionHandler {
   /// Parsing OpenMP directive mode.
   bool OpenMPDirectiveParsing = false;
 
+  /// Parsing OpenACC directive mode.
+  bool OpenACCDirectiveParsing = false;

alexey-bataev wrote:

I mean, it only ever has the value false and is not changed anywhere.

https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [lldb] [clang] [libc] [llvm] [clang-tools-extra] [libcxx] [lld] [flang] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-07 Thread Alexey Bataev via lldb-commits


@@ -0,0 +1,14 @@
+// RUN: %clang -S -### -fopenacc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DRIVER
+// CHECK-DRIVER: "-cc1" {{.*}} "-fopenacc"
+
+// RUN: %clang -S -### -fopenacc -fexperimental-openacc-macro-override=202211 %s 2>&1 | FileCheck %s --check-prefix=CHECK-MACRO-OVERRIDE
+// RUN: %clang -S -### -fopenacc -fexperimental-openacc-macro-override 202211 %s 2>&1 | FileCheck %s --check-prefix=CHECK-MACRO-OVERRIDE

alexey-bataev wrote:

Why do you need this new option fexperimental-openacc-macro-override? Can you 
just rely on -D_OPENACC instead?

https://github.com/llvm/llvm-project/pull/70234


[Lldb-commits] [flang] [llvm] [lld] [libcxx] [lldb] [clang-tools-extra] [libc] [clang] [OpenACC] Initial commits to support OpenACC (PR #70234)

2023-11-07 Thread Alexey Bataev via lldb-commits


@@ -229,6 +230,9 @@ class Parser : public CodeCompletionHandler {
   /// Parsing OpenMP directive mode.
   bool OpenMPDirectiveParsing = false;
 
+  /// Parsing OpenACC directive mode.
+  bool OpenACCDirectiveParsing = false;

alexey-bataev wrote:

It is unused; it needs to be removed for now.

https://github.com/llvm/llvm-project/pull/70234