[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-23 Thread Christudasan Devadasan via llvm-branch-commits


cdevadas wrote:

### Merge activity

* **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96162).


https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Jay Foad via llvm-branch-commits


https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Jay Foad via llvm-branch-commits



@@ -6,7 +6,7 @@ declare i32 @llvm.amdgcn.global.atomic.csub(ptr addrspace(1), i32)
 
 ; GCN-LABEL: {{^}}global_atomic_csub_rtn:
 ; PREGFX12: global_atomic_csub v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9:]+}}, s{{\[[0-9]+:[0-9]+\]}} glc
-; GFX12PLUS: global_atomic_sub_clamp_u32 v0, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
+; GFX12PLUS: global_atomic_sub_clamp_u32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} th:TH_ATOMIC_RETURN

jayfoad wrote:

You shouldn't need any changes in this file.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

> > I still think it is terribly surprising all of the test diff shows up in 
> > this commit, and not the selection case
> 
> Because the selection support is done in the next PR of the review stack, 
> #96162. This patch takes care of choosing the right opcode while merging the 
> loads.

It's more the lack of vectorization of these loads in the IR that's surprising. 
Ideally we wouldn't have to rely on any of this post-isel load merging.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-17 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

> I still think it is terribly surprising all of the test diff shows up in this 
> commit, and not the selection case

Because the selection support is done in the next PR of the review stack, 
https://github.com/llvm/llvm-project/pull/96162. This patch takes care of 
choosing the right opcode while merging the loads.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-17 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm approved this pull request.

I still think it is terribly surprising all of the test diff shows up in this 
commit, and not the selection case 

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits


https://github.com/cdevadas edited 
https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits



@@ -658,17 +658,17 @@ define amdgpu_kernel void @image_bvh_intersect_ray_nsa_reassign(ptr %p_node_ptr,
 ;
 ; GFX1013-LABEL: image_bvh_intersect_ray_nsa_reassign:
 ; GFX1013:   ; %bb.0:
-; GFX1013-NEXT:    s_load_dwordx8 s[0:7], s[0:1], 0x24
+; GFX1013-NEXT:    s_load_dwordx8 s[4:11], s[0:1], 0x24

cdevadas wrote:

> I guess this code changes because xnack is enabled by default for GFX10.1?

Yes.

> Is there anything we could do to add known alignment info here, to avoid the 
> code pessimization?

I'm not sure what can be done for it.



https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Jay Foad via llvm-branch-commits



@@ -1212,8 +1228,14 @@ void SILoadStoreOptimizer::copyToDestRegs(
 
   // Copy to the old destination registers.
   const MCInstrDesc &CopyDesc = TII->get(TargetOpcode::COPY);
-  const auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
-  const auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+  auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
+  auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+
+  // The constrained sload instructions in S_LOAD_IMM class will have
+  // `early-clobber` flag in the dst operand. Remove the flag before using the
+  // MOs in copies.
+  Dest0->setIsEarlyClobber(false);
+  Dest1->setIsEarlyClobber(false);

jayfoad wrote:

It's a bit ugly to modify the operands of `CI.I` and `Paired.I` in place. But I 
guess it is harmless, since they will be erased soon, when the merged load 
instruction is created.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Jay Foad via llvm-branch-commits



@@ -658,17 +658,17 @@ define amdgpu_kernel void @image_bvh_intersect_ray_nsa_reassign(ptr %p_node_ptr,
 ;
 ; GFX1013-LABEL: image_bvh_intersect_ray_nsa_reassign:
 ; GFX1013:   ; %bb.0:
-; GFX1013-NEXT:    s_load_dwordx8 s[0:7], s[0:1], 0x24
+; GFX1013-NEXT:    s_load_dwordx8 s[4:11], s[0:1], 0x24

jayfoad wrote:

I guess this code changes because xnack is enabled by default for GFX10.1? Is 
there anything we could do to add known alignment info here, to avoid the code 
pessimization?

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits


cdevadas wrote:

Ping

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-04 Thread Christudasan Devadasan via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

cdevadas wrote:

Opened https://github.com/llvm/llvm-project/issues/97715.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-04 Thread Matt Arsenault via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

arsenm wrote:

Can you open an issue for this?

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Matt Arsenault via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

arsenm wrote:

LSV (the IR load-store vectorizer) should have gotten this case; I don't see 
why it didn't. Someone should look into this.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits


https://github.com/cdevadas edited 
https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

cdevadas wrote:

Unfortunately, that's not happening. The IR load-store-vectorizer doesn't 
combine the two loads.
I still see the two loads after the IR vectorizer, and they become two loads in 
the selected code. Could this be because the alignments of the two loads 
differ, so the IR vectorizer conservatively skips them?

*** IR Dump before Selection ***
define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) 
%ptr, <2 x i16> %data) #0 {
  %local_atomic_fadd_v2bf16_noret.kernarg.segment = call nonnull align 16 
dereferenceable(44) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
  %ptr.kernarg.offset = getelementptr inbounds i8, ptr addrspace(4) 
%local_atomic_fadd_v2bf16_noret.kernarg.segment, i64 36, !amdgpu.uniform !0
  **%ptr.load = load ptr addrspace(3), ptr addrspace(4) %ptr.kernarg.offset**, 
align 4, !invariant.load !0
  %data.kernarg.offset = getelementptr inbounds i8, ptr addrspace(4) 
%local_atomic_fadd_v2bf16_noret.kernarg.segment, i64 40, !amdgpu.uniform !0
  **%data.load = load <2 x i16>, ptr addrspace(4) %data.kernarg.offset**, align 
8, !invariant.load !0
  %ret = call <2 x i16> @llvm.amdgcn.ds.fadd.v2bf16(ptr addrspace(3) %ptr.load, 
<2 x i16> %data.load)
  ret void
}
# *** IR Dump After selection ***:
# Machine code for function local_atomic_fadd_v2bf16_noret: IsSSA, 
TracksLiveness
Function Live Ins: $sgpr0_sgpr1 in %1

bb.0 (%ir-block.0):
  liveins: $sgpr0_sgpr1
  %1:sgpr_64(p4) = COPY $sgpr0_sgpr1
  %3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %1:sgpr_64(p4), 36, 0 :: 
(dereferenceable invariant load (s32) from %ir.ptr.kernarg.offset, addrspace 4)
  %4:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %1:sgpr_64(p4), 40, 0 :: 
(dereferenceable invariant load (s32) from %ir.data.kernarg.offset, align 8, 
addrspace 4)
  %5:vgpr_32 = COPY %3:sreg_32_xm0_xexec
  %6:vgpr_32 = COPY %4:sreg_32_xm0_xexec
  DS_PK_ADD_BF16 killed %5:vgpr_32, killed %6:vgpr_32, 0, 0, implicit $m0, 
implicit $exec
  S_ENDPGM 0


https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Matt Arsenault via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

arsenm wrote:

But surely we aren't merging *this* many scalar loads in MIR? The IR vectorizer 
should have gotten most of these?

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits


https://github.com/cdevadas edited 
https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

cdevadas wrote:

Earlier I wrongly used the dword size (Width) in the alignment check here, as 
Jay pointed out. I have now fixed it to use the byte size when comparing 
against the existing alignment of the first load.
https://github.com/llvm/llvm-project/pull/96162/commits/e7e6cbc4abd476a038fd7836e5078565e73d1fe9#diff-35f4d1b6c4c17815f6989f86abbac2e606ca760f9d93f501ff503449048bf760R1730

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Matt Arsenault via llvm-branch-commits



@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half>
 define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) {
 ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret:
 ; GFX940:   ; %bb.0:
-; GFX940-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; GFX940-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24

arsenm wrote:

Why does this patch have so many test diffs? Before this patch, we would have 
just missed out on a few folds after the _ec variants were introduced? 

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Jay Foad via llvm-branch-commits



@@ -1700,19 +1722,29 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
 case 8:
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
-  case S_LOAD_IMM:
+  case S_LOAD_IMM: {
+// If XNACK is enabled, use the constrained opcodes when the first load is
+// under-aligned.
+const MachineMemOperand *MMO = *CI.I->memoperands_begin();
+bool NeedsConstrainedOpc =
+STM->isXNACKEnabled() && MMO->getAlign().value() < (Width << 2);

jayfoad wrote:

```suggestion
STM->isXNACKEnabled() && MMO->getAlign().value() < Width * 4;
```

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-01 Thread Jay Foad via llvm-branch-commits



@@ -1700,19 +1725,30 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
 case 8:
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
-  case S_LOAD_IMM:
+  case S_LOAD_IMM: {
+// If XNACK is enabled, use the constrained opcodes when the first load is
+// under-aligned.
+const MachineMemOperand *MMO = *CI.I->memoperands_begin();
+auto NeedsConstrainedOpc = [&MMO, Width](const GCNSubtarget &ST) {
+  return ST.isXNACKEnabled() && MMO->getAlign().value() < Width;

jayfoad wrote:

This doesn't look right since `Width` is in units of dwords here.
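
To make the unit mismatch concrete, here is a minimal standalone sketch (not the code from the patch). The function names and the plain integer parameters are assumptions made for illustration; the real check reads the alignment from the first load's `MachineMemOperand` and the XNACK state from the subtarget.

```cpp
#include <cstdint>

// Hypothetical stand-in for the check in getNewOpcode(). AlignBytes is the
// alignment of the first load in bytes; WidthDwords is the size of the merged
// load in dwords (4 bytes each).
constexpr bool needsConstrainedOpc(bool XNACKEnabled, uint64_t AlignBytes,
                                   unsigned WidthDwords) {
  // Convert the width to bytes before comparing it with a byte alignment.
  return XNACKEnabled && AlignBytes < uint64_t(WidthDwords) * 4;
}

// The variant under review: it compares a byte alignment against a dword
// count, so it under-reports under-aligned loads.
constexpr bool needsConstrainedOpcDwordCompare(bool XNACKEnabled,
                                               uint64_t AlignBytes,
                                               unsigned WidthDwords) {
  return XNACKEnabled && AlignBytes < WidthDwords;
}

// A 4-byte-aligned first load merged into a 2-dword (8-byte) load.
static_assert(needsConstrainedOpc(true, 4, 2),
              "byte comparison flags the under-aligned load");
static_assert(!needsConstrainedOpcDwordCompare(true, 4, 2),
              "dword comparison misses it");
// Sufficient alignment, or XNACK disabled: no constrained opcode needed.
static_assert(!needsConstrainedOpc(true, 8, 2), "naturally aligned");
static_assert(!needsConstrainedOpc(false, 4, 2), "XNACK disabled");
```

With a 4-byte-aligned first load and a merged width of 2 dwords, the dword comparison evaluates `4 < 2` and misses the under-alignment, while the byte comparison evaluates `4 < 8` and reports it.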

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-01 Thread Jay Foad via llvm-branch-commits



@@ -1212,8 +1228,17 @@ void SILoadStoreOptimizer::copyToDestRegs(
 
   // Copy to the old destination registers.
   const MCInstrDesc &CopyDesc = TII->get(TargetOpcode::COPY);
-  const auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
-  const auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+  auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
+  auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+
+  // The constrained sload instructions in S_LOAD_IMM class will have
+  // `early-clobber` flag in the dst operand. Remove the flag before using the
+  // MOs in copies.
+  if (Dest0->isEarlyClobber())
+Dest0->setIsEarlyClobber(false);
+
+  if (Dest1->isEarlyClobber())
+Dest1->setIsEarlyClobber(false);

jayfoad wrote:

```suggestion
  Dest0->setIsEarlyClobber(false);
  Dest1->setIsEarlyClobber(false);
```

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-01 Thread Jay Foad via llvm-branch-commits



@@ -1700,19 +1725,30 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
 case 8:
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
-  case S_LOAD_IMM:
+  case S_LOAD_IMM: {
+// If XNACK is enabled, use the constrained opcodes when the first load is
+// under-aligned.
+const MachineMemOperand *MMO = *CI.I->memoperands_begin();
+auto NeedsConstrainedOpc = [&MMO, Width](const GCNSubtarget &ST) {

jayfoad wrote:

This doesn't need to be a lambda. It is always called, with identical 
arguments. Just calculate the result as a `bool` here.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

rampitec wrote:

> > currently the alignment is picked from the first MMO and that'd definitely 
> > be smaller than the natural align requirement for the new load
> 
> You don't know that - the alignment in the first MMO will be whatever 
> alignment the compiler could deduce, which could be large, e.g. if the 
> pointer used for the first load was known to have a large alignment.

Moreover, it can easily be as large as a page, in the case of a scalar load 
and kernarg.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Christudasan Devadasan via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

cdevadas wrote:

> > currently the alignment is picked from the first MMO and that'd definitely 
> > be smaller than the natural align requirement for the new load
> 
> You don't know that - the alignment in the first MMO will be whatever 
> alignment the compiler could deduce, which could be large, e.g. if the 
> pointer used for the first load was known to have a large alignment.

Are you suggesting that I check the alignment in the first MMO to see whether 
it still meets the preferred alignment for the merged load, and use the _ec 
variant only if the alignment is found to be smaller than the expected value?

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Jay Foad via llvm-branch-commits


https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Jay Foad via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

jayfoad wrote:

> currently the alignment is picked from the first MMO and that'd definitely be 
> smaller than the natural align requirement for the new load

You don't know that - the alignment in the first MMO will be whatever alignment 
the compiler could deduce, which could be large.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-23 Thread Christudasan Devadasan via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

cdevadas wrote:

I guess the merged load is currently always under-aligned. While combining the 
MMOs, currently the alignment is picked from the first MMO and that'd 
definitely be smaller than the natural align requirement for the new load. 
That's one reason I conservatively want to emit the _ec equivalent when XNACK 
is enabled.

https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Stanislav Mekhanoshin via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

rampitec wrote:

You can check the alignment on the first load if an MMO is available, and avoid 
producing the _ec version if it is sufficient.

https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Jay Foad via llvm-branch-commits



@@ -967,6 +967,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasLDSFPAtomicAddF32() const { return GFX8Insts; }
   bool hasLDSFPAtomicAddF64() const { return GFX90AInsts; }
+  bool hasXnackReplay() const { return GFX8Insts; }

jayfoad wrote:

We already have a field SupportsXNACK for this which is hooked up to the 
"xnack-support" target feature.

https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Jay Foad via llvm-branch-commits


jayfoad wrote:

> > This looks like it is affecting codegen even when xnack is disabled? That 
> > should not happen.
> 
> It shouldn't. I put the xnack replay subtarget check before using *_ec 
> equivalents. See the code here: 
> [65eb443#diff-35f4d1b6c4c17815f6989f86abbac2e606ca760f9d93f501ff503449048bf760R1735](https://github.com/llvm/llvm-project/commit/65eb44327cf32a83dbbf13eb70f9d8c03f3efaef#diff-35f4d1b6c4c17815f6989f86abbac2e606ca760f9d93f501ff503449048bf760R1735)

You're checking `STI->hasXnackReplay()` which is true on all GFX8+ targets. You 
should be checking whether xnack support is enabled with 
`STI->isXNACKEnabled()`.
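
The distinction can be illustrated with a small standalone model. `SubtargetModel`, its fields, and `mayNeedConstrainedLoads` are hypothetical and much simpler than the real `GCNSubtarget`; only the idea that capability and enablement are separate queries is taken from the comment above.

```cpp
// Hypothetical, simplified subtarget model (not the real GCNSubtarget).
struct SubtargetModel {
  bool HasReplayHardware; // the target is capable of XNACK replay
  bool XNACKEnabled;      // XNACK is actually enabled for this compilation

  constexpr bool hasXnackReplay() const { return HasReplayHardware; }
  constexpr bool isXNACKEnabled() const {
    return HasReplayHardware && XNACKEnabled;
  }
};

// Gate the constrained (_ec) opcodes on enablement, not on capability, so
// that codegen is unchanged when XNACK is disabled.
constexpr bool mayNeedConstrainedLoads(const SubtargetModel &ST) {
  return ST.isXNACKEnabled();
}

// A capable target with XNACK turned off keeps the unconstrained opcodes.
static_assert(SubtargetModel{true, false}.hasXnackReplay(),
              "the target is capable of replay");
static_assert(!mayNeedConstrainedLoads(SubtargetModel{true, false}),
              "but XNACK is not enabled");
static_assert(mayNeedConstrainedLoads(SubtargetModel{true, true}),
              "capable and enabled");
```

Gating on the capability query alone would select the constrained path for the `{true, false}` case as well, which is the codegen change with xnack disabled that the review objects to.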

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Matt Arsenault via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

arsenm wrote:

One switch and move the condition inside each size case?
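
A rough sketch of what that restructuring could look like, as a standalone example rather than the patch itself. The enum values and the function name are hypothetical stand-ins for the generated AMDGPU opcodes and their constrained (_ec) counterparts; the supported widths (2, 3, 4, 8) are taken from the switch being replaced.

```cpp
// Hypothetical opcode values standing in for the AMDGPU scalar load opcodes
// and their constrained (_ec) counterparts.
enum ScalarLoadOpc : unsigned {
  INVALID_OPC = 0,
  LOAD_DWORDX2, LOAD_DWORDX2_EC,
  LOAD_DWORDX3, LOAD_DWORDX3_EC,
  LOAD_DWORDX4, LOAD_DWORDX4_EC,
  LOAD_DWORDX8, LOAD_DWORDX8_EC,
};

// One switch over the merged width. The constrained/unconstrained choice is
// made inside each case instead of duplicating the whole switch.
unsigned getMergedScalarLoadOpc(unsigned Width, bool NeedsConstrainedOpc) {
  switch (Width) {
  default:
    return INVALID_OPC; // unsupported merged width
  case 2:
    return NeedsConstrainedOpc ? LOAD_DWORDX2_EC : LOAD_DWORDX2;
  case 3:
    return NeedsConstrainedOpc ? LOAD_DWORDX3_EC : LOAD_DWORDX3;
  case 4:
    return NeedsConstrainedOpc ? LOAD_DWORDX4_EC : LOAD_DWORDX4;
  case 8:
    return NeedsConstrainedOpc ? LOAD_DWORDX8_EC : LOAD_DWORDX8;
  }
}
```

This keeps the list of supported widths in one place, so adding or removing a width cannot leave the two variants out of sync.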

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Matt Arsenault via llvm-branch-commits



@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {

arsenm wrote:

STI should never be null. The conservative default would be to assume _ec if it 
were possible.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

I'm still not sure why we have so much in this pass. The load and store 
vectorization should have happened in the IR. This pass was originally for the 
multi-offset DS instructions.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

> This looks like it is affecting codegen even when xnack is disabled? That 
> should not happen.

It shouldn't. I put the xnack replay subtarget check before using *_ec 
equivalents. See the code here: 
https://github.com/llvm/llvm-project/commit/65eb44327cf32a83dbbf13eb70f9d8c03f3efaef#diff-35f4d1b6c4c17815f6989f86abbac2e606ca760f9d93f501ff503449048bf760R1735

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Jay Foad via llvm-branch-commits


jayfoad wrote:

This looks like it is affecting codegen even when xnack is disabled? That 
should not happen.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread via llvm-branch-commits


llvmbot wrote:



@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Christudasan Devadasan (cdevadas)


Changes

Consider the constrained multi-dword loads while merging
individual loads to a single multi-dword load.
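
As a rough illustration of what merging individual loads means here, the sketch below checks whether two scalar loads from the same base can become one multi-dword load. The `SLoad` struct and `mergedWidth` helper are hypothetical and greatly simplified; the sketch assumes both loads share a base and that the second starts exactly where the first ends. The accepted widths (2, 3, 4, 8) mirror the cases in the switch quoted in the review.

```cpp
// Hypothetical description of one scalar load relative to a shared base.
struct SLoad {
  unsigned OffsetDwords; // Start offset from the base, in dwords.
  unsigned WidthDwords;  // Number of dwords loaded.
};

// Return the width in dwords of the merged load, or 0 if the pair cannot be
// merged under the simplifying assumptions stated above.
unsigned mergedWidth(const SLoad &First, const SLoad &Second) {
  // The second load must begin exactly where the first one ends.
  if (First.OffsetDwords + First.WidthDwords != Second.OffsetDwords)
    return 0;
  const unsigned Width = First.WidthDwords + Second.WidthDwords;
  switch (Width) {
  case 2:
  case 3:
  case 4:
  case 8:
    return Width; // A multi-dword opcode exists for this width.
  default:
    return 0; // No single load covers this width.
  }
}
```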

---

Patch is 1023.60 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/96162.diff


116 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+1) 
- (modified) llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (+63-16) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll (+84-85) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fp-atomics-gfx940.ll (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.csub.ll (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll (+51-51) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.set.inactive.ll (+50-50) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.update.dpp.ll (+42-42) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll (+102-102) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll (+65-65) 
- (modified) llvm/test/CodeGen/AMDGPU/add.v2i16.ll (+21-21) 
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+204-204) 
- (modified) llvm/test/CodeGen/AMDGPU/atomics_cond_sub.ll (+24-24) 
- (modified) llvm/test/CodeGen/AMDGPU/bfe-patterns.ll (+20-20) 
- (modified) llvm/test/CodeGen/AMDGPU/bfm.ll (+5-5) 
- (modified) llvm/test/CodeGen/AMDGPU/bitreverse.ll (+157-157) 
- (modified) llvm/test/CodeGen/AMDGPU/build_vector.ll (+25-25) 
- (modified) llvm/test/CodeGen/AMDGPU/calling-conventions.ll (+25-25) 
- (modified) llvm/test/CodeGen/AMDGPU/cluster_stores.ll (+60-60) 
- (modified) llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll (+10-10) 
- (modified) llvm/test/CodeGen/AMDGPU/ctlz.ll (+227-227) 
- (modified) llvm/test/CodeGen/AMDGPU/ctlz_zero_undef.ll (+223-223) 
- (modified) llvm/test/CodeGen/AMDGPU/ctpop16.ll (+52-48) 
- (modified) llvm/test/CodeGen/AMDGPU/ctpop64.ll (+24-24) 
- (modified) llvm/test/CodeGen/AMDGPU/cttz.ll (+185-185) 
- (modified) llvm/test/CodeGen/AMDGPU/cttz_zero_undef.ll (+212-212) 
- (modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+331-334) 
- (modified) llvm/test/CodeGen/AMDGPU/divergence-driven-buildvector.ll (+24-24) 
- (modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+8-8) 
- (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-f16.ll (+25-25) 
- (modified) llvm/test/CodeGen/AMDGPU/fabs.f16.ll (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/fabs.ll (+17-17) 
- (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+31-32) 
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f32.ll (+33-33) 
- (modified) llvm/test/CodeGen/AMDGPU/fdiv.ll (+143-143) 
- (modified) llvm/test/CodeGen/AMDGPU/flat_atomics.ll (+20-20) 
- (modified) llvm/test/CodeGen/AMDGPU/flat_atomics_i32_system.ll (+56-56) 
- (modified) llvm/test/CodeGen/AMDGPU/fma-combine.ll (+88-88) 
- (modified) llvm/test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll (+9-9) 
- (modified) llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll (+240-240) 
- (modified) llvm/test/CodeGen/AMDGPU/fnearbyint.ll (+9-9) 
- (modified) llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll (+23-23) 
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+17-17) 
- (modified) llvm/test/CodeGen/AMDGPU/fneg.ll (+13-13) 
- (modified) llvm/test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll (+26-26) 
- (modified) llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll (+22-22) 
- (modified) llvm/test/CodeGen/AMDGPU/fp-classify.ll (+13-13) 
- (modified) llvm/test/CodeGen/AMDGPU/fp-min-max-buffer-atomics.ll (+33-33) 
- (modified) llvm/test/CodeGen/AMDGPU/fp-min-max-buffer-ptr-atomics.ll (+46-46) 
- (modified) llvm/test/CodeGen/AMDGPU/fp16_to_fp32.ll (+20-20) 
- (modified) llvm/test/CodeGen/AMDGPU/fp16_to_fp64.ll (+20-20) 
- (modified) llvm/test/CodeGen/AMDGPU/fp32_to_fp16.ll (+20-20) 
- (modified) llvm/test/CodeGen/AMDGPU/fp64-min-max-buffer-atomics.ll (+24-24) 
- (modified) llvm/test/CodeGen/AMDGPU/fp64-min-max-buffer-ptr-atomics.ll (+38-38) 
- (modified) llvm/test/CodeGen/AMDGPU/fp_to_sint.ll (+19-19) 
- (modified) llvm/test/CodeGen/AMDGPU/fp_to_uint.ll (+19-19) 
- (modified) llvm/test/CodeGen/AMDGPU/fshl.ll (+44-44) 
- (modified) llvm/test/CodeGen/AMDGPU/fshr.ll (+20-20) 
- (modified) llvm/test/CodeGen/AMDGPU/global_atomics.ll (+24-24) 
- (modified) llvm/test/CodeGen/AMDGPU/global_atomics_i32_system.ll (+98-98) 
- (modified) llvm/test/CodeGen/AMDGPU/half.ll (+41-41) 
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+88-88) 
- (modified) llvm/test/CodeGen/AMDGPU/insert_waitcnt_for_precise_memory.ll (+124-124) 
- (modified) llvm/test/CodeGen/AMDGPU/kernel-args.ll (+53-53) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll (+15-15) 
- (modified)

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits


https://github.com/cdevadas marked this pull request as ready for review.

https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits


cdevadas wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/96162

* **#96163**
* **#96162**
* **#96161**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/96162
