================
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
   return Changed;
 }
 
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  IsaVersion IV = getIsaVersion(ST.getCPU());
+
+  bool Changed = false;
+
+  for (auto &MBB : MF) {
+    for (auto MI = MBB.begin(); MI != MBB.end();) {
+      MachineInstr &Inst = *MI;
+      ++MI;
+      if (Inst.mayLoadOrStore() == false)
+        continue;
+
+      // Todo: if next insn is an s_waitcnt
+      AMDGPU::Waitcnt Wait;
+
+      if (!(Inst.getDesc().TSFlags & SIInstrFlags::maybeAtomic)) {
+        if (TII->isSMRD(Inst)) {          // scalar
----------------
jwanggit86 wrote:

That's a valid point. However, although this code does similar work, its
similarity to SIInsertWaitcnt only goes so far. The differences include:
(1) It serves a different purpose: supporting the so-called "precise memory
mode", in particular precise memory exceptions, on certain GPUs. (2) This
feature is optional, while SIInsertWaitcnt is not. (3) The counter values in
SIInsertWaitcnt are precise, while in this feature the counters are simply
set to 0.
If performance is a concern, please note that this feature is controlled by a
command-line option that is off by default. The user has to pass the option
explicitly for it to take effect. We assume the user knows that turning it on
means extra work for the compiler.
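
To illustrate point (3), here is a minimal, self-contained sketch of the idea.
It is not the code in this patch and does not use the LLVM API: `ToyInst` and
`insertZeroWaits` are hypothetical names, and the string "s_waitcnt 0" is only
placeholder text standing in for a wait with every counter set to zero.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (assumption: not llvm::MachineInstr).
struct ToyInst {
  std::string Text;     // printable form of the instruction
  bool MayLoadOrStore;  // true if the instruction touches memory
};

// Hypothetical helper: place a wait with all counters at zero directly after
// every memory instruction in the block. No attempt is made to compute
// precise counter values. Returns the number of waits inserted.
std::size_t insertZeroWaits(std::vector<ToyInst> &Block) {
  std::size_t Inserted = 0;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (!Block[I].MayLoadOrStore)
      continue;
    // Insert right after the memory instruction, then step over the new wait.
    Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(I + 1),
                 ToyInst{"s_waitcnt 0", false});
    ++I;
    ++Inserted;
  }
  return Inserted;
}
```

With a block of `load`, `add`, `store`, this sketch adds one wait after the
load and one after the store and leaves the `add` alone. The trade-off is the
one described above: the pass stays simple, at the cost of waiting more often
than a precise counter analysis would.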

https://github.com/llvm/llvm-project/pull/79236
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
