[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits


@@ -725,18 +739,22 @@ bool YAMLProfileReader::inferStaleProfile(
   const BinaryFunction::BasicBlockOrderType BlockOrder(
   BF.getLayout().block_begin(), BF.getLayout().block_end());
 
+  // Tracks the number of matched blocks.
+  uint64_t MatchedBlocks;

WenleiHe wrote:

nit: use explicit initialization to be safe. 
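
A minimal sketch of what the explicit initialization could look like, assuming the counter is only ever incremented inside the matching loop. `BlockMatch` and `countMatchedBlocks` are hypothetical names used for illustration; they are not part of BOLT:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-block match result; not a BOLT type.
struct BlockMatch {
  bool Matched;
};

// Counts the matched blocks. The counter is brace-initialized to zero, so
// the increment below never reads an indeterminate value.
uint64_t countMatchedBlocks(const std::vector<BlockMatch> &Blocks) {
  uint64_t MatchedBlocks{0};
  for (const BlockMatch &B : Blocks)
    if (B.Matched)
      ++MatchedBlocks;
  return MatchedBlocks;
}
```

Without the `{0}` initializer, the first `++MatchedBlocks` would read an indeterminate value, which is undefined behavior.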

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/95395


[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-14 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

Looks fairly straightforward with those prerequisites.

https://github.com/llvm/llvm-project/pull/95395


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 01/11] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 02/11] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 03/11] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 04/11] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of 

[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 01/10] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 02/10] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 03/10] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 04/10] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of 

[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-14 Thread Ahmed Bougacha via llvm-branch-commits

https://github.com/ahmedbougacha updated 
https://github.com/llvm/llvm-project/pull/94394

From 1e9a3fde97d907c3cd6be33db91d1c18c7236ffb Mon Sep 17 00:00:00 2001
From: Ahmed Bougacha 
Date: Tue, 4 Jun 2024 12:41:47 -0700
Subject: [PATCH 1/8] [Support] Reformat SipHash.cpp to match libSupport.

While there, give it our usual file header and an acknowledgement,
and remove the imported README.md.SipHash.
---
 llvm/lib/Support/README.md.SipHash | 126 --
 llvm/lib/Support/SipHash.cpp   | 264 ++---
 2 files changed, 129 insertions(+), 261 deletions(-)
 delete mode 100644 llvm/lib/Support/README.md.SipHash

diff --git a/llvm/lib/Support/README.md.SipHash b/llvm/lib/Support/README.md.SipHash
deleted file mode 100644
index 4de3cd1854681..0
--- a/llvm/lib/Support/README.md.SipHash
+++ /dev/null
@@ -1,126 +0,0 @@
-# SipHash
-
-[![License:
-CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/)
-
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
-
-SipHash is a family of pseudorandom functions (PRFs) optimized for speed on short messages.
-This is the reference C code of SipHash: portable, simple, optimized for clarity and debugging.
-
-SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp)
-and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding
-DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf).
-
-SipHash is:
-
-* *Simpler and faster* on short messages than previous cryptographic
-algorithms, such as MACs based on universal hashing.
-
-* *Competitive in performance* with insecure non-cryptographic algorithms, such as [fhhash](https://github.com/cbreeden/fxhash).
-
-* *Cryptographically secure*, with no sign of weakness despite multiple [cryptanalysis](https://eprint.iacr.org/2019/865) [projects](https://eprint.iacr.org/2019/865) by leading cryptographers.
-
-* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD,
-FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL libcrypto,
-Sodium, etc.) and applications (Wireguard, Redis, etc.).
-
-As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC).
-But SipHash is *not a hash* in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3.
-SipHash should therefore always be used with a secret key in order to be secure.
-
-
-## Variants
-
-The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 compression
-rounds, 4 finalization rounds, and returns a 64-bit tag.
-
-Variants can use a different number of rounds. For example, we proposed *SipHash-4-8* as a conservative version.
-
-The following versions are not described in the paper but were designed and analyzed to fulfill applications' needs:
-
-* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on.
-
-* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key,
-and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2
-compression rounds, 4 finalization rounds, and returns a 32-bit tag.
-
-
-## Security
-
-(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the maximum PRF
-security for any function with the same key and output size.
-
-The standard PRF security goal allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker.
-
-Security is limited by the key size (128 bits for SipHash), such that
-attackers searching 2<sup>*s*</sup> keys have chance 2<sup>*s*−128</sup> of finding
-the SipHash key. 
-Security is also limited by the output size. In particular, when
-SipHash is used as a MAC, an attacker who blindly tries 2<sup>*s*</sup> tags will
-succeed with probability 2<sup>*s*-*t*</sup>, if *t* is that tag's bit size.
-
-
-## Research
-
-* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a fast short-input PRF" (accepted at INDOCRYPT 2012)
-* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation of SipHash at INDOCRYPT 2012 (Bernstein)
-* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the presentation of SipHash at the DIAC workshop (Aumasson)
-
-
-## Usage
-
-Running
-
-```sh
-  make
-```
-
-will build tests for 
-
-* SipHash-2-4-64
-* SipHash-2-4-128
-* HalfSipHash-2-4-32
-* HalfSipHash-2-4-64
-
-
-```C
-  ./test
-```
-
-verifies 64 test vectors, and
-
-```C
-  ./debug
-```
-
-does the same and prints intermediate values.
-
-The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash
-with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS`
-or `dROUNDS` when compiling.  This can be done with `-D` command line arguments
-to many compilers such as below.
-
-```sh
-gcc -Wall 

[llvm-branch-commits] [llvm] [Support] Add SipHash-based 16-bit ptrauth stable hash. (PR #93902)

2024-06-14 Thread Ahmed Bougacha via llvm-branch-commits

https://github.com/ahmedbougacha updated 
https://github.com/llvm/llvm-project/pull/93902

From bf413d68cff5ad963c43bb584590908bf03bc3ce Mon Sep 17 00:00:00 2001
From: Ahmed Bougacha 
Date: Tue, 4 Jun 2024 12:36:33 -0700
Subject: [PATCH] [Support] Add SipHash-based 16-bit ptrauth stable hash.

This finally wraps the now-lightly-modified SipHash C reference
implementation, for the main interface we need (16-bit ptrauth
discriminators).

This intentionally doesn't expose a raw interface beyond that to
encourage others to carefully consider their use.

The exact algorithm is the little-endian interpretation of the
non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using the
constant seed `b5d4c9eb79104a796fec8b1b428781d4` (big-endian), with the
result reduced by modulo to the range of non-zero discriminators (i.e.
`(rawHash % 65535) + 1`).
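
As a rough sketch of the final reduction step only (not the SipHash computation itself), assuming a 64-bit raw hash is already available. `reduceToDiscriminator` is a hypothetical helper name used for illustration, not the function added by this patch:

```cpp
#include <cstdint>

// Maps a 64-bit raw hash onto the non-zero 16-bit range [1, 65535],
// following the `(rawHash % 65535) + 1` formula quoted above.
// Hypothetical helper; the name is an assumption, not an LLVM API.
constexpr uint16_t reduceToDiscriminator(uint64_t RawHash) {
  // RawHash % 65535 lies in [0, 65534]; adding 1 shifts it to [1, 65535],
  // so the result always fits in 16 bits and is never zero.
  return static_cast<uint16_t>((RawHash % 65535) + 1);
}
```

Taking the remainder modulo 65535 rather than 65536 is what leaves room for the `+ 1` without overflowing 16 bits.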

By "stable" we mean that the result of this hash algorithm will be the same
across different compiler versions and target platforms.

The 16-bit hashes are used extensively for the AArch64 ptrauth ABI,
because AArch64 can efficiently load a 16-bit immediate into the high
bits of a register without disturbing the remainder of the value, which
serves as a nice blend operation.

16 bits is also sufficiently compact to not inflate a loader relocation.
We disallow zero to guarantee a different discriminator from the places
in the ABI that use a constant zero.

Co-Authored-By: John McCall 
---
 llvm/include/llvm/Support/SipHash.h| 39 ++
 llvm/lib/Support/SipHash.cpp   | 35 +++
 llvm/unittests/Support/CMakeLists.txt  |  1 +
 llvm/unittests/Support/SipHashTest.cpp | 33 ++
 4 files changed, 108 insertions(+)
 create mode 100644 llvm/include/llvm/Support/SipHash.h
 create mode 100644 llvm/unittests/Support/SipHashTest.cpp

diff --git a/llvm/include/llvm/Support/SipHash.h b/llvm/include/llvm/Support/SipHash.h
new file mode 100644
index 0..91447b2344eeb
--- /dev/null
+++ b/llvm/include/llvm/Support/SipHash.h
@@ -0,0 +1,39 @@
+//===--- SipHash.h - An ABI-stable string SipHash ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A family of ABI-stable string hash algorithms based on SipHash, currently
+// used to compute ptrauth discriminators.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_SIPHASH_H
+#define LLVM_SUPPORT_SIPHASH_H
+
+#include <cstdint>
+
+namespace llvm {
+class StringRef;
+
+/// Compute a stable non-zero 16-bit hash of the given string.
+///
+/// The exact algorithm is the little-endian interpretation of the
+/// non-doubled (i.e. 64-bit) result of applying a SipHash-2-4 using
+/// a specific seed value which can be found in the source.
+/// This 64-bit result is truncated to a non-zero 16-bit value.
+///
+/// We use a 16-bit discriminator because ARM64 can efficiently load
+/// a 16-bit immediate into the high bits of a register without disturbing
+/// the remainder of the value, which serves as a nice blend operation.
+/// 16 bits is also sufficiently compact to not inflate a loader relocation.
+/// We disallow zero to guarantee a different discriminator from the places
+/// in the ABI that use a constant zero.
+uint16_t getPointerAuthStableSipHash(StringRef S);
+
+} // end namespace llvm
+
+#endif
diff --git a/llvm/lib/Support/SipHash.cpp b/llvm/lib/Support/SipHash.cpp
index ef882ae4d8745..b1b4bede7637d 100644
--- a/llvm/lib/Support/SipHash.cpp
+++ b/llvm/lib/Support/SipHash.cpp
@@ -5,10 +5,23 @@
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
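 //
 //===----------------------------------------------------------------------===//
+//
+//  This file implements an ABI-stable string hash based on SipHash, used to
+//  compute ptrauth discriminators.
+//
+//===----------------------------------------------------------------------===//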
 
+#include "llvm/Support/SipHash.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/Debug.h"
 #include 
 
+using namespace llvm;
+
+#define DEBUG_TYPE "llvm-siphash"
+
 // Lightly adapted from the SipHash reference C implementation by
 // Jean-Philippe Aumasson and Daniel J. Bernstein.
 
@@ -133,3 +146,25 @@ static inline ResultTy siphash(const unsigned char *in, uint64_t inlen,
 
   return firstHalf | (ResultTy(secondHalf) << (sizeof(ResultTy) == 8 ? 0 : 64));
 }
+
+//===--- LLVM-specific wrapper around siphash.
+
+/// Compute an ABI-stable 16-bit hash of the given string.
+uint16_t llvm::getPointerAuthStableSipHash(StringRef Str) {
+  static const uint8_t K[16] = {0xb5, 0xd4, 0xc9, 0xeb, 0x79, 0x10, 0x4a, 0x79,
+   

[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread Joe Nash via llvm-branch-commits


@@ -1608,14 +1598,14 @@ defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX", "int_amdgcn_flat_atomic_fmax
 }
 
 let OtherPredicates = [isGFX10Only] in {
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMIN_X2", "atomic_load_fmin_global", f64>;
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMAX_X2", "atomic_load_fmax_global", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMIN_X2", "int_amdgcn_global_atomic_fmin", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMAX_X2", "int_amdgcn_global_atomic_fmax", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN_X2", "atomic_load_fmin_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX_X2", "atomic_load_fmax_flat", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMIN_X2", "int_amdgcn_flat_atomic_fmin", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX_X2", "int_amdgcn_flat_atomic_fmax", f64>;
+defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;

Sisyph wrote:

Can you deduplicate these somehow with the patterns at L1641? They look 
essentially the same, just with a different predicate. Otherwise LGTM

https://github.com/llvm/llvm-project/pull/95591


[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread Joe Nash via llvm-branch-commits

https://github.com/Sisyph edited https://github.com/llvm/llvm-project/pull/95591


[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread Joe Nash via llvm-branch-commits

https://github.com/Sisyph approved this pull request.


https://github.com/llvm/llvm-project/pull/95591


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 1/9] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 2/9] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 3/9] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 4/9] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks 

[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-14 Thread Krzysztof Drewniak via llvm-branch-commits


@@ -1582,33 +1603,33 @@ let OtherPredicates = [isGFX12Plus] in {
   }
 }
 
-let OtherPredicates = [isGFX10Plus] in {
+let SubtargetPredicate = HasAtomicFMinFMaxF32GlobalInsts, OtherPredicates = [HasFlatGlobalInsts] in {
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMIN", "atomic_load_fmin_global", f32>;
 defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMAX", "atomic_load_fmax_global", f32>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_flat", f32>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_flat", f32>;
-}
-
-let OtherPredicates = [isGFX10GFX11] in {
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMIN", "int_amdgcn_global_atomic_fmin", f32>;
 defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMAX", "int_amdgcn_global_atomic_fmax", f32>;
+}
 
+let SubtargetPredicate = HasAtomicFMinFMaxF32FlatInsts in {
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN", "atomic_load_fmin_flat", f32>;
+defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX", "atomic_load_fmax_flat", f32>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMIN", "int_amdgcn_flat_atomic_fmin", f32>;
 defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX", "int_amdgcn_flat_atomic_fmax", f32>;
 }
 
-let OtherPredicates = [isGFX10Only] in {
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MAX_F64", "atomic_load_fmax_global", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MIN_F64", "int_amdgcn_global_atomic_fmin", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_MAX_F64", "int_amdgcn_global_atomic_fmax", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MIN_F64", "atomic_load_fmin_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_MAX_F64", "atomic_load_fmax_flat", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MIN_F64", "int_amdgcn_flat_atomic_fmin", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_MAX_F64", "int_amdgcn_flat_atomic_fmax", f64>;
-}
+// let OtherPredicates = [isGFX10Only] in { // fixme

krzysz00 wrote:

Why commented out?

https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-14 Thread Krzysztof Drewniak via llvm-branch-commits

https://github.com/krzysz00 edited 
https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-14 Thread Krzysztof Drewniak via llvm-branch-commits

https://github.com/krzysz00 commented:

I'm not seeing anything obviously wrong here, but I don't know if I'm the right 
person to approve this in

https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-14 Thread Krzysztof Drewniak via llvm-branch-commits

https://github.com/krzysz00 approved this pull request.


https://github.com/llvm/llvm-project/pull/95593


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung edited 
https://github.com/llvm/llvm-project/pull/95156


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

>From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 1/8] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md 
b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

>From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 2/8] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h 
b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

>From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 3/8] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction ,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

>From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 4/8] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks 
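
The switch from integer to floating-point division in [PATCH 3/8] changes the result of the threshold check, because both `Func.MatchedExecCount / YamlBF.ExecCount` and `opts::MatchedProfileThreshold / 100` truncate to 0 for any ratio and threshold below 100%. A minimal sketch of the difference, assuming hypothetical free functions and plain integer parameters in place of the BOLT types (none of these names appear in the patch):

```cpp
#include <cstdint>

// Hypothetical stand-ins for Func.MatchedExecCount, YamlBF.ExecCount and
// opts::MatchedProfileThreshold. Total is assumed to be nonzero.

// Integer arithmetic, as in the removed lines: both sides truncate toward 0.
constexpr bool reachesThresholdInt(uint64_t Matched, uint64_t Total,
                                   unsigned ThresholdPct) {
  return Matched / Total >= ThresholdPct / 100;
}

// Floating-point arithmetic, as in the added lines.
constexpr bool reachesThresholdFP(uint64_t Matched, uint64_t Total,
                                  unsigned ThresholdPct) {
  return static_cast<double>(Matched) / Total >= ThresholdPct / 100.0;
}

// 10 of 100 matched with a 15% threshold: 0 >= 0 holds with integers,
// while 0.10 >= 0.15 does not hold with doubles.
static_assert(reachesThresholdInt(10, 100, 15), "integer form compares 0 >= 0");
static_assert(!reachesThresholdFP(10, 100, 15), "0.10 is below 0.15");
```

With integer arithmetic the comparison is almost always `0 >= 0`, so the check fires regardless of how much of the profile matched; the cast to `double` makes the ratio meaningful.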

[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

>From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 1/7] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md 
b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

>From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 2/7] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h 
b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

>From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 3/7] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction ,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

>From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 4/7] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks 

[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95396

>From 0ef98ac6c1858ec0e35cb0f1c293d5934f96b3ad Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 10 Jun 2024 19:48:13 +0200
Subject: [PATCH] AMDGPU: Remove ds atomic fadd intrinsics

These have been replaced with atomicrmw fadd
---
 clang/lib/CodeGen/CGBuiltin.cpp   |   2 +-
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   5 -
 llvm/lib/IR/AutoUpgrade.cpp   |  92 --
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |   1 -
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 -
 .../AMDGPU/AMDGPUTargetTransformInfo.cpp  |   3 -
 llvm/lib/Target/AMDGPU/DSInstructions.td  |  10 -
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  15 -
 llvm/test/Bitcode/amdgcn-atomic.ll| 136 +
 .../AMDGPU/GlobalISel/fp-atomics-gfx940.ll|  55 
 .../AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll  | 125 +---
 .../AMDGPU/GlobalISel/llvm.amdgcn.ds.fadd.ll  | 279 --
 .../test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll | 102 ---
 llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll |  97 --
 .../CodeGen/AMDGPU/fp64-atomics-gfx90a.ll | 125 +---
 llvm/test/CodeGen/AMDGPU/lds-atomic-fadd.ll   |  25 --
 18 files changed, 232 insertions(+), 848 deletions(-)
 delete mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.fadd.ll
 delete mode 100644 llvm/test/CodeGen/AMDGPU/lds-atomic-fadd.ll

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d81cf40c912de..34d7e59ca45fd 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -19084,7 +19084,7 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_fadd_* builtins do not have syncscope/order arguments.
+  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
   SSID = llvm::SyncScope::System;
   AO = AtomicOrdering::SequentiallyConsistent;
 
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 1d5b360742059..0a4dd7a4725db 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -571,7 +571,6 @@ def int_amdgcn_ds_ordered_swap : AMDGPUDSOrderedIntrinsic;
 def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic;
 def int_amdgcn_ds_consume : AMDGPUDSAppendConsumedIntrinsic;
 
-def int_amdgcn_ds_fadd : AMDGPULDSIntrin;
 def int_amdgcn_ds_fmin : AMDGPULDSIntrin;
 def int_amdgcn_ds_fmax : AMDGPULDSIntrin;
 
@@ -2970,10 +2969,6 @@ multiclass AMDGPUMFp8SmfmacIntrinsic {
 // bf16 atomics use v2i16 argument since there is no bf16 data type in the 
llvm.
 def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
 def int_amdgcn_flat_atomic_fadd_v2bf16   : AMDGPUAtomicRtn;
-def int_amdgcn_ds_fadd_v2bf16 : DefaultAttrsIntrinsic<
-[llvm_v2i16_ty],
-[LLVMQualPointerType<3>, llvm_v2i16_ty],
-[IntrArgMemOnly, NoCapture>]>;
 
 defset list AMDGPUMFMAIntrinsics940 = {
 def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 2f4b8351e747a..29310ad79ef70 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1033,6 +1033,12 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *,
 break; // No other 'amdgcn.atomic.*'
   }
 
+  if (Name.starts_with("ds.fadd")) {
+// Replaced with atomicrmw fadd, so there's no new declaration.
+NewFn = nullptr;
+return true;
+  }
+
   if (Name.starts_with("ldexp.")) {
 // Target specific intrinsic became redundant
 NewFn = Intrinsic::getDeclaration(
@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, 
CallBase *CI, Function *F,
   llvm_unreachable("Unknown function for ARM CallBase upgrade.");
 }
 
+// These are expected to have the arguments:
+// atomic.intrin (ptr, rmw_value, ordering, scope, isVolatile)
+//
+// Except for int_amdgcn_ds_fadd_v2bf16 which only has (ptr, rmw_value).
+//
 static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
  Function *F, IRBuilder<> ) {
-  const bool IsInc = Name.starts_with("atomic.inc.");
-  if (IsInc || Name.starts_with("atomic.dec.")) {
-if (CI->getNumOperands() != 6) // Malformed bitcode.
-  return nullptr;
+  AtomicRMWInst::BinOp RMWOp =
+  StringSwitch(Name)
+  .StartsWith("ds.fadd", AtomicRMWInst::FAdd)
+  .StartsWith("atomic.inc.", AtomicRMWInst::UIncWrap)
+  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap);
+
+  unsigned NumOperands = CI->getNumOperands();
+  if 
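
The hunk above (truncated in this archive) selects the replacement `atomicrmw` operation from the legacy intrinsic name by prefix. The same dispatch can be sketched without LLVM headers; the `RMWOp` enum, `startsWith`, and `classifyLegacyAtomic` below are hypothetical names for illustration, and returning `std::nullopt` for an unknown name is an assumption of this sketch rather than what the patch does:

```cpp
#include <optional>
#include <string_view>

// Hypothetical mirror of the three AtomicRMWInst::BinOp values in the hunk.
enum class RMWOp { FAdd, UIncWrap, UDecWrap };

constexpr bool startsWith(std::string_view S, std::string_view Prefix) {
  return S.substr(0, Prefix.size()) == Prefix;
}

// Maps a legacy intrinsic name (with the "amdgcn." prefix already stripped)
// to the atomicrmw operation that replaces it.
constexpr std::optional<RMWOp> classifyLegacyAtomic(std::string_view Name) {
  if (startsWith(Name, "ds.fadd"))
    return RMWOp::FAdd;
  if (startsWith(Name, "atomic.inc."))
    return RMWOp::UIncWrap;
  if (startsWith(Name, "atomic.dec."))
    return RMWOp::UDecWrap;
  return std::nullopt; // Not one of the upgraded intrinsics.
}

static_assert(classifyLegacyAtomic("ds.fadd.f32") == RMWOp::FAdd, "");
static_assert(!classifyLegacyAtomic("ldexp.f32").has_value(), "");
```

Note that `ds.fadd` is matched without a trailing dot, so it also covers names such as `ds.fadd.v2bf16`, which the comment in the hunk singles out as taking only `(ptr, rmw_value)`.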

[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95395

>From 0bfa259e0ec5f98261a7f84a8f0fe8248cd0e2fe Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 10 Jun 2024 19:40:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins

We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.

This also upgrades the behavior to respect a volatile-qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 113 +++---
 clang/test/CodeGenCUDA/builtins-amdgcn.cu |   2 +-
 .../test/CodeGenCUDA/builtins-spirv-amdgcn.cu |   2 +-
 .../builtins-unsafe-atomics-gfx90a.cu |   5 +-
 ...tins-unsafe-atomics-spirv-amdgcn-gfx90a.cu |   2 +-
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  |  37 +-
 .../builtins-fp-atomics-gfx12.cl  |  14 ++-
 .../CodeGenOpenCL/builtins-fp-atomics-gfx8.cl |   9 +-
 .../builtins-fp-atomics-gfx90a.cl |   4 +-
 .../builtins-fp-atomics-gfx940.cl |  10 +-
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   3 +-
 11 files changed, 139 insertions(+), 62 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 511e1fd4016d7..d81cf40c912de 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18140,9 +18140,35 @@ void CodeGenFunction::ProcessOrderScopeAMDGCN(Value 
*Order, Value *Scope,
 break;
   }
 
+  // Some of the atomic builtins take the scope as a string name.
   StringRef scp;
-  llvm::getConstantStringInfo(Scope, scp);
-  SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+  if (llvm::getConstantStringInfo(Scope, scp)) {
+SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+return;
+  }
+
+  // Older builtins had an enum argument for the memory scope.
+  int scope = cast(Scope)->getZExtValue();
+  switch (scope) {
+  case 0: // __MEMORY_SCOPE_SYSTEM
+SSID = llvm::SyncScope::System;
+break;
+  case 1: // __MEMORY_SCOPE_DEVICE
+SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
+break;
+  case 2: // __MEMORY_SCOPE_WRKGRP
+SSID = getLLVMContext().getOrInsertSyncScopeID("workgroup");
+break;
+  case 3: // __MEMORY_SCOPE_WVFRNT
+SSID = getLLVMContext().getOrInsertSyncScopeID("wavefront");
+break;
+  case 4: // __MEMORY_SCOPE_SINGLE
+SSID = llvm::SyncScope::SingleThread;
+break;
+  default:
+SSID = llvm::SyncScope::System;
+break;
+  }
 }
 
 llvm::Value *CodeGenFunction::EmitScalarOrConstFoldImmArg(unsigned 
ICEArguments,
@@ -18558,14 +18584,10 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
 Intrinsic::ID Intrin;
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_ds_faddf:
-  Intrin = Intrinsic::amdgcn_ds_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   Intrin = Intrinsic::amdgcn_ds_fmin;
   break;
@@ -18656,35 +18678,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
 return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16: {
-Intrinsic::ID IID;
-llvm::Type *ArgTy;
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
-  ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Constant *ZeroI32 = llvm::ConstantInt::getIntegerValue(
-llvm::Type::getInt32Ty(getLLVMContext()), APInt(32, 0, true));
-llvm::Constant *ZeroI1 = llvm::ConstantInt::getIntegerValue(
-llvm::Type::getInt1Ty(getLLVMContext()), APInt(1, 0));
-llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
-return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case 
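
The `ProcessOrderScopeAMDGCN` hunk above falls back from a string scope name to an integer memory-scope value. A minimal sketch of that integer-to-name mapping, assuming the scope is already a plain `int`; the function name is hypothetical, and the empty string for the system scope and `"singlethread"` for `llvm::SyncScope::SingleThread` reflect how LLVM names its two predefined scopes rather than anything spelled out in the patch:

```cpp
#include <string_view>

// Returns the sync scope name corresponding to the integer values handled in
// the switch above. Unknown values fall back to the system scope, as in the
// patch's default case.
constexpr std::string_view syncScopeNameForMemoryScope(int Scope) {
  switch (Scope) {
  case 0: // __MEMORY_SCOPE_SYSTEM
    return ""; // The system scope is the unnamed default.
  case 1: // __MEMORY_SCOPE_DEVICE
    return "agent";
  case 2: // __MEMORY_SCOPE_WRKGRP
    return "workgroup";
  case 3: // __MEMORY_SCOPE_WVFRNT
    return "wavefront";
  case 4: // __MEMORY_SCOPE_SINGLE
    return "singlethread";
  default:
    return "";
  }
}

static_assert(syncScopeNameForMemoryScope(1) == "agent", "");
static_assert(syncScopeNameForMemoryScope(42).empty(), "");
```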

[llvm-branch-commits] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/95593


[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/95591


[llvm-branch-commits] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-14 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes



---

Patch is 148.79 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/95593.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll 
(+141-1029) 
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll 
(+141-1029) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 78557e006170a..e03f262831eae 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16049,7 +16049,8 @@ 
SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
 return AtomicExpansionKind::None;
   if (Subtarget->hasAtomicFMinFMaxF64FlatInsts() && Ty->isDoubleTy())
 return AtomicExpansionKind::None;
-} else if (AMDGPU::isExtendedGlobalAddrSpace(AS)) {
+} else if (AMDGPU::isExtendedGlobalAddrSpace(AS) ||
+   AS == AMDGPUAS::BUFFER_FAT_POINTER) {
   if (Subtarget->hasAtomicFMinFMaxF32GlobalInsts() && Ty->isFloatTy())
 return AtomicExpansionKind::None;
   if (Subtarget->hasAtomicFMinFMaxF64GlobalInsts() && Ty->isDoubleTy())
diff --git a/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll 
b/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll
index fb068e35fc597..4b5cdd2ed32ef 100644
--- a/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll
+++ b/llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll
@@ -21,32 +21,11 @@ define float 
@buffer_fat_ptr_agent_atomic_fmax_ret_f32__offset(ptr addrspace(7)
 ; GFX12-NEXT:s_wait_samplecnt 0x0
 ; GFX12-NEXT:s_wait_bvhcnt 0x0
 ; GFX12-NEXT:s_wait_kmcnt 0x0
-; GFX12-NEXT:v_dual_mov_b32 v1, v0 :: v_dual_mov_b32 v0, s4
-; GFX12-NEXT:s_addk_co_i32 s4, 0x400
-; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
-; GFX12-NEXT:v_dual_mov_b32 v3, s4 :: v_dual_max_num_f32 v2, v1, v1
-; GFX12-NEXT:buffer_load_b32 v0, v0, s[0:3], null offen offset:1024
-; GFX12-NEXT:s_mov_b32 s4, 0
-; GFX12-NEXT:  .LBB0_1: ; %atomicrmw.start
-; GFX12-NEXT:; =>This Inner Loop Header: Depth=1
-; GFX12-NEXT:s_wait_loadcnt 0x0
-; GFX12-NEXT:v_mov_b32_e32 v5, v0
+; GFX12-NEXT:v_mov_b32_e32 v1, s4
 ; GFX12-NEXT:s_wait_storecnt 0x0
-; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | 
instid1(VALU_DEP_1)
-; GFX12-NEXT:v_max_num_f32_e32 v0, v5, v5
-; GFX12-NEXT:v_max_num_f32_e32 v4, v0, v2
-; GFX12-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GFX12-NEXT:v_dual_mov_b32 v0, v4 :: v_dual_mov_b32 v1, v5
-; GFX12-NEXT:buffer_atomic_cmpswap_b32 v[0:1], v3, s[0:3], null offen 
th:TH_ATOMIC_RETURN
+; GFX12-NEXT:buffer_atomic_max_num_f32 v0, v1, s[0:3], null offen 
offset:1024 th:TH_ATOMIC_RETURN
 ; GFX12-NEXT:s_wait_loadcnt 0x0
 ; GFX12-NEXT:global_inv scope:SCOPE_DEV
-; GFX12-NEXT:v_cmp_eq_u32_e32 vcc_lo, v0, v5
-; GFX12-NEXT:s_or_b32 s4, vcc_lo, s4
-; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
-; GFX12-NEXT:s_and_not1_b32 exec_lo, exec_lo, s4
-; GFX12-NEXT:s_cbranch_execnz .LBB0_1
-; GFX12-NEXT:  ; %bb.2: ; %atomicrmw.end
-; GFX12-NEXT:s_or_b32 exec_lo, exec_lo, s4
 ; GFX12-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX940-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__offset:
@@ -81,64 +60,23 @@ define float 
@buffer_fat_ptr_agent_atomic_fmax_ret_f32__offset(ptr addrspace(7)
 ; GFX11-LABEL: buffer_fat_ptr_agent_atomic_fmax_ret_f32__offset:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:v_dual_mov_b32 v1, v0 :: v_dual_mov_b32 v0, s4
-; GFX11-NEXT:s_addk_i32 s4, 0x400
-; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
-; GFX11-NEXT:v_dual_mov_b32 v3, s4 :: v_dual_max_f32 v2, v1, v1
-; GFX11-NEXT:buffer_load_b32 v0, v0, s[0:3], 0 offen offset:1024
-; GFX11-NEXT:s_mov_b32 s4, 0
-; GFX11-NEXT:  .LBB0_1: ; %atomicrmw.start
-; GFX11-NEXT:; =>This Inner Loop Header: Depth=1
-; GFX11-NEXT:s_waitcnt vmcnt(0)
-; GFX11-NEXT:v_mov_b32_e32 v5, v0
+; GFX11-NEXT:v_mov_b32_e32 v1, s4
 ; GFX11-NEXT:s_waitcnt_vscnt null, 0x0
-; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | 
instid1(VALU_DEP_1)
-; GFX11-NEXT:v_max_f32_e32 v0, v5, v5
-; GFX11-NEXT:v_max_f32_e32 v4, v0, v2
-; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:v_dual_mov_b32 v0, v4 :: v_dual_mov_b32 v1, v5
-; GFX11-NEXT:buffer_atomic_cmpswap_b32 v[0:1], v3, s[0:3], 0 offen glc
+; GFX11-NEXT:buffer_atomic_max_f32 v0, v1, s[0:3], 0 offen offset:1024 glc
 ; GFX11-NEXT:s_waitcnt vmcnt(0)
 ; GFX11-NEXT:buffer_gl1_inv
 ; GFX11-NEXT:buffer_gl0_inv
-; GFX11-NEXT:v_cmp_eq_u32_e32 vcc_lo, v0, v5
-; 
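
The `SIISelLowering.cpp` change in this patch makes buffer fat pointers take the same path as global memory when deciding whether an `atomicrmw fmin/fmax` needs to be expanded to a compare-and-swap loop, which is why the test output above collapses to a single `buffer_atomic_max` instruction. A rough sketch of that decision, assuming simplified hypothetical enums and a capability struct in place of `GCNSubtarget` (the condition guarding the flat branch is not visible in the hunk and is assumed here):

```cpp
// Hypothetical, simplified stand-ins for the address space, value type and
// subtarget queries used in shouldExpandAtomicRMWInIR.
enum class AddrSpace { Flat, Global, BufferFatPointer, Other };
enum class FPType { F32, F64, Other };

struct SubtargetCaps {
  bool GlobalF32;
  bool GlobalF64;
  bool FlatF32;
  bool FlatF64;
};

// True when fmin/fmax can be selected directly (AtomicExpansionKind::None in
// the patch); false means the generic expansion is still needed.
constexpr bool hasNativeFMinFMax(const SubtargetCaps &ST, AddrSpace AS,
                                 FPType Ty) {
  if (AS == AddrSpace::Flat)
    return (ST.FlatF32 && Ty == FPType::F32) ||
           (ST.FlatF64 && Ty == FPType::F64);
  // After this patch, buffer fat pointers share the global-memory checks.
  if (AS == AddrSpace::Global || AS == AddrSpace::BufferFatPointer)
    return (ST.GlobalF32 && Ty == FPType::F32) ||
           (ST.GlobalF64 && Ty == FPType::F64);
  return false;
}

// Example: a target with only f32 global/buffer instructions.
constexpr SubtargetCaps F32GlobalOnly{true, false, false, false};
static_assert(hasNativeFMinFMax(F32GlobalOnly, AddrSpace::BufferFatPointer,
                                FPType::F32), "");
static_assert(!hasNativeFMinFMax(F32GlobalOnly, AddrSpace::BufferFatPointer,
                                 FPType::F64), "");
```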

[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-14 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real mess. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
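
The support history described above can be written down as a small lookup. This is only a paraphrase of the description, with hypothetical names; it ignores the flat versus global distinction that the actual subtarget features draw, and says nothing about generations the description does not mention:

```cpp
// Hypothetical encoding of the generations named in the description.
enum class Gen { GFX6, GFX7, GFX8, GFX9, GFX90A, GFX940, GFX10, GFX11 };

struct FMinFMaxSupport {
  bool F32;
  bool F64;
};

// Atomic fmin/fmax availability per generation, as stated in the description.
constexpr FMinFMaxSupport describedSupport(Gen G) {
  switch (G) {
  case Gen::GFX6:
  case Gen::GFX7:
  case Gen::GFX10:
    return {true, true}; // Both float and double.
  case Gen::GFX11:
    return {true, false}; // f64 versions removed again.
  case Gen::GFX90A:
  case Gen::GFX940:
    return {false, true}; // Reintroduced for f64 only.
  case Gen::GFX8:
  case Gen::GFX9:
    return {false, false}; // Removed in gfx8; base gfx9 has neither.
  }
  return {false, false};
}

static_assert(describedSupport(Gen::GFX11).F32, "");
static_assert(!describedSupport(Gen::GFX11).F64, "");
```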

---

Patch is 1.39 MiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/95592.diff


21 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+63-8) 
- (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+27-12) 
- (modified) llvm/lib/Target/AMDGPU/FLATInstructions.td (+55-30) 
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+20) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+22) 
- (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll (+155-1654) 
- (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll (+155-1654) 
- (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll (+181-2141) 
- (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll (+181-2141) 
- (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+490-1737) 
- (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+490-1737) 
- (added) 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmax.f32.ll 
(+638) 
- (added) 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmax.f64.ll 
(+271) 
- (added) 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmin.f32.ll 
(+638) 
- (added) 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.ptr.buffer.atomic.fmin.f64.ll 
(+271) 
- (modified) 
llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-agent.ll (+1584-256) 
- (modified) 
llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-system.ll 
(+1584-256) 
- (modified) 
llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-agent.ll (+792-128) 
- (modified) 
llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-system.ll (+792-128) 
- (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmax.ll 
(+97-77) 
- (modified) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmin.ll 
(+97-77) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 0a1550ccb53c4..2f4ca847096a1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -351,6 +351,7 @@ def FeatureGFX90AInsts : SubtargetFeature<"gfx90a-insts",
   "GFX90AInsts",
   "true",
   "Additional instructions for GFX90A+"
+  // [HasAtomicFMinFMaxF64GlobalInsts, HasAtomicFMinFMaxF64FlatInsts] // TODO
 >;
 
 def FeatureGFX940Insts : SubtargetFeature<"gfx940-insts",
@@ -711,6 +712,30 @@ def FeatureAtomicFaddRtnInsts : 
SubtargetFeature<"atomic-fadd-rtn-insts",
   [FeatureFlatGlobalInsts]
 >;
 
+def FeatureAtomicFMinFMaxF32GlobalInsts : 
SubtargetFeature<"atomic-fmin-fmax-global-f32",
+  "HasAtomicFMinFMaxF32GlobalInsts",
+  "true",
+  "Has global/buffer instructions for atomicrmw fmin/fmax for float"
+>;
+
+def FeatureAtomicFMinFMaxF64GlobalInsts : 
SubtargetFeature<"atomic-fmin-fmax-global-f64",
+  "HasAtomicFMinFMaxF64GlobalInsts",
+  "true",
+  "Has global/buffer instructions for atomicrmw fmin/fmax for float"
+>;
+
+def FeatureAtomicFMinFMaxF32FlatInsts : 
SubtargetFeature<"atomic-fmin-fmax-flat-f32",
+  "HasAtomicFMinFMaxF32FlatInsts",
+  "true",
+  "Has flat memory instructions for atomicrmw fmin/fmax for float"
+>;
+
+def FeatureAtomicFMinFMaxF64FlatInsts : 
SubtargetFeature<"atomic-fmin-fmax-flat-f64",
+  "HasAtomicFMinFMaxF64FlatInsts",
+  "true",
+  "Has flat memory instructions for atomicrmw fmin/fmax for double"
+>;
+
 def FeatureAtomicFaddNoRtnInsts : SubtargetFeature<"atomic-fadd-no-rtn-insts",
   "HasAtomicFaddNoRtnInsts",
   "true",
@@ -1061,7 +1086,8 @@ def FeatureSouthernIslands : 
GCNSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
   FeatureWavefrontSize64, FeatureSMemTimeInst, FeatureMadMacF32Insts,
   FeatureDsSrc2Insts, FeatureLDSBankCount32, FeatureMovrel,
   FeatureTrigReducedRange, FeatureExtendedImageInsts, FeatureImageInsts,
-  FeatureGDS, FeatureGWS, FeatureDefaultComponentZero
+  FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts
   ]
 >;
 
@@ -1072,7 +1098,9 @@ def FeatureSeaIslands : GCNSubtargetFeatureGeneration<"SEA_ISLANDS",
   FeatureCIInsts, FeatureMovrel, FeatureTrigReducedRange,
   FeatureGFX7GFX8GFX9Insts, FeatureSMemTimeInst, FeatureMadMacF32Insts,
   FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureUnalignedBufferAccess,
-  FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero
+  FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
+  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
   ]
 >;
 

[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

The global/flat/buffer atomic fmin/fmax situation is a mess. These
instructions have been renamed 3 times. We currently have
separate pseudos defined for the same opcodes with the different names
(e.g. GLOBAL_ATOMIC_MIN_F64 from gfx90a and GLOBAL_ATOMIC_FMIN_X2 from gfx10).

Use the _FMIN versions as the canonical name for the f32 versions. Use the
_MIN_F64 style as the canonical name for the f64 case. This is because
gfx90a has the most sensible names, but does not have the f32 versions.

Wire through the pseudo to use for the instruction properties vs. the assembly
name like in other cases. This will simplify directly selecting these from
atomicrmw.
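
A minimal sketch of the pseudo-to-real naming idea described above, written as plain C++ rather than TableGen. The `Gen` and `Pseudo` enums and the `asmName` helper are hypothetical names introduced only for illustration; the mnemonics are the ones that appear in the BUFInstructions.td hunks of this patch, and the assumption is that gfx6/gfx7/gfx10 use the `_x2` spelling while gfx90a uses the `_f64` spelling.

```cpp
#include <string>

// Hypothetical subtarget generations, for illustration only.
enum class Gen { GFX6, GFX7, GFX10, GFX90A };

// Canonical pseudo names, using the _MIN_F64 style chosen for the f64 case.
enum class Pseudo { BUFFER_ATOMIC_MIN_F64, BUFFER_ATOMIC_MAX_F64 };

// Returns the assembly mnemonic that a generation uses for a canonical pseudo.
// One pseudo carries the instruction properties; only the printed name varies.
inline std::string asmName(Pseudo P, Gen G) {
  // Assumption: every generation here other than gfx90a uses the _x2 name.
  const bool UsesX2Name = (G != Gen::GFX90A);
  switch (P) {
  case Pseudo::BUFFER_ATOMIC_MIN_F64:
    return UsesX2Name ? "buffer_atomic_fmin_x2" : "buffer_atomic_min_f64";
  case Pseudo::BUFFER_ATOMIC_MAX_F64:
    return UsesX2Name ? "buffer_atomic_fmax_x2" : "buffer_atomic_max_f64";
  }
  return "";
}
```

With a single pseudo per operation, selection patterns only need to name the canonical pseudo, and the per-generation spelling is resolved when the real instruction is defined.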

---

Patch is 29.08 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/95591.diff


4 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+3-1) 
- (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+53-50) 
- (modified) llvm/lib/Target/AMDGPU/FLATInstructions.td (+53-54) 
- (modified) llvm/test/CodeGen/AMDGPU/fp-atomic-to-s_denormmode.mir (+20-20) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index d0d7a9dc17247..0a1550ccb53c4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1864,7 +1864,9 @@ def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">,
 
 def HasBufferFlatGlobalAtomicsF64 :
   Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">,
-  AssemblerPredicate<(any_of FeatureGFX90AInsts)>;
+  // FIXME: This is too coarse, and working around using pseudo's predicates on real instruction.
+  AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, FeatureSouthernIslands, FeatureSeaIslands)>;
+
 def HasLdsAtomicAddF64 :
   Predicate<"Subtarget->hasLdsAtomicAddF64()">,
   AssemblerPredicate<(any_of FeatureGFX90AInsts)>;
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 43e5434ea2700..9d21f93a957cc 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1163,12 +1163,6 @@ let SubtargetPredicate = isGFX6GFX7GFX10 in {
 defm BUFFER_ATOMIC_FCMPSWAP_X2 : MUBUF_Pseudo_Atomics <
   "buffer_atomic_fcmpswap_x2", VReg_128, v2f64, null_frag
 >;
-defm BUFFER_ATOMIC_FMIN_X2 : MUBUF_Pseudo_Atomics <
-  "buffer_atomic_fmin_x2", VReg_64, f64, null_frag
->;
-defm BUFFER_ATOMIC_FMAX_X2 : MUBUF_Pseudo_Atomics <
-  "buffer_atomic_fmax_x2", VReg_64, f64, null_frag
->;
 
 }
 
@@ -1318,6 +1312,9 @@ let SubtargetPredicate = isGFX90APlus in {
 
 let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in {
   defm BUFFER_ATOMIC_ADD_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_add_f64", VReg_64, f64>;
+
+  // Note the names can be buffer_atomic_fmin_x2/buffer_atomic_fmax_x2
+  // depending on some subtargets.
   defm BUFFER_ATOMIC_MIN_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_min_f64", VReg_64, f64>;
   defm BUFFER_ATOMIC_MAX_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_max_f64", VReg_64, f64>;
 } // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
@@ -1763,8 +1760,8 @@ let OtherPredicates = [isGFX6GFX7GFX10Plus] in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f32, "BUFFER_ATOMIC_FMAX">;
 }
 let SubtargetPredicate = isGFX6GFX7GFX10 in {
-  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_FMIN_X2">;
-  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f64, "BUFFER_ATOMIC_FMAX_X2">;
+  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">;
+  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f64, "BUFFER_ATOMIC_MAX_F64">;
 }
 
 class NoUseBufferAtomic : PatFrag <
@@ -2315,6 +2312,12 @@ let OtherPredicates = [HasPackedD16VMem] in {
 // Target-specific instruction encodings.
 
//===--===//
 
+// Shortcut to default Mnemonic from BUF_Pseudo. Hides the cast to the
+// specific pseudo (bothen in this case) since any of them will work.
+class get_BUF_ps {
+  string Mnemonic = !cast(name # "_OFFSET").Mnemonic;
+}
+
 
//===--===//
 // Base ENC_MUBUF for GFX6, GFX7, GFX10, GFX11.
 
//===--===//
@@ -2346,8 +2349,8 @@ multiclass MUBUF_Real_gfx11 op, string real_name 
= !cast(N
   }
 }
 
-class Base_MUBUF_Real_gfx6_gfx7_gfx10 op, MUBUF_Pseudo ps, int ef> :
-  Base_MUBUF_Real_gfx6_gfx7_gfx10_gfx11 {
+class Base_MUBUF_Real_gfx6_gfx7_gfx10 op, MUBUF_Pseudo ps, int ef, 
string asmName> :
+  Base_MUBUF_Real_gfx6_gfx7_gfx10_gfx11 {
   let Inst{12}= ps.offen;
   let Inst{13}= ps.idxen;
   let Inst{14}= !if(ps.has_glc, cpol{CPolBit.GLC}, ps.glc_value);
@@ -2357,9 +2360,10 @@ class 

[llvm-branch-commits] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/95593?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#95593**
* **#95592**
* **#95591**
* **#95590**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/?utm_source=stack-comment

https://github.com/llvm/llvm-project/pull/95593


[llvm-branch-commits] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/95592?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#95592**
* **#95591**
* **#95590**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/?utm_source=stack-comment

https://github.com/llvm/llvm-project/pull/95592


[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/95591?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#95592**
* **#95591**
* **#95590**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/?utm_source=stack-comment

https://github.com/llvm/llvm-project/pull/95591


[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/95591

The global/flat/buffer atomic fmin/fmax situation is a mess. These
instructions have been renamed 3 times. We currently have
separate pseudos defined for the same opcodes with the different names
(e.g. GLOBAL_ATOMIC_MIN_F64 from gfx90a and GLOBAL_ATOMIC_FMIN_X2 from gfx10).

Use the _FMIN versions as the canonical name for the f32 versions. Use the
_MIN_F64 style as the canonical name for the f64 case. This is because
gfx90a has the most sensible names, but does not have the f32 versions.

Wire through the pseudo to use for the instruction properties vs. the assembly
name like in other cases. This will simplify directly selecting these from
atomicrmw.

>From b00ad0dab49ad96c160a16c062f35d7788bd77c8 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 13 Jun 2024 18:33:11 +0200
Subject: [PATCH] AMDGPU: Create pseudo to real mapping for flat/buffer atomic
 fmin/fmax

The global/flat/buffer atomic fmin/fmax situation is a mess. These
instructions have been renamed 3 times. We currently have
separate pseudos defined for the same opcodes with the different names
(e.g. GLOBAL_ATOMIC_MIN_F64 from gfx90a and GLOBAL_ATOMIC_FMIN_X2 from gfx10).

Use the _FMIN versions as the canonical name for the f32 versions. Use the
_MIN_F64 style as the canonical name for the f64 case. This is because
gfx90a has the most sensible names, but does not have the f32 versions.

Wire through the pseudo to use for the instruction properties vs. the assembly
name like in other cases. This will simplify directly selecting these from
atomicrmw.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  |   4 +-
 llvm/lib/Target/AMDGPU/BUFInstructions.td | 103 +
 llvm/lib/Target/AMDGPU/FLATInstructions.td| 107 +-
 .../AMDGPU/fp-atomic-to-s_denormmode.mir  |  40 +++
 4 files changed, 129 insertions(+), 125 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index d0d7a9dc17247..0a1550ccb53c4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1864,7 +1864,9 @@ def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">,
 
 def HasBufferFlatGlobalAtomicsF64 :
   Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">,
-  AssemblerPredicate<(any_of FeatureGFX90AInsts)>;
+  // FIXME: This is too coarse, and working around using pseudo's predicates on real instruction.
+  AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, FeatureSouthernIslands, FeatureSeaIslands)>;
+
 def HasLdsAtomicAddF64 :
   Predicate<"Subtarget->hasLdsAtomicAddF64()">,
   AssemblerPredicate<(any_of FeatureGFX90AInsts)>;
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 43e5434ea2700..9d21f93a957cc 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1163,12 +1163,6 @@ let SubtargetPredicate = isGFX6GFX7GFX10 in {
 defm BUFFER_ATOMIC_FCMPSWAP_X2 : MUBUF_Pseudo_Atomics <
   "buffer_atomic_fcmpswap_x2", VReg_128, v2f64, null_frag
 >;
-defm BUFFER_ATOMIC_FMIN_X2 : MUBUF_Pseudo_Atomics <
-  "buffer_atomic_fmin_x2", VReg_64, f64, null_frag
->;
-defm BUFFER_ATOMIC_FMAX_X2 : MUBUF_Pseudo_Atomics <
-  "buffer_atomic_fmax_x2", VReg_64, f64, null_frag
->;
 
 }
 
@@ -1318,6 +1312,9 @@ let SubtargetPredicate = isGFX90APlus in {
 
 let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in {
   defm BUFFER_ATOMIC_ADD_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_add_f64", VReg_64, f64>;
+
+  // Note the names can be buffer_atomic_fmin_x2/buffer_atomic_fmax_x2
+  // depending on some subtargets.
   defm BUFFER_ATOMIC_MIN_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_min_f64", VReg_64, f64>;
   defm BUFFER_ATOMIC_MAX_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_max_f64", VReg_64, f64>;
 } // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
@@ -1763,8 +1760,8 @@ let OtherPredicates = [isGFX6GFX7GFX10Plus] in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f32, "BUFFER_ATOMIC_FMAX">;
 }
 let SubtargetPredicate = isGFX6GFX7GFX10 in {
-  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_FMIN_X2">;
-  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f64, "BUFFER_ATOMIC_FMAX_X2">;
+  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, "BUFFER_ATOMIC_MIN_F64">;
+  defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f64, "BUFFER_ATOMIC_MAX_F64">;
 }
 
 class NoUseBufferAtomic : PatFrag <
@@ -2315,6 +2312,12 @@ let OtherPredicates = [HasPackedD16VMem] in {
 // Target-specific instruction encodings.
 
//===--===//
 
+// Shortcut to default Mnemonic from BUF_Pseudo. Hides the cast to the
+// specific pseudo (bothen in this case) 

[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits

https://github.com/WenleiHe approved this pull request.

lgtm with a nit, thanks.

https://github.com/llvm/llvm-project/pull/95156


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits

https://github.com/WenleiHe edited 
https://github.com/llvm/llvm-project/pull/95156


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits


@@ -180,6 +186,13 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:

WenleiHe wrote:

nit: `public` is not needed for a struct since it's the default. Also, if we
only have one counter, the struct seems like overkill. We could keep it simple
and use an integer counter for now without a struct, and expand later if
needed.
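
A minimal sketch of the two options this comment contrasts, assuming a single counter is all that is tracked. `countMatchedBlocks` and its `Matched` parameter are hypothetical and are not part of the patch.

```cpp
#include <cstdint>
#include <vector>

// Option 1: a struct. Members are public by default, so no `public:` label
// is needed.
struct FunctionMatchingData {
  /// The number of blocks matched exactly.
  uint64_t MatchedExactBlocks{0};
};

// Option 2: a plain, explicitly initialized counter.
// `Matched` is a hypothetical per-block flag used only for this example.
inline uint64_t countMatchedBlocks(const std::vector<bool> &Matched) {
  uint64_t MatchedBlocks = 0; // explicit initialization
  for (bool IsMatched : Matched)
    if (IsMatched)
      ++MatchedBlocks;
  return MatchedBlocks;
}
```

The counter keeps the change small; a struct only pays off once a second field is needed.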

https://github.com/llvm/llvm-project/pull/95156


[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)

2024-06-14 Thread Tom Stellard via llvm-branch-commits

https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/95458


[llvm-branch-commits] [llvm] 443e23e - Bump version to 18.1.8 (#95458)

2024-06-14 Thread via llvm-branch-commits

Author: Tom Stellard
Date: 2024-06-14T12:20:26-07:00
New Revision: 443e23eed24d9533566f189ef25154263756a36d

URL: 
https://github.com/llvm/llvm-project/commit/443e23eed24d9533566f189ef25154263756a36d
DIFF: 
https://github.com/llvm/llvm-project/commit/443e23eed24d9533566f189ef25154263756a36d.diff

LOG: Bump version to 18.1.8 (#95458)

Added: 


Modified: 
llvm/CMakeLists.txt
llvm/utils/lit/lit/__init__.py

Removed: 




diff  --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 51278943847aa..909a965cd86c8 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 7)
+  set(LLVM_VERSION_PATCH 8)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)

diff  --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 5003d78ce5218..800d59492d8ff 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 7)
+__versioninfo__ = (18, 1, 8)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []





[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95395

>From b6fa394408069d850c2e074cec64eef8028d7737 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 10 Jun 2024 19:40:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins

We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.

This also upgrades the behavior to respect a volatile-qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 113 +++---
 clang/test/CodeGenCUDA/builtins-amdgcn.cu |   2 +-
 .../test/CodeGenCUDA/builtins-spirv-amdgcn.cu |   2 +-
 .../builtins-unsafe-atomics-gfx90a.cu |   5 +-
 ...tins-unsafe-atomics-spirv-amdgcn-gfx90a.cu |   2 +-
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  |  37 +-
 .../builtins-fp-atomics-gfx12.cl  |  14 ++-
 .../CodeGenOpenCL/builtins-fp-atomics-gfx8.cl |   9 +-
 .../builtins-fp-atomics-gfx90a.cl |   4 +-
 .../builtins-fp-atomics-gfx940.cl |  10 +-
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   3 +-
 11 files changed, 139 insertions(+), 62 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 511e1fd4016d7..d81cf40c912de 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18140,9 +18140,35 @@ void CodeGenFunction::ProcessOrderScopeAMDGCN(Value *Order, Value *Scope,
 break;
   }
 
+  // Some of the atomic builtins take the scope as a string name.
   StringRef scp;
-  llvm::getConstantStringInfo(Scope, scp);
-  SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+  if (llvm::getConstantStringInfo(Scope, scp)) {
+SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+return;
+  }
+
+  // Older builtins had an enum argument for the memory scope.
+  int scope = cast<llvm::ConstantInt>(Scope)->getZExtValue();
+  switch (scope) {
+  case 0: // __MEMORY_SCOPE_SYSTEM
+SSID = llvm::SyncScope::System;
+break;
+  case 1: // __MEMORY_SCOPE_DEVICE
+SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
+break;
+  case 2: // __MEMORY_SCOPE_WRKGRP
+SSID = getLLVMContext().getOrInsertSyncScopeID("workgroup");
+break;
+  case 3: // __MEMORY_SCOPE_WVFRNT
+SSID = getLLVMContext().getOrInsertSyncScopeID("wavefront");
+break;
+  case 4: // __MEMORY_SCOPE_SINGLE
+SSID = llvm::SyncScope::SingleThread;
+break;
+  default:
+SSID = llvm::SyncScope::System;
+break;
+  }
 }
 
 llvm::Value *CodeGenFunction::EmitScalarOrConstFoldImmArg(unsigned 
ICEArguments,
@@ -18558,14 +18584,10 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
 Intrinsic::ID Intrin;
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_ds_faddf:
-  Intrin = Intrinsic::amdgcn_ds_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   Intrin = Intrinsic::amdgcn_ds_fmin;
   break;
@@ -18656,35 +18678,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
 return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16: {
-Intrinsic::ID IID;
-llvm::Type *ArgTy;
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f64:
-  ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_ds_fadd;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Constant *ZeroI32 = llvm::ConstantInt::getIntegerValue(
-llvm::Type::getInt32Ty(getLLVMContext()), APInt(32, 0, true));
-llvm::Constant *ZeroI1 = llvm::ConstantInt::getIntegerValue(
-llvm::Type::getInt1Ty(getLLVMContext()), APInt(1, 0));
-llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
-return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case 
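
The scope handling added to `ProcessOrderScopeAMDGCN` in the hunk above can be sketched as a standalone function. This is an illustration only: `scopeName` is a hypothetical helper, and it returns the sync scope name as a string, with the empty string standing in for `llvm::SyncScope::System` and "singlethread" for `llvm::SyncScope::SingleThread`.

```cpp
#include <string>

// Maps the legacy integer memory-scope argument to a sync scope name.
// Unknown values fall back to the system scope, as in the patch.
inline std::string scopeName(int Scope) {
  switch (Scope) {
  case 0: // __MEMORY_SCOPE_SYSTEM
    return "";
  case 1: // __MEMORY_SCOPE_DEVICE
    return "agent";
  case 2: // __MEMORY_SCOPE_WRKGRP
    return "workgroup";
  case 3: // __MEMORY_SCOPE_WVFRNT
    return "wavefront";
  case 4: // __MEMORY_SCOPE_SINGLE
    return "singlethread";
  default:
    return "";
  }
}
```

In the patch this path is only reached when the scope operand is not a constant string, which is why the string form is checked first.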

[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/95394


[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM, contingent on the plan to produce atomicrmw.

https://github.com/llvm/llvm-project/pull/95396


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits


@@ -15931,6 +15931,26 @@ static OptimizationRemark 
emitAtomicRMWLegalRemark(const AtomicRMWInst *RMW) {
  << " operation at memory scope " << MemScope;
 }
 
+static bool isHalf2OrBFloat2(Type *Ty) {

arsenm wrote:

Both instructions were added together. The currently defined feature is 
HasAtomicFlatPkAdd16Insts, so this mirrors that 

https://github.com/llvm/llvm-project/pull/95394


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits


@@ -51,6 +51,12 @@ cl::opt
   cl::desc("Infer counts from stale profile data."),
   cl::init(false), cl::Hidden, cl::cat(BoltOptCategory));
 
+cl::opt StaleMatchingMinMatchedBlock(
+"stale-matching-min-matched-block",
+cl::desc("Percentage threshold of matched basic blocks at which stale "
+ "profile inference is executed."),
+cl::init(50), cl::Hidden, cl::cat(BoltOptCategory));

shawbyoung wrote:

Yes, changed the default to 0

https://github.com/llvm/llvm-project/pull/95156


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits


@@ -1669,13 +1670,16 @@ defm : FlatSignedAtomicPatWithAddrSpace 
<"FLAT_ATOMIC_ADD_F32", "int_amdgcn_flat
 }
 
 let OtherPredicates = [HasAtomicFlatPkAdd16Insts] in {
+// FIXME: These do not have signed offsets

arsenm wrote:

Yes, but I was planning on copying the pre-existing bug and fixing them both 
together later (assuming this is actually a bug and there's not some special 
case I haven't found documentation for)

https://github.com/llvm/llvm-project/pull/95394


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

>From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 1/6] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g., if the sum of a function's block execution counts is 100, the sum
+  of the function's matched block execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to 5%.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

>From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 2/6] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h 
b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

>From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 3/6] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-  opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(

>From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 4/6] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks 

[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -15931,6 +15931,26 @@ static OptimizationRemark emitAtomicRMWLegalRemark(const AtomicRMWInst *RMW) {
  << " operation at memory scope " << MemScope;
 }
 
+static bool isHalf2OrBFloat2(Type *Ty) {

rampitec wrote:

Does the underlying type really matter? Is 2 x 16-bit type sufficient?

https://github.com/llvm/llvm-project/pull/95394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -1669,13 +1670,16 @@ defm : FlatSignedAtomicPatWithAddrSpace <"FLAT_ATOMIC_ADD_F32", "int_amdgcn_flat
 }
 
 let OtherPredicates = [HasAtomicFlatPkAdd16Insts] in {
+// FIXME: These do not have signed offsets

rampitec wrote:

Can you just use FlatAtomicPat?

https://github.com/llvm/llvm-project/pull/95394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-14 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> On the other hand, it's a lot easier to handle ugly types down in instruction 
> selection, where you get to play much more fast and loose with types.

I think it's mostly easier to do this in the IR 

> 
> And there are buffer uses that don't fit into the fat pointer use case 
> where we'd still want them to work. For example, both 
> `struct.ptr.buffer.load.v6f16` and `struct.ptr.buffer.load.v3f32` should be a 
> `buffer_load_dwordx3`, but I'm pretty sure 6 x half isn't a register type.

Yes, we should just fix this one
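
To make the size argument in the quoted paragraph concrete, here is a minimal sketch of the arithmetic only. It is not the AMDGPU backend's code; `VecTy` and `dwordsFor` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector type: element count and element width.
struct VecTy {
  uint32_t NumElts;
  uint32_t EltBits;
};

// Returns the number of 32-bit dwords the payload occupies, or 0 if the
// payload is empty or not a whole number of dwords.
constexpr uint32_t dwordsFor(VecTy Ty) {
  const uint32_t Bits = Ty.NumElts * Ty.EltBits;
  return Bits % 32 == 0 ? Bits / 32 : 0;
}
```

Under these assumptions, `dwordsFor({6, 16})` and `dwordsFor({3, 32})` both describe a 96-bit payload, which is why the two overloads would be expected to select the same three-dword load.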

> 
> The load and store intrinsics are already overloaded to handle various {8, 
> 16, ..., 128}-bit types, and it seems much cleaner to let it support any type 
> of those lengths. It's just a load/store with somewhat weird indexing 
> semantics, is all.

Splitting is pretty ugly too, especially for a truly arbitrary type in 
legalization.



https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits


@@ -51,6 +51,12 @@ cl::opt
   cl::desc("Infer counts from stale profile data."),
   cl::init(false), cl::Hidden, cl::cat(BoltOptCategory));
 
+cl::opt StaleMatchingMinMatchedBlock(
+"stale-matching-min-matched-block",
+cl::desc("Percentage threshold of matched basic blocks at which stale "
+ "profile inference is executed."),
+cl::init(50), cl::Hidden, cl::cat(BoltOptCategory));

WenleiHe wrote:

Do we want to leave the default same as current behavior for now?

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung edited 
https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits


@@ -180,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks matched loosely.
+  uint64_t MatchedLooseBlocks{0};
+  /// The number of execution counts matched.
+  uint64_t MatchedExecCounts{0};

shawbyoung wrote:

Left this for possible extensibility of our definition for the threshold, but 
since these aren't being used at the moment, I'll remove them

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-14 Thread Anton Korobeynikov via llvm-branch-commits

https://github.com/asl approved this pull request.

+1

https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

>From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 1/5] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

>From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 2/5] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

>From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 3/5] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-      opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+      opts::MatchedProfileThreshold / 100.0)
     return false;
 
   bool HasExitBlocks = llvm::any_of(

>From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 4/5] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks 

[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits


@@ -58,7 +58,6 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
-  uint64_t Sink{UINT64_MAX};

WenleiHe wrote:

this change doesn't belong to this PR

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits


@@ -51,6 +51,12 @@ cl::opt
   cl::desc("Infer counts from stale profile data."),
   cl::init(false), cl::Hidden, cl::cat(BoltOptCategory));
 
+cl::opt MatchedProfileThreshold(
+"matched-profile-threshold",

WenleiHe wrote:

nit: the current name is quite ambiguous, suggest rename it to 
`stale-matching-min-matched-block` to be consistent with the convention used by 
other switches, and also more explicit. 

Suggestion for description: "Minimum number of exact match block for a function 
to be considered for profile inference." 
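
To illustrate the kind of gate such a switch would control, here is a minimal sketch, assuming the value is a percentage of exactly matched blocks. It is not the code in this PR; `MatchSummary` and `meetsMinMatchedBlock` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical per-function summary; field names are illustrative only.
struct MatchSummary {
  uint64_t MatchedBlocks; // blocks whose hash matched exactly
  uint64_t TotalBlocks;   // blocks recorded in the profile
};

// Returns true if at least MinMatchedPct percent of the blocks matched.
// Cross-multiplying keeps the comparison in integers, so nothing is lost to
// truncating division (10 / 100 evaluates to 0 in integer arithmetic).
// Assumes block counts are small enough that MatchedBlocks * 100 cannot
// overflow 64 bits.
constexpr bool meetsMinMatchedBlock(MatchSummary S, uint32_t MinMatchedPct) {
  return S.TotalBlocks != 0 &&
         S.MatchedBlocks * 100 >=
             static_cast<uint64_t>(MinMatchedPct) * S.TotalBlocks;
}
```

With 10 of 100 blocks matched, a minimum of 15 rejects the function and a minimum of 10 accepts it. A minimum of 0 accepts every function that has at least one profiled block, which would be one way to keep the default equivalent to the current behavior.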

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread via llvm-branch-commits


@@ -180,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks matched loosely.
+  uint64_t MatchedLooseBlocks{0};
+  /// The number of execution counts matched.
+  uint64_t MatchedExecCounts{0};

WenleiHe wrote:

What's the intended use for these two? There is no usage anywhere right now, 
and usually we don't keep unused stuff around.

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-14 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/95156

>From aa441dc0163d3d0f63de1e4dd1fa359180f82f1f Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 11 Jun 2024 11:43:13 -0700
Subject: [PATCH 1/4] Summary: Functions with little exact matching

Created using spr 1.3.4
---
 bolt/docs/CommandLineArgumentReference.md | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
index 8887d1f5d5bd4..bdc1d9dfd735c 100644
--- a/bolt/docs/CommandLineArgumentReference.md
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.
+
+- `--matched-profile-threshold=`
+
   Threshold (in percent) for selecting functions to process in lite mode. Higher
   threshold means fewer functions to process. E.g threshold of 90 means only top
   10 percent of functions with profile will be processed.
@@ -1161,4 +1172,4 @@
 
 - `--print-options`
 
-  Print non-default options after command line parsing
\ No newline at end of file
+  Print non-default options after command line parsing

>From 46fa37a054a129ca36e7b6ae126273e40fddea98 Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:32:40 -0700
Subject: [PATCH 2/4] Update SampleProfileInference.h

---
 llvm/include/llvm/Transforms/Utils/SampleProfileInference.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index c654715c0ae9f..9ccbd0fa88f3d 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -58,6 +58,7 @@ struct FlowFunction {
   std::vector Jumps;
   /// The index of the entry block.
   uint64_t Entry{0};
+  uint64_t Sink{UINT64_MAX};
   // Matched execution count for the function.
   uint64_t MatchedExecCount{0};
 };

>From d532514257feb5e86232e76c437c99a41d5f2cea Mon Sep 17 00:00:00 2001
From: shaw young <58664393+shawbyo...@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:39:28 -0700
Subject: [PATCH 3/4] Update StaleProfileMatching.cpp

---
 bolt/lib/Profile/StaleProfileMatching.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 41afa6b4bbb19..47335163263a4 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -604,8 +604,8 @@ bool canApplyInference(const FlowFunction &Func,
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >=
-      opts::MatchedProfileThreshold / 100)
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+      opts::MatchedProfileThreshold / 100.0)
     return false;
 
   bool HasExitBlocks = llvm::any_of(

>From 3fc6d72d866333d8ce964fdfaa748791d4f8d2b4 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Fri, 14 Jun 2024 08:38:19 -0700
Subject: [PATCH 4/4] spr amend

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 37 +++
 .../Transforms/Utils/SampleProfileInference.h |  3 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 47335163263a4..cb356afdd2948 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(0), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks 

[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Johannes Doerfert via llvm-branch-commits

jdoerfert wrote:

If we make `begin declare variant` elide on a user-defined compile-time 
condition, we could use the change in the `EF_AMDGPU_MACH_AMDGCN_LAST` value to 
determine a minimum version:
```
EF_AMDGPU_MACH_AMDGCN_LAST <= EF_AMDGPU_MACH_AMDGCN_GFX1013,
```
It's not possible right now but not hard to do.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Joseph Huber via llvm-branch-commits

jhuber6 wrote:

> The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires 
> the `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol.
> 
> This symbol is expected to be provided by 
> `openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not 
> by third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`.

This was introduced in ROCm-5.3, see 
https://github.com/ROCm/ROCR-Runtime/blob/rocm-5.3.x/src/inc/hsa_ext_amd.h#L333.
The `dynamic_hsa/` version is a copy of this header for use when the system 
version is not provided. If the build fails to find HSA on the system, it will 
use the dynamic version. The problem here is that you _have_ HSA, but it's too 
old. I don't know how much backward compatibility we really provide here. 
Unfortunately, the HSA headers don't give you much versioning to work 
with, so we can't do `ifdef` on this stuff. 
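
A minimal sketch of why an `ifdef` does not help here, assuming the attribute is declared as an enumerator rather than a macro. The `HSA_LIKE_*` names and the enumerator value are hypothetical stand-ins, not the real HSA declarations.

```cpp
// Stand-in for an HSA-style header: the attribute is an enumerator, so the
// preprocessor never sees a macro with that name.
enum hsa_like_agent_info_t {
  HSA_LIKE_AGENT_INFO_TIMESTAMP_FREQUENCY = 1 // illustrative value
};

#ifdef HSA_LIKE_AGENT_INFO_TIMESTAMP_FREQUENCY
constexpr bool EnumeratorVisibleToPreprocessor = true;
#else
constexpr bool EnumeratorVisibleToPreprocessor = false;
#endif

// A header that wanted to be feature-testable would have to pair the
// enumerator with a macro (hypothetical here), which can be tested.
#define HSA_LIKE_HAS_TIMESTAMP_FREQUENCY 1

#ifdef HSA_LIKE_HAS_TIMESTAMP_FREQUENCY
constexpr bool FeatureMacroVisible = true;
#else
constexpr bool FeatureMacroVisible = false;
#endif
```

Here `EnumeratorVisibleToPreprocessor` is `false` even though the enumerator is declared, while `FeatureMacroVisible` is `true`. Without such a macro or a version macro, availability would have to be decided at configure time instead.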

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

I can reproduce the bug with both `release/18.x` and `release/17.x`.

I cannot reproduce it with `release/16.x`.

I cannot test `release/15.x` because of other, unrelated errors (such as 
`getenv` not being defined).

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-14 Thread Kristof Beyls via llvm-branch-commits

https://github.com/kbeyls approved this pull request.


https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-14 Thread Kristof Beyls via llvm-branch-commits

kbeyls wrote:

> [37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f)
>  shows what I had in mind, let me know what you all think. I added:
> 
> ```
> void getSipHash_2_4_64(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
>                        uint8_t (&Out)[8]);
> 
> void getSipHash_2_4_128(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
>                         uint8_t (&Out)[16]);
> ```
> 
> as the core interfaces, and mimicked the ref. test harness to reuse the same 
> test vectors. If this seems reasonable to yall I'm happy to extract the 
> vectors.h file from the ref. implementation into the "Import original 
> sources" PR – that's why I kept it open ;)

Thanks, that looks good to me.
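
For readers unfamiliar with the algorithm behind these interfaces, here is a self-contained sketch of SipHash-2-4 with a 64-bit result, written from the published algorithm description. It is not the libSupport implementation. The function name `sipHash24`, the pointer-plus-length input, the integer return value, and the `ExampleKey`/`ExampleMsg` arrays are assumptions made for brevity; only the 16-byte key passed by reference to a fixed-size array follows the quoted signature.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint64_t rotl64(uint64_t X, unsigned B) {
  return (X << B) | (X >> (64 - B));
}

// Reads N (at most 8) bytes starting at In[Off] as a little-endian integer.
constexpr uint64_t loadLE(const uint8_t *In, size_t Off, size_t N) {
  uint64_t V = 0;
  for (size_t I = 0; I < N; ++I)
    V |= static_cast<uint64_t>(In[Off + I]) << (8 * I);
  return V;
}

struct SipState {
  uint64_t V0, V1, V2, V3;
  // One SipRound as given in the algorithm description.
  constexpr void round() {
    V0 += V1; V1 = rotl64(V1, 13); V1 ^= V0; V0 = rotl64(V0, 32);
    V2 += V3; V3 = rotl64(V3, 16); V3 ^= V2;
    V0 += V3; V3 = rotl64(V3, 21); V3 ^= V0;
    V2 += V1; V1 = rotl64(V1, 17); V1 ^= V2; V2 = rotl64(V2, 32);
  }
};

// SipHash-2-4: two rounds per 8-byte message word, four finalization rounds.
constexpr uint64_t sipHash24(const uint8_t *In, size_t Len,
                             const uint8_t (&K)[16]) {
  const uint64_t K0 = loadLE(K, 0, 8), K1 = loadLE(K, 8, 8);
  SipState S{K0 ^ 0x736f6d6570736575ULL, K1 ^ 0x646f72616e646f6dULL,
             K0 ^ 0x6c7967656e657261ULL, K1 ^ 0x7465646279746573ULL};
  const size_t Full = Len - Len % 8;
  for (size_t Off = 0; Off < Full; Off += 8) {
    const uint64_t M = loadLE(In, Off, 8);
    S.V3 ^= M; S.round(); S.round(); S.V0 ^= M;
  }
  // Final word: leftover bytes in the low positions, length in the top byte.
  const uint64_t B =
      (static_cast<uint64_t>(Len) << 56) | loadLE(In, Full, Len % 8);
  S.V3 ^= B; S.round(); S.round(); S.V0 ^= B;
  S.V2 ^= 0xff;
  S.round(); S.round(); S.round(); S.round();
  return S.V0 ^ S.V1 ^ S.V2 ^ S.V3;
}

// Example inputs in the style of the reference test harness: key bytes
// 00..0f and a 15-byte message 00..0e.
constexpr uint8_t ExampleKey[16] = {0, 1, 2,  3,  4,  5,  6,  7,
                                    8, 9, 10, 11, 12, 13, 14, 15};
constexpr uint8_t ExampleMsg[15] = {0, 1, 2,  3,  4,  5,  6, 7,
                                    8, 9, 10, 11, 12, 13, 14};
```

A call such as `sipHash24(ExampleMsg, sizeof(ExampleMsg), ExampleKey)` hashes one full 8-byte word plus a 7-byte tail. Checking results like this against the reference implementation's vectors is the kind of test the quoted comment describes.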

https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

Here is a script to reproduce the bug:

```bash
#! /usr/bin/env bash

set -x -u -e -o pipefail

version="${1:-18}"

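# Use every available core; fall back to 4 jobs if `nproc` is unavailable.
# (With `set -e`, a bare "$(nproc)" assignment would abort before any fallback.)
CMAKE_BUILD_PARALLEL_LEVEL="$(nproc 2>/dev/null || echo 4)"
export CMAKE_BUILD_PARALLEL_LEVEL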

workspace="llvm-bug95484-${version}"

rm -rf "${workspace}"
mkdir "${workspace}"
cd "${workspace}"

git clone --depth 1 \
    --branch "release/${version}.x" \
    'https://github.com/llvm/llvm-project.git' \
    'llvm-project'

git clone --depth 1 \
    'https://github.com/KhronosGroup/SPIRV-Headers.git' \
    'llvm-project/llvm/projects/SPIRV-Headers'

git clone --depth 1 \
    --branch "llvm_release_${version}0" \
    'https://github.com/KhronosGroup/SPIRV-LLVM-Translator.git' \
    'llvm-project/llvm/projects/SPIRV-LLVM-Translator'

cmake \
    -S'llvm-project/llvm' \
    -B'build' \
    -G'Ninja' \
    -D'CMAKE_INSTALL_PREFIX'='install' \
    -D'CMAKE_BUILD_TYPE'='Release' \
    -D'BUILD_SHARED_LIBS'='ON' \
    -D'LLVM_ENABLE_PROJECTS'='clang;openmp' \
    -D'LLVM_TARGETS_TO_BUILD'='Native' \
    -D'LLVM_EXPERIMENTAL_TARGETS_TO_BUILD'='SPIRV' \
    -D'LLVM_ENABLE_ASSERTIONS'='OFF' \
    -D'LLVM_ENABLE_RTTI'='ON' \
    -D'LLVM_BUILD_TESTS'='OFF' \
    -D'LLVM_BUILD_TOOLS'='ON' \
    -D'LLVM_SPIRV_INCLUDE_TESTS'='OFF' \
    -D'LLVM_EXTERNAL_PROJECTS'='SPIRV-Headers'

cmake --build 'build'

cmake --install 'build'
```

To use it, save it as `llvm-bug95484`, make it executable, and run either:

- `./llvm-bug95484`
  to fetch `release/18.x` and attempt a clean build that reproduces the bug, or
- `./llvm-bug95484 17`
  to do the same with `release/17.x`.

It will fail this way:

```
llvm-bug95484-18/llvm-project/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1902:37: error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope; did you mean ‘HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY’?
 1902 |     if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                     HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY
```
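
The error means that whichever HSA header the build picked up does not declare `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY`. One quick way to check a candidate header is sketched below. `check_hsa_header` is a hypothetical helper, and which header file the build actually includes is an assumption that has to be verified separately.

```bash
# Hypothetical helper: report whether a given header declares the enumerator
# that rtl.cpp needs. Returns 0 if it is found, non-zero otherwise.
# Usage: check_hsa_header <path-to-header>
check_hsa_header() {
    local header="$1"
    # -w matches the identifier as a whole word, so longer names that merely
    # contain it are not counted.
    if grep -q -w 'HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY' "${header}"; then
        echo "${header}: declared"
    else
        echo "${header}: missing"
        return 1
    fi
}
```

For example, `check_hsa_header /opt/rocm/include/hsa/hsa_ext_amd.h` would check a system-wide ROCm install. That path is only a guess at a typical location and may differ on a given machine.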

https://github.com/llvm/llvm-project/pull/95484