[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-28 Thread Wei Xiao via cfe-commits

https://github.com/williamweixiao closed 
https://github.com/llvm/llvm-project/pull/88438
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-28 Thread Wei Xiao via cfe-commits

https://github.com/williamweixiao approved this pull request.


https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-27 Thread Tim Creech via cfe-commits


@@ -2547,22 +2547,40 @@ usual build cycle when using sample profilers for optimization:
used in the first step. The only requirement is that you build the code
with the same debug info options and ``-fprofile-sample-use``.
 
+   On Linux:
+
.. code-block:: console
 
  $ clang++ -O2 -gline-tables-only \
-fdebug-info-for-profiling -funique-internal-linkage-names \
-fprofile-sample-use=code.prof code.cc -o code
 
-  [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
-  edge counters. The profile inference algorithm (profi) can be used to infer
-  missing blocks and edge counts, and improve the quality of profile data.
-  Enable it with ``-fsample-profile-use-profi``.
+   On Windows:
 
-  .. code-block:: console
+   .. code-block:: winbatch
+
+ > clang-cl -O2 -gdwarf -gline-tables-only ^

tcreech-intel wrote:

Good idea. I've updated the clang-cl examples to use cl-style forward-slash 
options when possible. There are still a few cases (`-gdwarf 
-gline-tables-only`) where only the hyphen version is understood, and also some 
cases (`/clang:-fdebug-info-for-profiling 
/clang:-funique-internal-linkage-names`) where the hyphen version is understood 
only with `/clang:`.
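
As a rough illustration of the spelling rules described above, the following Python sketch assembles (but never runs) a clang-cl argument list. The helper name `clang_cl_profiling_argv`, the file names, and the exact `/Fe` spelling of the output option are assumptions made for the example; only the grouping of options into "hyphen only" and "needs `/clang:`" comes from the comment above.

```python
# Hypothetical helper: it only builds an argument list, it never invokes clang-cl.

# Options that, per the comment above, clang-cl understands only in hyphen form.
HYPHEN_ONLY = ["-gdwarf", "-gline-tables-only"]

# Options that, per the comment above, must be forwarded through /clang:.
NEEDS_CLANG_PREFIX = [
    "-fdebug-info-for-profiling",
    "-funique-internal-linkage-names",
]


def clang_cl_profiling_argv(source, output):
    """Return a clang-cl argv that uses cl-style options where possible."""
    argv = ["clang-cl", "/O2"]                        # cl-style optimization level
    argv += HYPHEN_ONLY                               # no slash spelling available
    argv += ["/clang:" + opt for opt in NEEDS_CLANG_PREFIX]
    argv += [source, "/Fe" + output]                  # assumed cl-style output spelling
    return argv


if __name__ == "__main__":
    print(" ".join(clang_cl_profiling_argv("code.cc", "code.exe")))
```

Keeping the two option groups in separate lists makes it easy to move an option once clang-cl gains a native spelling for it.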

https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-27 Thread Tim Creech via cfe-commits

https://github.com/tcreech-intel updated 
https://github.com/llvm/llvm-project/pull/88438

From fe3404cbdf78b434f16f8351dc242175b4543112 Mon Sep 17 00:00:00 2001
From: Tim Creech 
Date: Thu, 11 Apr 2024 16:03:52 -0400
Subject: [PATCH 1/4] Improve documented sampling profiler steps to best known
 methods

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`,
   which improve the usefulness of debug info for profiling.

2. Recommend the use of `br_inst_retired.near_taken:uppp`, which
   provides the most precise results on supporting hardware.  Mention
   `branches:u` as a more portable backup.

   Both should portray execution counts better than the default event
   (`cycles`) and have a better chance of working as an unprivileged
   user due to the `:u` modifier.
---
 clang/docs/UsersManual.rst | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.
 
-   On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+   On Linux:
 
.. code-block:: console
 
- $ clang++ -O2 -gline-tables-only code.cc -o code
+ $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
 
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+ $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
 
 2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.
 
-   Two such profilers are the the Linux Perf profiler
+   Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune

`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ perf record -b ./code
+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+   If the event above is unavailable, ``branches:u`` is probably next-best.
 
Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,

From add91ec329f60eef6ecf79d6d5c9a548a8d6bcfe Mon Sep 17 00:00:00 2001
From: Tim Creech 
Date: Mon, 22 Apr 2024 11:11:36 -0400
Subject: [PATCH 2/4] fixup: add uniqueing note, match debug flags

---
 clang/docs/UsersManual.rst | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 818841285cfae5..b87fc7f2aaa4dd 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2314,6 +2314,8 @@ are listed below.
on ELF targets when using the integrated assembler. This flag currently
only has an effect on ELF targets.
 
+.. _funique_internal_linkage_names:
+
 .. option:: -f[no]-unique-internal-linkage-names
 
Controls whether Clang emits a unique (best-effort) symbol name for internal
@@ -2451,15 +2453,27 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
+ $ clang++ -O2 -gline-tables-only \
+   -fdebug-info-for-profiling -funique-internal-linkage-names \
+   code.cc -o code
 
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
``-gdwarf`` to include DWARF debug information:
 
-   .. code-block:: console
+   .. code-block:: winbatch
+
+ $ clang-cl -O2 -gdwarf -gline-tables-only ^
+   /clang

[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-26 Thread via cfe-commits


@@ -2547,22 +2547,40 @@ usual build cycle when using sample profilers for optimization:
used in the first step. The only requirement is that you build the code
with the same debug info options and ``-fprofile-sample-use``.
 
+   On Linux:
+
.. code-block:: console
 
  $ clang++ -O2 -gline-tables-only \
-fdebug-info-for-profiling -funique-internal-linkage-names \
-fprofile-sample-use=code.prof code.cc -o code
 
-  [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
-  edge counters. The profile inference algorithm (profi) can be used to infer
-  missing blocks and edge counts, and improve the quality of profile data.
-  Enable it with ``-fsample-profile-use-profi``.
+   On Windows:
 
-  .. code-block:: console
+   .. code-block:: winbatch
+
+ > clang-cl -O2 -gdwarf -gline-tables-only ^

chrulski-intel wrote:

Since these commands are using 'clang-cl', would it be better to show and use 
the options in the native clang-cl format described at line 4557 for 
consistency instead of mixing slashes and hyphens? i.e. /O2 instead of -O2, and 
/Fe instead of -o, etc.

https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-25 Thread Tim Creech via cfe-commits

https://github.com/tcreech-intel updated 
https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-23 Thread via cfe-commits

https://github.com/chrulski-intel approved this pull request.


https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-22 Thread Tim Creech via cfe-commits

https://github.com/tcreech-intel updated 
https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-19 Thread Tim Creech via cfe-commits


@@ -2443,27 +2443,29 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.

tcreech-intel wrote:

Thanks, @chrulski-intel -- good point. I'll add a brief note.

@williamweixiao, I think you're right that they should match. I'll update those 
steps.
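
One way to keep the two builds from drifting apart is to define the debug-info options once and reuse them. The Python sketch below only assembles argument lists; the name `build_argv` and the constant `DEBUG_INFO_FLAGS` are hypothetical, and the flags themselves are the ones discussed in this thread.

```python
# Hypothetical sketch: keep the debug-info options in one place so the
# profiling build (step 1) and the profile-use build (step 4) always match.

DEBUG_INFO_FLAGS = [
    "-gline-tables-only",
    "-fdebug-info-for-profiling",
    "-funique-internal-linkage-names",
]


def build_argv(source, output, profile=None):
    """Return a clang++ argv; pass `profile` for the step-4 rebuild."""
    argv = ["clang++", "-O2", *DEBUG_INFO_FLAGS]
    if profile is not None:
        # Only the rebuild consumes the sample profile.
        argv.append("-fprofile-sample-use=" + profile)
    argv += [source, "-o", output]
    return argv


if __name__ == "__main__":
    print(" ".join(build_argv("code.cc", "code")))                       # step 1
    print(" ".join(build_argv("code.cc", "code", profile="code.prof")))  # step 4
```

With this layout the only difference between the two command lines is the `-fprofile-sample-use` option.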

https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-19 Thread Wei Xiao via cfe-commits


@@ -2443,27 +2443,29 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.

williamweixiao wrote:

Do we also need ``-fdebug-info-for-profiling`` and
``-funique-internal-linkage-names`` in step 4
(``-fprofile-sample-use=code.prof``)?

https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-11 Thread Tim Creech via cfe-commits

tcreech-intel wrote:

@williamweixiao, @HaohaiWen, this updates the docs to describe best practices 
given #83972.

It seems `-fdebug-info-for-profiling` can be particularly important. Without it 
we were discarding nearly half of the samples in some cases.

https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-11 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Tim Creech (tcreech-intel)


Changes

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`, which 
improve the usefulness of debug info for profiling.

2. Recommend the use of `br_inst_retired.near_taken:uppp`, which provides the 
most precise results on supporting hardware.  Mention `branches:u` as a more 
portable backup.

   Both should portray execution counts better than the default event 
(`cycles`) and have a better chance of working as an unprivileged user due to 
the `:u` modifier.

---
Full diff: https://github.com/llvm/llvm-project/pull/88438.diff


1 Files Affected:

- (modified) clang/docs/UsersManual.rst (+10-6) 


```diff
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.
 
-   On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+   On Linux:
 
.. code-block:: console
 
- $ clang++ -O2 -gline-tables-only code.cc -o code
+ $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
 
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+ $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
 
 2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.
 
-   Two such profilers are the the Linux Perf profiler
+   Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune

`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ perf record -b ./code
+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+   If the event above is unavailable, ``branches:u`` is probably next-best.
 
Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,

```




https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-11 Thread Tim Creech via cfe-commits

https://github.com/tcreech-intel ready_for_review 
https://github.com/llvm/llvm-project/pull/88438


[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

2024-04-11 Thread Tim Creech via cfe-commits

https://github.com/tcreech-intel created 
https://github.com/llvm/llvm-project/pull/88438

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`, which 
improve the usefulness of debug info for profiling.

2. Recommend the use of `br_inst_retired.near_taken:uppp`, which provides the 
most precise results on supporting hardware.  Mention `branches:u` as a more 
portable backup.

   Both should portray execution counts better than the default event 
(`cycles`) and have a better chance of working as an unprivileged user due to 
the `:u` modifier.
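
The event choice in item 2 can be sketched as a small selection function. This is only an illustration: the names `pick_event` and `perf_record_argv` are hypothetical, and the set of available events is assumed to be supplied by the caller (for example, gathered from `perf list`); nothing here runs `perf`.

```python
# Hypothetical sketch: choose a perf event from a caller-supplied set of
# available event names, then assemble (but do not run) the perf command.

PREFERRED_EVENT = "BR_INST_RETIRED.NEAR_TAKEN"  # most precise, hardware-dependent
FALLBACK_EVENT = "branches"                     # more portable backup


def pick_event(available):
    """Return the event string, including its modifiers."""
    names = {name.lower() for name in available}   # compare case-insensitively
    if PREFERRED_EVENT.lower() in names:
        return PREFERRED_EVENT + ":uppp"           # user space, highest precise level
    return FALLBACK_EVENT + ":u"                   # user space only


def perf_record_argv(binary, available):
    """Return a `perf record` argv with branch sampling (-b) enabled."""
    return ["perf", "record", "-b", "-e", pick_event(available), binary]


if __name__ == "__main__":
    print(" ".join(perf_record_argv("./code", {"br_inst_retired.near_taken"})))
    print(" ".join(perf_record_argv("./code", {"cycles", "branches"})))
```

Both results keep the `:u` modifier, which is what gives the command a better chance of working for an unprivileged user.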

From fe3404cbdf78b434f16f8351dc242175b4543112 Mon Sep 17 00:00:00 2001
From: Tim Creech 
Date: Thu, 11 Apr 2024 16:03:52 -0400
Subject: [PATCH] Improve documented sampling profiler steps to best known
 methods

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`,
   which improve the usefulness of debug info for profiling.

2. Recommend the use of `br_inst_retired.near_taken:uppp`, which
   provides the most precise results on supporting hardware.  Mention
   `branches:u` as a more portable backup.

   Both should portray execution counts better than the default event
   (`cycles`) and have a better chance of working as an unprivileged
   user due to the `:u` modifier.
---
 clang/docs/UsersManual.rst | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.
 
-   On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+   On Linux:
 
.. code-block:: console
 
- $ clang++ -O2 -gline-tables-only code.cc -o code
+ $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
 
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+ $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
 
 2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.
 
-   Two such profilers are the the Linux Perf profiler
+   Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune

`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
 
.. code-block:: console
 
- $ perf record -b ./code
+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+   If the event above is unavailable, ``branches:u`` is probably next-best.
 
Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,
