[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-26 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557368.
AlexVlx added a comment.

Use double dash flags exclusively.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -390,6 +390,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``--hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with the
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code:
+
+.. code-block:: C++
+
+   bool has_the_answer(const std::vector& v) {
+ return std::find(std::execution::par_unseq, std::cbegin(v), std::cend(v), 42) != std::cend(v);
+   }
+
+if Clang is invoked with the ``--hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``--hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-25 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx added inline comments.



Comment at: clang/docs/HIPSupport.rst:266
+
+- ``-hipstdpar`` / ``--hipstdpar`` enables algorithm offload, which depending 
on
+  phase, has the following effects:

MaskRay wrote:
> Newer long options that don't use the common prefix like `-f` are preferred 
> to only support `--foo`, not `-foo`.
> Many `--hip*` options don't support `-hip*`. Why add `-hipstdpar`?
Thanks for the review - to answer the question, it was only done for (apparent) 
symmetry, there's no strong incentive to have the single dash flavour; I will 
update both this and the driver patch to retain only the double dash flavours.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-25 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added inline comments.



Comment at: clang/docs/HIPSupport.rst:266
+
+- ``-hipstdpar`` / ``--hipstdpar`` enables algorithm offload, which depending 
on
+  phase, has the following effects:

Newer long options that don't use the common prefix like `-f` are preferred to 
only support `--foo`, not `-foo`.
Many `--hip*` options don't support `-hip*`. Why add `-hipstdpar`?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-19 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 557057.
AlexVlx marked an inline comment as done.
AlexVlx added a comment.

Rebase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -390,6 +390,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with the
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-12 Thread Ronan Keryell via Phabricator via cfe-commits
keryell accepted this revision.
keryell added a comment.

Thanks for the clarifications.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-12 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx marked an inline comment as done.
AlexVlx added inline comments.



Comment at: clang/docs/HIPSupport.rst:216
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+

keryell wrote:
> Since this does not sounds like an official wording and this is not a 
> recommended practice 
> https://isocpp.org/wiki/faq/coding-standards#using-namespace-std, I suggest 
> just adding `std::` everywhere since this is an end-user document.
> Further more it makes clear that your extension can work with the standard 
> library instead of something that would be declared in a namespace from your 
> extension.
> 
Addressed, thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-12 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 556618.
AlexVlx added a comment.

Address review feedback around explicit use of `std::`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -340,6 +340,11 @@
 CUDA/HIP Language Changes
 ^

+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 

Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.

+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with the
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code:
+
+.. code-block:: C++
+
+   bool has_the_answer(const std::vector& v) {
+ return std::find(std::execution::par_unseq, std::cbegin(v), std::cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-11 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

LGTM. Please address Ronan's comments. Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-10 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 556369.
AlexVlx added a comment.

Fix some wording to further clarify this is an HIP + AMDGPU exclusive extension.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -340,6 +340,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,392 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This section describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of AMD GPUs. This, coupled with the
+the availability and maturity of GPU accelerated algorithm libraries that
+implement most / all corresponding algorithms in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an AMD accelerator, via a well specified,
+familiar, algorithmic interface, without having to delve into low-level hardware
+specific details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise AMDGPU accelerator programming, without loss of user
+  familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-01 Thread Ronan Keryell via Phabricator via cfe-commits
keryell added inline comments.



Comment at: clang/docs/HIPSupport.rst:216
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+

Since this does not sounds like an official wording and this is not a 
recommended practice 
https://isocpp.org/wiki/faq/coding-standards#using-namespace-std, I suggest 
just adding `std::` everywhere since this is an end-user document.
Further more it makes clear that your extension can work with the standard 
library instead of something that would be declared in a namespace from your 
extension.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-09-01 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 555450.
AlexVlx added a reviewer: JonChesterfield.
AlexVlx added a comment.

Correctly reflect AMDGPU-only nature of the extension.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -311,6 +311,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,391 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding libraries in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: only AMDGPU accelerators targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 553800.
AlexVlx added a comment.

Add back clarification about RDC.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,391 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding libraries in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-27 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 553799.
AlexVlx edited reviewers, added: jhuber6; removed: tra, jdoerfert.
AlexVlx added a comment.
Herald added a reviewer: jdoerfert.

Fix typo, add release notes reflecting the HIP-only nature.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  clang/docs/ReleaseNotes.rst

Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -279,6 +279,11 @@
 CUDA/HIP Language Changes
 ^
 
+- [HIP Only] add experimental support for offloading select C++ Algorithms,
+  which can be toggled via the ``-hipstdpar`` flag. This is available only for
+  the AMDGPU target at the moment. See the dedicated entry in the `HIP Language
+  support document for details `_
+
 CUDA Support
 
 
Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,387 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding libraries in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552763.
AlexVlx added a comment.

Remove noise.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst

Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,391 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding libraries in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arch``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on a
+pass over IR to clean up any and all code that was not "meant" for offload. If
+requested, allocation interposition is also handled via a separate pass over IR.
+
+To interface with the client HIP runtime, and to forward offloaded algorithm
+invocations to the corresponding accelerator specific library implementation, an
+implementation detail forwarding header is implicitly included by the driver,
+when compiling with ``-hipstdpar``. In what follows, we will delve into each
+component that contributes to implementing Algorithm Offload support.
+
+Implementation - Driver
+===
+
+We augment the 

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-23 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

seems some irrelevant change got into this patch


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-23 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552754.
AlexVlx added a comment.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a reviewer: kiranchandramohan.
Herald added a project: LLVM.

Update per review comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst
  flang/lib/Semantics/resolve-directives.cpp
  lld/tools/lld/lld.cpp
  llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
  llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
  llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir

Index: llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir
===
--- llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir
+++ llvm/test/DebugInfo/AArch64/dbg-entry-value-swiftasync.mir
@@ -36,7 +36,7 @@
   - { id: 0, name: '', type: spill-slot, offset: -16, size: 8, alignment: 16,
   stack-id: default, callee-saved-register: '$lr', callee-saved-restored: true }
 entry_values:
-  - { entry-value-register: '$x22', debug-info-variable: '!10', debug-info-expression: '!DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_deref)',
+  - { entry-value-register: '$x22', debug-info-variable: '!10', debug-info-expression: '!DIExpression(DW_OP_LLVM_entry_value, 1)',
   debug-info-location: '!12' }
 body: |
   bb.0 (%ir-block.0):
Index: llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
===
--- llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
+++ llvm/test/CodeGen/AArch64/dbg-declare-swift-async.ll
@@ -3,11 +3,11 @@
 ; RUN: llc -O0 -fast-isel=false -global-isel=false -stop-after=finalize-isel %s -o - | FileCheck %s
 
 ; CHECK: void @foo
-; CHECK-NEXT: dbg.declare(metadata {{.*}}, metadata ![[VAR:.*]], metadata !DIExpression([[EXPR:.*]])), !dbg ![[LOC:.*]]
+; CHECK-NEXT: dbg.declare(metadata {{.*}}, metadata ![[VAR:.*]], metadata ![[EXPR:.*]]), !dbg ![[LOC:.*]]
 ; CHECK: entry_values:
-; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '!DIExpression([[EXPR]], DW_OP_deref)',
+; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '![[EXPR]]',
 ; CHECK-NEXT:   debug-info-location: '![[LOC]]
-; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '!DIExpression([[EXPR]], DW_OP_deref)'
+; CHECK-NEXT: entry-value-register: '$x22', debug-info-variable: '![[VAR]]', debug-info-expression: '![[EXPR]]'
 ; CHECK-NEXT:   debug-info-location: '![[LOC]]
 
 ; CHECK-NOT: DBG_VALUE
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
===
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -1357,8 +1357,6 @@
   // Find the corresponding livein physical register to this argument.
   for (auto [PhysReg, VirtReg] : FuncInfo.RegInfo->liveins())
 if (VirtReg == ArgVReg) {
-  // Append an op deref to account for the fact that this is a dbg_declare.
-  Expr = DIExpression::append(Expr, dwarf::DW_OP_deref);
   FuncInfo.MF->setVariableDbgInfo(Var, Expr, PhysReg, DbgLoc);
   LLVM_DEBUG(dbgs() << "processDbgDeclare: setVariableDbgInfo Var=" << *Var
 << ", Expr=" << *Expr << ",  MCRegister=" << PhysReg
Index: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
===
--- llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1943,8 +1943,6 @@
   if (!PhysReg)
 return false;
 
-  // Append an op deref to account for the fact that this is a dbg_declare.
-  Expr = DIExpression::append(Expr, dwarf::DW_OP_deref);
   MF->setVariableDbgInfo(DebugInst.getVariable(), Expr, *PhysReg,
  DebugInst.getDebugLoc());
   return true;
Index: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
===
--- llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -1565,7 +1565,7 @@
 if (VI.inStackSlot())
   RegVar->initializeMMI(VI.Expr, VI.getStackSlot());
 else {
-  MachineLocation MLoc(VI.getEntryValueRegister(), /*IsIndirect*/ false);
+  MachineLocation MLoc(VI.getEntryValueRegister(), /*IsIndirect*/ true);
   auto LocEntry = DbgValueLocEntry(MLoc);
   RegVar->initializeDbgValue(DbgValueLoc(VI.Expr, LocEntry));
 }
Index: lld/tools/lld/lld.cpp
===
--- lld/tools/lld/lld.cpp
+++ lld/tools/lld/lld.cpp
@@ -30,6 +30,7 @@
 #include "lld/Common/Memory.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"

[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-23 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/docs/HIPSupport.rst:232
+execution seamlessly falls back to the host CPU. It is legal to specify 
multiple
+``--offload-arcj``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.

typo j -> h



Comment at: clang/docs/HIPSupport.rst:549
+   Standard.
+
+Open Questions / Future Developments

can we add an item explaining whether relocatable object files compiled from 
`--hipstdpar` and HIP be linked together? 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155769: [HIP][Clang][docs][RFC] Add documentation for C++ Parallel Algorithm Offload

2023-08-22 Thread Alex Voicu via Phabricator via cfe-commits
AlexVlx updated this revision to Diff 552550.
AlexVlx marked an inline comment as done.
AlexVlx retitled this revision from "[Clang][docs][RFC] Add documentation for 
C++ Parallel Algorithm Offload" to "[HIP][Clang][docs][RFC] Add documentation 
for C++ Parallel Algorithm Offload".
AlexVlx removed reviewers: bader, Anastasia, erichkeane.
AlexVlx added a comment.
Herald added a reviewer: jdoerfert.
Herald added subscribers: jplehr, sstefan1.

Updating this to reflect the outcome of the RFC, which is that this shall be a 
HIP only extension. As such, documentation lives within the HIP Support master 
document; the other patches in the series will be updated accordingly.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155769/new/

https://reviews.llvm.org/D155769

Files:
  clang/docs/HIPSupport.rst

Index: clang/docs/HIPSupport.rst
===
--- clang/docs/HIPSupport.rst
+++ clang/docs/HIPSupport.rst
@@ -176,3 +176,387 @@
* - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
  - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++ Standard Parallelism Offload Support: Compiler And Runtime
+==
+
+Introduction
+
+
+This document describes the implementation of support for offloading the
+execution of standard C++ algorithms to accelerators that can be targeted via
+HIP. Furthermore, it enumerates restrictions on user defined code, as well as
+the interactions with runtimes.
+
+Algorithm Offload: What, Why, Where
+===
+
+C++17 introduced overloads
+`for most algorithms in the standard library `_
+which allow the user to specify a desired
+`execution policy `_.
+The `parallel_unsequenced_policy `_
+maps relatively well to the execution model of many accelerators, such as GPUs.
+This, coupled with the ubiquity of GPU accelerated algorithm libraries that
+implement most / all corresponding libraries in the standard library
+(e.g. `rocThrust `_), makes
+it feasible to provide seamless accelerator offload for supported algorithms,
+when an accelerated version exists. Thus, it becomes possible to easily access
+the computational resources of an accelerator, via a well specified, familiar,
+algorithmic interface, without having to delve into low-level hardware specific
+details. Putting it all together:
+
+- **What**: standard library algorithms, when invoked with the
+  ``parallel_unsequenced_policy``
+- **Why**: democratise accelerator programming, without loss of user familiarity
+- **Where**: any and all accelerators that can be targeted by Clang/LLVM via HIP
+
+Small Example
+=
+
+Given the following C++ code, which assumes the ``std`` namespace is included:
+
+.. code-block:: C++
+
+   bool has_the_answer(const vector& v) {
+ return find(execution::par_unseq, cbegin(v), cend(v), 42) != cend(v);
+   }
+
+if Clang is invoked with the ``-hipstdpar --offload-arch=foo`` flags, the call
+to ``find`` will be offloaded to an accelerator that is part of the ``foo``
+target family. If either ``foo`` or its runtime environment do not support
+transparent on-demand paging (such as e.g. that provided in Linux via
+`HMM `_), it is necessary to also include
+the ``--hipstdpar-interpose-alloc`` flag. If the accelerator specific algorithm
+library ``foo`` uses doesn't have an implementation of a particular algorithm,
+execution seamlessly falls back to the host CPU. It is legal to specify multiple
+``--offload-arcj``s. All the flags we introduce, as well as a thorough view of
+various restrictions and their implications will be provided below.
+
+Implementation - General View
+=
+
+We built support for Algorithm Offload support atop the pre-existing HIP
+infrastructure. More specifically, when one requests offload via ``-hipstdpar``,
+compilation is switched to HIP compilation, as if ``-x hip`` was specified.
+Similarly, linking is also switched to HIP linking, as if ``--hip-link`` was
+specified. Note that these are implicit, and one should not assume that any
+interop with HIP specific language constructs is available e.g. ``__device__``
+annotations are neither necessary nor guaranteed to work.
+
+Since there are no language restriction mechanisms in place, it is necessary to
+relax HIP language specific semantic checks performed by the FE; they would
+identify otherwise valid, offloadable code, as invalid HIP code. Given that we
+know that the user intended only for certain algorithms to be offloaded, and
+encoded this by specifying the ``parallel_unsequenced_policy``, we rely on a
+pass over IR to clean up any and