from:"Tony Tye via Phabricator via cfe\-commits"

[PATCH] D107190: [AMDGPU][HIP] Switch default DWARF version to 5

2021-07-30 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107190/new/

https://reviews.llvm.org/D107190

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D106339: Add support to generate Sphinx DOCX documentation

2021-07-24 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In D106339#2890258 , @ldionne wrote:

> What's the benefit of having docx documentation? We generate HTML 
> documentation, which ends up in the website, and that seems strictly superior 
> to generating docx. What do you need it for?
>
> The libc++ changes are almost trivial so I would not object to the change on 
> that basis, however in general I think it's better to avoid adding support 
> for things we won't be using on a regular basis.

We do have a project that requires docx documentation that includes parts of 
the LLVM documentation, so being able to generate it from the build is helpful. 
However, if adding docx support is not useful to anyone else then the changes 
can be kept out of tree.

A few observations are that the makefile.bat and Makefile.sphinx files already 
appear to support many of the build targets supported by Sphinx, so adding docx 
did not seem out of place. Building docx as part of the LLVM build is disabled 
by default so has no impact unless explicitly enabled. The changes to support 
it appeared fairly minor.

The changes not directly related to adding docx support have been split out to 
D106338 , D106734 
 and D106736 
 which may be worth considering independent 
of whether this review is useful.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106339/new/

https://reviews.llvm.org/D106339

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D106339: Add support to generate Sphinx DOCX documentation

2021-07-24 Thread Tony Tye via Phabricator via cfe-commits

t-tye updated this revision to Diff 361407.
t-tye added a comment.

Factor out documentation and CMake file changes unrelated to adding DOCX 
support to D106736 .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106339/new/

https://reviews.llvm.org/D106339

Files:
  clang-tools-extra/docs/CMakeLists.txt
  clang-tools-extra/docs/conf.py
  clang-tools-extra/docs/make.bat
  clang/docs/CMakeLists.txt
  clang/docs/Makefile.sphinx
  clang/docs/analyzer/conf.py
  clang/docs/analyzer/make.bat
  clang/docs/conf.py
  clang/docs/make.bat
  flang/docs/CMakeLists.txt
  flang/docs/conf.py
  libcxx/docs/CMakeLists.txt
  libcxx/docs/Makefile.sphinx
  libcxx/docs/conf.py
  libunwind/docs/CMakeLists.txt
  libunwind/docs/conf.py
  lld/docs/CMakeLists.txt
  lld/docs/conf.py
  lld/docs/make.bat
  lld/docs/sphinx_intro.rst
  lldb/docs/CMakeLists.txt
  lldb/docs/conf.py
  lldb/docs/resources/build.rst
  llvm/cmake/modules/AddSphinxTarget.cmake
  llvm/cmake/modules/FindSphinx.cmake
  llvm/docs/CMake.rst
  llvm/docs/CMakeLists.txt
  llvm/docs/Makefile.sphinx
  llvm/docs/README.txt
  llvm/docs/conf.py
  llvm/docs/make.bat
  openmp/docs/CMakeLists.txt
  openmp/docs/README.txt
  openmp/docs/conf.py
  polly/docs/CMakeLists.txt
  polly/docs/conf.py

Index: polly/docs/conf.py
===
--- polly/docs/conf.py
+++ polly/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax']
 
+if tags.has('builder-docx'):
+extensions.append('docxbuilder')
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
Index: polly/docs/CMakeLists.txt
===
--- polly/docs/CMakeLists.txt
+++ polly/docs/CMakeLists.txt
@@ -98,6 +98,9 @@
 if (${SPHINX_OUTPUT_MAN})
   add_sphinx_target(man polly)
 endif()
+if (${SPHINX_OUTPUT_DOCX})
+  add_sphinx_target(docx polly)
+endif()
   endif()
 endif()
 
Index: openmp/docs/conf.py
===
--- openmp/docs/conf.py
+++ openmp/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.intersphinx']
 
+if tags.has('builder-docx'):
+extensions.append('docxbuilder')
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
Index: openmp/docs/README.txt
===
--- openmp/docs/README.txt
+++ openmp/docs/README.txt
@@ -58,3 +58,23 @@
 
 Other projects (such as openmp)
 `https://.llvm.org/docs/CommandGuide/Foo.html`
+
+DOCX Output
+===
+
+Building the DOCX files is similar to building the HTML documentation. The
+primary difference is to use the `docx` makefile target, instead of the
+default (which is `html`). Sphinx then produces the DOCX files in the
+directory `/docs/docx/`.
+
+cd 
+cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_DOCX=true 
+make
+
+The correspondence between reStructuredText files and generated DOCX files is:
+
+LLVM project
+`llvm/docs/index.rst` <-> `/docs/docx/LLVM.docx` <-> `/share/doc/llvm/LLVM.docx`
+
+Other projects
+`/docs/index.rst` <-> `/tools//docs/docx/.docx` <-> `/share/doc//.docx`
Index: openmp/docs/CMakeLists.txt
===
--- openmp/docs/CMakeLists.txt
+++ openmp/docs/CMakeLists.txt
@@ -99,5 +99,8 @@
 if (${SPHINX_OUTPUT_MAN})
   add_sphinx_target(man openmp)
 endif()
+if (${SPHINX_OUTPUT_DOCX})
+  add_sphinx_target(docx openmp)
+endif()
   endif()
 endif()
Index: llvm/docs/make.bat
===
--- llvm/docs/make.bat
+++ llvm/docs/make.bat
@@ -31,6 +31,7 @@
 	echo.  text   to make text files
 	echo.  manto make manual pages
 	echo.  texinfoto make Texinfo files
+	echo.  docx   to make DOCX files
 	echo.  gettextto make PO message catalogs
 	echo.  changesto make an overview over all changed/added/deprecated items
 	echo.  linkcheck  to check all external links for integrity
@@ -153,6 +154,14 @@
 	goto end
 )
 
+if "%1" == "docx" (
+	%SPHINXBUILD% -b docx %ALLSPHINXOPTS% %BUILDDIR%/docx
+	if errorlevel 1 exit /b 1
+	echo.
+	echo.Build finished. The DOCX files are in %BUILDDIR%/docx.
+	goto end
+)
+
 if "%1" == "gettext" (
 	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
 	if errorlevel 1 exit /b 1
Index: llvm/docs/conf.py
===
--- llvm/docs/conf.py
+++ llvm/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions =

[PATCH] D106339: Add support to generate Sphinx DOCX documentation

2021-07-24 Thread Tony Tye via Phabricator via cfe-commits

t-tye updated this revision to Diff 361402.
t-tye added a comment.

Split change for clang makefile to elimnate Sphinx warnings of missing .rst 
fies when building man pages into D106734 .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106339/new/

https://reviews.llvm.org/D106339

Files:
  clang-tools-extra/docs/CMakeLists.txt
  clang-tools-extra/docs/conf.py
  clang-tools-extra/docs/make.bat
  clang/docs/CMakeLists.txt
  clang/docs/Makefile.sphinx
  clang/docs/analyzer/conf.py
  clang/docs/analyzer/make.bat
  clang/docs/conf.py
  clang/docs/make.bat
  flang/docs/CMakeLists.txt
  flang/docs/conf.py
  libcxx/docs/CMakeLists.txt
  libcxx/docs/Makefile.sphinx
  libcxx/docs/conf.py
  libunwind/docs/CMakeLists.txt
  libunwind/docs/conf.py
  lld/docs/CMakeLists.txt
  lld/docs/conf.py
  lld/docs/make.bat
  lld/docs/sphinx_intro.rst
  lldb/docs/CMakeLists.txt
  lldb/docs/conf.py
  lldb/docs/resources/build.rst
  llvm/cmake/modules/AddSphinxTarget.cmake
  llvm/cmake/modules/FindSphinx.cmake
  llvm/docs/CMake.rst
  llvm/docs/CMakeLists.txt
  llvm/docs/Makefile.sphinx
  llvm/docs/README.txt
  llvm/docs/conf.py
  llvm/docs/make.bat
  openmp/docs/CMakeLists.txt
  openmp/docs/README.txt
  openmp/docs/conf.py
  polly/docs/CMakeLists.txt
  polly/docs/conf.py

Index: polly/docs/conf.py
===
--- polly/docs/conf.py
+++ polly/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax']
 
+if tags.has('builder-docx'):
+extensions.append('docxbuilder')
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
Index: polly/docs/CMakeLists.txt
===
--- polly/docs/CMakeLists.txt
+++ polly/docs/CMakeLists.txt
@@ -98,6 +98,9 @@
 if (${SPHINX_OUTPUT_MAN})
   add_sphinx_target(man polly)
 endif()
+if (${SPHINX_OUTPUT_DOCX})
+  add_sphinx_target(docx polly)
+endif()
   endif()
 endif()
 
Index: openmp/docs/conf.py
===
--- openmp/docs/conf.py
+++ openmp/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.intersphinx']
 
+if tags.has('builder-docx'):
+extensions.append('docxbuilder')
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
Index: openmp/docs/README.txt
===
--- openmp/docs/README.txt
+++ openmp/docs/README.txt
@@ -1,5 +1,5 @@
 OpenMP LLVM Documentation
-==
+=
 
 OpenMP LLVM's documentation is written in reStructuredText, a lightweight
 plaintext markup language (file extension `.rst`). While the
@@ -14,11 +14,15 @@
 cd 
 cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true 
 make
-$BROWSER /projects/openmp/docs//html/index.html
+$BROWSER /projects/openmp/docs/html/index.html
 
-The mapping between reStructuredText files and generated documentation is
-`docs/Foo.rst` <-> `/projects/openmp/docs//html/Foo.html` <->
-`https://openmp.llvm.org/docs/Foo.html`.
+The correspondence between reStructuredText files and generated HTML pages is:
+
+LLVM project
+`llvm/docs/Foo.rst` <-> `/docs/html/Foo.html` <-> `/share/doc/llvm/html/Foo.html` <-> `https://llvm.org/docs/Foo.html`
+
+Other projects
+`/docs/Foo.rst` <-> `/tools//docs/html/Foo.html` <-> `/share/doc//html/Foo.html`<-> `https://.llvm.org/docs/Foo.html`
 
 If you are interested in writing new documentation, you will want to read
 `llvm/docs/SphinxQuickstartTemplate.rst` which will get you writing
@@ -26,7 +30,7 @@
 reStructuredText markup syntax.
 
 Manpage Output
-===
+==
 
 Building the manpages is similar to building the HTML documentation. The
 primary difference is to use the `man` makefile target, instead of the
@@ -36,10 +40,41 @@
 cd 
 cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_MAN=true 
 make
-man -l >build-dir>/docs/man/FileCheck.1
+man -l /docs/man/FileCheck.1
+
+The correspondence between reStructuredText files and generated man pages is:
+
+LLVM project
+`llvm/docs/CommandGuide/Foo.rst` <-> `/docs/man/Foo.1` <-> `/share/man/man1/Foo.1`
+
+Other projects
+`/docs/CommandGuide/Foo.rst` <-> `/tools//docs/man/Foo.1` <-> `/share/man/man1/Foo.1`
 
-The correspondence between .rst files and man pages is
-`docs/CommandGuide/Foo.rst` <-> `/projects/openmp/docs//man/Foo.1`.
 These .rst files are also included during HTML generation so they are also
-viewable online (as noted above) at e.g.
-`https://openmp.llvm.org/docs/CommandGuide/Foo.html`.
+viewable online:
+
+LLVM project

[PATCH] D106734: Eliminate clang man page generation warning for missing .rst files

2021-07-23 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added reviewers: kzhuravl, scott.linder.
Herald added a subscriber: mgorny.
t-tye requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

Update `clang/docs/CMakeLists.txt` for Sphinx man builder to do the same as for
the html builder to ensure .td generated .rst files are available to prevent
warnings of missing .rst files.

Document `SOURCE_DIR` named argument to `add_sphinx_target` function.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106734

Files:
  clang/docs/CMakeLists.txt
  llvm/cmake/modules/AddSphinxTarget.cmake


Index: llvm/cmake/modules/AddSphinxTarget.cmake
===
--- llvm/cmake/modules/AddSphinxTarget.cmake
+++ llvm/cmake/modules/AddSphinxTarget.cmake
@@ -20,6 +20,9 @@
 # ``project`` should be the project name
 #
 # Named arguments:
+# ``SOURCE_DIR`` source directory for Sphinx files. Defaults to
+#${CMAKE_CURRENT_SOURCE_DIR}.
+#
 # ``ENV_VARS`` should be a list of environment variables that should be set 
when
 #  running Sphinx. Each environment variable should be a string 
with
 #  the form KEY=VALUE.
Index: clang/docs/CMakeLists.txt
===
--- clang/docs/CMakeLists.txt
+++ clang/docs/CMakeLists.txt
@@ -103,19 +103,26 @@
 if (LLVM_ENABLE_SPHINX)
   include(AddSphinxTarget)
   if (SPHINX_FOUND)
+
+# Copy rst files to build directory before generating the
+# documentation.  Some of the rst files are generated, so they
+# only exist in the build directory.  Sphinx needs all files in
+# the same directory in order to generate the files, so we need to
+# copy all the non-generated rst files from the source to the build
+# directory before we run Sphinx.
+add_custom_target(copy-clang-rst-docs
+  COMMAND "${CMAKE_COMMAND}" -E copy_directory
+  "${CMAKE_CURRENT_SOURCE_DIR}"
+  "${CMAKE_CURRENT_BINARY_DIR}")
+
+# Generated files
+gen_rst_file_from_td(AttributeReference.rst -gen-attr-docs 
../include/clang/Basic/Attr.td copy-clang-rst-docs)
+gen_rst_file_from_td(DiagnosticsReference.rst -gen-diag-docs 
../include/clang/Basic/Diagnostic.td copy-clang-rst-docs)
+gen_rst_file_from_td(ClangCommandLineReference.rst -gen-opt-docs 
../include/clang/Driver/ClangOptionDocs.td copy-clang-rst-docs)
+
 if (${SPHINX_OUTPUT_HTML})
   add_sphinx_target(html clang SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}")
 
-  # Copy rst files to build directory before generating the html
-  # documentation.  Some of the rst files are generated, so they
-  # only exist in the build directory.  Sphinx needs all files in
-  # the same directory in order to generate the html, so we need to
-  # copy all the non-gnerated rst files from the source to the build
-  # directory before we run sphinx.
-  add_custom_target(copy-clang-rst-docs
-COMMAND "${CMAKE_COMMAND}" -E copy_directory
-"${CMAKE_CURRENT_SOURCE_DIR}"
-"${CMAKE_CURRENT_BINARY_DIR}")
   add_dependencies(docs-clang-html copy-clang-rst-docs)
 
   add_custom_command(TARGET docs-clang-html POST_BUILD
@@ -123,13 +130,12 @@
 "${CMAKE_CURRENT_SOURCE_DIR}/LibASTMatchersReference.html"
 "${CMAKE_CURRENT_BINARY_DIR}/html/LibASTMatchersReference.html")
 
-  # Generated files
-  gen_rst_file_from_td(AttributeReference.rst -gen-attr-docs 
../include/clang/Basic/Attr.td docs-clang-html)
-  gen_rst_file_from_td(DiagnosticsReference.rst -gen-diag-docs 
../include/clang/Basic/Diagnostic.td docs-clang-html)
-  gen_rst_file_from_td(ClangCommandLineReference.rst -gen-opt-docs 
../include/clang/Driver/ClangOptionDocs.td docs-clang-html)
 endif()
+
 if (${SPHINX_OUTPUT_MAN})
-  add_sphinx_target(man clang)
+  add_sphinx_target(man clang SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}")
+  add_dependencies(docs-clang-man copy-clang-rst-docs)
 endif()
+
   endif()
 endif()


Index: llvm/cmake/modules/AddSphinxTarget.cmake
===
--- llvm/cmake/modules/AddSphinxTarget.cmake
+++ llvm/cmake/modules/AddSphinxTarget.cmake
@@ -20,6 +20,9 @@
 # ``project`` should be the project name
 #
 # Named arguments:
+# ``SOURCE_DIR`` source directory for Sphinx files. Defaults to
+#${CMAKE_CURRENT_SOURCE_DIR}.
+#
 # ``ENV_VARS`` should be a list of environment variables that should be set when
 #  running Sphinx. Each environment variable should be a string with
 #  the form KEY=VALUE.
Index: clang/docs/CMakeLists.txt
===
--- clang/docs/CMakeLists.txt
+++ clang/docs/CMakeLists.txt
@@ -103,19 +103,26 @@
 if (LLVM_ENABLE_SPHINX)
   include(AddSphinxTarget)
   if (SPHINX_FOUND)
+
+# Copy rst files to

[PATCH] D106339: Add support to generate Sphinx DOCX documentation

2021-07-20 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added a reviewer: scott.linder.
Herald added subscribers: libcxx-commits, mgorny.
Herald added a reviewer: bollu.
Herald added a reviewer: MaskRay.
Herald added a reviewer: sscalpone.
Herald added a project: libunwind.
Herald added a reviewer: libunwind.
t-tye requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: cfe-commits, llvm-commits, openmp-commits, 
lldb-commits, sstefan1.
Herald added projects: clang, LLDB, libc++, OpenMP, LLVM, clang-tools-extra.
Herald added a reviewer: libc++.

Add support for Sphinx docx builder. Requires Sphinx docx builder to be
installed using `pip install docxbuilder`.

Added `SPHINX_OUTPUT_DOCX` CMake variable to control generation of docx
documentation. Defaults to `OFF`.

Added `_INSTALL_SPHINX_DOCX_DIR` CMake variable to control where docx
files are installed. Defaults to `share/doc//.docx`.

Documented new CMake variables in `llvm/docs/CMake.rst`.

Updated description of building documentation is `lld/docs/sphinx_intro.rst`,
`lldb/docs/resources/build.rst`, `llvm/docs/README.txt`, and
`openmp/docs/README.txt`,

Update `clang/docs/CMakeLists.txt` for Sphinx man and docx builders to ensure
.td generated .rst files are available to prevent warnings of missing rst files.
Minor CMake cleanups.

Added `docx` as an additional target for the documentation make.bat scripts.

Added `docx` as an additional target rule in the Makefile.sphinx files.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106339

Files:
  clang-tools-extra/docs/CMakeLists.txt
  clang-tools-extra/docs/conf.py
  clang-tools-extra/docs/make.bat
  clang/docs/CMakeLists.txt
  clang/docs/Makefile.sphinx
  clang/docs/analyzer/conf.py
  clang/docs/analyzer/make.bat
  clang/docs/conf.py
  clang/docs/make.bat
  flang/docs/CMakeLists.txt
  flang/docs/conf.py
  libcxx/docs/CMakeLists.txt
  libcxx/docs/Makefile.sphinx
  libcxx/docs/conf.py
  libunwind/docs/CMakeLists.txt
  libunwind/docs/conf.py
  lld/docs/CMakeLists.txt
  lld/docs/conf.py
  lld/docs/make.bat
  lld/docs/sphinx_intro.rst
  lldb/docs/CMakeLists.txt
  lldb/docs/conf.py
  lldb/docs/resources/build.rst
  llvm/cmake/modules/AddSphinxTarget.cmake
  llvm/cmake/modules/FindSphinx.cmake
  llvm/docs/CMake.rst
  llvm/docs/CMakeLists.txt
  llvm/docs/Makefile.sphinx
  llvm/docs/README.txt
  llvm/docs/conf.py
  llvm/docs/make.bat
  openmp/docs/CMakeLists.txt
  openmp/docs/README.txt
  openmp/docs/conf.py
  polly/docs/CMakeLists.txt
  polly/docs/conf.py

Index: polly/docs/conf.py
===
--- polly/docs/conf.py
+++ polly/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax']
 
+if tags.has('builder-docx'):
+extensions.append('docxbuilder')
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
Index: polly/docs/CMakeLists.txt
===
--- polly/docs/CMakeLists.txt
+++ polly/docs/CMakeLists.txt
@@ -98,6 +98,9 @@
 if (${SPHINX_OUTPUT_MAN})
   add_sphinx_target(man polly)
 endif()
+if (${SPHINX_OUTPUT_DOCX})
+  add_sphinx_target(docx polly)
+endif()
   endif()
 endif()
 
Index: openmp/docs/conf.py
===
--- openmp/docs/conf.py
+++ openmp/docs/conf.py
@@ -28,6 +28,9 @@
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.intersphinx']
 
+if tags.has('builder-docx'):
+extensions.append('docxbuilder')
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
Index: openmp/docs/README.txt
===
--- openmp/docs/README.txt
+++ openmp/docs/README.txt
@@ -1,5 +1,5 @@
 OpenMP LLVM Documentation
-==
+=
 
 OpenMP LLVM's documentation is written in reStructuredText, a lightweight
 plaintext markup language (file extension `.rst`). While the
@@ -14,11 +14,15 @@
 cd 
 cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true 
 make
-$BROWSER /projects/openmp/docs//html/index.html
+$BROWSER /projects/openmp/docs/html/index.html
 
-The mapping between reStructuredText files and generated documentation is
-`docs/Foo.rst` <-> `/projects/openmp/docs//html/Foo.html` <->
-`https://openmp.llvm.org/docs/Foo.html`.
+The correspondence between reStructuredText files and generated HTML pages is:
+
+LLVM project
+`llvm/docs/Foo.rst` <-> `/docs/html/Foo.html` <-> `/share/doc/llvm/html/Foo.html` <-> `https://llvm.org/docs/Foo.html`
+
+Other projects
+`/docs/Foo.rst` <-> `/tools//docs/html/Foo.html` <-> `/share/doc//html/Foo.html`<-> `https://.llvm.org/docs/Foo.html`
 
 If you are

[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

2021-04-15 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/tools/amdgpu-arch/CMakeLists.txt:9
+
+find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS 
/opt/rocm)
+if (NOT ${hsa-runtime64_FOUND})

JonChesterfield wrote:
> JonChesterfield wrote:
> > gregrodgers wrote:
> > > What happens when /opt/rocm is not available?   Again, we need a 
> > > cross-architecture mechanism to identify the offload-arch. 
> > Exactly the same as the amdgpu plugin. The cmake detection is char for char 
> > identical. This will look in CMAKE_INSTALL_PREFIX, which is where I install 
> > these libs when using trunk, and falls back to /opt/rocm which seems to be 
> > convenient for some users.
> Which may need revising at some point - I like installing hsa as if it was an 
> llvm subcomponent, but other people might want a different convention. As 
> long as we remember to change this file + amdgpu's cmake at the same time, 
> all good.
/opt/tocm will not work with the side-by-side ROCm installation which installs 
ROCm in directories with the version number. Should there be the ability to 
configure this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D98746: [clang][amdgpu] Use implicit code object default

2021-03-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

I vaguely remember that clang needed to know what code object it was going to 
request as it used that to either validate other options, or change the format 
of other passed cc1 options. If that is true, then I am not sure the defaulting 
approach works as clang will not know what the backend will be defaulting to. 
@yaxunl can you remember this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98746/new/

https://reviews.llvm.org/D98746

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D98746: [clang][amdgpu] Use implicit code object default

2021-03-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

I have no opinion, just making an observation and defer to @kzhuravl .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98746/new/

https://reviews.llvm.org/D98746

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D98746: [clang][amdgpu] Use implicit code object default

2021-03-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1115-1124
+  // Currently defaults to 3 in AMDGPUBaseInfo.cpp
+  // Using that default lets clang emit IR for amdgcn when llvm has been built
+  // without that target, provided the user wants this code object version
+  if (CodeObjVer != 3) {
+CmdArgs.insert(CmdArgs.begin() + 1,
+   Args.MakeArgString(Twine("--amdhsa-code-object-version=") +
+  Twine(CodeObjVer)));

This seem rather fragile. If the backend default changes and this code is not 
updated it will likely result in creating the wrong code object version.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98746/new/

https://reviews.llvm.org/D98746

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93525: [OpenMP] Add unbundling of archives containing bundled object files into device specific archives

2021-01-20 Thread Tony Tye via Phabricator via cfe-commits

t-tye requested changes to this revision.
t-tye added a comment.
This revision now requires changes to proceed.

In D93525#2509796 , @jdoerfert wrote:

>> At the moment this patch defines compatibility as exact string match of 
>> bundler entry ID.
>> [...]
>> Supporting target ID requires little more work and discussion.
>
> Let's get this in first, then revisit target ID support as we need it.

I do not think this patch should ignore target ID as that is now upstreamed and 
documented. What is involved in correcting the compatibility test to be correct 
by the target ID rules? There are examples of doing this in all the runtimes 
and I can help if that is useful.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93525/new/

https://reviews.llvm.org/D93525

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93525: [OpenMP] Add unbundling of archives containing bundled object files into device specific archives

2021-01-13 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In D93525#2495374 , @saiislam wrote:

> In D93525#2493752 , @t-tye wrote:
>
>> In D93525#2493024 , @yaxunl wrote:
>>
>>> can you document this in ClangOffloadBundler.rst ? I think we need a clear 
>>> description about how clang-offload-bundler knows which file in the .a file 
>>> belongs to which target.
>>
>> How does the .a relate to bundled code objects? Does the .a have a number of 
>> bundled code objects? If so wouldn't the identity of code objects be defined 
>> by the existing bundled code object ABI already documented? If the .a is a 
>> set of non-bundled code objects then defining how they are identified is not 
>> part of the clang-offload-bundler documentation as there are no bundled code 
>> objects involved. It would seem that the documentation belongs with the 
>> OpenMP runtime/compiler that is choosing to use .a files in this manner.
>
> Bundles (created using clang-offload-bundler) are passed to llvm-ar to create 
> an archive of bundled objects (*.a file). An archive can have bundles for 
> multiple device types. So, yes, the identity of code objects is defined by 
> the existing bundled code object ABI.
> This patch reads such an archive and produces a device-specific archive for 
> each of the target devices given as input. Each device-specific archive 
> contains all the code objects corresponding to that particular device and are 
> written as per llvm archive format.
>
> Here is a snippet of relevant lit run lines:
>
>   // RUN: %clang -O0 -target %itanium_abi_triple %s -c -o %t.o
>   
>   // RUN: echo 'Content of device file 1' > %t.tgt1
>   // RUN: clang-offload-bundler -type=o 
> -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 
> -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
>
>   // RUN: echo 'Content of device file 2' > %t.tgt2
>   // RUN: clang-offload-bundler -type=o 
> -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 
> -inputs=%t.o,%t.tgt2 -outputs=%t.abundle2.o
>
>   // RUN: llvm-ar cr %t.lib.a %t.abundle1.o %t.abundle2.o
>   
>   This patch ==>
>   // RUN: clang-offload-bundler -unbundle -type=a 
> -targets=openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.lib.a 
> -outputs=%t.devicelib.a
>   
>   %t.devicelib.a will contain all devices objects corresponding to gfx900
>
> Though my interest originates from OpenMP side, Device-specific Archive 
> Libraries created like this can be used by other offloading languages like 
> HIP, CUDA, and OpenCL. Pelase refer D81109  
> for the an earlier patch in the series of patches which will enable this.

The naming of code objects in a bundled code object includes the processor name 
and the settings for target features (see 
https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id and 
https://llvm.org/docs/AMDGPUUsage.html#target-id). The compatibility of code 
objects considers both target processor matching and target feature 
compatibility. Target features can have three settings: on, off and any. The 
compatibility is that each feature that is on/off must exactly match, but any 
will match either on or off.

So when unbundling an archive how is the desired code object being requested? 
How is it handling the target features? For example, if code objects that will 
be compatible with a feature being on is required, then matching code objects 
in the archive would be those that have that feature either on or any.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93525/new/

https://reviews.llvm.org/D93525

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93525: [OpenMP] Add unbundling of archives containing bundled object files into device specific archives

2021-01-12 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In D93525#2493024 , @yaxunl wrote:

> can you document this in ClangOffloadBundler.rst ? I think we need a clear 
> description about how clang-offload-bundler knows which file in the .a file 
> belongs to which target.

How does the .a relate to bundled code objects? Does the .a have a number of 
bundled code objects? If so wouldn't the identity of code objects be defined by 
the existing bundled code object ABI already documented? If the .a is a set of 
non-bundled code objects then defining how they are identified is not part of 
the clang-offload-bundler documentation as there are no bundled code objects 
involved. It would seem that the documentation belongs with the OpenMP 
runtime/compiler that is choosing to use .a files in this manner.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93525/new/

https://reviews.llvm.org/D93525

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D94338: [Clang][Docs] Fix ambiguity in clang-offload-bundler docs

2021-01-08 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94338/new/

https://reviews.llvm.org/D94338

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93648: Revert "[AMDGPU][HIP] Switch default DWARF version to 5"

2020-12-21 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93648/new/

https://reviews.llvm.org/D93648

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93398: [NFC] Use regex for code object version in hip tests

2020-12-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93398/new/

https://reviews.llvm.org/D93398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93398: [NFC] Use regex for code object version in hip tests

2020-12-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/test/Driver/hip-code-object-version.hip:56
 
-// V4: "-mllvm" "--amdhsa-code-object-version=4"
-// V4: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
+// VD: "-mllvm" "--amdhsa-code-object-version=4"
+// VD: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"

t-tye wrote:
> If the upstream default is being changed to 3 does this need to also be 3?
I can also help getting this landed in to internal branches.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93398/new/

https://reviews.llvm.org/D93398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93398: [NFC] Use regex for code object version in hip tests

2020-12-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/test/Driver/hip-code-object-version.hip:56
 
-// V4: "-mllvm" "--amdhsa-code-object-version=4"
-// V4: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"
+// VD: "-mllvm" "--amdhsa-code-object-version=4"
+// VD: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa--gfx906"

If the upstream default is being changed to 3 does this need to also be 3?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93398/new/

https://reviews.llvm.org/D93398

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93258: [amdgpu] Default to code object v3

2020-12-14 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93258/new/

https://reviews.llvm.org/D93258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93258: [amdgpu] Default to code object v3

2020-12-14 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:2874-2876
 .. warning::
   Code object V3 is not the default code object version emitted by this version
   of LLVM.

Move this to the "Code Object V4 Metadata" section.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93258/new/

https://reviews.llvm.org/D93258

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93181: [NFC][AMDGPU] Reformat AMD GPU targets in cuda.cpp

2020-12-13 Thread Tony Tye via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG5ad202ce8963: [NFC][AMDGPU] Reformat AMD GPU targets in 
cuda.cpp (authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93181/new/

https://reviews.llvm.org/D93181

Files:
  clang/lib/Basic/Cuda.cpp


Index: clang/lib/Basic/Cuda.cpp
===
--- clang/lib/Basic/Cuda.cpp
+++ clang/lib/Basic/Cuda.cpp
@@ -72,23 +72,34 @@
 SM(70), SM(72),  // Volta
 SM(75),  // Turing
 SM(80),  // Ampere
-GFX(600), // tahiti
-GFX(601), // pitcairn, verde
-GFX(602), // oland, hainan
-GFX(700), // kaveri
-GFX(701), // hawaii
-GFX(702), // 290,290x,R390,R390x
-GFX(703), // kabini mullins
-GFX(704), // bonaire
-GFX(705),
-GFX(801), // carrizo
-GFX(802), // tonga,iceland
-GFX(803), // fiji,polaris10
-GFX(805), // tongapro
-GFX(810), // stoney
-GFX(900), // vega, instinct
-GFX(902), GFX(904), GFX(906), GFX(908), GFX(909), GFX(90c),
-GFX(1010), GFX(1011), GFX(1012), GFX(1030), GFX(1031), GFX(1032), GFX(1033)
+GFX(600),  // gfx600
+GFX(601),  // gfx601
+GFX(602),  // gfx602
+GFX(700),  // gfx700
+GFX(701),  // gfx701
+GFX(702),  // gfx702
+GFX(703),  // gfx703
+GFX(704),  // gfx704
+GFX(705),  // gfx705
+GFX(801),  // gfx801
+GFX(802),  // gfx802
+GFX(803),  // gfx803
+GFX(805),  // gfx805
+GFX(810),  // gfx810
+GFX(900),  // gfx900
+GFX(902),  // gfx902
+GFX(904),  // gfx903
+GFX(906),  // gfx906
+GFX(908),  // gfx908
+GFX(909),  // gfx909
+GFX(90c),  // gfx90c
+GFX(1010), // gfx1010
+GFX(1011), // gfx1011
+GFX(1012), // gfx1012
+GFX(1030), // gfx1030
+GFX(1031), // gfx1031
+GFX(1032), // gfx1032
+GFX(1033), // gfx1033
 // clang-format on
 };
 #undef SM


Index: clang/lib/Basic/Cuda.cpp
===
--- clang/lib/Basic/Cuda.cpp
+++ clang/lib/Basic/Cuda.cpp
@@ -72,23 +72,34 @@
 SM(70), SM(72),  // Volta
 SM(75),  // Turing
 SM(80),  // Ampere
-GFX(600), // tahiti
-GFX(601), // pitcairn, verde
-GFX(602), // oland, hainan
-GFX(700), // kaveri
-GFX(701), // hawaii
-GFX(702), // 290,290x,R390,R390x
-GFX(703), // kabini mullins
-GFX(704), // bonaire
-GFX(705),
-GFX(801), // carrizo
-GFX(802), // tonga,iceland
-GFX(803), // fiji,polaris10
-GFX(805), // tongapro
-GFX(810), // stoney
-GFX(900), // vega, instinct
-GFX(902), GFX(904), GFX(906), GFX(908), GFX(909), GFX(90c),
-GFX(1010), GFX(1011), GFX(1012), GFX(1030), GFX(1031), GFX(1032), GFX(1033)
+GFX(600),  // gfx600
+GFX(601),  // gfx601
+GFX(602),  // gfx602
+GFX(700),  // gfx700
+GFX(701),  // gfx701
+GFX(702),  // gfx702
+GFX(703),  // gfx703
+GFX(704),  // gfx704
+GFX(705),  // gfx705
+GFX(801),  // gfx801
+GFX(802),  // gfx802
+GFX(803),  // gfx803
+GFX(805),  // gfx805
+GFX(810),  // gfx810
+GFX(900),  // gfx900
+GFX(902),  // gfx902
+GFX(904),  // gfx903
+GFX(906),  // gfx906
+GFX(908),  // gfx908
+GFX(909),  // gfx909
+GFX(90c),  // gfx90c
+GFX(1010), // gfx1010
+GFX(1011), // gfx1011
+GFX(1012), // gfx1012
+GFX(1030), // gfx1030
+GFX(1031), // gfx1031
+GFX(1032), // gfx1032
+GFX(1033), // gfx1033
 // clang-format on
 };
 #undef SM
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93181: [NFC][AMDGPU] Reformat AMD GPU targets in cuda.cpp

2020-12-13 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added reviewers: kzhuravl, ronlieb.
Herald added subscribers: dexonsmith, tpr, dstuttard, yaxunl.
t-tye requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D93181

Files:
  clang/lib/Basic/Cuda.cpp


Index: clang/lib/Basic/Cuda.cpp
===
--- clang/lib/Basic/Cuda.cpp
+++ clang/lib/Basic/Cuda.cpp
@@ -72,23 +72,34 @@
 SM(70), SM(72),  // Volta
 SM(75),  // Turing
 SM(80),  // Ampere
-GFX(600), // tahiti
-GFX(601), // pitcairn, verde
-GFX(602), // oland, hainan
-GFX(700), // kaveri
-GFX(701), // hawaii
-GFX(702), // 290,290x,R390,R390x
-GFX(703), // kabini mullins
-GFX(704), // bonaire
-GFX(705),
-GFX(801), // carrizo
-GFX(802), // tonga,iceland
-GFX(803), // fiji,polaris10
-GFX(805), // tongapro
-GFX(810), // stoney
-GFX(900), // vega, instinct
-GFX(902), GFX(904), GFX(906), GFX(908), GFX(909), GFX(90c),
-GFX(1010), GFX(1011), GFX(1012), GFX(1030), GFX(1031), GFX(1032), GFX(1033)
+GFX(600),  // gfx600
+GFX(601),  // gfx601
+GFX(602),  // gfx602
+GFX(700),  // gfx700
+GFX(701),  // gfx701
+GFX(702),  // gfx702
+GFX(703),  // gfx703
+GFX(704),  // gfx704
+GFX(705),  // gfx705
+GFX(801),  // gfx801
+GFX(802),  // gfx802
+GFX(803),  // gfx803
+GFX(805),  // gfx805
+GFX(810),  // gfx810
+GFX(900),  // gfx900
+GFX(902),  // gfx902
+GFX(904),  // gfx903
+GFX(906),  // gfx906
+GFX(908),  // gfx908
+GFX(909),  // gfx909
+GFX(90c),  // gfx90c
+GFX(1010), // gfx1010
+GFX(1011), // gfx1011
+GFX(1012), // gfx1012
+GFX(1030), // gfx1030
+GFX(1031), // gfx1031
+GFX(1032), // gfx1032
+GFX(1033), // gfx1033
 // clang-format on
 };
 #undef SM


Index: clang/lib/Basic/Cuda.cpp
===
--- clang/lib/Basic/Cuda.cpp
+++ clang/lib/Basic/Cuda.cpp
@@ -72,23 +72,34 @@
 SM(70), SM(72),  // Volta
 SM(75),  // Turing
 SM(80),  // Ampere
-GFX(600), // tahiti
-GFX(601), // pitcairn, verde
-GFX(602), // oland, hainan
-GFX(700), // kaveri
-GFX(701), // hawaii
-GFX(702), // 290,290x,R390,R390x
-GFX(703), // kabini mullins
-GFX(704), // bonaire
-GFX(705),
-GFX(801), // carrizo
-GFX(802), // tonga,iceland
-GFX(803), // fiji,polaris10
-GFX(805), // tongapro
-GFX(810), // stoney
-GFX(900), // vega, instinct
-GFX(902), GFX(904), GFX(906), GFX(908), GFX(909), GFX(90c),
-GFX(1010), GFX(1011), GFX(1012), GFX(1030), GFX(1031), GFX(1032), GFX(1033)
+GFX(600),  // gfx600
+GFX(601),  // gfx601
+GFX(602),  // gfx602
+GFX(700),  // gfx700
+GFX(701),  // gfx701
+GFX(702),  // gfx702
+GFX(703),  // gfx703
+GFX(704),  // gfx704
+GFX(705),  // gfx705
+GFX(801),  // gfx801
+GFX(802),  // gfx802
+GFX(803),  // gfx803
+GFX(805),  // gfx805
+GFX(810),  // gfx810
+GFX(900),  // gfx900
+GFX(902),  // gfx902
+GFX(904),  // gfx903
+GFX(906),  // gfx906
+GFX(908),  // gfx908
+GFX(909),  // gfx909
+GFX(90c),  // gfx90c
+GFX(1010), // gfx1010
+GFX(1011), // gfx1011
+GFX(1012), // gfx1012
+GFX(1030), // gfx1030
+GFX(1031), // gfx1031
+GFX(1032), // gfx1032
+GFX(1033), // gfx1033
 // clang-format on
 };
 #undef SM
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93018: [AMDGPU] Add missing targets to target-invalid-cpu-note.c

2020-12-12 Thread Tony Tye via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG7beee561e23d: [AMDGPU] Add missing targets to 
target-invalid-cpu-note.c (authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93018/new/

https://reviews.llvm.org/D93018

Files:
  clang/test/Misc/target-invalid-cpu-note.c


Index: clang/test/Misc/target-invalid-cpu-note.c
===
--- clang/test/Misc/target-invalid-cpu-note.c
+++ clang/test/Misc/target-invalid-cpu-note.c
@@ -73,9 +73,9 @@
 
 // RUN: not %clang_cc1 -triple r600--- -target-cpu not-a-cpu -fsyntax-only %s 
2>&1 | FileCheck %s --check-prefix R600
 // R600: error: unknown target CPU 'not-a-cpu'
-// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780, 
-// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar, 
-// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts, 
+// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780,
+// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar,
+// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts,
 // R600-SAME: caicos, aruba, cayman, turks
 
 
@@ -83,9 +83,11 @@
 // AMDGCN: error: unknown target CPU 'not-a-cpu'
 // AMDGCN: note: valid target CPU values are: gfx600, tahiti, gfx601, 
pitcairn, verde,
 // AMDGCN-SAME: gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702,
-// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, 
carrizo, 
+// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, 
carrizo,
 // AMDGCN-SAME: gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11,
-// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902
+// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, 
gfx906,
+// AMDGCN-SAME: gfx908, gfx909, gfx90c, gfx1010, gfx1011, gfx1012, gfx1030, 
gfx1031,
+// AMDGCN-SAME: gfx1032, gfx1033
 
 // RUN: not %clang_cc1 -triple wasm64--- -target-cpu not-a-cpu -fsyntax-only 
%s 2>&1 | FileCheck %s --check-prefix WEBASM
 // WEBASM: error: unknown target CPU 'not-a-cpu'


Index: clang/test/Misc/target-invalid-cpu-note.c
===
--- clang/test/Misc/target-invalid-cpu-note.c
+++ clang/test/Misc/target-invalid-cpu-note.c
@@ -73,9 +73,9 @@
 
 // RUN: not %clang_cc1 -triple r600--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix R600
 // R600: error: unknown target CPU 'not-a-cpu'
-// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780, 
-// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar, 
-// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts, 
+// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780,
+// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar,
+// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts,
 // R600-SAME: caicos, aruba, cayman, turks
 
 
@@ -83,9 +83,11 @@
 // AMDGCN: error: unknown target CPU 'not-a-cpu'
 // AMDGCN: note: valid target CPU values are: gfx600, tahiti, gfx601, pitcairn, verde,
 // AMDGCN-SAME: gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702,
-// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo, 
+// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo,
 // AMDGCN-SAME: gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11,
-// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902
+// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, gfx906,
+// AMDGCN-SAME: gfx908, gfx909, gfx90c, gfx1010, gfx1011, gfx1012, gfx1030, gfx1031,
+// AMDGCN-SAME: gfx1032, gfx1033
 
 // RUN: not %clang_cc1 -triple wasm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix WEBASM
 // WEBASM: error: unknown target CPU 'not-a-cpu'
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93017: [AMDGPU] Add missing targets to amdgpu-features.cl

2020-12-12 Thread Tony Tye via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG92ab6ed6672b: [AMDGPU] Add missing targets to 
amdgpu-features.cl (authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93017/new/

https://reviews.llvm.org/D93017

Files:
  clang/test/CodeGenOpenCL/amdgpu-features.cl


Index: clang/test/CodeGenOpenCL/amdgpu-features.cl
===
--- clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -7,7 +7,18 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX601 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx602 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX602 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX700 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx701 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX701 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx702 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX702 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx703 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX703 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx704 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX704 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx705 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX705 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX801 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx802 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX802 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx803 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX803 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx805 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX805 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx810 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX810 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx900 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx902 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX902 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX904 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX906 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX908 %s


Index: clang/test/CodeGenOpenCL/amdgpu-features.cl
===
--- clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -7,7 +7,18 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx602 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX602 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx701 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX701 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx702 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX702 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx703 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX703 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx704 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX704 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx705 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX705 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx802 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX802 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx803 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX803 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx805 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX805 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx810 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX810 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx900 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx902 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX902 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX908 %s

[PATCH] D93018: [AMDGPU] Add missing targets to target-invalid-cpu-note.c

2020-12-10 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added a reviewer: kzhuravl.
Herald added subscribers: tpr, dstuttard, yaxunl.
t-tye requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D93018

Files:
  clang/test/Misc/target-invalid-cpu-note.c


Index: clang/test/Misc/target-invalid-cpu-note.c
===
--- clang/test/Misc/target-invalid-cpu-note.c
+++ clang/test/Misc/target-invalid-cpu-note.c
@@ -73,9 +73,9 @@
 
 // RUN: not %clang_cc1 -triple r600--- -target-cpu not-a-cpu -fsyntax-only %s 
2>&1 | FileCheck %s --check-prefix R600
 // R600: error: unknown target CPU 'not-a-cpu'
-// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780, 
-// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar, 
-// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts, 
+// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780,
+// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar,
+// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts,
 // R600-SAME: caicos, aruba, cayman, turks
 
 
@@ -83,9 +83,11 @@
 // AMDGCN: error: unknown target CPU 'not-a-cpu'
 // AMDGCN: note: valid target CPU values are: gfx600, tahiti, gfx601, 
pitcairn, verde,
 // AMDGCN-SAME: gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702,
-// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, 
carrizo, 
+// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, 
carrizo,
 // AMDGCN-SAME: gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11,
-// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902
+// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, 
gfx906,
+// AMDGCN-SAME: gfx908, gfx909, gfx90c, gfx1010, gfx1011, gfx1012, gfx1030, 
gfx1031,
+// AMDGCN-SAME: gfx1032, gfx1033
 
 // RUN: not %clang_cc1 -triple wasm64--- -target-cpu not-a-cpu -fsyntax-only 
%s 2>&1 | FileCheck %s --check-prefix WEBASM
 // WEBASM: error: unknown target CPU 'not-a-cpu'


Index: clang/test/Misc/target-invalid-cpu-note.c
===
--- clang/test/Misc/target-invalid-cpu-note.c
+++ clang/test/Misc/target-invalid-cpu-note.c
@@ -73,9 +73,9 @@
 
 // RUN: not %clang_cc1 -triple r600--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix R600
 // R600: error: unknown target CPU 'not-a-cpu'
-// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780, 
-// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar, 
-// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts, 
+// R600: note: valid target CPU values are: r600, rv630, rv635, r630, rs780,
+// R600-SAME: rs880, rv610, rv620, rv670, rv710, rv730, rv740, rv770, cedar,
+// R600-SAME: palm, cypress, hemlock, juniper, redwood, sumo, sumo2, barts,
 // R600-SAME: caicos, aruba, cayman, turks
 
 
@@ -83,9 +83,11 @@
 // AMDGCN: error: unknown target CPU 'not-a-cpu'
 // AMDGCN: note: valid target CPU values are: gfx600, tahiti, gfx601, pitcairn, verde,
 // AMDGCN-SAME: gfx602, hainan, oland, gfx700, kaveri, gfx701, hawaii, gfx702,
-// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo, 
+// AMDGCN-SAME: gfx703, kabini, mullins, gfx704, bonaire, gfx705, gfx801, carrizo,
 // AMDGCN-SAME: gfx802, iceland, tonga, gfx803, fiji, polaris10, polaris11,
-// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902
+// AMDGCN-SAME: gfx805, tongapro, gfx810, stoney, gfx900, gfx902, gfx904, gfx906,
+// AMDGCN-SAME: gfx908, gfx909, gfx90c, gfx1010, gfx1011, gfx1012, gfx1030, gfx1031,
+// AMDGCN-SAME: gfx1032, gfx1033
 
 // RUN: not %clang_cc1 -triple wasm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix WEBASM
 // WEBASM: error: unknown target CPU 'not-a-cpu'
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93017: [AMDGPU] Add missing targets to amdgpu-features.cl

2020-12-10 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added a reviewer: kzhuravl.
Herald added subscribers: kerbowa, tpr, dstuttard, yaxunl, nhaehnle, jvesely.
t-tye requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D93017

Files:
  clang/test/CodeGenOpenCL/amdgpu-features.cl


Index: clang/test/CodeGenOpenCL/amdgpu-features.cl
===
--- clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -7,7 +7,18 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX601 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx602 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX602 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX700 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx701 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX701 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx702 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX702 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx703 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX703 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx704 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX704 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx705 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX705 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX801 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx802 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX802 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx803 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX803 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx805 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX805 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx810 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX810 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx900 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx902 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX902 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX904 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX906 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX908 %s


Index: clang/test/CodeGenOpenCL/amdgpu-features.cl
===
--- clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -7,7 +7,18 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx602 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX602 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx701 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX701 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx702 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX702 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx703 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX703 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx704 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX704 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx705 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX705 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx802 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX802 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx803 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX803 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx805 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX805 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx810 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX810 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx900 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx902 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX902 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX908 %s
___
cfe-commits

[PATCH] D92441: Add CLangOffloadBundler documentation to Clang index

2020-12-01 Thread Tony Tye via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa417cb086209: [NFC] Add CLangOffloadBundler documentation to 
Clang index (authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92441/new/

https://reviews.llvm.org/D92441

Files:
  clang/docs/index.rst


Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -79,6 +79,7 @@
ClangFormat
ClangFormatStyleOptions
ClangFormattedStatus
+   ClangOffloadBundler
 
 Design Documents
 


Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -79,6 +79,7 @@
ClangFormat
ClangFormatStyleOptions
ClangFormattedStatus
+   ClangOffloadBundler
 
 Design Documents
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D92441: Add CLangOffloadBundler documentation to Clang index

2020-12-01 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added reviewers: kzhuravl, scott.linder, b-sumner, tpr, rampitec, yaxunl, 
kerbowa.
Herald added subscribers: cfe-commits, arphaman.
Herald added a project: clang.
t-tye requested review of this revision.

Change-Id: I1a35bea10861cb5219e0dd13e14a86df56b38825


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D92441

Files:
  clang/docs/index.rst


Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -79,6 +79,7 @@
ClangFormat
ClangFormatStyleOptions
ClangFormattedStatus
+   ClangOffloadBundler
 
 Design Documents
 


Index: clang/docs/index.rst
===
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -79,6 +79,7 @@
ClangFormat
ClangFormatStyleOptions
ClangFormattedStatus
+   ClangOffloadBundler
 
 Design Documents
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D92115: AMDGPU - Add diagnostic for compiling modules with AMD HSA OS type and GFX 6 arch

2020-11-27 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:134
+  if (isAmdHsaOS() && getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS) {
+report_fatal_error("GFX6 (SI) ASICs does not support AMD HSA OS type \n",
+   false);

pvellien wrote:
> t-tye wrote:
> > rampitec wrote:
> > > "do not support". I would also drop "(SI)" from the message. Maybe even 
> > > better just "GFX6 does not support AMD HSA".
> > Make the message include the full target triple text so the user 
> > understands how to resolve the issue. For example:
> > 
> >   The target triple %s is not supported: the processor %s does not support 
> > the amdhsa OS
> > 
> > Do the r600 targets also produce a similar error message?
> > 
> > Is this really the right test? My understanding is that the issue is not 
> > that gfx60x does not support the amdhsa OS, but that it does not use the 
> > FLAT address space.
> > 
> > My understanding is that the current problem is that FLAT instructions are 
> > being used for the GLOBAL address space accesses. The use of FLAT 
> > instructions for the global address space was introduced after gfx60x was 
> > initially being supported on amdhsa. Originally BUFFER instructions that 
> > use an SRD that has a 0 base and are marked as addr64 where used for GLOBAL 
> > address space accesses. This was changed to use FLAT instructions due to 
> > later targets dropping the SRD addr64 support. I suspect it is that change 
> > that broke gfx60x as there were no tests to catch it.
> > 
> > So the real fix seems to find that change and make the code still use use 
> > BUFFER instructions for gfxx60x and FLAT instructions for gfx70x+. The 
> > tests can then be updated to test gfx60x for amdhsa but to omit the FLAT 
> > address space tests. The error would then indicate that the gfx60x does not 
> > support the FLAT address space (and that is not conditional on the OS). The 
> > documentation in AMDGPUUsage can state that gfx60x does not support the 
> > FLAT address space in the Address Space section. The Processor table can 
> > add a column for processor characteristics and mention that the gfx60x 
> > targets do not support the FLAT address space.
> Previously in the internal review process it mentioned that gfx60x does not 
> support HSA and agreed to add a diagnostic to report that GFX6 do not support 
> HSA OS type, @rampitec mentioned that SI ASICs cannot support HSA because we 
> can't able to map memory on SI as HSA requires so the user will just have 
> weird runtime failures. But based on your comment it seems like we have to 
> use MUBUF instructions for -mtriple=amdgcn-amd-amdhsa -mcpu=gfx60x 
> combination and use FLAT instructions for -mtriple=amdgcn-amd-amdhsa 
> -mcpu=gfx70x+. Is my understanding correct? If the compiler emits the MUBUF 
> instructions for global address space accesses, it is still required to 
> produce the error msg? 
In the early days of implementing HSA I believe we were bringing up on gfx6. It 
could not support all HSA features, but it did function with the parts it could 
support. So I was suggesting we restore the code to support what it did 
originally. That would mean using MUBUF for the GLOBAL address space like it 
used to do (is that code still present?).

The compiler can then report errors for the features it cannot support, which 
in this case is it cannot support instruction selection of the GENERIC address 
space on gfx6.

If you could find the commit that switched to using FLAT instructions to access 
the GLOBAL address space that will likely provide the necessary information to 
decide the best thing to do for this issue.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92115/new/

https://reviews.llvm.org/D92115

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D92115: AMDGPU - Add diagnostic for compiling modules with AMD HSA OS type and GFX 6 arch

2020-11-27 Thread Tony Tye via Phabricator via cfe-commits

t-tye requested changes to this revision.
t-tye added inline comments.
This revision now requires changes to proceed.

Comment at: llvm/docs/AMDGPUUsage.rst:2109-2112
+Caution:
+  AMD HSA Os is not supported in Southern Islands (GFX6) ASICs.
+
 For example:

This is not the right place for mentioning this. The Processor table would 
likely be a better place. It should be in terms of supporting the amdhsa ABI.

Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:62-72
+static AMDGPUSubtarget::Generation initializeGen(const Triple ,
+ StringRef GPU) {
+  if (GPU.contains("generic")) {
+return TT.getOS() == Triple::AMDHSA
+   ? AMDGPUSubtarget::Generation::SEA_ISLANDS
+   : AMDGPUSubtarget::Generation::SOUTHERN_ISLANDS;
+  } else {

I am not clear what this function is doing. It seems to be returning a 
generation unrelated to to the actual target generation to satisfy the one 
place it is called. If the target is not SEA_ISLANDS it seems incorrect to be 
returning SEA_ISLANDS. If this function is doing something special for the one 
place it is called perhaps it should be expanded there?

Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:134
+  if (isAmdHsaOS() && getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS) {
+report_fatal_error("GFX6 (SI) ASICs does not support AMD HSA OS type \n",
+   false);

rampitec wrote:
> "do not support". I would also drop "(SI)" from the message. Maybe even 
> better just "GFX6 does not support AMD HSA".
Make the message include the full target triple text so the user understands 
how to resolve the issue. For example:

  The target triple %s is not supported: the processor %s does not support the 
amdhsa OS

Do the r600 targets also produce a similar error message?

Is this really the right test? My understanding is that the issue is not that 
gfx60x does not support the amdhsa OS, but that it does not use the FLAT 
address space.

My understanding is that the current problem is that FLAT instructions are 
being used for the GLOBAL address space accesses. The use of FLAT instructions 
for the global address space was introduced after gfx60x was initially being 
supported on amdhsa. Originally BUFFER instructions that use an SRD that has a 
0 base and are marked as addr64 where used for GLOBAL address space accesses. 
This was changed to use FLAT instructions due to later targets dropping the SRD 
addr64 support. I suspect it is that change that broke gfx60x as there were no 
tests to catch it.

So the real fix seems to find that change and make the code still use use 
BUFFER instructions for gfxx60x and FLAT instructions for gfx70x+. The tests 
can then be updated to test gfx60x for amdhsa but to omit the FLAT address 
space tests. The error would then indicate that the gfx60x does not support the 
FLAT address space (and that is not conditional on the OS). The documentation 
in AMDGPUUsage can state that gfx60x does not support the FLAT address space in 
the Address Space section. The Processor table can add a column for processor 
characteristics and mention that the gfx60x targets do not support the FLAT 
address space.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92115/new/

https://reviews.llvm.org/D92115

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-11-01 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM provided the other files are updated as other reviews have mentioned.




Comment at: llvm/docs/AMDGPUUsage.rst:317
+   
  names.
+ ``gfx1033`` ``amdgcn``   dGPU  - wavefrontsize64  
   *TBA*
+  [off]

tpr wrote:
> t-tye wrote:
> > t-tye wrote:
> > > APU
> > This still needs fixing.
> It now says APU in the diff I'm looking at. Is there something else that 
> needs changing?
I see it now. Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-10-30 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test:61-64
+# RUN: yaml2obj %s -o %t -DCPU=GFX90C
+# RUN: llvm-readobj -h %t | FileCheck %s --match-full-lines -DFILE=%t 
-DCPU=GFX90C -DFLAGS=0x32
+
 # RUN: yaml2obj %s -o %t -DCPU=GFX1010

Seems this should be in D90419 rather than this review? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-10-30 Thread Tony Tye via Phabricator via cfe-commits

t-tye requested changes to this revision.
t-tye added inline comments.
This revision now requires changes to proceed.



Comment at: llvm/docs/AMDGPUUsage.rst:317
+   
  names.
+ ``gfx1033`` ``amdgcn``   dGPU  - wavefrontsize64  
   *TBA*
+  [off]

t-tye wrote:
> APU
This still needs fixing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90419: [AMDGPU] Add gfx90c target

2020-10-30 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90419/new/

https://reviews.llvm.org/D90419

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-10-30 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:317
+   
  names.
+ ``gfx1033`` ``amdgcn``   dGPU  - wavefrontsize64  
   *TBA*
+  [off]

APU


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D90364: [AMDGPU] Update AMD GPU documentation

2020-10-29 Thread Tony Tye via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG661797bd7633: [AMDGPU] Update AMD GPU documentation 
(authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90364/new/

https://reviews.llvm.org/D90364

Files:
  clang/docs/ClangCommandLineReference.rst
  clang/include/clang/Driver/Options.td
  llvm/docs/AMDGPUUsage.rst

Index: llvm/docs/AMDGPUUsage.rst
===
--- llvm/docs/AMDGPUUsage.rst
+++ llvm/docs/AMDGPUUsage.rst
@@ -1411,13 +1411,13 @@
  address address
  space   space
=== = ===  = ===
-   ``DW_ASPACE_none``  0x00  8   4Global*default address space*
-   ``DW_ASPACE_AMDGPU_generic``0x01  8   4Generic (Flat)
-   ``DW_ASPACE_AMDGPU_region`` 0x02  4   4Region (GDS)
-   ``DW_ASPACE_AMDGPU_local``  0x03  4   4Local (group/LDS)
+   ``DW_ASPACE_none``  0x00  64  32   Global*default address space*
+   ``DW_ASPACE_AMDGPU_generic``0x01  64  32   Generic (Flat)
+   ``DW_ASPACE_AMDGPU_region`` 0x02  32  32   Region (GDS)
+   ``DW_ASPACE_AMDGPU_local``  0x03  32  32   Local (group/LDS)
*Reserved*  0x04
-   ``DW_ASPACE_AMDGPU_private_lane``   0x05  4   4Private (Scratch) *focused lane*
-   ``DW_ASPACE_AMDGPU_private_wave``   0x06  4   4Private (Scratch) *unswizzled wavefront*
+   ``DW_ASPACE_AMDGPU_private_lane``   0x05  32  32   Private (Scratch) *focused lane*
+   ``DW_ASPACE_AMDGPU_private_wave``   0x06  32  32   Private (Scratch) *unswizzled wavefront*
=== = ===  = ===
 
 See :ref:`amdgpu-address-spaces` for information on the AMDGPU address spaces
Index: clang/include/clang/Driver/Options.td
===
--- clang/include/clang/Driver/Options.td
+++ clang/include/clang/Driver/Options.td
@@ -2468,28 +2468,25 @@
  HelpText<"Execution model (WebAssembly only)">;
 
 def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group,
-  HelpText<"Legacy option to specify code object v3 (AMDGPU only)">;
-def mno_code_object_v3_legacy : Flag<["-"], "mno-code-object-v3">, Group,
-  HelpText<"Legacy option to specify code object v2 (AMDGPU only)">;
+  HelpText<"Legacy option to specify code object ABI V2 (-mnocode-object-v3) or V3 (-mcode-object-v3) (AMDGPU only)">;
+def mno_code_object_v3_legacy : Flag<["-"], "mno-code-object-v3">, Group;
+
+def mcumode : Flag<["-"], "mcumode">, Group,
+  HelpText<"Specify CU (-mcumode) or WGP (-mno-cumode) wavefront execution mode (AMDGPU only)">;
+def mno_cumode : Flag<["-"], "mno-cumode">, Group;
 
-def mxnack : Flag<["-"], "mxnack">, Group,
-  HelpText<"Enable XNACK (AMDGPU only)">;
-def mno_xnack : Flag<["-"], "mno-xnack">, Group,
-  HelpText<"Disable XNACK (AMDGPU only)">;
 def msram_ecc : Flag<["-"], "msram-ecc">, Group,
-  HelpText<"Enable SRAM ECC (AMDGPU only)">;
-def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group,
-  HelpText<"Disable SRAM ECC (AMDGPU only)">;
+  HelpText<"Specify SRAM ECC mode (AMDGPU only)">;
+def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group;
 
-def mcumode : Flag<["-"], "mcumode">, Group,
-  HelpText<"CU wavefront execution mode is used (AMDGPU only)">;
-def mno_cumode : Flag<["-"], "mno-cumode">, Group,
-  HelpText<"WGP wavefront execution mode is used (AMDGPU only)">;
-
-def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">,
-  Group, HelpText<"Wavefront size 64 is used">;
-def mno_wavefrontsize64 : Flag<["-"], "mno-wavefrontsize64">,
-  Group, HelpText<"Wavefront size 32 is used">;
+def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">, Group,
+  HelpText<"Specify wavefront size 64 mode (AMDGPU only)">;
+def mno_wavefrontsize64 : Flag<["-"], "mno-wavefrontsize64">, Group,
+  HelpText<"Specify wavefront size 32 mode (AMDGPU only)">;
+
+def mxnack : Flag<["-"], "mxnack">, Group,
+  HelpText<"Specify XNACK mode (AMDGPU only)">;
+def mno_xnack : Flag<["-"], "mno-xnack">, Group;
 
 def faltivec : Flag<["-"], "faltivec">, Group, Flags<[DriverOption]>;
 def fno_altivec : Flag<["-"], "fno-altivec">, Group, Flags<[DriverOption]>;
Index: clang/docs/ClangCommandLineReference.rst
===
--- clang/docs/ClangCommandLineReference.rst
+++ clang/docs/ClangCommandLineReference.rst
@@ -1641,6 +1641,10 @@
 
 Allow device side init

[PATCH] D90364: [AMDGPU] Update AMD GPU documentation

2020-10-28 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added reviewers: kzhuravl, scott.linder.
Herald added subscribers: llvm-commits, cfe-commits, dang, kerbowa, s.egerton, 
simoncook, tpr, dstuttard, yaxunl, nhaehnle, jvesely.
Herald added projects: clang, LLVM.
t-tye requested review of this revision.
Herald added a subscriber: wdng.

- AMDGPUUsage.rst: Correct AMD GPU DWARF address space table address sizes 
which are in bits and not bytes.

- clang/.../Options.td: Improve description of AMD GPU options.

- Re-generate ClangComamndLineReference.rst from clang/.../Options.td .


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D90364

Files:
  clang/docs/ClangCommandLineReference.rst
  clang/include/clang/Driver/Options.td
  llvm/docs/AMDGPUUsage.rst

Index: llvm/docs/AMDGPUUsage.rst
===
--- llvm/docs/AMDGPUUsage.rst
+++ llvm/docs/AMDGPUUsage.rst
@@ -1411,13 +1411,13 @@
  address address
  space   space
=== = ===  = ===
-   ``DW_ASPACE_none``  0x00  8   4Global*default address space*
-   ``DW_ASPACE_AMDGPU_generic``0x01  8   4Generic (Flat)
-   ``DW_ASPACE_AMDGPU_region`` 0x02  4   4Region (GDS)
-   ``DW_ASPACE_AMDGPU_local``  0x03  4   4Local (group/LDS)
+   ``DW_ASPACE_none``  0x00  64  32   Global*default address space*
+   ``DW_ASPACE_AMDGPU_generic``0x01  64  32   Generic (Flat)
+   ``DW_ASPACE_AMDGPU_region`` 0x02  32  32   Region (GDS)
+   ``DW_ASPACE_AMDGPU_local``  0x03  32  32   Local (group/LDS)
*Reserved*  0x04
-   ``DW_ASPACE_AMDGPU_private_lane``   0x05  4   4Private (Scratch) *focused lane*
-   ``DW_ASPACE_AMDGPU_private_wave``   0x06  4   4Private (Scratch) *unswizzled wavefront*
+   ``DW_ASPACE_AMDGPU_private_lane``   0x05  32  32   Private (Scratch) *focused lane*
+   ``DW_ASPACE_AMDGPU_private_wave``   0x06  32  32   Private (Scratch) *unswizzled wavefront*
=== = ===  = ===
 
 See :ref:`amdgpu-address-spaces` for information on the AMDGPU address spaces
Index: clang/include/clang/Driver/Options.td
===
--- clang/include/clang/Driver/Options.td
+++ clang/include/clang/Driver/Options.td
@@ -2468,28 +2468,25 @@
  HelpText<"Execution model (WebAssembly only)">;
 
 def mcode_object_v3_legacy : Flag<["-"], "mcode-object-v3">, Group,
-  HelpText<"Legacy option to specify code object v3 (AMDGPU only)">;
-def mno_code_object_v3_legacy : Flag<["-"], "mno-code-object-v3">, Group,
-  HelpText<"Legacy option to specify code object v2 (AMDGPU only)">;
+  HelpText<"Legacy option to specify code object ABI V2 (-mnocode-object-v3) or V3 (-mcode-object-v3) (AMDGPU only)">;
+def mno_code_object_v3_legacy : Flag<["-"], "mno-code-object-v3">, Group;
+
+def mcumode : Flag<["-"], "mcumode">, Group,
+  HelpText<"Specify CU (-mcumode) or WGP (-mno-cumode) wavefront execution mode (AMDGPU only)">;
+def mno_cumode : Flag<["-"], "mno-cumode">, Group;
 
-def mxnack : Flag<["-"], "mxnack">, Group,
-  HelpText<"Enable XNACK (AMDGPU only)">;
-def mno_xnack : Flag<["-"], "mno-xnack">, Group,
-  HelpText<"Disable XNACK (AMDGPU only)">;
 def msram_ecc : Flag<["-"], "msram-ecc">, Group,
-  HelpText<"Enable SRAM ECC (AMDGPU only)">;
-def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group,
-  HelpText<"Disable SRAM ECC (AMDGPU only)">;
+  HelpText<"Specify SRAM ECC mode (AMDGPU only)">;
+def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group;
 
-def mcumode : Flag<["-"], "mcumode">, Group,
-  HelpText<"CU wavefront execution mode is used (AMDGPU only)">;
-def mno_cumode : Flag<["-"], "mno-cumode">, Group,
-  HelpText<"WGP wavefront execution mode is used (AMDGPU only)">;
-
-def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">,
-  Group, HelpText<"Wavefront size 64 is used">;
-def mno_wavefrontsize64 : Flag<["-"], "mno-wavefrontsize64">,
-  Group, HelpText<"Wavefront size 32 is used">;
+def mwavefrontsize64 : Flag<["-"], "mwavefrontsize64">, Group,
+  HelpText<"Specify wavefront size 64 mode (AMDGPU only)">;
+def mno_wavefrontsize64 : Flag<["-"], "mno-wavefrontsize64">, Group,
+  HelpText<"Specify wavefront size 32 mode (AMDGPU only)">;
+
+def mxnack : Flag<["-"], "mxnack">, Group,
+  HelpText<"Specify XNACK mode (AMDGPU only)">;
+def mno_xnack : Flag<["-"], "mno-xnack">, Group;
 
 def faltivec : Flag<["-"], "faltivec">, Group, Flags<[DriverOption]>;
 def fno_altivec : Flag<["-"], "fno-altivec">, Group,

[PATCH] D90212: [AMDGPU] Add missing support for targets

2020-10-27 Thread Tony Tye via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG598409782389: [AMDGPU] Add missing support for targets 
(authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90212/new/

https://reviews.llvm.org/D90212

Files:
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  llvm/lib/Object/ELFObjectFile.cpp
  llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
  llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test

Index: llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
===
--- llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
+++ llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
@@ -73,6 +73,9 @@
 # RUN: yaml2obj %s -o %t -DCPU=GFX1031
 # RUN: llvm-readobj -h %t | FileCheck %s --match-full-lines -DFILE=%t -DCPU=GFX1031 -DFLAGS=0x37
 
+# RUN: yaml2obj %s -o %t -DCPU=GFX1032
+# RUN: llvm-readobj -h %t | FileCheck %s --match-full-lines -DFILE=%t -DCPU=GFX1032 -DFLAGS=0x38
+
 --- !ELF
 FileHeader:
   Class:   ELFCLASS64
Index: llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
===
--- llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
+++ llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
@@ -1,115 +1,171 @@
 # RUN: yaml2obj --docnum=1 %s -o %t.o.1
 # RUN: llvm-readobj -S --file-headers %t.o.1 | FileCheck --check-prefixes=ELF-ALL,ELF-R600 %s
 # RUN: obj2yaml %t.o.1 | FileCheck --check-prefixes=YAML-R600 %s
+
 # RUN: yaml2obj --docnum=2 %s -o %t.o.2
 # RUN: llvm-readobj -S --file-headers %t.o.2 | FileCheck --check-prefixes=ELF-ALL,ELF-R630 %s
 # RUN: obj2yaml %t.o.2 | FileCheck --check-prefixes=YAML-R630 %s
+
 # RUN: yaml2obj --docnum=3 %s -o %t.o.3
 # RUN: llvm-readobj -S --file-headers %t.o.3 | FileCheck --check-prefixes=ELF-ALL,ELF-RS880 %s
 # RUN: obj2yaml %t.o.3 | FileCheck --check-prefixes=YAML-RS880 %s
+
 # RUN: yaml2obj --docnum=4 %s -o %t.o.4
 # RUN: llvm-readobj -S --file-headers %t.o.4 | FileCheck --check-prefixes=ELF-ALL,ELF-RV670 %s
 # RUN: obj2yaml %t.o.4 | FileCheck --check-prefixes=YAML-RV670 %s
+
 # RUN: yaml2obj --docnum=5 %s -o %t.o.5
 # RUN: llvm-readobj -S --file-headers %t.o.5 | FileCheck --check-prefixes=ELF-ALL,ELF-RV710 %s
 # RUN: obj2yaml %t.o.5 | FileCheck --check-prefixes=YAML-RV710 %s
+
 # RUN: yaml2obj --docnum=6 %s -o %t.o.6
 # RUN: llvm-readobj -S --file-headers %t.o.6 | FileCheck --check-prefixes=ELF-ALL,ELF-RV730 %s
 # RUN: obj2yaml %t.o.6 | FileCheck --check-prefixes=YAML-RV730 %s
+
 # RUN: yaml2obj --docnum=7 %s -o %t.o.7
 # RUN: llvm-readobj -S --file-headers %t.o.7 | FileCheck --check-prefixes=ELF-ALL,ELF-RV770 %s
 # RUN: obj2yaml %t.o.7 | FileCheck --check-prefixes=YAML-RV770 %s
+
 # RUN: yaml2obj --docnum=8 %s -o %t.o.8
 # RUN: llvm-readobj -S --file-headers %t.o.8 | FileCheck --check-prefixes=ELF-ALL,ELF-CEDAR %s
 # RUN: obj2yaml %t.o.8 | FileCheck --check-prefixes=YAML-CEDAR %s
+
 # RUN: yaml2obj --docnum=9 %s -o %t.o.9
 # RUN: llvm-readobj -S --file-headers %t.o.9 | FileCheck --check-prefixes=ELF-ALL,ELF-CYPRESS %s
 # RUN: obj2yaml %t.o.9 | FileCheck --check-prefixes=YAML-CYPRESS %s
+
 # RUN: yaml2obj --docnum=10 %s -o %t.o.10
 # RUN: llvm-readobj -S --file-headers %t.o.10 | FileCheck --check-prefixes=ELF-ALL,ELF-JUNIPER %s
 # RUN: obj2yaml %t.o.10 | FileCheck --check-prefixes=YAML-JUNIPER %s
+
 # RUN: yaml2obj --docnum=11 %s -o %t.o.11
 # RUN: llvm-readobj -S --file-headers %t.o.11 | FileCheck --check-prefixes=ELF-ALL,ELF-REDWOOD %s
 # RUN: obj2yaml %t.o.11 | FileCheck --check-prefixes=YAML-REDWOOD %s
+
 # RUN: yaml2obj --docnum=12 %s -o %t.o.12
 # RUN: llvm-readobj -S --file-headers %t.o.12 | FileCheck --check-prefixes=ELF-ALL,ELF-SUMO %s
 # RUN: obj2yaml %t.o.12 | FileCheck --check-prefixes=YAML-SUMO %s
+
 # RUN: yaml2obj --docnum=13 %s -o %t.o.13
 # RUN: llvm-readobj -S --file-headers %t.o.13 | FileCheck --check-prefixes=ELF-ALL,ELF-BARTS %s
 # RUN: obj2yaml %t.o.13 | FileCheck --check-prefixes=YAML-BARTS %s
+
 # RUN: yaml2obj --docnum=14 %s -o %t.o.14
 # RUN: llvm-readobj -S --file-headers %t.o.14 | FileCheck --check-prefixes=ELF-ALL,ELF-CAICOS %s
 # RUN: obj2yaml %t.o.14 | FileCheck --check-prefixes=YAML-CAICOS %s
+
 # RUN: yaml2obj --docnum=15 %s -o %t.o.15
 # RUN: llvm-readobj -S --file-headers %t.o.15 | FileCheck --check-prefixes=ELF-ALL,ELF-CAYMAN %s
 # RUN: obj2yaml %t.o.15 | FileCheck --check-prefixes=YAML-CAYMAN %s
+
 # RUN: yaml2obj --docnum=16 %s -o %t.o.16
 # RUN: llvm-readobj -S --file-headers %t.o.16 | FileCheck --check-prefixes=ELF-ALL,ELF-TURKS %s
 # RUN: obj2yaml %t.o.16 | FileCheck --check-prefixes=YAML-TURKS %s
+
 # RUN: yaml2obj --docnum=17 %s -o %t.o.17
 # RUN: llvm-readobj -S --file-headers %t.o.17 | FileCheck --check-prefixes=ELF-ALL,ELF-GFX600 %s
 # RUN: obj2yaml %t.o.17 | FileCheck --check-prefixes=YAML-GFX600 %s
+
 # RUN: yaml2obj

[PATCH] D90212: [AMDGPU] Add missing support for targets

2020-10-27 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added reviewers: kzhuravl, scott.linder, tpr.
Herald added subscribers: llvm-commits, cfe-commits, kerbowa, rupprecht, 
hiraditya, dstuttard, yaxunl, nhaehnle, jvesely, emaste.
Herald added a reviewer: espindola.
Herald added a reviewer: jhenderson.
Herald added projects: clang, LLVM.
t-tye requested review of this revision.
Herald added subscribers: MaskRay, wdng.

- Add missing tests.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D90212

Files:
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  llvm/lib/Object/ELFObjectFile.cpp
  llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
  llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test

Index: llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
===
--- llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
+++ llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
@@ -73,6 +73,9 @@
 # RUN: yaml2obj %s -o %t -DCPU=GFX1031
 # RUN: llvm-readobj -h %t | FileCheck %s --match-full-lines -DFILE=%t -DCPU=GFX1031 -DFLAGS=0x37
 
+# RUN: yaml2obj %s -o %t -DCPU=GFX1032
+# RUN: llvm-readobj -h %t | FileCheck %s --match-full-lines -DFILE=%t -DCPU=GFX1032 -DFLAGS=0x38
+
 --- !ELF
 FileHeader:
   Class:   ELFCLASS64
Index: llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
===
--- llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
+++ llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
@@ -1,115 +1,171 @@
 # RUN: yaml2obj --docnum=1 %s -o %t.o.1
 # RUN: llvm-readobj -S --file-headers %t.o.1 | FileCheck --check-prefixes=ELF-ALL,ELF-R600 %s
 # RUN: obj2yaml %t.o.1 | FileCheck --check-prefixes=YAML-R600 %s
+
 # RUN: yaml2obj --docnum=2 %s -o %t.o.2
 # RUN: llvm-readobj -S --file-headers %t.o.2 | FileCheck --check-prefixes=ELF-ALL,ELF-R630 %s
 # RUN: obj2yaml %t.o.2 | FileCheck --check-prefixes=YAML-R630 %s
+
 # RUN: yaml2obj --docnum=3 %s -o %t.o.3
 # RUN: llvm-readobj -S --file-headers %t.o.3 | FileCheck --check-prefixes=ELF-ALL,ELF-RS880 %s
 # RUN: obj2yaml %t.o.3 | FileCheck --check-prefixes=YAML-RS880 %s
+
 # RUN: yaml2obj --docnum=4 %s -o %t.o.4
 # RUN: llvm-readobj -S --file-headers %t.o.4 | FileCheck --check-prefixes=ELF-ALL,ELF-RV670 %s
 # RUN: obj2yaml %t.o.4 | FileCheck --check-prefixes=YAML-RV670 %s
+
 # RUN: yaml2obj --docnum=5 %s -o %t.o.5
 # RUN: llvm-readobj -S --file-headers %t.o.5 | FileCheck --check-prefixes=ELF-ALL,ELF-RV710 %s
 # RUN: obj2yaml %t.o.5 | FileCheck --check-prefixes=YAML-RV710 %s
+
 # RUN: yaml2obj --docnum=6 %s -o %t.o.6
 # RUN: llvm-readobj -S --file-headers %t.o.6 | FileCheck --check-prefixes=ELF-ALL,ELF-RV730 %s
 # RUN: obj2yaml %t.o.6 | FileCheck --check-prefixes=YAML-RV730 %s
+
 # RUN: yaml2obj --docnum=7 %s -o %t.o.7
 # RUN: llvm-readobj -S --file-headers %t.o.7 | FileCheck --check-prefixes=ELF-ALL,ELF-RV770 %s
 # RUN: obj2yaml %t.o.7 | FileCheck --check-prefixes=YAML-RV770 %s
+
 # RUN: yaml2obj --docnum=8 %s -o %t.o.8
 # RUN: llvm-readobj -S --file-headers %t.o.8 | FileCheck --check-prefixes=ELF-ALL,ELF-CEDAR %s
 # RUN: obj2yaml %t.o.8 | FileCheck --check-prefixes=YAML-CEDAR %s
+
 # RUN: yaml2obj --docnum=9 %s -o %t.o.9
 # RUN: llvm-readobj -S --file-headers %t.o.9 | FileCheck --check-prefixes=ELF-ALL,ELF-CYPRESS %s
 # RUN: obj2yaml %t.o.9 | FileCheck --check-prefixes=YAML-CYPRESS %s
+
 # RUN: yaml2obj --docnum=10 %s -o %t.o.10
 # RUN: llvm-readobj -S --file-headers %t.o.10 | FileCheck --check-prefixes=ELF-ALL,ELF-JUNIPER %s
 # RUN: obj2yaml %t.o.10 | FileCheck --check-prefixes=YAML-JUNIPER %s
+
 # RUN: yaml2obj --docnum=11 %s -o %t.o.11
 # RUN: llvm-readobj -S --file-headers %t.o.11 | FileCheck --check-prefixes=ELF-ALL,ELF-REDWOOD %s
 # RUN: obj2yaml %t.o.11 | FileCheck --check-prefixes=YAML-REDWOOD %s
+
 # RUN: yaml2obj --docnum=12 %s -o %t.o.12
 # RUN: llvm-readobj -S --file-headers %t.o.12 | FileCheck --check-prefixes=ELF-ALL,ELF-SUMO %s
 # RUN: obj2yaml %t.o.12 | FileCheck --check-prefixes=YAML-SUMO %s
+
 # RUN: yaml2obj --docnum=13 %s -o %t.o.13
 # RUN: llvm-readobj -S --file-headers %t.o.13 | FileCheck --check-prefixes=ELF-ALL,ELF-BARTS %s
 # RUN: obj2yaml %t.o.13 | FileCheck --check-prefixes=YAML-BARTS %s
+
 # RUN: yaml2obj --docnum=14 %s -o %t.o.14
 # RUN: llvm-readobj -S --file-headers %t.o.14 | FileCheck --check-prefixes=ELF-ALL,ELF-CAICOS %s
 # RUN: obj2yaml %t.o.14 | FileCheck --check-prefixes=YAML-CAICOS %s
+
 # RUN: yaml2obj --docnum=15 %s -o %t.o.15
 # RUN: llvm-readobj -S --file-headers %t.o.15 | FileCheck --check-prefixes=ELF-ALL,ELF-CAYMAN %s
 # RUN: obj2yaml %t.o.15 | FileCheck --check-prefixes=YAML-CAYMAN %s
+
 # RUN: yaml2obj --docnum=16 %s -o %t.o.16
 # RUN: llvm-readobj -S --file-headers %t.o.16 | FileCheck --check-prefixes=ELF-ALL,ELF-TURKS %s
 # RUN: obj2yaml %t.o.16 | FileCheck --check-prefixes=YAML-TURKS %s
+
 # RUN: yaml2obj --docnum=17 %s -o %t.o.17
 # RUN: llvm-readobj -S --file-headers %t.o.17 | FileCheck

[PATCH] D89636: [AMDGPU] Extend hip-toolchin-features.hip test

2020-10-19 Thread Tony Tye via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG89d71970cb82: [AMDGPU] Extend hip-toolchin-features.hip test 
(authored by t-tye).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89636/new/

https://reviews.llvm.org/D89636

Files:
  clang/test/Driver/hip-toolchain-features.hip


Index: clang/test/Driver/hip-toolchain-features.hip
===
--- clang/test/Driver/hip-toolchain-features.hip
+++ clang/test/Driver/hip-toolchain-features.hip
@@ -11,6 +11,8 @@
 
 // XNACK: {{.*}}clang{{.*}}"-target-feature" "+xnack"
 // NOXNACK: {{.*}}clang{{.*}}"-target-feature" "-xnack"
+// XNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+xnack"
+// NOXNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-xnack"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:sram-ecc+ %s \
@@ -21,6 +23,20 @@
 
 // SRAM: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc"
 // NOSRAM: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc"
+// SRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+sram-ecc"
+// NOTSRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-sram-ecc"
+
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mcumode  2>&1 | FileCheck %s -check-prefix=CUMODE
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mno-cumode  2>&1 | FileCheck %s -check-prefix=NOTCUMODE
+
+// CUMODE: {{.*}}clang{{.*}}"-target-feature" "+cumode"
+// NOTCUMODE: {{.*}}clang{{.*}}"-target-feature" "-cumode"
+// CUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+cumode"
+// NOTCUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-cumode"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:xnack+:sram-ecc+ %s \


Index: clang/test/Driver/hip-toolchain-features.hip
===
--- clang/test/Driver/hip-toolchain-features.hip
+++ clang/test/Driver/hip-toolchain-features.hip
@@ -11,6 +11,8 @@
 
 // XNACK: {{.*}}clang{{.*}}"-target-feature" "+xnack"
 // NOXNACK: {{.*}}clang{{.*}}"-target-feature" "-xnack"
+// XNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+xnack"
+// NOXNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-xnack"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:sram-ecc+ %s \
@@ -21,6 +23,20 @@
 
 // SRAM: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc"
 // NOSRAM: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc"
+// SRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+sram-ecc"
+// NOTSRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-sram-ecc"
+
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mcumode  2>&1 | FileCheck %s -check-prefix=CUMODE
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mno-cumode  2>&1 | FileCheck %s -check-prefix=NOTCUMODE
+
+// CUMODE: {{.*}}clang{{.*}}"-target-feature" "+cumode"
+// NOTCUMODE: {{.*}}clang{{.*}}"-target-feature" "-cumode"
+// CUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+cumode"
+// NOTCUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-cumode"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:xnack+:sram-ecc+ %s \
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D89636: [AMDGPU] Extend hip-toolchin-features.hip test

2020-10-17 Thread Tony Tye via Phabricator via cfe-commits

t-tye created this revision.
t-tye added reviewers: kzhuravl, scott.linder, rampitec.
Herald added subscribers: cfe-commits, tpr, dstuttard, yaxunl.
Herald added a project: clang.
t-tye requested review of this revision.
Herald added a subscriber: wdng.

- Extend hip-toolchin-features.hip to also check the lld attributes are passed 
correctly.

- Add check for cumode attributes.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89636

Files:
  clang/test/Driver/hip-toolchain-features.hip


Index: clang/test/Driver/hip-toolchain-features.hip
===
--- clang/test/Driver/hip-toolchain-features.hip
+++ clang/test/Driver/hip-toolchain-features.hip
@@ -11,6 +11,8 @@
 
 // XNACK: {{.*}}clang{{.*}}"-target-feature" "+xnack"
 // NOXNACK: {{.*}}clang{{.*}}"-target-feature" "-xnack"
+// XNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+xnack"
+// NOXNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-xnack"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:sram-ecc+ %s \
@@ -21,6 +23,20 @@
 
 // SRAM: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc"
 // NOSRAM: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc"
+// SRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+sram-ecc"
+// NOTSRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-sram-ecc"
+
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mcumode  2>&1 | FileCheck %s -check-prefix=CUMODE
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mno-cumode  2>&1 | FileCheck %s -check-prefix=NOTCUMODE
+
+// CUMODE: {{.*}}clang{{.*}}"-target-feature" "+cumode"
+// NOTCUMODE: {{.*}}clang{{.*}}"-target-feature" "-cumode"
+// CUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+cumode"
+// NOTCUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-cumode"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:xnack+:sram-ecc+ %s \


Index: clang/test/Driver/hip-toolchain-features.hip
===
--- clang/test/Driver/hip-toolchain-features.hip
+++ clang/test/Driver/hip-toolchain-features.hip
@@ -11,6 +11,8 @@
 
 // XNACK: {{.*}}clang{{.*}}"-target-feature" "+xnack"
 // NOXNACK: {{.*}}clang{{.*}}"-target-feature" "-xnack"
+// XNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+xnack"
+// NOXNACK: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-xnack"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:sram-ecc+ %s \
@@ -21,6 +23,20 @@
 
 // SRAM: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc"
 // NOSRAM: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc"
+// SRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+sram-ecc"
+// NOTSRAM: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-sram-ecc"
+
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mcumode  2>&1 | FileCheck %s -check-prefix=CUMODE
+// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx1010 %s \
+// RUN:   -mno-cumode  2>&1 | FileCheck %s -check-prefix=NOTCUMODE
+
+// CUMODE: {{.*}}clang{{.*}}"-target-feature" "+cumode"
+// NOTCUMODE: {{.*}}clang{{.*}}"-target-feature" "-cumode"
+// CUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=+cumode"
+// NOTCUMODE: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-cumode"
 
 // RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
 // RUN:   --cuda-gpu-arch=gfx908:xnack+:sram-ecc+ %s \
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D89487: [AMDGPU] gfx1032 target

2020-10-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:280

  names.
+ ``gfx1032`` ``amdgcn``   dGPU  - xnack   
*TBA*
+  [off]

foad wrote:
> xnack looks like a mistake here?
gfx1032 does not support XNACK. It should have the same target features as 
gfx1032.



Comment at: llvm/docs/AMDGPUUsage.rst:835
+ ``EF_AMDGPU_MACH_AMDGCN_GFX1032`` 0x038  ``gfx1032``
  *reserved*0x038  Reserved.
  *reserved*0x039  Reserved.

Delete as the line above is using this reserved number.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89487/new/

https://reviews.llvm.org/D89487

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D89484: [AMDGPU][HIP] Switch default DWARF version to 5

2020-10-15 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89484/new/

https://reviews.llvm.org/D89484

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D87858: [hip] Add HIP scope atomic ops.

2020-09-17 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/include/clang/Basic/Builtins.def:792-799
+ATOMIC_BUILTIN(__hip_atomic_compare_exchange_strong, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_exchange, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_add, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_and, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_or, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_xor, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_min, "v.", "t")

t-tye wrote:
> Should HIP also consider adding a __hip_atomic_load and __hip_atomic_store?
What is being done for the memory fences?

Does HIP have any interest in adding a memory order?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9161-9179
+  case SyncScope::HIPSingleThread:
+Name = "singlethread";
+break;
+  case SyncScope::HIPWavefront:
+  case SyncScope::OpenCLSubGroup:
+Name = "wavefront";
+break;

t-tye wrote:
> The HIP memory model uses a single happens-before relation for all address 
> spaces which is different to OpenCL. So these need to use the "*-one-as" 
> names.
> 
> Before committing this we should decide if we want to implement the OpenCL 
> address space fence support using the syncScope, in which case we may want to 
> adjust this name before we start using it.
Actually the other way round, OpenCL should be using the  "*-one-as" names 
except for the seq_cst memory ordering which is all address spaces.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87858/new/

https://reviews.llvm.org/D87858

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D87858: [hip] Add HIP scope atomic ops.

2020-09-17 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/include/clang/Basic/Builtins.def:792-799
+ATOMIC_BUILTIN(__hip_atomic_compare_exchange_strong, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_exchange, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_add, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_and, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_or, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_xor, "v.", "t")
+ATOMIC_BUILTIN(__hip_atomic_fetch_min, "v.", "t")

Should HIP also consider adding a __hip_atomic_load and __hip_atomic_store?



Comment at: clang/lib/CodeGen/TargetInfo.cpp:9161-9179
+  case SyncScope::HIPSingleThread:
+Name = "singlethread";
+break;
+  case SyncScope::HIPWavefront:
+  case SyncScope::OpenCLSubGroup:
+Name = "wavefront";
+break;

The HIP memory model uses a single happens-before relation for all address 
spaces which is different to OpenCL. So these need to use the "*-one-as" names.

Before committing this we should decide if we want to implement the OpenCL 
address space fence support using the syncScope, in which case we may want to 
adjust this name before we start using it.



Comment at: clang/lib/Sema/SemaChecking.cpp:4877
   bool IsPassedByAddress = false;
-  if (!IsC11 && !IsN) {
+  if (!IsC11 && !IsHIP && !IsN) {
 ByValType = Ptr->getType();

Does HIP have atomic types for this to be meaningful? I would expect that HIP 
should be treated the same as OpenCL which does not appear to be here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87858/new/

https://reviews.llvm.org/D87858

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D73651: [OpenCL][CUDA][HIP][SYCL] Add norecurse

2020-01-29 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.cpp:918
+  //
+  // ToDo: clang does not support CUDA/HIP dynamic parallelism, therefore
+  // CUDA/HIP kernel can be marked with norecurse. This may change in the

tra wrote:
> I believe dynamic parallelism is more like spawning a separate process, not 
> recursion, so we should be OK.
Right, dynamic parallelism is equivalent to spawn a new thread to execute the 
function. Recursion is only about the same thread calling the function. So I 
agree that the TODO about it can be removed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73651/new/

https://reviews.llvm.org/D73651



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70424: clang/AMDGPU: Fix default for frame-pointer attribute

2019-11-25 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

@scott.linder can answer about the -g question, but I would expect that the CFI 
is capable of describing the address of the CFA regardless of whether there is 
a frame pointer by simply knowing the constant offset from the stack pointer.

For AMDGPU it seems to me what we really have is an FP and we optimize away the 
SP since the stack grows low address to high address, and S32 points to the 
base of the frame, and not the top of the stack.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70424/new/

https://reviews.llvm.org/D70424



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D68578: [HIP] Fix device stub name

2019-10-08 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In D68578#1700652 , @tra wrote:

> In D68578#1698864 , @t-tye wrote:
>
> > From a source language point of view, the device function comprises the 
> > code that is launched as a grid. We need this fact to be present in the 
> > symbols used. Only the device function should have a symbol name matching 
> > the mangled name of the device function.
>
>
> What do you have in mind when you use 'symbol name' here? Is that a symbol as 
> seen by linker? If that's the case, do host and device share this name space 
> on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and 
> each GPU have their own spaces), so they never clash, but the kernel names 
> must have identical mangled name on host and all devices, so the host can 
> refer to the device-side kernel when it needs to launch it.

We want to support a heterogeneous gdb debugger for a single source programming 
language. We would like to follow the same conventions used by compilers that 
implement other languages supported by gdb. The debugger can use symbols to 
find functions. It supports unmangling them and using the unmangled name to 
indicate the source language function it corresponds to. We would like this to 
remain true. The stub is not the kernel function, it is a helper function that 
will launch the kernel. In many ways it is acting like other trampolines. 
Therefore, it should be named as a internal helper function. The debugger can 
chose what it wants to do with it, but it does not want to be confused into 
thinking it actually IS the kernel function. If the user sets a breakpoint in 
the code of the kernel function then that breakpoint should be hit by every 
instance of the kernel that is created by the dispatch. It should not be hit by 
the code that is initiatig the dispatch. If that is what the user wanted they 
would set a breakpoint at the statement that performs the dispatch launch.

Whether the kernel is present in the CPU or GPU code is s separate concept. If 
it is present in both, then both would have the same symbol as they are both 
implementing the kernel. The debugger would set a breakpoint in both as from a 
language execution model poit of view if either piece of code executes it 
corresponds to the same source language kernel.

> 
> 
>> It the device function has both a host and device implementation then both 
>> can have the source language function name for the symbol since both 
>> actually implement the device function. If the user asks to set a breakpoint 
>> in the device function then the debugger would set in both implementations 
>> so the user is notified when the source program executes the device 
>> function, regardless of which implementation is invoked. This is similar to 
>> the debugger setting a breakpoint in a function that is inlined into 
>> multiple places: the debugger sets breeakpoints in all the inlined places so 
>> the user can tstill think of the program debugging in terms of the source 
>> language semantics.
> 
> OK. This sounds like `__host__`/`__device__` function overloads and what 
> you're saying does make sense for that.

Right. Well its not really and overload, not a request to have instances of the 
kernel avaiable for either the CPU or GPU to execute. They are the same 
function, not different overloads.

> 
> 
>> In contrast, the stub is effectively part of the implementation of actually 
>> launching the device function. It should have a distinct name.
> 
> I'm not sure how the requirement of distinct name follows from the fact that 
> the stub is the host-side part of the device-side kernel? To me it looks like 
> an argument for them to have the same name so it's clear that they are both 
> part of the same function as written in the source.
> 
> The don't have to be different. CUDA (and HIP) does not allow overloading of 
> kernels, so the stub and the kernel can have identical names as in the 
> example of `__host__` and `__device__` overloads you've described above, only 
> now it's `__host__` stub + `__global__` kernel itself, instead of two 
> user-implemented functions. Debugger, of course, will need to know about that 
> to pick the stub or kernel as the breakpoint location, but that appears 
> doable.

As mentioned, the stub is not the host side part of the device side kernel. The 
stub is a means to launch the kernel. That launching could happen on the device 
(device enqueue), or on the host. The kernel itself could execute on the device 
or the host. There is the act of launching the kernel (the function call 
statement if you will), and the kernel instances that come into existence (the 
threads created to execute the body of the kernel according to the launch 
bounds presented at the launch statement).

The user may want to set a breakpoint at the launch statement, or in the body 
of the kernel. The language execution model treats those

[PATCH] D68578: [HIP] Fix device stub name

2019-10-07 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In D68578#1697898 , @tra wrote:

> In D68578#1697851 , @yaxunl wrote:
>
> > In D68578#1697822 , @tra wrote:
> >
> > > Could you elaborate on how exactly current implementation does not work?
> > >
> > > I would expect the kernel and the stub to be two distinct entities, as 
> > > far as debugger is concerned. It does have enough information to track 
> > > each independently (different address, .stub suffix, perhaps knowledge 
> > > whether it's device or host code). Without the details, it looks to me 
> > > that this is something that can and should be dealt with in the debugger. 
> > > I've asked the same question in D63335  
> > > but I don't think I've got a good answer.
> >
> >
> > HIP debugger is a branch of gdb and the changes to support HIP will be 
> > upstreamed. When users set break point on a kernel, they intend to set a 
> > break point on the real kernel, not the device stub function. The device 
> > stub function is only a compiler generated helper function to help launch 
> > the kernel. Therefore it should have a different name so that it does not 
> > interfere with the symbol resolution of the real kernel.
>
>
> I would agree that having distinct names for the device-side kernel and it's 
> host-side stub would probably make things easier for debugger. 
>  However, debugger does have access to mangled names and does see the '.stub' 
> suffix in the mangled name. I don't understand why it can't be considered to 
> disambiguate between the kernel and the stub? 
>  I'm clearly missing something here. Is there a chance to get someone from 
> the debugger team to chime in on this review directly?
>
> Also, I would not agree that `they intend to set a break point on the real 
> kernel` is the only scenario. E.g. quite often when I debug CUDA stuff, I do 
> only care about host-side things and I do want to set breakpoint on the stub, 
> so I can check kernel call parameters as they are passed to the kernel. It 
> would be great if there were a way to explicitly tell debugger whether we 
> want host-side stub or the kernel without having user to know how particular 
> compiler transforms the name. For the user both entities have the same name, 
> but distinct location and there should be a way to express that in the 
> debugger.

From a source language point of view, the device function comprises the code 
that is launched as a grid. We need this fact to be present in the symbols 
used. Only the device function should have a symbol name matching the mangled 
name of the device function. It the device function has both a host and device 
implementation then both can have the source language function name for the 
symbol since both actually implement the device function. If the user asks to 
set a breakpoint in the device function then the debugger would set in both 
implementations so the user is notified when the source program executes the 
device function, regardless of which implementation is invoked. This is similar 
to the debugger setting a breakpoint in a function that is inlined into 
multiple places: the debugger sets breeakpoints in all the inlined places so 
the user can tstill think of the program debugging in terms of the source 
language semantics.

In contrast, the stub is effectively part of the implementation of actually 
launching the device function. It should have a distinct name. The debugger can 
still be used to set a breakpoint in it, or to step into it. But that should be 
done in terms of the stub name. If the debugger wants to support source 
language specific intelligence it can provide a helper library that understands 
the stub names. This helper library (similar to the thread helper library) can 
be used by the debugger to present a cleaner language view to the user. In fact 
OpenMP has also done this and provides a helper library called OMPD that can be 
used by tools such as a debugger to hide OpenMP trampoline functions etc.

I am a little unclear what this patch is doing as it is mentioned that the 
mangled name has a _stub in it. My understanding is that the intention was to 
create a distinct unmangled name for the stub, and then mangle it so that the 
resulting symbol was a legal mangled name. It sounded like this was the 
preferred approach, and makes sense to me based on my current understanding. Am 
I understanding this correctly?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68578/new/

https://reviews.llvm.org/D68578

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59008: [AMDGPU] Switch default dwarf version to 5

2019-03-25 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In D59008#1442014 , @dblaikie wrote:

> In D59008#1441903 , @t-tye wrote:
>
> > LGTM
> >
> > Do we know the state of split DWARF and DWARF compression for DWARF 5 
> > (compared to DWARF 2)?
>
>
> State of them in what sense? Compression is pretty orthogonal to any DWARF 
> version - it's more about the container (ELF, etc) you use. Split DWARF is 
> non-standardly supported in pre-v5, and I think it's functioning in the 
> standards conformant v5 mode too.


I had heard mention that at least split DWARF may not be fully supported yet 
for DWARF 5 in LLVM. But maybe that is old news:-)


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59008/new/

https://reviews.llvm.org/D59008



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59008: [AMDGPU] Switch default dwarf version to 5

2019-03-25 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM

Do we know the state of split DWARF and DWARF compression for DWARF 5 (compared 
to DWARF 2)?


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59008/new/

https://reviews.llvm.org/D59008



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D58518: [HIP] change kernel stub name

2019-02-21 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

To clarify, I am saying that the stub does have a different name since it is 
conceptually part of the implementation of doing the call to the device 
function implementation, and is not in fact the the device function being 
called itself. However, when we generate code for a function that is present on 
both the host and device, both copies of the code are for the same source level 
function and so can have the same symbol name (which was a question that was 
asked).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58518/new/

https://reviews.llvm.org/D58518



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D58518: [HIP] change kernel stub name

2019-02-21 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

Yes this relates to supporting the debugger.

For the same function being present on both host and device, having the same 
name is correct as the debugger must set a breakpoint at both places. This is 
similar to needing to set a breakpoint at every place a function is inlined.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58518/new/

https://reviews.llvm.org/D58518



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D53153: [OpenCL] Mark kernel functions with default visibility

2018-11-05 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

In https://reviews.llvm.org/D53153#1288127, @rjmccall wrote:

> In https://reviews.llvm.org/D53153#1288112, @rjmccall wrote:
>
> > But do you want to support *dynamically* linking object files?  Because 
> > that's what visibility is about.
>
>
> To be specific, if you don't have multiple levels of linking — doing a slower 
> and relatively more expensive link to form a self-contained unit of code 
> distribution, then doing a faster link to form a runnable program from 
> multiple independently-distributed such units — then visibility isn't really 
> doing anything for you.

Currently the AMDGPU code objects are fully linked dynamically loader shared 
libraries. They are not relocatable code objects. This allows fewer and cheaper 
relocations and a smaller (dynsym) symbol table for the runtime loader.

https://reviews.llvm.org/D53153

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D53153: [OpenCL] Mark namespace scope variables and kernel functions with default visibility

2018-11-05 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM

Summary needs updating as now only being done for kernels and not namespace 
scope variables.


https://reviews.llvm.org/D53153



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D53223: AMDGPU: Add sram-ecc feature options

2018-11-05 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D53223



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions

2018-10-07 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

Another word commonly used across languages is "offload".


https://reviews.llvm.org/D52891



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D46472: [HIP] Support offloading by linker script

2018-05-17 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM except for minor suggestions.




Comment at: lib/CodeGen/CGCUDANV.cpp:361-373
+  if (IsHIP)
+FatbinConstantName = ".hip_fatbin";
+  else if (RelocatableDeviceCode)
 // TODO: Figure out how this is called on mac OS!
 FatbinConstantName = "__nv_relfatbin";
   else
 FatbinConstantName =

Should this all be inside an if (IsHip) so the names can be different for HIP 
and NV CUDA? Seems HIP should not be using names with VV_CUDA in them. In fact 
maybe the following IF should be included as well to consilidate the NV CUDA 
code and HIP code together.



Comment at: lib/CodeGen/CGCUDANV.cpp:394-398
   // Fatbin wrapper magic.
   Values.addInt(IntTy, 0x466243b1);
   // Fatbin version.
   Values.addInt(IntTy, 1);
   // Data.

Should HIP use the same magic value and version number? Perhaps this should 
also be moved into the IsHip IF.



Comment at: lib/CodeGen/CGCUDANV.cpp:438
 
 // void __cudaRegisterLinkedBinary%NVModuleID%(void (*)(void *), void *,
 // void *, void (*)(void **))

Update comment for HIP



Comment at: lib/Driver/ToolChains/CommonArgs.cpp:150
+// If the current tool chain refers to an OpenMP or HIP offloading host, we
+// should ignore inputs that refer to OpenMP offloading devices - they will
+// be embedded according to a proper linker script.

OpenMP -> OpenMP or HIP



Comment at: lib/Driver/ToolChains/CommonArgs.cpp:1321-1322
+
+  // Construct clang-offload-bundler command to bundle object files for
+  // for different GPU archs.
+  ArgStringList BundlerArgs;

for for -> for


https://reviews.llvm.org/D46472



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43911: [AMDGPU] Clean up old address space mapping and fix constant address space value

2018-03-02 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM except minor comment.




Comment at: lib/Basic/Targets/AMDGPU.cpp:41-49
 0, // Default
 1, // opencl_global
 3, // opencl_local
 4, // opencl_constant
 5, // opencl_private
 0, // opencl_generic
 1, // cuda_device

Could the values be the AddrSpace enumerators?



Comment at: lib/Basic/Targets/AMDGPU.cpp:53-61
 5, // Default
 1, // opencl_global
 3, // opencl_local
 4, // opencl_constant
 5, // opencl_private
 0, // opencl_generic
 1, // cuda_device

Could the values be the AddrSpace enumerators?



Comment at: lib/Basic/Targets/AMDGPU.cpp:254
   }
-  auto IsGenericZero = isGenericZero(Triple);
   resetDataLayout(getTriple().getArch() == llvm::Triple::amdgcn
+  ? DataLayoutStringAMDGCN

To be consistent should this be:

```
!isAMDGCN(getTriple())
```


https://reviews.llvm.org/D43911



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43340: Clean up AMDGCN tests

2018-02-15 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM Plus I think there was a test with giz in its name. Should that be renamed?


https://reviews.llvm.org/D43340



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36802: AMDGPU: Cleanup most of the macros

2018-02-14 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D36802



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43094: AMDGPU: Enable PIC by default for amdgcn

2018-02-14 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.

For now seems reasonable to fix amdgpu as PIC. If/when other clients of amdgpu 
have tool chains defined then can switch to controling in the toolchain 
isPICDefault() function.


https://reviews.llvm.org/D43094



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36802: AMDGPU: Cleanup most of the macros

2018-02-14 Thread Tony Tye via Phabricator via cfe-commits

t-tye requested changes to this revision.
t-tye added inline comments.
This revision now requires changes to proceed.



Comment at: lib/Basic/Targets/AMDGPU.cpp:362
+Builder.defineMacro(Twine("__") + Twine(GPUName));
+Builder.defineMacro(Twine("__") + Twine(GPUName) + Twine("__"));
+  }

b-sumner wrote:
> At the meeting we discussed defining every alias of the given GPU, rather 
> than only the mcpu name.   Were we going to proceed with that?
Subsequent discussion settled on only defining the canonical names. If it was 
useful to also have the alternative names a header file could be provided that 
defines those names in response to the canonical names being defined.



Comment at: lib/Basic/Targets/AMDGPU.cpp:336
+  if (GPU.Kind != GK_NONE) {
+Builder.defineMacro(Twine("__") + Twine(GPU.CanonicalName));
+Builder.defineMacro(Twine("__") + Twine(GPU.CanonicalName) + Twine("__"));

Subsequent discussion decided to drop this macro.


https://reviews.llvm.org/D36802



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D43171: [AMDGPU] Change constant addr space to 4 for clang

2018-02-12 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM. Other clean up can be done as a separate patch.




Comment at: test/CodeGenOpenCL/address-spaces.cl:37
+// SPIR: i32 addrspace(2)* %arg
+// GIZ: i32 addrspace(4)* %arg
 void f__c(__constant int *arg) {}

Suggest using the same name across all the OpenCL tests as some are using GIZ, 
some AMD and some AMDGCN. AMDGCN seems the clearer now that only have a single 
address space mapping? Similar comment for other places GIZ/AMD is being 
mentioned.





Comment at: test/CodeGenOpenCL/amdgpu-env-amdgiz.cl:4
 
-// CHECK: target datalayout = 
"e-p:64:64-p1:64:64-p2:64:64-p3:32:32-p4:32:32-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5"
+// CHECK: target datalayout = 
"e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5"
 void foo(void) {}

Should this test be renamed to amdgpu-env-amdgcn.cl now there is a single 
address space mapping (and delete any relating to the old mapping if they 
exist)?


https://reviews.llvm.org/D43171



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D42800: Let CUDA toolchain support amdgpu target

2018-02-05 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/Basic/Cuda.h:49-57
+  GFX700,
+  GFX701,
+  GFX800,
+  GFX801,
+  GFX802,
+  GFX803,
+  GFX810,

Should complete list of processors for the amdgcn architecture be included? See 
https://llvm.org/docs/AMDGPUUsage.html#processors .



Comment at: include/clang/Basic/Cuda.h:79
   COMPUTE_72,
+  COMPUTE_GCN,
 };

Suggest using amdgcn which matches the architecture name in 
https://llvm.org/docs/AMDGPUUsage.html#processors .



Comment at: lib/Basic/Targets/AMDGPU.h:85
 return TT.getEnvironmentName() == "amdgiz" ||
+   TT.getOS() == llvm::Triple::CUDA ||
TT.getEnvironmentName() == "amdgizcl";

We still want to use the amdhsa OS for amdgpu which currently supports the 
different environments. So can cuda simply support the same environments? Is 
the plan is to eliminate the environments and simply always use the default 
address space for generic so this code is no longer needed?


https://reviews.llvm.org/D42800



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D40045: AMDGPU/GCN: Bring processors in sync with AMDGPUUsage

2017-11-14 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM except for Bonaire.




Comment at: lib/Basic/Targets/AMDGPU.cpp:271
   .Case("gfx700", GK_GFX7)
   .Case("bonaire", GK_GFX7)
   .Case("kaveri", GK_GFX7)

D40051 proposes making Bonaire separate from Kaveri and named gfx704.


https://reviews.llvm.org/D40045



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D39877: AMDGPU/NFC: Move getAMDGPUTargetFeatures to AMDGPU toolchain

2017-11-09 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D39877



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D39878: AMDGPU: Add -mxnack/-mno-xnack options that set +/-xnack feature

2017-11-09 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D39878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D38770: AMDGPU: Use stricter bounds for workitem builtins

2017-10-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/Basic/TargetInfo.h:1060
+  /// \returns Maximum device supported OpenCL workgroup size.
+  virtual unsigned getOpenCLMaxWorkGroupSize(unsigned Dim) const {
+return 0;

Is this specifically tied to OpenCL or is it the target's maximum supported 
work-group size? There are other languages besides OpenCL that have the notion 
of work-groups, and would seem better to generalize this.



Comment at: lib/Basic/Targets/AMDGPU.h:73-74
 
+  // The maximum supported group size is 1024, but some runtimes currently only
+  // support 256.
+  unsigned MaxWorkGroupSize = 1024;

Probably want to update this comment as all runtimes are moving to support the 
full capabilities of the hardware which is 1024 for all current AMDGCN targets.


https://reviews.llvm.org/D38770



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36771: AMDGPU: add missing amdgcn processors and tests

2017-08-17 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D36771



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36802: AMDGPU: Insert devicename macros

2017-08-16 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/Basic/Targets/AMDGPU.cpp:364-367
+  if (GPUName.empty())
+return;
+
+  Builder.defineMacro(Twine("__") + Twine(GPUName) + Twine("__"));

Should this be the following since extra macros could be after it in the future:

```
if (!GPUName.empty())
  Builder.defineMacro(Twine("__") + Twine(GPUName) + Twine("__"));
```

Should we only be defining macros using the canonical target name (the one in 
column 1 of https://llvm.org/docs/AMDGPUUsage.html#processors) rather than the 
one specified on the command line?


https://reviews.llvm.org/D36802



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36580: [OpenCL] Support variable memory scope in atomic builtins

2017-08-15 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: cfe/trunk/lib/Frontend/InitPreprocessor.cpp:581
   // Define macros for the OpenCL memory scope.
   // The values should match clang SyncScope enum.
+  static_assert(

// The values should match clang AtomicScopeOpenCLModel::ID enum.


Repository:
  rL LLVM

https://reviews.llvm.org/D36580



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36580: [OpenCL] Support variable memory scope in atomic builtins

2017-08-13 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.

LGTM


https://reviews.llvm.org/D36580



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36580: [OpenCL] Support variable memory scope in atomic builtins

2017-08-13 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/CodeGen/CGAtomic.cpp:678
+  auto  = CGF.Builder;
+  auto Scopes = getAllLanguageSyncScopes();
+  llvm::DenseMap BB;

yaxunl wrote:
> t-tye wrote:
> > Should only the scopes that apply to the language be returned otherwise 
> > will be generating code for invalid (possibly duplicate ABI) values?
> getAllLanguageSyncScopes() only returns scope values for current language. I 
> will rename it to getRuntimeSyncScopeValuesForCurrentLanguage() to avoid 
> confusing.
Curretly getAllLanguageSyncScopes does not take a LangOpt so not sure how it 
knows what the language is, and did not see it checking that the language is 
OpenCL internally. For non-OpenCL languages do they still have to support 
system scope?


https://reviews.llvm.org/D36580



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36580: [OpenCL] Support variable memory scope in atomic builtins

2017-08-11 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM other than suggested documentation and static_assert/unreachable comments.




Comment at: lib/CodeGen/CGAtomic.cpp:696
+if (S != Default)
+  SI->addCase(Builder.getInt32(static_cast(S)), B);
+

rjmccall wrote:
> t-tye wrote:
> > Is it documented in the SyncScope enum that the enumerator values are in 
> > fact the values used for source language runtime values? Seems if other 
> > languages want to use scopes they may may have a different ordering. That 
> > would imply that there would be a function to map a SyncScope value to the 
> > value used by the source language. For OpenCL the mapping is identity.
> > 
> > The memory ordering has the isValidAtomicOrderingCABI() that does a similar 
> > thing.
> The values in the SyncScope enum are the source language values.  We already 
> have a step to translate them into LLVM values when we generate a native LLVM 
> construct.  To the extent that we call into a runtime instead, none of that 
> code has been written to be runtime-agnostic at all, and I've been assuming 
> that we're basically okay with that, at least for now.
That sounds reasonable to me. When/if another language comes along the task of 
mapping the multiple ABIs can be done then. Still wanted to make sure it is 
clear that the enum values in the SyncScope enum are documented as BEING the 
OpenCL runtime ABI values and not some arbitrary list as in other enums such as 
the address space enum. 


https://reviews.llvm.org/D36580



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36580: [OpenCL] Support variable memory scope in atomic builtins

2017-08-11 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/Basic/SyncScope.h:46
+  Scopes.push_back(SyncScope::OpenCLSubGroup);
+  return Scopes;
+}

Should there be an assert/static_assert in case SyncScope enum grows due to 
other languages?



Comment at: include/clang/Basic/SyncScope.h:59
+return "opencl_subgroup";
+  }
+}

Should there be a default/assert/static_assert to allow SyncScope enum to grow 
due to other languages?



Comment at: lib/CodeGen/CGAtomic.cpp:696
+if (S != Default)
+  SI->addCase(Builder.getInt32(static_cast(S)), B);
+

Is it documented in the SyncScope enum that the enumerator values are in fact 
the values used for source language runtime values? Seems if other languages 
want to use scopes they may may have a different ordering. That would imply 
that there would be a function to map a SyncScope value to the value used by 
the source language. For OpenCL the mapping is identity.

The memory ordering has the isValidAtomicOrderingCABI() that does a similar 
thing.


https://reviews.llvm.org/D36580



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-08-03 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.

LGTM


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-08-03 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: docs/LanguageExtensions.rst:1935
+builtin function, and are named with a ``__opencl_`` prefix.)
 
 Low-level ARM exclusive memory builtins

Should it also say:

```
The macros ``__OPENCL_MEMORY_SCOPE_WORK_ITEM``, 
``__OPENCL_MEMORY_SCOPE_WORK_GROUP``, ``__OPENCL_MEMORY_SCOPE_DEVICE``, 
``__OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES``, and 
``__OPENCL_MEMORY_SCOPE_SUB_GROUP`` are provided, with values corresponding to 
the enumerators of OpenCL's ``memory_scope`` enumeration.
```


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-08-02 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/Basic/SyncScope.h:23
+enum class SyncScope {
+  OpenCLWorkItem = 0,
+  OpenCLWorkGroup = 1,

The OpenCL workitem scope is only used for image fences and does not apply to 
atomic operations so not sure that it should be in this enumeration which is 
used only for memory atomics.



Comment at: lib/CodeGen/TargetInfo.cpp:7554-7555
+  switch (S) {
+  case SyncScope::OpenCLWorkItem:
+Name = "singlethread";
+break;

OpenCL only uses workitem for image fences which are not the same as atomic 
memory fences.

For image fences I don't think it would map to singlethread which is intended 
for signal handler support to ensure a signal handler has visibility of the 
updates done by a thread which is more of an optimization barrier. In contrast 
an image fence may need to flush caches to make the image and vector access 
paths coherent in a single thread.

Since this patch does not support fences probably want to leave workitem scope 
out. Current AMDGPU targets do not need to do anything for an OpenCL image 
fence, but other targets may need to generate an intrinsic since there is no 
LLVMIR instruction for this.



Comment at: lib/Headers/opencl-c.h:13145-13150
   memory_scope_work_item,
   memory_scope_work_group,
   memory_scope_device,
   memory_scope_all_svm_devices,
+#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups)
   memory_scope_sub_group

Do these have to have the same values as the SycnScope enumeration? Should that 
be ensured in a similar way to the memory_order enumeration?



Comment at: lib/Sema/SemaChecking.cpp:3103
+Diag(Scope->getLocStart(),
+ diag::err_atomic_op_has_non_constant_synch_scope)
+<< Scope->getSourceRange();

IIRC OpenCL allows the scope to be a runtime value. So will doing this will 
likely cause failures in conformance?


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

2017-07-25 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/Basic/Builtins.def:717
+ATOMIC_BUILTIN(__opencl_atomic_fetch_max, "v.", "t")
+
 #undef ATOMIC_BUILTIN

Will the OpenCL 2.0 memory fences also be supported which also have a memory 
order and memory scope?



Comment at: include/clang/Basic/SyncScope.h:25-29
+  SingleThread  = 0,
+  WorkGroup = 1,
+  Device= 2,
+  System= 3,
+  SubGroup  = 4,

Since the builtins are being named as __opencl then should these also be named 
as opencl_ since they are the memory scopes for OpenCL using the OpenCL numeric 
values?

If another language wants to use memory scopes, would it then add its own 
langx_* names in a similar way that is done for address spaces where the LangAS 
enumeration type has values for each distinct language. Each target is then 
responsible for mapping each language scope to the appropriate target specific 
scope as is done for address spaces?

If so then the builtins are really supporting the concept of memory scopes and 
are not language specific as this enumeration can support multiple languages in 
the same way as the LangAS enumeration supports multiple languages. If so the 
builtins would be better named to reflect this as @b-sumner suggested.


https://reviews.llvm.org/D28691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D32248: CodeGen: Cast alloca to expected address space

2017-05-15 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/CodeGen/CGDecl.cpp:1115
+  assert(T.getAddressSpace() == LangAS::Default ||
+ T.getQualifiers().hasTargetSpecificAddressSpace());
+  auto Addr = getTargetHooks().performAddrSpaceCast(*this,

Should allowing specifying an address space on a function local automatic 
variable in OpenCL be allowed? It seems generating LLVM IR that allocates the 
variable in the alloca address space then address space casting the pointer to 
some other address space is not what such a language feature is requesting.

It seems it is really requesting that the variable is allocated in a different 
address space. That could be indicated by putting a different address space on 
the alloca itself, but the builder only allows an alloca to use the default 
alloca address space. No target supports this ability.

So maybe the best solution is to make OpenCL be the same as C and not allow an 
address space qualifier on function scope automatic variables.

If that is done then this assert will only allow the language default address 
space.


https://reviews.llvm.org/D32248



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D33109: Enhance synchscope representation (clang)

2017-05-11 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D33109



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D32248: CodeGen: Cast alloca to expected address space

2017-05-08 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/CodeGen/CGDecl.cpp:1115
+  if (AddrTy->getAddressSpace() != ExpectedAddrSpace &&
+  Ty.getAddressSpace() != LangAS::opencl_constant) {
+address = Address(Builder.CreateAddrSpaceCast(Addr,

Anastasia wrote:
> Do you need to check this? I thought your change doesn't affect OpenCL 
> because alloca AS == private AS in OpenCL  and constant AS local objects are 
> treated as private const objects anyways.
Given patch D32977 that makes function local variables marked with the constant 
address space treated as static, will they ever be processed by this function 
now? If so then the extra test for opencl_constant would not be needed.


https://reviews.llvm.org/D32248



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D32248: CodeGen: Cast alloca to expected address space

2017-04-19 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/CodeGen/CGDecl.cpp:1105-1119
+  // Alloca always returns a pointer in alloca address space, which may
+  // be different from the type defined by the language. For example,
+  // in C++ the auto variables are in the default address space. Therefore
+  // cast alloca to the expected address space when necessary.
+  auto Addr = address.getPointer();
+  auto AddrTy = cast(Addr->getType());
+  auto ExpectedAddrSpace = 
CGM.getTypes().getVariableType(D)->getAddressSpace();

Is any assert done to ensure that it is legal to address space cast from 
variable address space to expected address space? Presumably the language, by 
definition, will only be causing legal casts. For example from alloca address 
space to generic (which includes the alloca address space).

For OpenCL, can you explain how the local variable can have the constant 
address space and use an alloca for allocation? Wouldn't a constant address 
space mean it was static and so should not be using alloca? And if it is using 
an alloca, how can it then be accessed as if it was in constant address space?



Comment at: lib/CodeGen/CodeGenTypes.h:200
+  /// Get the LLVM pointer type of a variable.
+  llvm::PointerType *getVariableType(VarDecl D);
+

Should the name reflect that the type returned is not the variable type, but a 
pointer to the variable type? For example, getVariablePointerType or 
getPointerToVariableType.


https://reviews.llvm.org/D32248



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31771: [AMDGPU] Temporarily change constant address space from 4 to 2 for the new address space mapping

2017-04-06 Thread Tony Tye via Phabricator via cfe-commits

t-tye added a comment.

LGTM




Comment at: lib/Basic/Targets.cpp:2083
 Local = 3;
-Constant  = 4;
+Constant  = 2;
 Private   = 5;

t-tye wrote:
> Since Constant is now the same regardless of the GIZ setting, should it be 
> moved to be a literal constant like the other values that do not change?
Disregard comment as was thinking of the LLVM way address spaces were being 
handled.


https://reviews.llvm.org/D31771



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31771: [AMDGPU] Temporarily change constant address space from 4 to 2 for the new address space mapping

2017-04-06 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

Other than one comment:

LGTM




Comment at: lib/Basic/Targets.cpp:2083
 Local = 3;
-Constant  = 4;
+Constant  = 2;
 Private   = 5;

Since Constant is now the same regardless of the GIZ setting, should it be 
moved to be a literal constant like the other values that do not change?


https://reviews.llvm.org/D31771



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-04-03 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-04-03 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/AST/Type.h:339-340
+auto Addr = getAddressSpace();
+if (Addr == 0)
+  return 0;
+return Addr - LangAS::target_first;

Anastasia wrote:
> yaxunl wrote:
> > t-tye wrote:
> > > Since you mention this is only used for  
> > > `__attribute__((address_space(n)))`, why is this check for 0 needed?
> > > 
> > > If it is needed then to match other places should it simply be:
> > > 
> > > ```
> > > if (Addr)
> > >   return Addr - LangAS::target_first;
> > > return 0;
> > > ```
> > It is for `__attribute__((address_space(n)))` and the default addr space 0.
> > 
> > For the default addr space 0, we want to print 0 instead of 
> > `-LangAS::target_first`.
> > 
> > I will make the change for matching other places.
> Could we use LangAS::Count instead?
I do not think the address space 0 should be returned as 0 as then it is 
impossible to distinguish between a type that has no address space attribute, 
and one that has an explicit address space attribute with the value 0.

But that seems to be a bug in the original code so I would suggest leaving this 
for now and fixing it as a separate patch. The diagnostic message should really 
be checking if an address space attribute was present (by checking for 0), and 
changing the working of the message accordingly.

Suggest add a TODO here to mention this which can be fixed in a later patch.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-04-03 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/Sema/SemaExprCXX.cpp:2055
+  << AllocType.getUnqualifiedType()
+  << AllocType.getQualifiers().getAddressSpacePrintValue();
   else if (getLangOpts().ObjCAutoRefCount) {

Would suggest renaming getAddressSpacePrintValue to 
getAddressSpaceAttributePrintValue since it only deals with address spaces 
coming from the` __attribute__((address_space(n)))`.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-04-03 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/AST/Type.h:339-340
+auto Addr = getAddressSpace();
+if (Addr == 0)
+  return 0;
+return Addr - LangAS::target_first;

Since you mention this is only used for  `__attribute__((address_space(n)))`, 
why is this check for 0 needed?

If it is needed then to match other places should it simply be:

```
if (Addr)
  return Addr - LangAS::target_first;
return 0;
```



Comment at: include/clang/Basic/DiagnosticSemaKinds.td:2451-2453
+def warn_attribute_address_space_negative : Warning<
+  "address space is negative">,
+  InGroup>;

Now the one questionable test has been fixed, should the handling of 
address_space attribute go back to it being an error if n is negative? That 
seems more logical.



Comment at: lib/AST/ASTContext.cpp:9555
+  // alloca.
+  if (AS == LangAS::Default && LangOpts.OpenCL)
+return getTargetInfo().getDataLayout().getAllocaAddrSpace();

To be consistent with other places should this simply be:
```
if (!AS && LangOpts.OpenCL)
```



Comment at: lib/AST/ASTContext.cpp:9556
+  if (AS == LangAS::Default && LangOpts.OpenCL)
+return getTargetInfo().getDataLayout().getAllocaAddrSpace();
+  if (AS >= LangAS::target_first)

An alternative to doing this would be to add an opencl_private to LangAS and 
have each target map it accordingly. Then this could be:

```
// If a target specific address space was specified, simply return it.
if (AS >= LangAS::target_first)
  return AS - LangAS::target_first;
// For OpenCL, only function local variables are not explicitly marked with
// an address space in the AST, so treat them as the OpenCL private address 
space.
if (!AS && LangOpts.OpenCL)
  AS = LangAS::opencl_private;
return (*AddrSpaceMap)[AS];
```
This seems to better express what is happening here. If no address space was 
specified, and the language is OpenCL, then treat it as OpenCL private and map 
it according to the target mapping.

If wanted to eliminate the LangAS::Default named constant then that would be 
possible as it is no longer being used by name. However, would need to ensure 
that the first named enumerators starts at 1 so that 0 is left as the "no value 
explicitly specified" value that each target must map to the target specific 
generic address space.



Comment at: lib/Sema/SemaExprCXX.cpp:2052
+  else if (AllocType.getQualifiers().getAddressSpace() &&
+  AllocType.getQualifiers().getAddressSpace() != 
LangAS::target_first)
 return Diag(Loc, diag::err_address_space_qualified_new)

Should this be `>= LangAS::target_first` ? Seems it is wanting to check if an 
`__attribute__((address_space(n)))` was specified.



Comment at: lib/Sema/SemaExprCXX.cpp:3123
+if (Pointee.getQualifiers().getAddressSpace() &&
+   Pointee.getQualifiers().getAddressSpace() != LangAS::target_first)
   return Diag(Ex.get()->getLocStart(),

Ditto.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-03-28 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: include/clang/AST/ASTContext.h:2319
 return AddrSpaceMapMangling || 
-   AS < LangAS::Offset || 
-   AS >= LangAS::Offset + LangAS::Count;
+   AS > LangAS::target_first;
   }

Should this be >= since it wants to return for all target address spaces, and 
target_first is the first one?



Comment at: include/clang/AST/Type.h:336-342
+  /// Get the address space value used in diagnostics.
+  unsigned getAddressSpacePrintValue() const {
+unsigned AS = getAddressSpace();
+if (AS >= LangAS::target_first)
+  return AS - LangAS::target_first;
+return AS;
+  }

Is this the right thing to do? This is making the CLANG named address spaces 
have the same numbers as the target-specific address space numbers which would 
seem rather confusing.

What do the diagnostics want to display? For example, they could display the 
address space as a human readable form such as "Default", "OpenCL-global", 
CUDA-device", "target-specific(5)", etc. To do that this function could take an 
iostream and a LangAS value and use << to write to the iostream.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-03-28 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/AST/ASTContext.cpp:9547-9555
+unsigned ASTContext::getTargetAddressSpace(unsigned AS) const {
+  // For OpenCL, the address space qualifier is 0 in AST.
+  if (AS == 0 && LangOpts.OpenCL)
+return getTargetInfo().getDataLayout().getAllocaAddrSpace();
+  if (AS < LangAS::Offset || AS >= LangAS::Offset + LangAS::Count)
+return AS;
+  else

yaxunl wrote:
> t-tye wrote:
> > yaxunl wrote:
> > > t-tye wrote:
> > > > Presumably this will not work if the user put an address space 
> > > > attribute on a variable with an address space of 0, since it is not 
> > > > possible to distinguish between a variable that never had an attribute 
> > > > and was function local, and one that has an explicit address space 
> > > > attribute specifying 0.
> > > > 
> > > > It would seem better if LangAS did not map the 0..LangAS::Offset to be 
> > > > target address spaces. Instead LangAS could use 0..LangAS::Count to be 
> > > > the CLANG explicitly specified values (reserving 0 to mean the default 
> > > > when none was specified), and values above LangAS::Count would map to 
> > > > the explicitly specified target address spaces. For example:
> > > > 
> > > > ```
> > > > namespace clang {
> > > >  
> > > >  namespace LangAS {
> > > >  
> > > >  /// \brief Defines the set of possible language-specific address 
> > > > spaces.
> > > >  ///
> > > >  /// This uses values above the language-specific address spaces to 
> > > > denote
> > > >  /// the target-specific address spaces biased by target_first.
> > > >  enum ID {
> > > >default = 0,
> > > >  
> > > >opencl_global,
> > > >opencl_local,
> > > >opencl_constant,
> > > >opencl_generic,
> > > >  
> > > >cuda_device,
> > > >cuda_constant,
> > > >cuda_shared,
> > > >  
> > > >Count,
> > > > 
> > > >target_first = Count
> > > >  };
> > > >  
> > > >  /// The type of a lookup table which maps from language-specific 
> > > > address spaces
> > > >  /// to target-specific ones.
> > > >  typedef unsigned Map[Count];
> > > >  
> > > >  }
> > > > ```
> > > > 
> > > > Then this function would be:
> > > > 
> > > > ```
> > > > unsigned ASTContext::getTargetAddressSpace(unsigned AS) const {
> > > >   if (AS == LangAS::default && LangOpts.OpenCL)
> > > > // For OpenCL, only function local variables are not explicitly 
> > > > marked with an
> > > > // address space in the AST, and these need to be the address space 
> > > > of alloca.
> > > > return getTargetInfo().getDataLayout().getAllocaAddrSpace();
> > > >   if (AS >= LangAS::target_first)
> > > > return AS - LangAS::target_first;
> > > >   else
> > > > return (*AddrSpaceMap)[AS];
> > > > }
> > > > ```
> > > > 
> > > > Each target AddrSpaceMap would map LangAS::default to that target's 
> > > > default generic address space since that matches most other languages.
> > > > 
> > > > The address space attribute would need a corresponding "+ 
> > > > LangAS::target_first" to the value it stored in the AST.
> > > > 
> > > > Then it is possible to definitively tell when an AST node has not had 
> > > > any address space specified as it will be the LangAS::default value.
> > > There is a lit test like this:
> > > 
> > > ```
> > > // RUN: %clang_cc1 %s -verify -pedantic -fsyntax-only
> > > 
> > > #define OPENCL_CONSTANT 8388354
> > > int __attribute__((address_space(OPENCL_CONSTANT))) c[3] = {0};
> > > 
> > > void foo() {
> > >   c[0] = 1; //expected-error{{read-only variable is not assignable}}
> > > }
> > > 
> > > ```
> > > It tries to set address space of opencl_constant through 
> > > `__attribute__((address_space(n)))`. If we "+ LangAS::target_first" to 
> > > the value before it is tored in the AST, we are not able to use 
> > > `__attribute__((address_space(n)))` to represent opencl_constant.
> > This seems a bit of a hack. It could be made to work by simply defining 
> > OPENCL_CONSTANT to be the negative value that would result in the correct 
> > LangAS value, which is pretty much what the test is doing anyway. Just 
> > seems conflating the default value with the first target address space 
> > value is undesirable as it prevents specifying target address space 0 as 
> > that gets treated differently than any other address space value.
> Clang will emit error if address space value is negative, but I can change it 
> to a warning.
I guess the observation is that without a change the proposed changes will make 
__attribute__((address_space(0))) behave in unexpected ways for some targets 
(for AMDGPU it would actually cause address space for private to be used). The 
suggested approach also seems cleaner as it explicitly defines a "default" 
value which does not overlay the target address space range of values.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-03-28 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/AST/ASTContext.cpp:9547-9555
+unsigned ASTContext::getTargetAddressSpace(unsigned AS) const {
+  // For OpenCL, the address space qualifier is 0 in AST.
+  if (AS == 0 && LangOpts.OpenCL)
+return getTargetInfo().getDataLayout().getAllocaAddrSpace();
+  if (AS < LangAS::Offset || AS >= LangAS::Offset + LangAS::Count)
+return AS;
+  else

yaxunl wrote:
> t-tye wrote:
> > Presumably this will not work if the user put an address space attribute on 
> > a variable with an address space of 0, since it is not possible to 
> > distinguish between a variable that never had an attribute and was function 
> > local, and one that has an explicit address space attribute specifying 0.
> > 
> > It would seem better if LangAS did not map the 0..LangAS::Offset to be 
> > target address spaces. Instead LangAS could use 0..LangAS::Count to be the 
> > CLANG explicitly specified values (reserving 0 to mean the default when 
> > none was specified), and values above LangAS::Count would map to the 
> > explicitly specified target address spaces. For example:
> > 
> > ```
> > namespace clang {
> >  
> >  namespace LangAS {
> >  
> >  /// \brief Defines the set of possible language-specific address spaces.
> >  ///
> >  /// This uses values above the language-specific address spaces to denote
> >  /// the target-specific address spaces biased by target_first.
> >  enum ID {
> >default = 0,
> >  
> >opencl_global,
> >opencl_local,
> >opencl_constant,
> >opencl_generic,
> >  
> >cuda_device,
> >cuda_constant,
> >cuda_shared,
> >  
> >Count,
> > 
> >target_first = Count
> >  };
> >  
> >  /// The type of a lookup table which maps from language-specific address 
> > spaces
> >  /// to target-specific ones.
> >  typedef unsigned Map[Count];
> >  
> >  }
> > ```
> > 
> > Then this function would be:
> > 
> > ```
> > unsigned ASTContext::getTargetAddressSpace(unsigned AS) const {
> >   if (AS == LangAS::default && LangOpts.OpenCL)
> > // For OpenCL, only function local variables are not explicitly marked 
> > with an
> > // address space in the AST, and these need to be the address space of 
> > alloca.
> > return getTargetInfo().getDataLayout().getAllocaAddrSpace();
> >   if (AS >= LangAS::target_first)
> > return AS - LangAS::target_first;
> >   else
> > return (*AddrSpaceMap)[AS];
> > }
> > ```
> > 
> > Each target AddrSpaceMap would map LangAS::default to that target's default 
> > generic address space since that matches most other languages.
> > 
> > The address space attribute would need a corresponding "+ 
> > LangAS::target_first" to the value it stored in the AST.
> > 
> > Then it is possible to definitively tell when an AST node has not had any 
> > address space specified as it will be the LangAS::default value.
> There is a lit test like this:
> 
> ```
> // RUN: %clang_cc1 %s -verify -pedantic -fsyntax-only
> 
> #define OPENCL_CONSTANT 8388354
> int __attribute__((address_space(OPENCL_CONSTANT))) c[3] = {0};
> 
> void foo() {
>   c[0] = 1; //expected-error{{read-only variable is not assignable}}
> }
> 
> ```
> It tries to set address space of opencl_constant through 
> `__attribute__((address_space(n)))`. If we "+ LangAS::target_first" to the 
> value before it is tored in the AST, we are not able to use 
> `__attribute__((address_space(n)))` to represent opencl_constant.
This seems a bit of a hack. It could be made to work by simply defining 
OPENCL_CONSTANT to be the negative value that would result in the correct 
LangAS value, which is pretty much what the test is doing anyway. Just seems 
conflating the default value with the first target address space value is 
undesirable as it prevents specifying target address space 0 as that gets 
treated differently than any other address space value.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31404: [OpenCL] Allow alloca return non-zero private pointer

2017-03-27 Thread Tony Tye via Phabricator via cfe-commits

t-tye added inline comments.



Comment at: lib/AST/ASTContext.cpp:9547-9555
+unsigned ASTContext::getTargetAddressSpace(unsigned AS) const {
+  // For OpenCL, the address space qualifier is 0 in AST.
+  if (AS == 0 && LangOpts.OpenCL)
+return getTargetInfo().getDataLayout().getAllocaAddrSpace();
+  if (AS < LangAS::Offset || AS >= LangAS::Offset + LangAS::Count)
+return AS;
+  else

Presumably this will not work if the user put an address space attribute on a 
variable with an address space of 0, since it is not possible to distinguish 
between a variable that never had an attribute and was function local, and one 
that has an explicit address space attribute specifying 0.

It would seem better if LangAS did not map the 0..LangAS::Offset to be target 
address spaces. Instead LangAS could use 0..LangAS::Count to be the CLANG 
explicitly specified values (reserving 0 to mean the default when none was 
specified), and values above LangAS::Count would map to the explicitly 
specified target address spaces. For example:

```
namespace clang {
 
 namespace LangAS {
 
 /// \brief Defines the set of possible language-specific address spaces.
 ///
 /// This uses values above the language-specific address spaces to denote
 /// the target-specific address spaces biased by target_first.
 enum ID {
   default = 0,
 
   opencl_global,
   opencl_local,
   opencl_constant,
   opencl_generic,
 
   cuda_device,
   cuda_constant,
   cuda_shared,
 
   Count,

   target_first = Count
 };
 
 /// The type of a lookup table which maps from language-specific address spaces
 /// to target-specific ones.
 typedef unsigned Map[Count];
 
 }
```

Then this function would be:

```
unsigned ASTContext::getTargetAddressSpace(unsigned AS) const {
  if (AS == LangAS::default && LangOpts.OpenCL)
// For OpenCL, only function local variables are not explicitly marked with 
an
// address space in the AST, and these need to be the address space of 
alloca.
return getTargetInfo().getDataLayout().getAllocaAddrSpace();
  if (AS >= LangAS::target_first)
return AS - LangAS::target_first;
  else
return (*AddrSpaceMap)[AS];
}
```

Each target AddrSpaceMap would map LangAS::default to that target's default 
generic address space since that matches most other languages.

The address space attribute would need a corresponding "+ LangAS::target_first" 
to the value it stored in the AST.

Then it is possible to definitively tell when an AST node has not had any 
address space specified as it will be the LangAS::default value.


https://reviews.llvm.org/D31404



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31210: [AMDGPU] Switch address space mapping by triple environment

2017-03-24 Thread Tony Tye via Phabricator via cfe-commits

t-tye accepted this revision.
t-tye added a comment.
This revision is now accepted and ready to land.

Just a couple of suggestions, otherwise:

LGTM




Comment at: lib/Basic/Targets.cpp:2015
 
-static const unsigned AMDGPUAddrSpaceMap[] = {
-  1,// opencl_global
-  3,// opencl_local
-  2,// opencl_constant
-  4,// opencl_generic
-  1,// cuda_device
-  2,// cuda_constant
-  3 // cuda_shared
+static LangAS::Map AMDGPUPrivateIsZeroMap = {
+1,  // opencl_global

Is there a reason this is no longer const?



Comment at: lib/Basic/Targets.cpp:2024
+};
+static LangAS::Map AMDGPUGenericIsZeroMap = {
+1,  // opencl_global

Same as above.



Comment at: lib/Basic/Targets.cpp:2357
+
+  AddrSpace AS;
 };

Could this be const since it is not changed after construction?


https://reviews.llvm.org/D31210



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D31210: [AMDGPU] Add new address space mapping

2017-03-24 Thread Tony Tye via Phabricator via cfe-commits

t-tye requested changes to this revision.
t-tye added a comment.
This revision now requires changes to proceed.

Also please upload as full diff.




Comment at: lib/Basic/Targets.cpp:2026-2069
+  struct AddrSpace {
+unsigned Generic, Global, Local, Constant, Private;
+bool IsGenericZero;
+AddrSpace(bool IsGenericZero_ = false){
+  reset(IsGenericZero_);
+}
+void reset(bool IsGenericZero_) {

Now that the target triple is being used to control the address space mapping, 
the mapping is only being set once. So suggest simplifying this and simply have 
two static arrays for the address spaces (like is done for the data layout), 
and then set AddrSpaceMap to the address of the appropriate array.



Comment at: lib/Basic/Targets.cpp:2119
 
-AddrSpaceMap = 
+AddrSpaceMap = AS.getMap();
 UseAddrSpaceMapMangling = true;

See comment above.


https://reviews.llvm.org/D31210



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

96 matches

Mail list logo