https://github.com/gbMattN updated https://github.com/llvm/llvm-project/pull/123595

>From 807c2c8be0517cbb1b9db890f48baeb6f226ba2f Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.n...@sony.com>
Date: Mon, 20 Jan 2025 11:02:06 +0000
Subject: [PATCH 1/9] [TySan] Add initial documentation

---
 clang/docs/TypeSanitizer.rst | 152 +++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100644 clang/docs/TypeSanitizer.rst

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
new file mode 100644
index 00000000000000..6b320f3bb1773d
--- /dev/null
+++ b/clang/docs/TypeSanitizer.rst
@@ -0,0 +1,152 @@
+================
+TypeSanitizer
+================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
+instrumentation module and a run-time library. The tool detects violations such as the use
+of an illegally cast pointer, or misuse of a union.
+
+The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
+
+Typical slowdown introduced by TypeSanitizer is about **4x** [[CHECK THIS]]. Typical memory overhead introduced by TypeSanitizer is about **9x**.
+
+How to build
+============
+
+Build LLVM/Clang with `CMake <https://llvm.org/docs/CMake.html>`_ and enable
+the ``compiler-rt`` runtime. An example CMake configuration that will allow
+for the use/testing of TypeSanitizer:
+
+.. code-block:: console
+
+    $ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" <path to source>/llvm
+
+Usage
+=====
+
+Compile and link your program with ``-fsanitize=type`` flag. The
+TypeSanitizer run-time library should be linked to the final executable, so
+make sure to use ``clang`` (not ``ld``) for the final link step. To
+get a reasonable performance add ``-O1`` or higher
+(`This may currently lead to false-negatives <https://github.com/llvm/llvm-project/issues/120855>`).
+TypeSanitizer by default doesn't print the full stack trace on error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1``
+to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and
+``-g``. To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination
+(``-fno-optimize-sibling-calls``).
+
+.. code-block:: console
+
+    % cat example_AliasViolation.c
+    int main(int argc, char **argv) {
+      int x = 100;
+      float *y = (float*)&x;
+      *y += 2.0f;  // Strict aliasing violation
+      return 0;
+    }
+
+    # Compile and link
+    % clang++ -g -fsanitize=type example_AliasViolation.cc
+
+If a strict aliasing violation is detected, the program will print an error message to stderr.
+The program won't terminate, which will allow you to detect many strict aliasing violations in one
+run.
+
+.. code-block:: console
+    % ./a.out
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b1145ff40 in main example_AliasViolation.c:4:10
+
+    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1146008a bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
+    WRITE of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
+        #0 0x5b3b11460089 in main example_AliasViolation.c:4:10
+
+Error terminology
+------------------
+
+There are some terms that may appear in TypeSanitizer errors that are derived from TBAA Metadata. This
+section hopes to provide a brief dictionary of these terms.
+
+* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
+  type ``char``.
+* ``type p[x]``: Sometimes a program could generate distinct TBAA metadata that resolve to the same name.
+  To make them unique, they have the character 'p' and a number prepended to their name.
+
+These terms are a result of non-user-facing processes, and not always self-explanatory. There is some
+interest in changing TypeSanitizer in the future to translate these terms before printing them to users.
+
+Sanitizer features
+==================
+
+``__has_feature(type_sanitizer)``
+------------------------------------
+
+In some cases one may need to execute different code depending on whether
+TypeSanitizer is enabled.
+:ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
+this purpose.
+
+.. code-block:: c
+
+    #if defined(__has_feature)
+    #  if __has_feature(type_sanitizer)
+    // code that builds only under TypeSanitizer
+    #  endif
+    #endif
+
+``__attribute__((no_sanitize("type")))``
+-----------------------------------------------
+
+Some code you may not want to be instrumented by TypeSanitizer. One may use the
+function attribute ``no_sanitize("type")`` to disable instrumenting type aliasing.
+Its possible, depending on what happens in non-instrumented code, that instrumented code
+emits false-positives/ false-negatives. This attribute may not be supported by other
+compilers, so we suggest to use it together with ``__has_feature(type_sanitizer)``.
+
+``__attribute__((disable_sanitizer_instrumentation))``
+--------------------------------------------------------
+
+The ``disable_sanitizer_instrumentation`` attribute can be applied to functions
+to prevent all kinds of instrumentation. As a result, it may introduce false
+positives and incorrect stack traces. Therefore, it should be used with care,
+and only if absolutely required; for example for certain code that cannot
+tolerate any instrumentation and resulting side-effects. This attribute
+overrides ``no_sanitize("type")``.
+
+Ignorelist
+----------
+
+TypeSanitizer supports ``src`` and ``fun`` entity types in
+:doc:`SanitizerSpecialCaseList`, that can be used to suppress aliasing
+violation reports in the specified source files or functions. Like
+with other methods of ignoring instrumentation, this can result in false
+positives/ false-negatives.
+
+Limitations
+-----------
+
+* TypeSanitizer uses more real memory than a native run. It uses 8 bytes of
+  shadow memory for each byte of user memory.
+* There are transformation passes which run before TypeSanitizer. If these
+  passes optimize out an aliasing violation, TypeSanitizer cannot catch it.
+* Currently, all instrumentation is inlined. This can result in a **15x**
+  (on average) increase in generated file size, and **3x** to **7x** increase
+  in compile time. In some documented cases this can cause the compiler to hang.
+  A fix for this is in the last stages of release.
+* Codebases that use unions and struct-initialized variables can see incorrect
+  results, as TypeSanitizer doesn't yet instrument these reliably.
+
+Current Status
+--------------
+
+TypeSanitizer is brand new, and still in development. There are some known
+issues, especially in areas where clang doesn't generate valid TBAA metadata.
+
+We are actively working on enhancing the tool --- stay tuned. Any help,
+issues, pull requests, ideas, is more than welcome.

>From 5c9d8f8176ebcf1bd3f1ef49ffb0e685c50d0749 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.n...@sony.com>
Date: Mon, 20 Jan 2025 11:41:35 +0000
Subject: [PATCH 2/9] Tweaks and edits

---
 clang/docs/TypeSanitizer.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 6b320f3bb1773d..ceb2fca37df904 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -33,8 +33,7 @@ Usage
 Compile and link your program with ``-fsanitize=type`` flag. The
 TypeSanitizer run-time library should be linked to the final executable, so
 make sure to use ``clang`` (not ``ld``) for the final link step. To
-get a reasonable performance add ``-O1`` or higher
-(`This may currently lead to false-negatives <https://github.com/llvm/llvm-project/issues/120855>`).
+get a reasonable performance add ``-O1`` or higher.
 TypeSanitizer by default doesn't print the full stack trace on error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1``
 to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and
 ``-g``. To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination
@@ -70,8 +69,9 @@ run.
 Error terminology
 ------------------
 
-There are some terms that may appear in TypeSanitizer errors that are derived from TBAA Metadata. This
-section hopes to provide a brief dictionary of these terms.
+There are some terms that may appear in TypeSanitizer errors that are derived from
+`TBAA Metadata <https://llvm.org/docs/LangRef.html#tbaa-metadata>`. This section hopes to provide a
+brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
   type ``char``.
@@ -105,7 +105,7 @@ this purpose.
 
 Some code you may not want to be instrumented by TypeSanitizer. One may use the
 function attribute ``no_sanitize("type")`` to disable instrumenting type aliasing.
-Its possible, depending on what happens in non-instrumented code, that instrumented code
+It is possible, depending on what happens in non-instrumented code, that instrumented code
 emits false-positives/ false-negatives. This attribute may not be supported by other
 compilers, so we suggest to use it together with ``__has_feature(type_sanitizer)``.
 
@@ -138,7 +138,7 @@ Limitations
 * Currently, all instrumentation is inlined. This can result in a **15x**
   (on average) increase in generated file size, and **3x** to **7x** increase
   in compile time. In some documented cases this can cause the compiler to hang.
-  A fix for this is in the last stages of release.
+  There are plans to improve this in the future.
 * Codebases that use unions and struct-initialized variables can see incorrect
   results, as TypeSanitizer doesn't yet instrument these reliably.
 

>From 3645fc18e198d0642543b002f1853e983dab1b65 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.n...@sony.com>
Date: Mon, 20 Jan 2025 15:17:54 +0000
Subject: [PATCH 3/9] Fixed error in code block

---
 clang/docs/TypeSanitizer.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index ceb2fca37df904..20d0fc71775237 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -57,6 +57,7 @@ The program won't terminate, which will allow you to detect many strict aliasing
 run.
 
 .. code-block:: console
+
     % ./a.out
     ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
     READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int

>From 3b27cf7b653b52d89d669db7b59f96a0ea719d03 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.n...@sony.com>
Date: Mon, 20 Jan 2025 15:31:01 +0000
Subject: [PATCH 4/9] Add TySan links to other doc pages

---
 clang/docs/UsersManual.rst | 3 +++
 clang/docs/index.rst       | 1 +
 2 files changed, 4 insertions(+)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 260e84910c6f78..a56c9425ebb757 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2103,7 +2103,10 @@ are listed below.
 
       ``-fsanitize=undefined``: :doc:`UndefinedBehaviorSanitizer`,
       a fast and compatible undefined behavior checker.
+   -  .. _opt_fsanitize_type:
+      ``-fsanitize=type``: :doc:`TypeSanitizer`, a detector for strict
+      aliasing violations.
    -  ``-fsanitize=dataflow``: :doc:`DataFlowSanitizer`, a general data
       flow analysis.
    -  ``-fsanitize=cfi``: :doc:`control flow integrity <ControlFlowIntegrity>`
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index cc070059eede5d..26cc08e23a5762 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -35,6 +35,7 @@ Using Clang as a Compiler
    UndefinedBehaviorSanitizer
    DataFlowSanitizer
    LeakSanitizer
+   TypeSanitizer
    RealtimeSanitizer
    SanitizerCoverage
    SanitizerStats

>From 8e3fbe17edbc6a8dd429743a8037b93d51deeb66 Mon Sep 17 00:00:00 2001
From: gbMattN <146744444+gbma...@users.noreply.github.com>
Date: Mon, 20 Jan 2025 16:45:44 +0000
Subject: [PATCH 5/9] Update clang/docs/TypeSanitizer.rst

Co-authored-by: Florian Hahn <f...@fhahn.com>
---
 clang/docs/TypeSanitizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 20d0fc71775237..96855d26186ead 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -52,7 +52,7 @@ to print the full trace. To get nicer stack traces in error messages add ``-fno-
     # Compile and link
     % clang++ -g -fsanitize=type example_AliasViolation.cc
 
-If a strict aliasing violation is detected, the program will print an error message to stderr.
+The program will print an error message to stderr each time a strict aliasing violation is detected.
 The program won't terminate, which will allow you to detect many strict aliasing violations in one
 run.
 

>From b47bb47d5187dfa8507238826bac274f996d25c3 Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.n...@sony.com>
Date: Mon, 20 Jan 2025 17:03:33 +0000
Subject: [PATCH 6/9] Touchups

---
 clang/docs/TypeSanitizer.rst | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 96855d26186ead..ed68690fafa7ca 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -9,12 +9,13 @@ Introduction
 ============
 
 TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
-instrumentation module and a run-time library. The tool detects violations such as the use
-of an illegally cast pointer, or misuse of a union.
+instrumentation module and a run-time library. The tool detects violations where you access
+memory under a different type than the dynamic type of the object.
 
 The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
 
-Typical slowdown introduced by TypeSanitizer is about **4x** [[CHECK THIS]]. Typical memory overhead introduced by TypeSanitizer is about **9x**.
+As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed,
+memory use, and code size.
 
 How to build
 ============
@@ -76,11 +77,11 @@ brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
   type ``char``.
-* ``type p[x]``: Sometimes a program could generate distinct TBAA metadata that resolve to the same name.
-  To make them unique, they have the character 'p' and a number prepended to their name.
+* ``type p[x]``: This signifies pointers to the type. x is the number of indirections to reach the final value.
+  As an example, a pointer to a pointer to an integer would be ``type p2 int``.
 
-These terms are a result of non-user-facing processes, and not always self-explanatory. There is some
-interest in changing TypeSanitizer in the future to translate these terms before printing them to users.
+TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove
+references to LLVM IR specific terms.
 
 Sanitizer features
 ==================
@@ -147,7 +148,9 @@ Current Status
 --------------
 
 TypeSanitizer is brand new, and still in development. There are some known
-issues, especially in areas where clang doesn't generate valid TBAA metadata.
+issues, especially in areas where Clang's emitted TBAA data isn't extensive
+enough for TypeSanitizer's runtime.
 
 We are actively working on enhancing the tool --- stay tuned. Any help,
-issues, pull requests, ideas, is more than welcome.
+issues, pull requests, ideas, is more than welcome. You can find the
+`issue tracker here.<https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20TySan%20label%3Acompiler-rt%3Atysan>`

>From 9cc3aa3d4f7e08e3b9bc742b1087f1330f1639e2 Mon Sep 17 00:00:00 2001
From: gbMattN <146744444+gbma...@users.noreply.github.com>
Date: Tue, 21 Jan 2025 16:38:14 +0000
Subject: [PATCH 7/9] Apply suggestions from code review

Co-authored-by: Erich Keane <eke...@nvidia.com>
---
 clang/docs/TypeSanitizer.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index ed68690fafa7ca..19baf6a792f00a 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -8,7 +8,7 @@ TypeSanitizer
 Introduction
 ============
 
-TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
+The TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
 instrumentation module and a run-time library. The tool detects violations where you access
 memory under a different type than the dynamic type of the object.
 
@@ -35,7 +35,7 @@ Compile and link your program with ``-fsanitize=type`` flag. The
 TypeSanitizer run-time library should be linked to the final executable, so
 make sure to use ``clang`` (not ``ld``) for the final link step. To
 get a reasonable performance add ``-O1`` or higher.
-TypeSanitizer by default doesn't print the full stack trace on error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1``
+TypeSanitizer by default doesn't print the full stack trace in error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1``
 to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and
 ``-g``. To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination
 (``-fno-optimize-sibling-calls``).

>From e55c025d51e13a7808c7e6090864327be8aafb5a Mon Sep 17 00:00:00 2001
From: gbMattN <matthew.n...@sony.com>
Date: Tue, 21 Jan 2025 17:00:13 +0000
Subject: [PATCH 8/9] Expanded the section on the point of TySan

---
 clang/docs/TypeSanitizer.rst | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 19baf6a792f00a..c19356656f9ebd 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -9,10 +9,16 @@ Introduction
 ============
 
 The TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
-instrumentation module and a run-time library. The tool detects violations where you access
-memory under a different type than the dynamic type of the object.
+instrumentation module and a run-time library. C/C++ has type-based aliasing rules, and LLVM
+can exploit these for optimizations given the TBAA metadata Clang emits. In general, a pointer
+of a given type cannot access an object of a different type, with only a few exceptions.
 
-The violations TypeSanitizer catches may cause the compiler to emit incorrect code.
+These rules aren't always apparent to users, which leads to code that violates these rules
+(e.g. for type punning). This can lead to optimization passes introducing bugs unless the
+code is build with ``-fno-strict-aliasing``, sacrificing performance.
+
+TypeSanitizer is built to catch when these strict aliasing rules have been violated, helping
+users find where such bugs originate in their code despite the code looking valid at first glance.
 
 As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed,
 memory use, and code size.

>From 9204fc8c3598957cfbf51fccc4501f8f99516adb Mon Sep 17 00:00:00 2001
From: gbMattN <146744444+gbma...@users.noreply.github.com>
Date: Thu, 23 Jan 2025 10:37:37 +0000
Subject: [PATCH 9/9] Apply suggestions from code review

Co-authored-by: Aaron Ballman <aa...@aaronballman.com>
---
 clang/docs/TypeSanitizer.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index c19356656f9ebd..33498791f1f5d7 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -1,6 +1,6 @@
-================
+=============
 TypeSanitizer
-================
+=============
 
 .. contents::
    :local:
@@ -59,7 +59,7 @@ to print the full trace. To get nicer stack traces in error messages add ``-fno-
     # Compile and link
     % clang++ -g -fsanitize=type example_AliasViolation.cc
 
-The program will print an error message to stderr each time a strict aliasing violation is detected.
+The program will print an error message to ``stderr`` each time a strict aliasing violation is detected.
 The program won't terminate, which will allow you to detect many strict aliasing violations in one
 run.
 
@@ -83,7 +83,7 @@ brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
   type ``char``.
-* ``type p[x]``: This signifies pointers to the type. x is the number of indirections to reach the final value.
+* ``type p[x]``: This signifies pointers to the type. ``x`` is the number of indirections to reach the final value.
   As an example, a pointer to a pointer to an integer would be ``type p2 int``.
 
 TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
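
Note on the attribute sections documented in the patch series above: the docs suggest pairing ``no_sanitize("type")`` with ``__has_feature(type_sanitizer)`` because other compilers may not support the attribute. The following is a minimal sketch of that pairing and is not part of the PR. The macro name ``NO_SANITIZE_TYPE`` and the functions ``float_bits`` and ``float_bits_legacy`` are hypothetical names chosen for illustration. ``float_bits`` uses ``memcpy``, which does not break the strict aliasing rules; ``float_bits_legacy`` is the cast-based pattern of the kind shown in ``example_AliasViolation.c``, kept only to show where the attribute goes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro (not from the patch): expands to the attribute
 * only when the compiler reports the type_sanitizer feature, so the code
 * still builds with compilers that lack __has_feature or the attribute. */
#if defined(__has_feature)
#  if __has_feature(type_sanitizer)
#    define NO_SANITIZE_TYPE __attribute__((no_sanitize("type")))
#  endif
#endif
#ifndef NO_SANITIZE_TYPE
#  define NO_SANITIZE_TYPE
#endif

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Copies the object representation of a float into an integer.
 * memcpy accesses the bytes as characters, so no aliasing rule is broken. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Cast-based pun: reads a float object through a uint32_t lvalue, which is
 * a strict aliasing violation. Shown only to illustrate the attribute; the
 * macro above disables TypeSanitizer instrumentation for this function. */
NO_SANITIZE_TYPE
uint32_t float_bits_legacy(const float *f) {
    return *(const uint32_t *)f;
}
```

Preferring the ``memcpy`` form removes the violation itself, whereas the attribute only silences instrumentation and, as the docs above warn, may lead to false positives or false negatives elsewhere.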