[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-16 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat closed 
https://github.com/llvm/llvm-project/pull/91635
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/91635
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/91635

From 57ad704c30866a7d85f43b016583675e70de8531 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Thu, 9 May 2024 18:32:57 +0200
Subject: [PATCH 1/2] [analyzer] Clean up list of taint propagation functions

This commit refactors GenericTaintChecker and performs various
improvements in the list of taint propagation functions:

(1) The matching mode (usually `CDM::CLibrary` or
`CDM::CLibraryMaybeHardened`) was specified to avoid matching e.g.
C++ methods or functions from a user-defined namespace that happen
to share the name of a well-known library function.
(2) With these matching modes, a `CallDescription` can automatically
match builtin variants of the functions, so entries that explicitly
specified a builtin function were removed. This eliminated
inconsistencies where the "normal" and the builtin variant of the
same function was handled differently (e.g. `__builtin_strlcat` was
covered, while plain `strlcat` wasn't; while `__builtin_memcpy` and
`memcpy` were both on the list with different propagation rules).
(3) The modeling of the functions `strlcat` and `strncat` was updated to
propagate taint from the first argument (index 0), because a tainted
string should remain tainted even if we append something else to it.
Note that this was already applied to `strcat` and `wcsncat` by
commit 6ceb1c0ef9f544be0eed65e46cc7d99941a001bf.
(4) Some functions were updated to propagate taint from a size/length
argument to the result: e.g. `memcmp(p, q, get_tainted_int())` will
now return a tainted value (because the attacker can manipulate it).
This principle was already present in some propagation rules (e.g.
`__builtin_memcpy` was handled this way), and even after this commit
there are still some functions where it isn't applied. (I only aimed
for consistency within the same function family.)
(5) Functions that have hardened `__FOO_chk()` variants are matched in
`CDM:CLibraryMaybeHardened` to ensure consitent handling of the
"normal" and the hardened variant. I added special handling for the
hardened variants of "sprintf" and "snprintf" because there the
extra parameters are inserted into the middle of the parameter list.
(6) Modeling of `sscanf_s` was added, to complete the group of `fscanf`,
fscanf_s` and `sscanf`.
(7) The `Source()` specifications for `gets`, `gets_s` and `wgetch` were
ill-formed: they were specifying variadic arguments starting at
argument index `ReturnValueIndex`. (That is, in addition to the
return value they were propagating taint to all arguments.)
(8) Functions that were related to each other were grouped together. (I
know that this makes the diff harder to read, but I felt that the
full list is unreadable without some reorganization.)
(9) I spotted and removed some redundant curly braces. Perhaps would be
good to switch to a cleaner layout with less nested braces...
(10) I updated some obsolete comments and added two TODOs for issues
that should be fixed in followup commits.
---
 .../Checkers/GenericTaintChecker.cpp  | 370 ++
 1 file changed, 204 insertions(+), 166 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index d17f5ddf07055..a4ade7aaf590c 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -400,17 +400,14 @@ class GenericTaintChecker : public 
Checker {
   void taintUnsafeSocketProtocol(const CallEvent ,
  CheckerContext ) const;
 
-  /// Default taint rules are initalized with the help of a CheckerContext to
-  /// access the names of built-in functions like memcpy.
+  /// The taint rules are initalized with the help of a CheckerContext to
+  /// access user-provided configuration.
   void initTaintRules(CheckerContext ) const;
 
-  /// CallDescription currently cannot restrict matches to the global namespace
-  /// only, which is why multiple CallDescriptionMaps are used, as we want to
-  /// disambiguate global C functions from functions inside user-defined
-  /// namespaces.
-  // TODO: Remove separation to simplify matching logic once CallDescriptions
-  // are more expressive.
-
+  // TODO: The two separate `CallDescriptionMap`s were introduced when
+  // `CallDescription` was unable to restric matches to the global namespace
+  // only. This limitation no longer exists, so the following two maps should
+  // be unified.
   mutable std::optional StaticTaintRules;
   mutable std::optional DynamicTaintRules;
 };
@@ -506,7 +503,8 @@ void GenericTaintRuleParser::consumeRulesFromConfig(const 
Config ,
 GenericTaintRule &,
 

[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Donát Nagy via cfe-commits


@@ -572,196 +570,236 @@ void GenericTaintChecker::initTaintRules(CheckerContext 
) const {
   std::vector>;
   using TR = GenericTaintRule;
 
-  const Builtin::Context  = C.getASTContext().BuiltinInfo;
-
   RulesConstructionTy GlobalCRules{
   // Sources
-  {{{"fdopen"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"fopen"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"freopen"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getch"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getchar"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getchar_unlocked"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"gets"}}, TR::Source({{0}, ReturnValueIndex})},
-  {{{"gets_s"}}, TR::Source({{0}, ReturnValueIndex})},
-  {{{"scanf"}}, TR::Source({{}, 1})},
-  {{{"scanf_s"}}, TR::Source({{}, {1}})},
-  {{{"wgetch"}}, TR::Source({{}, ReturnValueIndex})},
+  {{CDM::CLibrary, {"fdopen"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"fopen"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"freopen"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getch"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getchar"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getchar_unlocked"}}, 
TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"gets"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"gets_s"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"scanf"}}, TR::Source({{}, 1})},
+  {{CDM::CLibrary, {"scanf_s"}}, TR::Source({{}, 1})},
+  {{CDM::CLibrary, {"wgetch"}}, TR::Source({{ReturnValueIndex}})},
   // Sometimes the line between taint sources and propagators is blurry.
   // _IO_getc is choosen to be a source, but could also be a propagator.
   // This way it is simpler, as modeling it as a propagator would require
   // to model the possible sources of _IO_FILE * values, which the _IO_getc
   // function takes as parameters.
-  {{{"_IO_getc"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getcwd"}}, TR::Source({{0, ReturnValueIndex}})},
-  {{{"getwd"}}, TR::Source({{0, ReturnValueIndex}})},
-  {{{"readlink"}}, TR::Source({{1, ReturnValueIndex}})},
-  {{{"readlinkat"}}, TR::Source({{2, ReturnValueIndex}})},
-  {{{"get_current_dir_name"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"gethostname"}}, TR::Source({{0}})},
-  {{{"getnameinfo"}}, TR::Source({{2, 4}})},
-  {{{"getseuserbyname"}}, TR::Source({{1, 2}})},
-  {{{"getgroups"}}, TR::Source({{1, ReturnValueIndex}})},
-  {{{"getlogin"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getlogin_r"}}, TR::Source({{0}})},
+  {{CDM::CLibrary, {"_IO_getc"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getcwd"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getwd"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"readlink"}}, TR::Source({{1, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"readlinkat"}}, TR::Source({{2, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"get_current_dir_name"}},
+   TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"gethostname"}}, TR::Source({{0}})},
+  {{CDM::CLibrary, {"getnameinfo"}}, TR::Source({{2, 4}})},
+  {{CDM::CLibrary, {"getseuserbyname"}}, TR::Source({{1, 2}})},
+  {{CDM::CLibrary, {"getgroups"}}, TR::Source({{1, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getlogin"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getlogin_r"}}, TR::Source({{0}})},
 
   // Props
-  {{{"accept"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"atoi"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"atol"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"atoll"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fgetc"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fgetln"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fgets"}}, TR::Prop({{2}}, {{0, ReturnValueIndex}})},
-  {{{"fgetws"}}, TR::Prop({{2}}, {{0, ReturnValueIndex}})},
-  {{{"fscanf"}}, TR::Prop({{0}}, {{}, 2})},
-  {{{"fscanf_s"}}, TR::Prop({{0}}, {{}, {2}})},
-  {{{"sscanf"}}, TR::Prop({{0}}, {{}, 2})},
-
-  {{{"getc"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"getc_unlocked"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"getdelim"}}, TR::Prop({{3}}, {{0}})},
-  {{{"getline"}}, TR::Prop({{2}}, {{0}})},
-  {{{"getw"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"pread"}}, TR::Prop({{0, 1, 2, 3}}, {{1, ReturnValueIndex}})},
-  {{{"read"}}, TR::Prop({{0, 2}}, {{1, ReturnValueIndex}})},
-  {{{"strchr"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strrchr"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"tolower"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"toupper"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fread"}}, TR::Prop({{3}}, {{0, 

[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Balazs Benics via cfe-commits


@@ -572,196 +570,236 @@ void GenericTaintChecker::initTaintRules(CheckerContext 
) const {
   std::vector>;
   using TR = GenericTaintRule;
 
-  const Builtin::Context  = C.getASTContext().BuiltinInfo;
-
   RulesConstructionTy GlobalCRules{
   // Sources
-  {{{"fdopen"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"fopen"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"freopen"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getch"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getchar"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getchar_unlocked"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"gets"}}, TR::Source({{0}, ReturnValueIndex})},
-  {{{"gets_s"}}, TR::Source({{0}, ReturnValueIndex})},
-  {{{"scanf"}}, TR::Source({{}, 1})},
-  {{{"scanf_s"}}, TR::Source({{}, {1}})},
-  {{{"wgetch"}}, TR::Source({{}, ReturnValueIndex})},
+  {{CDM::CLibrary, {"fdopen"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"fopen"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"freopen"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getch"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getchar"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getchar_unlocked"}}, 
TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"gets"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"gets_s"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"scanf"}}, TR::Source({{}, 1})},
+  {{CDM::CLibrary, {"scanf_s"}}, TR::Source({{}, 1})},
+  {{CDM::CLibrary, {"wgetch"}}, TR::Source({{ReturnValueIndex}})},
   // Sometimes the line between taint sources and propagators is blurry.
   // _IO_getc is choosen to be a source, but could also be a propagator.
   // This way it is simpler, as modeling it as a propagator would require
   // to model the possible sources of _IO_FILE * values, which the _IO_getc
   // function takes as parameters.
-  {{{"_IO_getc"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getcwd"}}, TR::Source({{0, ReturnValueIndex}})},
-  {{{"getwd"}}, TR::Source({{0, ReturnValueIndex}})},
-  {{{"readlink"}}, TR::Source({{1, ReturnValueIndex}})},
-  {{{"readlinkat"}}, TR::Source({{2, ReturnValueIndex}})},
-  {{{"get_current_dir_name"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"gethostname"}}, TR::Source({{0}})},
-  {{{"getnameinfo"}}, TR::Source({{2, 4}})},
-  {{{"getseuserbyname"}}, TR::Source({{1, 2}})},
-  {{{"getgroups"}}, TR::Source({{1, ReturnValueIndex}})},
-  {{{"getlogin"}}, TR::Source({{ReturnValueIndex}})},
-  {{{"getlogin_r"}}, TR::Source({{0}})},
+  {{CDM::CLibrary, {"_IO_getc"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getcwd"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getwd"}}, TR::Source({{0, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"readlink"}}, TR::Source({{1, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"readlinkat"}}, TR::Source({{2, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"get_current_dir_name"}},
+   TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"gethostname"}}, TR::Source({{0}})},
+  {{CDM::CLibrary, {"getnameinfo"}}, TR::Source({{2, 4}})},
+  {{CDM::CLibrary, {"getseuserbyname"}}, TR::Source({{1, 2}})},
+  {{CDM::CLibrary, {"getgroups"}}, TR::Source({{1, ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getlogin"}}, TR::Source({{ReturnValueIndex}})},
+  {{CDM::CLibrary, {"getlogin_r"}}, TR::Source({{0}})},
 
   // Props
-  {{{"accept"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"atoi"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"atol"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"atoll"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fgetc"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fgetln"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fgets"}}, TR::Prop({{2}}, {{0, ReturnValueIndex}})},
-  {{{"fgetws"}}, TR::Prop({{2}}, {{0, ReturnValueIndex}})},
-  {{{"fscanf"}}, TR::Prop({{0}}, {{}, 2})},
-  {{{"fscanf_s"}}, TR::Prop({{0}}, {{}, {2}})},
-  {{{"sscanf"}}, TR::Prop({{0}}, {{}, 2})},
-
-  {{{"getc"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"getc_unlocked"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"getdelim"}}, TR::Prop({{3}}, {{0}})},
-  {{{"getline"}}, TR::Prop({{2}}, {{0}})},
-  {{{"getw"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"pread"}}, TR::Prop({{0, 1, 2, 3}}, {{1, ReturnValueIndex}})},
-  {{{"read"}}, TR::Prop({{0, 2}}, {{1, ReturnValueIndex}})},
-  {{{"strchr"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"strrchr"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"tolower"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"toupper"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-  {{{"fread"}}, TR::Prop({{3}}, {{0, 

[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Balazs Benics via cfe-commits


@@ -400,17 +400,14 @@ class GenericTaintChecker : public 
Checker {
   void taintUnsafeSocketProtocol(const CallEvent ,
  CheckerContext ) const;
 
-  /// Default taint rules are initalized with the help of a CheckerContext to
-  /// access the names of built-in functions like memcpy.
+  /// The taint rules are initalized with the help of a CheckerContext to
+  /// access user-provided configuration.
   void initTaintRules(CheckerContext ) const;
 
-  /// CallDescription currently cannot restrict matches to the global namespace
-  /// only, which is why multiple CallDescriptionMaps are used, as we want to
-  /// disambiguate global C functions from functions inside user-defined
-  /// namespaces.
-  // TODO: Remove separation to simplify matching logic once CallDescriptions
-  // are more expressive.
-
+  // TODO: The two separate `CallDescriptionMap`s were introduced when
+  // `CallDescription` was unable to restric matches to the global namespace

steakhal wrote:

```suggestion
  // `CallDescription` was unable to restrict matches to the global namespace
```

https://github.com/llvm/llvm-project/pull/91635
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

LGTM.
I haven't checked absolutely everything, rather sampled.
It looked correct, and free of copy-paste mistakes.

https://github.com/llvm/llvm-project/pull/91635
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-14 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/91635
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Clean up list of taint propagation functions (PR #91635)

2024-05-09 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat created 
https://github.com/llvm/llvm-project/pull/91635

This commit refactors GenericTaintChecker and performs various improvements in 
the list of taint propagation functions:

1. The matching mode (usually `CDM::CLibrary` or `CDM::CLibraryMaybeHardened`) 
was specified to avoid matching e.g. C++ methods or functions from a 
user-defined namespace that happen to share the name of a well-known library 
function.
2. With these matching modes, a `CallDescription` can automatically match 
builtin variants of the functions, so entries that explicitly specified a 
builtin function were removed. This eliminated inconsistencies where the 
"normal" and the builtin variant of the same function was handled differently 
(e.g. `__builtin_strlcat` was covered, while plain `strlcat` wasn't; while 
`__builtin_memcpy` and `memcpy` were both on the list with different 
propagation rules).
3. The modeling of the functions `strlcat` and `strncat` was updated to 
propagate taint from the first argument (index 0), because a tainted string 
should remain tainted even if we append something else to it. Note that this 
was already applied to `strcat` and `wcsncat` by commit 
6ceb1c0ef9f544be0eed65e46cc7d99941a001bf.
4. Some functions were updated to propagate taint from a size/length argument 
to the result: e.g. `memcmp(p, q, get_tainted_int())` will now return a tainted 
value (because the attacker can manipulate it). This principle was already 
present in some propagation rules (e.g. `__builtin_memcpy` was handled this 
way), and even after this commit there are still some functions where it isn't 
applied. (I only aimed for consistency within the same function family.)
5. Functions that have hardened `__FOO_chk()` variants are matched in 
`CDM:CLibraryMaybeHardened` to ensure consitent handling of the "normal" and 
the hardened variant. I added special handling for the hardened variants of 
"sprintf" and "snprintf" because there the extra parameters are inserted into 
the middle of the parameter list.
6. Modeling of `sscanf_s` was added, to complete the group of `fscanf`, 
`fscanf_s` and `sscanf`.
7. The `Source()` specifications for `gets`, `gets_s` and `wgetch` were 
ill-formed: they were specifying variadic arguments starting at argument index 
`ReturnValueIndex`. (That is, in addition to the return value they were 
propagating taint to all arguments.)
8. Functions that were related to each other were grouped together. (I know 
that this makes the diff harder to read, but I felt that the full list is 
unreadable without some reorganization.)
9. I spotted and removed some redundant curly braces. Perhaps would be good to 
switch to a cleaner layout with less nested braces...
10. I updated some obsolete comments and added two TODOs for issues that should 
be fixed in followup commits.

From 57ad704c30866a7d85f43b016583675e70de8531 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Thu, 9 May 2024 18:32:57 +0200
Subject: [PATCH] [analyzer] Clean up list of taint propagation functions

This commit refactors GenericTaintChecker and performs various
improvements in the list of taint propagation functions:

(1) The matching mode (usually `CDM::CLibrary` or
`CDM::CLibraryMaybeHardened`) was specified to avoid matching e.g.
C++ methods or functions from a user-defined namespace that happen
to share the name of a well-known library function.
(2) With these matching modes, a `CallDescription` can automatically
match builtin variants of the functions, so entries that explicitly
specified a builtin function were removed. This eliminated
inconsistencies where the "normal" and the builtin variant of the
same function was handled differently (e.g. `__builtin_strlcat` was
covered, while plain `strlcat` wasn't; while `__builtin_memcpy` and
`memcpy` were both on the list with different propagation rules).
(3) The modeling of the functions `strlcat` and `strncat` was updated to
propagate taint from the first argument (index 0), because a tainted
string should remain tainted even if we append something else to it.
Note that this was already applied to `strcat` and `wcsncat` by
commit 6ceb1c0ef9f544be0eed65e46cc7d99941a001bf.
(4) Some functions were updated to propagate taint from a size/length
argument to the result: e.g. `memcmp(p, q, get_tainted_int())` will
now return a tainted value (because the attacker can manipulate it).
This principle was already present in some propagation rules (e.g.
`__builtin_memcpy` was handled this way), and even after this commit
there are still some functions where it isn't applied. (I only aimed
for consistency within the same function family.)
(5) Functions that have hardened `__FOO_chk()` variants are matched in
`CDM:CLibraryMaybeHardened` to ensure consitent handling of the
"normal" and the hardened variant. I added special handling for the
hardened variants of "sprintf"