steakhal added a comment.

Huh, this was a long one. 😅 🚀



================
Comment at: clang/docs/analyzer/checkers.rst:2361
 Default propagations defined by ``GenericTaintChecker``:
-``atoi``, ``atol``, ``atoll``, ``fgetc``, ``fgetln``, ``fgets``, ``fscanf``, 
``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, 
``pread``, ``read``, ``strchr``, ``strrchr``, ``tolower``, ``toupper``
+``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``, 
``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``, 
``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``, 
``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``, 
``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``, 
``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``, 
``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``, 
``qsort_r``, ``rawmemchr``, ``read``, ``readv``, ``recv``, ``recvfrom``, 
``rindex``, ``strcasestr``, ``strchr``, ``strchrnul``,  ``strcasecmp``, 
``strcmp``, ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``, 
``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``, ``strstr``, 
``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``, ``toupper``, 
``ttyname``, ``ttyname_r``, ``vfscanf``, ``vfscanf``, ``wctomb``, ``wcwidth``
 
----------------
I cannot see the corresponding propagation rule.
That being said, it would be handy to mention that this is for `zlib` 
decompression and this should be probably a taint source anyway.

`vfscanf` occurs two times.

`vscanf` is not mentioned here; and there are probably a couple others like 
this.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561
+      {{"vscanf"}, TR::Prop({{0}}, {{}, 1})},
+      {{"vfscanf"}, TR::Prop({{0}}, {{}, 2})},
+
----------------
This function has exactly 3 arguments.
I'm also puzzled how tainting `va_list` will work out. That should be modeled 
separately IMO.
This comment applies to all of the similar `va_list` accepting functions, such 
as `vscanf`, `vfscanf`, and possibly others.

That being said, I think `vscanf` is more like `scanf`, so it should be modeled 
as a taint source instead of a propagator.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:575
+      {{"fread"}, TR::Prop({{3}}, {{0, ReturnValueIndex}})},
+      {{"readv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+      {{"recv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
----------------
I'm on board with marking read operations as props/sources.
Let's look at the declaration: `ssize_t readv(int fd, const struct iovec *iov, 
int iovcnt);`
I'm not sure how could the pointee of `iov` be modified by this call, as its 
`const`.
Additionally, I doubt the effectiveness of the rule, since I don't think it 
would be too likely to have a path leading to a taint sink with an `iov` 
pointer. That being said, let it be, but don't expect much from this rule :D


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:577
+      {{"recv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+      {{"recvfrom"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+
----------------
It's sad that we don't model `recvmsg`, but it would be quite difficult to 
model. I can see why you left that out.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:579-580
+
+      {{"ttyname"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"ttyname_r"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+
----------------
I'm not sure how could these be used as gadgets, but let it be.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:582-583
+
+      {{"dirname"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"basename"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"fnmatch"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
----------------
These should be sorted, isn't it?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:584
+      {{"basename"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"fnmatch"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"memchr"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
----------------
> `int fnmatch(const char *pattern, const char *string, int flags);`
>The `fnmatch()` function checks whether the string argument matches the 
>pattern argument, which is a shell wildcard pattern (see `glob(7)`).
From this //man// entry I would think that we should propagate from the 
**second** argument.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:596
+      {{"memmove"}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
+      {{"memmem"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+
----------------
One could say that if `memmem` was called with a tainted //needle// and 
actually finds something in the //haystack//, that would mean essentially that 
the value pointed by the return value has the same content as the //needle//.
However, I still agree with the current propagation rule, depending only on 
`arg 0`.
This might reasoning might deserve a comment though.
`strstr` also shares this property.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:606
+
+      // FIXME: in case of arrays ,only the first element of the array gets
+      // tainted.
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:615-616
+      {{"strncasecmp"}, TR::Prop({{0, 1, 2}}, {{ReturnValueIndex}})},
+      {{"strspn"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"strcspn"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"strpbrk"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
----------------
IMO these should propagate from `{0,1}`, similarly how the `strncmp` does 
propagate from its last argument.


================
Comment at: clang/test/Analysis/taint-generic.c:372-378
+int testVscanf(int *d) {
+  char format[10];
+  scanf("%9s", format); // fake a tainted a file descriptor
+
+  vscanf(format, &d);
+  return 1 / *d; // expected-warning {{Division by a tainted value, possibly 
zero}}
+}
----------------
This test case definitely needs to be reworked.

1) We don't have FDs in this context, unlike the comment suggests.
2) `vscanf` should be an unconditional taint source, instead of being a 
propagator.


================
Comment at: clang/test/Analysis/taint-generic.c:392-395
+  if (some_global_flag_to_branch_on) // just to have 2 branches, and assert 2 
division by zero messages
+    return 1 / *buffer;              // expected-warning {{Division by a 
tainted value, possibly zero}}
+
+  return 1 / read; // expected-warning {{Division by a tainted value, possibly 
zero}}
----------------
For this type of things I can recommend using the `clang_analyzer_isTainted(T)` 
`debug.ExprInspection` introspection function.
It will do the right thing, without sinking the execution path.

I've seen in some other test cases that you used an extra top-level fn 
parameter for the same.
That's slightly better than using a global, but I would still recommend the 
debug function.


================
Comment at: clang/test/Analysis/taint-generic.c:399-400
+struct iovec {
+  void *iov_base; /* Starting address */
+  size_t iov_len; /* Number of bytes to transfer */
+};
----------------
Please use single-line comments.
It makes debugging test files easier in some cases.


================
Comment at: clang/test/Analysis/taint-generic.c:408
+  size_t read = readv(fd, iov, iovcnt);
+  // FIXME: should be able to assert that iov is also tainted
+  return 1 / read; // expected-warning {{Division by a tainted value, possibly 
zero}}
----------------
`clang_analyzer_isTainted(*iov)`


================
Comment at: clang/test/Analysis/taint-generic.c:905
+int isspace(int c);
+int testIsspace() {
+  char c;
----------------
You should group these all into a single test case.
This way the setup code for the test dominates compared to the actual content.


================
Comment at: clang/test/Analysis/taint-generic.c:927-928
+int cmp_less(const void *lhs, const void *rhs) {
+  return *(int *)lhs < *(int *)rhs ? -1 : *(int *)lhs > *(int *)rhs ? 1
+                                                                    : 0;
+}
----------------
>The comparison function must return an integer less than, equal to, or greater 
>than zero if the first argument is considered to be respectively less than, 
>equal to, or greater than the second.


================
Comment at: clang/test/Analysis/taint-generic.c:939-942
+int cmp_less_than(const void *lhs, const void *rhs, void *baseline) {
+  return *(int *)lhs < *(int *)baseline ? -1 : *(int *)lhs > *(int *)baseline 
? 1
+                                                                              
: 0;
+}
----------------
Just use the `cmp_less` instead.
You can also fuse the qsort test cases into a single function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120369/new/

https://reviews.llvm.org/D120369

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to