[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-25 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: clang/docs/analyzer/checkers.rst:2469-2472
+  // User csa_mark_sanitize function is for the analyzer only
+  #ifdef __clang_analyzer__
+void csa_mark_sanitized(const void *);
+  #endif

steakhal wrote:
> I was thinking of this when I mentioned "function with empty body".
> Notice the `inline`, so that one could put this into a header file without 
> violating ODR.
> 
> ---
> This way at the callsites, one wouldn't need to ifdef-clutter the code.
> I think this would lead to more maintainable code in the long run, with 
> happier users.
Good Point! I will add this to the next patch to this checker. thanks. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-25 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: clang/docs/analyzer/checkers.rst:2471
+  #ifdef __clang_analyzer__
+void csa_mark_sanitized(const void *);
+  #endif

steakhal wrote:
> Have you considered unconditionally having this function with an empty body?
> That way it would have no "noise" at callsite.
But that way the program would not compile, because the definition would not be 
found. Or maybe I misunderstand you.

Maybe in the future we could add an another type of filter function which would 
support validation type of functions: would sanitize the passed parameter only, 
if the function returns with non-null, non-zero.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-25 Thread Daniel Krupp via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4dbe2db02d03: [clang][analyzer] Improved documentation for 
TaintPropagation Checker (authored by dkrupp).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

Files:
  clang/docs/analyzer/checkers.rst

Index: clang/docs/analyzer/checkers.rst
===
--- clang/docs/analyzer/checkers.rst
+++ clang/docs/analyzer/checkers.rst
@@ -2359,64 +2359,244 @@
 alpha.security.taint
 
 
-Checkers implementing `taint analysis `_.
+Checkers implementing
+`taint analysis `_.
 
 .. _alpha-security-taint-TaintPropagation:
 
 alpha.security.taint.TaintPropagation (C, C++)
 ""
 
-Taint analysis identifies untrusted sources of information (taint sources), rules as to how the untrusted data flows along the execution path (propagation rules), and points of execution where the use of tainted data is risky (taints sinks).
+Taint analysis identifies potential security vulnerabilities where the
+attacker can inject malicious data to the program to execute an attack
+(privilege escalation, command injection, SQL injection etc.).
+
+The malicious data is injected at the taint source (e.g. ``getenv()`` call)
+which is then propagated through function calls and being used as arguments of
+sensitive operations, also called as taint sinks (e.g. ``system()`` call).
+
+One can defend agains this type of vulnerability by always checking and
+santizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover and show to the user these potential
+taint source-sink pairs and the propagation call chain.
+
 The most notable examples of taint sources are:
 
-  - network originating data
+  - data from network
+  - files or standard input
   - environment variables
-  - database originating data
+  - data from databases
 
-``GenericTaintChecker`` is the main implementation checker for this rule, and it generates taint information used by other checkers.
+Let us examine a practical example of a Command Injection attack.
 
 .. code-block:: c
 
- void test() {
-   char x = getchar(); // 'x' marked as tainted
-   system(&x); // warn: untrusted data is passed to a system call
- }
+  // Command Injection Vulnerability Example
+  int main(int argc, char** argv) {
+char cmd[2048] = "/bin/cat ";
+char filename[1024];
+printf("Filename:");
+scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+strcat(cmd, filename);
+system(cmd); // Warning: Untrusted data is passed to a system call
+  }
 
- // note: compiler internally checks if the second param to
- // sprintf is a string literal or not.
- // Use -Wno-format-security to suppress compiler warning.
- void test() {
-   char s[10], buf[10];
-   fscanf(stdin, "%s", s); // 's' marked as tainted
+The program prints the content of any user specified file.
+Unfortunately the attacker can execute arbitrary commands
+with shell escapes. For example with the following input the `ls` command is also
+executed after the contents of `/etc/shadow` is printed.
+`Input: /etc/shadow ; ls /`
 
-   sprintf(buf, s); // warn: untrusted data as a format string
- }
+The analysis implemented in this checker points out this problem.
 
- void test() {
-   size_t ts;
-   scanf("%zd", &ts); // 'ts' marked as tainted
-   int *p = (int *)malloc(ts * sizeof(int));
- // warn: untrusted data as buffer size
- }
+One can protect against such attack by for example checking if the provided
+input refers to a valid file and removing any invalid user input.
+
+.. code-block:: c
+
+  // No vulnerability anymore, but we still get the warning
+  void sanitizeFileName(char* filename){
+if (access(filename,F_OK)){// Verifying user input
+  printf("File does not exist\n");
+  filename[0]='\0';
+  }
+  }
+  int main(int argc, char** argv) {
+char cmd[2048] = "/bin/cat ";
+char filename[1024];
+printf("Filename:");
+scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+sanitizeFileName(filename);// filename is safe after this point
+if (!filename[0])
+  return -1;
+strcat(cmd, filename);
+system(cmd); // Superflous Warning: Untrusted data is passed to a system call
+  }
+
+Unfortunately, the checker cannot discover automatically that the programmer
+have performed data sanitation, so it still emits the warning.
 
-There are built-in sources, propagations and sinks defined in code inside ``GenericTaintChecker``.
-These operations are handled even if no external taint configuration is provided.
+One can get rid of this superflous warning by telling

[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-25 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 543862.
dkrupp added a comment.

-lines wrapped to 80 characters


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

Files:
  clang/docs/analyzer/checkers.rst

Index: clang/docs/analyzer/checkers.rst
===
--- clang/docs/analyzer/checkers.rst
+++ clang/docs/analyzer/checkers.rst
@@ -2359,64 +2359,244 @@
 alpha.security.taint
 
 
-Checkers implementing `taint analysis `_.
+Checkers implementing
+`taint analysis `_.
 
 .. _alpha-security-taint-TaintPropagation:
 
 alpha.security.taint.TaintPropagation (C, C++)
 ""
 
-Taint analysis identifies untrusted sources of information (taint sources), rules as to how the untrusted data flows along the execution path (propagation rules), and points of execution where the use of tainted data is risky (taints sinks).
+Taint analysis identifies potential security vulnerabilities where the
+attacker can inject malicious data to the program to execute an attack
+(privilege escalation, command injection, SQL injection etc.).
+
+The malicious data is injected at the taint source (e.g. ``getenv()`` call)
+which is then propagated through function calls and being used as arguments of
+sensitive operations, also called as taint sinks (e.g. ``system()`` call).
+
+One can defend agains this type of vulnerability by always checking and
+santizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover and show to the user these potential
+taint source-sink pairs and the propagation call chain.
+
 The most notable examples of taint sources are:
 
-  - network originating data
+  - data from network
+  - files or standard input
   - environment variables
-  - database originating data
+  - data from databases
 
-``GenericTaintChecker`` is the main implementation checker for this rule, and it generates taint information used by other checkers.
+Let us examine a practical example of a Command Injection attack.
 
 .. code-block:: c
 
- void test() {
-   char x = getchar(); // 'x' marked as tainted
-   system(&x); // warn: untrusted data is passed to a system call
- }
+  // Command Injection Vulnerability Example
+  int main(int argc, char** argv) {
+char cmd[2048] = "/bin/cat ";
+char filename[1024];
+printf("Filename:");
+scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+strcat(cmd, filename);
+system(cmd); // Warning: Untrusted data is passed to a system call
+  }
 
- // note: compiler internally checks if the second param to
- // sprintf is a string literal or not.
- // Use -Wno-format-security to suppress compiler warning.
- void test() {
-   char s[10], buf[10];
-   fscanf(stdin, "%s", s); // 's' marked as tainted
+The program prints the content of any user specified file.
+Unfortunately the attacker can execute arbitrary commands
+with shell escapes. For example with the following input the `ls` command is also
+executed after the contents of `/etc/shadow` is printed.
+`Input: /etc/shadow ; ls /`
 
-   sprintf(buf, s); // warn: untrusted data as a format string
- }
+The analysis implemented in this checker points out this problem.
 
- void test() {
-   size_t ts;
-   scanf("%zd", &ts); // 'ts' marked as tainted
-   int *p = (int *)malloc(ts * sizeof(int));
- // warn: untrusted data as buffer size
- }
+One can protect against such attack by for example checking if the provided
+input refers to a valid file and removing any invalid user input.
+
+.. code-block:: c
+
+  // No vulnerability anymore, but we still get the warning
+  void sanitizeFileName(char* filename){
+if (access(filename,F_OK)){// Verifying user input
+  printf("File does not exist\n");
+  filename[0]='\0';
+  }
+  }
+  int main(int argc, char** argv) {
+char cmd[2048] = "/bin/cat ";
+char filename[1024];
+printf("Filename:");
+scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+sanitizeFileName(filename);// filename is safe after this point
+if (!filename[0])
+  return -1;
+strcat(cmd, filename);
+system(cmd); // Superflous Warning: Untrusted data is passed to a system call
+  }
+
+Unfortunately, the checker cannot discover automatically that the programmer
+have performed data sanitation, so it still emits the warning.
 
-There are built-in sources, propagations and sinks defined in code inside ``GenericTaintChecker``.
-These operations are handled even if no external taint configuration is provided.
+One can get rid of this superflous warning by telling by specifying the
+sanitation functions in the taint configuation file (see
+:doc:`user-docs/TaintAnalysisConfiguration`).
 
-Default sources defined by ``GenericTaintChecker``:
- ``_IO_getc`

[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-24 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

Thanks @donat.nagy for your review. I addressed your remarks. After patch 
https://reviews.llvm.org/D155848 these sanitizing examples work properly.




Comment at: clang/docs/analyzer/checkers.rst:78-80
+The ``SuppressAddressSpaces`` option suppresses
 warnings for null dereferences of all pointers with address spaces. You can
 disable this behavior with the option

donat.nagy wrote:
> Why is this paragraph (and the one above it) wrapped inconsistently? If we 
> are touching these docs, perhaps we could re-wrap them to e.g 80 characters / 
> line.
The formatting of this paragraph should not be impacted by this unrleated 
change. I will revert all unrelated formatting changes.



Comment at: clang/docs/analyzer/checkers.rst:2404
+  }
+  strcat(cmd, filename);
+  system(cmd); // Warning: Untrusted data is passed to a system call

donat.nagy wrote:
> If the filename is too long (more than 1014 characters), this is a buffer 
> overflow. I admit that having a secondary unrelated vulnerability makes the 
> example more realistic :), but I think we should still avoid it. (This also 
> appears in other variants of the example code, including the "No 
> vulnerability anymore" one.)
True. cmd buffer increased to 2048



Comment at: clang/docs/analyzer/checkers.rst:2457-2461
+  if (access(filename,F_OK)){//sanitizing user input
+printf("File does not exist\n");
+return -1;
+  }
+  csa_sanitize(filename); // Indicating to CSA that filename variable is safe 
to be used after this point

donat.nagy wrote:
> Separating the actual sanitization and the function that's magically 
> recognized by the taint checker doesn't seem to be a good design pattern. 
> Here `csa_sanitize()` is just a synonym for the "silence this checker here" 
> marker, which is //very// confusing, because if someone is not familiar with 
> this locally introduced no-op function, they will think that it's performing 
> actual sanitization! At the very least we should rename this magical no-op to 
> `csa_mark_sanitized()` or something similar.
> 
> The root issue is that in this example we would like to use a verifier 
> function (that determines whether the tainted data is safe) instead of a 
> sanitizer function (that can convert //any// tainted data into safe data) and 
> our taint handling engine is not prepared to handle conditional Filter 
> effects like "this function removes taint from its first argument //if its 
> return value is true//".
> 
> I think it would be much better to switch to a different example where the 
> "natural" solution is more aligned with the limited toolbox provided by our 
> taint framework (i.e. it's possible define a filter function that actually 
> removes problematic parts of the untrusted input).
I changed this fist example to be a data sanitation example, where the 
sanitizeFileName(..) function changes the user input to an empty string if the 
filneme is invalid. 

Then in the next example we show the generic csa_mark_sanitized() function and 
how it can be used to mark the valid code paths of verifier functions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-07-24 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 543586.
dkrupp marked 6 inline comments as done.
dkrupp added a comment.
Herald added a subscriber: wangpc.

- changed the main example to a data sanitation example (sanitizeFileName())  
instead of a data verification example
- fixed typos
- fixed sphinx warnings


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229

Files:
  clang/docs/analyzer/checkers.rst

Index: clang/docs/analyzer/checkers.rst
===
--- clang/docs/analyzer/checkers.rst
+++ clang/docs/analyzer/checkers.rst
@@ -2366,58 +2366,202 @@
 alpha.security.taint.TaintPropagation (C, C++)
 ""
 
-Taint analysis identifies untrusted sources of information (taint sources), rules as to how the untrusted data flows along the execution path (propagation rules), and points of execution where the use of tainted data is risky (taints sinks).
+Taint analysis identifies potential security vulnerabilities where the
+attacker can inject malicious data to the program to execute an attack
+(privilege escalation, command injection, SQL injection etc.).
+
+The malicious data is injected at the taint source (e.g. ``getenv()`` call)
+which is then propagated through function calls and being used as arguments of
+sensitive operations, also called as taint sinks (e.g. ``system()`` call).
+
+One can defend agains this type of vulnerability by always checking and
+santizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover and show to the user these potential
+taint source-sink pairs and the propagation call chain.
+
 The most notable examples of taint sources are:
 
-  - network originating data
+  - data from network
+  - files or standard input
   - environment variables
-  - database originating data
+  - data from databases
 
-``GenericTaintChecker`` is the main implementation checker for this rule, and it generates taint information used by other checkers.
+Let us examine a practical example of a Command Injection attack.
 
 .. code-block:: c
 
- void test() {
-   char x = getchar(); // 'x' marked as tainted
-   system(&x); // warn: untrusted data is passed to a system call
- }
+  // Command Injection Vulnerability Example
+  int main(int argc, char** argv) {
+char cmd[2048] = "/bin/cat ";
+char filename[1024];
+printf("Filename:");
+scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+strcat(cmd, filename);
+system(cmd); // Warning: Untrusted data is passed to a system call
+  }
 
- // note: compiler internally checks if the second param to
- // sprintf is a string literal or not.
- // Use -Wno-format-security to suppress compiler warning.
- void test() {
-   char s[10], buf[10];
-   fscanf(stdin, "%s", s); // 's' marked as tainted
+The program prints the content of any user specified file.
+Unfortunately the attacker can execute arbitrary commands
+with shell escapes. For example with the following input the `ls` command is also
+executed after the contents of `/etc/shadow` is printed.
+`Input: /etc/shadow ; ls /`
 
-   sprintf(buf, s); // warn: untrusted data as a format string
- }
+The analysis implemented in this checker points out this problem.
 
- void test() {
-   size_t ts;
-   scanf("%zd", &ts); // 'ts' marked as tainted
-   int *p = (int *)malloc(ts * sizeof(int));
- // warn: untrusted data as buffer size
- }
+One can protect against such attack by for example checking if the provided
+input refers to a valid file and removing any invalid user input.
+
+.. code-block:: c
+
+  // No vulnerability anymore, but we still get the warning
+  void sanitizeFileName(char* filename){
+if (access(filename,F_OK)){// Verifying user input
+  printf("File does not exist\n");
+  filename[0]='\0';
+  }
+  }
+  int main(int argc, char** argv) {
+char cmd[2048] = "/bin/cat ";
+char filename[1024];
+printf("Filename:");
+scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+sanitizeFileName(filename);// filename is safe after this point
+if (!filename[0])
+  return -1;
+strcat(cmd, filename);
+system(cmd); // Superflous Warning: Untrusted data is passed to a system call
+  }
+
+Unfortunately, the checker cannot discover automatically
+that the programmer have performed data sanitation, so it still emits the warning.
+
+One can get rid of this superflous warning by telling by specifying the
+sanitation functions in the taint configuation file
+(see :doc:`user-docs/TaintAnalysisConfiguration`).
+
+.. code-block:: YAML
+
+  Filters:
+  - Name: sanitizeFileName
+Args: [0]
+
+The clang invocation to pass the configuration file location:
+
+.. code-block:: bash
+
+  clang  --analyze -Xclang -analyzer-config  -Xclang alpha.security.taint.TaintPropagation:Config=`pwd`/taint_config.yml ...
+
+If you are validating y

[PATCH] D155848: [clang][analyzer]Fix non-effective taint sanitation

2023-07-21 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:970
-  ForEachCallArg(
-  [&](ArgIdxTy I, const Expr *E, SVal V) {
-if (PropDstArgs.contains(I)) {

steakhal wrote:
> dkrupp wrote:
> > dkrupp wrote:
> > > steakhal wrote:
> > > > Is it only a formatting hunk?
> > > > I cannot really tell from my phone xd
> > > yeap. formatting only.
> > Yes. Only formatting.
> I would suggest not mixing formatting changes with semantic changes in a 
> single revision. Consider not doing the formatting, or at least format in a 
> different revision.
> 
> Adding and updating comments are fine.
Yes. I realized. I reverted the formatting change.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155848/new/

https://reviews.llvm.org/D155848

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155848: [clang][analyzer]Fix non-effective taint sanitation

2023-07-21 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 542903.
dkrupp added a comment.

-formatting issues fixed
-sanitizecCmd changed to void


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155848/new/

https://reviews.llvm.org/D155848

Files:
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/test/Analysis/Inputs/taint-generic-config.yaml
  clang/test/Analysis/taint-generic.c


Index: clang/test/Analysis/taint-generic.c
===
--- clang/test/Analysis/taint-generic.c
+++ clang/test/Analysis/taint-generic.c
@@ -1010,7 +1010,8 @@
 void myScanf(const char*, ...);
 int myPropagator(int, int*);
 int mySnprintf(char*, size_t, const char*, ...);
-bool isOutOfRange(const int*);
+bool isOutOfRange(const int*); // const filter function
+void sanitizeCmd(char*); // non-const filter function
 void mySink(int, int, int);
 
 void testConfigurationSources1(void) {
@@ -1044,6 +1045,19 @@
   Buffer[x] = 1; // no-warning
 }
 
+void testConfigurationFilterNonConst(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  system(buffer); // expected-warning {{Untrusted data is passed to a system 
call}}
+}
+
+void testConfigurationFilterNonConst2(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  sanitizeCmd(buffer); // removes taintedness
+  system(buffer); // no-warning
+}
+
 void testConfigurationSinks(void) {
   int x = mySource1();
   mySink(x, 1, 2);
Index: clang/test/Analysis/Inputs/taint-generic-config.yaml
===
--- clang/test/Analysis/Inputs/taint-generic-config.yaml
+++ clang/test/Analysis/Inputs/taint-generic-config.yaml
@@ -69,6 +69,11 @@
 Scope: "myAnotherNamespace::"
 Args:  [0]
 
+  # char *str; // str is tainted
+  # sanitizeCmd(str) // str is not tainted anymore
+  - Name: sanitizeCmd
+Args: [0]
+
 # A list of sink functions
 Sinks:
   # int x, y; // x and y are tainted
Index: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -926,7 +926,9 @@
   });
 
   /// Check for taint propagation sources.
-  /// A rule is relevant if PropSrcArgs is empty, or if any of its signified
+  /// A rule will make the destination variables tainted if PropSrcArgs
+  /// is empty (taints the destination
+  /// arguments unconditionally), or if any of its signified
   /// args are tainted in context of the current CallEvent.
   bool IsMatching = PropSrcArgs.isEmpty();
   std::vector TaintedSymbols;
@@ -949,6 +951,8 @@
 }
   });
 
+  // Early return for propagation rules which dont match.
+  // Matching propagations, Sinks and Filters will pass this point.
   if (!IsMatching)
 return;
 
@@ -975,10 +979,13 @@
   Result = F.add(Result, I);
 }
 
+// Taint property gets lost if the variable is passed as a
+// non-const pointer or reference to a function which is
+// not inlined. For matching rules we want to preserve the taintedness.
 // TODO: We should traverse all reachable memory regions via the
 // escaping parameter. Instead of doing that we simply mark only the
 // referred memory region as tainted.
-if (WouldEscape(V, E->getType())) {
+if (WouldEscape(V, E->getType()) && getTaintedPointeeOrPointer(State, 
V)) {
   LLVM_DEBUG(if (!Result.contains(I)) {
 llvm::dbgs() << "PreCall<";
 Call.dump(llvm::dbgs());


Index: clang/test/Analysis/taint-generic.c
===
--- clang/test/Analysis/taint-generic.c
+++ clang/test/Analysis/taint-generic.c
@@ -1010,7 +1010,8 @@
 void myScanf(const char*, ...);
 int myPropagator(int, int*);
 int mySnprintf(char*, size_t, const char*, ...);
-bool isOutOfRange(const int*);
+bool isOutOfRange(const int*); // const filter function
+void sanitizeCmd(char*); // non-const filter function
 void mySink(int, int, int);
 
 void testConfigurationSources1(void) {
@@ -1044,6 +1045,19 @@
   Buffer[x] = 1; // no-warning
 }
 
+void testConfigurationFilterNonConst(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  system(buffer); // expected-warning {{Untrusted data is passed to a system call}}
+}
+
+void testConfigurationFilterNonConst2(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  sanitizeCmd(buffer); // removes taintedness
+  system(buffer); // no-warning
+}
+
 void testConfigurationSinks(void) {
   int x = mySource1();
   mySink(x, 1, 2);
Index: clang/test/Analysis/Inputs/taint-generic-config.yaml
===
--- clang/test/Analysis/Inputs/taint-generic-config.yaml
+++ clang/test/Analysis/Inputs/taint-generic-config.yam

[PATCH] D155848: [clang][analyzer]Fix non-effective taint sanitation

2023-07-21 Thread Daniel Krupp via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
dkrupp marked an inline comment as done.
Closed by commit rG26b19a67e5c3: [clang][analyzer]Fix non-effective taint 
sanitation (authored by dkrupp).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D155848?vs=542619&id=542885#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155848/new/

https://reviews.llvm.org/D155848

Files:
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/test/Analysis/Inputs/taint-generic-config.yaml
  clang/test/Analysis/taint-generic.c


Index: clang/test/Analysis/taint-generic.c
===
--- clang/test/Analysis/taint-generic.c
+++ clang/test/Analysis/taint-generic.c
@@ -1010,7 +1010,8 @@
 void myScanf(const char*, ...);
 int myPropagator(int, int*);
 int mySnprintf(char*, size_t, const char*, ...);
-bool isOutOfRange(const int*);
+bool isOutOfRange(const int*); // const filter function
+void sanitizeCmd(char*); // non-const filter function
 void mySink(int, int, int);
 
 void testConfigurationSources1(void) {
@@ -1044,6 +1045,19 @@
   Buffer[x] = 1; // no-warning
 }
 
+void testConfigurationFilterNonConst(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  system(buffer); // expected-warning {{Untrusted data is passed to a system 
call}}
+}
+
+void testConfigurationFilterNonConst2(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  sanitizeCmd(buffer); // removes taintedness
+  system(buffer); // no-warning
+}
+
 void testConfigurationSinks(void) {
   int x = mySource1();
   mySink(x, 1, 2);
Index: clang/test/Analysis/Inputs/taint-generic-config.yaml
===
--- clang/test/Analysis/Inputs/taint-generic-config.yaml
+++ clang/test/Analysis/Inputs/taint-generic-config.yaml
@@ -69,6 +69,11 @@
 Scope: "myAnotherNamespace::"
 Args:  [0]
 
+  # char *str; // str is tainted
+  # sanitizeCmd(str) // str is not tainted anymore
+  - Name: sanitizeCmd
+Args: [0]
+
 # A list of sink functions
 Sinks:
   # int x, y; // x and y are tainted
Index: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -926,7 +926,9 @@
   });
 
   /// Check for taint propagation sources.
-  /// A rule is relevant if PropSrcArgs is empty, or if any of its signified
+  /// A rule will make the destination variables tainted if PropSrcArgs
+  /// is empty (taints the destination
+  /// arguments unconditionally), or if any of its signified
   /// args are tainted in context of the current CallEvent.
   bool IsMatching = PropSrcArgs.isEmpty();
   std::vector TaintedSymbols;
@@ -949,6 +951,8 @@
 }
   });
 
+  // Early return for propagation rules which dont match.
+  // Matching propagations, Sinks and Filters will pass this point.
   if (!IsMatching)
 return;
 
@@ -975,10 +979,13 @@
   Result = F.add(Result, I);
 }
 
+// Taint property gets lost if the variable is passed as a
+// non-const pointer or reference to a function which is
+// not inlined. For matching rules we want to preserve the taintedness.
 // TODO: We should traverse all reachable memory regions via the
 // escaping parameter. Instead of doing that we simply mark only the
 // referred memory region as tainted.
-if (WouldEscape(V, E->getType())) {
+if (WouldEscape(V, E->getType()) && getTaintedPointeeOrPointer(State, 
V)) {
   LLVM_DEBUG(if (!Result.contains(I)) {
 llvm::dbgs() << "PreCall<";
 Call.dump(llvm::dbgs());


Index: clang/test/Analysis/taint-generic.c
===
--- clang/test/Analysis/taint-generic.c
+++ clang/test/Analysis/taint-generic.c
@@ -1010,7 +1010,8 @@
 void myScanf(const char*, ...);
 int myPropagator(int, int*);
 int mySnprintf(char*, size_t, const char*, ...);
-bool isOutOfRange(const int*);
+bool isOutOfRange(const int*); // const filter function
+void sanitizeCmd(char*); // non-const filter function
 void mySink(int, int, int);
 
 void testConfigurationSources1(void) {
@@ -1044,6 +1045,19 @@
   Buffer[x] = 1; // no-warning
 }
 
+void testConfigurationFilterNonConst(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  system(buffer); // expected-warning {{Untrusted data is passed to a system call}}
+}
+
+void testConfigurationFilterNonConst2(void) {
+  char buffer[1000];
+  myScanf("%s", buffer); // makes buffer tainted
+  sanitizeCmd(buffer); // removes taintedness
+  system(b

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-26 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp closed this revision.
dkrupp added a comment.

Committed in 343bdb10940cb2387c0b9bd3caccee7bb56c937b 
.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-24 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@steakhal  thanks for the review. I fixed all outstanding remarks.
I left the test taint-diagnostic-visitor.c formatting as is to remain 
consistent with the rest of the file. I think we should keep it as is, or 
reformat the whole file.




Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109
+  if ((stateNotZero && stateZero)) {
+std::vector taintedSyms = getTaintedSymbols(C.getState(), *DV);
+if (!taintedSyms.empty()) {
+  reportTaintBug("Division by a tainted value, possibly zero", stateZero, 
C,

steakhal wrote:
> dkrupp wrote:
> > steakhal wrote:
> > > 
> > We cannot get rid off the getTaintedSymbols() call, as we need to pass all 
> > tainted symbols to reportTaintBug if we want to track back multiple 
> > variables. taintedSyms is a parameter of reportTaintBug(...)
> Yes, makes sense. mb.
> One more thing: if `reportTaintBug()` takes the `taintedSyms` vector 
> "by-value", you should express your intent by `std::move()`-ing your 
> collection expressing that it's meant to be consumed instead of taking a copy.
> Otherwise, you could express this intent if the `reportTaintBug()` take a 
> //view// type for the collection, such as `llvm::ArrayRef` - which 
> would neither copy nor move from the callsite's vector, being more performant 
> and expressive.
> 
> I get that this vector is small and bugreport construction will dominate the 
> runtime anyway so I'm not complaining about this specific case, I'm just 
> noting this for the next time. So, here I'm not expecting any actions.
Fixed as suggested. thanks.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:947-948
+  if (!TaintedArgSyms.empty()) {
+TaintedSymbols.insert(TaintedSymbols.begin(), TaintedArgSyms.begin(),
+  TaintedArgSyms.end());
+TaintedIndexes.push_back(I);

steakhal wrote:
> steakhal wrote:
> > 
> I observed you didn't take any action about this suggestion.
> It leaves me wonder if this suggestion - in general - makes sense or if there 
> are other reasons what I cannot foresee.
> I've seen you using the fully spelled-out version in total 8 times.
> Shouldn't we prefer the shorter, more expressive version instead?
Sorry I overlooked this comment. I like this shorter version. It is so much 
consize!  Changed at all places. Thanks for the suggestion.



Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:53-67
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the 
return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}

steakhal wrote:
> steakhal wrote:
> > I know this is subjective, but I'd suggest to reformat the tests to match 
> > LLVM style guidelines, unless the formatting is important for the test.
> > Consistency helps the reader and reviewer, as code and tests are read many 
> > more times than written.
> > 
> > This applies to the rest of the touched tests.
> Originally I meant this to the rest of the test cases you change or add part 
> of this patch. I hope it clarifies.
I made some formatting changes you suggested, but 
I would like to leave the //expected-note tags as they are now, because then it 
remains consistent with the rest of the test cases.

Would it be okay like this, or should I reformat the whole file (untouched 
parts too)?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-24 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 516389.
dkrupp marked an inline comment as done.
dkrupp added a comment.

-using llvm::ArrayRef in the reportTaintBug(..) function in the 
DivZero Checker


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,24 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef __typeof(sizeof(int)) size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to the 2nd argument}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +27,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to the 2nd argument}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +35,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +44,72 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated here}}
+   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+
+// Tests if the originated note is correctly placed even if the path is
+// propagating through variables and expressions
+char *taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used to specify the buffer size}}
+// expected-note@-2 {{Taint propagated to the return value}}
+return pathbuf;
+  }
+  return 0;
+}
+
+// Taint origin should be marked correctly even if there are multiple taint
+// sources in the function
+char *taintDiagnosticPropagation2(){
+  char *pathbuf;
+  char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  char *user_env=get

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-24 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 516380.
dkrupp marked 10 inline comments as done.
dkrupp added a comment.

-append_range(..) used instead of std::vector.insert(...) to improve readability
-minor updates based on @steakhal comments


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,24 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef __typeof(sizeof(int)) size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to the 2nd argument}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +27,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to the 2nd argument}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +35,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +44,72 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated here}}
+   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+
+// Tests if the originated note is correctly placed even if the path is
+// propagating through variables and expressions
+char *taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used to specify the buffer size}}
+// expected-note@-2 {{Taint propagated to the return value}}
+return pathbuf;
+  }
+  return 0;
+}
+
+// Taint origin should be marked correctly even if there are multiple taint
+// sources in the function
+char *taintDiagnosticPropagation2(){
+  char *pathbuf;
+  char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-22 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@steakhal your comments are fixed. Thanks for the review.




Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109
+  if ((stateNotZero && stateZero)) {
+std::vector taintedSyms = getTaintedSymbols(C.getState(), *DV);
+if (!taintedSyms.empty()) {
+  reportTaintBug("Division by a tainted value, possibly zero", stateZero, 
C,

steakhal wrote:
> 
We cannot get rid off the getTaintedSymbols() call, as we need to pass all 
tainted symbols to reportTaintBug if we want to track back multiple variables. 
taintedSyms is a parameter of reportTaintBug(...)



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-868
+  std::vector TaintedSyms =
+  getTaintedSymbols(State, Call.getReturnValue());
+  if (!TaintedSyms.empty()) {
+TaintedSymbols.push_back(TaintedSyms[0]);
+TaintedIndexes.push_back(ArgNum);
+  }

steakhal wrote:
> 
I think this suggested solution would not be correct here, as ArgSym might not 
be the actual _tainted_ symbol (inside a more complex expression).

So I would prefer to leave it like this for correctness.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-879
+  std::vector TaintedSyms = getTaintedSymbols(State, *V);
+  if (!TaintedSyms.empty()) {
+TaintedSymbols.push_back(TaintedSyms[0]);
+TaintedIndexes.push_back(ArgNum);
+  }

steakhal wrote:
> In these cases, the code would acquire all the tainted subsymbols, which then 
> we throw away and keep only the first one.
> This is why I suggested the approach I did I'm my last review.
I think this suggested solution would not be correct here, as ArgSym might not 
be the actual _tainted_ symbol (inside a more complex expression).

So I would prefer to leave it like this for correctness.



Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:294
+  std::vector TaintedCasts =
+  getTaintedSymbols(State, SC->getOperand(), Kind);
+  TaintedSymbols.insert(TaintedSymbols.begin(), TaintedCasts.begin(),

steakhal wrote:
> If `returnFirstOnly` is `true`, this `getTaintedSymbols()` call would still 
> eagerly (needlessly) collect all of the symbols.
> I'd recommend propagating the `returnFirstOnly` parameter to the recursive 
> calls to avoid this problem.
> I also encourage you to make use of the `llvm::append_range()` whenever makes 
> sense.
You are perfectly right. I overlooked these calls and because of the the 
default parameter did got get a warning.  now fixed.



Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
 bool taint::isTainted(ProgramStateRef State, const Stmt *S,
   const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind).empty();
 }

dkrupp wrote:
> steakhal wrote:
> > TBH I'm not sure if I like that now we allocate unbounded amount of times 
> > (because of `getTaintedSymbols()` is recursive and returns by value), where 
> > we previously did not.
> > 
> > What we could possibly do is to compute the elements of this sequence 
> > lazily.
> > I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's 
> > possible to have something like that as a return type as it might encode 
> > the map function in the type or something like that.
> > Anyway, I'm just saying that it would be nice to not do more than it's 
> > necessary, and especially not allocate a lot of short-lived objects there.
> > 
> > Do you think there is a way to have the cake and eat it too?
> > 
> > ---
> > 
> > I did some investigation, and one could get pretty far in the 
> > implementation, and maybe even complete it but it would be really 
> > complicated as of now. Maybe we could revisit this subject when we have 
> > coroutines.
> > 
> > So, I would suggest to have two sets of APIs:
> >  - the usual `isTainted(.) -> bool`
> >  - and a `getTaintedSymbols(.) -> vector`
> > The important point would be that the `isTainted()` version would not 
> > eagerly collect all tainted sub-syms but return on finding the first one.
> > While, the `getTaintedSymbols()` would collect eagerly all of them, as its 
> > name suggests.
> > 
> > Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool 
> > returnAfterFirstMatch`. This way `isTainted()` can call it like that. While 
> > in the other case, the parameter would be `false`, and eagerly collect all 
> > symbols.
> > 
> > This is probably the best of both worlds, as it prevents `isTainted` from 
> > doing extra work and if we need to iterate over the tainted symbols, we 
> > always iterate over all of them, so doing it lazily wouldn't gain us much 
> > in that case anyway.
> > As a bonus, the user-facing API would be s

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-22 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 516077.
dkrupp marked 13 inline comments as done.
dkrupp added a comment.

-getTaintedSymbols(.) -> getTaintedSymbolsImpl() proxy function introduced for 
interface safety
-Other minor fixes based on comments from @steakhal


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,24 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef __typeof(sizeof(int)) size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to the 2nd argument}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +27,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to the 2nd argument}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +35,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +44,71 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated here}}
+   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used to specify the buffer size}}
+// expected-note@-2 {{Taint propagated to the return value}}
+return pathbuf;
+  }
+  return 0;
+}
+
+//Taint origin should be marked correctly even if there are multiple taint
+//sources in the function
+char* taintDiagnosticPropagation2(){
+  char *pathbuf;
+  char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-21 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked an inline comment as done.
dkrupp added a comment.

@steakhal is there anything else to do before we merge this? Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-19 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked an inline comment as done.
dkrupp added a comment.

@steakhal  thanks for your review. All your remarks have been fixed.




Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
 bool taint::isTainted(ProgramStateRef State, const Stmt *S,
   const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind).empty();
 }

steakhal wrote:
> TBH I'm not sure if I like that now we allocate unbounded amount of times 
> (because of `getTaintedSymbols()` is recursive and returns by value), where 
> we previously did not.
> 
> What we could possibly do is to compute the elements of this sequence lazily.
> I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's 
> possible to have something like that as a return type as it might encode the 
> map function in the type or something like that.
> Anyway, I'm just saying that it would be nice to not do more than it's 
> necessary, and especially not allocate a lot of short-lived objects there.
> 
> Do you think there is a way to have the cake and eat it too?
> 
> ---
> 
> I did some investigation, and one could get pretty far in the implementation, 
> and maybe even complete it but it would be really complicated as of now. 
> Maybe we could revisit this subject when we have coroutines.
> 
> So, I would suggest to have two sets of APIs:
>  - the usual `isTainted(.) -> bool`
>  - and a `getTaintedSymbols(.) -> vector`
> The important point would be that the `isTainted()` version would not eagerly 
> collect all tainted sub-syms but return on finding the first one.
> While, the `getTaintedSymbols()` would collect eagerly all of them, as its 
> name suggests.
> 
> Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool 
> returnAfterFirstMatch`. This way `isTainted()` can call it like that. While 
> in the other case, the parameter would be `false`, and eagerly collect all 
> symbols.
> 
> This is probably the best of both worlds, as it prevents `isTainted` from 
> doing extra work and if we need to iterate over the tainted symbols, we 
> always iterate over all of them, so doing it lazily wouldn't gain us much in 
> that case anyway.
> As a bonus, the user-facing API would be self-descriptive.
> 
> WDYT?
Good idea. I implemented the early return option in getTaintedSymbols(). This 
is used now by the isTainted() function.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-19 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 514973.
dkrupp marked an inline comment as done.
dkrupp added a comment.

- Implemented early return in getTaintedSymbols() when it is called by 
isTainted() for efficiency
- Fixed test incompatibility on Windows


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,24 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef __typeof(sizeof(int)) size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to the 2nd argument}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +27,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to the 2nd argument}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +35,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +44,71 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated here}}
+   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used to specify the buffer size}}
+// expected-note@-2 {{Taint propagated to the return value}}
+return pathbuf;
+  }
+  return 0;
+}
+
+//Taint origin should be marked correctly even if there are multiple taint
+//sources in the function
+char* taintDiagnosticPropagation2(){
+  char *pathbuf;
+  char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint 

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-15 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

You can find the improved reports on tmux, postgres, twin, openssl here:

here 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-14 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

All remarks from @steakhal has been fixed. Thanks for the review.
This new version now can handle the tracking back of multiple symbols!




Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:129-130
 /// Given a pointer/reference argument, return the value it refers to.
-std::optional getPointeeOf(const CheckerContext &C, SVal Arg) {
+std::optional getPointeeOf(ASTContext &ASTCtx,
+ const ProgramStateRef State, SVal Arg) {
   if (auto LValue = Arg.getAs())

steakhal wrote:
> BTW I don't know but `State->getStateManager().getContext()` can give you an 
> `ASTContext`. And we tend to not put `const` to variable declarations. See [[ 
> https://releases.llvm.org/4.0.0/tools/clang/tools/extra/docs/clang-tidy/checks/readability-avoid-const-params-in-decls.html
>  | readability-avoid-const-params-in-decls ]]
> 
> In other places we tend to refer to `ASTContext` by the `ACtx` I think.
> We also prefer const refs over mutable refs. Is the mutable ref justified for 
> this case?
Thanks for the suggestion. I took out ASTContext from the signature.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214
+  if (nofTaintedArgs == 0)
+Out << "Taint propagated to argument "
+<< TaintedArgs.at(Sym.index()) + 1;
+  else
+Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+  nofTaintedArgs++;

steakhal wrote:
> I'd recommend using `llvm::interleaveComma()` in such cases.
> You can probably get rid of `nofTaintedArgs` as well - by using this function.
I chose another solution. I hope that is ok too.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:213
+  else
+Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+  nofTaintedArgs++;

steakhal wrote:
> I believe this branch is uncovered by tests.
Now it is covered. See multipleTaintedSArgs(..) test in 
taint-diagnostic-visitor.c



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:221
+}
+return std::string(Out.str());
+  });

steakhal wrote:
> I think since you explicitly specify the return type of the lambda, you could 
> omit the spelling of `std::string` here.
not sure. Got a "cannot convert raw_svector_ostream::str() from llvm:StringRef" 
error.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-878
+  // FIXME: The call argument may be a complex expression
+  // referring to multiple tainted variables.
+  // Now we generate notes and track back only one of them.
+  SymbolRef TaintedSym = isTainted(State, *V);

steakhal wrote:
> You could iterate over the symbol dependencies of the SymExpr (of the `*V` 
> SVal).
> 
> ```lang=c++
> SymbolRef PointeeAsSym = V->getAsSymbol();
> // eee, can it be null? Sure it can. See isTainted(Region),... for those 
> cases we would need to descend and check their symbol dependencies.
> for (SymbolRef SubSym : llvm::make_range(PointeeAsSym->symbol_begin(), 
> PointeeAsSym->symbol_end())) {
>   // TODO: check each if it's also tainted, and update the `TaintedSymbols` 
> accordingly, IDK.
> }
> ```
> Something like this should work for most cases (except when `*V` refers to a 
> tainted region instead of a symbol), I think.
I implememented a new function getTaintedSymbols(..) in Taint.cpp which returns 
all tainted symbols for a complex expr, SVal etc. With this addition, now we 
can track back multiple tainted symbols reaching a sink.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-14 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 513556.
dkrupp marked 11 inline comments as done.
dkrupp edited the summary of this revision.
dkrupp added a comment.

-All remarks from @steakhal was fixed. Thanks for the review!
-Now we can generate diagnostics for all tainted values when they reach a sink.

Se for example the following test case:

  void multipleTaintedArgs(void) {
int x,y;
scanf("%d %d", &x, &y); // expected-note {{Taint originated here}}
// expected-note@-1 {{Taint propagated to the 2nd 
argument, 3rd argument}}
int* ptr = (int*) malloc(x + y); // expected-warning {{Untrusted data is 
used to specify the buffer size}}
 // expected-note@-1{{Untrusted data is 
used to specify the buffer size}}
free (ptr);
  }


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,24 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef unsigned long size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to the 2nd argument}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +27,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to the 2nd argument}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +35,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +44,71 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated here}}
+   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used t

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-05 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

All comments addressed. Thanks for the review @steakhal .




Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162
+  const CallEvent& Call) {
+  const LocationContext* LC = Call.getCalleeStackFrame(0);
+

steakhal wrote:
> If the `Call` parameter is only used for acquiring the `LocationContext`, 
> wouldn't it be more descriptive to directly pass the `LocationContext` to the 
> function instead?
> I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see 
> this function, so I'm a bit worried if this pick was intentional. That we 
> pass the `0` as the `BlockCount` argument only reinforces this instinct.
The call.getCalleeStackFrame(0) gets the location context of the actual call 
that we are analyzing (in the pre or postcall), and that's what we need to mark 
interesting. It is intentionally used like this. I changed the parameter to 
locationcontext as use suggested.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:169
+// We give diagnostics only for taint related reports
+if (!BR.isInteresting(LC) ||
+BR.getBugType().getCategory() != categories::TaintedData) {

steakhal wrote:
> What does the first half of this condition guard against?
> Do you have a test for it to demonstrate?
To only follow taint propagation to function calls which actually result in 
tainted variables used in the report and not every function which returns a 
tainted variable. 

char* taintDiagnosticPropagation2() is such a test which is failing without 
this due to giving extra unrelated propagation notes.





Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:214
+  << TaintedArgs.at(i) << "\n");
+  Out << "Taint propagated to argument " << TaintedArgs.at(i);
+} else {

steakhal wrote:
> So, if `TaintedSymbols.size() > 1`, then the note message will look weird.
> Could you please have a test for this?
Test added multipleTaintedArgs (). I could not provoke the multi-argument 
message as we only track-back one tainted symbol now.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:871
+  }else{
+ LLVM_DEBUG(llvm::dbgs() << "Strange. Return value not tainted.\n");
+ LLVM_DEBUG(Call.getReturnValue().dumpToStream(llvm::dbgs()));

steakhal wrote:
> I cannot see a test against the "Strange" string. Is this dead code?
> Same for the other block.
It was a debugging code, which I removed. I noticed that in some cases (e.g. if 
the argument pointer is pointing to an unknown area) we don't get back a 
tainted symbol even though we call the addtaint on the arg/return value.



Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:919-922
+//Add taintedness to stdin parameters
+if (isStdin(C.getSVal(E),C.getASTContext())){
+  State = addTaint(State, C.getSVal(E));
+}

steakhal wrote:
> I just want to get it confirmed that this hunk is unrelated to your change 
> per-say. Is it?
> BTW I don't mind this change being part of this patch, rather the opposite. 
> Finally, we will have it.
It is related in the sense that in isTainted() function call did not return a 
valid SymbolRef for stdin if we did not make the stdin tainted when we first 
see it. Caused  testcase to fail as it was before. Now it is handled similarly 
to other tainted symbols.



Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:228-230
+  if (SymbolRef TSR =
+  isTainted(State, SRV->getRegion(), Kind))
+return TSR;

steakhal wrote:
> What does `TSR` abbreviate? I would find `TaintedSym` more descriptive.
TSR = Tainted Symbol Ref

but I changed it as you suggested.



Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228
+if (Kind == VLA_Tainted)
+  BT.reset(new BugType(this,
+   "Dangerous variable-length array (VLA) declaration",
+   categories::TaintedData));
+else
+  BT.reset(new BugType(this,
+   "Dangerous variable-length array (VLA) declaration",

steakhal wrote:
> Why don't we use a distinct BugType for this?
You mean a new bug type instances? Would there be an advantage for that? Seemed 
to be simpler this way. To distinguish identify the tainted reports with the 
bug category.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-04-05 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 511078.
dkrupp marked 21 inline comments as done.
dkrupp added a comment.

@steakhal thanks for your review. I tried to address all your concerns.
I added an extra test case too (multipleTaintSources(..)) which highlights the 
limitation of the current patch: If multiple tainted "variables" reach a sink, 
we only generate diagnostics for one of them. The main reason is that the 
isTainted() function returns a single tainted Symbolref instead of a vector of 
Symbolrefs if there are multiple instances. 
I highlighted this in the test and the implementation.

I think this could be still an acceptable limitation for now, because as the 
user sanitizes one of the tainted variables, he will get a new diagnostics for 
the remaining one(s).

So I suggest to address this limitation in  follow-up patche(s).
The implementation as is already greatly improves the understandability of the 
reports.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,24 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef unsigned long size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to argument 2}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +27,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to argument 2}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +35,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to argument 2}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +44,71 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated here}}
+   // expected-note@-2 {{Taint propagated to argument 2}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+ // expected-note@-1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buf

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-03-31 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 510108.
dkrupp added a comment.

This is a totally rewritten version of the patch which solely relies on the 
existing "interestingness" utility to track back the taint propagation.  (And  
does not introduce a new FlowID in the ProgramState as requested in the 
reviews.)

-The new version also places a Note, when the taintedness is propagated to an 
argument or to a return value. So it should be easier for the user to follow 
how the taint information is spreading. 
-"The taint originated here" is printed correctly at the taint source function, 
which introduces taintedness. (Main goal of this patch.)

Implementation:
-The createTaintPreTag() function places a NoteTag at the taint propagation 
function calls, if taintedness is propagated. Then at report creation, the 
tainted arguments are marked interesting if propagated taintedness is relevant 
for the bug report.

- The isTainted() function is extended to return the actually tainted 
SymbolRef. This is important to be able to consistently mark relevant symbol 
interesting which carries the taintedness in a complex expression.

-createTaintPostTag(..) function places a NoteTag to the taint generating 
function calls to mark them interesting if they are relevant for a taintedness 
report. So if they propagated taintedness to interesting symbol(s).

The tests are passing and the reports on the open source projects are much 
better understandable than before (twin, tmux, curl):

https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=curl_curl-7_66_0_dkrupp_taint_origin_fix_new&run=tmux_2.6_dkrupp_taint_origin_fix_new&run=twin_v0.8.1_dkrupp_taint_origin_fix_new&is-unique=on&diff-type=New&checker-msg=%2auntrusted%2a&checker-msg=Out%20of%20bound%20memory%20access%20%28index%20is%20tainted%29


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-tester.c

Index: clang/test/Analysis/taint-tester.c
===
--- clang/test/Analysis/taint-tester.c
+++ clang/test/Analysis/taint-tester.c
@@ -122,7 +122,7 @@
   fscanf(pp, "%d", &ii);
   int jj = ii;// expected-warning + {{tainted}}
 
-  fscanf(p, "%d", &ii);
+  fscanf(p, "%d", &ii);// expected-warning + {{tainted}}
   int jj2 = ii;// expected-warning + {{tainted}}
 
   ii = 3;
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,13 +2,23 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef unsigned long size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
   char buf[128];
   scanf("%s", buf); // expected-note {{Taint originated here}}
+// expected-note@-1 {{Taint propagated to argument 1}}
   system(buf); // expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
 }
 
@@ -16,6 +26,7 @@
   int index;
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
+   // expected-note@-1 {{Taint propagated to argument 1}}
   return Array[index]; // expected-warning {{Out of bound memory access (index is tainted)}}
// expected-note@-1 {{Out of bound memory access (index is tainted)}}
 }
@@ -23,6 +34,7 @@
 int taintDiagnosticDivZero(int operand) {
   scanf("%d", &operand); // expected-note {{Value assigned to 'operand'}}
  // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to argument 1}}
   return 10 / operand; // expected-warning {{Division by a tainted value, possibly zero}}
// expected-note@-1 {{Division by a tainted value, possibly zero}}
 }
@@ -31,6 +43,50 @@
   int x;
   scanf("%d", &x); // expected-note {{Value assigned to 'x'}}
// expected-note@-1 {{Taint originated her

[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

2023-03-03 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added a project: clang.
Herald added subscribers: steakhal, manas, ASDenysPetrov, martong, gamesh411, 
donat.nagy, Szelethus, mikhail.ramalho, a.sidorin, rnkovacs, szepet, 
baloghadamsoftware, xazax.hun.
Herald added a project: All.
dkrupp requested review of this revision.
Herald added a subscriber: cfe-commits.

We extend the checker documentation with the following information

-How the user can mark inline that a data sanitation was performed performed to 
make a superflous report disappear
-Add references to CWEs and coding guidelines
-Describe the limitations of the checker
-Provide life-like examples


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D145229

Files:
  clang/docs/analyzer/checkers.rst

Index: clang/docs/analyzer/checkers.rst
===
--- clang/docs/analyzer/checkers.rst
+++ clang/docs/analyzer/checkers.rst
@@ -66,7 +66,7 @@
 
 core.NullDereference (C, C++, ObjC)
 """
-Check for dereferences of null pointers. 
+Check for dereferences of null pointers.
 
 This checker specifically does
 not report null pointer dereferences for x86 and x86-64 targets when the
@@ -75,7 +75,7 @@
 `__
 for reference.
 
-The ``SuppressAddressSpaces`` option suppresses 
+The ``SuppressAddressSpaces`` option suppresses
 warnings for null dereferences of all pointers with address spaces. You can
 disable this behavior with the option
 ``-analyzer-config core.NullDereference:SuppressAddressSpaces=false``.
@@ -2366,17 +2366,119 @@
 alpha.security.taint.TaintPropagation (C, C++)
 ""
 
-Taint analysis identifies untrusted sources of information (taint sources), rules as to how the untrusted data flows along the execution path (propagation rules), and points of execution where the use of tainted data is risky (taints sinks).
+Taint analysis identifies potential security vulnerabilities where
+attacker can inject malicious data to the program to execute an attack
+(privilege escalation, command injection, SQL injection etc.).
+
+The malicious data is injected at the taint source (e.g. getenv() call)
+which is then propagated through function calls and being used as arguments of
+sensitive operations, also called as taint sinks (e.g. system() call).
+
+One can defend agains this type of vulnerability by always checking and
+santizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover and show to the user these potential
+taint source-sink pairs and the propagation call chain.
+
 The most notable examples of taint sources are:
 
   - network originating data
+  - files or standard input
   - environment variables
   - database originating data
 
-``GenericTaintChecker`` is the main implementation checker for this rule, and it generates taint information used by other checkers.
+Let us examine a practical example of a Command Injection attack.
+.. code-block:: C
+ //Command Injection Vulnerability Example
 
-.. code-block:: c
+ int main(int argc, char** argv) {
+  char cmd[1024] = "/bin/cat ";
+  char filename[1024];
+  printf("Filename:");
+  scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+
+  if (access(filename,F_OK)){
+printf("File does not exist\n");
+return -1;
+  }
+  strcat(cmd, filename);
+  system(cmd); // Warning: Untrusted data is passed to a system call
+ }
+
+The program prints the content of any user specified file.
+Unfortunately the attacker can execute arbitrary commands
+with shell escapes. For example with the following input the `ls` command is also
+executed after the contents of `/etc/shadow` is printed.
+`Input: /etc/shadow ; ls /`
+
+The analysis implemented in this checker points out this problem.
+
+One can protect against such attack by for example checking if the provided
+input refers to a valid file.
+.. code-block:: C
+ //No vulnerability anymore, but we still get the warning.
+ int main(int argc, char** argv) {
+  char cmd[1024] = "/bin/cat ";
+  char filename[1024];
+  printf("Filename:");
+  scanf (" %1023[^\n]", filename); // The attacker can inject a shell escape here
+
+  if (access(filename,F_OK)){//sanitizing user input
+printf("File does not exist\n");
+return -1;
+  }
+  // filename is safe after this point
+  strcat(cmd, filename);
+  system(cmd); // Superflous Warning: Untrusted data is passed to a system call
+ }
+
+Unfortunately, the checker cannot discover automatically
+that the programmer have performed data sanitation, so it still emits the warning.
+
+One can get rid of this superflous warning by telling about such data sanitation
+actions to the analyzer by adding the following lines.
+
+.. code-block:: C
+ //Marking sanitized variables safe.
+ //No vulnerability anymore, no warning.
+
+ //User defiend csa

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-02-27 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp planned changes to this revision.
dkrupp added a comment.

@steakhal , @NoQ  thanks for the reviews. I will try to implement an 
alternative solution based on your suggestions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-02-24 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

> TaintBugReport is brilliant and we already have a precedent for subclassing 
> BugReport in another checker. However I'm somewhat worried that once we start 
> doing more of this, we'll eventually end up with multiple inheritance 
> situations when the report needs multiple kinds of information. So at a 
> glance my approach with a "generic data map" in bugreport objects looks a bit 
> more future-proof to me. Also a bit easier to set up, no need to deal with 
> custom RTTI.

Adding a data map (like a string->sval map) to the `PathSensitiveBugreport` 
instead of relying on dynamic casting sounds an easy addition. I will update 
the patch with this. Or you specifically mean this kind of datamap ?  `typedef 
llvm::ImmutableMap   GenericDataMap;` (`ProgramState.h:74`) I 
guess it should not be immutable…

> I guess my main point is, there shouldn't be a need to assist tracking by 
> adding extra information to the program state. Information in the state 
> should ideally be "material" to program execution, "tangible", it has to 
> describe something that's actually stored somewhere in memory (either by 
> directly defining it, or by constraining it). In particular, if two nodes 
> result in indistinguishable future behavior of the program, we're supposed to 
> merge them; but any "immaterial" bits of information in the state will 
> prevent that from happening.

@NoQ aha! Now I see where  you are coming from! If an SVal is tainted on both 
analysis branches, but their taint flow value is different (meaning that they 
carry taint values from different taint sources), then they cannot be merged 
which causes inefficiency.
I understand the generic principle, but I wonder how frequent would that be in 
practice. I would think not too much, because taint sources are uncommon. 
Especially having multiple taint sources in the same source file/Translation 
Unit (only that creates different taint flow ids).

> In our case it should be enough to have the lambda for propagation method ask 
> "Hey, is this freshly produced propagation target value relevant to this 
> specific report?" and if yes, mark the corresponding propagation source value 
> as relevant to the report as well; also emit a note and "consume" the mark on 
> the target value. Such chain of local decisions can easily replace the global 
> taint flow identifier, and it's more flexible because this way the flow 
> doesn't need to be "linear", it may branch in various ways and that's ok.

Taint propagation is not only handled in the `GenericTaintChecker:892`, where 
we calculate that the taintedness should propagate from function argument `x` 
to `y` or return value, but also it spreads in peculiar ways within 
expressions, from subregion to parent region etc. handled in the `addtaint(..)` 
and `addPartialTaint(..)` functions in `Taint.cpp`. What your proposed solution 
would essentially mean that we would need to implement the taint propagation in 
backward direction too. I think this design would be fragile and difficult to 
maintain (especially if taint propagation would change in the future).

I definitely don’t have the full picture here, so if you think that the sval 
backtracking is the better way, because of the potential performance penalty of 
the taintflow solution with the merges, I will go down that road and start to 
work on an alternative patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-02-23 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@steakhal, @NoQ

thanks for your reviews.

Please note that I am not extending `TaintBugVisitor`. On the contrary I 
removed it.
Instead I use NoteTag to generate the "Taint Originated here" text (see 
GenericTaintChecker.cpp:156).

I can also add additional NoteTags for generating propagation messages "Taint 
was propagated here" easily.

> The challenging part with note tags is how do you figure out whether your bug 
> report is taint-related.

I solved this by checking if the report is an instance of `TaintBugReport` a 
new BugReport type, which should be used by all taint related reports 
(ArrayBoundCheckerV2 checker and divisionbyzero checker 
was changed to use this new report type for taint related reports).

The tricky part was is to how to show only that single "Taint originated here" 
note tag at the taint source only which is relevant to the report. This is done 
by remembering the unique flowid in the
NoteTag in the forward analysis direction (see GenericTaintChecker:cpp:859) 
`InjectionTag = createTaintOriginTag(C, TaintFlowId);` and then filtering out 
the irrelevant 
NoteTags when the the report is generated (when the lamdba callback is called). 
See that flowid which reaches the sink is backpropagated in the 
PathSensitiveBugreport (see GenericTaintCHekcer.cpp:167).

FlowIds are unique and increased at every taint source 
(GenericTaintChecker:869) and it is stored as an additional simple `int` in the 
program state along with the already existing (Taint.cpp:22) TaintTagType.

My fear with the interestingness is that it may propagating backwards according 
to different "rules" than whot the taintedness is popagating in the foward 
direction even with the "extensions" pointed out by steakhal.
So the two types of propagation may be or can get out of sync.

So if the above is not a concern and you think implementing this with 
interestingness is more elegant, idiomatic and causes less maintenance burden, 
I am happy to create an alternative patch with that solution.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-02-20 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 498795.
dkrupp added a comment.

Added documentation to the newly introduced types: TaintData, TaintBugReport.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/TaintBugType.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/BugReporter.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  
clang/test/Analysis/diagnostics/Inputs/expected-sarif/sarif-diagnostics-taint-test.c.sarif
  
clang/test/Analysis/diagnostics/Inputs/expected-sarif/sarif-multi-diagnostic-test.c.sarif
  clang/test/Analysis/diagnostics/sarif-diagnostics-taint-test.c
  clang/test/Analysis/taint-checker-callback-order-has-definition.c
  clang/test/Analysis/taint-checker-callback-order-without-definition.c
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-dumps.c

Index: clang/test/Analysis/taint-dumps.c
===
--- clang/test/Analysis/taint-dumps.c
+++ clang/test/Analysis/taint-dumps.c
@@ -6,7 +6,7 @@
 int getchar(void);
 
 // CHECK: Tainted symbols:
-// CHECK-NEXT: conj_$2{{.*}} : 0
+// CHECK-NEXT: conj_$2{{.*}} : Type:0 Flow:0
 int test_taint_dumps(void) {
   int x = getchar();
   clang_analyzer_printState();
Index: clang/test/Analysis/taint-diagnostic-visitor.c
===
--- clang/test/Analysis/taint-diagnostic-visitor.c
+++ clang/test/Analysis/taint-diagnostic-visitor.c
@@ -2,8 +2,17 @@
 
 // This file is for testing enhanced diagnostics produced by the GenericTaintChecker
 
+typedef unsigned long size_t;
+struct _IO_FILE;
+typedef struct _IO_FILE FILE;
+
 int scanf(const char *restrict format, ...);
 int system(const char *command);
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+char *fgets(char *str, int n, FILE *stream);
+FILE *stdin;
 
 void taintDiagnostic(void)
 {
@@ -34,3 +43,41 @@
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
   // expected-note@-1 {{Declared variable-length array (VLA) has tainted size}}
 }
+
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used to specify the buffer size}}
+return pathbuf;
+  }
+  return 0;
+}
+
+
+//Taint origin should be marked correctly even if there are multiple taint
+//sources in the function
+char* taintDiagnosticPropagation2(){
+  char *pathbuf;
+  char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *user_env=getenv("USER_ENV_VAR");//unrelated taint source
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+	   // expected-note@-1 {{Taking true branch}}
+pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+// expected-note@-1{{Untrusted data is used to specify the buffer size}}
+return pathbuf;
+  }
+  return 0;
+}
+
+void testReadStdIn(){
+  char buf[1024];
+  fgets(buf, sizeof(buf), stdin);// expected-note {{Taint originated here}}
+  system(buf);// expected-warning {{Untrusted data is passed to a system call}} // expected-note {{Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)}}
+
+}
Index: clang/test/Analysis/taint-checker-callback-order-without-definition.c
===
--- clang/test/Analysis/taint-checker-callback-order-without-definition.c
+++ clang/test/Analysis/taint-checker-callback-order-without-definition.c
@@ -13,19 +13,22 @@
 
 void top(const char *fname, char *buf) {
   FILE *fp = fopen(fname, "r"); // Introduce taint.
-  // CHECK:  PreCall prepares tainting arg index: -1
-  // CHECK-NEXT: PostCall actually wants to 

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

2023-02-17 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added a reviewer: Szelethus.
dkrupp added a project: clang.
Herald added subscribers: steakhal, manas, ASDenysPetrov, martong, gamesh411, 
donat.nagy, mikhail.ramalho, a.sidorin, JDevlieghere, rnkovacs, szepet, 
baloghadamsoftware, xazax.hun.
Herald added a reviewer: NoQ.
Herald added a project: All.
dkrupp requested review of this revision.
Herald added a subscriber: cfe-commits.

This patch improves the diagnostics of the 
alpha.security.taint.TaintPropagation and taint related checkers by showing the 
"Taint originated here" note at the correct place, where the attacker may 
inject it. This greatly improves the understandability of the taint reports.

Taint Analysis: The attacker injects the malicious data at the taint source 
(e.g. getenv() call) which is then propagated and used at taint sink (e.g. 
exec() call) causing a security vulnerability (e.g. shell injection 
vulnerability), without data sanitation.

The goal of the checker is to discover and show to the user these potential 
taint source, sink pairs and the propagation call chain.

In the baseline the taint source was pointing to an invalid location, typically 
somewhere between the real taint source and sink.

After the fix, the "Taint originated here" tag is correctly shown at the taint 
source. This is the function call where the attacker can inject a malicious 
data (e.g. reading from environment variable, reading from file, reading from 
standard input etc.).

Before the patch the clang static analyzer puts the taint origin note wrongly 
to the `strtol(..)` call.

  c
  int main(){
char *pathbuf;
char *user_data=getenv("USER_INPUT"); 
char *end;  
long size=strtol(user_data, &end, 10); // note: Taint originated here. 
if (size > 0){
  pathbuf=(char*) malloc(size+1);//note: Untrusted data is used to specify 
the buffer size ...
  // ... 
  free(pathbuf);
}
return 0;
  }

After the fix, the taint origin point is correctly annotated at `getenv()` 
where the attacker really injects the value.

  c
  int main(){
char *pathbuf;
char *user_data=getenv("USER_INPUT");  // note: Taint originated here. 
char *end;
long size=strtol(user_data, &end, 10); 
if (size > 0){
  pathbuf=(char*) malloc(size+1);//note: Untrusted data is used to specify 
the buffer size ...
  // ... 
  free(pathbuf);
}
return 0;
  }

The BugVisitor placing the note was wrongly going back only until introduction 
of the tainted SVal in the sink.

This patch creates a new uniquely identified taint flow for each taint source 
(e.g.getenv()) it traverses and places a NoteTag ("Taint originated here") with 
the new id. Then, when the bug report is generated, the taint flow id is 
propagated back (in the new TainBugReport) along the bug path and the correct 
"Taint originated here." annotation is generated (matching the flow id).

You can find the new improved reports

here 


And the old reports (look out for "Taint originated here" notes. They are at 
the wrong place, close to the end of the reports)

here 



Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D144269

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Taint.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/BugReporter/TaintBugType.h
  clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
  clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/Taint.cpp
  clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
  clang/lib/StaticAnalyzer/Core/BugReporter.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  
clang/test/Analysis/diagnostics/Inputs/expected-sarif/sarif-diagnostics-taint-test.c.sarif
  
clang/test/Analysis/diagnostics/Inputs/expected-sarif/sarif-multi-diagnostic-test.c.sarif
  clang/test/Analysis/diagnostics/sarif-diagnostics-taint-test.c
  clang/test/Analysis/taint-checker-callback-order-has-definition.c
  clang/test/Analysis/taint-checker-callback-order-without-definition.c
  clang/test/Analysis/taint-diagnostic-visitor.c
  clang/test/Analysis/taint-dumps.c

Index: clang/test/Analysis/taint-dumps.c
=

[PATCH] D138777: [clang-tidy] Add check bugprone-multiple-new-in-one-expression.

2022-11-28 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

Thanks for this new check. Could you please link here results of this checker 
on som relevant open source projects?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138777/new/

https://reviews.llvm.org/D138777

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133698: [clang][dataflow] SignAnalysis, edgeTransfer, branchTransfer

2022-09-16 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: clang/unittests/Analysis/FlowSensitive/SignAnalysisTest.cpp:241
+? SignLattice(R.Val.getInt().getExtValue())
+: SignLattice::bottom();
+  } else {

Isn't this SignLattice::top() instead?

This is an initialization expression, which we cannot evaluate to int, but the 
variable is initialized.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133698/new/

https://reviews.llvm.org/D133698

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D117568: [Analyzer] Add docs to StdCLibraryFunctionArgsChecker

2022-01-19 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: clang/docs/analyzer/checkers.rst:2371
+diagnostics) for functions that are defined in the POSIX standard. This option
+is disabled by default.
+

I think it would be useful for the user to see one example per constraint type 
that this checker supports.
RangeConstraint (was covered), ComparisonConstraint, ValueConstraint, Not null  
Constraint, BufferSize constraint etc.


It would be also nice to add a section "Limitations".

Describe there well known false positive cases or limitations in the bug 
diagnostics that limits understandability.
Essentially the most important well known cases why this checker is alpha.

This section would be useful for users to understand and help identifying cases 
that are known false positives and for the developers to know how to improve 
this checker. I remember many cases when we had to test multiple times "why a 
checker is in alpha", because we forgot about it. I think it is best to 
document it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117568/new/

https://reviews.llvm.org/D117568

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91948: [WIP][analyzer][doc] Add Container- and IteratorModeling developer docs

2020-11-23 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D91948#2411058 , @whisperity wrote:

> Actually, while the explanation is understandable for me with additional 
> knowledge about the representation... I think it would be useful to add the 
> most simple example from the iterator checkers to the end of the document, 
> how this whole thing ties together and how it is useful in a checker.

I cannot agree more. Please add examples (with indicators where we want 
warning) and explain how they would be handled according to the different 
implementation options. Also please highlight, if possible, what are the 
limitations of the different solutions: potential false positives that we 
cannot filter out, or bugs we will not be able to find.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91948/new/

https://reviews.llvm.org/D91948

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77150: [Analyzer] New Option for ContainerModeling: AggressiveEraseModeling

2020-07-14 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp requested changes to this revision.
dkrupp added a comment.
This revision now requires changes to proceed.

Since the analyzer cannot cannot model the size of the containers just yet (as 
I believe this is the case now), what we are saying with this checker is to 
"always check the return value of the erase() function before use (for 
increment/decrement etc.) whether it is not past-end" .
Adam, are you sure that the user would like to enforce such a generic coding 
rule? Depending on the actual code analyzed, this would require this clumsy 
check at potentially too many places.

Wouldn't it be better to wait for the container size modeling patch? Then the 
user would only get a warning when we are sure that we are erasing the last 
element of the container.

In general I think we should keep the number of config parameters to a minimum 
as it is very hard to explain to the users which one to configure to what value 
and why.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77150/new/

https://reviews.llvm.org/D77150



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-11-04 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

If this is good to go, could you please commit this? Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69308: [analyzer] Test cases for the unsupported features for Clang Static Analyzer

2019-11-04 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D69308#1727625 , @NoQ wrote:

> Another interesting problem that we forgot to mention on the open projects 
> page is the modeling of C++17 bindings and decompositions: 
> https://bugs.llvm.org/show_bug.cgi?id=43042
>
> Also, in my opinion, out of all construction context problems mentioned on 
> the open projects page, the NRVO problem is probably the easiest. It might as 
> well be the least rewarding of them, but i think it is the friendliest 
> possible problem to start with, as it doesn't force you to invent large new 
> facilities.


Thanks. I will try to create a simple reproducer for this too and add this to 
the open projects page (in another patch).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69308/new/

https://reviews.llvm.org/D69308



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69308: [analyzer] Test cases for the unsupported features for Clang Static Analyzer

2019-11-04 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D69308#1727587 , @NoQ wrote:

> In D69308#1727108 , @Szelethus wrote:
>
> > Would love to see this comment in its entirety on the open projects page :^)
>
>
> I'd rather have a mention that @dkrupp is already working on this project, so 
> that if somebody wanted to help out they could cooperate nicely.


I am not working on this yet. First I would like to explore with these test 
cases what exactly is not working as expected.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69308/new/

https://reviews.llvm.org/D69308



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69308: [analyzer] Test cases for the unsupported features for Clang Static Analyzer

2019-11-04 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 227714.
dkrupp marked 2 inline comments as done.
dkrupp added a comment.

Thanks for your comments @NoQ 
I fixed them. Also added your implementation hints to the open projects page.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69308/new/

https://reviews.llvm.org/D69308

Files:
  
clang/test/Analysis/clangsa_unsupported_features/handle_constructors_for_default_arguments.cpp
  
clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp
  clang/www/analyzer/open_projects.html

Index: clang/www/analyzer/open_projects.html
===
--- clang/www/analyzer/open_projects.html
+++ clang/www/analyzer/open_projects.html
@@ -95,6 +95,26 @@
  We should model (potentially some of) such evaluations,
  and the same applies for destructors called from
  operator delete[].
+ See tests cases in https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp";>handle_constructors_with_new_array.cpp.
+  
+  
+  Hints for implementation by Artem Dergachev (NoQ):
+  Operator new[] requires invoking multiple (potentially unknown) amount of 
+  constructors with the same construct-expression. 
+  Apart from the technical difficulties of juggling program points around 
+  correctly to avoid accidentally merging paths together, you'll have to 
+  be a judge on when to exit the loop and how to widen it. 
+  Given that the constructor is going to be a default constructor, 
+  a nice 95% solution might be to execute exactly one constructor and 
+  then default-bind the resulting LazyCompoundVal to the whole array; 
+  it'll work whenever the default constructor doesn't touch global state 
+  but only initializes the object to various default values. 
+  But if, say, you're making an array of strings, 
+  depending on the implementation you might have to allocate a new buffer 
+  for each string, and in this case default-binding won't cut it. 
+  You might want to come up with an auxiliary analysis in order to perform 
+  widening of these simple loops more precisely.
+  
   
 
 
@@ -117,6 +137,25 @@
   Default arguments in C++ are recomputed at every call,
  and are therefore local, and not static, variables.
   
+  See tests cases in https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/clangsa_unsupported_features/handle_constructors_for_default_arguments.cpp";>handle_constructors_for_default_arguments.cpp.
+  
+  Hints for implementation by Artem Dergachev (NoQ):
+  Default arguments are annoying because the initializer expression is 
+  evaluated at the call site but doesn't syntactically belong to the 
+  caller's AST; instead it belongs to the ParmVarDecl for the default 
+  parameter. This can lead to situations when the same expression has to 
+  carry different values simultaneously - 
+  when multiple instances of the same function are evaluated as part of the 
+  same full-expression without specifying the default arguments. 
+  Even simply calling the function twice (not necessarily within the 
+  same full-expression) may lead to program points agglutinating because 
+  it's the same expression. There are some nasty test cases already 
+  in temporaries.cpp (struct DefaultParam and so on). I recommend adding a 
+  new LocationContext kind specifically to deal with this problem. It'll 
+  also help you figure out the construction context when you evaluate the 
+  construct-expression (though you might still need to do some additional 
+  CFG work to get construction contexts right).
+  
 
 
 Enhance the modeling of the standard library.
Index: clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp
===
--- /dev/null
+++ clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp
@@ -0,0 +1,86 @@
+// RUN: %clang_cc1 -fsyntax-only -analyze \
+// RUN:   -analyzer-checker=core,debug.ExprInspection %s -verify
+
+// These test cases demonstrate lack of Static Analyzer features.
+// The FIXME: tags indicate where we expect different output.
+
+// Handle constructors within new[]
+
+// When an array of objects is allocated using the operator new[],
+// constructors for all elements of the array are called.
+// We should model (potentially some of) such evaluations,
+// and the same applies for destructors called from operator delete[].
+
+void clang_analyzer_eval(bool);
+
+struct init_with_list {
+  int a;
+  init_with_list() : a(1) {}
+};
+
+struct init_in_body {
+  int a;
+  init_in_body() { a = 1; }
+};
+
+struct init_default_member {
+  int a = 1;
+};
+
+void test_automati

[PATCH] D69308: [analyzer] Test cases for the unsupported features for Clang Static Analyzer

2019-10-22 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added reviewers: NoQ, Szelethus.
dkrupp added a project: clang.
Herald added subscribers: cfe-commits, Charusso, gamesh411, donat.nagy, 
mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware, xazax.hun, 
whisperity.

These test cases demonstrate some of the missing features of the Clang Static 
Analyzer.

In this patch 2 missing C++ features are demonstrated from  
https://clang-analyzer.llvm.org/open_projects.html

1. Handle constructors within new[]
2. Handle constructors for default arguments


Repository:
  rC Clang

https://reviews.llvm.org/D69308

Files:
  
clang/test/Analysis/clangsa_unsupported_features/handle_constructors_for_default_arguments.cpp
  
clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp
  clang/www/analyzer/open_projects.html

Index: clang/www/analyzer/open_projects.html
===
--- clang/www/analyzer/open_projects.html
+++ clang/www/analyzer/open_projects.html
@@ -95,6 +95,7 @@
  We should model (potentially some of) such evaluations,
  and the same applies for destructors called from
  operator delete[].
+ See tests cases in https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp";>handle_constructors_with_new_array.cpp.
   
 
 
@@ -117,6 +118,7 @@
   Default arguments in C++ are recomputed at every call,
  and are therefore local, and not static, variables.
   
+  See tests cases in https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/clangsa_unsupported_features/handle_constructors_for_default_arguments.cpp";>handle_constructors_for_default_arguments.cpp.
 
 
 Enhance the modeling of the standard library.
Index: clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp
===
--- /dev/null
+++ clang/test/Analysis/clangsa_unsupported_features/handle_constructors_with_new_array.cpp
@@ -0,0 +1,81 @@
+// RUN: %clang_cc1 -fsyntax-only -analyze \
+// RUN:   -analyzer-checker=core,debug.ExprInspection %s -verify
+// REQUIRES: non-existing-system
+
+// These test cases demonstrate lack of Static Analyzer features.
+// Remove REQUIRES line, when the feature gets implemented.
+
+// Handle constructors within new[]
+
+// When an array of objects is allocated using the operator new[],
+// constructors for all elements of the array are called.
+// We should model (potentially some of) such evaluations,
+// and the same applies for destructors called from operator delete[].
+
+void clang_analyzer_eval(bool);
+
+struct init_with_list {
+  int a;
+  init_with_list() : a(1) {}
+};
+
+struct init_in_body {
+  int a;
+  init_in_body() { a = 1; }
+};
+
+struct init_default_member {
+  int a = 1;
+};
+
+void test_automatic() {
+
+  init_with_list a1;
+  init_in_body a2;
+  init_default_member a3;
+
+  clang_analyzer_eval(a1.a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a2.a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a3.a == 1); // expected-warning {{TRUE}}
+}
+
+void test_dynamic() {
+
+  auto *a1 = new init_with_list;
+  auto *a2 = new init_in_body;
+  auto *a3 = new init_default_member;
+
+  clang_analyzer_eval(a1->a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a2->a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a3->a == 1); // expected-warning {{TRUE}}
+
+  delete a1;
+  delete a2;
+  delete a3;
+}
+
+void test_automatic_aggregate() {
+
+  init_with_list a1[1];
+  init_in_body a2[1];
+  init_default_member a3[1];
+
+  clang_analyzer_eval(a1[0].a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a2[0].a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a3[0].a == 1); // expected-warning {{TRUE}}
+}
+
+void test_dynamic_aggregate() {
+
+  auto *a1 = new init_with_list[1];
+  auto *a2 = new init_in_body[1];
+  auto *a3 = new init_default_member[1];
+
+  clang_analyzer_eval(a1[0].a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a2[0].a == 1); // expected-warning {{TRUE}}
+  clang_analyzer_eval(a3[0].a == 1); // expected-warning {{TRUE}}
+
+  delete[] a1;
+  delete[] a2;
+  delete[] a3;
+}
Index: clang/test/Analysis/clangsa_unsupported_features/handle_constructors_for_default_arguments.cpp
===
--- /dev/null
+++ clang/test/Analysis/clangsa_unsupported_features/handle_constructors_for_default_arguments.cpp
@@ -0,0 +1,86 @@
+// RUN: %clang_cc1 -fsyntax-only -analyze \
+// RUN:   -analyzer-checker=core,debug.ExprInspection %s -verify
+// REQUIRES: non-existing-system
+
+// These test cases demonstrate lack of Static Analyzer features.
+// Remove REQUIRES line, when the feature gets implemented.
+
+// Handle constructors for default arguments
+// Default arguments 

[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2019-10-22 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 226037.
dkrupp added a comment.

The patch is rebased to the latest master.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
  clang-tools-extra/test/clang-tidy/checkers/misc-redundant-expression.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-redundant-expression.cpp
===
--- clang-tools-extra/test/clang-tidy/checkers/misc-redundant-expression.cpp
+++ clang-tools-extra/test/clang-tidy/checkers/misc-redundant-expression.cpp
@@ -114,6 +114,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -122,11 +123,27 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+#define   COND_OP_THIRD_MACRO 8
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+
+  k += (y < 0) ? sizeof(I64) : sizeof(I64);
+  // CHECK-MESSAGES: :[[@LINE-1]]:30: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(TestConditional(k,y));
+  // CHECK-MESSAGES: :[[@LINE-1]]:47: warning: 'true' and 'false' expressions are equivalent
+  // No warning if the expression arguments are different.
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(Valid(k,y));
+
   return k;
 }
 #undef COND_OP_MACRO
@@ -134,7 +151,7 @@
 
 // Overloaded operators that compare two instances of a struct.
 struct MyStruct {
-  int x;  
+  int x;
   bool operator==(const MyStruct& rhs) const {return this->x == rhs.x; } // not modifing
   bool operator>=(const MyStruct& rhs) const { return this->x >= rhs.x; } // not modifing
   bool operator<=(MyStruct& rhs) const { return this->x <= rhs.x; }
Index: clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -131,6 +131,20 @@
   case Stmt::BinaryOperatorClass:
 return cast(Left)->getOpcode() ==
cast(Right)->getOpcode();
+  case Stmt::UnaryExprOrTypeTraitExprClass:
+const auto *LeftUnaryExpr =
+cast(Left);
+const auto *RightUnaryExpr =
+cast(Right);
+if (LeftUnaryExpr->isArgumentType() && RightUnaryExpr->isArgumentType())
+  return LeftUnaryExpr->getArgumentType() ==
+ RightUnaryExpr->getArgumentType();
+else if (!LeftUnaryExpr->isArgumentType() &&
+ !RightUnaryExpr->isArgumentType())
+  return areEquivalentExpr(LeftUnaryExpr->getArgumentExpr(),
+   RightUnaryExpr->getArgumentExpr());
+
+return false;
   }
 }
 
@@ -604,23 +618,62 @@
   return true;
 }
 
+static bool isSameRawIdentifierToken(const Token &T1, const Token &T2,
+const SourceManager &SM) {
+  if (T1.getKind() != T2.getKind())
+return false;
+  if (T1.isNot(tok::raw_identifier))
+return true;
+  if (T1.getLength() != T2.getLength())
+return false;
+  return StringRef(SM.getCharacterData(T1.getLocation()), T1.getLength()) ==
+ StringRef(SM.getCharacterData(T2.getLocation()), T2.getLength());
+}
+
+bool isTokAtEndOfExpr(SourceRange ExprSR, Token T, const SourceManager &SM) {
+  return SM.getExpansionLoc(ExprSR.getEnd()) == T.getLocation();
+}
+
+/// Returns true if both LhsEpxr and RhsExpr are
+/// macro expressions and they are expanded
+/// from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange();
+  if (!Lsr.getBegin().isMacroID

[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2019-10-22 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@aaron.ballman  could you please check now? Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-10-11 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

Thanks for the reviews!
Could you pls commit this for me?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2019-10-10 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@aaron.ballman could you please commit? 
I don't have commit access. Thx.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-10-09 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 224090.
dkrupp added a comment.

Fixing minor capitalization issue and removing an extra newline.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049

Files:
  clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  clang/test/Analysis/bsd-string.c

Index: clang/test/Analysis/bsd-string.c
===
--- clang/test/Analysis/bsd-string.c
+++ clang/test/Analysis/bsd-string.c
@@ -9,6 +9,7 @@
 typedef __typeof(sizeof(int)) size_t;
 size_t strlcpy(char *dst, const char *src, size_t n);
 size_t strlcat(char *dst, const char *src, size_t n);
+size_t strlen(const char *s);
 void clang_analyzer_eval(int);
 
 void f1() {
@@ -18,9 +19,11 @@
 
 void f2() {
   char buf[5];
-  strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
-  // FIXME: This should not warn. The string is safely truncated.
-  strlcat(buf, "efgh", sizeof(buf)); // expected-warning{{Size argument is greater than the free space in the destination buffer}}
+  size_t len;
+  len = strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 4); // expected-warning{{TRUE}}
+  len = strlcat(buf, "efgh", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 8); // expected-warning{{TRUE}}
 }
 
 void f3() {
@@ -48,3 +51,83 @@
   char buf[8];
   return strlcpy(buf, "1234567", 0); // no-crash
 }
+
+void f8(){
+  char buf[5];
+  size_t len;
+
+  // basic strlcpy
+  len = strlcpy(buf,"123", sizeof(buf));
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+
+  // testing bounded strlcat
+  len = strlcat(buf,"456", sizeof(buf));
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcat with size==0
+  len = strlcat(buf,"789", 0);
+  clang_analyzer_eval(len==7);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcpy with size==0
+  len = strlcpy(buf,"123",0);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+}
+
+void f9(int unknown_size, char* unknown_src, char* unknown_dst){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)==4);// expected-warning{{TRUE}}
+
+  //size is unknown
+  len = strlcat(buf,"cd", unknown_size);
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)>=4);// expected-warning{{TRUE}}
+
+  //dst is unknown
+  len = strlcpy(unknown_dst,"abbc",unknown_size);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //src is unknown
+  len = strlcpy(buf,unknown_src, sizeof(buf));
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(buf));// expected-warning{{UNKNOWN}}
+
+  //src, dst is unknown
+  len = strlcpy(unknown_dst, unknown_src, unknown_size);
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //size is unknown
+  len = strlcat(buf+2,unknown_src+1, sizeof(buf));// expected-warning{{Size argument is greater than the length of the destination buffer}};
+}
+
+void f10(){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  strlcat(buf, "efghi",9);// expected-warning{{Size argument is greater than the length of the destination buffer}}
+}
+
+void f11() {
+  //test for Bug 41729
+  char a[256], b[256];
+  strlcpy(a, "world", sizeof(a));
+  strlcpy(b, "hello ", sizeof(b));
+  strlcat(b, a, sizeof(b)); // no-warning
+}
Index: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -28,6 +28,7 @@
 using namespace ento;
 
 namespace {
+enum class ConcatFnKind { none = 0, strcat = 1, strlcat = 2 };
 class CStringChecker : public Checker< eval::Call,
  check::PreStmt,
  check::LiveSymbols,
@@ -129,11 +130,8 @@
   void evalStrncpy(CheckerContext &C, const CallExpr *CE) const;
   void evalStpcpy(CheckerContext &C, const CallExpr *CE) const;
   void evalStrlcpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrcpyCommon(CheckerContext &C,
-const CallExpr *CE,
-bool returnEnd,
-bool isBounded,
-bool isAppending

[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-10-09 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

I also analyzed openssl with the baseline and this version, but did not find 
any new warnings.
See:
http://codechecker-demo.eastus.cloudapp.azure.com/Default/#run=D66049_baseline&newcheck=D66049_improved&review-status=Unreviewed&review-status=Confirmed&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=D66049_baseline_diff_D66049_improved




Comment at: lib/StaticAnalyzer/Checkers/CStringChecker.cpp:1580
 
   // If the function is strncpy, strncat, etc... it is bounded.
   if (isBounded) {

Szelethus wrote:
> Ah, okay, so the assumption is that bounded functions' third argument is 
> always a numerical size parameter. Why isn't that enforced at all?
How should we enforce this?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-10-09 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 224088.
dkrupp marked 9 inline comments as done.
dkrupp added a comment.

@Szelethus thanks for your review.
I fixed your suggestions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049

Files:
  clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  clang/test/Analysis/bsd-string.c

Index: clang/test/Analysis/bsd-string.c
===
--- clang/test/Analysis/bsd-string.c
+++ clang/test/Analysis/bsd-string.c
@@ -9,6 +9,7 @@
 typedef __typeof(sizeof(int)) size_t;
 size_t strlcpy(char *dst, const char *src, size_t n);
 size_t strlcat(char *dst, const char *src, size_t n);
+size_t strlen(const char *s);
 void clang_analyzer_eval(int);
 
 void f1() {
@@ -18,9 +19,11 @@
 
 void f2() {
   char buf[5];
-  strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
-  // FIXME: This should not warn. The string is safely truncated.
-  strlcat(buf, "efgh", sizeof(buf)); // expected-warning{{Size argument is greater than the free space in the destination buffer}}
+  size_t len;
+  len = strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 4); // expected-warning{{TRUE}}
+  len = strlcat(buf, "efgh", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 8); // expected-warning{{TRUE}}
 }
 
 void f3() {
@@ -44,7 +47,88 @@
   clang_analyzer_eval(len == 7); // expected-warning{{TRUE}}
 }
 
+
 int f7() {
   char buf[8];
   return strlcpy(buf, "1234567", 0); // no-crash
 }
+
+void f8(){
+  char buf[5];
+  size_t len;
+
+  // basic strlcpy
+  len = strlcpy(buf,"123", sizeof(buf));
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+
+  // testing bounded strlcat
+  len = strlcat(buf,"456", sizeof(buf));
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcat with size==0
+  len = strlcat(buf,"789", 0);
+  clang_analyzer_eval(len==7);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcpy with size==0
+  len = strlcpy(buf,"123",0);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+}
+
+void f9(int unknown_size, char* unknown_src, char* unknown_dst){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)==4);// expected-warning{{TRUE}}
+
+  //size is unknown
+  len = strlcat(buf,"cd", unknown_size);
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)>=4);// expected-warning{{TRUE}}
+
+  //dst is unknown
+  len = strlcpy(unknown_dst,"abbc",unknown_size);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //src is unknown
+  len = strlcpy(buf,unknown_src, sizeof(buf));
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(buf));// expected-warning{{UNKNOWN}}
+
+  //src, dst is unknown
+  len = strlcpy(unknown_dst, unknown_src, unknown_size);
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //size is unknown
+  len = strlcat(buf+2,unknown_src+1, sizeof(buf));// expected-warning{{Size argument is greater than the length of the destination buffer}};
+}
+
+void f10(){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  strlcat(buf, "efghi",9);// expected-warning{{Size argument is greater than the length of the destination buffer}}
+}
+
+void f11() {
+  //test for Bug 41729
+  char a[256], b[256];
+  strlcpy(a, "world", sizeof(a));
+  strlcpy(b, "hello ", sizeof(b));
+  strlcat(b, a, sizeof(b)); // no-warning
+}
Index: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
===
--- clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -28,6 +28,7 @@
 using namespace ento;
 
 namespace {
+enum class ConcatFnKind { none = 0, strcat = 1, strlcat = 2 };
 class CStringChecker : public Checker< eval::Call,
  check::PreStmt,
  check::LiveSymbols,
@@ -129,11 +130,8 @@
   void evalStrncpy(CheckerContext &C, const CallExpr *CE) const;
   void evalStpcpy(CheckerContext &C, const CallExpr *CE) const;
   void evalStrlcpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrcpyCommon(CheckerContext &C,
-const CallExpr *CE,
-

[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2019-10-07 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 223501.
dkrupp marked 5 inline comments as done.
dkrupp added a comment.

Thanks @aaron.ballman and @alexfh for your review.
I fixed your findings.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
  clang-tools-extra/test/clang-tidy/misc-redundant-expression.cpp

Index: clang-tools-extra/test/clang-tidy/misc-redundant-expression.cpp
===
--- clang-tools-extra/test/clang-tidy/misc-redundant-expression.cpp
+++ clang-tools-extra/test/clang-tidy/misc-redundant-expression.cpp
@@ -114,6 +114,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -122,11 +123,27 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+#define   COND_OP_THIRD_MACRO 8
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+
+  k += (y < 0) ? sizeof(I64) : sizeof(I64);
+  // CHECK-MESSAGES: :[[@LINE-1]]:30: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(TestConditional(k,y));
+  // CHECK-MESSAGES: :[[@LINE-1]]:47: warning: 'true' and 'false' expressions are equivalent
+  // No warning if the expression arguments are different.
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(Valid(k,y));
+
   return k;
 }
 #undef COND_OP_MACRO
@@ -134,7 +151,7 @@
 
 // Overloaded operators that compare two instances of a struct.
 struct MyStruct {
-  int x;  
+  int x;
   bool operator==(const MyStruct& rhs) const {return this->x == rhs.x; } // not modifing
   bool operator>=(const MyStruct& rhs) const { return this->x >= rhs.x; } // not modifing
   bool operator<=(MyStruct& rhs) const { return this->x <= rhs.x; }
Index: clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -131,6 +131,20 @@
   case Stmt::BinaryOperatorClass:
 return cast(Left)->getOpcode() ==
cast(Right)->getOpcode();
+  case Stmt::UnaryExprOrTypeTraitExprClass:
+const auto *LeftUnaryExpr =
+cast(Left);
+const auto *RightUnaryExpr =
+cast(Right);
+if (LeftUnaryExpr->isArgumentType() && RightUnaryExpr->isArgumentType())
+  return LeftUnaryExpr->getArgumentType() ==
+ RightUnaryExpr->getArgumentType();
+else if (!LeftUnaryExpr->isArgumentType() &&
+ !RightUnaryExpr->isArgumentType())
+  return areEquivalentExpr(LeftUnaryExpr->getArgumentExpr(),
+   RightUnaryExpr->getArgumentExpr());
+
+return false;
   }
 }
 
@@ -604,23 +618,62 @@
   return true;
 }
 
+static bool isSameRawIdentifierToken(const Token &T1, const Token &T2,
+const SourceManager &SM) {
+  if (T1.getKind() != T2.getKind())
+return false;
+  if (T1.isNot(tok::raw_identifier))
+return true;
+  if (T1.getLength() != T2.getLength())
+return false;
+  return StringRef(SM.getCharacterData(T1.getLocation()), T1.getLength()) ==
+ StringRef(SM.getCharacterData(T2.getLocation()), T2.getLength());
+}
+
+bool isTokAtEndOfExpr(SourceRange ExprSR, Token T, const SourceManager &SM) {
+  return SM.getExpansionLoc(ExprSR.getEnd()) == T.getLocation();
+}
+
+/// Returns true if both LhsEpxr and RhsExpr are
+/// macro expressions and they are expanded
+/// from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange

[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-08-13 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 214817.
dkrupp added a comment.

Fix comments from @NoQ


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049

Files:
  lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  test/Analysis/bsd-string.c

Index: test/Analysis/bsd-string.c
===
--- test/Analysis/bsd-string.c
+++ test/Analysis/bsd-string.c
@@ -9,6 +9,7 @@
 typedef __typeof(sizeof(int)) size_t;
 size_t strlcpy(char *dst, const char *src, size_t n);
 size_t strlcat(char *dst, const char *src, size_t n);
+size_t strlen(const char *s);
 void clang_analyzer_eval(int);
 
 void f1() {
@@ -18,9 +19,11 @@
 
 void f2() {
   char buf[5];
-  strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
-  // FIXME: This should not warn. The string is safely truncated.
-  strlcat(buf, "efgh", sizeof(buf)); // expected-warning{{Size argument is greater than the free space in the destination buffer}}
+  size_t len;
+  len = strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 4); // expected-warning{{TRUE}}
+  len = strlcat(buf, "efgh", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 8); // expected-warning{{TRUE}}
 }
 
 void f3() {
@@ -44,7 +47,88 @@
   clang_analyzer_eval(len == 7); // expected-warning{{TRUE}}
 }
 
+
 int f7() {
   char buf[8];
   return strlcpy(buf, "1234567", 0); // no-crash
 }
+
+void f8(){
+  char buf[5];
+  size_t len;
+
+  // basic strlcpy
+  len = strlcpy(buf,"123", sizeof(buf));
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+
+  // testing bounded strlcat
+  len = strlcat(buf,"456", sizeof(buf));
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcat with size==0
+  len = strlcat(buf,"789", 0);
+  clang_analyzer_eval(len==7);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcpy with size==0
+  len = strlcpy(buf,"123",0);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+}
+
+void f9(int unknown_size, char* unknown_src, char* unknown_dst){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)==4);// expected-warning{{TRUE}}
+
+  //size is unknown
+  len = strlcat(buf,"cd", unknown_size);
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)>=4);// expected-warning{{TRUE}}
+
+  //dst is unknown
+  len = strlcpy(unknown_dst,"abbc",unknown_size);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //src is unknown
+  len = strlcpy(buf,unknown_src, sizeof(buf));
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(buf));// expected-warning{{UNKNOWN}}
+
+  //src, dst is unknown
+  len = strlcpy(unknown_dst, unknown_src, unknown_size);
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //size is unknown
+  len = strlcat(buf+2,unknown_src+1, sizeof(buf));// expected-warning{{Size argument is greater than the length of the destination buffer}};
+}
+
+void f10(){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  strlcat(buf, "efghi",9);// expected-warning{{Size argument is greater than the length of the destination buffer}}
+}
+
+void f11() {
+  //test for Bug 41729
+  char a[256], b[256];
+  strlcpy(a, "world", sizeof(a));
+  strlcpy(b, "hello ", sizeof(b));
+  strlcat(b, a, sizeof(b)); // no-warning
+}
Index: lib/StaticAnalyzer/Checkers/CStringChecker.cpp
===
--- lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -28,6 +28,7 @@
 using namespace ento;
 
 namespace {
+enum class appendKind { none = 0, strcat = 1, strlcat = 2 };
 class CStringChecker : public Checker< eval::Call,
  check::PreStmt,
  check::LiveSymbols,
@@ -129,11 +130,8 @@
   void evalStrncpy(CheckerContext &C, const CallExpr *CE) const;
   void evalStpcpy(CheckerContext &C, const CallExpr *CE) const;
   void evalStrlcpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrcpyCommon(CheckerContext &C,
-const CallExpr *CE,
-bool returnEnd,
-bool isBounded,
-boo

[PATCH] D66049: [analyzer] PR41729: Fix some false positives and improve strlcat and strlcpy modeling

2019-08-13 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked 3 inline comments as done.
dkrupp added a comment.

Thanks for the comments @NoQ , all of them addressed.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66049/new/

https://reviews.llvm.org/D66049



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2019-08-10 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 214529.
dkrupp added a comment.

@aaron.ballman 's comments are fixed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tidy/misc/RedundantExpressionCheck.cpp
  test/clang-tidy/misc-redundant-expression.cpp

Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -114,6 +114,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -122,16 +123,33 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+#define   COND_OP_THIRD_MACRO 8
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+
+  k += (y < 0) ? sizeof(I64) : sizeof(I64);
+  // CHECK-MESSAGES: :[[@LINE-1]]:30: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(TestConditional(k,y));
+  // CHECK-MESSAGES: :[[@LINE-1]]:47: warning: 'true' and 'false' expressions are equivalent
+  // No warning if the expression arguments are different.
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(Valid(k,y));
+
   return k;
 }
 #undef COND_OP_MACRO
 #undef COND_OP_OTHER_MACRO
 
+
 // Overloaded operators that compare two instances of a struct.
 struct MyStruct {
   int x;  
Index: clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -131,6 +131,20 @@
   case Stmt::BinaryOperatorClass:
 return cast(Left)->getOpcode() ==
cast(Right)->getOpcode();
+  case Stmt::UnaryExprOrTypeTraitExprClass:
+const auto *LeftUnaryExpr =
+cast(Left);
+const auto *RightUnaryExpr =
+cast(Right);
+if (LeftUnaryExpr->isArgumentType() && RightUnaryExpr->isArgumentType())
+  return LeftUnaryExpr->getArgumentType() ==
+ RightUnaryExpr->getArgumentType();
+else if (!LeftUnaryExpr->isArgumentType() &&
+ !RightUnaryExpr->isArgumentType())
+  return areEquivalentExpr(LeftUnaryExpr->getArgumentExpr(),
+   RightUnaryExpr->getArgumentExpr());
+
+return false;
   }
 }
 
@@ -604,23 +618,61 @@
   return true;
 }
 
+static bool isSameToken(const Token &T1, const Token &T2,
+const SourceManager &SM) {
+  if (T1.getKind() != T2.getKind())
+return false;
+  if (T1.isNot(tok::raw_identifier))
+return true;
+  if (T1.getLength() != T2.getLength())
+return false;
+  return strncmp(SM.getCharacterData(T1.getLocation()),
+ SM.getCharacterData(T2.getLocation()), T1.getLength()) == 0;
+}
+
+bool isTokAtEndOfExpr(SourceRange ExprSR, Token T, const SourceManager &SM) {
+  return SM.getExpansionLoc(ExprSR.getEnd()) == T.getLocation();
+}
+
+/// Returns true if both LhsEpxr and RhsExpr are
+/// macro expressions and they are expanded
+/// from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange();
+  if (!Lsr.getBegin().isMacroID() || !Rsr.getBegin().isMacroID())
 return false;
 
   const SourceManager &SM = AstCtx->getSourceManager();
   const LangOptions &LO = AstCtx->getLangOpts();
 
-  return !(Lexer::getImmediateMacroName(LhsLoc, SM, LO) ==
-  Lexer::getImmediateMacroName(RhsLoc, SM, LO));
+  std::pair LsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Lsr.getBegin()));
+  std::pair RsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Rsr.getBegin()));
+  

[PATCH] D66049: Fixes Bug 41729 and improves strlcat and strlcpy modeling

2019-08-10 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added reviewers: NoQ, Szelethus, gamesh411.
Herald added subscribers: cfe-commits, rnkovacs.
Herald added a project: clang.

Fixes Bug 41729 (https://bugs.llvm.org/show_bug.cgi?id=41729)
 and the following errors:

  -Fixes false positive reports of strlcat
  -The return value of strlcat and strlcpy is now correctly calculated
  -The resulting string length of strlcat and strlcpy is now correctly 
calculated


Repository:
  rC Clang

https://reviews.llvm.org/D66049

Files:
  lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  test/Analysis/bsd-string.c
  test/Analysis/cstring-syntax.c

Index: test/Analysis/cstring-syntax.c
===
--- test/Analysis/cstring-syntax.c
+++ test/Analysis/cstring-syntax.c
@@ -53,4 +53,10 @@
   strlcpy(dest, src, 5);
   strlcat(dest + 5, src, badlen); // expected-warning {{The third argument allows to potentially copy more bytes than it should. Replace with the value sizeof() or lower}}
   strlcat(dest, "aaa", 10); // no-warning
+
+  //test for Bug 41729
+  char a[256], b[256];
+  strlcpy(a, "world", sizeof(a));
+  strlcpy(b, "hello ", sizeof(b));
+  strlcat(b, a, sizeof(b)); // no-warning
 }
Index: test/Analysis/bsd-string.c
===
--- test/Analysis/bsd-string.c
+++ test/Analysis/bsd-string.c
@@ -9,6 +9,7 @@
 typedef __typeof(sizeof(int)) size_t;
 size_t strlcpy(char *dst, const char *src, size_t n);
 size_t strlcat(char *dst, const char *src, size_t n);
+size_t strlen(const char *s);
 void clang_analyzer_eval(int);
 
 void f1() {
@@ -18,9 +19,11 @@
 
 void f2() {
   char buf[5];
-  strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
-  // FIXME: This should not warn. The string is safely truncated.
-  strlcat(buf, "efgh", sizeof(buf)); // expected-warning{{Size argument is greater than the free space in the destination buffer}}
+  size_t len;
+  len = strlcpy(buf, "abcd", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 4); // expected-warning{{TRUE}}
+  len = strlcat(buf, "efgh", sizeof(buf)); // expected-no-warning
+  clang_analyzer_eval(len == 8); // expected-warning{{TRUE}}
 }
 
 void f3() {
@@ -44,7 +47,80 @@
   clang_analyzer_eval(len == 7); // expected-warning{{TRUE}}
 }
 
+
 int f7() {
   char buf[8];
   return strlcpy(buf, "1234567", 0); // no-crash
 }
+
+void f8(){
+  char buf[5];
+  size_t len;
+
+  // basic strlcpy
+  len = strlcpy(buf,"123", sizeof(buf));
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+
+  // testing bounded strlcat
+  len = strlcat(buf,"456", sizeof(buf));
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcat with size==0
+  len = strlcat(buf,"789", 0);
+  clang_analyzer_eval(len==7);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+  // testing strlcpy with size==0
+  len = strlcpy(buf,"123",0);
+  clang_analyzer_eval(len==3);// expected-warning{{TRUE}}
+  len = strlen(buf);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+
+}
+
+void f9(int unknown_size, char* unknown_src, char* unknown_dst){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)==4);// expected-warning{{TRUE}}
+
+  //size is unknown
+  len = strlcat(buf,"cd", unknown_size);
+  clang_analyzer_eval(len==6);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(buf)>=4);// expected-warning{{TRUE}}
+
+  //dst is unknown
+  len = strlcpy(unknown_dst,"abbc",unknown_size);
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //src is unknown
+  len = strlcpy(buf,unknown_src, sizeof(buf));
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(buf));// expected-warning{{UNKNOWN}}
+
+  //src, dst is unknown
+  len = strlcpy(unknown_dst, unknown_src, unknown_size);
+  clang_analyzer_eval(len);// expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(strlen(unknown_dst));// expected-warning{{UNKNOWN}}
+
+  //size is unknown
+  len = strlcat(buf+2,unknown_src+1, sizeof(buf));// expected-warning{{Size argument is greater than the length of the destination buffer}};
+}
+
+void f10(){
+  char buf[8];
+  size_t len;
+
+  len = strlcpy(buf,"abba",sizeof(buf));
+  clang_analyzer_eval(len==4);// expected-warning{{TRUE}}
+  strlcat(buf, "efghi",9);// expected-warning{{Size argument is greater than the length of the destination buffer}}
+}
Index: lib/StaticAnalyzer/Checkers/CStringChecker.cpp
===
--- lib/StaticAnalyzer/Checkers/CStringChecker.cpp

[PATCH] D64801: [analyzer] Add CTU user docs

2019-07-17 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp requested changes to this revision.
dkrupp added a comment.
This revision now requires changes to proceed.

Thanks Gabor for writing this. 
I suggested some minor changes to the txt. Otherwise LGTM.




Comment at: clang/docs/analyzer/user-docs/CrossTranslationUnit.rst:98
+
+This manual procedure is boring and error-prone, so sooner or later we'd like 
to have a script which automates this for us.
+

since CodeChecker automation is already available I suggest rephrasing.

This manual procedure is error-prone and not scalable, so for analyzing real 
projects it is recommended to use the built-in support of  CodeChecker or 
scan-build-py.



Comment at: clang/docs/analyzer/user-docs/CrossTranslationUnit.rst:142
+The `plist` files contain the results of the analysis, which may be viewed 
with the regular analysis tools.
+E.g. one may use `CodeChecker server` and `CodeChecker store` to store and 
view the results in a web browser.
+

I suggest to replace this line with the following:
E.g. one may use CodeChecker parse to view the results in command line or 
CodeChecker parse -e html to export them into HTML format.

pure command line usage without worrying about the server setup better fits 
this command line use-case.



Comment at: clang/docs/analyzer/user-docs/CrossTranslationUnit.rst:147
+We actively develop CTU with CodeChecker as a "runner" script, `scan-build` is 
not actively developed for CTU.
+`scan-build` has different errors and issues, expect it to work with the very 
basic projects only.

Some basic usage description could be added still...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64801/new/

https://reviews.llvm.org/D64801



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64494: [analyzer]Add user docs rst

2019-07-10 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp accepted this revision.
dkrupp added a comment.
This revision is now accepted and ready to land.

I guess this is a placeholder for the subpages of "User Manual" @ 
https://clang-analyzer.llvm.org, which will be ported in follow-up patches.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64494/new/

https://reviews.llvm.org/D64494



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D57858: [analyzer] Add a new frontend flag to display all checker options

2019-05-14 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D57858#1500635 , @NoQ wrote:

> In D57858#146 , @dkrupp wrote:
>
> > Some alpha checkers are considerably more mature than others and are quite 
> > usable. In our experience, there are some users who are keen to run these 
> > checkers on their code and report back any false positives to us. So in 
> > this sense these are not "developer only" checkers. So I think we should 
> > let the users list them, read their descriptions and try them out. Some of 
> > them will come back with useful feedback as to how to improve them further.
>
>
> What are such checkers currently? Like, the ones that aren't clearly "missing 
> limbs" and that have somebody happy to //address// feedback sent against them?
>
> Do you have a chance to call out to your users for testing the checker and 
> actively request feedback, as @Szelethus did on the mailing list?
>
> I feel that we could do some sort of "early access checkers" programme, but i 
> believe this would require a more careful PR than just dumping a list of 
> alpha checkers on our users' heads.
>
> In D57858#146 , @dkrupp wrote:
>
> > Some users would not care if the checker gives some more false positives 
> > than the "mature" checkers if they can catch some true positives with them.
>
>
> Yeah, and these are pretty much the users we're trying to protect from 
> themselves :)


These are the alpha checkers that we are testing in Ericsson:
 alpha.core.BoolAssignment
 alpha.core.CastSize
 alpha.core.Conversion
 alpha.core.DynamicTypeChecker
 alpha.core.SizeofPtr
 alpha.core.TestAfterDivZero
 alpha.cplusplus.DeleteWithNonVirtualDtor
 alpha.cplusplus.MisusedMovedObject
 alpha.cplusplus.UninitializedObject
 alpha.security.MallocOverflow
 alpha.security.MmapWriteExec
 alpha.security.ReturnPtrRange
 alpha.security.taint.TaintPropagation
 alpha.unix.BlockInCriticalSection
 alpha.unix.Chroot
 alpha.unix.PthreadLock
 alpha.unix.SimpleStream
 alpha.unix.Stream
 alpha.unix.cstring.NotNullTerminated
 alpha.unix.cstring.OutOfBounds

This 2 have just been moved out of alpha lately:
 alpha.cplusplus.MisusedMovedObject
 alpha.cplusplus.UninitializedObject

According to our tests these checkers do not crash and do not give a large 
number of reports (<~50)  even on large code base.
So we can check for false positives in them one by one. 
Probably these are the closest to come out from alpha. 
We could maybe try to test these checkers one-by-one on large open source code 
bases and move them out from alpha when we are confident enough.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57858/new/

https://reviews.llvm.org/D57858



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D57858: [analyzer] Add a new frontend flag to display all checker options

2019-05-13 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D57858#1499414 , @Szelethus wrote:

> In D57858#1498640 , @NoQ wrote:
>
> > So, like, the global picture is as follows. In our case the Driver (i.e., 
> > --analyze) is not much more user facing than frontend flags. It's still 
> > fairly unusable directly, as our Static Analyzer is generally not a 
> > command-line tool. The real user-facing stuff is the GUIs such as 
> > scan-build or CodeChecker. These GUIs decide themselves on what options 
> > they want to expose. For instance, you have a right to decide that 
> > CodeChecker shouldn't support the aggressive mode of the move-checker and 
> > don't expose it as an option in your GUI. In this sense it's not really 
> > useful to provide a centralized `-help` of all user-facing options.
> >
> > But it sounds as if you want to change this situation and provide a single 
> > source of truth on what are the user-facing options. Particular GUIs can 
> > still ignore them, but you don't want to hardcode flags in CodeChecker, but 
> > instead you want to rely on clang to provide the list of supported options 
> > for you and, as a side effect, for scan-build users (if you also add a 
> > scan-build help flag). I'm totally in favor of crystallizing such list of 
> > user-facing flags, and this patch sounds like a good step in that 
> > direction, assuming that non-user-facing options are hidden.
>
>
> That describes my intention quite well :)
>
> > I'm still in favor of hiding alpha checkers (as they are for development 
> > only, much like debug flags; i'd recommend hiding them in the CodeChecker 
> > UI as well)
> > 
> > Now, why do we care about frontend/driver flags when they're unusable by 
> > definition? That's because we have a mental trauma after seeing a few 
> > powerusers actively explore those flags, observe that they don't work, and 
> > then tell everybody that the Analyzer is broken. So there's a threshold, 
> > based on a tiny but painful bit of practical experience, that says that 
> > documentation of developer-only features on llvm.org or in code comments is 
> > fine, but documentation printed by the released binary itself is not fine.
>
> What you say sounds very reasonable. Still, I am kind of hesitant about 
> hiding all alpha checkers: I initially intended to hide only are 
> developer-only checkers (modeling, debug). I guess if we define alpha 
> checkers (as you stated numerous times) as incomplete, under development, are 
> missing half their limbs and crash if you look at them the wrong way, sure, 
> they belong in the developer-only category. But checkers such as mine 
> (UninitializedObjectChecker), for the longest time were very stable, have 
> been enabled by default for our internal projects, despite only recently 
> moving out of alpha.
>
> Then agaaain, if we're that stubborn about alpha checkers, we could might as 
> well dig them out of `-analyzer-checker-help-hidden`, and leave the rest 
> there. Untangling what alpha checkers depend on one another could be solved 
> by making yet another frontend flag that would display checker dependencies, 
> which would be super easy since D54438 , or 
> create an `-analyzer-checker-help-alpha` flag that would display alpha, but 
> not developer-only checkers. @dkrupp @o.gyorgy Do you have any feelings on 
> this?
>
> > and we should probably automatically hide options of checker that are 
> > hidden.
>
> Checker options are a different kind of animal entirely. I think we should 
> definitely let the user fine-tune some modeling checkers, for instance, 
> `unix.DynamicMemoryModeling:Optimistic`, despite us not really wanting to let 
>  anyone (even developers really) mess around whether 
> `unix.DynamicMemoryModeling` should be enabled. While that specific option 
> is, to put it nicely, a little esoteric, making some decisions the analyzer 
> makes less conservative, or limiting state splits to help performance may be 
> desirable in the future.
>
> Let's move the rest of the discussion directly related to hiding checker 
> options to D61839 !


Yes, it would be great if the clang static analyzer would be the ultimate 
source of information with respect to the checkers and checker options. Then we 
would not need to split this info between the front-end (scanbuild, 
codechecker) and the analyzer.

Some alpha checkers are considerably more mature than others and are quite 
usable. In our experience, there are some users who are keen to run these 
checkers on their code and report back any false positives to us. So in this 
sense these are not "developer only" checkers. So I think we should let the 
users list them, read their descriptions and try them out. Some of them will 
come back with useful feedback as to how to improve them further. Some users 
would not care if the checker gives some more false positives 

[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2019-05-03 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 198041.
dkrupp marked 6 inline comments as done.
dkrupp added a comment.

I have fixed all your comments and rebased the patch to the latest master.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tidy/misc/RedundantExpressionCheck.cpp
  test/clang-tidy/misc-redundant-expression.cpp

Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -106,6 +106,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -114,16 +115,33 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+#define   COND_OP_THIRD_MACRO 8
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+
+  k += (y < 0) ? sizeof(I64) : sizeof(I64);
+  // CHECK-MESSAGES: :[[@LINE-1]]:30: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(TestConditional(k,y));
+  // CHECK-MESSAGES: :[[@LINE-1]]:47: warning: 'true' and 'false' expressions are equivalent
+  // No warning if the expression arguments are different.
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(Valid(k,y));
+
   return k;
 }
 #undef COND_OP_MACRO
 #undef COND_OP_OTHER_MACRO
 
+
 // Overloaded operators that compare two instances of a struct.
 struct MyStruct {
   int x;  
Index: clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -131,6 +131,20 @@
   case Stmt::BinaryOperatorClass:
 return cast(Left)->getOpcode() ==
cast(Right)->getOpcode();
+  case Stmt::UnaryExprOrTypeTraitExprClass:
+const UnaryExprOrTypeTraitExpr *LeftUnaryExpr =
+cast(Left);
+const UnaryExprOrTypeTraitExpr *RightUnaryExpr =
+cast(Right);
+if (LeftUnaryExpr->isArgumentType() && RightUnaryExpr->isArgumentType())
+  return LeftUnaryExpr->getArgumentType() ==
+ RightUnaryExpr->getArgumentType();
+else if (!LeftUnaryExpr->isArgumentType() &&
+ !RightUnaryExpr->isArgumentType())
+  return areEquivalentExpr(LeftUnaryExpr->getArgumentExpr(),
+   RightUnaryExpr->getArgumentExpr());
+
+return false;
   }
 }
 
@@ -513,7 +527,8 @@
 Value = APSInt(32, false);
   } else if (const auto *OverloadedOperatorExpr =
  Result.Nodes.getNodeAs(OverloadId)) {
-const auto *OverloadedFunctionDecl = dyn_cast_or_null(OverloadedOperatorExpr->getCalleeDecl());
+const auto *OverloadedFunctionDecl =
+dyn_cast_or_null(OverloadedOperatorExpr->getCalleeDecl());
 if (!OverloadedFunctionDecl)
   return false;
 
@@ -529,7 +544,8 @@
 
 Symbol = OverloadedOperatorExpr->getArg(0);
 OperandExpr = OverloadedOperatorExpr;
-Opcode = BinaryOperator::getOverloadedOpcode(OverloadedOperatorExpr->getOperator());
+Opcode = BinaryOperator::getOverloadedOpcode(
+OverloadedOperatorExpr->getOperator());
 
 return BinaryOperator::isComparisonOp(Opcode);
   } else {
@@ -547,7 +563,8 @@
 }
 
 // Checks for expressions like (X == 4) && (Y != 9)
-static bool areSidesBinaryConstExpressions(const BinaryOperator *&BinOp, const ASTContext *AstCtx) {
+static bool areSidesBinaryConstExpressions(const BinaryOperator *&BinOp,
+   const ASTContext *AstCtx) {
   const auto *LhsBinOp = dyn_cast(BinOp->getLHS());
   const auto *RhsBinOp = dyn_cast(BinOp->getRHS());
 
@@ -597,23 +614,60 @@
   return true;
 }
 
+static bool isSameToken(const Token &T1, const Token &T2,
+const SourceManager &SM) {
+  if (T1.getKind() != T2.getKind())
+return false;
+  if (T1.isNot(tok::raw_identifier))
+return true;
+  if (T1.getLength() != T2.getLength())
+return false;
+  return strncmp(SM.getCharacterData(T1.getLocation())

[PATCH] D60281: [analyzer] Add docs for cplusplus.InnerPointer

2019-04-08 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: docs/analyzer/checkers.rst:225-226
+``std::string``s, by recognizing member functions that may re/deallocate the 
buffer
+before use. In the future, it would be great to add support for other STL and
+non-STL containers, and most notably, ``std::string_view``s.
+

Szelethus wrote:
> Hmm. While this page is a documentation, I would still expect regular users 
> to browse through it -- are we sure that we want to add future plans for a 
> non-alpha checker? I'm not against it, just a question.
I think it is a good idea. A non-alpha checker can also be further developed, 
by anyone else. It is good that we don't forget about further features. This 
note also highlights the limitations of the checker.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60281/new/

https://reviews.llvm.org/D60281



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D57858: [analyzer] Add a new frontend flag to display all checker options

2019-03-26 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@dcoughlin I don't necessarily agree with you.
Let me explain why we think this feature is important.

We should give the users the possibility to list all possibly configurable 
checker options and their meaning.

Many of these options should be possible to be set by the end user to be able 
to fine tune the checker behaviour to match the analyzed code style.
Such examples are:
alpha.clone.CloneChecker:MinimumCloneComplexity
alpha.cplusplus.UninitializedObject:CheckPointeeInitialization
alpha.clone.CloneChecker:ReportNormalClones
alpha.cplusplus.UninitializedObject:IgnoreRecordsWithField
etc.
Actually except for the debug checker options and  
unix.DynamicMemoryModeling:Optimistic all these options are meaningful end-user 
options.

Of course there are the debug checker options which are not interesting for the 
end user, but debug checkers should anyway be ignored by the end-users.

We always wanted to add checker option listing to CodeChecker, however did not 
want to duplicate the option list and option documentation in the CodeChecker 
codebase.
It belongs to the analyzer and actually to the checker implementation.
So from CodeChecker we would like to invoke the "clang -cc1 
-analyzer-checker-option-help" to be able to list these options to the end 
users.

The same feature is available already in clang-tidy: clang-tidy -dump-config

I think it is the responsibility of the end-user to decide what option he may 
want to configure.

I understand you would like to differentiate between "developer" and "end-user" 
options.
What we could do maybe is to indicate in the option explanation that the given 
option is "analyzer-internal", "experimental" or "developer only".

What do you think about that?

In D57858#1432714 , @dcoughlin wrote:

> I'm pretty worried about exposing this flag to end users.
>
> - Almost none of the options you've listed are user facing. Many represent 
> options intended for use by static analyzer developers: debugging options, 
> feature flags, and checkers that were never finished. Others represent 
> mechanisms for build systems to control the behavior of the analyzer. Even 
> these are not meant for end users to interact with but rather for 
> implementers of build systems and IDEs. I don't think end users should have 
> to understand these options to use the analyzer.
> - The help text refers to analyzer implementation details (such as 
> "SymRegion") that users won't have the context or knowledge to understand.
> - The help text also recommends invoking -cc1 directly or through the driver 
> with -Xclang. Neither of these are supported end-user interfaces to the 
> analyzer. Instead, users should use scan-build or another tool (such as 
> CodeChecker) that was designed to be used by humans.





CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57858/new/

https://reviews.llvm.org/D57858



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-21 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked 13 inline comments as done.
dkrupp added a comment.

Thanks for your comments. I fixed them all. I also added the handling of 
redundant sizeof() and alignof() operators on the way. Please check if OK now...




Comment at: clang-tidy/misc/RedundantExpressionCheck.cpp:602
+static bool compareToks(Token &T1, Token &T2, const SourceManager &SM) {
+  if (T1.getLength() != T2.getLength())
+return false;

Szelethus wrote:
> alexfh wrote:
> > `Token::getLength()` will assert-fail for annotation tokens.
> I guess if `T1.isNot(tok::raw_identifier)` (or `T2`) we could get away with 
> simply comparing token kinds. If they are, it wouldn't assert. :)
I changed to code as per the suggestion from Szelethus.
So we will compare only raw_identifiers char-by-char, the rest by kind only. 
According to my tests, macros and types, typedefs for example are such 
raw_identifiers. 




Comment at: clang-tidy/misc/RedundantExpressionCheck.cpp:605
+  return strncmp(SM.getCharacterData(T1.getLocation()),
+ SM.getCharacterData(T2.getLocation()), T1.getLength()) == 0;
+}

alexfh wrote:
> JonasToth wrote:
> > This operation could overflow if `len(T1) > len(T2)` and `T2` is the last 
> > token of the file. I think `std::min(T1.getLength(), T2.getLength())` would 
> > be better.
> This code is only executed when they have the same length.
exactly, overflow should not happen.



Comment at: clang-tidy/misc/RedundantExpressionCheck.cpp:638
+  SourceLocation LLoc, RLoc;
+  int LRem, RRem;
+  do {//we compare the expressions token-by-token

Szelethus wrote:
> It seems like we're only checking whether these values are `0` or not, maybe 
> make them `bool`? Also, I find `Rem` a little cryptic, what is it referring 
> to? Maybe `RemainingCharCount`?
> 
> Maybe we should just make a simple function, so we could both boost 
> readability, and get rid of some these variables (`LLoc`, `RLoc`) entirely:
> 
> ```
> unsinged getCharCountUntilEndOfExpr(SourceRange ExprSR, Token T,
> const SourceManager &SM) {
> 
>   assert((ExprSR.getBegin() > T.getLocation() ||
>   ExprSR.getEnd() < T.getLocation()) &&
>  "Token is not within the expression range!");
> 
>   return SM.getCharacterData(SM.getExpansionLoc(ExprSR.getEnd())) -
>  SM.getCharacterData(T.getLocation());
> }
> ```
> 
> 
good point. Refactored ass suggested.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-21 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 179277.
dkrupp marked an inline comment as done.
dkrupp added a comment.

All comments fixed.

I also added the handling of redundancy comparison of sizeof(..), alignof() 
operators.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tidy/misc/RedundantExpressionCheck.cpp
  test/clang-tidy/misc-redundant-expression.cpp

Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -106,6 +106,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -114,16 +115,33 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+#define   COND_OP_THIRD_MACRO 8
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+
+  k += (y < 0) ? sizeof(I64) : sizeof(I64);
+  // CHECK-MESSAGES: :[[@LINE-1]]:30: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(TestConditional(k,y));
+  // CHECK-MESSAGES: :[[@LINE-1]]:47: warning: 'true' and 'false' expressions are equivalent
+  // No warning if the expression arguments are different.
+  k += (y < 0) ? sizeof(TestConditional(k,y)) : sizeof(Valid(k,y));
+
   return k;
 }
 #undef COND_OP_MACRO
 #undef COND_OP_OTHER_MACRO
 
+
 // Overloaded operators that compare two instances of a struct.
 struct MyStruct {
   int x;  
Index: clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -132,6 +132,18 @@
   case Stmt::BinaryOperatorClass:
 return cast(Left)->getOpcode() ==
cast(Right)->getOpcode();
+  case Stmt::UnaryExprOrTypeTraitExprClass:
+if (cast(Left)->isArgumentType() &&
+cast(Right)->isArgumentType())
+  return cast(Left)->getArgumentType() ==
+ cast(Right)->getArgumentType();
+else if (!cast(Left)->isArgumentType() &&
+ !cast(Right)->isArgumentType())
+  return areEquivalentExpr(
+  cast(Left)->getArgumentExpr(),
+  cast(Right)->getArgumentExpr());
+
+return false;
   }
 }
 
@@ -598,23 +610,66 @@
   return true;
 }
 
+static bool isSameToken(Token &T1, Token &T2, const SourceManager &SM) {
+  if (T1.getKind()!=T2.getKind())
+return false;
+  else if (T1.isNot(tok::raw_identifier))
+return true;
+  if (T1.getLength() != T2.getLength())
+return false;
+  return strncmp(SM.getCharacterData(T1.getLocation()),
+ SM.getCharacterData(T2.getLocation()), T1.getLength()) == 0;
+}
+
+unsigned getCharCountUntilEndOfExpr(SourceRange ExprSR, Token T,
+const SourceManager &SM) {
+  assert((T.getLocation() < ExprSR.getBegin()  ||
+  ExprSR.getEnd() < T.getLocation()) &&
+ "Token is not within the expression range!");
+
+  return SM.getCharacterData(SM.getExpansionLoc(ExprSR.getEnd())) -
+ SM.getCharacterData(T.getLocation());
+}
+
+/// Returns true if both LhsEpxr and RhsExpr are
+/// macro expressions and they are expanded
+/// from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange();
+  if (!Lsr.getBegin().isMacroID() || !Rsr.getBegin().isMacroID())
 return false;
-
   const SourceManager &SM = AstCtx->getSourceManager();
   const LangOptions &LO = AstCtx->getLangOpts();
 
-  return !(Lexer::getImmediateMacroName(LhsLoc, SM, LO) ==
-  Lexer::getImmediateMacroNa

[PATCH] D54429: [analyzer] Creating standard Sphinx documentation

2018-12-10 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

@dcoughlin @NoQ ping...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54429/new/

https://reviews.llvm.org/D54429



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55255: Fix a false positive in misplaced-widening-cast

2018-12-05 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D55255#1319784 , @JonasToth wrote:

> Committed, Thank you for the patch! Was there a bug-report for this issue? If 
> yes can you please close it/reference?


There was not bug report for this. Thanks for committing!


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55255/new/

https://reviews.llvm.org/D55255



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55255: Fix a false positive in misplaced-widening-cast

2018-12-04 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 176604.
dkrupp added a comment.

Comments addressed. Please commit if looks good, I don't have commit rights.
Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55255/new/

https://reviews.llvm.org/D55255

Files:
  clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
  test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp


Index: test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
===
--- test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
+++ test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
@@ -62,3 +62,21 @@
   enum Type {};
   static char *m_fn1() { char p = (Type)(&p - m_fn1()); }
 };
+
+enum DaysEnum {
+  MON,
+  TUE,
+  WED,
+  THR,
+  FRI,
+  SAT,
+  SUN
+};
+
+// Do not warn for int to enum casts.
+void nextDay(DaysEnum day) {
+  if (day < SUN)
+day = (DaysEnum)(day + 1);
+  if (day < SUN)
+day = static_cast(day + 1);
+}
Index: clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
===
--- clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
+++ clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
@@ -213,8 +213,9 @@
 dyn_cast(CastType->getUnqualifiedDesugaredType());
 const auto *CalcBuiltinType =
 dyn_cast(CalcType->getUnqualifiedDesugaredType());
-if (CastBuiltinType && CalcBuiltinType &&
-!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
+if (!CastBuiltinType || !CalcBuiltinType)
+  return;
+if (!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
   return;
   }
 


Index: test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
===
--- test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
+++ test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
@@ -62,3 +62,21 @@
   enum Type {};
   static char *m_fn1() { char p = (Type)(&p - m_fn1()); }
 };
+
+enum DaysEnum {
+  MON,
+  TUE,
+  WED,
+  THR,
+  FRI,
+  SAT,
+  SUN
+};
+
+// Do not warn for int to enum casts.
+void nextDay(DaysEnum day) {
+  if (day < SUN)
+day = (DaysEnum)(day + 1);
+  if (day < SUN)
+day = static_cast(day + 1);
+}
Index: clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
===
--- clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
+++ clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
@@ -213,8 +213,9 @@
 dyn_cast(CastType->getUnqualifiedDesugaredType());
 const auto *CalcBuiltinType =
 dyn_cast(CalcType->getUnqualifiedDesugaredType());
-if (CastBuiltinType && CalcBuiltinType &&
-!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
+if (!CastBuiltinType || !CalcBuiltinType)
+  return;
+if (!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
   return;
   }
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55255: Fix a false positive in misplaced-widening-cast

2018-12-04 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added reviewers: JonasToth, alexfh.
dkrupp added a project: clang-tools-extra.
Herald added subscribers: cfe-commits, gamesh411, Szelethus, rnkovacs.

bugprone-misplaced-widening-cast check
used to give a false warning to the
following example.

  

enum DaysEnum{

  MON = 0,
  TUE = 1
  };


day = (DaysEnum)(day + 1);
 //warning: either cast from 'int' to 'DaysEnum' is ineffective...

But i think int to enum cast is not widening neither ineffective.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D55255

Files:
  clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
  test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp


Index: test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
===
--- test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
+++ test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
@@ -62,3 +62,29 @@
   enum Type {};
   static char *m_fn1() { char p = (Type)(&p - m_fn1()); }
 };
+
+enum DaysEnum{
+MON = 0,
+TUE = 1,
+WED = 2,
+THR = 3,
+FRI = 4,
+SAT = 5,
+SUN = 6
+};
+
+//do not warn for int to enum casts
+DaysEnum nextDay(DaysEnum day){
+  if (day < SUN)
+  day = (DaysEnum)(day + 1);
+  return day;
+}
+
+//do not warn for int to enum casts
+DaysEnum nextDay_cpp(DaysEnum day){
+  if (day < SUN)
+  day = static_cast(day + 1);
+  return day;
+}
+
+
Index: clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
===
--- clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
+++ clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
@@ -213,8 +213,9 @@
 dyn_cast(CastType->getUnqualifiedDesugaredType());
 const auto *CalcBuiltinType =
 dyn_cast(CalcType->getUnqualifiedDesugaredType());
-if (CastBuiltinType && CalcBuiltinType &&
-!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
+if (!CastBuiltinType || !CalcBuiltinType)
+  return;
+if (!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
   return;
   }
 


Index: test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
===
--- test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
+++ test/clang-tidy/bugprone-misplaced-widening-cast-explicit-only.cpp
@@ -62,3 +62,29 @@
   enum Type {};
   static char *m_fn1() { char p = (Type)(&p - m_fn1()); }
 };
+
+enum DaysEnum{
+MON = 0,
+TUE = 1,
+WED = 2,
+THR = 3,
+FRI = 4,
+SAT = 5,
+SUN = 6
+};
+
+//do not warn for int to enum casts
+DaysEnum nextDay(DaysEnum day){
+  if (day < SUN)
+  day = (DaysEnum)(day + 1);
+  return day;
+}
+
+//do not warn for int to enum casts
+DaysEnum nextDay_cpp(DaysEnum day){
+  if (day < SUN)
+  day = static_cast(day + 1);
+  return day;
+}
+
+
Index: clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
===
--- clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
+++ clang-tidy/bugprone/MisplacedWideningCastCheck.cpp
@@ -213,8 +213,9 @@
 dyn_cast(CastType->getUnqualifiedDesugaredType());
 const auto *CalcBuiltinType =
 dyn_cast(CalcType->getUnqualifiedDesugaredType());
-if (CastBuiltinType && CalcBuiltinType &&
-!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
+if (!CastBuiltinType || !CalcBuiltinType)
+  return;
+if (!isFirstWider(CastBuiltinType->getKind(), CalcBuiltinType->getKind()))
   return;
   }
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-03 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked 2 inline comments as done.
dkrupp added inline comments.



Comment at: test/clang-tidy/misc-redundant-expression.cpp:109
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {

JonasToth wrote:
> dkrupp wrote:
> > JonasToth wrote:
> > > Could you please add a test where the macro is `undef`ed and redefined to 
> > > something else?
> > I am not sure what you exactly suggest here. It should not matter what 
> > COND_OP_THIRD_MACRO is defined to as we lexically compare tokens as they 
> > are written in the original source code.
> > 
> > Could you be a bit more specific what test case you would like to add? 
> *should* not matter is my point, please show that :)
undef/define testcase added and passes. @JonasToth is this what you thought of?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-03 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 176372.
dkrupp added a comment.

new undef/defined testcase added


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tidy/misc/RedundantExpressionCheck.cpp
  test/clang-tidy/misc-redundant-expression.cpp

Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -106,6 +106,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -114,16 +115,26 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+#define   COND_OP_THIRD_MACRO 8
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
+#undef COND_OP_THIRD_MACRO
+
   return k;
 }
 #undef COND_OP_MACRO
 #undef COND_OP_OTHER_MACRO
 
+
 // Overloaded operators that compare two instances of a struct.
 struct MyStruct {
   int x;  
Index: clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -598,23 +598,56 @@
   return true;
 }
 
+static bool compareToks(Token &T1, Token &T2, const SourceManager &SM) {
+  if (T1.getLength() != T2.getLength())
+return false;
+  return strncmp(SM.getCharacterData(T1.getLocation()),
+ SM.getCharacterData(T2.getLocation()), T1.getLength()) == 0;
+}
+
+//Returns true if both LhsEpxr and RhsExpr are
+//macro expressions and they are expanded
+//from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange();
+  if (!Lsr.getBegin().isMacroID() || !Rsr.getBegin().isMacroID())
 return false;
-
   const SourceManager &SM = AstCtx->getSourceManager();
   const LangOptions &LO = AstCtx->getLangOpts();
 
-  return !(Lexer::getImmediateMacroName(LhsLoc, SM, LO) ==
-  Lexer::getImmediateMacroName(RhsLoc, SM, LO));
+  std::pair LsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Lsr.getBegin()));
+  std::pair RsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Rsr.getBegin()));
+  const llvm::MemoryBuffer *MB = SM.getBuffer(LsrLocInfo.first);
+
+  const char *LTokenPos = MB->getBufferStart() + LsrLocInfo.second;
+  const char *RTokenPos = MB->getBufferStart() + RsrLocInfo.second;
+  Lexer LRawLex(SM.getLocForStartOfFile(LsrLocInfo.first), LO,
+   MB->getBufferStart(), LTokenPos, MB->getBufferEnd());
+  Lexer RRawLex(SM.getLocForStartOfFile(RsrLocInfo.first), LO,
+   MB->getBufferStart(), RTokenPos, MB->getBufferEnd());
+
+  Token LTok, RTok;
+  SourceLocation LLoc, RLoc;
+  int LRem, RRem;
+  do {//we compare the expressions token-by-token
+LRawLex.LexFromRawLexer(LTok);
+RRawLex.LexFromRawLexer(RTok);
+LLoc = LTok.getLocation();
+RLoc = RTok.getLocation();
+LRem = SM.getCharacterData(SM.getExpansionLoc(Lsr.getEnd())) -
+   SM.getCharacterData(LLoc);//remaining characters until the end of expr
+RRem = SM.getCharacterData(SM.getExpansionLoc(Rsr.getEnd())) -
+   SM.getCharacterData(RLoc);
+  } while (!LTok.is(tok::eof) && !RTok.is(tok::eof) &&
+   compareToks(LTok, RTok, SM) && LRem > 0 && RRem > 0);
+  return !((LRem == 0 && RRem == 0) && compareToks(LTok, RTok, SM));
 }
 
 static bool areExprsMacroAndNonMacro(const Expr *&LhsExpr,
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-01 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

In D55125#1315335 , @Szelethus wrote:

> I see your point, but here's why I think it isn't a bug: I like to see macros 
> as `constexpr` variables, and if I used those instead, I personally wouldn't 
> like to get a warning just because they have the same value. In C, silencing 
> such a warning isn't even really possible.
>
> Another interesting thought, @donat.nagy's check works by comparing actual 
> nodes of the AST, while this one would work with `Lexer`, but they almost 
> want to do the same thing, the only difference is that the first checks 
> whether two pieces of code are equivalent, and the second checks whether they 
> are the same. Maybe it'd be worth to extract the logic into a simple 
> `areStmtsEqual(const Stmt *S1, const Stmt *S2, bool ShouldCompareLexically = 
> false)` function, that would do AST based comparison if the last param is set 
> to false, and would use `Lexer` if set to true. After that, we could just add 
> command line options to both of these checks, to leave it up to the user to 
> choose in between them. Maybe folks who suffer from heavily macro-infested 
> code would rather bear the obvious performance hit `Lexer` causes for little 
> more precise warnings, and (for example) users of C++11 (because there are 
> few excuses not to prefer `constexpr` there) could run in AST mode.
>
> edit: I'm not actually all that sure about the performance hit. Just a guess.
>
> But I'm just thinking aloud really.


Wiring out the lexical comparison and AST based comparison sounds like an 
interesting idea, however IMHO such a setting is too coarse-grained on the file 
level.  My guess would be that it depends from expression (fragment) to 
expression fragment how you want to compare them: for macros lexical comparison 
is better, for arithmetic expressions with many parentheses AST based recursive 
comparison may be a better fit (as implemented also in this checker).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-01 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked 3 inline comments as done.
dkrupp added a comment.

In D55125#1314788 , @JonasToth wrote:

> In D55125#1314741 , @Szelethus wrote:
>
> > @JonasToth this is the `Lexer` based expression equality check I talked 
> > about in D54757#1311516 .  The 
> > point of this patch is that the definition is a macro sure could be build 
> > dependent, but the macro name is not, so it wouldn't warn on the case I 
> > showed. What do you think?
>
>
> Yes, this approach is possible.
>  IMHO it is still a bug/redudant if you do the same thing on both paths and 
> warning on it makes the programmer aware of the fact. E.g. the macros might 
> have been something different before, but a refactoring made them equal and 
> resulted in this situation.


@JonasThoth: I see you point with refactoring, but let's imagine that the two  
macros COND_OP_MACRO and COND_OP_THIRD_MACRO are defined by compile time 
parameters. If the two macros are happened to be defined to the same value the 
user would get a warning, and if not she would not.  How would the user would 
"fix" her code in the first case? So if the macro names are different in the 
conditional expression, it is not a redundant expression because the macro 
names are compile time parameters (with just eventually same values).  This was 
the same logic the old test case were testing:  k += (y < 0) ? COND_OP_MACRO : 
COND_OP_OTHER_MACRO;




Comment at: test/clang-tidy/misc-redundant-expression.cpp:109
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {

JonasToth wrote:
> Could you please add a test where the macro is `undef`ed and redefined to 
> something else?
I am not sure what you exactly suggest here. It should not matter what 
COND_OP_THIRD_MACRO is defined to as we lexically compare tokens as they are 
written in the original source code.

Could you be a bit more specific what test case you would like to add? 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D55125: [clang-tidy] Fix a false positive in misc-redundant-expression check

2018-12-01 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 176247.
dkrupp added a comment.

-clang-format applied
-clang:: namespace qualifiers removed


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55125/new/

https://reviews.llvm.org/D55125

Files:
  clang-tidy/misc/RedundantExpressionCheck.cpp
  test/clang-tidy/misc-redundant-expression.cpp


Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -106,6 +106,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -114,11 +115,15 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions 
are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions 
are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + 
COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions 
are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different 
macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another 
macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
   return k;
 }
 #undef COND_OP_MACRO
Index: clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -598,23 +598,56 @@
   return true;
 }
 
+static bool compareToks(Token &T1, Token &T2, const SourceManager &SM) {
+  if (T1.getLength() != T2.getLength())
+return false;
+  return strncmp(SM.getCharacterData(T1.getLocation()),
+ SM.getCharacterData(T2.getLocation()), T1.getLength()) == 0;
+}
+
+//Returns true if both LhsEpxr and RhsExpr are
+//macro expressions and they are expanded
+//from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange();
+  if (!Lsr.getBegin().isMacroID() || !Rsr.getBegin().isMacroID())
 return false;
-
   const SourceManager &SM = AstCtx->getSourceManager();
   const LangOptions &LO = AstCtx->getLangOpts();
 
-  return !(Lexer::getImmediateMacroName(LhsLoc, SM, LO) ==
-  Lexer::getImmediateMacroName(RhsLoc, SM, LO));
+  std::pair LsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Lsr.getBegin()));
+  std::pair RsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Rsr.getBegin()));
+  const llvm::MemoryBuffer *MB = SM.getBuffer(LsrLocInfo.first);
+
+  const char *LTokenPos = MB->getBufferStart() + LsrLocInfo.second;
+  const char *RTokenPos = MB->getBufferStart() + RsrLocInfo.second;
+  Lexer LRawLex(SM.getLocForStartOfFile(LsrLocInfo.first), LO,
+   MB->getBufferStart(), LTokenPos, MB->getBufferEnd());
+  Lexer RRawLex(SM.getLocForStartOfFile(RsrLocInfo.first), LO,
+   MB->getBufferStart(), RTokenPos, MB->getBufferEnd());
+
+  Token LTok, RTok;
+  SourceLocation LLoc, RLoc;
+  int LRem, RRem;
+  do {//we compare the expressions token-by-token
+LRawLex.LexFromRawLexer(LTok);
+RRawLex.LexFromRawLexer(RTok);
+LLoc = LTok.getLocation();
+RLoc = RTok.getLocation();
+LRem = SM.getCharacterData(SM.getExpansionLoc(Lsr.getEnd())) -
+   SM.getCharacterData(LLoc);//remaining characters until the end of 
expr
+RRem = SM.getCharacterData(SM.getExpansionLoc(Rsr.getEnd())) -
+   SM.getCharacterData(RLoc);
+  } while (!LTok.is(tok::eof) && !RTok.is(tok::eof) &&
+   compareToks(LTok, RTok, SM) && LRem > 0 && RRem > 0);
+  return !((LRem == 0 && RRem == 0) && compareToks(LTok, RTok, SM));
 }
 
 static bool areExprsMacroAndNonMacro(const Expr *&LhsExpr,


Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -106,6 +106,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -114,11 +115,15 @@
   // CHECK-MESSAG

[PATCH] D55125: Fix a false positive in misc-redundant-expression check

2018-11-30 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added reviewers: alexfh, aaron.ballman, gamesh411.
dkrupp added a project: clang-tools-extra.
Herald added subscribers: cfe-commits, Szelethus, rnkovacs.

Do not warn for redundant conditional expressions
when the true and false branches are expanded from
different macros even when they are
defined by one another.

We used to get a false warning in the following case:
define M1  1
#define M2 M1 
int test(int cond){
 return cond ? M1  : M2;//false warning: 'true' 
and 'false' expressions are equivalent
}

The problem was that the Lexer::getImmediateMacroName() call was used for 
comparing macros, which returned "M1 " for the 
expansion of M1  and M2.
Now we are comparing macros from token to token.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D55125

Files:
  clang-tidy/misc/RedundantExpressionCheck.cpp
  test/clang-tidy/misc-redundant-expression.cpp


Index: test/clang-tidy/misc-redundant-expression.cpp
===
--- test/clang-tidy/misc-redundant-expression.cpp
+++ test/clang-tidy/misc-redundant-expression.cpp
@@ -106,6 +106,7 @@
 
 #define COND_OP_MACRO 9
 #define COND_OP_OTHER_MACRO 9
+#define COND_OP_THIRD_MACRO COND_OP_MACRO
 int TestConditional(int x, int y) {
   int k = 0;
   k += (y < 0) ? x : x;
@@ -114,11 +115,15 @@
   // CHECK-MESSAGES: :[[@LINE-1]]:24: warning: 'true' and 'false' expressions 
are equivalent
   k += (y < 0) ? COND_OP_MACRO : COND_OP_MACRO;
   // CHECK-MESSAGES: :[[@LINE-1]]:32: warning: 'true' and 'false' expressions 
are equivalent
+  k += (y < 0) ? COND_OP_MACRO + COND_OP_OTHER_MACRO : COND_OP_MACRO + 
COND_OP_OTHER_MACRO;
+  // CHECK-MESSAGES: :[[@LINE-1]]:54: warning: 'true' and 'false' expressions 
are equivalent
 
   // Do not match for conditional operators with a macro and a const.
   k += (y < 0) ? COND_OP_MACRO : 9;
   // Do not match for conditional operators with expressions from different 
macros.
   k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;
+  // Do not match for conditional operators when a macro is defined to another 
macro
+  k += (y < 0) ? COND_OP_MACRO : COND_OP_THIRD_MACRO;
   return k;
 }
 #undef COND_OP_MACRO
Index: clang-tidy/misc/RedundantExpressionCheck.cpp
===
--- clang-tidy/misc/RedundantExpressionCheck.cpp
+++ clang-tidy/misc/RedundantExpressionCheck.cpp
@@ -598,23 +598,55 @@
   return true;
 }
 
+static bool compareToks(Token &T1, Token &T2, const SourceManager &SM){
+  if (T1.getLength()!=T2.getLength())
+return false;
+  return 
strncmp(SM.getCharacterData(T1.getLocation()),SM.getCharacterData(T2.getLocation()),T1.getLength())==0;
+}
+
+//Returns true if both LhsEpxr and RhsExpr are
+//macro expressions and they are expanded
+//from different macros.
 static bool areExprsFromDifferentMacros(const Expr *LhsExpr,
 const Expr *RhsExpr,
 const ASTContext *AstCtx) {
   if (!LhsExpr || !RhsExpr)
 return false;
-
-  SourceLocation LhsLoc = LhsExpr->getExprLoc();
-  SourceLocation RhsLoc = RhsExpr->getExprLoc();
-
-  if (!LhsLoc.isMacroID() || !RhsLoc.isMacroID())
+  SourceRange Lsr = LhsExpr->getSourceRange();
+  SourceRange Rsr = RhsExpr->getSourceRange();
+  if (!Lsr.getBegin().isMacroID() || !Rsr.getBegin().isMacroID())
 return false;
-
   const SourceManager &SM = AstCtx->getSourceManager();
   const LangOptions &LO = AstCtx->getLangOpts();
 
-  return !(Lexer::getImmediateMacroName(LhsLoc, SM, LO) ==
-  Lexer::getImmediateMacroName(RhsLoc, SM, LO));
+  std::pair LsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Lsr.getBegin()));
+  std::pair RsrLocInfo =
+  SM.getDecomposedLoc(SM.getExpansionLoc(Rsr.getBegin()));
+  const llvm::MemoryBuffer *MB = SM.getBuffer(LsrLocInfo.first);
+
+  const char *LTokenPos = MB->getBufferStart() + LsrLocInfo.second;
+  const char *RTokenPos = MB->getBufferStart() + RsrLocInfo.second;
+  clang::Lexer LRawLex(SM.getLocForStartOfFile(LsrLocInfo.first), LO,
+   MB->getBufferStart(), LTokenPos, MB->getBufferEnd());
+  clang::Lexer RRawLex(SM.getLocForStartOfFile(RsrLocInfo.first), LO,
+   MB->getBufferStart(), RTokenPos, MB->getBufferEnd());
+
+  clang::Token LTok, RTok;
+  SourceLocation LLoc, RLoc;
+  int LRem, RRem;
+  do {//we compare the expressions token-by-token
+LRawLex.LexFromRawLexer(LTok);
+RRawLex.LexFromRawLexer(RTok);
+LLoc = LTok.getLocation();
+RLoc = RTok.getLocation();
+LRem = SM.getCharacterData(SM.getExpansionLoc(Lsr.getEnd())) -
+   SM.getCharacterData(LLoc);//remaining characters until the end of 
expr
+RRem = SM.getCharacterData(SM.getExpansionLoc(Rsr.getEnd())) -
+   SM.getCharacterD

[PATCH] D54429: [analyzer] Creating standard Sphinx documentation

2018-11-23 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.
Herald added a subscriber: gamesh411.

@dcoughlin could you please look into this?


https://reviews.llvm.org/D54429



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D54429: [analyzer] Creating standard Sphinx documentation

2018-11-13 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 173846.
dkrupp added a comment.

-scanbuild and xcode pictures are included now
-intro text ("What is Static Analysis?" etc.) are put under the Introduction 
section
-Download section is created, but I am not sure how well was the this Mac OSX 
binary release section was maintained. Should users really download from this 
site or through a package manager instead?

I will transform all the subpages after we come to an agreement that this is 
the way to go. For now, there are links to the old sub-pages from the main page.

@dcoughlin what do you think?

This is the current status.
Done:
Main: https://clang-analyzer.llvm.org/
Main/User manual/available checks: 
https://clang-analyzer.llvm.org/available_checks.html
Main/Mailing Lists

Not done:
Main/Filing Bugs: https://clang-analyzer.llvm.org/filing_bugs.html
Main/User manual/Obtaining the analyzer: 
https://clang-analyzer.llvm.org/installation.html
Main/User manual/Command line usage: 
https://clang-analyzer.llvm.org/scan-build.html
Main/User manual/Using within XCode: https://clang-analyzer.llvm.org/xcode.html
Main/Development/Checker Developer Manual: 
https://clang-analyzer.llvm.org/checker_dev_manual.html
Main/Development/Open Projects: 
https://clang-analyzer.llvm.org/open_projects.html
Main/Development/Potential Future Checkers: 
https://clang-analyzer.llvm.org/potential_checkers.html
Main/User manual/Source level annotations: 
https://clang-analyzer.llvm.org/annotations.html
Main/User manual/FAQ: https://clang-analyzer.llvm.org/faq.html


https://reviews.llvm.org/D54429

Files:
  docs/ClangStaticAnalyzer.rst
  docs/analyzer/DesignDiscussions/IPA.rst
  docs/analyzer/DesignDiscussions/InitializerLists.rst
  docs/analyzer/DesignDiscussions/RegionStore.rst
  docs/analyzer/DesignDiscussions/nullability.rst
  docs/analyzer/IPA.txt
  docs/analyzer/RegionStore.txt
  docs/analyzer/checkers.rst
  docs/analyzer/checkers/UninitializedObject.rst
  docs/analyzer/images/analyzer_xcode_html.png
  docs/analyzer/nullability.rst
  docs/conf.py
  docs/index.rst

Index: docs/index.rst
===
--- docs/index.rst
+++ docs/index.rst
@@ -23,6 +23,7 @@
AttributeReference
DiagnosticsReference
CrossCompilation
+   ClangStaticAnalyzer
ThreadSafetyAnalysis
AddressSanitizer
ThreadSanitizer
Index: docs/conf.py
===
--- docs/conf.py
+++ docs/conf.py
@@ -65,7 +65,7 @@
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ['_build', 'analyzer']
+exclude_patterns = ['_build']
 
 # The reST default role (used for this markup: `text`) to use for all documents.
 #default_role = None
Index: docs/analyzer/nullability.rst
===
--- docs/analyzer/nullability.rst
+++ /dev/null
@@ -1,92 +0,0 @@
-
-Nullability Checks
-
-
-This document is a high level description of the nullablility checks.
-These checks intended to use the annotations that is described in this
-RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html.
-
-Let's consider the following 2 categories:
-
-1) nullable
-
-
-If a pointer 'p' has a nullable annotation and no explicit null check or assert, we should warn in the following cases:
-- 'p' gets implicitly converted into nonnull pointer, for example, we are passing it to a function that takes a nonnull parameter.
-- 'p' gets dereferenced
-
-Taking a branch on nullable pointers are the same like taking branch on null unspecified pointers.
-
-Explicit cast from nullable to nonnul::
-
-__nullable id foo;
-id bar = foo;
-takesNonNull((_nonnull) bar); <— should not warn here (backward compatibility hack)
-anotherTakesNonNull(bar); <— would be great to warn here, but not necessary(*)
-
-Because bar corresponds to the same symbol all the time it is not easy to implement the checker that way the cast only suppress the first call but not the second. For this reason in the first implementation after a contradictory cast happens, I will treat bar as nullable unspecified, this way all of the warnings will be suppressed. Treating the symbol as nullable unspecified also has an advantage that in case the takesNonNull function body is being inlined, the will be no warning, when the symbol is dereferenced. In case I have time after the initial version I might spend additional time to try to find a more sophisticated solution, in which we would produce the second warning (*).
- 
-2) nonnull
-
-
-- Dereferencing a nonnull, or sending message to it is ok.
-- Converting nonnull to nullable is Ok.
-- When there is an explicit cast from nonnull to nullable I will trust the cast (it is probable there for a reason, because this cast does not suppress any warnings or errors).
-- But what sho

[PATCH] D54429: [analyzer] Creating standard shpinx documentation

2018-11-12 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 173679.
dkrupp added a comment.

making the diff full context.


Repository:
  rC Clang

https://reviews.llvm.org/D54429

Files:
  docs/ClangStaticAnalyzer.rst
  docs/analyzer/DesignDiscussions/IPA.rst
  docs/analyzer/DesignDiscussions/InitializerLists.rst
  docs/analyzer/DesignDiscussions/RegionStore.rst
  docs/analyzer/DesignDiscussions/nullability.rst
  docs/analyzer/IPA.txt
  docs/analyzer/RegionStore.txt
  docs/analyzer/checkers.rst
  docs/analyzer/checkers/UninitializedObject.rst
  docs/analyzer/nullability.rst
  docs/conf.py
  docs/index.rst

Index: docs/index.rst
===
--- docs/index.rst
+++ docs/index.rst
@@ -23,6 +23,7 @@
AttributeReference
DiagnosticsReference
CrossCompilation
+   ClangStaticAnalyzer
ThreadSafetyAnalysis
AddressSanitizer
ThreadSanitizer
Index: docs/conf.py
===
--- docs/conf.py
+++ docs/conf.py
@@ -65,7 +65,7 @@
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ['_build', 'analyzer']
+exclude_patterns = ['_build']
 
 # The reST default role (used for this markup: `text`) to use for all documents.
 #default_role = None
Index: docs/analyzer/nullability.rst
===
--- docs/analyzer/nullability.rst
+++ /dev/null
@@ -1,92 +0,0 @@
-
-Nullability Checks
-
-
-This document is a high level description of the nullablility checks.
-These checks intended to use the annotations that is described in this
-RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html.
-
-Let's consider the following 2 categories:
-
-1) nullable
-
-
-If a pointer 'p' has a nullable annotation and no explicit null check or assert, we should warn in the following cases:
-- 'p' gets implicitly converted into nonnull pointer, for example, we are passing it to a function that takes a nonnull parameter.
-- 'p' gets dereferenced
-
-Taking a branch on nullable pointers are the same like taking branch on null unspecified pointers.
-
-Explicit cast from nullable to nonnul::
-
-__nullable id foo;
-id bar = foo;
-takesNonNull((_nonnull) bar); <— should not warn here (backward compatibility hack)
-anotherTakesNonNull(bar); <— would be great to warn here, but not necessary(*)
-
-Because bar corresponds to the same symbol all the time it is not easy to implement the checker that way the cast only suppress the first call but not the second. For this reason in the first implementation after a contradictory cast happens, I will treat bar as nullable unspecified, this way all of the warnings will be suppressed. Treating the symbol as nullable unspecified also has an advantage that in case the takesNonNull function body is being inlined, the will be no warning, when the symbol is dereferenced. In case I have time after the initial version I might spend additional time to try to find a more sophisticated solution, in which we would produce the second warning (*).
- 
-2) nonnull
-
-
-- Dereferencing a nonnull, or sending message to it is ok.
-- Converting nonnull to nullable is Ok.
-- When there is an explicit cast from nonnull to nullable I will trust the cast (it is probable there for a reason, because this cast does not suppress any warnings or errors).
-- But what should we do about null checks?::
-
-__nonnull id takesNonnull(__nonnull id x) {
-if (x == nil) {
-// Defensive backward compatible code:
-
-return nil; <- Should the analyzer cover this piece of code? Should we require the cast (__nonnull)nil?
-}
-
-}
-
-There are these directions:
-- We can either take the branch; this way the branch is analyzed
-	- Should we not warn about any nullability issues in that branch? Probably not, it is ok to break the nullability postconditions when the nullability preconditions are violated.
-- We can assume that these pointers are not null and we lose coverage with the analyzer. (This can be implemented either in constraint solver or in the checker itself.)
-
-Other Issues to keep in mind/take care of:
-Messaging:
-- Sending a message to a nullable pointer
-	- Even though the method might return a nonnull pointer, when it was sent to a nullable pointer the return type will be nullable.
-	- The result is nullable unless the receiver is known to be non null.
-- Sending a message to a unspecified or nonnull pointer
-	- If the pointer is not assumed to be nil, we should be optimistic and use the nullability implied by the method.
-- This will not happen automatically, since the AST will have null unspecified in this case.
-
-Inlining
-
-
-A symbol may need to be treated differently inside an inlined body. For example, consider these 

[PATCH] D54429: Creating standard shpinx documentation for Clang Static Analyzer

2018-11-12 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp created this revision.
dkrupp added reviewers: Szelethus, NoQ, george.karpenkov.
dkrupp added a project: clang.
Herald added subscribers: cfe-commits, donat.nagy, jfb, arphaman, a.sidorin, 
rnkovacs, baloghadamsoftware, whisperity.

Standard Clang tools (ThreadSanitizer, MemorySanitizer, DataFlowSanitizer etc.)
all have documentation delivered with Clang and build using Sphinx tool.
Would be great to have the ClangSA docs there too in sphinx markup.

I wonder what you guys think about this approach?

  

In this patch the analyzer's documentation main and checker pages are
transformed into Sphinx rst to be part of the standard clang documentation 
tree: https://clang.llvm.org/docs/

The generated new documentation would look like this:
http://cc.elte.hu/clang-docs/docs/html/ClangStaticAnalyzer.html

 

Advantages:

  

- The documentation is easier to be maintained in Sphinx markup compared to raw 
HTML
- The documentation is easier found in the standard clang doc tree
- The generated documentation looks nicer ;)

In detail this patch contains the following contributions:

  

- Transformation of the main page https://clang-analyzer.llvm.org/ to 
http://cc.elte.hu/clang-docs/docs/html/ClangStaticAnalyzer.html
- Transformation of the https://clang-analyzer.llvm.org/available_checks.html 
and https://clang-analyzer.llvm.org/alpha_checks.html pages to 
http://cc.elte.hu/clang-docs/docs/html/analyzer/checkers.html
- Per checker page added for alpha.cplusplus.UninitializedObject checker. Would 
be great to create similar doc for all checkers in the long term. See 
http://cc.elte.hu/clang-docs/docs/html/analyzer/checkers/UninitializedObject.html
- Design discussion documents were renamed and formatted to rst and linked from 
the Development/Design Discussions section.

If you agree with the direction I can transform the rest of the 
https://clang-analyzer.llvm.org/ documentation into rst. But most importantly 
we would like to have a per checker one pager doc like the one in 
http://cc.elte.hu/clang-docs/docs/html/analyzer/checkers/UninitializedObject.html

RST seemed to be a better format than plain HTML. Then we could make mandatory 
for checker author to write such one pagers to their checkers similar to 
clang-tidy...


Repository:
  rC Clang

https://reviews.llvm.org/D54429

Files:
  docs/ClangStaticAnalyzer.rst
  docs/analyzer/DesignDiscussions/IPA.rst
  docs/analyzer/DesignDiscussions/InitializerLists.rst
  docs/analyzer/DesignDiscussions/RegionStore.rst
  docs/analyzer/DesignDiscussions/nullability.rst
  docs/analyzer/IPA.txt
  docs/analyzer/RegionStore.txt
  docs/analyzer/checkers.rst
  docs/analyzer/checkers/UninitializedObject.rst
  docs/analyzer/nullability.rst
  docs/conf.py
  docs/index.rst

Index: docs/index.rst
===
--- docs/index.rst
+++ docs/index.rst
@@ -23,6 +23,7 @@
AttributeReference
DiagnosticsReference
CrossCompilation
+   ClangStaticAnalyzer
ThreadSafetyAnalysis
AddressSanitizer
ThreadSanitizer
Index: docs/conf.py
===
--- docs/conf.py
+++ docs/conf.py
@@ -65,7 +65,7 @@
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ['_build', 'analyzer']
+exclude_patterns = ['_build']
 
 # The reST default role (used for this markup: `text`) to use for all documents.
 #default_role = None
Index: docs/analyzer/nullability.rst
===
--- docs/analyzer/nullability.rst
+++ /dev/null
@@ -1,92 +0,0 @@
-
-Nullability Checks
-
-
-This document is a high level description of the nullablility checks.
-These checks intended to use the annotations that is described in this
-RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html.
-
-Let's consider the following 2 categories:
-
-1) nullable
-
-
-If a pointer 'p' has a nullable annotation and no explicit null check or assert, we should warn in the following cases:
-- 'p' gets implicitly converted into nonnull pointer, for example, we are passing it to a function that takes a nonnull parameter.
-- 'p' gets dereferenced
-
-Taking a branch on nullable pointers are the same like taking branch on null unspecified pointers.
-
-Explicit cast from nullable to nonnul::
-
-__nullable id foo;
-id bar = foo;
-takesNonNull((_nonnull) bar); <— should not warn here (backward compatibility hack)
-anotherTakesNonNull(bar); <— would be great to warn here, but not necessary(*)
-
-Because bar corresponds to the same symbol all the time it is not easy to implement the checker that way the cast only suppress the first call but not the second. For this reason in the first implementation after a contradictory cast happens, I will treat bar as nullable unspecified, this way all of the warnings w

[PATCH] D53024: [analyzer][www] Add more open projects

2018-10-17 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: www/analyzer/open_projects.html:198
+  or using a dataflow framework.
+  (Difficulty: Hard)
+

Probably it is worth mentioning here, that there is a macro language already 
for describing summaries of standard library functions in 
StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp. 

This macro language could be refactored to a separate header file so it could 
be used in other checkers too. Could also be extended for C++. 

Another useful addition would be to enable users to describe these summaries in 
run-time (in YAML files for example) to be able to model their own proprietary 
library functions. 

Then as a next step we could introduce a flow sensitive analysis to generate 
such summaries automatically. Which is a hard problem indeed, the others above 
should not be too difficult


Repository:
  rC Clang

https://reviews.llvm.org/D53024



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

2018-07-18 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

> Which means that for some calls we aren't even trying to make a CTU lookup.

Thanks @NoQ, we will take a look at it!


Repository:
  rC Clang

https://reviews.llvm.org/D30691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D48831: alpha.unix.cstring.OutOfBounds checker enable/disable fix

2018-07-13 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked an inline comment as done.
dkrupp added a comment.

@NoQ do we need any more update to this patch? Thanks.


https://reviews.llvm.org/D48831



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D48831: alpha.unix.cstring.OutOfBounds checker enable/disable fix

2018-07-03 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp marked 2 inline comments as done.
dkrupp added inline comments.



Comment at: lib/StaticAnalyzer/Checkers/CStringChecker.cpp:311
+if (!Filter.CheckCStringOutOfBounds)
+  return StOutBound;
 

NoQ wrote:
> Could we preserve the other portion of the assertion on this branch? I.e., 
> `assert(Filter.CheckCStringNullArg)`.
> 
> Additionally, do you really want to continue analysis on this path? Maybe 
> `return nullptr` to sink?
I was unsure whether to return nullptr or StOutBound. I thought that 
alpha.unix.cstring.OutOfBounds is in alpha because it may falsely detect buffer 
overflows and then we would cut the path unnecessarily.  
But OK, it is safer this way.

I could not put back the assertion, because if only unix.Malloc checker is 
enabled (and CStringOutOfBounds and CStringNullArg are not) the assertion is 
not true.



https://reviews.llvm.org/D48831



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D48831: alpha.unix.cstring.OutOfBounds checker enable/disable fix

2018-07-03 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp updated this revision to Diff 153905.
dkrupp added a comment.

The patch has been updated.
Changes:

-The analysis path is cut if overvlow is detected even if CStringOutOfBounds is 
disabled

The assert(Filter.CheckCStringOutOfBounds || Filter.CheckCStringNullArg); 
cannot be put back, because if both checkers are disabled, the assertion is not 
true.


https://reviews.llvm.org/D48831

Files:
  lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  test/Analysis/cstring-plist.c
  test/Analysis/malloc.c
  test/Analysis/string.c

Index: test/Analysis/string.c
===
--- test/Analysis/string.c
+++ test/Analysis/string.c
@@ -31,6 +31,8 @@
 void clang_analyzer_eval(int);
 
 int scanf(const char *restrict format, ...);
+void *malloc(size_t);
+void free(void *);
 
 //===--===
 // strlen()
@@ -110,7 +112,7 @@
   if (a == 0) {
 clang_analyzer_eval(b == 0); // expected-warning{{TRUE}}
 // Make sure clang_analyzer_eval does not invalidate globals.
-clang_analyzer_eval(strlen(global_str) == 0); // expected-warning{{TRUE}}
+clang_analyzer_eval(strlen(global_str) == 0); // expected-warning{{TRUE}}
   }
 
   // Call a function with unknown effects, which should invalidate globals.
@@ -309,11 +311,13 @@
   clang_analyzer_eval(globalInt == 42); // expected-warning{{TRUE}}
 }
 
+#ifndef SUPPRESS_OUT_OF_BOUND
 void strcpy_overflow(char *y) {
   char x[4];
   if (strlen(y) == 4)
 strcpy(x, y); // expected-warning{{String copy function overflows destination buffer}}
 }
+#endif
 
 void strcpy_no_overflow(char *y) {
   char x[4];
@@ -348,11 +352,13 @@
   clang_analyzer_eval(a == x[0]); // expected-warning{{UNKNOWN}}
 }
 
+#ifndef SUPPRESS_OUT_OF_BOUND
 void stpcpy_overflow(char *y) {
   char x[4];
   if (strlen(y) == 4)
 stpcpy(x, y); // expected-warning{{String copy function overflows destination buffer}}
 }
+#endif
 
 void stpcpy_no_overflow(char *y) {
   char x[4];
@@ -403,6 +409,7 @@
   clang_analyzer_eval((int)strlen(x) == (orig_len + strlen(y))); // expected-warning{{TRUE}}
 }
 
+#ifndef SUPPRESS_OUT_OF_BOUND
 void strcat_overflow_0(char *y) {
   char x[4] = "12";
   if (strlen(y) == 4)
@@ -420,6 +427,7 @@
   if (strlen(y) == 2)
 strcat(x, y); // expected-warning{{String copy function overflows destination buffer}}
 }
+#endif
 
 void strcat_no_overflow(char *y) {
   char x[5] = "12";
@@ -496,6 +504,15 @@
   clang_analyzer_eval(a == x[0]); // expected-warning{{UNKNOWN}}
 }
 
+#ifndef SUPPRESS_OUT_OF_BOUND
+// Enabling the malloc checker enables some of the buffer-checking portions
+// of the C-string checker.
+void cstringchecker_bounds_nocrash() {
+  char *p = malloc(2);
+  strncpy(p, "AAA", sizeof("AAA")); // expected-warning {{Size argument is greater than the length of the destination buffer}}
+  free(p);
+}
+
 void strncpy_overflow(char *y) {
   char x[4];
   if (strlen(y) == 4)
@@ -516,6 +533,7 @@
   if (strlen(y) == 3)
 strncpy(x, y, n); // expected-warning{{Size argument is greater than the length of the destination buffer}}
 }
+#endif
 
 void strncpy_truncate(char *y) {
   char x[4];
@@ -592,6 +610,7 @@
   clang_analyzer_eval(strlen(x) == (orig_len + strlen(y))); // expected-warning{{TRUE}}
 }
 
+#ifndef SUPPRESS_OUT_OF_BOUND
 void strncat_overflow_0(char *y) {
   char x[4] = "12";
   if (strlen(y) == 4)
@@ -615,6 +634,8 @@
   if (strlen(y) == 4)
 strncat(x, y, 2); // expected-warning{{Size argument is greater than the free space in the destination buffer}}
 }
+#endif
+
 void strncat_no_overflow_1(char *y) {
   char x[5] = "12";
   if (strlen(y) == 2)
@@ -632,6 +653,7 @@
   clang_analyzer_eval(strlen(dst) >= 4); // expected-warning{{TRUE}}
 }
 
+#ifndef SUPPRESS_OUT_OF_BOUND
 void strncat_symbolic_src_length(char *src) {
   char dst[8] = "1234";
   strncat(dst, src, 3);
@@ -649,6 +671,7 @@
   char dst2[8] = "1234";
   strncat(dst2, &src[offset], 4); // expected-warning{{Size argument is greater than the free space in the destination buffer}}
 }
+#endif
 
 // There is no strncat_unknown_dst_length because if we can't get a symbolic
 // length for the "before" strlen, we won't be able to set one for "after".
Index: test/Analysis/malloc.c
===
--- test/Analysis/malloc.c
+++ test/Analysis/malloc.c
@@ -375,16 +375,16 @@
 // or inter-procedural analysis, this is a conservative answer.
 int *f3() {
   static int *p = 0;
-  p = malloc(12); 
+  p = malloc(12);
   return p; // no-warning
 }
 
 // This case tests that storing malloc'ed memory to a static global variable
 // which is then returned is not leaked.  In the absence of known contracts for
 // functions or inter-procedural analysis, this is a conservative answer.
 static int *p_f4 = 0;
 int *f4() {
-  p_f4 = malloc(12); 
+  p_f4 = malloc(12);
   return p_f4; // no-warning
 }
 
@@ -1232,7 +1232,7 @@
 
   if (myValueSize <= sizeof(stackBuf

[PATCH] D45532: [StaticAnalyzer] Checker to find uninitialized fields after a constructor call

2018-04-16 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

Would be interesting to extend this checker (maybe in an upcoming patch) to 
report on uninitialized members not only in constructors, but also copy 
constructors and move constructors.

See related https://bugs.llvm.org/show_bug.cgi?id=37086

This bug report also mentions assignment operator. But for that a warning may 
be not so useful. In that case the members of the assigned to object should 
have some initialized value already which the programmer may not want to 
overwrite in the assignment operator.


https://reviews.llvm.org/D45532



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41150: [CFG] Adding new CFGStmt LoopEntrance for the StaticAnalyzer

2017-12-15 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added inline comments.



Comment at: include/clang/Analysis/CFG.h:179
 /// entered.
+class CFGLoopEntrance : public CFGElement {
+public:

This comment refers to the CFGLoopExit class. Please add a separate explaining 
comment to the CFGLoopEntrance. 


https://reviews.llvm.org/D41150



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

2017-10-15 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp requested changes to this revision.
dkrupp added a comment.
This revision now requires changes to proceed.

Please fix the incompatibility between analyze-build and lib/CrossTU in the 
format of externalFnMap.txt mappfing file.




Comment at: tools/scan-build-py/libscanbuild/analyze.py:535
+ast_path = os.path.join("ast", triple_arch, path + ".ast")
+func_ast_list.append(mangled_name + "@" + triple_arch + " " + ast_path)
+return func_ast_list

There is a incompatibility between this scan-build (analyze-build actually) 
implementation and new lib/CrossTU library.

CrossTranslationUnitContext::loadExternalAST(
StringRef LookupName, StringRef CrossTUDir, StringRef IndexName)

expects the externalFnMap.txt to be in
"functionName astFilename" format.
however currently we generate here
"functionName@arch astFilename" lines.

One possible fix could be to 
create one externalFnMap.txt indexes per arch
/ast/x86_64/externalFnMap.txt
/ast/ppc64/externalFnMap.txt
etc.
and call clang analyze with the architecture specific map directory: 
e.g. ctu-dir=/ast/x86_64

This would then work if the "to-be-analyzed" source-code is cross-compiled into 
multiple architectures.

Would be useful to add a test-case too to check if the map file and ctu-dir 
content generated by analyze-build is compatible.



Comment at: tools/scan-build-py/libscanbuild/analyze.py:585
++ [opts['file']]
+triple_arch = get_triple_arch(cmd, cwd)
+generate_ast(triple_arch)

Maybe we could use the full target-triple for distinguishing the AST binaries, 
not only the architecture part. The sys part for example is probably important 
too and a "win32" AST may not be compatible with a "linux" AST.


https://reviews.llvm.org/D30691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D34512: Add preliminary Cross Translation Unit support library

2017-08-21 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a subscriber: zaks.anna.
dkrupp added a comment.

The creation of this library (libCrossTU) is approved for importing function 
definitions.  @zaks.anna, @NoQ , @klimek could you please help us reviewing the 
code itself?

Then, when this is approved, we could progress with the review of the dependent 
"Support for naive cross translational unit analysis" 
(https://reviews.llvm.org/D30691) feature. The two patch is now in sync. 
Thanks a lot in advance.


https://reviews.llvm.org/D34512



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

2017-06-19 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

Thanks.

> It would be best to just add the scan-build-py support to the tree, 
> especially, since the new scrips are not tested.

OK. We will update this patch with the scan-build-py changes and remove the 
ctu-build.py and ctu-analyze.py scripts.

> I am curious which optimization strategies you considered. An idea that @NoQ 
> came up with was to serialize only the most used translation units. Another 
> idea is to choose the TUs that a particular file has most dependencies on and 
> only inline functions from those.

Both of these strategies could work in practice, we did not try them. We 
implemented the two extremes: serialize all TUs/don't serialize any of the TUs. 
Both of them could can be useful in practice as is (depending if the user is 
cpu/memory/disk space bound).  I think we could try with the above suggested 
optimizations as an incremental improvement (and keep this initial patch as 
simple as possible).

> What mode would you prefer? Would you pay the 20%-30% in speed but reduce the 
> huge disk consumption? That might be a good option, especially, if you have 
> not exhausted the ideas on how to optimize.

In this initial version I think we should keep the serializing mode. We just 
measured that the heap consumption of the non-serializing mode and it seems to 
be ~50% larger. Probably because the serializing mode only loads those AST 
fragments from the disk that is imported. But I can imagine that some user 
still want to use the non-serializing version which is not using extra disk 
space. So we will add the non-serializing mode in a next patch as an Analyzer 
option first (which we can turn into default behaviour later on). OK?

> I see some changes to the compiler proper, such as ASTImporter, ASTContext, 
> SourceManager. We should split those changes into a separate patch and ask 
> for review from people working on those components. You can point back to 
> this patch, which would contain the analyzer related changes, and state that 
> the other patch is blocking this work.

Allright, we will do that.

So is it OK to proceed like this?


https://reviews.llvm.org/D30691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

2017-06-14 Thread Daniel Krupp via Phabricator via cfe-commits
dkrupp added a comment.

Thanks for the reviews so far.
I think we have addressed all major concerns regarding this patch:

-(Anna) Scan-build-py integration of the functionality is nearly finished (see 
https://github.com/rizsotto/scan-build/issues/83) (--ctu switch performs both 
analysis phases at once).  This I think could go in a different patch, but 
until we could keep the ctu-build.py and ctu-analyze.py scripts. Do you agree?

-(Devin,NoQ) In the  externalFnMap.txt (which contains which function 
definitions is located where) Unified Symbol Resolution (USR) is used to 
identify functions instead of mangled names, which seems to work equally well 
for C and C++

-(Anna) Dumping the ASTs to the disk. We tried a version where, ASTs are not 
dumped in the 1st phase, but is recreated each time a function definition is 
needed from an external TU. It works fine, but the analysis time went up by 
20-30% on open source C projects. Is it OK to add this functionality in a next 
patch? Or should we it as an optional feature right now?

Do you see anything else that would prevent this patch to get in?


https://reviews.llvm.org/D30691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits