steakhal requested changes to this revision.
steakhal added a comment.
This revision now requires changes to proceed.

Nice improvement!
I only have minor nitpicks and some recommendations for the taint API.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:205
+        BR.markInteresting(CallLocation);
+        if (TaintedArgs.at(Idx) != ReturnValueIndex) {
+          LLVM_DEBUG(llvm::dbgs() << "Taint Propagated to argument "
----------------
We usually use the `operator[]` on vector instead of the `at()`. When in doubt, 
we assert that the index is in `idx < size()`.
Apply this for all `.at()` uses.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214
+          if (nofTaintedArgs == 0)
+            Out << "Taint propagated to argument "
+                << TaintedArgs.at(Sym.index()) + 1;
+          else
+            Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+          nofTaintedArgs++;
----------------
dkrupp wrote:
> steakhal wrote:
> > I'd recommend using `llvm::interleaveComma()` in such cases.
> > You can probably get rid of `nofTaintedArgs` as well - by using this 
> > function.
> I chose another solution. I hope that is ok too.
Looks good. I'm not expecting many cases propagating to multiple arguments 
anyway.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
 bool taint::isTainted(ProgramStateRef State, const Stmt *S,
                       const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind).empty();
 }
----------------
TBH I'm not sure if I like that now we allocate unbounded amount of times 
(because of `getTaintedSymbols()` is recursive and returns by value), where we 
previously did not.

What we could possibly do is to compute the elements of this sequence lazily.
I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's possible 
to have something like that as a return type as it might encode the map 
function in the type or something like that.
Anyway, I'm just saying that it would be nice to not do more than it's 
necessary, and especially not allocate a lot of short-lived objects there.

Do you think there is a way to have the cake and eat it too?

---

I did some investigation, and one could get pretty far in the implementation, 
and maybe even complete it but it would be really complicated as of now. Maybe 
we could revisit this subject when we have coroutines.

So, I would suggest to have two sets of APIs:
 - the usual `isTainted(.) -> bool`
 - and a `getTaintedSymbols(.) -> vector<Sym>`
The important point would be that the `isTainted()` version would not eagerly 
collect all tainted sub-syms but return on finding the first one.
While, the `getTaintedSymbols()` would collect eagerly all of them, as its name 
suggests.

Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool 
returnAfterFirstMatch`. This way `isTainted()` can call it like that. While in 
the other case, the parameter would be `false`, and eagerly collect all symbols.

This is probably the best of both worlds, as it prevents `isTainted` from doing 
extra work and if we need to iterate over the tainted symbols, we always 
iterate over all of them, so doing it lazily wouldn't gain us much in that case 
anyway.
As a bonus, the user-facing API would be self-descriptive.

WDYT?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:265-268
+      std::vector<SymbolRef> TaintedRegions =
+          getTaintedSymbols(State, SRV->getRegion(), Kind);
+      TaintedSymbols.insert(TaintedSymbols.begin(), TaintedRegions.begin(),
+                            TaintedRegions.end());
----------------
For such constructs, I would prefer this.


================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:12-13
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
----------------
The [[ 
https://buildkite.com/llvm-project/premerge-checks/builds/146760#01877fd9-2f3a-4f8d-9ffa-6e017eb883c5
 | premerge bots ]] are complaining about these two lines on Windows:
```
error: 'warning' diagnostics seen but not expected: 
  File 
C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c
 Line 12: incompatible redeclaration of library function 'strlen'
  File 
C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c
 Line 13: incompatible redeclaration of library function 'malloc'
error: 'note' diagnostics seen but not expected: 
  File 
C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c
 Line 12: 'strlen' is a builtin with type 'unsigned long long (const char *)'
  File 
C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c
 Line 13: 'malloc' is a builtin with type 'void *(unsigned long long)'
4 errors generated.
```

I think it's because `size_t` should be defined as `unsigned long long` on 
`x86_64`. This also means that you should pin the target to `x86_64` to satisfy 
this test on all platforms.


================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:53-67
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+                                 // expected-note@-1 {{Taint propagated to the 
return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
----------------
I know this is subjective, but I'd suggest to reformat the tests to match LLVM 
style guidelines, unless the formatting is important for the test.
Consistency helps the reader and reviewer, as code and tests are read many more 
times than written.

This applies to the rest of the touched tests.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to