NoQ added inline comments.

================
Comment at: lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:494
+  SymbolManager &SM = C.getSymbolManager();
+  return SM.getDerivedSymbol(Sym, LCV.getRegion());
 }
----------------
vlad.tsyrklevich wrote:
> NoQ wrote:
> > I'd think about this a bit more and come back.
> > 
> > I need to understand how come that constructing a symbol manually is the 
> > right thing to do; that doesn't happen very often, but it seems correct 
> > here.
> Indeed it is odd. The best justification I could come up with: LCVs are meant 
> to be optimizations, their 'purpose' is to expose an SVal that hides 
> SymbolRef values so that we can have a split store. We don't have to copy all 
> of a compound values SymbolRef mappings because LCVs are kept distinct. Hence 
> to set/query/constrain region values you use SVals so that RegionStore can 
> differentiate between LCVs and SymbolRef backed SVals for the two different 
> stores it contains.
> 
> The taint interface however requires you taint a SymbolRef, not an SVal. If 
> we wanted, instead of doing this logic here, we could change 
> getPointedToSymbol() to return an SVal and update usages of it accordingly 
> since that value is only passed on to 
> ProgramState.isTainted()/ProgramState.addTaint() anyway. Then we could update 
> addTaint/isTainted to perform this logic, hiding it from the checker.
> 
> This still requires manually constructing a symbol, now it's just performed 
> in the analyzer instead of in a checker. Not sure if that addresses the issue 
> you were considering, but the idea that we need to 'undo' the LCV 
> optimization hiding the SymbolRef to have a value to taint seems somewhat 
> convincing to me. What do you think?
Hmm (!) I suggest adding a new function to the program state, that we'd call 
`addPartialTaint()` or something like that, and this function would accept a 
symbol and a region and would act identically to passing a derived symbol (from 
this symbol and that region) to `addTaint()` (but we wouldn't need to actually 
construct a derived symbol here).

Such API would be easier to understand and use than the current approach that 
forces the user to construct a derived symbol manually in the checker code. 
Unfortunately, this checker's `getLCVSymbol()` would become a bit more 
complicated (having various return types depending on circumstances), but this 
misfortune seems more random than systematic to me.

Since we're having this new kind of partial taint, why don't we expose it in 
the API.


================
Comment at: test/Analysis/taint-generic.c:210
+  read(sock, &tainted.st, sizeof(tainted.st));
+  __builtin_memcpy(buffer, tainted.buf, tainted.st[0].length); // 
expected-warning {{Untrusted data is used to specify the buffer size}}
 }
----------------
vlad.tsyrklevich wrote:
> NoQ wrote:
> > Are we already supporting the case when we're tainting some elements of an 
> > array but not all of them, and this works as expected? Could we add such 
> > tests (regardless of whether we already handle them)?
> It does work in that case. If you taint element X of region Y the current 
> logic will be conservative and only mark element X as tainted, not X-i or 
> X+i. This is also true for element 0, so if a programmer passes &array[0] but 
> reads sizeof(array) bytes it will not correctly mark that. This is also a 
> short coming of the invalidation code so I don't think there's much to do 
> until there's more general support for dealing with region extents.
\o/


https://reviews.llvm.org/D30909



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to