dkrupp added a comment.

> TaintBugReport is brilliant and we already have a precedent for subclassing 
> BugReport in another checker. However I'm somewhat worried that once we start 
> doing more of this, we'll eventually end up with multiple inheritance 
> situations when the report needs multiple kinds of information. So at a 
> glance my approach with a "generic data map" in bugreport objects looks a bit 
> more future-proof to me. Also a bit easier to set up, no need to deal with 
> custom RTTI.

Adding a data map (like a string->sval map) to the `PathSensitiveBugreport` 
instead of relying on dynamic casting sounds an easy addition. I will update 
the patch with this. Or you specifically mean this kind of datamap ?  `typedef 
llvm::ImmutableMap<void*, void*>   GenericDataMap;` (`ProgramState.h:74`) I 
guess it should not be immutable…

> I guess my main point is, there shouldn't be a need to assist tracking by 
> adding extra information to the program state. Information in the state 
> should ideally be "material" to program execution, "tangible", it has to 
> describe something that's actually stored somewhere in memory (either by 
> directly defining it, or by constraining it). In particular, if two nodes 
> result in indistinguishable future behavior of the program, we're supposed to 
> merge them; but any "immaterial" bits of information in the state will 
> prevent that from happening.

@NoQ aha! Now I see where  you are coming from! If an SVal is tainted on both 
analysis branches, but their taint flow value is different (meaning that they 
carry taint values from different taint sources), then they cannot be merged 
which causes inefficiency.
I understand the generic principle, but I wonder how frequent would that be in 
practice. I would think not too much, because taint sources are uncommon. 
Especially having multiple taint sources in the same source file/Translation 
Unit (only that creates different taint flow ids).

> In our case it should be enough to have the lambda for propagation method ask 
> "Hey, is this freshly produced propagation target value relevant to this 
> specific report?" and if yes, mark the corresponding propagation source value 
> as relevant to the report as well; also emit a note and "consume" the mark on 
> the target value. Such chain of local decisions can easily replace the global 
> taint flow identifier, and it's more flexible because this way the flow 
> doesn't need to be "linear", it may branch in various ways and that's ok.

Taint propagation is not only handled in the `GenericTaintChecker:892`, where 
we calculate that the taintedness should propagate from function argument `x` 
to `y` or return value, but also it spreads in peculiar ways within 
expressions, from subregion to parent region etc. handled in the `addtaint(..)` 
and `addPartialTaint(..)` functions in `Taint.cpp`. What your proposed solution 
would essentially mean that we would need to implement the taint propagation in 
backward direction too. I think this design would be fragile and difficult to 
maintain (especially if taint propagation would change in the future).

I definitely don’t have the full picture here, so if you think that the sval 
backtracking is the better way, because of the potential performance penalty of 
the taintflow solution with the merges, I will go down that road and start to 
work on an alternative patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to