steakhal added a comment.

First and foremost, I want to deeply apologize about my rushed response. When I 
say that the `Taint originated here` note remained, I **wrongly** draw 
conclusions. Won't happen again.

__Taint-flow IDs__:
I would challenge that we need a flow ID (number) because for me the tainted 
symbol already has a unique ID which should be suitable for this purpose.
What I can see is that it would be useful to know directly what was the first 
tainted symbols of any given tainted symbol. (`SymbolData -> SymbolData` 
mapping)
This is pretty much analog with what you implemented by piggybacking on the 
taint kind. Although, I'd argue that having an explicit mapping would be 
cleaner, but I can see that it's more on personal preference.
In addition to that, I think there is value in minimizing the number of places 
where we introduce such static counters, so I would adwise against that unless 
we have a good reason for doing so.

I'm thinking that although it's handy to have the originally tainted symbol 
directly at the error-node at hand, I'm still not sure if that couldn't be 
calculated and tracked back to the place where we introduced taint.

  int n;
  scanf("%d", &n); // binds $1, not important
  int v = mytaintsource(n); // taint originated here, returns $2
  int z = taintprop(v); // taint propagated here, returns $3
  int x = 42 / z; // 42 div $3

__Soundness of the back-propagation__:
The back-propagation is always in-sync **given that** the state-transition of 
introducing taint **also attaches** the `NoteTag` explaining what it did and 
why.
This basically means that after we call `addTaint()` we must also add a 
`NoteTag` when we call `addTransition()`. Under these circumstances I find it 
easier to argue that the back-propagation is consistent (sound). If a specific 
checker (other than the `GenericTaintChecker`) models taint, it should stick to 
the rules described and emit the right `NoteTag`. That way even downstream 
checkers would play nicely with the taint-tracking and benefit from it.

__Interestingness__:
The concerns seem valid about that the set of interesting symbols could be 
larger than the actually required (and desired) set. However, I wouldn't 
worried about this much unless we have an example on which we can continue 
discussing that this is a real concern.
What I can see is that as-of-now we use the interestingness notion for this, 
and deviating or introducing something else would introduce complexity. So, I'm 
not strictly against having something else, but I would lean towards using 
interestingness here as well, unless we have clear-cut examples demonstrating 
the need for something more sophisticated.

Finally, I'd like to thank you for investing your time into this subject. We 
really should have done it much earlier. Without these similar improvements 
like this one, it's just half way done. Thank you.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to