tkuchta added inline comments.

================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:213
+  char *res = strsep(s, delim);
+  s_label = dfsan_read_label(base, strlen(base));
+  if (res && (res != base)) {
----------------
browneee wrote:
> tkuchta wrote:
> > browneee wrote:
> > > tkuchta wrote:
> > > > browneee wrote:
> > > > > tkuchta wrote:
> > > > > > browneee wrote:
> > > > > > > The `s_label` represents the taint label for `s` (the pointer).
> > > > > > > 
> > > > > > > This line would clobber the taint label of the pointer (`s`) with 
> > > > > > > a taint label from `s[0][0..n]`.
> > > > > > > 
> > > > > > > I think this line should be deleted.
> > > > > > Agree, s_label represents the taint associated with the **s 
> > > > > > pointer. However I am now wondering if that is the taint wich we 
> > > > > > would like to return.
> > > > > > For example, if we have
> > > > > > if (flags().strict_data_dependencies) {
> > > > > >     *ret_label = res ? s_label : 0;
> > > > > > 
> > > > > > We would taint the return value with the value of the pointer, not 
> > > > > > the data. It means that if we operate on a string for which the 
> > > > > > characters are tainted, but the pointer itself isn't, we are likely 
> > > > > > going to return label 0. My understanding was that we are more 
> > > > > > concerned with the taint of the data, not the pointer, am I missing 
> > > > > > something?
> > > > > > 
> > > > > Yes, we are usually more concerned with the taint of the data, not 
> > > > > the pointer.
> > > > > 
> > > > > With strict dependencies:
> > > > > // If the input pointer is tainted, the output pointer would be 
> > > > > tainted (because it is derived from the input pointer - maybe the 
> > > > > same value).
> > > > > taint(s[0]) == dfsan_read_label(s, sizeof(s)) ====> taint(ret) == 
> > > > > ret_label[0]
> > > > > 
> > > > > // If the input data is tainted, the output data would be tainted 
> > > > > (because it is derived from the input data).
> > > > > taint(s[0][0]) == MEM_TO_SHADOW(s[0])[0] ====> taint(ret[0]) == 
> > > > > MEM_TO_SHADOW(ret)[0]
> > > > > 
> > > > > Because s[0] == ret  (or ret==null), (for the non-null case) the 
> > > > > output shadow bytes are the same bytes as input shadow bytes and so 
> > > > > these taint labels for the string data in shadow memory do not need 
> > > > > to be explicitly propagated in this function. 
> > > > > 
> > > > > I think the only case actually changing/copying string data is 
> > > > > writing a delimiter byte to NULL, which you handled.
> > > > I am working on the changes and I came across a strange behavior that I 
> > > > would like to ask about.
> > > > 
> > > > It turned out that if we do
> > > > 
> > > > char *s = " ... ";
> > > > dfsan_set_label(label, &p_s, sizeof(&p_s));
> > > > 
> > > > Then, the s_label within the handler is 0, not "label". This is 
> > > > unexpected, as we would like the pointer itself to be labeled here.
> > > > I believe this has something to do with the fact that the input string 
> > > > in strsep is a double pointer. For example this works as expected for 
> > > > the delimiter string, which is a single pointer. 
> > > > It's either I'm doing something incorrectly or there is some issue with 
> > > > propagating labels for double pointers.
> > > > Have you perhaps come across this behavior before?
> > > I'm not sure what p_s is in your example. Could you provide a more 
> > > complete example?
> > > (and maybe all in one function, not half in __dfsw_strsep and half in 
> > > another function)
> > > 
> > > Here is an example showing taint labels at different levels of 
> > > indirection:
> > > 
> > > ```
> > > #include <assert.h>
> > > #include <string.h>
> > > #include <sanitizer/dfsan_interface.h>
> > > 
> > > int main(void) {
> > >   char *s = " ... ";
> > >   char **p_s = &s;
> > >   char ***pp_s = &p_s;
> > > 
> > >   dfsan_label i_label = 1 << 0;
> > >   dfsan_label j_label = 1 << 1;
> > >   dfsan_label k_label = 1 << 2;
> > >   dfsan_label m_label = 1 << 3;
> > > 
> > >   // string data
> > >   dfsan_set_label(i_label, s, strlen(s));
> > >   // pointer s
> > >   dfsan_set_label(j_label, &s, sizeof(s));
> > >   // pointer to pointer s
> > >   dfsan_set_label(k_label, &p_s, sizeof(p_s));
> > >   // pointer to pointer to pointer s
> > >   dfsan_set_label(m_label, &pp_s, sizeof(pp_s));
> > > 
> > >   assert(pp_s[0][0] == s);
> > > 
> > >   // string data
> > >   assert(dfsan_read_label(s, strlen(s)) == i_label);
> > >   // pointer s
> > >   assert(dfsan_read_label(&s, sizeof(s)) == j_label);
> > >   // pointer to pointer s
> > >   assert(dfsan_read_label(&p_s, sizeof(p_s)) == k_label);
> > >   // pointer to pointer to pointer s
> > >   assert(dfsan_read_label(&pp_s, sizeof(pp_s)) == m_label);
> > > 
> > >   return 0;
> > > }
> > > ```
> > Hello,
> > 
> > Thank you for the comment.
> > 
> > I should have provided a more complete example.
> > What I meant is a behavior I've found while working on the tests.
> > In the test file I have something like that:
> > 
> > ```
> > char *s = "Hello world/";
> > char *delim = " /";
> > dfsan_set_label(label, &s, sizeof(&s));
> > char *rv = strep(&s, delim);
> >  ```
> > 
> > If I get this right, the line 
> > ```
> > dfsan_set_label(label, &s, sizeof(&s));
> > 
> > ``` 
> > should have applied a taint to the `s` pointer.
> > However, when inside `strsep`, if we check the `s_label`, it's `0`, not 
> > `label`
> > 
> > ```
> > SANITIZER_INTERFACE_ATTRIBUTE char *__dfsw_strsep(char **s, const char 
> > *delim,
> >                                                   dfsan_label s_label,
> >                                                   dfsan_label delim_label,
> >                                                   dfsan_label *ret_label) {
> >     fprintf(stderr, "s_label = %d\n", s_label); // -> this prints out "0"
> > ```
> > 
> > If we do exactly the same trick with the `delim` pointer, the associated 
> > `delim_label` will be equal to `label`, as expected. 
> > 
> > It seems to me that either I am missing some important point here or the 
> > `s_label` is not propagated correctly by the compiler runtime for a double 
> > pointer to the user-provided handler (in my case `__dfsw_strsep`).
> > 
> > I believe that strsep is currently the first function within 
> > dfsan_custom.cpp in which we operate on or check the label of the double 
> > pointer.
> > 
> > I would like to ask if you maybe came across this behavior. It might be 
> > that I misunderstood how this should work and would be grateful for your 
> > help.
> > Thank you
> > If we do exactly the same trick with the delim pointer, the associated 
> > delim_label will be equal to label, as expected.
> 
> > `char *rv = strep(&s, delim);`
> 
> The difference is `&s` adds another level of indirection, `delim` does not
> 
> I haven't tested, but I expect this example would work...
> 
> ```
> char *s = "Hello world/";
> char **ps = &s;
> char *delim = " /";
> dfsan_set_label(label, &ps, sizeof(&ps));
> char *rv = strep(ps, delim);
> 
> // Inside __dfsw_strsep ...
>     fprintf(stderr, "s_label = %d\n", s_label); // should now print nonzero 
> label
> ```
You're right, thank you!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to