tkuchta added inline comments.
================ Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:213 + char *res = strsep(s, delim); + s_label = dfsan_read_label(base, strlen(base)); + if (res && (res != base)) { ---------------- browneee wrote: > tkuchta wrote: > > browneee wrote: > > > tkuchta wrote: > > > > browneee wrote: > > > > > tkuchta wrote: > > > > > > browneee wrote: > > > > > > > The `s_label` represents the taint label for `s` (the pointer). > > > > > > > > > > > > > > This line would clobber the taint label of the pointer (`s`) with > > > > > > > a taint label from `s[0][0..n]`. > > > > > > > > > > > > > > I think this line should be deleted. > > > > > > Agree, s_label represents the taint associated with the **s > > > > > > pointer. However I am now wondering if that is the taint wich we > > > > > > would like to return. > > > > > > For example, if we have > > > > > > if (flags().strict_data_dependencies) { > > > > > > *ret_label = res ? s_label : 0; > > > > > > > > > > > > We would taint the return value with the value of the pointer, not > > > > > > the data. It means that if we operate on a string for which the > > > > > > characters are tainted, but the pointer itself isn't, we are likely > > > > > > going to return label 0. My understanding was that we are more > > > > > > concerned with the taint of the data, not the pointer, am I missing > > > > > > something? > > > > > > > > > > > Yes, we are usually more concerned with the taint of the data, not > > > > > the pointer. > > > > > > > > > > With strict dependencies: > > > > > // If the input pointer is tainted, the output pointer would be > > > > > tainted (because it is derived from the input pointer - maybe the > > > > > same value). > > > > > taint(s[0]) == dfsan_read_label(s, sizeof(s)) ====> taint(ret) == > > > > > ret_label[0] > > > > > > > > > > // If the input data is tainted, the output data would be tainted > > > > > (because it is derived from the input data). > > > > > taint(s[0][0]) == MEM_TO_SHADOW(s[0])[0] ====> taint(ret[0]) == > > > > > MEM_TO_SHADOW(ret)[0] > > > > > > > > > > Because s[0] == ret (or ret==null), (for the non-null case) the > > > > > output shadow bytes are the same bytes as input shadow bytes and so > > > > > these taint labels for the string data in shadow memory do not need > > > > > to be explicitly propagated in this function. > > > > > > > > > > I think the only case actually changing/copying string data is > > > > > writing a delimiter byte to NULL, which you handled. > > > > I am working on the changes and I came across a strange behavior that I > > > > would like to ask about. > > > > > > > > It turned out that if we do > > > > > > > > char *s = " ... "; > > > > dfsan_set_label(label, &p_s, sizeof(&p_s)); > > > > > > > > Then, the s_label within the handler is 0, not "label". This is > > > > unexpected, as we would like the pointer itself to be labeled here. > > > > I believe this has something to do with the fact that the input string > > > > in strsep is a double pointer. For example this works as expected for > > > > the delimiter string, which is a single pointer. > > > > It's either I'm doing something incorrectly or there is some issue with > > > > propagating labels for double pointers. > > > > Have you perhaps come across this behavior before? > > > I'm not sure what p_s is in your example. Could you provide a more > > > complete example? > > > (and maybe all in one function, not half in __dfsw_strsep and half in > > > another function) > > > > > > Here is an example showing taint labels at different levels of > > > indirection: > > > > > > ``` > > > #include <assert.h> > > > #include <string.h> > > > #include <sanitizer/dfsan_interface.h> > > > > > > int main(void) { > > > char *s = " ... "; > > > char **p_s = &s; > > > char ***pp_s = &p_s; > > > > > > dfsan_label i_label = 1 << 0; > > > dfsan_label j_label = 1 << 1; > > > dfsan_label k_label = 1 << 2; > > > dfsan_label m_label = 1 << 3; > > > > > > // string data > > > dfsan_set_label(i_label, s, strlen(s)); > > > // pointer s > > > dfsan_set_label(j_label, &s, sizeof(s)); > > > // pointer to pointer s > > > dfsan_set_label(k_label, &p_s, sizeof(p_s)); > > > // pointer to pointer to pointer s > > > dfsan_set_label(m_label, &pp_s, sizeof(pp_s)); > > > > > > assert(pp_s[0][0] == s); > > > > > > // string data > > > assert(dfsan_read_label(s, strlen(s)) == i_label); > > > // pointer s > > > assert(dfsan_read_label(&s, sizeof(s)) == j_label); > > > // pointer to pointer s > > > assert(dfsan_read_label(&p_s, sizeof(p_s)) == k_label); > > > // pointer to pointer to pointer s > > > assert(dfsan_read_label(&pp_s, sizeof(pp_s)) == m_label); > > > > > > return 0; > > > } > > > ``` > > Hello, > > > > Thank you for the comment. > > > > I should have provided a more complete example. > > What I meant is a behavior I've found while working on the tests. > > In the test file I have something like that: > > > > ``` > > char *s = "Hello world/"; > > char *delim = " /"; > > dfsan_set_label(label, &s, sizeof(&s)); > > char *rv = strep(&s, delim); > > ``` > > > > If I get this right, the line > > ``` > > dfsan_set_label(label, &s, sizeof(&s)); > > > > ``` > > should have applied a taint to the `s` pointer. > > However, when inside `strsep`, if we check the `s_label`, it's `0`, not > > `label` > > > > ``` > > SANITIZER_INTERFACE_ATTRIBUTE char *__dfsw_strsep(char **s, const char > > *delim, > > dfsan_label s_label, > > dfsan_label delim_label, > > dfsan_label *ret_label) { > > fprintf(stderr, "s_label = %d\n", s_label); // -> this prints out "0" > > ``` > > > > If we do exactly the same trick with the `delim` pointer, the associated > > `delim_label` will be equal to `label`, as expected. > > > > It seems to me that either I am missing some important point here or the > > `s_label` is not propagated correctly by the compiler runtime for a double > > pointer to the user-provided handler (in my case `__dfsw_strsep`). > > > > I believe that strsep is currently the first function within > > dfsan_custom.cpp in which we operate on or check the label of the double > > pointer. > > > > I would like to ask if you maybe came across this behavior. It might be > > that I misunderstood how this should work and would be grateful for your > > help. > > Thank you > > If we do exactly the same trick with the delim pointer, the associated > > delim_label will be equal to label, as expected. > > > `char *rv = strep(&s, delim);` > > The difference is `&s` adds another level of indirection, `delim` does not > > I haven't tested, but I expect this example would work... > > ``` > char *s = "Hello world/"; > char **ps = &s; > char *delim = " /"; > dfsan_set_label(label, &ps, sizeof(&ps)); > char *rv = strep(ps, delim); > > // Inside __dfsw_strsep ... > fprintf(stderr, "s_label = %d\n", s_label); // should now print nonzero > label > ``` You're right, thank you! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D141389/new/ https://reviews.llvm.org/D141389 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits