Dan Hecht has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". ......................................................................
Patch Set 4: (20 comments) http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 275: if (bytes_to_fill < sizeof(*decimal)) { this doesn't seem necessary: we know that: slot_byte_size == 4 sizeof(*decimal) == 4 0 <= len.val <= slot_byte_size 0 <= bytes_to_fill <= slot_byte_size i.e. 0 <= bytes_to_fill <= 4 Is the undefined case when bytes_to_fill == 4, i.e. right shift by 32? is that actually undefined to shift int32 by 32? If so, we'd need the check for the other two cases below as well, but can just do a common if (len.val == 0) check outside the switch statement. http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exec/zigzag-test.cc File be/src/exec/zigzag-test.cc: PS4, Line 101: what's undefined about that? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS4, Line 161: dst->ptr != nullptr dst->ptr equals copy, which we've already checked against NULL at line 154, so this check isn't necessary. And this code is somewhat perf sensitive so we shouldn't do it. http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/expr-value.h File be/src/exprs/expr-value.h: PS4, Line 72: size() size() != 0 or use empty() http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: Line 379: place = Overflow::CheckedProduct(place, static_cast<uint64_t>(src_base), &overflow); why don't we need to check overflow here? and then why do we need to check it at line 367 instead? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS4, Line 819: why remove that? do we sometimes pass null? Line 629: if (strs[0].len) memcpy(ptr, strs[0].ptr, strs[0].len); != 0. but why is that necessary. C standard about string.h says: Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS4, Line 388: ( these are extraneous http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: Line 143: // case but what makes the normal path faster? this comment seemed to imply that this change will make this code slower. is that the case? PS4, Line 149: x + y why do we need UnsignedSum() on line 144 case but not here? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: PS4, Line 161: dest->len > 0 why is this len > 0 rather than dest->ptr != nullptr? is memcpy with 0 length undefined? as mentioned earlier, C standard seems to claim otherwise. http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: PS4, Line 975: string_val->ptr != nullptr is this actually possible? it's not checked upstream? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/string-value.inline.h File be/src/runtime/string-value.inline.h: PS4, Line 60: (len > 0) is it really undefined to call strncmp() with len==0? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/tuple.h File be/src/runtime/tuple.h: PS4, Line 75: tuple_desc.num_null_bytes() > 0 same question here and elsewhere about 0 count. http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/udf/udf.cc File be/src/udf/udf.cc: PS4, Line 493: len > 0 len will almost always be > 0 so this condition should go second (if we need it at all). http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/util/decompress-test.cc File be/src/util/decompress-test.cc: PS4, Line 118: input_len != 0 here and elsewhere. but why needed? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/util/overflow.h File be/src/util/overflow.h: Line 45: this could use a quick comment PS4, Line 68: [[gnu::always_inline]] ALWAYS_INLINE ? Line 134: constexpr Signedness SIGN = (std::is_signed<T>::value || std::is_signed<U>::value) ? do the tests cover these combinations? http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS4, Line 999: num > 0 same -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple <jbapple-imp...@apache.org> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes