Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
......................................................................


Patch Set 1:

(26 comments)

> (22 comments)
 > 
 > what's the plan for keeping this compiling?

I have a Jenkins job in progress to build in a variety of different ways (-so, 
-asan, -ninja, -releast). I will modify to add this. I intend eventually to add 
the job to a continually-monitored dashboard and perhaps to have it send email 
alerts on failure.

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

PS1, Line 95: synbol
> typo
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

PS1, Line 150: )
> add "!= 0" as we try to avoid implicit comparisons against zero conversion 
switched to std::fill


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 276:           *decimal >>= bytes_to_fill * 8;
> since len comes from the data file, we could also have bytes_to_fill < 0, n
Yes, that is also undefined, but it is precluded by ReadFieldLen being .ok, 
above.


Line 278:           *decimal = 0;
> can this happen with well-formed avro? if not, this be a return false case.
Here is a stack trace:
    /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:276:20: runtime 
error: shift exponent 32 is too large for 32-bit type 'int32_t' (aka 'int')
    #0 0x2af0805477df in impala::HdfsAvroScanner::ReadAvroDecimal(int, unsigned 
char**, unsigned char*, bool, void*, impala::MemPool*) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:276:20
    #1 0x7ec68c in void 
impala::HdfsAvroScannerTest::TestReadAvroType<impala::DecimalValue<int>, bool 
(impala::HdfsAvroScanner::*)(int, unsigned char**, unsigned char*, bool, void*, 
impala::MemPool*), unsigned long>(bool (impala::HdfsAvroScanner::*)(int, 
unsigned char**, unsigned char*, bool, void*, impala::MemPool*), unsigned long, 
unsigned char*, long, impala::DecimalValue<int>, int, impala::TErrorCode::type) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:85:20
    #2 0x7c8e9a in void 
impala::HdfsAvroScannerTest::TestReadAvroDecimal<int>(unsigned char*, long, 
impala::DecimalValue<int>, int, impala::TErrorCode::type) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:187:5
    #3 0x7bbb3f in impala::HdfsAvroScannerTest_DecimalTest_Test::TestBody() 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:393:3


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1379:     double percent = total_rows * 100 / std::max(1L, num_input_rows);
> if that were the case, we would have seen crashes.
1. "if that were the case, we would have seen crashes." - I think that because 
total_rows is a double, the denominator gets converted to a double, making this 
not a crash.

UBSAN is a dynamic analysis tool; when it triggers, the behavior actually 
occurred.

2. Done.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

PS1, Line 97: *
> nit: we put the * next to type.
Done.


Line 100:   memcpy(&uinteger, &integer, sizeof(integer));
> using memcpy doesn't seem necessary. can't we just write:
It has different semantics: for 8-bit ints and integer = -2, my expression has 
value 3, yours has value 253, and yours with the | replaced by ^ has value 251.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 157:     if (LIKELY(dst->ptr && src.ptr)) memcpy(dst->ptr, src.ptr, 
src.len);
> Prefer explicit NULL checks.
memcpy is UB if either ptr is null, even if len is 0.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc
File be/src/exprs/bit-byte-functions-ir.cc:

PS1, Line 151: v * 
> why is this necessary?  and does it generate the same code (let's make sure
UBSAN caught a case where v << shift was not representable as a T. That's UB, 
but since this patch sets the -fwrapv, the multiply behavior is defined.

It generates the same code, but it if we don't do it this way and set fwrapv, 
the compiler can break the code and do bizarre things. Historically, this is a 
real risk with GCC.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 820:   if (!haystack) return NULL;
> okay but this check is still unnecessary - no loop iterations will be taken
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

PS1, Line 804: t
> nit: "t != nullptr" per our style.
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS1, Line 1684: accumulators::count(completion_times)
> Can you do an explicit comparison against 0 here?
Done


Line 1701:       if (accumulators::count(rates)) {
> And here
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

Line 161:         if (dest->len) memcpy(dest->ptr, src->ptr, dest->len);
> Don't we return a non-null ptr (zero_length_region_) in this case to avoid 
If src->ptr is nullptr and dest->len is 0, this is still UB.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/row-batch-serialize-test.cc
File be/src/runtime/row-batch-serialize-test.cc:

PS1, Line 164: len
> How about using a vector or string here instead to avoid the extra branch.
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 975:     if (string_val->ptr) memcpy(dest, string_val->ptr, 
string_val->len);
> != nullptr
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/string-value.inline.h
File be/src/runtime/string-value.inline.h:

Line 58:   const int result = len ? strncmp(s1, s2, len) : 0;
> len != 0
Done; I feel that the UB in the standards is not great for us and that it's 
worth the extra code to avoid the UB at all times.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

Line 75:     if (tuple_desc.num_null_bytes()) {
> != 0
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/udf/udf.cc
File be/src/udf/udf.cc:

Line 487:   if (LIKELY(len && !result.is_null)) {
> len != 0
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 36:   if (bit_width >= CHAR_BIT * sizeof(uint32_t)) return ~0;
> But where would we ever do that?  i.e. shouldn't this be a precondition to 
If we did that, the conditional would have to move into the tests, if you look 
in the rest of this file.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bitmap.h
File be/src/util/bitmap.h:

Line 78:     if (buffer_.size()) memset(buffer_.data(), 255 * b, buffer_.size() 
* sizeof(uint64_t));
> != 0
switched to std::fill


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

Line 45:   if (input.size()) memcpy(input_vector.data(), input.c_str(), 
input.size());
> != 0
switched to using constructor


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 387:                         ::max(1L, total_time_counter()->value());
> same comment - should it be set to 0 or 100 in this case rather than local_
Done


Line 998:   if (num) {
> != 0
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/streaming-sampler.h
File be/src/util/streaming-sampler.h:

Line 55:     if (samples_collected_) {
> != 0
Done


Line 112:     if (samples_collected_) memcpy(samples_, samples.data(), 
sizeof(T) * samples_collected_);
> same
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to