Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11160 )
Change subject: IMPALA-7412: width_bucket() function overflows too easily ...................................................................... Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/11160/2/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/11160/2/be/src/exprs/math-functions-ir.cc@516 PS2, Line 516: DCHECK(lhs > 0 && rhs > 0); > This was >= in the previous version - is this intentional? Thanks! Done. http://gerrit.cloudera.org:8080/#/c/11160/2/be/src/exprs/math-functions-ir.cc@518 PS2, Line 518: return BitUtil::CountLeadingZeros(lhs) + BitUtil::CountLead > I assume that in most use cases min_range, max_range and num_buckets will b Thanks, I implemented this logic. http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@458 PS3, Line 458: // The bucket_number is evaluated using the following formula: > Optional: consider putting in this a header like decimal-util.h I've put it into bit-util.h. I felt it's a more proper place for it since it can be useful for other use cases, not only for decimals. http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@462 PS3, Line 462: dis > Maybe we should name it so that it's explicit that it's double the width. E Yeah, it's a better name. Done. http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@516 PS3, Line 516: > static_cast<int64_t>() Done http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@520 PS3, Line 520: > static_cast Done http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@523 PS3, Line 523: bigger_type_needed = bigger_type_needed || MultiplicationOverflows( > It might be worth keeping some form of the CountLeadingZeros() check to avo I've added some logic to avoid unnecessary type widening. I added the query to the commit message that I used for performance measurement. It seems that the new algorithm offers 2-3x speedup compared to the old one. http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@524 PS3, Line 524: range_v > static_cast. I think we should also explicitly wrap the return value in a B Done http://gerrit.cloudera.org:8080/#/c/11160/4/be/src/exprs/math-functions-ir.cc File be/src/exprs/math-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/11160/4/be/src/exprs/math-functions-ir.cc@506 PS4, Line 506: rtainly constant e > I am not sure what happens here if min_range_val is INT_MIN - according to It is undefined behavior, but what usually happens is that abs(INT_MIN) returns INT_MIN. When INT_MIN is casted to unsigned it will be interpreted as INT_MAX + 1. E.g. if we have only one byte, then min is -128, in 2's compelement it is 10000000, and if we interpret it as unsigned, then it means 128. So it should work just right, but yeah it is a bit worrying that it is undefined behavior. http://gerrit.cloudera.org:8080/#/c/11160/4/be/src/exprs/math-functions-ir.cc@514 PS4, Line 514: : auto MultiplicationOverflows = [](ActualType lhs, ActualType rhs) { : DCHECK(lhs > 0 && rhs > 0); > optional performance idea: I think that it should be faster to create a bit I am not sure I can follow. Could you elaborate on that? Maybe with a 1-byte example? -- To view, visit http://gerrit.cloudera.org:8080/11160 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I728cc05d9aef8d081e6f2da66146f6d7b75dbb57 Gerrit-Change-Number: 11160 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 14 Aug 2018 14:24:23 +0000 Gerrit-HasComments: Yes
