Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11160 )

Change subject: IMPALA-7412: width_bucket() function overflows too easily
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/11160/2/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/11160/2/be/src/exprs/math-functions-ir.cc@516
PS2, Line 516:     DCHECK(lhs > 0 && rhs > 0);
> This was >= in the previous version - is this intentional?
Thanks! Done.


http://gerrit.cloudera.org:8080/#/c/11160/2/be/src/exprs/math-functions-ir.cc@518
PS2, Line 518:     return BitUtil::CountLeadingZeros(lhs) + BitUtil::CountLead
> I assume that in most use cases min_range, max_range and num_buckets will b
Thanks, I implemented this logic.


http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@458
PS3, Line 458: // The bucket_number is evaluated using the following formula:
> Optional: consider putting in this a header like decimal-util.h
I've put it into bit-util.h. I felt it's a more proper place for it since it 
can be useful for other use cases, not only for decimals.


http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@462
PS3, Line 462:    dis
> Maybe we should name it so that it's explicit that it's double the width. E
Yeah, it's a better name. Done.


http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@516
PS3, Line 516:
> static_cast<int64_t>()
Done


http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@520
PS3, Line 520:
> static_cast
Done


http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@523
PS3, Line 523:   bigger_type_needed = bigger_type_needed || 
MultiplicationOverflows(
> It might be worth keeping some form of the CountLeadingZeros() check to avo
I've added some logic to avoid unnecessary type widening.

I added the query to the commit message that I used for performance 
measurement. It seems that the new algorithm offers 2-3x speedup compared to 
the old one.


http://gerrit.cloudera.org:8080/#/c/11160/3/be/src/exprs/math-functions-ir.cc@524
PS3, Line 524: range_v
> static_cast. I think we should also explicitly wrap the return value in a B
Done


http://gerrit.cloudera.org:8080/#/c/11160/4/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/11160/4/be/src/exprs/math-functions-ir.cc@506
PS4, Line 506: rtainly constant e
> I am not sure what happens here if min_range_val is INT_MIN - according to
It is undefined behavior, but what usually happens is that abs(INT_MIN) returns 
INT_MIN. When INT_MIN is casted to unsigned it will be interpreted as INT_MAX + 
1.

E.g. if we have only one byte, then min is -128, in 2's compelement it is 
10000000, and if we interpret it as unsigned, then it means 128.

So it should work just right, but yeah it is a bit worrying that it is 
undefined behavior.


http://gerrit.cloudera.org:8080/#/c/11160/4/be/src/exprs/math-functions-ir.cc@514
PS4, Line 514:
             :   auto MultiplicationOverflows = [](ActualType lhs, ActualType 
rhs) {
             :     DCHECK(lhs > 0 && rhs > 0);
> optional performance idea: I think that it should be faster to create a bit
I am not sure I can follow. Could you elaborate on that? Maybe with a 1-byte 
example?



--
To view, visit http://gerrit.cloudera.org:8080/11160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I728cc05d9aef8d081e6f2da66146f6d7b75dbb57
Gerrit-Change-Number: 11160
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 14 Aug 2018 14:24:23 +0000
Gerrit-HasComments: Yes

Reply via email to