[Impala-ASF-CR] IMPALA-6533: Add min-max filter for decimal types on kudu tables.

Thomas Marshall (Code Review) Mon, 07 Jan 2019 11:42:37 -0800

Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12113 )


Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu 
tables.
......................................................................


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h@199
PS14, Line 199: value_
nit: ///
nit: 'value_' (i.e. quotes around variable names)

More importantly, I don't understand this comment, e.g. what is "the receiving
end" in this context?

I assume you're thinking in the context of this patch, that a 
DecimalMinMaxFilter will be reconstructed after being sent over the network 
using this function, but the function is more general so I think it makes more 
sense to have a more general comment.

It's also generally nice to specifically mention the use of the out parameter.

Maybe something like:
/// Store the binary representation of this DecimalValue in 'tvalue'.
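To illustrate the suggested comment style, here is a minimal C++ sketch of a function that writes a value's raw bytes (in native byte order) into an out parameter. 'DecimalValueSketch', 'ToBinary()' and 'FromBinary()' are hypothetical names, not the actual DecimalValue API, and std::string is only an assumed stand-in for whatever binary field the real function fills in.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a DecimalValue<T>; only what is needed to show the
// comment style is included.
template <typename T>
class DecimalValueSketch {
 public:
  explicit DecimalValueSketch(T value = 0) : value_(value) {}

  /// Store the binary representation of this DecimalValue in 'tvalue'.
  /// 'tvalue' is an out parameter; its previous contents are overwritten.
  void ToBinary(std::string* tvalue) const {
    tvalue->assign(reinterpret_cast<const char*>(&value_), sizeof(value_));
  }

  /// Reconstruct a value from bytes produced by ToBinary() and store it in
  /// 'out'. Returns false if 'bytes' does not have the expected size.
  static bool FromBinary(const std::string& bytes, DecimalValueSketch* out) {
    if (bytes.size() != sizeof(T)) return false;
    T v;
    std::memcpy(&v, bytes.data(), sizeof(T));
    *out = DecimalValueSketch(v);
    return true;
  }

  T value() const { return value_; }

 private:
  T value_;
};
```

The comment describes what the function does and what 'tvalue' receives, without assuming any particular caller (such as a filter being sent over the network).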


http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc@95
PS14, Line 95:   switch (size_) {
So this is pretty perf-sensitive code (executed once per row on the build side 
of joins), and since 'size_' is a constant we really don't want to be doing 
this switch here.

There are at least two ways to fix this:
- Use codegen, e.g. introduce a MinMaxFilter::CodegenInsert() which returns an
llvm::Function. For other types, this would just call
codegen->GetFunction(GetInsertIRFunctionType()), but for DecimalMinMaxFilter it
could replace the size with a constant (e.g. by having a function GetSize() here
and using ReplaceCallSites() on its call sites), then call this new function in
FilterContext::CodegenInsert().

- Use macros, e.g. have separate classes for Decimal4MinMaxFilter,
Decimal8MinMaxFilter, and Decimal16MinMaxFilter generated with macros just as
we do for NUMERIC_MIN_MAX_FILTER, and return an object of the appropriate type
from MinMaxFilter::Create().

I have a strong preference for the macros option, as codegen can be tricky and I
think using macros will be more efficient (e.g. because it doesn't add to
codegen time and it eliminates switching in other functions like
DecimalMinMaxFilter::Or()/ToThrift()).
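A rough C++ sketch of the macros option, assuming a simplified, hypothetical filter interface ('MinMaxFilterSketch', 'CreateDecimalFilter()' and the generated class names are illustrative only, not Impala's actual classes): each decimal storage width gets its own class, so Insert() works on a fixed type, and the byte size is examined once in the factory rather than once per row.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>

// GCC/Clang extension; '__extension__' silences pedantic warnings.
__extension__ typedef __int128 int128_t;

// Hypothetical, simplified filter interface (a real one would have more methods).
class MinMaxFilterSketch {
 public:
  virtual ~MinMaxFilterSketch() = default;
  // Called once per build-side row with a pointer to a value of the filter's width.
  virtual void Insert(const void* val) = 0;
  virtual int ByteSize() const = 0;
};

// Expands to one concrete filter class per storage type. Insert() contains no
// switch on the byte size because TYPE is fixed at compile time.
#define DECIMAL_MIN_MAX_FILTER(NAME, TYPE)                              \
  class Decimal##NAME##MinMaxFilterSketch : public MinMaxFilterSketch { \
   public:                                                              \
    void Insert(const void* val) override {                             \
      if (val == nullptr) return;                                       \
      const TYPE v = *static_cast<const TYPE*>(val);                    \
      if (!has_value_) {                                                \
        min_ = max_ = v;                                                \
        has_value_ = true;                                              \
      } else {                                                          \
        min_ = std::min(min_, v);                                       \
        max_ = std::max(max_, v);                                       \
      }                                                                 \
    }                                                                   \
    int ByteSize() const override { return static_cast<int>(sizeof(TYPE)); } \
    TYPE min() const { return min_; }                                   \
    TYPE max() const { return max_; }                                   \
                                                                        \
   private:                                                             \
    bool has_value_ = false;                                            \
    TYPE min_ = 0;                                                      \
    TYPE max_ = 0;                                                      \
  };

DECIMAL_MIN_MAX_FILTER(4, int32_t)
DECIMAL_MIN_MAX_FILTER(8, int64_t)
DECIMAL_MIN_MAX_FILTER(16, int128_t)
#undef DECIMAL_MIN_MAX_FILTER

// The byte size is inspected once, when the filter is created.
std::unique_ptr<MinMaxFilterSketch> CreateDecimalFilter(int byte_size) {
  switch (byte_size) {
    case 4: return std::make_unique<Decimal4MinMaxFilterSketch>();
    case 8: return std::make_unique<Decimal8MinMaxFilterSketch>();
    case 16: return std::make_unique<Decimal16MinMaxFilterSketch>();
    default: return nullptr;
  }
}
```

This sketch still dispatches Insert() through a virtual call; it only shows how the per-row switch on the size moves into a one-time choice in the factory. The same pattern would let Or() and ToThrift() operate on a fixed type as well.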


http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h@298
PS14, Line 298:   static int GetSize(int precision) {
I think that you can just use ColumnType::GetDecimalByteSize() instead of 
defining this function yourself.
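For reference, a sketch of the precision-to-byte-size mapping that a hand-written GetSize(precision) would duplicate, assuming Impala's usual decimal storage thresholds (precision up to 9 stored in 4 bytes, up to 18 in 8 bytes, otherwise 16 bytes). 'DecimalByteSizeSketch' and the constant names are hypothetical; the point of the comment is to call ColumnType::GetDecimalByteSize() rather than keep a second copy like this.

```cpp
// Hypothetical constants mirroring the assumed decimal storage thresholds.
constexpr int kMaxDecimal4Precision = 9;
constexpr int kMaxDecimal8Precision = 18;

// Maps a decimal precision (assumed to be in [1, 38]) to its storage size in bytes.
constexpr int DecimalByteSizeSketch(int precision) {
  return precision <= kMaxDecimal4Precision   ? 4
         : precision <= kMaxDecimal8Precision ? 8
                                              : 16;
}
```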


http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py@116
PS13, Line 116:     self.run_test_case('QueryTest/decimal_min_max_filters', vector)
> How do I find out the time it takes to run?
Hmm, alright. It doesn't make much sense to me to split the decimal-related
tests based on the join type (and my guess is that isn't really what Tim had in
mind anyway), so my recommendation is to move all of the decimal-specific test
cases into this decimal-specific file (e.g. test cases 5+), and leave the
decimal stuff that you added to non-decimal-specific test cases in
min_max_filters.test (e.g. the stuff in test case 1).

Then probably have test_decimal_min_max_filters run only in exhaustive, based on
the long running time combined with two assumptions: the tests in case 1,
together with various tests elsewhere that address different join types, cover
this well enough to catch most issues; and the specific things being tested by
your join-type-specific decimal tests are unlikely to break regularly.



--
To view, visit http://gerrit.cloudera.org:8080/12113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Gerrit-Change-Number: 12113
Gerrit-PatchSet: 14
Gerrit-Owner: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Jan 2019 19:42:27 +0000
Gerrit-HasComments: Yes
