Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12113 )
Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu tables.
......................................................................


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h@199
PS14, Line 199: value_
nit: ///
nit: 'value_' (i.e. quotes around variable names)

More importantly, I don't understand this comment - e.g. what is "the receiving end" in this context? I assume you're thinking in the context of this patch, that a DecimalMinMaxFilter will be reconstructed after being sent over the network using this function, but the function is more general so I think it makes more sense to have a more general comment.

It's also generally nice to specifically mention the use of the out parameter.

Maybe something like:
/// Store the binary representation of this DecimalValue in 'tvalue'.


http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc@95
PS14, Line 95: switch (size_) {
So this is pretty perf-sensitive code (executed once per row on the build side of joins), and since 'size_' is a constant we really don't want to be doing this switch here.

There are at least two ways to fix this:
- Use codegen, e.g. introduce a MinMaxFilter::CodegenInsert() which returns an llvm::Function. For other types, this would just call codegen->GetFunction(GetInsertIRFunctionType()) but for DecimalMinMaxFilter it could replace this with a constant (e.g. by having a function GetSize() here and using ReplaceCallSites()), then call this new function in FilterContext::CodegenInsert().
- Use macros, e.g.
have separate classes for Decimal4MinMaxFilter, Decimal8MinMaxFilter, and Decimal16MinMaxFilter generated with macros just as we do for NUMERIC_MIN_MAX_FILTER, and return an object of the appropriate type from MinMaxFilter::Create().

I have a strong preference for the macros option as codegen can be tricky and I think using macros will be more efficient (e.g. because it doesn't add to codegen time, and it eliminates switching in other functions like DecimalMinMaxFilter::Or()/ToThrift()).


http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h@298
PS14, Line 298: static int GetSize(int precision) {
I think that you can just use ColumnType::GetDecimalByteSize() instead of defining this function yourself.


http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py@116
PS13, Line 116: self.run_test_case('QueryTest/decimal_min_max_filters', vector)
> How do I find out the time it takes to run?
Hmm, alright. It doesn't make much sense to me to split the decimal-related tests based on the join type (and my guess is that isn't really what Tim had in mind anyways), so my recommendation is to move all of the decimal-specific test cases into this decimal-specific file (e.g. test cases 5+), and leave the decimal stuff that you added to non-decimal-specific test cases in min_max_filters.test (e.g. the stuff in test case 1).
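
To make the macros option from the min-max-filter-ir.cc comment above a bit more concrete, here is a rough sketch of the shape I have in mind. It is not code from the patch: every name ending in 'Sketch' is made up for illustration, the base class is only a stand-in for MinMaxFilter, the 9- and 18-digit thresholds are my assumption about what ColumnType::GetDecimalByteSize() returns, and the 16-byte case assumes a compiler with __int128 (gcc or clang).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Assumption: gcc/clang 128-bit integer extension stands in for the 16-byte decimal.
__extension__ typedef __int128 Int128Sketch;

// Hypothetical stand-in for the MinMaxFilter base class.
class MinMaxFilterSketch {
 public:
  virtual ~MinMaxFilterSketch() = default;
  virtual void Insert(const void* val) = 0;
  virtual int ByteSize() const = 0;
};

// Generates one filter class per storage width, so the width is a compile-time
// property of the class and Insert() contains no switch on a size member.
#define DECIMAL_MIN_MAX_FILTER_SKETCH(NAME, TYPE)                          \
  class NAME##MinMaxFilterSketch : public MinMaxFilterSketch {             \
   public:                                                                 \
    void Insert(const void* val) override {                                \
      if (val == nullptr) return;                                          \
      const TYPE v = *static_cast<const TYPE*>(val);                       \
      if (empty_ || v < min_) min_ = v;                                    \
      if (empty_ || v > max_) max_ = v;                                    \
      empty_ = false;                                                      \
    }                                                                      \
    int ByteSize() const override { return static_cast<int>(sizeof(TYPE)); } \
    bool empty() const { return empty_; }                                  \
    TYPE min() const { return min_; }                                      \
    TYPE max() const { return max_; }                                      \
   private:                                                                \
    TYPE min_ = 0;                                                         \
    TYPE max_ = 0;                                                         \
    bool empty_ = true;                                                    \
  }

DECIMAL_MIN_MAX_FILTER_SKETCH(Decimal4, int32_t);
DECIMAL_MIN_MAX_FILTER_SKETCH(Decimal8, int64_t);
DECIMAL_MIN_MAX_FILTER_SKETCH(Decimal16, Int128Sketch);

// Assumed mapping from decimal precision to storage bytes (4, 8 or 16).
inline int DecimalByteSizeSketch(int precision) {
  assert(precision >= 1 && precision <= 38);
  if (precision <= 9) return 4;
  if (precision <= 18) return 8;
  return 16;
}

// The only switch on the size lives here and runs once per filter, not once per row.
inline std::unique_ptr<MinMaxFilterSketch> CreateDecimalFilterSketch(int precision) {
  switch (DecimalByteSizeSketch(precision)) {
    case 4:
      return std::unique_ptr<MinMaxFilterSketch>(new Decimal4MinMaxFilterSketch());
    case 8:
      return std::unique_ptr<MinMaxFilterSketch>(new Decimal8MinMaxFilterSketch());
    default:
      return std::unique_ptr<MinMaxFilterSketch>(new Decimal16MinMaxFilterSketch());
  }
}
```

The point of the sketch is only where the size decision happens: in the factory, at filter creation time. The virtual Insert() is there to keep the example short; how the real filter's insert function gets selected is a separate question that this sketch doesn't try to answer.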

Then probably have test_decimal_min_max_filters run only in exhaustive. The running time is long, and my assumption is that the tests in case 1, combined with various tests elsewhere that address different join types, cover most of this well enough to catch most issues; the specific things being tested by your join-type-specific decimal tests are unlikely to break regularly.


--
To view, visit http://gerrit.cloudera.org:8080/12113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Gerrit-Change-Number: 12113
Gerrit-PatchSet: 14
Gerrit-Owner: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Jan 2019 19:42:27 +0000
Gerrit-HasComments: Yes