Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 )
Change subject: IMPALA-4252: Min-max runtime filters for Kudu ...................................................................... Patch Set 6: (18 comments) http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG@15 PS6, Line 15: them into the Kudu client. Because the Kudu client doesn't provide a > Is there any documentation for Kudu about what ordering Kudu uses for compa I asked the Kudu team, and its not documented anywhere, but they confirmed they use the same ordering as Impala does. http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@106 PS6, Line 106: bloom_filter > has_bloom_filter() ? At callsites it reads like this should return a filter Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@109 PS6, Line 109: min_max_filter > has_min_max_filter? Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc@1220 PS6, Line 1220: MinMaxFilter::Or(src_type(), params.min_max_filter, min_max_filter_.get()); > I mentioned this in a later comment, but we should consider what happens if Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc@235 PS6, Line 235: MinMaxFilter* min_max_filter = MinMaxFilter::Create(type, &obj_pool_); > (nit) - could combine into one line. Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc File be/src/runtime/runtime-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc@27 PS6, Line 27: // to a) the atomicity of / pointer assignments and b) the x86 TSO memory model. > Comment not relevant any more since we're using actual atomics? Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc File be/src/util/min-max-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc@53 PS6, Line 53: min_str = string(value->ptr, value->len); > We should think through the behaviour with very large strings. It may make Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h@83 PS6, Line 83: #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE) \ > Yeah, the macro seems like it may be the least of the evils. Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@24 PS6, Line 24: " > #include "runtime/timestamp-value.inline.h" Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@77 PS6, Line 77: std::string NAME##MinMaxFilter::DebugString() const { \ > nit: don't need std:: prefix in .cc files. Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@139 PS6, Line 139: out->min.__set_string_val(std::min(in.min.string_val, out->min.string_val)); > I feel like we should use our StringValue::Compare() function instead of re Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291 PS6, Line 291: BigIntMinMaxFilter::Or(in, out); > Shouldn't this be calling the timestamp method? TColumnData doesn't have a timestamp field, so I convert timestamps with UtcToUnixTimeMicros and pass them around in thrift as longs. (noted in a comment now) Unless I'm missing a reason that comparing timestamps with way won't lead to the same results as comparing the unconverted TimestampValues. http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@327 PS6, Line 327: } > Timestamp? Same as above. http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@379 PS6, Line 379: if (slotRef == null || slotRef.getDesc().getColumn() == null > Are there planner tests for all these cases? Done http://gerrit.cloudera.org:8080/#/c/7793/6/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/6/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@30 PS6, Line 30: kudu > Can we make this something like table_format=kudu so that it's a bit more s Done http://gerrit.cloudera.org:8080/#/c/7793/6/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/7793/6/tests/query_test/test_runtime_filters.py@61 PS6, Line 61: lambda v: v.get_value('table_format').file_format not in ['hbase', 'kudu']) > nit: 4 character indents on line continuations (I guess this is preserved f Done http://gerrit.cloudera.org:8080/#/c/7793/6/tests/util/test_file_parser.py File tests/util/test_file_parser.py: http://gerrit.cloudera.org:8080/#/c/7793/6/tests/util/test_file_parser.py@252 PS6, Line 252: allowed_formats = ['kudu'] > Thanks for adding this validation, hopefully will lead people in the right Done http://gerrit.cloudera.org:8080/#/c/7793/6/tests/util/test_file_parser.py@254 PS6, Line 254: hint > hint? Isn't it a table format? Done -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 18 Oct 2017 19:08:39 +0000 Gerrit-HasComments: Yes