Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 )
Change subject: IMPALA-4252: Min-max runtime filters for Kudu ...................................................................... Patch Set 6: (32 comments) http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG@9 PS5, Line 9: This patch implements min-max filters for runtime filters. Each > I had a general terminology question that popped up as I read through. Does My intention (which I probably wasn't super consistent about, I'll try to clean it up) was that a "runtime filter" corresponds to a single filter id and might contain and bloom and/or min-max (eventually and/or in-list) filters. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@80 PS5, Line 80: /// FilterContext contains all metadata for a single runtime filter, and allows the filter > This seems to be an example of a place where "runtime filter" could potenti Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@107 PS5, Line 107: > Can you add a blank line between methods? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@422 PS5, Line 422: col_type, filter_expr->type().ToIR(codegen), "expr_type_arg"); > Why is decimal special here? Decimal isn't supported for Kudu, so I didn't implement min-max filters for it since they wouldn't be testable. I special cased it here because you might have a decimal bloom filter and you would hit this code path, but a better solution is to just check if the filter desc says to build a min-max filter, which will never be the case if the type is decimal. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@434 PS5, Line 434: // Call Insert() on the bloom filter. > Can we make this a lookup table? Seems like the only difference between the Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc@401 PS5, Line 401: if (filter->BloomAlwaysTrue() > Can we phrase the method in a way that doesn't depend on the assumption tha I changed this to highlight the fact that the hdfs scanner only deals with bloom filters for now. The runtime filter here might have a min-max filter, if it also has a kudu scan target, and since min-max filters don't really have a concept of "AlwaysTrue" we would have to say the filter isn't always true. That would make sense if we were actually applying the min-max filter here, but its sort of weird since we're only really applying the bloom filter. It can then be changed to not be bloom specific when we add consideration of min-max filters here. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@174 PS5, Line 174: for (const FilterContext& ctx : scan_node_->filter_ctxs_) { > How important is this and is it blocked by anything else? If it's impactful Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@175 PS5, Line 175: filt > nullptr Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@177 PS5, Line 177: se()) { > dynamic_cast seems unnecessary since the type should always be correct - ca Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@179 PS5, Line 179: eCurre > const string&? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc@231 PS5, Line 231: int64_t ts_micros; > This timestamp code is a bit subtle. Is there a sane way to combine the con Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc@548 PS5, Line 548: IMER(repartition_ti > Is this meant to double-count the filters if there is a bloom and min-max f You're right, this needs to be counting "runtime filters" (corresponding to a single filter id/FilterContext). This gets a little confusing, since you might have a bloom filter that gets disabled but its not counted as disabled here because the min-max filter isn't disabled, but it'll make a lot more sense in the long run goal here of having all scanners support all filter types (and of course the fact that the bloom filter is disabled in that situation will still show up in the runtime profile info for the hdfs scan node). http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/coordinator-filter-state.h File be/src/runtime/coordinator-filter-state.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/coordinator-filter-state.h@92 PS5, Line 92: all > all Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/coordinator-filter-state.h@131 PS5, Line 131: bool bloom_disabled_; > Blank line? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.h@74 PS5, Line 74: > nit: quote 'bloom_filter' and 'min_max_filter' for consistency. Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.cc@202 PS5, Line 202: MinMaxFilter::Create(params.min_max_filter, it->second->type(), &obj_pool_); > In other places we've designed APIs so that the ObjectPool is passed into C Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.cc@230 PS5, Line 230: null > nullptr? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@46 PS5, Line 46: : bloom_filter_(nullptr), min_max_filter_(nullptr), filter_desc_(filter), > Not your code, but seems like this should be in the initialiser list. Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@94 PS5, Line 94: /// Class name in LLVM IR. > The thread safety around these members is really confusing. I have a few su Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@99 PS5, Line 99: /// it does not filter any rows, either because it was not created > I think this should be an AtomicPtr Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@102 PS5, Line 102: AtomicPtr<BloomFilter> bloom_filter_; > I think this should be an AtomicPtr Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@108 PS5, Line 108: const TRuntimeFilterDesc& filter_desc_; > This doesn't change so let's make it const. Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@110 PS5, Line 110: /// Time, in ms, that the filter was registered. > I think this should be an AtomicInt. Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@114 PS5, Line 114: AtomicInt64 arrival_time_; > This doesn't change so let's make it const. Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.inline.h File be/src/runtime/runtime-filter.inline.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.inline.h@41 PS5, Line 41: DCHECK(bloom_filter_.Load() == nullptr && min_max_filter_.Load() == nullptr); > nullptr? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.inline.h@42 PS5, Line 42: bloom_filter_.Store(bloom_filter); > This TODO is really confusing. I don't understand the reasoning. IMO we sho Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@35 PS5, Line 35: /// the appropriate type. Values can then be added using Insert(), and the min and max can > It might be better to make these proper classes and hide the internals? Alt Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@38 PS5, Line 38: /// MinMaxFilters ignore NULL values, and so are only appropriate to use as a runtime > It's worth documenting in the class comment what representation is used (I Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@91 PS5, Line 91: virtual ~NAME##MinMaxFilter() {} \ > I find it helpful to add override annotations to overridden methods for doc Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@122 PS5, Line 122: min_str = ""; > Is it valid to call these methods if no values have been added? No. Noted in a comment in MinMaxFilter. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@126 PS5, Line 126: e = true; > I don't think adding IR_ALWAYS_INLINE to a virtual method will do anything Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.cc@118 PS5, Line 118: return PrimitiveType::TYPE_STRING; > I don't think we should rely on the uniquifying numbers produced by LLVM, s Agreed. I found the code necessary to do subclasses to be tricky/ugly, so instead I just defined each as a separate class and used macros instead of templates to remove the duplication. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 06 Oct 2017 17:38:48 +0000 Gerrit-HasComments: Yes