Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
......................................................................


Patch Set 6:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG@9
PS5, Line 9: This patch implements min-max filters for runtime filters. Each
> I had a general terminology question that popped up as I read through. Does
My intention (which I probably wasn't super consistent about, I'll try to clean 
it up) was that a "runtime filter" corresponds to a single filter id and might 
contain and bloom and/or min-max (eventually and/or in-list) filters.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@80
PS5, Line 80: /// FilterContext contains all metadata for a single runtime 
filter, and allows the filter
> This seems to be an example of a place where "runtime filter" could potenti
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@107
PS5, Line 107:
> Can you add a blank line between methods?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@422
PS5, Line 422:         col_type, filter_expr->type().ToIR(codegen), 
"expr_type_arg");
> Why is decimal special here?
Decimal isn't supported for Kudu, so I didn't implement min-max filters for it 
since they wouldn't be testable.

I special cased it here because you might have a decimal bloom filter and you 
would hit this code path, but a better solution is to just check if the filter 
desc says to build a min-max filter, which will never be the case if the type 
is decimal.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@434
PS5, Line 434:     // Call Insert() on the bloom filter.
> Can we make this a lookup table? Seems like the only difference between the
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc@401
PS5, Line 401:     if (filter->BloomAlwaysTrue()
> Can we phrase the method in a way that doesn't depend on the assumption tha
I changed this to highlight the fact that the hdfs scanner only deals with 
bloom filters for now.

The runtime filter here might have a min-max filter, if it also has a kudu scan 
target, and since min-max filters don't really have a concept of "AlwaysTrue" 
we would have to say the filter isn't always true.

That would make sense if we were actually applying the min-max filter here, but 
its sort of weird since we're only really applying the bloom filter.

It can then be changed to not be bloom specific when we add consideration of 
min-max filters here.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@174
PS5, Line 174:     for (const FilterContext& ctx : scan_node_->filter_ctxs_) {
> How important is this and is it blocked by anything else? If it's impactful
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@175
PS5, Line 175: filt
> nullptr
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@177
PS5, Line 177: se()) {
> dynamic_cast seems unnecessary since the type should always be correct - ca
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@179
PS5, Line 179: eCurre
> const string&?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc@231
PS5, Line 231:       int64_t ts_micros;
> This timestamp code is a bit subtle. Is there a sane way to combine the con
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc@548
PS5, Line 548: IMER(repartition_ti
> Is this meant to double-count the filters if there is a bloom and min-max f
You're right, this needs to be counting "runtime filters" (corresponding to a 
single filter id/FilterContext).

This gets a little confusing, since you might have a bloom filter that gets 
disabled but its not counted as disabled here because the min-max filter isn't 
disabled, but it'll make a lot more sense in the long run goal here of having 
all scanners support all filter types (and of course the fact that the bloom 
filter is disabled in that situation will still show up in the runtime profile 
info for the hdfs scan node).


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/coordinator-filter-state.h@92
PS5, Line 92: all
> all
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/coordinator-filter-state.h@131
PS5, Line 131:   bool bloom_disabled_;
> Blank line?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.h@74
PS5, Line 74:
> nit: quote 'bloom_filter' and 'min_max_filter' for consistency.
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.cc@202
PS5, Line 202:         MinMaxFilter::Create(params.min_max_filter, 
it->second->type(), &obj_pool_);
> In other places we've designed APIs so that the ObjectPool is passed into C
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter-bank.cc@230
PS5, Line 230: null
> nullptr?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@46
PS5, Line 46:       : bloom_filter_(nullptr), min_max_filter_(nullptr), 
filter_desc_(filter),
> Not your code, but seems like this should be in the initialiser list.
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@94
PS5, Line 94:   /// Class name in LLVM IR.
> The thread safety around these members is really confusing. I have a few su
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@99
PS5, Line 99:   /// it does not filter any rows, either because it was not 
created
> I think this should be an AtomicPtr
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@102
PS5, Line 102:   AtomicPtr<BloomFilter> bloom_filter_;
> I think this should be an AtomicPtr
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@108
PS5, Line 108:   const TRuntimeFilterDesc& filter_desc_;
> This doesn't change so let's make it const.
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@110
PS5, Line 110:   /// Time, in ms, that the filter was registered.
> I think this should be an AtomicInt.
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.h@114
PS5, Line 114:   AtomicInt64 arrival_time_;
> This doesn't change so let's make it const.
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.inline.h
File be/src/runtime/runtime-filter.inline.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.inline.h@41
PS5, Line 41:   DCHECK(bloom_filter_.Load() == nullptr && 
min_max_filter_.Load() == nullptr);
> nullptr?
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/runtime/runtime-filter.inline.h@42
PS5, Line 42:   bloom_filter_.Store(bloom_filter);
> This TODO is really confusing. I don't understand the reasoning. IMO we sho
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@35
PS5, Line 35: /// the appropriate type. Values can then be added using 
Insert(), and the min and max can
> It might be better to make these proper classes and hide the internals? Alt
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@38
PS5, Line 38: /// MinMaxFilters ignore NULL values, and so are only appropriate 
to use as a runtime
> It's worth documenting in the class comment what representation is used (I
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@91
PS5, Line 91:     virtual ~NAME##MinMaxFilter() {}                              
 \
> I find it helpful to add override annotations to overridden methods for doc
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@122
PS5, Line 122:     min_str = "";
> Is it valid to call these methods if no values have been added?
No. Noted in a comment in MinMaxFilter.


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.h@126
PS5, Line 126: e = true;
> I don't think adding IR_ALWAYS_INLINE to a virtual method will do anything
Done


http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/util/min-max-filter.cc@118
PS5, Line 118:   return PrimitiveType::TYPE_STRING;
> I don't think we should rely on the uniquifying numbers produced by LLVM, s
Agreed. I found the code necessary to do subclasses to be tricky/ugly, so 
instead I just defined each as a separate class and used macros instead of 
templates to remove the duplication.



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:38:48 +0000
Gerrit-HasComments: Yes

Reply via email to