Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10937 )

Change subject: IMPALA-3825: [WIP]Distributed RuntimeFiltering Work in progress 
patch for Distributing runtime filtering aggregation.
......................................................................


Patch Set 1:

(9 comments)

I added some high level comments. After addressing these high level comments, 
you can clean up the code so we can do a second round of reviews.

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/common/global-flags.cc@132
PS1, Line 132: DEFINE_bool(enable_distributed_filter_aggregation, true,
             :     "Enables aggregation of filters"
             :     "in a distributed manner, set to false to revert to 
coordinator based aggregation");
After looking through the code, it seems better to avoid having this flag. 
Let's get rid of coordinator-filter-state.* completely, and go with the new 
approach you've done. When we need to compare the 2 approaches for benchmarking 
purposes, we can use 2 branches.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@93
PS1, Line 93: x.second.ToThrift(&tfs);
Move inside if(), since it's only needed in that case.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@94
PS1, Line 94:     AggregatorRoutingTable::const_iterator it = 
aggregator_routing_table.find(x.first);
            :     if (it != aggregator_routing_table.end()) {
            :       tfs.__set_aggregator_address(it->second);
            :       rpc_params->filter_routing_table.insert(
            :           std::pair<int32_t, TFilterState>(x.first, tfs));
            :     }
Wouldn't all filters that are present in 'filter_routing_table' be present in 
'aggregator_routing_table' ?

If that's the case, we're doing the same work multiple times here (once per 
BackendState), and we would probably be better off not having the 
'aggregator_routing_table' at all and just set the aggregator address in 
'FilterState' as part of InitFilterRoutingTable().


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@101
PS1, Line 101:
We also don't need to send the filter routing table to the nodes that don't 
aggregate or don't produce right? If it's not simple to do, you can leave a 
TODO for now.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator.cc@284
PS1, Line 284:
We can set the aggregator address in the FilterState here, and not have the 
aggregator_routing_table.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-exec-mgr.h@22
PS1, Line 22: #include <memory>
None of these new #includes should be needed since you haven't added any new 
code to this file.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-state.cc@559
PS1, Line 559:
Add a TODO here to call PublishFilterFromAggregator() asynchronously using a 
thread pool.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/runtime-filter-bank.cc@205
PS1, Line 205:       SendFilterToAggregator(
What if the aggregator and the producer are on the same node? We can consider 
short-circuiting that path. Add a TODO for that.


http://gerrit.cloudera.org:8080/#/c/10937/1/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/10937/1/bin/bootstrap_toolchain.py@430
PS1, Line 430: "flatbuffers", "gcc", "gflags", "glog",
I'm guessing this was a mistake?



--
To view, visit http://gerrit.cloudera.org:8080/10937
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0289ee39787d8e9c13a16c3d7d64f471bc554536
Gerrit-Change-Number: 10937
Gerrit-PatchSet: 1
Gerrit-Owner: Rahul Shivu Mahadev <rahul.maha...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Jul 2018 16:42:32 +0000
Gerrit-HasComments: Yes

Reply via email to