Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10937 )
Change subject: IMPALA-3825: [WIP]Distributed RuntimeFiltering Work in progress patch for Distributing runtime filtering aggregation. ...................................................................... Patch Set 1: (9 comments) I added some high level comments. After addressing these high level comments, you can clean up the code so we can do a second round of reviews. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/common/global-flags.cc@132 PS1, Line 132: DEFINE_bool(enable_distributed_filter_aggregation, true, : "Enables aggregation of filters" : "in a distributed manner, set to false to revert to coordinator based aggregation"); After looking through the code, it seems better to avoid having this flag. Let's get rid of coordinator-filter-state.* completely, and go with the new approach you've done. When we need to compare the 2 approaches for benchmarking purposes, we can use 2 branches. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@93 PS1, Line 93: x.second.ToThrift(&tfs); Move inside if(), since it's only needed in that case. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@94 PS1, Line 94: AggregatorRoutingTable::const_iterator it = aggregator_routing_table.find(x.first); : if (it != aggregator_routing_table.end()) { : tfs.__set_aggregator_address(it->second); : rpc_params->filter_routing_table.insert( : std::pair<int32_t, TFilterState>(x.first, tfs)); : } Wouldn't all filters that are present in 'filter_routing_table' be present in 'aggregator_routing_table' ? If that's the case, we're doing the same work multiple times here (once per BackendState), and we would probably be better off not having the 'aggregator_routing_table' at all and just set the aggregator address in 'FilterState' as part of InitFilterRoutingTable(). http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@101 PS1, Line 101: We also don't need to send the filter routing table to the nodes that don't aggregate or don't produce right? If it's not simple to do, you can leave a TODO for now. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator.cc@284 PS1, Line 284: We can set the aggregator address in the FilterState here, and not have the aggregator_routing_table. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-exec-mgr.h@22 PS1, Line 22: #include <memory> None of these new #includes should be needed since you haven't added any new code to this file. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-state.cc@559 PS1, Line 559: Add a TODO here to call PublishFilterFromAggregator() asynchronously using a thread pool. http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/runtime-filter-bank.cc@205 PS1, Line 205: SendFilterToAggregator( What if the aggregator and the producer are on the same node? We can consider short-circuiting that path. Add a TODO for that. http://gerrit.cloudera.org:8080/#/c/10937/1/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/10937/1/bin/bootstrap_toolchain.py@430 PS1, Line 430: "flatbuffers", "gcc", "gflags", "glog", I'm guessing this was a mistake? -- To view, visit http://gerrit.cloudera.org:8080/10937 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0289ee39787d8e9c13a16c3d7d64f471bc554536 Gerrit-Change-Number: 10937 Gerrit-PatchSet: 1 Gerrit-Owner: Rahul Shivu Mahadev <rahul.maha...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Comment-Date: Thu, 19 Jul 2018 16:42:32 +0000 Gerrit-HasComments: Yes