Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13882 )
Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC ...................................................................... Patch Set 24: (17 comments) Sorry for the delays in reviews and thanks for the detailed replies. Please see some more comments below. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator-filter-state.h File be/src/runtime/coordinator-filter-state.h: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator-filter-state.h@124 PS24, Line 124: string Please also document what this field stands for. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1103 PS24, Line 1103: // Workaround for fact that parameters are const& for Protobuf RPCs. : BloomFilterPB* non_const_filter = &const_cast<BloomFilterPB&>(params->bloom_filter()); I don't think this makes sense anymore with the implementation using protobuf. The reason is that the old Thrift implementation embedded the entire string directory in it but in this new implementation, it's stored in the sidecar so we aren't saving much if anything by doing the swap trick below. We can just initialize the BloomFilterPB by copying if necessary. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1108 PS24, Line 1108: if (!bloom_filter_.has_log_bufferpool_space()) { May be I am missing the details here but why is it not a no-op if the incoming filter is always false ? http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1110 PS24, Line 1110: bloom_filter_.Swap(non_const_filter); See comments above. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1132 PS24, Line 1132: bloom_filter_.Swap(non_const_filter); See comments above. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1140 PS24, Line 1140: sidecar_slice.ToString(), Seems less expensive if we can pass the slice directly. This avoids doing a copy from the slice's buffer into the string argument. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1141 PS24, Line 1141: bloom_filter_directory_ May need to do some const casting but please consider using: bloom_filter_directory_.data() http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/query-state.h@202 PS24, Line 202: const PublishFilterParamsPB* Why not keep it as const PublishFilterParamsPB& ? http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@174 PS24, Line 174: UpdateFilterParamsPB* params = obj_pool_.Add(new UpdateFilterParamsPB); > Thank you for pointing this out! You are correct. 'params' can be freed onc Or you can just allocate it on the stack as it isn't too large: UpdateFilterParamsPB params; http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@175 PS24, Line 175: UpdateFilterResultPB* res = obj_pool_.Add(new UpdateFilterResultPB); : RpcController* controller = obj_pool_.Add(new RpcController); > Thank you very much for the suggestion! Don't think it's safe to free the memory in the completion callback as the KRPC stack may still have reference to it. One way to share the parameters is to ensure there is one outstanding RPC per controller object but it's probably not worth the trouble so I am fine keep them as-is. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@196 PS24, Line 196: Couldn't send filter to coordinator: " Substitute("Failed to get proxy to coordinator $0: $1", host_address, get_proxy_status.msg().msg()); http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@240 PS24, Line 240: LOG(WARNING) << "Cannot get inbound sidecar: LOG(ERROR) << "Failed to get bloom filter sidecar" << ... http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@320 PS24, Line 320: // Wait for all inflight rpcs to complete before closing the filters. : { : std::unique_lock<SpinLock> l1(num_inflight_rpcs_lock_); : while (num_inflight_rpcs_ > 0) { : krpcs_done_cv_.wait(l1); : } : } : : lock_guard<mutex> l2(runtime_filter_lock_); : closed_ = true; > That 'propagation stage' above should be 'aggregation stage' instead. Sorry Actually, it's possible for RuntimeFilterBank::UpdateFilterFromLocal() and FragmentInstanceState::Close() to be called from different thread contexts as the existing code still has async build thread (https://github.com/apache/impala/blob/master/be/src/exec/blocking-join-node.cc#L212-L215). However, we will always wait for that build thread to complete before returning (https://github.com/apache/impala/blob/master/be/src/exec/blocking-join-node.cc#L228-L230) so the build thread should have exited by the time we call Close() in the fragment instance thread so it could't have issued an RPC again when we reach this function. May be worth adding a comment about it: The async build thread should have exited by the time Close() is called so there shouldn't be any new RPCs being issued when this function is called. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/data-stream-service.cc File be/src/service/data-stream-service.cc: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/data-stream-service.cc@119 PS24, Line 119: req Per comments elsewhere, we usually pass const value by reference *req. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/impala-server.h@348 PS24, Line 348: const UpdateFilterParamsPB* Use const UpdateFilterParamsPB& params. See comments in data-stream-service.cc http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/util/bloom-filter.cc@218 PS24, Line 218: const string& directory_in Please see comments in coordinator.cc. Please consider passing in the slide directly instead. http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/util/bloom-filter.cc@219 PS24, Line 219: string* This may be char* instead. -- To view, visit http://gerrit.cloudera.org:8080/13882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a Gerrit-Change-Number: 13882 Gerrit-PatchSet: 24 Gerrit-Owner: Fang-Yu Rao <fangyu....@cloudera.com> Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Thu, 10 Oct 2019 00:30:55 +0000 Gerrit-HasComments: Yes