Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13882 )

Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC
......................................................................


Patch Set 24:

(17 comments)

Sorry for the delays in reviews and thanks for the detailed replies. Please see 
some more comments below.

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator-filter-state.h@124
PS24, Line 124: string
Please also document what this field stands for.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1103
PS24, Line 1103:    // Workaround for fact that parameters are const& for 
Protobuf RPCs.
               :     BloomFilterPB* non_const_filter = 
&const_cast<BloomFilterPB&>(params->bloom_filter());
I don't think this makes sense anymore with the implementation using protobuf. 
The reason is that the old Thrift implementation embedded the entire string 
directory in it but in this new implementation, it's stored in the sidecar so 
we aren't saving much if anything by doing the swap trick below.

We can just initialize the BloomFilterPB by copying if necessary.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1108
PS24, Line 1108: if (!bloom_filter_.has_log_bufferpool_space()) {
May be I am missing the details here but why is it not a no-op if the incoming 
filter is always false ?


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1110
PS24, Line 1110: bloom_filter_.Swap(non_const_filter);
See comments above.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1132
PS24, Line 1132:  bloom_filter_.Swap(non_const_filter);
See comments above.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1140
PS24, Line 1140: sidecar_slice.ToString(),
Seems less expensive if we can pass the slice directly. This avoids doing a 
copy from the slice's buffer into the string argument.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/coordinator.cc@1141
PS24, Line 1141: bloom_filter_directory_
May need to do some const casting but please consider using:

  bloom_filter_directory_.data()


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/query-state.h@202
PS24, Line 202: const PublishFilterParamsPB*
Why not keep it as const PublishFilterParamsPB& ?


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@174
PS24, Line 174: UpdateFilterParamsPB* params = obj_pool_.Add(new 
UpdateFilterParamsPB);
> Thank you for pointing this out! You are correct. 'params' can be freed onc
Or you can just allocate it on the stack as it isn't too large:

   UpdateFilterParamsPB params;


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@175
PS24, Line 175:     UpdateFilterResultPB* res = obj_pool_.Add(new 
UpdateFilterResultPB);
              :     RpcController* controller = obj_pool_.Add(new 
RpcController);
> Thank you very much for the suggestion!
Don't think it's safe to free the memory in the completion callback as the KRPC 
stack may still have reference to it.

One way to share the parameters is to ensure there is one outstanding RPC per 
controller object but it's probably not worth the trouble so I am fine keep 
them as-is.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@196
PS24, Line 196: Couldn't send filter to coordinator: "
Substitute("Failed to get proxy to coordinator $0: $1", host_address, 
get_proxy_status.msg().msg());


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@240
PS24, Line 240: LOG(WARNING) << "Cannot get inbound sidecar:
LOG(ERROR) << "Failed to get bloom filter sidecar" << ...


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/runtime/runtime-filter-bank.cc@320
PS24, Line 320:   // Wait for all inflight rpcs to complete before closing the 
filters.
              :   {
              :     std::unique_lock<SpinLock> l1(num_inflight_rpcs_lock_);
              :     while (num_inflight_rpcs_ > 0) {
              :       krpcs_done_cv_.wait(l1);
              :     }
              :   }
              :
              :   lock_guard<mutex> l2(runtime_filter_lock_);
              :   closed_ = true;
> That 'propagation stage' above should be 'aggregation stage' instead. Sorry
Actually, it's possible for RuntimeFilterBank::UpdateFilterFromLocal() and 
FragmentInstanceState::Close() to be called from different thread contexts as 
the existing code still has async build thread 
(https://github.com/apache/impala/blob/master/be/src/exec/blocking-join-node.cc#L212-L215).

However, we will always wait for that build thread to complete before returning 
(https://github.com/apache/impala/blob/master/be/src/exec/blocking-join-node.cc#L228-L230)
 so the build thread should have exited by the time we call Close() in the 
fragment instance thread so it could't have issued an RPC again when we reach 
this function.

May be worth adding a comment about it:

The async build thread should have exited by the time Close() is called so 
there shouldn't be any new RPCs being issued when this function is called.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/data-stream-service.cc@119
PS24, Line 119: req
Per comments elsewhere, we usually pass const value by reference *req.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/service/impala-server.h@348
PS24, Line 348: const UpdateFilterParamsPB*
Use const UpdateFilterParamsPB& params. See comments in data-stream-service.cc


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/util/bloom-filter.cc@218
PS24, Line 218: const string& directory_in
Please see comments in coordinator.cc. Please consider passing in the slide 
directly instead.


http://gerrit.cloudera.org:8080/#/c/13882/24/be/src/util/bloom-filter.cc@219
PS24, Line 219: string*
This may be char* instead.



--
To view, visit http://gerrit.cloudera.org:8080/13882
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a
Gerrit-Change-Number: 13882
Gerrit-PatchSet: 24
Gerrit-Owner: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Oct 2019 00:30:55 +0000
Gerrit-HasComments: Yes

Reply via email to