Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )
Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC ...................................................................... Patch Set 8: (16 comments) http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc@49 PS8, Line 49: const char* DataSink::ROOT_PARTITION_KEY = ""; > this should be const char* const -- ie it's a const pointer to const charac True. Same comment probably applies to other classes too (e.g. https://github.com/apache/impala/blob/master/be/src/exec/hash-table.h#L196) http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h File be/src/exec/hdfs-table-writer.h: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h@91 PS8, Line 91: const InsertStatsPB& stats() { return stats_; } > probably should be a const function as well? Done http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h@211 PS8, Line 211: Coordinator* coord_; /// Coordinator object that owns this BackendState > this went from const to non-const. does this mutate the coordinator that ow Indirectly. We are updating the coord_->dml_exec_state_(). The locking is done inside dml_exec_state_ itself so it should be thread safe. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@50 PS8, Line 50: : coord_(coord), > nit: you could just do coord_(DHECK_NOTNULL(coord)) here since DCHECK_NOTNU Done http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@262 PS8, Line 262: lock.unlock(); > I'm nervous about this optimization. Do we really have data that deserializ The reason for doing it here is to handle the case in which we can have 0 to # fragment instances (after IMPALA-4063) entries in "backend_exec_status. instance_exec_status". We can have 0 entries if say we failed to start a thread. So, while we can do it in the RPC handler, it's a bit awkward to handle the zero entry case. The new patch moves the profile sidecar idx into backend_exec_status from backend_exec_status.instance_exec_status. That allows us to keep the deserialization in the ControlService::ReportExecStatus(). In the patch for IMPALA-4063, we will just replace TRuntimeProfileTree with a list<TRuntimeProfileTree> instead. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@310 PS8, Line 310: coord_ > I feel that allowing for a non-const pointer to the Coordinator object to t Done http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@317 PS8, Line 317: MergeErrorMaps(instance_exec_status.error_log(), &error_log_); > This is less severe, but I guess that this is already wrong too. We potenti The new report has a monotonically increasing version number. This is a variant of IMPALA-7241. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@516 PS8, Line 516: lock_guard<SpinLock> l1(exec_summary->lock); > Are we violating any lock ordering here? Yes, I also looked and didn't see anything. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@318 PS8, Line 318: bool done > trying to understand the purpose of this parameter. I don't think this shou I agree in general. Let me add a TODO. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@331 PS8, Line 331: ThriftSerializer serializer(true); > Why construct this here and pass it into ConstructReport? It seems Construc The serialization output is owned by ThriftSerializer. My mistake to use a string below. I was using (buffer, len) earlier to avoid the extra copying. Also added comments about the lifetime of the buffer. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@359 PS8, Line 359: CHECK(sidecar_status.ok()) : << FromKuduStatus(sidecar_status, "Failed to add sidecar").GetDetail(); > is there any chance that the profile would extend past max RPC size? I gues Added a CHECK for it in ConstructReport(). http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@373 PS8, Line 373: if (i < 2) SleepForMs(FLAGS_report_status_retry_interval_ms); > worth a LOG(WARNING) here? We have some logging already below at line 385. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc@76 PS8, Line 76: > I feel that we can still keep the deserialization here and just check for t The complication comes from the fact the sidecar index is stashed inside instance_exec_status. The new patch has moved that index ReportExecStatusRequestPB which also makes it easier for IMPALA-4063 when there are multiple profiles. This allows us to handle the case where instance_exec_status_size() == 0 here. http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h File be/src/util/uid-util.h: http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h@61 PS8, Line 61: unique_id_pb->set_lo(t_unique_id.lo); > worth a DCHECK that t_unique_id.__isset_.lo and hi? Both lo and hi are required fields in TUniqueId. Updated UniqueIdPB to reflect that too. Don't recall why those fields were set to optional when it was ported to protobuf. http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@140 PS8, Line 140: This is sent only on completion of a fragment instance. > can you be more explicit and say that this is sent only when 'done' is true Done http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@145 PS8, Line 145: // Map of TErrorCode to ErrorLogEntryPB; New errors that have not been reported to : // the coordinator by this fragment instance. Not idempotent. The done flag helps : // prevent applying the same update twice. : map<int32, ErrorLogEntryPB> error_log = 7; > so this is only sent when 'done' is true? if not, then the done flag doesn' That's true. Guess we do need a version number of some sort as part of FragmentInstanceExecStatusPB -- To view, visit http://gerrit.cloudera.org:8080/10855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe Gerrit-Change-Number: 10855 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Thu, 09 Aug 2018 00:14:56 +0000 Gerrit-HasComments: Yes