Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use 
KRPC
......................................................................


Patch Set 8:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/data-sink.cc@49
PS8, Line 49: const char* DataSink::ROOT_PARTITION_KEY = "";
> this should be const char* const -- ie it's a const pointer to const charac
True. Same comment probably applies to other classes too (e.g. 
https://github.com/apache/impala/blob/master/be/src/exec/hash-table.h#L196)


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h
File be/src/exec/hdfs-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/exec/hdfs-table-writer.h@91
PS8, Line 91:   const InsertStatsPB& stats() { return stats_; }
> probably should be a const function as well?
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.h@211
PS8, Line 211:   Coordinator* coord_; /// Coordinator object that owns this 
BackendState
> this went from const to non-const. does this mutate the coordinator that ow
Indirectly. We are updating the coord_->dml_exec_state_(). The locking is done 
inside dml_exec_state_ itself so it should be thread safe.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@50
PS8, Line 50:   : coord_(coord),
> nit: you could just do coord_(DHECK_NOTNULL(coord)) here since DCHECK_NOTNU
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@262
PS8, Line 262:     lock.unlock();
> I'm nervous about this optimization. Do we really have data that deserializ
The reason for doing it here is to handle the case in which we can have 0 to # 
fragment instances (after IMPALA-4063) entries in "backend_exec_status. 
instance_exec_status". We can have 0 entries if say we failed to start a 
thread. So, while we can do it in the RPC handler, it's a bit awkward to handle 
the zero entry case.

The new patch moves the profile sidecar idx into backend_exec_status from 
backend_exec_status.instance_exec_status. That allows us to keep the 
deserialization in the ControlService::ReportExecStatus().

In the patch for IMPALA-4063, we will just replace TRuntimeProfileTree with a 
list<TRuntimeProfileTree> instead.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@310
PS8, Line 310: coord_
> I feel that allowing for a non-const pointer to the Coordinator object to t
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@317
PS8, Line 317: MergeErrorMaps(instance_exec_status.error_log(), &error_log_);
> This is less severe, but I guess that this is already wrong too. We potenti
The new report has a monotonically increasing version number. This is a variant 
of IMPALA-7241.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@516
PS8, Line 516: lock_guard<SpinLock> l1(exec_summary->lock);
> Are we violating any lock ordering here?
Yes, I also looked and didn't see anything.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@318
PS8, Line 318: bool done
> trying to understand the purpose of this parameter. I don't think this shou
I agree in general. Let me add a TODO.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@331
PS8, Line 331:   ThriftSerializer serializer(true);
> Why construct this here and pass it into ConstructReport? It seems Construc
The serialization output is owned by ThriftSerializer. My mistake to use a 
string below. I was using (buffer, len) earlier to avoid the extra copying. 
Also added comments about the lifetime of the buffer.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@359
PS8, Line 359:       CHECK(sidecar_status.ok())
             :           << FromKuduStatus(sidecar_status, "Failed to add 
sidecar").GetDetail();
> is there any chance that the profile would extend past max RPC size? I gues
Added a CHECK for it in ConstructReport().


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/query-state.cc@373
PS8, Line 373:     if (i < 2) SleepForMs(FLAGS_report_status_retry_interval_ms);
> worth a LOG(WARNING) here?
We have some logging already below at line 385.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc@76
PS8, Line 76:
> I feel that we can still keep the deserialization here and just check for t
The complication comes from the fact the sidecar index is stashed inside 
instance_exec_status. The new patch has moved that index 
ReportExecStatusRequestPB which also makes it easier for IMPALA-4063 when there 
are multiple profiles. This allows us to handle the case where 
instance_exec_status_size() == 0 here.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h
File be/src/util/uid-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/util/uid-util.h@61
PS8, Line 61:   unique_id_pb->set_lo(t_unique_id.lo);
> worth a DCHECK that t_unique_id.__isset_.lo and hi?
Both lo and hi are required fields in TUniqueId. Updated UniqueIdPB to reflect 
that too. Don't recall why those fields were set to optional when it was ported 
to protobuf.


http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@140
PS8, Line 140: This is sent only on completion of a fragment instance.
> can you be more explicit and say that this is sent only when 'done' is true
Done


http://gerrit.cloudera.org:8080/#/c/10855/8/common/protobuf/control_service.proto@145
PS8, Line 145:   // Map of TErrorCode to ErrorLogEntryPB; New errors that have 
not been reported to
             :   // the coordinator by this fragment instance. Not idempotent. 
The done flag helps
             :   // prevent applying the same update twice.
             :   map<int32, ErrorLogEntryPB> error_log = 7;
> so this is only sent when 'done' is true? if not, then the done flag doesn'
That's true. Guess we do need a version number of some sort as part of 
FragmentInstanceExecStatusPB



--
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:14:56 +0000
Gerrit-HasComments: Yes

Reply via email to