[Impala-ASF-CR] Revert "IMPALA-8146: Remove make {debug,release,asan}.sh"
Michael Ho has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12335 ) Change subject: Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh" .. Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh" This reverts commit e8ea4b0525fb3223da9291775eff4b7fbd15c547. Apparently some users of Impala have some workflow which rely on these scripts. Reverting this change to unblock those users. Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1 Reviewed-on: http://gerrit.cloudera.org:8080/12335 Reviewed-by: Philip Zeyliger Tested-by: Michael Ho --- A bin/make_asan.sh A bin/make_debug.sh A bin/make_release.sh 3 files changed, 60 insertions(+), 0 deletions(-) Approvals: Philip Zeyliger: Looks good to me, approved Michael Ho: Verified -- To view, visit http://gerrit.cloudera.org:8080/12335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1 Gerrit-Change-Number: 12335 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Revert "IMPALA-8146: Remove make {debug,release,asan}.sh"
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12335 ) Change subject: Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh" .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1 Gerrit-Change-Number: 12335 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 01 Feb 2019 19:31:33 + Gerrit-HasComments: No
[Impala-ASF-CR] Revert "IMPALA-8146: Remove make {debug,release,asan}.sh"
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12335 Change subject: Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh" .. Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh" This reverts commit e8ea4b0525fb3223da9291775eff4b7fbd15c547. Apparently some users of Impala have some workflow which rely on these scripts. Reverting this change to unblock those users. Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1 --- A bin/make_asan.sh A bin/make_debug.sh A bin/make_release.sh 3 files changed, 60 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/12335/1 -- To view, visit http://gerrit.cloudera.org:8080/12335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1 Gerrit-Change-Number: 12335 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h@137 PS2, Line 137: boost::lock_guard l(lock_); In theory, this doesn't need any locking as read / write of a 64-bit word should be atomic as long as its' 64-bit aligned. http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h@153 PS2, Line 153: /// Returns a timestamp for use in tracking arrival of status reports. May help to state explicitly that this uses the monotonic time instead of Unix time. http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.cc@89 PS2, Line 89: last_report_time_ms_ = GenerateReportTimestamp(); Should this clock not be started until we have issued all the initial ExecFInstance() RPC ? I can see there may be cases in which starting a query in a large cluster can take a bit of time so we may be more prone to false positive. http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator.cc@747 PS2, Line 747: for (BackendState* backend_state : backend_states_) { Is there any race with the init code which populates backend_states_ ? Shouldn't the detection be skipped until the backend_states_ be fully populated ? Does this handle the case in which a backend is done so no more new report will be received ? http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/service/impala-server.cc@215 PS2, Line 215: (max_report_lag_s, 30, Shouldn't the lag be dependent on the reporting interval ? Ideally, we should mark the query as hung after x * reporting interval has elapsed without receiving a report. In addition, we need to handle the case of periodic reporting being disabled by the user. In that case, we probably need to disable this detection. -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 31 Jan 2019 20:18:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 4: (9 comments) Looking good. Some more minor comments. http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/coordinator-backend-state.cc@344 PS4, Line 344: for (const auto& stateful_report : instance_exec_status.stateful_report()) { DCHECK_LE(stateful_report.report_seq_no(), report_seq_no()); http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.h@170 PS4, Line 170: recieved typo http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.h@171 PS4, Line 171: stateful_reports_ nit: prev_stateful_reports_; http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.cc@253 PS4, Line 253: if (final_report_saved_) { Please see comments below about some potential simplification we can do. http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.cc@279 PS4, Line 279: {stateful_reports_.begin(), stateful_reports_.end()}; nit: indent 4 http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/fragment-instance-state.cc@298 PS4, Line 298: if (instance_exec_status.done()) { As discussed in person, this could have been simpler by not storing the final profile here. We just have to keep track of the fact that we don't need to bump the sequence number once the final report has been generated at least once. http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/query-state.h@410 PS4, Line 410: report_ stale ? http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/query-state.h@411 PS4, Line 411: profiles_forest_' stale ? http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/4/be/src/runtime/query-state.cc@563 PS4, Line 563: FLAGS_status_report_interval_ms * (num_failed_reports_ + 1) Given this is quite clunky to write and used at more than one places, may be it makes sense to factor it into a function itself. This also makes changing the logic of sleep time (e.g. exponential backoff) in the future easier. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 31 Jan 2019 19:46:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8111: [DOCS] Take 2: Removed the Fix Version for KUDU-2198
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12311 ) Change subject: IMPALA-8111: [DOCS] Take 2: Removed the Fix Version for KUDU-2198 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I299eeda31de3cea768c1a1627702527501240a8b Gerrit-Change-Number: 12311 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 31 Jan 2019 19:05:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@79 PS4, Line 79: /// Retry the Rpc 'rpc_call' up to 'times_to_try' times. May also want to mention each RPC has a timeout of 'timeout' in some time units. Also, there is no sleeping between retries. http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@84 PS4, Line 84: MonoDelta krpc_timeout Instead of passing the MonoDelta object itself, how about just defining this as timeout in millisecond ? http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@85 PS4, Line 85: DCHECK(times_to_try > 0); DCHECK_GT() http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@92 PS4, Line 92: if (!rpc_status.ok()) continue; Please add a TODO about sleeping between retries in case the server is overloaded. Honestly though, I don't see why we cannot add a sleep time between retries in this patch if the remote server's service queue fills up. Otherwise, the retry loop won't be too effective as we burst sending the same RPCs in such a short period of time. http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc@170 PS3, Line 170: FAULT_INJECTION_RPC_DELAY(RPC_CANCELQUERYFINSTANCES); > These tests are setting delays on server-side commands so as to test RPC ti This can easily be replaced by DebugAction. Please see https://github.com/apache/impala/blob/master/tests/custom_cluster/test_rpc_timeout.py#L145-L150 In general, we should converge on using DebugAction for any type of fault injection instead of building on the old infrastructure. http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc@184 PS3, Line 184: FAULT_INJECTION_RPC_DELAY(RPC_REMOTESHUTDOWN); > These tests are setting delays on server-side commands so as to test RPC ti Please see replies above. -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 31 Jan 2019 19:00:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 7: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node-base.cc@452 PS7, Line 452: MakeScopeExitTrigger([&](){ UpdateRemainingScanRangeIssuances(-1); }); nit: indent 4 http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node.cc@79 PS7, Line 79: nit: blank line -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 30 Jan 2019 22:00:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8138: Reintroduce rpc debugging options
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12297 ) Change subject: IMPALA-8138: Reintroduce rpc debugging options .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/krpc-backend-client.h File be/src/runtime/krpc-backend-client.h: http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/krpc-backend-client.h@27 PS1, Line 27: Please add a class level comment. http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/krpc-backend-client.h@28 PS1, Line 28: class KrpcBackendClient : public ControlServiceProxy Hmm...what about DataStreamService ? This seems to be a client for ControlService's RPCs only ? http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h@314 PS1, Line 314: proxy_ Will client_ be a more appropriate name ? Same for other places. -- To view, visit http://gerrit.cloudera.org:8080/12297 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2046cb9dadf846ea90c04e95781b2bbde3325941 Gerrit-Change-Number: 12297 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 30 Jan 2019 00:55:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_submission > I am now trying remaining_scan_range_submissions_. I'd argue that it's all Thanks for the explanation. I suppose the initial source of confusion had to do with the way to interpret this variable. In particular, there is a minor distinction between (1) the remaining scan ranges to be issued to the IO subsystem for the rest of the scan nodes' lifetime and (2) the "new work" to be picked up by scanner threads. I believe this variables refer to (2). In other words, the scanner thread which picks up the footer range may still submit one or more scan ranges for the actual split it's responsible for after this variable has gone to 0. As far as the scanner thread loop is concerned, there is no more new work so it's safe to exit. Not sure if there is a better name to explain this here but I guess some (hopefully succinct) comments may help. http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: : Status HdfsScanNode::GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos) { : SCOPED_TIMER(runtime_profile_->total_time_counter()); : Sco > You were right and we can move this into Close() once the threads have been I was just wondering if the if-stmt body is empty as DCHECK is not compiled in release build, the if-check automatically becomes dead code. That said, I am not sure the compiler is always smart enough to remove the if-check as the conjuncts being evaluated may have side-effects. Anyhow, this is just a minor point. Feel free to ignore. http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc@196 PS6, Line 196: !(ranges_issued_barrier_.pending() != 0) Why not ranges_issued_barrier_.pending() == 0 ? http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc@197 PS6, Line 197: && initial_ranges_issued_ : && progress_.done()) { nit: one line -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 30 Jan 2019 00:43:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 14: (2 comments) This patch seems to have pretty through review from Phil and Tim and others already. Please feel free to proceed without waiting for more comments from me. Thanks. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc@76 PS11, Line 76: profile_(RuntimeProfile::Create(obj_pool(), "")), > To make it more consistent. The name is ignored since the profile itself is That's fine I guess. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419 PS11, Line 419: Samples : /// are not re-sampled into larger intervals, instead owners of this profile can call : /// ClearChunkedTimeSeriesCounters() to reset the sample buffers of all chunked time : /// series counters > Done Yes, capping it makes sense but warning about it "once" when it becomes full will be good too. -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 14 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 29 Jan 2019 23:35:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc@653 PS2, Line 653: // KRPC relies on resolved IP address, so convert hostname. : IpAddr ip_address; : Status ip_status = HostnameToIpAddr(request.backend.hostname, _address); : if (!ip_status.ok()) { : VLOG(1) << "Could not convert hostname " << request.backend.hostname : << " to ip address, error: " << ip_status.GetDetail(); : return ip_status; : } : TNetworkAddress addr = MakeNetworkAddress(ip_address, port); > In my original (unpublished) prototype I was keeping the original UX where Thanks for the explanation. That's a reasonable scenario in which the backend being shut down may not be found in the backend descriptors. http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc@692 PS2, Line 692: return Status(Substitute("Rpc to $0 failed with error '$1'. This may be because " :"the port specified is a thrift port, which :shutdown() " :"can no longer use. Make sure the port is a KRPC port.", : TNetworkAddressToString(addr), msg)); : } : return Status(Substitute( : "Rpc to $0 failed with error '$1'.", TNetworkAddressToString(addr), msg)); Not sure if a typical user has any understanding of the difference between Thrift port and KPRC port as they are internal to Impala. May we can simplify the statement to be like the following: string err_string = Substitute("Rpc to $0 failed with error '$1'", TNetworkAddressToString(addr), msg); if (msg.find(...) != string::npos) { err_string.append(" Please make sure the right port is being used or don't specify any port in the :shutdown() command"); } return Status(err_string);} http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc@170 PS3, Line 170: FAULT_INJECTION_RPC_DELAY(RPC_CANCELQUERYFINSTANCES); Shouldn't this use a DebugAction instead ? http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc@184 PS3, Line 184: FAULT_INJECTION_RPC_DELAY(RPC_REMOTESHUTDOWN); Shouldn't this use a DebugAction instead ? -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 29 Jan 2019 23:11:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8111: [DOCS] Added KRPC-related known issues
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12291 ) Change subject: IMPALA-8111: [DOCS] Added KRPC-related known issues .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12291/2/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/12291/2/docs/topics/impala_known_issues.xml@254 PS2, Line 254: is was -- To view, visit http://gerrit.cloudera.org:8080/12291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I806bedcb9d16140413bb6a1f42798be9d1b6371c Gerrit-Change-Number: 12291 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 29 Jan 2019 20:58:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8111: [DOCS] Added KRPC-related known issues
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12291 ) Change subject: IMPALA-8111: [DOCS] Added KRPC-related known issues .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/12291/1/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/12291/1/docs/topics/impala_known_issues.xml@243 PS1, Line 243: impala-7585 IMPALA ? http://gerrit.cloudera.org:8080/#/c/12291/1/docs/topics/impala_known_issues.xml@244 PS1, Line 244: The impala user not added to /etc/passwd when LDAP is : enabled one liner ? http://gerrit.cloudera.org:8080/#/c/12291/1/docs/topics/impala_known_issues.xml@247 PS1, Line 247: When using Impala with LDAP enabled, a user may hit the : following: : Not authorized: Client connection negotiation failed: client connection to 127.0.0.1:27000: SASL(-1): generic failure: All-whitespace username. : The following sequence can lead to the impala user : not being created in /etc/passwd. : Time 1: The impala user is not in LDAP. Impala : was installed, and the user impala is created in : /etc/passwd. : Time 2: The impala user is added to LDAP. : Time 3: New machine is added. My original wording wasn't that clear. I rewrote them as follows. Please edit as you see fit: The following sequence can lead to Impala user not being created in /etc/passwd on some machines in the cluster. Time 1: The user "impala" is not in LDAP. Impala service was installed on machine 1 and the user "impala" was created in /etc/passwd Time 2: The user "impala" is added to LDAP Time 3: A new machine is added to the cluster. When adding Impala service to this new machine, adding the user "impala" will fail as it already exists in LDAP. The consequence is that "impala" doesn't exist in /etc/passwd on the new machine, leading to the error seen above. http://gerrit.cloudera.org:8080/#/c/12291/1/docs/topics/impala_known_issues.xml@266 PS1, Line 266: IMPALLA-7298 typo ? http://gerrit.cloudera.org:8080/#/c/12291/1/docs/topics/impala_known_issues.xml@292 PS1, Line 292: kudu-2198 KUDU ? -- To view, visit http://gerrit.cloudera.org:8080/12291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I806bedcb9d16140413bb6a1f42798be9d1b6371c Gerrit-Change-Number: 12291 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 29 Jan 2019 19:30:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h@79 PS2, Line 79: /// Retry the Rpc 'rpc_call' up to 3 times. : /// Pass 'debug_action' to DebugAction() to potentially inject errors. : template : static Status DoRpcWithRetry(F&& rpc_call, const TQueryCtx& query_ctx, : const char* debug_action, const char* error_msg) { > Thanks. The idea (suggested by Thomas) was for DoRpcWithRetry to be reusabl I am fine with doing the refactoring with a separate JIRA as long as it doesn't become a back burner somehow. That said, it seems to make sense to add a sleep time between retries in this patch at least for the case of remote server being too busy (see the example in KrpcDataStreamSender). http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h@87 PS2, Line 87: MonoDelta::FromSeconds(10) > Yes, I wondered about this too. I decided that while my 2 clients are using Given what you stated above about DoRpcWithRetry() being reusable, I am not sure it makes sense to make assumption about the clients and hardcode the timeout / number of retries / sleep time between retries. http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc@2489 PS2, Line 2489: ShutdownStatusPB > Is that safe? GetShutdownStatus() allocates the ShutdownStatusPB on the sta It is an official C++ feature to extend the life time of a temporary object to the life time of the const reference which refers to it. -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 29 Jan 2019 19:12:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_issuances_ > Do you have any suggestions? remaining_initial_scan_ranges_? http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load(); : if (remaining != 0) { : LOG(INFO) << "XXX Assertion Failed " << remaining << " " << GetStackTrace(); : } > D'oh. I don't know how this one got by me. If you take the suggestion of using the above DCHECK, I don't think you need the #ifndef NDEBUG. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 29 Jan 2019 07:59:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/12260/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12260/1//COMMIT_MSG@13 PS1, Line 13: imapald typo http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc@653 PS2, Line 653: // KRPC relies on resolved IP address, so convert hostname. : IpAddr ip_address; : Status ip_status = HostnameToIpAddr(request.backend.hostname, _address); : if (!ip_status.ok()) { : VLOG(1) << "Could not convert hostname " << request.backend.hostname : << " to ip address, error: " << ip_status.GetDetail(); : return ip_status; : } : TNetworkAddress addr = MakeNetworkAddress(ip_address, port); Instead of resolving the IP for the backend, it seems more consistent to re-use some of the logic at the scheduler which already contains a map of all the backend state: https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L354-L357 May need some refactoring in the scheduler code to expose the necessary backend states. It's unfortunate that the existing 'executor_config_' uses 'hostname, be_port' as lookup key. May be it makes sense to implement a new function which looks up if there exists a known backend with 'hostname and krpc_address which exposes the port specified' and if so, returns the krpc_address_. If there isn't a known backend which maps to the specified 'host + krpc port', then we should just fail the shutdown request without making the actual RPC. Please feel free to come up with other ideas too. http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc@689 PS2, Line 689: if (msg.find("RemoteShutdown() RPC failed: Timed out: connection negotiation to") : != string::npos) { If we take the route of using existing backend states suggested above, the case of invalid port should be impossible so we shouldn't need to call this out like below here. http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h@79 PS2, Line 79: /// Retry the Rpc 'rpc_call' up to 3 times. : /// Pass 'debug_action' to DebugAction() to potentially inject errors. : template : static Status DoRpcWithRetry(F&& rpc_call, const TQueryCtx& query_ctx, : const char* debug_action, const char* error_msg) { Not specifically related to this change but it may be useful to factor this retry function to generic RPC code in rpc-mgr.inline.h. Also, if the RPC fails due to being overloaded, it may make sense to have some sleep time between retries. FWIW, we had some retry logic in KrpcDataStreamSender too and I believe we also had similar retry logic for the status reporting code. It may be worth the effort to consider refactoring these code so we don't implement the same retry logic at multiple places: https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L374-L386 https://github.com/apache/impala/blob/master/be/src/runtime/query-state.cc#L332-L336 http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h@87 PS2, Line 87: MonoDelta::FromSeconds(10) Not sure if it makes sense to hard code the timeout here as it seems to be utility function used by different RPC calls. Would it be better for the timeout to be one of the function parameter ? At the minimum, if we decide that 10 seconds is the timeout for all RPCs in ControlService, we should at least define it as some constant or #define to make it more explicit. I have a similar question about the number of retries too (which is hardcoded to 3). http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@81 PS1, Line 81: name F> typo ? DoRpcWithRetry ? http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.cc@182 PS2, Line 182: class Why is this needed ? http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.cc@183 PS2, Line 183: class Why is this needed ? http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc:
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_issuances_ This feels a bit of a misnomer as parquet scanner will actually issue more scan ranges although it does so without calling AddDiskIoRanges(). http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load(); : if (remaining != 0) { : LOG(INFO) << "XXX Assertion Failed " << remaining << " " << GetStackTrace(); : } DCHECK_EQ(remaining_scan_range_issuances_.Load(), 0) << GetStackTrace(); http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@212 PS5, Line 212: thread_state_.Close(this); I wonder if you can DCHECK remaining_scan_range_issuances_ to 0 after thread_state_.Close() as it seems to wait for all scaner threads to exit. However, I can imagine that we can potentially reach this point due to cancellation or reaching the limit so not all header ranges for the sequence files may have been handled yet ? -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 26 Jan 2019 03:50:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12257 ) Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG@13 PS3, Line 13: because these are not executable. Would you mind elaborating for latter references ? http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc@206 PS3, Line 206: && state->query_options().mt_dop == 0 Seems better to do this check before trying to acquire the thread token. Otherwise, we would leak the token. http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc@a108 PS3, Line 108: Do you recall what the limitation was ? http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@161 PS3, Line 161:* - MT_DOP > 0 is not supported for plans with base table joins or table sinks. :* Throws a NotImplementedException if plan validation fails. Comments need updating -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 25 Jan 2019 21:55:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7565: Set TAcceptQueueServer connection setup pool to be multi-threaded by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12249 ) Change subject: IMPALA-7565: Set TAcceptQueueServer connection_setup_pool to be multi-threaded by default .. Patch Set 1: Code-Review+1 (2 comments) Given the subtlety of this change, it may make sense to have a second reviewer take a look too. http://gerrit.cloudera.org:8080/#/c/12249/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12249/1//COMMIT_MSG@16 PS1, Line 16: Ran exhaustive tests with a thread pool of 10. Did you run with Kerberos enabled ? It may make sense to run stress test in a secure cluster with FLAGS_accepted_cnxn_setup_thread_pool_size and FLAGS_fe_service_threads set to some values such as 128. http://gerrit.cloudera.org:8080/#/c/12249/1/be/src/transport/TSaslServerTransport.cpp File be/src/transport/TSaslServerTransport.cpp: http://gerrit.cloudera.org:8080/#/c/12249/1/be/src/transport/TSaslServerTransport.cpp@179 PS1, Line 179: lock_guard l(transportMap_mutex_); Does it make sense to DCHECK(transportMap_.find(trans) == transportMap_.end()); here ? -- To view, visit http://gerrit.cloudera.org:8080/12249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I053120d4c3153ddbe5261acd28388be6cd191908 Gerrit-Change-Number: 12249 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 00:30:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc@300 PS2, Line 300: nstance_exec_status : > Yes, but instance_stats->done_ can be set by an intermediate report (if tha Thanks for the explanation. This seems a bit subtle indeed. Would things be simpler if we just stop bumping the sequence number after generating the last report for a fragment instance ? In other words, we need to record the fact that the final report may have been generated already, which is different from whether final report has been sent. http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.h@394 PS3, Line 394: succeeds nit: succeeded, matching "failed" below. http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.cc@382 PS3, Line 382: it = report_.mutable_instance_exec_status()->erase(it); : profile_it = profiles_forest_.profile_trees.erase(profile_it); Reading the documentation of vector::erase(), it's not exactly an efficient operation as it seems to do quite a bit of copying. Doing this in a list may end up being O(n^2) here. I suppose the reason for doing this dance here is to make UpdateReport() work on both code paths of ReportSuccessful() and ReportFailed(). Assuming the common case is that the report succeeds, it seems to be doing quite a bit of extra work just to erase each element one by one. Shouldn't the penalty of copying be paid instead when the report fails, which is presumably not too common ? In other words, should we just copy / move the stateful report into the fragment instance state instead ? A slightly more efficient approach of what this loop is trying to achieve could be swapping elements at the end of the list to this spot being erased. In which case, it would be O(n) worst case. However, I am not sure if this complexity is warranted. http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@117 PS2, Line 117: // Sequence number prevents out-of-order or duplicated updates from being applied. : optional int64 report_seq_no = 1; May be worth documenting the relationship of this seq_no with that in FragmentInstanceExecStatusPB. If I understand it correctly, this one should be <= that in FragmentInstanceExecStatusPB. http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@138 PS2, Line 138: // Cumulative structural changes made by the table sink of this fragment : // instance. This is sent only when 'done' above is true. Not idempotent. : optional DmlExecStatusPB dml_exec_status = 5; > Its not quite straight-forward to do, so given this patch is already reason Sounds good to me. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Jan 2019 21:55:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 11: (12 comments) Still wrapping my head around this big change. Some initial comments for now. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@67 PS11, Line 67: RuntimeProfile* host_profiles nit: Please add a comment on the role of this parameter. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/coordinator-backend-state.h@249 PS11, Line 249: host_profile_ = nullptr; nit: A brief statement of what this contains. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/query-state.cc@296 PS11, Line 296: // Add profile to report : host_profile_->ToThrift(_forest->host_profile); : profiles_forest->__isset.host_profile = true; : // Free resources in chunked counters in the profile : host_profile_->ClearChunkedTimeSeriesCounters(); I wonder how this may reconcile with https://gerrit.cloudera.org/#/c/12049/ ? In particular, with the mentioned patch, on a failure to send the profile report, the query state thread will back off, and retry sending a new profile. So, the host profile is potentially another non-idempotent state added to the report. That said, how tolerant is the coordinator's side logic on handling missing chunks in this time series ? In other words, if we fail to send a report, how bad is it to let the host profile of this epoch to be dropped ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/runtime/runtime-state.cc@76 PS11, Line 76: profile_(RuntimeProfile::Create(obj_pool(), "")), Why this change ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile-counters.h@443 PS11, Line 443: previous_sample_count_ = 0; This deserves a comment. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419 PS11, Line 419: Samples : /// are not re-sampled into larger intervals, instead owners of this profile can call : /// ClearChunkedTimeSeriesCounters() to reset the sample buffers of all chunked time : /// series counters So, misuse of this kind of counter (e.g. never calling ClearChunkedTimeSeriesCounters) may lead to large untracked memory usage, right ? http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1216 PS11, Line 1216: nit: blank space http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1231 PS11, Line 1231: nit: blank space http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.cc@1279 PS11, Line 1279: DCHECK( DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h File be/src/util/system-state-info.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@66 PS11, Line 66: /// CaptureSystemStateSnapshot(). 'cpu_ratios_' is updated after returning from this function. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.h@69 PS11, Line 69: NUM_CPU_VALUES = 5; Seems more future proof to have this as the last value in the enum below like: enum PROC_STAT_CPU_VALUES { CPU_USER = 0, CPU_NICE, CPU_SYSTEM, CPU_IDLE, CPU_IOWAIT, NUM_CPU_VALUES }; http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/system-state-info.cc@44 PS11, Line 44: int64_t SystemStateInfo::ParseInt64(const string& val) const { : StringParser::ParseResult result; : int64_t parsed = StringParser::StringToInt(val.c_str(), val.size(), ); : if (result == StringParser::PARSE_SUCCESS) return parsed; : return -1; : } Wonder if this fits better as a utility function in StringParser ? -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: I meant "the timeout can also be turned off by setting it to 0 or something by user". We will have a non-zero timeout by default. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 23 Jan 2019 01:51:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc@563 PS2, Line 563: FLAGS_status_report_interval_ms * (num_failed_reports_ + 1))) { > This backoff time seems way larger than the retry interval used when failin Some more thoughts on this: The behavior in this patch is that we will back off on the intermediate reports with at least 5 seconds (or whatever FLAGS_status_report_interval_ms is set to) but then we only spin for about 300ms (or whatever FLAGS_report_status_retry_interval_ms is set to) when it fails to be sent. This behavior seems rather complicated without much benefit. Having 2 different flags controlling the behavior also makes it hard to use and reason about the behavior. An executor is merely using this periodic status report as a proxy to determine the liveness of the coordinator. So, wouldn't the behavior be simpler and more consistent if an executor just considers the coordinator as dysfunctional after FLAGS_status_report_max_retries failed reports in a row. In other words, an executor provides a total time allowance of FLAGS_status_report_max_retries * FLAGS_report_interval_ms (or potentially longer with exponential backoff) for coordinator to respond before giving up. With the above policy, we don't distinguish between intermediate report and final report for the backoff behavior. A clock will be started on the first failure and an executor gives up after the time allowance has passed. As a side note, an executor should ideally be able to rely on statestore to provide proper cluster membership but the reality is that it's insufficient for determining the health of a node (see IMPALA-7872). So, we fall back to this poor-man approach of using periodic status report as a heartbeat to probe the coordinator. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 20 Dec 2018 20:15:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7213: Use separate network plane for DataStream and Control services
Michael Ho has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12107 ) Change subject: IMPALA-7213: Use separate network plane for DataStream and Control services .. IMPALA-7213: Use separate network plane for DataStream and Control services This change is a follow up for a review comment in https://gerrit.cloudera.org/#/c/10855/ about separating the TCP connections of DataStream and Control services so that control commands don't get blocked behind large payloads being sent in the DataStream services. Testing done: exhaustive build Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Reviewed-on: http://gerrit.cloudera.org:8080/12107 Reviewed-by: Michael Ho Tested-by: Impala Public Jenkins --- M be/src/service/control-service.cc M be/src/service/data-stream-service.cc 2 files changed, 8 insertions(+), 2 deletions(-) Approvals: Michael Ho: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/12107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Gerrit-Change-Number: 12107 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] Add "network plane" as part of ConnectionId
Michael Ho has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12106 ) Change subject: Add "network_plane" as part of ConnectionId .. Add "network_plane" as part of ConnectionId The motivation for doing so is to allow N services on the same host to be multiplexed on M different connections. For instance, a server may host multiple KRPC based services: one for control command and one for data transfer. Separating the connections between the control channel and the data channel prevents unnecessary delays of the control commands due to being stuck behind large data transfers from client to server. By default, the network_plane of a new ConnectionId is not set. A user can change it to a different value by calling Proxy::set_network_plane() on the ConnectionId. Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Reviewed-on: http://gerrit.cloudera.org:8080/11681 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/12106 Reviewed-by: Thomas Marshall Tested-by: Michael Ho --- M be/src/kudu/rpc/connection_id.cc M be/src/kudu/rpc/connection_id.h M be/src/kudu/rpc/messenger.h M be/src/kudu/rpc/proxy.cc M be/src/kudu/rpc/proxy.h M be/src/kudu/rpc/reactor.h M be/src/kudu/rpc/rpc-test.cc 7 files changed, 138 insertions(+), 31 deletions(-) Approvals: Thomas Marshall: Looks good to me, approved Michael Ho: Verified -- To view, visit http://gerrit.cloudera.org:8080/12106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Gerrit-Change-Number: 12106 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] Add "network plane" as part of ConnectionId
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12106 ) Change subject: Add "network_plane" as part of ConnectionId .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Gerrit-Change-Number: 12106 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 20 Dec 2018 00:27:24 + Gerrit-HasComments: No
[Impala-ASF-CR] Add "network plane" as part of ConnectionId
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12106 ) Change subject: Add "network_plane" as part of ConnectionId .. Patch Set 1: Verified by https://jenkins.impala.io/job/gerrit-verify-dryrun/3588/ -- To view, visit http://gerrit.cloudera.org:8080/12106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Gerrit-Change-Number: 12106 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 20 Dec 2018 00:27:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7213: Use separate network plane for DataStream and Control services
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12107 ) Change subject: IMPALA-7213: Use separate network plane for DataStream and Control services .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12107/1/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12107/1/be/src/service/control-service.cc@83 PS1, Line 83: (*proxy)->set_network_plane("control"); > Maybe add a brief comment about the motivation here Done -- To view, visit http://gerrit.cloudera.org:8080/12107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Gerrit-Change-Number: 12107 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 19 Dec 2018 19:30:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7213: Use separate network plane for DataStream and Control services
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12107 ) Change subject: IMPALA-7213: Use separate network plane for DataStream and Control services .. Patch Set 2: Code-Review+2 Carry Thomas' +2. -- To view, visit http://gerrit.cloudera.org:8080/12107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Gerrit-Change-Number: 12107 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 19 Dec 2018 19:30:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7213: Use separate network plane for DataStream and Control services
Hello Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12107 to look at the new patch set (#2). Change subject: IMPALA-7213: Use separate network plane for DataStream and Control services .. IMPALA-7213: Use separate network plane for DataStream and Control services This change is a follow up for a review comment in https://gerrit.cloudera.org/#/c/10855/ about separating the TCP connections of DataStream and Control services so that control commands don't get blocked behind large payloads being sent in the DataStream services. Testing done: exhaustive build Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 --- M be/src/service/control-service.cc M be/src/service/data-stream-service.cc 2 files changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/12107/2 -- To view, visit http://gerrit.cloudera.org:8080/12107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Gerrit-Change-Number: 12107 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-7931: fix executor shutdown races
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12082 ) Change subject: IMPALA-7931: fix executor shutdown races .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/cancellation-work.h File be/src/service/cancellation-work.h: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/cancellation-work.h@18 PS7, Line 18: #pragma once > There was a discussion on dev@ a while ago and that was the consensus: http I see. Thanks for reminding me. -- To view, visit http://gerrit.cloudera.org:8080/12082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c1a80304cb6695d228aca8314e2231727ab1998 Gerrit-Change-Number: 12082 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 19 Dec 2018 01:50:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7931: fix executor shutdown races
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12082 ) Change subject: IMPALA-7931: fix executor shutdown races .. Patch Set 7: (10 comments) http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/runtime/coordinator-backend-state.h@291 PS7, Line 291: boost::unique_lock Did we decide on moving to std::unique_lock in the long run ? I cannot quite recall what we agreed on in the last discussion. http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/runtime/coordinator.cc@835 PS7, Line 835: : nit: space http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/runtime/coordinator.cc@835 PS7, Line 835: (BackendState* backend_state: backend_states_) { This may be very unlikely in practice but is there a chance this loop race with InitBackendStates() ? http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/cancellation-work.h File be/src/service/cancellation-work.h: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/cancellation-work.h@18 PS7, Line 18: #pragma once Interesting. So, that's the new convention we are gonna use in the future ? http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/cancellation-work.h@31 PS7, Line 31: EXPIRED, Will "LIMIT_EXCEEDED" be a better name ? Sounds to me the reason for cancellation is either "time limit exceeded" or "some resource limit exceeded". http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/cancellation-work.h@80 PS7, Line 80: TUniqueId query_id_; const TUniqueId query_id_; ? Same for other fields below. http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/impala-server.cc@1406 PS7, Line 1406: vector active_backends = const http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/service/impala-server.cc@1800 PS7, Line 1800: cancellation_entry->first, cancellation_entry->second)); nit: wrong indent http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/util/impalad-metrics.h File be/src/util/impalad-metrics.h: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/util/impalad-metrics.h@49 PS7, Line 49: executed nit: started. Precisely speaking, the counter is incremented when a query starts on a backend. It may not have completed. This may just be a minor distinction. Feel free to ignore. http://gerrit.cloudera.org:8080/#/c/12082/7/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/12082/7/common/thrift/metrics.json@483 PS7, Line 483: .. nit: duplicated "." -- To view, visit http://gerrit.cloudera.org:8080/12082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c1a80304cb6695d228aca8314e2231727ab1998 Gerrit-Change-Number: 12082 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 19 Dec 2018 00:54:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7931: fix executor shutdown races
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12082 ) Change subject: IMPALA-7931: fix executor shutdown races .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/common/status.cc File be/src/common/status.cc: http://gerrit.cloudera.org:8080/#/c/12082/7/be/src/common/status.cc@179 PS7, Line 179: Status(code); May be I misread the code but isn't this calling the version of Status ctor which will actually log and dump the stack trace ?! -- To view, visit http://gerrit.cloudera.org:8080/12082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c1a80304cb6695d228aca8314e2231727ab1998 Gerrit-Change-Number: 12082 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 18 Dec 2018 19:31:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add "network plane" as part of ConnectionId
Michael Ho has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/12106 ) Change subject: Add "network_plane" as part of ConnectionId .. Removed reviewer Kudu Jenkins. -- To view, visit http://gerrit.cloudera.org:8080/12106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Gerrit-Change-Number: 12106 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] Add "network plane" as part of ConnectionId
Michael Ho has removed Adar Dembo from this change. ( http://gerrit.cloudera.org:8080/12106 ) Change subject: Add "network_plane" as part of ConnectionId .. Removed reviewer Adar Dembo. -- To view, visit http://gerrit.cloudera.org:8080/12106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Gerrit-Change-Number: 12106 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Kudu Jenkins (120)
[Impala-ASF-CR] Add "network plane" as part of ConnectionId
Hello Kudu Jenkins, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/12106 to review the following change. Change subject: Add "network_plane" as part of ConnectionId .. Add "network_plane" as part of ConnectionId The motivation for doing so is to allow N services on the same host to be multiplexed on M different connections. For instance, a server may host multiple KRPC based services: one for control command and one for data transfer. Separating the connections between the control channel and the data channel prevents unnecessary delays of the control commands due to being stuck behind large data transfers from client to server. By default, the network_plane of a new ConnectionId is not set. A user can change it to a different value by calling Proxy::set_network_plane() on the ConnectionId. Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Reviewed-on: http://gerrit.cloudera.org:8080/11681 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins --- M be/src/kudu/rpc/connection_id.cc M be/src/kudu/rpc/connection_id.h M be/src/kudu/rpc/messenger.h M be/src/kudu/rpc/proxy.cc M be/src/kudu/rpc/proxy.h M be/src/kudu/rpc/reactor.h M be/src/kudu/rpc/rpc-test.cc 7 files changed, 138 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/12106/1 -- To view, visit http://gerrit.cloudera.org:8080/12106 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6767e631fd9530ea54f5ed63ff4c8c179ab216b2 Gerrit-Change-Number: 12106 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120)
[Impala-ASF-CR] IMPALA-7213: Use separate network plane for DataStream and Control services
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12107 Change subject: IMPALA-7213: Use separate network plane for DataStream and Control services .. IMPALA-7213: Use separate network plane for DataStream and Control services This change is a follow up for a review comment in https://gerrit.cloudera.org/#/c/10855/ about separating the TCP connections of DataStream and Control services so that control commands don't get blocked behind large payloads being sent in the DataStream services. Testing done: exhaustive build Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 --- M be/src/service/control-service.cc M be/src/service/data-stream-service.cc 2 files changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/12107/1 -- To view, visit http://gerrit.cloudera.org:8080/12107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Gerrit-Change-Number: 12107 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 2: (8 comments) Some trivial comments for now. I have yet to think through the retry logic to see if they can be simplified and more consistent. http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc@300 PS2, Line 300: || instance_stats->done_ If we send the final report multiple times, won't report_seq_no == last_report_seq_no here ? http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc@320 PS2, Line 320: auto const auto& http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/fragment-instance-state.h@101 PS2, Line 101: recieved nit: received http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.h@387 PS2, Line 387: The number of consecutive failed intermediate reports. This is reset to 0 upon a successful RPC, right ? So, will it be clearer to say "The number of failed intermediate reports since the last successfully sent report" ? http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc@386 PS2, Line 386: TUniqueId const TUniqueId& http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc@563 PS2, Line 563: FLAGS_status_report_interval_ms * (num_failed_reports_ + 1))) { This backoff time seems way larger than the retry interval used when failing to send the final report. This seems a bit inconsistent. http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@116 PS2, Line 116: FInstanceExecStatusStatefulPB Is there a more succinct name for it ? StatefulStatusPB ? http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@138 PS2, Line 138: // Cumulative structural changes made by the table sink of this fragment : // instance. This is sent only when 'done' above is true. Not idempotent. : optional DmlExecStatusPB dml_exec_status = 5; Why isn't this moved into FInstanceExecStatusStatefulPB -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 18 Dec 2018 00:43:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/12069/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12069/7//COMMIT_MSG@15 PS7, Line 15: This mechanism adds a new time series counter class that collects all : measured values and does not re-sample them. It will re-sample values : when printing them into a string profile to a max of 64 values, but : Thrift profiles will contain the full list of values. How hard is it to factor this subset of changes out as a patch itself ? It seems easier to review that way. http://gerrit.cloudera.org:8080/#/c/12069/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12069/7/be/src/service/impala-server.cc@1051 PS7, Line 1051: if (rand() < trace_ratio * (RAND_MAX + 1L)) query_ctx->__set_trace_resource_usage(true); > I think rand() isn't thread safe. (And do we initialize it with srand() som Seems helpful to have a log statement that tracing is enabled. http://gerrit.cloudera.org:8080/#/c/12069/7/be/src/util/periodic-counter-updater.h File be/src/util/periodic-counter-updater.h: http://gerrit.cloudera.org:8080/#/c/12069/7/be/src/util/periodic-counter-updater.h@33 PS7, Line 33: /// Singleton utility class that updates counter values. This is used to sample some : /// metric (e.g. memory used) at regular intervals. The samples can be summarized in : /// a few ways (e.g. averaged, stored as histogram, kept as a time series data, etc). : /// This class has one thread that will wake up at a regular period and update all : /// the registered counters. : /// Typically, the counter updates should be stopped as early as possible to prevent : /// future stale samples from polluting the useful values. Should this be updated to include the role of UpdateFn() below ? -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 7 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 17 Dec 2018 23:23:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning threads because of buggy handling of num unqueued files .
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning threads because of buggy handling of num_unqueued_files_. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/12097/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12097/1//COMMIT_MSG@48 PS1, Line 48: Michael Ho. I didn't do much so please feel free to leave me out. http://gerrit.cloudera.org:8080/#/c/12097/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/1/be/src/exec/hdfs-scan-node-base.h@313 PS1, Line 313: /// Calls RangeComplete() with skipped=true for all the splits of the file nit: May also want to update the side effect of decrementing num_unqueued_files_; http://gerrit.cloudera.org:8080/#/c/12097/1/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/1/be/src/exec/hdfs-scan-node.cc@469 PS1, Line 469: // File ranges haven't been issued yet, skip entire file : SkipFile(partition->file_format(), desc); I guess this is slightly subtle here as we never decrement "num_unqueued_files_" for reading the sequence file's header in BaseSequenceScanner::IssueInitialRanges(), unlike other file formats which usually decrement that counter in their IssueInitialRanges(). -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 17 Dec 2018 22:51:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc@58 PS1, Line 58: DEFINE_int32(status_report_max_failures, 3, : "Max number of consecutive failed status reports to allow before cancelling"); > I thought we want to use a fixed timeout approach for the maximum retries ? Actually, max_retries seems to be safer in the sense that it guarantees a minimum amount of time the thread will sleep before giving up as we know the sleep time between each retry. With an absolute timeout, there is no guarantee on the number of retries we will do. If the system is overloaded, the query state thread may not get to run very often before expiration so the number of retries is non-deterministic. http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc@368 PS1, Line 368: fis_map_[id]->runtime_state()->ClearUnreportedErrors(); There is a race here: we may be clearing newly added errors we added after the profile was computed. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 11 Dec 2018 22:25:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7212: Remove dead code data-stream-mgr.cc
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12064 Change subject: IMPALA-7212: Remove dead code data-stream-mgr.cc .. IMPALA-7212: Remove dead code data-stream-mgr.cc Dead code which was accidentally left out in the last patch of IMPALA-7212. Testing done: Built Impala debug and release builds Change-Id: I047e2a01b835936f1066d4d7f87194dcc6857542 --- D be/src/runtime/data-stream-mgr.cc 1 file changed, 0 insertions(+), 298 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/12064/1 -- To view, visit http://gerrit.cloudera.org:8080/12064 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I047e2a01b835936f1066d4d7f87194dcc6857542 Gerrit-Change-Number: 12064 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/coordinator-backend-state.cc@299 PS1, Line 299: report_seq_no <= instance_stats->last_report_seq_no_ || instance_stats->done_) { Wouldn't this be a problem for intermediate report ? If the coordinator is slow to respond to the RPC, the client side (i.e. the executor) may assume that the intermediate reports were never applied even if it was actually applied by the coordinator. In which case, the new report may come with a larger sequence number which is larger than the old one and that may cause duplicated report. May be it makes sense to keep track of whether a FIS' report was successfully applied at the executor side and bump the sequence number if the last one was applied successfully. Not sure if there is any corner case which may not have been considered in this alternate approach ?! http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/fragment-instance-state.h@99 PS1, Line 99: void SetFinalReportSent() { final_report_sent_ = true; } May help to add comments on the expected caller and on when this is expected to be called ? http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.h@417 PS1, Line 417: (bool final Comment on the input parameter. http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc@58 PS1, Line 58: DEFINE_int32(status_report_max_failures, 3, : "Max number of consecutive failed status reports to allow before cancelling"); I thought we want to use a fixed timeout approach for the maximum retries ? I am okay with max_retries too but I think it may make sense to have larger number of retries than 3. http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc@324 PS1, Line 324: Try to send the RPC 3 times before failing. Sleep for 100ms between retries. May need to be updated. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Mon, 10 Dec 2018 21:56:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7943: Bump the default client timeout set on impala-shell
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12051 ) Change subject: IMPALA-7943: Bump the default client timeout set on impala-shell .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc40069e86cbf93634320804efba003fb5551afe Gerrit-Change-Number: 12051 Gerrit-PatchSet: 1 Gerrit-Owner: Anuj Phadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 07 Dec 2018 02:58:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
Michael Ho has restored this change. ( http://gerrit.cloudera.org:8080/9251 ) Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams .. Restored Relevant for IMPALA-7467 as we may still keep a lot of the Plan related structures from Frontend in Thrift. -- To view, visit http://gerrit.cloudera.org:8080/9251 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: restore Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09 Gerrit-Change-Number: 9251 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9251 ) Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams .. Patch Set 1: We may consider reviving it as we look into IMPALA-7467. Let's keep it opened for now. -- To view, visit http://gerrit.cloudera.org:8080/9251 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09 Gerrit-Change-Number: 9251 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 06 Dec 2018 18:54:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Hello Lars Volker, Balazs Jeszenszky, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12000 to look at the new patch set (#6). Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report received time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 66 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/6 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 6: Code-Review+2 Carry Lars' +2. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 03 Dec 2018 21:49:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/12000/5/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/5/tests/query_test/test_observability.py@28 PS5, Line 28: = > flake8: E225 missing whitespace around operator Done http://gerrit.cloudera.org:8080/#/c/12000/5/tests/query_test/test_observability.py@28 PS5, Line 28: MAX_WAIT > Now that this is removed from the code, I suggest naming it something like Done -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 03 Dec 2018 21:49:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6955: fix test query concurrency and server startup sequence
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12019 ) Change subject: IMPALA-6955: fix test_query_concurrency and server startup sequence .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12019/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12019/1/be/src/service/impala-server.cc@2202 PS1, Line 2202: http_handler_.reset(new ImpalaHttpHandler(this)); : http_handler_->RegisterHandlers(exec_env_->webserver()); Can you please add a comment that this needs to happen after the ImpalaServer has registered with ExecEnv or it may result in some sort of races ? -- To view, visit http://gerrit.cloudera.org:8080/12019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If22f71ab6edaf9a6b46afc0985c73dc4625b5103 Gerrit-Change-Number: 12019 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 03 Dec 2018 19:54:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/12000/4/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/4/tests/query_test/test_observability.py@19 PS4, Line 19: from tests.common.impala_cluster import ImpalaCluster > flake8: F401 'math.ceil' imported but unused Done http://gerrit.cloudera.org:8080/#/c/12000/4/tests/query_test/test_observability.py@74 PS4, Line 74: > The other test below uses 300 seconds here. I suggest to use the same time, Done -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 03 Dec 2018 19:38:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Hello Lars Volker, Balazs Jeszenszky, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12000 to look at the new patch set (#5). Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report received time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 63 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/5 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11947 ) Change subject: IMPALA-6656: BufferAllocator observability .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 01 Dec 2018 00:31:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/12000/3/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12000/3/be/src/runtime/coordinator-backend-state.cc@571 PS3, Line 571: > Maybe clamp this at 0 to make sure we won't return a negative value if some Done http://gerrit.cloudera.org:8080/#/c/12000/3/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/3/tests/query_test/test_observability.py@83 PS3, Line 83: rt_time_str, '%Y-% > get_thrift_profile takes a timeout, so you can call it once before the loop Done http://gerrit.cloudera.org:8080/#/c/12000/3/tests/query_test/test_observability.py@90 PS3, Line 90: > Can you make this a constant and add a comment. Otherwise this test will fa Iterate through all the nodes in the new PS. http://gerrit.cloudera.org:8080/#/c/12000/3/tests/query_test/test_observability.py@108 PS3, Line 108: assert result.exec_summary[scan_idx]['detail'] == 'functional_kudu.alltypestiny' > I found this part tricky to read. Why is it 8 seconds? Can you elaborate in The test in new PS is simplified to not check for monotonicity of the timestamp as that's not part of the scope. http://gerrit.cloudera.org:8080/#/c/12000/3/tests/query_test/test_observability.py@112 PS3, Line 112: result = self.execute_query(quer > Should we just call close_query() here? Done -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Sat, 01 Dec 2018 00:01:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Hello Lars Volker, Balazs Jeszenszky, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12000 to look at the new patch set (#4). Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report received time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 62 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/4 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6656: BufferAllocator observability
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11947 ) Change subject: IMPALA-6656: BufferAllocator observability .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/11947/4/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: http://gerrit.cloudera.org:8080/#/c/11947/4/be/src/runtime/bufferpool/buffer-allocator.cc@37 PS4, Line 37: // nit: /// Same below. http://gerrit.cloudera.org:8080/#/c/11947/4/be/src/runtime/bufferpool/buffer-pool-test.cc File be/src/runtime/bufferpool/buffer-pool-test.cc: http://gerrit.cloudera.org:8080/#/c/11947/4/be/src/runtime/bufferpool/buffer-pool-test.cc@75 PS4, Line 75: test_env_->DisableBufferPool(); Can you please add a comment on why this is necessary ? http://gerrit.cloudera.org:8080/#/c/11947/4/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/11947/4/common/thrift/metrics.json@1330 PS4, Line 1330: allocating from this system in system allocator ? http://gerrit.cloudera.org:8080/#/c/11947/4/common/thrift/metrics.json@1340 PS4, Line 1340: Counts the number of times "Number of times" may be more succinct. Same below. http://gerrit.cloudera.org:8080/#/c/11947/4/common/thrift/metrics.json@1350 PS4, Line 1350: the number the number of times http://gerrit.cloudera.org:8080/#/c/11947/4/common/thrift/metrics.json@1350 PS4, Line 1350: allocated directly allocated http://gerrit.cloudera.org:8080/#/c/11947/4/common/thrift/metrics.json@1370 PS4, Line 1370: a free buffer was recycled within same NUMA node to fulfi a recycled buffer within the same NUMA node was used to fufil... ? -- To view, visit http://gerrit.cloudera.org:8080/11947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12b740b8ea7773b3215681531dfa62db55cfdf18 Gerrit-Change-Number: 11947 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 30 Nov 2018 22:10:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 8: Code-Review+1 I suppose it's safe to assume that no external software depends on the format of the exec summary, right ? I suppose not but just wanna double check. -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 29 Nov 2018 18:52:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc@47 PS1, Line 47: "Last report received time"; > I am nit-picking somewhat, but this is really 'Last received time' from the How about "Last report received time". "received time" by itself may not be very clear on what was received. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@95 PS1, Line 95: if report_time < report_time_dict.get(node.name, datetime.min): > What happens here the first time through, when node.name does not yet exist The second argument (datetime.min) is the default value if the key doesn't exist. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@111 PS1, Line 111: assert num_time_backward <= ceil(elapsed_time / MIN_NTP_POLL_PERIOD) > Should this be num_time_backward? time_backward is a boolean, and shouldn't Nice catch. Fixed. http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@112 PS1, Line 112: s > flake8: F841 local variable 'results' is assigned to but never used Done -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 29 Nov 2018 00:04:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Hello Balazs Jeszenszky, Zoram Thanga, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12000 to look at the new patch set (#2). Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report received time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 85 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/2 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/exec/plan-root-sink.cc@79 PS7, Line 79: while (results_ == nullptr && !state->is_cancelled()) sender_cv_.Wait(l); Should we count the time spent here as inactive_time ? This seems to be the convention for other plan nodes and DataStreamSender. Didn't dive into other hdfs table writer sinks to see if inactive time also applies there. http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/runtime/coordinator.cc@234 PS7, Line 234: optional in the thrift Should they become required then ? http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.h@307 PS7, Line 307: plan node data sink http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc@962 PS7, Line 962: void RuntimeProfile::SetPlanNodeId(int node_id) { Should we DCHECK that "data_sink_id" is not set here ? http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/runtime-profile.cc@966 PS7, Line 966: void RuntimeProfile::SetDataSinkId(int sink_id) { Should we DCHECK that "plan_node_id" is not set here ? http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/summary-util.cc File be/src/util/summary-util.cc: http://gerrit.cloudera.org:8080/#/c/11967/7/be/src/util/summary-util.cc@79 PS7, Line 79: LOG(INFO) << label_ss.str() << " " << node.node_id; Is this for debugging only ? http://gerrit.cloudera.org:8080/#/c/11967/7/common/thrift/RuntimeProfile.thrift File common/thrift/RuntimeProfile.thrift: http://gerrit.cloudera.org:8080/#/c/11967/7/common/thrift/RuntimeProfile.thrift@67 PS7, Line 67: // Set if this node corresponds to a plan node. : 1: optional i32 plan_node_id : : // Set if this node corresponds to a data sink. : 2: optional i32 data_sink_id Are these two not supposed to be set at the same time ? if so, can you please add a comment ? -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 23:53:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc@502 PS1, Line 502: ToStringFromUnixMillis(last_report_time_ms_) An alternate approach would be to just store 'last_report_time_ms_' as string and convert it using ToStringFromUnixMillis() when exporting the profile as Thrift object or when we are pretty printing it. That said, it's unclear whether the complication is worth it given this is not in the critical path of returning query results. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 28 Nov 2018 01:24:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12000 Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 85 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/1 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-7882: Remove StringValue padding validation
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11989 ) Change subject: IMPALA-7882: Remove StringValue padding validation .. Patch Set 1: Code-Review+1 Make sense to me. I suppose it was only caught in ASAN builds, right ? -- To view, visit http://gerrit.cloudera.org:8080/11989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e7247b416ae9e1bc6da299572f2b6f326ad1c71 Gerrit-Change-Number: 11989 Gerrit-PatchSet: 1 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Nov 2018 00:46:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11964 ) Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11964/1/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/11964/1/tests/beeswax/impala_beeswax.py@206 PS1, Line 206: # Get the profile after unregistering the query so ExecSummary is included in it. > Thanks. I wonder if we should track the profile issue as a query lifecycle Yes, we should consider adding the ExecSummary to the profile once all backends completed. Filed IMPALA-7879 for it. -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 21 Nov 2018 21:23:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11964 to look at the new patch set (#3). Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. IMPALA-7852: Fix some flakiness in test_hash_join_timer.py test_hash_join_timer.py aims to verify the timers in the join nodes are functioning correctly. It does so by parsing the query profile for certain patterns after the query has finished. Before IMPALA-4063, each individual fragment instance will post its profile to the coordinator upon completion. After IMPALA-4063, the profiles of all fragment instances are sent together periodically and the final profile is sent once all fragment instances on a backend are done. The problem with the existing implementation of the test is that it doesn't actually fetch results before closing the query. As a result of it, the coordinator fragment never gets a chance to complete as it will block forever when inserting into the plan root sink. The lack of completion of the coordinator fragment causes the final profiles of fragment instances on the coordinator to be not sent before the query is closed. As a result, the profile of a fragment instance on the coordinator could be stale if it completes between two periodic updates, leading to random test failure. This change fixes the flankiness by always fetching results before closing the query. Ideally, if we fix IMPALA-539 and wait for all backends' final profiles before completing query unregistration, we should get the final profile from the coordinator fragment too. Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 --- M be/src/runtime/query-state.cc M tests/query_test/test_hash_join_timer.py 2 files changed, 47 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/11964/3 -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11964 ) Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11964/2/tests/query_test/test_hash_join_timer.py File tests/query_test/test_hash_join_timer.py: http://gerrit.cloudera.org:8080/#/c/11964/2/tests/query_test/test_hash_join_timer.py@148 PS2, Line 148: ; > flake8: E703 statement ends with a semicolon Done -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 21 Nov 2018 00:14:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11964 ) Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11964/1/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/11964/1/tests/beeswax/impala_beeswax.py@206 PS1, Line 206: # Get the profile after unregistering the query so ExecSummary is included in it. > Good point although in practice, given the default size of query log and th I rewrote part of test_hash_join_timer.py in PS2 to use the exec_summary from the result set. Also did a bit of clean up there so there is no need for the change in this file here. -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 21 Nov 2018 00:11:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11964 to look at the new patch set (#2). Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. IMPALA-7852: Fix some flakiness in test_hash_join_timer.py test_hash_join_timer.py aims to verify the timers in the join nodes are functioning correctly. It does so by parsing the query profile for certain patterns after the query has finished. Before IMPALA-4063, each individual fragment instance will post its profile to the coordinator upon completion. After IMPALA-4063, the profiles of all fragment instances are sent together periodically and the final profile is sent once all fragment instances on a backend are done. The problem with the existing implementation of the test is that it doesn't actually fetch results before closing the query. As a result of it, the coordinator fragment never gets a chance to complete as it will block forever when inserting into the plan root sink. The lack of completion of the coordinator fragment causes the final profiles of fragment instances on the coordinator to be not sent before the query is closed. As a result, the profile of a fragment instance on the coordinator could be stale if it completes between two periodic updates, leading to random test failure. This change fixes the flankiness by always fetching results before closing the query. Ideally, if we fix IMPALA-539 and wait for all backends' final profiles before completing query unregistration, we should get the final profile from the coordinator fragment too. Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 --- M be/src/runtime/query-state.cc M tests/query_test/test_hash_join_timer.py 2 files changed, 43 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/11964/2 -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7829: Mark a fragment instance as done only after Close() is called
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11939 ) Change subject: IMPALA-7829: Mark a fragment instance as done only after Close() is called .. Patch Set 2: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/11939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 Gerrit-Change-Number: 11939 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Nov 2018 19:45:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7829: Mark a fragment instance as done only after Close() is called
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11939 ) Change subject: IMPALA-7829: Mark a fragment instance as done only after Close() is called .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11939/1/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/11939/1/be/src/runtime/fragment-instance-state.cc@100 PS1, Line 100: ReleaseThreadToken(); > Consider moving to Close()? It's acquired in Prepare() so that feels symmet Done -- To view, visit http://gerrit.cloudera.org:8080/11939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 Gerrit-Change-Number: 11939 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Nov 2018 19:44:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7829: Mark a fragment instance as done only after Close() is called
Hello Thomas Marshall, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11939 to look at the new patch set (#2). Change subject: IMPALA-7829: Mark a fragment instance as done only after Close() is called .. IMPALA-7829: Mark a fragment instance as done only after Close() is called As shown in IMPALA-7828. there is some non-determinism on whether the errors detected in FragmentInstanceState::Close() will show up in the final profile sent to the coordinator. The reason is that the current code marks a fragment instance as "done" after ExecInternal() completes but before Close() is called. There is a window between when the final status report is sent and when Close() finishes. This change fixes the problem by not sending the final report until Close() is called. This has no implication on the first row available time for normal queries. It may slightly lengthen the first row available time for DML queries. Testing done: Updated udf-no-expr-rewrite.test to exercise this test Perf run on an 8 node clusters didn't show any regression: TPCH-300 ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 23.94 | -2.05% | 12.55 | -2.62% | ++---+-++++ Small concurrency +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 6.89| -0.66% | 6.62 | +0.41% | +-+---+-++++ Medium concurrency +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 55.57 | -1.04% | 55.27 | -0.98% | +-+---+-++++ Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 --- M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test 3 files changed, 12 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/11939/2 -- To view, visit http://gerrit.cloudera.org:8080/11939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 Gerrit-Change-Number: 11939 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11964 ) Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11964/1/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/11964/1/tests/beeswax/impala_beeswax.py@206 PS1, Line 206: # Get the profile after unregistering the query so ExecSummary is included in it. > I think this is introducing a new race between the query being evicted from Good point although in practice, given the default size of query log and the size of the race window, it's unlikely to occur. Let me add a check in the test_hash_join_timer.py to make sure we check if the profile is not None. -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Nov 2018 19:24:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7852: Fix some flakiness in test hash join timer.py
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11964 Change subject: IMPALA-7852: Fix some flakiness in test_hash_join_timer.py .. IMPALA-7852: Fix some flakiness in test_hash_join_timer.py test_hash_join_timer.py aims to verify the timers in the join nodes are functioning correctly. It does so by parsing the query profile for certain patterns after the query has finished. Before IMPALA-4063, each individual fragment instance will post its profile to the coordinator upon completion. After IMPALA-4063, the profiles of all fragment instances are sent together periodically and the final profile is sent once all fragment instances on a backend are done. The problem with the existing implementation of the test is that it doesn't actually fetch results before closing the query. As a result of it, the coordinator fragment never gets a chance to complete as it will block forever when inserting into the plan root sink. The lack of completion of the coordinator fragment causes the final profiles of fragment instances on the coordinator to be not sent before the query is closed. As a result, the profile of a fragment instance on the coordinator could be stale if it completes between two periodic updates, leading to random test failure. This change fixes the flankiness by always fetching results before closing the query. Ideally, if we fix IMPALA-539 and wait for all backends' final profiles before completing query unregistration, we should get the final profile from the coordinator fragment too. Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 --- M be/src/runtime/query-state.cc M tests/beeswax/impala_beeswax.py M tests/query_test/test_hash_join_timer.py 3 files changed, 9 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/11964/1 -- To view, visit http://gerrit.cloudera.org:8080/11964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I851824dffb78c7731e60793d90f1e57050c54955 Gerrit-Change-Number: 11964 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-7829: Mark a fragment instance as done only after Close() is called
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11939 Change subject: IMPALA-7829: Mark a fragment instance as done only after Close() is called .. IMPALA-7829: Mark a fragment instance as done only after Close() is called As shown in IMPALA-7828. there is some non-determinism on whether the errors detected in FragmentInstanceState::Close() will show up in the final profile sent to the coordinator. The reason is that the current code marks a fragment instance as "done" after ExecInternal() completes but before Close() is called. There is a window between when the final status report is sent and when Close() finishes. This change fixes the problem by not sending the final report until Close() is called. This has no implication on the first row available time for normal queries. It may slightly lengthen the first row available time for DML queries. Testing done: Updated udf-no-expr-rewrite.test to exercise this test Perf run on an 8 node clusters didn't show any regression: TPCH-300 ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 23.94 | -2.05% | 12.55 | -2.62% | ++---+-++++ Small concurrency +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 6.89| -0.66% | 6.62 | +0.41% | +-+---+-++++ Medium concurrency +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 55.57 | -1.04% | 55.27 | -0.98% | +-+---+-++++ Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 --- M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test 3 files changed, 11 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/11939/1 -- To view, visit http://gerrit.cloudera.org:8080/11939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01 Gerrit-Change-Number: 11939 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 ) Change subject: IMPALA-7738: Implement timeouts for HDFS open calls .. Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/11874/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11874/4//COMMIT_MSG@17 PS4, Line 17: 5 minutes How is this determined ? In this patch, only hdfsOpen() has been ported over to have a timeout. Do we anticipate more HDFS operations to be ported over in the future ? If so, will they need a different timeout periods ? http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/handle-cache.inline.h File be/src/runtime/io/handle-cache.inline.h: http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/handle-cache.inline.h@33 PS4, Line 33: /* TODO: check return code */ nit: I believe we tend to use // for inline comments http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.h File be/src/runtime/io/hdfs-monitored-ops.h: http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.h@18 PS4, Line 18: IMPALA_RUNTIME_IO_HDFS_SHIM_H nit: should match file name http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.h@30 PS4, Line 30: // nit: /// http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.cc File be/src/runtime/io/hdfs-monitored-ops.cc: http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.cc@32 PS4, Line 32: HDFS operation should wait before timing out and failing Please see comments below in HdfsMonitor::OpenFileHandle(). It appears that a call to Open() can potentially block for longer than this timeout period. http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.cc@55 PS4, Line 55: virtual Status ExecuteImpl(); Why not just make it a pure virtual function ? and we don't need line 70 - line 73 below? http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.cc@66 PS4, Line 66: // Set when ExecuteImpl() completes Set to return status of ExecuteImpl() upon completion. http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/runtime/io/hdfs-monitored-ops.cc@139 PS4, Line 139: FLAGS_hdfs_operation_timeout_sec * MILLIS_PER_SEC); Both the Offer() and Wait() calls can block for almost FLAGS_hdfs_operation_timeout_sec seconds, right ? So, this function can in theory block for 2 * FLAGS_hdfs_operation_timeout_sec seconds, right ? So, should we be more clear about exactly what a HDFS operation in the definition (hdfs_operation_timeout_sec) means ? http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/util/thread-pool.h File be/src/util/thread-pool.h: http://gerrit.cloudera.org:8080/#/c/11874/4/be/src/util/thread-pool.h@116 PS4, Line 116: bool Offer(V&& work, int64_t timeout_millis) { May warrant a test in thread-pool-test.cc -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 08 Nov 2018 20:19:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7148: Make test profile fragment instances() more robust
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11906 Change subject: IMPALA-7148: Make test_profile_fragment_instances() more robust .. IMPALA-7148: Make test_profile_fragment_instances() more robust test_profile_fragment_instances() makes assumption about number of instances of each scan node in the distributed query plan. The number of instances for each scan nodes can actually vary depending on how data is loaded and scheduler's decision. This change relaxes the check for number of instances of each scan node is a multiple of 3 which is the number of scan nodes in the plan. Change-Id: I08b068c21e9637575c85f4d54be9f7c56c106bf1 --- M tests/query_test/test_observability.py 1 file changed, 5 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/11906/1 -- To view, visit http://gerrit.cloudera.org:8080/11906 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I08b068c21e9637575c85f4d54be9f7c56c106bf1 Gerrit-Change-Number: 11906 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-7828: A temporary workaround for flaky UDF test (test mem leak())
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11900 Change subject: IMPALA-7828: A temporary workaround for flaky UDF test (test_mem_leak()) .. IMPALA-7828: A temporary workaround for flaky UDF test (test_mem_leak()) Before IMPALA-4063, the error message detected during FragmentInstanceState::Close() was always lost. After IMPALA-4063, we may sometimes get the error message in FragmentInstanceState::Close(). It's non-deterministic as the fragment instance thread may race with the query state thread which reports the final status. The UDF test currently tries to handle this non-determinism by using "row_regex:.*" in the ERRORS section but it doesn't always seem to work. This change workarounds the issue by commenting out the ERRORS section in udf-no-expr-rewrite.text for now. The actual fix will be done in IMPALA-7829. Change-Id: I6a55d5ad1a5a7278e7390f60854a8df28c1b9f28 --- M testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/11900/1 -- To view, visit http://gerrit.cloudera.org:8080/11900 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6a55d5ad1a5a7278e7390f60854a8df28c1b9f28 Gerrit-Change-Number: 11900 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-6323 Allow constant analytic window expressions.
Michael Ho has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11556 ) Change subject: IMPALA-6323 Allow constant analytic window expressions. .. IMPALA-6323 Allow constant analytic window expressions. The constraint imposed by IMPALA-1354 was artificial. If there are constant "partition by" expressions, simply drop them, they are no-ops. Constant "order by" expressions can be ignored as well, though in effect they should be accounted for as null expressions in the backend, with the effect that combine all rows in the same window (i.e. no window breaks). Change-Id: Idf129026c45120e9470df601268863634037908c Reviewed-on: http://gerrit.cloudera.org:8080/11556 Tested-by: Impala Public Jenkins Reviewed-by: Michael Ho --- M be/src/exec/analytic-eval-node.cc M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test 6 files changed, 77 insertions(+), 26 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/11556 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idf129026c45120e9470df601268863634037908c Gerrit-Change-Number: 11556 Gerrit-PatchSet: 16 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6323 Allow constant analytic window expressions.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11556 ) Change subject: IMPALA-6323 Allow constant analytic window expressions. .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11556 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf129026c45120e9470df601268863634037908c Gerrit-Change-Number: 11556 Gerrit-PatchSet: 15 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 06 Nov 2018 07:54:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6323 Allow constant analytic window expressions.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11556 ) Change subject: IMPALA-6323 Allow constant analytic window expressions. .. Patch Set 14: Michal, can you please look into the failures here: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3524/ Please do a private build with the new patch to make sure everything is sane. -- To view, visit http://gerrit.cloudera.org:8080/11556 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf129026c45120e9470df601268863634037908c Gerrit-Change-Number: 11556 Gerrit-PatchSet: 14 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 05 Nov 2018 23:50:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7565: Add startup flag to set thrift connection setup thread pool size
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11873 ) Change subject: IMPALA-7565: Add startup flag to set thrift connection setup thread pool size .. Patch Set 2: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/11873/2/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/11873/2/be/src/rpc/TAcceptQueueServer.cpp@30 PS2, Line 30: Impala internal, Beeswax and HiveServer2 connections nit: could be simpler to just say "Impala internal and external connections". http://gerrit.cloudera.org:8080/#/c/11873/2/be/src/rpc/TAcceptQueueServer.cpp@35 PS2, Line 35: internal, Beeswax and HiveServer2 connections internal and external -- To view, visit http://gerrit.cloudera.org:8080/11873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31344321a5f9e840a399ccb0f963c0759e2ab234 Gerrit-Change-Number: 11873 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Mon, 05 Nov 2018 23:12:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6323 Allow constant analytic window expressions.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11556 ) Change subject: IMPALA-6323 Allow constant analytic window expressions. .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11556 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf129026c45120e9470df601268863634037908c Gerrit-Change-Number: 11556 Gerrit-PatchSet: 13 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 05 Nov 2018 19:16:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 ) Change subject: IMPALA-4063: Merge report of query fragment instances per executor .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test File testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test: http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test@26 PS8, Line 26: #A fragment instance's last status report may be sent before calling Close() which is > This in fact may be racy as the final profile may be sent before Close() is So, workaround this flakiness in test by changing this to row_regex:.* for now. -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 05 Nov 2018 18:57:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11615 to look at the new patch set (#9). Change subject: IMPALA-4063: Merge report of query fragment instances per executor .. IMPALA-4063: Merge report of query fragment instances per executor Previously, each fragment instance executing on an executor will independently report its status to the coordinator periodically. This creates a huge amount of RPCs to the coordinator under highly concurrent workloads, causing lock contention in the coordinator's backend states when multiple fragment instances send them at the same time. In addition, due to the lack of coordination between query fragment instances, a query may end without collecting the profiles from all fragment instances when one of them hits an error before another fragment instance manages to finish Prepare(), leading to missing profiles for certain fragment instances. This change fixes the problem above by making a thread per QueryState (started by QueryExecMgr) to be responsible for periodically reporting the status and profiles of all fragment instances of a query running on a backend. As part of this refactoring, each query fragment instance will not report their errors individually. Instead, there is a cumulative status maintained per QueryState. It's set to the error status of the first fragment instance which hits an error or any general error (e.g. failure to start a thread) when starting fragment instances. With this change, the status reporting threads are also removed. Testing done: exhaustive tests This patch is based on a patch by Sailesh Mukil Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/control-service.cc M be/src/service/control-service.h M common/protobuf/control_service.proto M common/thrift/RuntimeProfile.thrift M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test D testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/query_test/test_observability.py M tests/query_test/test_udfs.py 22 files changed, 423 insertions(+), 475 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/9 -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7565: Add startup flag to set thrift connection setup thread pool size
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11873 ) Change subject: IMPALA-7565: Add startup flag to set thrift connection setup thread pool size .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/11873/1/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/11873/1/be/src/rpc/TAcceptQueueServer.cpp@30 PS1, Line 30: internal connections nit: not necessarily Impala internal connections. This class is used for Beeswax and HS2 servers too. http://gerrit.cloudera.org:8080/#/c/11873/1/be/src/rpc/TAcceptQueueServer.cpp@32 PS1, Line 32: connection May be "accepted_cnxn_setup_thread_pool_size" so it's consistent with the other knob. http://gerrit.cloudera.org:8080/#/c/11873/1/be/src/rpc/TAcceptQueueServer.cpp@34 PS1, Line 34: internal connections nit: not necessarily Impala internal connections. This class is used for Beeswax and HS2 servers too. -- To view, visit http://gerrit.cloudera.org:8080/11873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31344321a5f9e840a399ccb0f963c0759e2ab234 Gerrit-Change-Number: 11873 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Sat, 03 Nov 2018 00:54:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7775: fix some lifecycle issues in statestore/session tests
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11864 ) Change subject: IMPALA-7775: fix some lifecycle issues in statestore/session tests .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11864/2/be/src/statestore/statestore-test.cc File be/src/statestore/statestore-test.cc: http://gerrit.cloudera.org:8080/#/c/11864/2/be/src/statestore/statestore-test.cc@40 PS2, Line 40: static ObjectPool* perm_objects > Yeah, it does - if we actually run the destructors stuff can crash while be This may be a bit subtle. Can you please add a comment about it ? -- To view, visit http://gerrit.cloudera.org:8080/11864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b25c8b8a96bfa1183ce273b3bb4debde234dd01 Gerrit-Change-Number: 11864 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 03 Nov 2018 00:10:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7775: fix some lifecycle issues in statestore/session tests
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11864 ) Change subject: IMPALA-7775: fix some lifecycle issues in statestore/session tests .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/11864/2/be/src/service/session-expiry-test.cc File be/src/service/session-expiry-test.cc: http://gerrit.cloudera.org:8080/#/c/11864/2/be/src/service/session-expiry-test.cc@53 PS2, Line 53: scoped_ptr metrics(new MetricGroup("statestore")); Does this need to live in an ObjectPool too like what was done in statestore-test.cc ? http://gerrit.cloudera.org:8080/#/c/11864/2/be/src/statestore/statestore-test.cc File be/src/statestore/statestore-test.cc: http://gerrit.cloudera.org:8080/#/c/11864/2/be/src/statestore/statestore-test.cc@40 PS2, Line 40: static ObjectPool* perm_objects Why not just static ObjectPool perm_object; ? Does it have anything to do with the IMPALA-5291 comment in main() ? -- To view, visit http://gerrit.cloudera.org:8080/11864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b25c8b8a96bfa1183ce273b3bb4debde234dd01 Gerrit-Change-Number: 11864 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 02 Nov 2018 22:54:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6323 Allow constant analytic window expressions.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11556 ) Change subject: IMPALA-6323 Allow constant analytic window expressions. .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11556 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf129026c45120e9470df601268863634037908c Gerrit-Change-Number: 11556 Gerrit-PatchSet: 12 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 02 Nov 2018 17:52:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6661 Make NaN values equal for grouping purposes.
Michael Ho has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11535 ) Change subject: IMPALA-6661 Make NaN values equal for grouping purposes. .. IMPALA-6661 Make NaN values equal for grouping purposes. Similar to the treatment of NULLs, we want to consider NaN values as equal when grouping. - When detecting a NaN in a set of row values, the NaN value must be converted to a canonical value - so that all NaN values have the same bit-pattern for hashing purposes. - When doing equality evaluation, floating point types must have additional logic to consider NaN values as equal. - Existing logic for handling NULLs in this way is appropriate for triggering this behavior for NaN values. - Relabel "force null equality" as "inclusive equality" to expand the scope of the concept to a more generic form that includes NaN. Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Reviewed-on: http://gerrit.cloudera.org:8080/11535 Tested-by: Impala Public Jenkins Reviewed-by: Michael Ho --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/runtime/raw-value.inline.h M testdata/workloads/functional-query/queries/QueryTest/aggregation.test M testdata/workloads/functional-query/queries/QueryTest/exprs.test M testdata/workloads/functional-query/queries/QueryTest/joins.test 11 files changed, 202 insertions(+), 35 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/11535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Gerrit-Change-Number: 11535 Gerrit-PatchSet: 24 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6661 Make NaN values equal for grouping purposes.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11535 ) Change subject: IMPALA-6661 Make NaN values equal for grouping purposes. .. Patch Set 23: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Gerrit-Change-Number: 11535 Gerrit-PatchSet: 23 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 02 Nov 2018 02:58:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 ) Change subject: IMPALA-4063: Merge report of query fragment instances per executor .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test File testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test: http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test@26 PS8, Line 26: UDF WARNING: Memory leaked via FunctionContext::Allocate(), 100 bytes leaked via FunctionContext::TrackAl This in fact may be racy as the final profile may be sent before Close() is called on a Fragment Instance. So, this is not 100% deterministic. -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 02 Nov 2018 00:49:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 ) Change subject: IMPALA-4063: Merge report of query fragment instances per executor .. Patch Set 7: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Nov 2018 16:56:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 ) Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC .. Patch Set 21: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/10855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe Gerrit-Change-Number: 10855 Gerrit-PatchSet: 21 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 01 Nov 2018 16:56:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 ) Change subject: IMPALA-4063: Merge report of query fragment instances per executor .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14 PS5, Line 14: In addition, due to the lack of coordination between query : fragment instances, a query may end without collecting the profiles : from all fragment instances when one of them hits an error before : another fragment instance manages to finish Prepare(), leading to : missing profiles for certain fragment instances. > Great. Could you also remove the xfail from test_profile_fragment_instances Done http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py@290 PS6, Line 290: self.run_test_case('QueryTest/udf-codegen-required', vector, use_db=unique_database) > This comment is outdated, and below Done -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Nov 2018 16:55:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
Hello Thomas Marshall, Todd Lipcon, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, Michal Ostrowski, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10855 to look at the new patch set (#21). Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC .. IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC This change converts ReportExecStatus() RPC from thrift based RPC to KRPC. This is done in part of the preparation for fixing IMPALA-2990 as we can take advantage of TCP connection multiplexing in KRPC to avoid overwhelming the coordinator with too many connections by reducing the number of TCP connection to one for each executor. This patch also introduces a new service pool for all query execution control related RPCs in the future so that control commands from coordinators aren't blocked by long-running DataStream services' RPCs. To avoid unnecessary delays due to sharing the network connections between DataStream service and Control service, this change added the service name as part of the user credentials for the ConnectionId so each service will use a separate connection. The majority of this patch is mechanical conversion of some Thrift structures used in ReportExecStatus() RPC to Protobuf. Note that the runtime profile is still retained as a Thrift structure as Impala clients will still fetch query profiles using Thrift RPCs. This also avoids duplicating the serialization implementation in both Thrift and Protobuf for the runtime profile. The Thrift runtime profiles are serialized and sent as a sidecar in ReportExecStatus() RPC. This patch also fixes IMPALA-7241 which may lead to duplicated dml stats being applied. The fix is by adding a monotonically increasing version number for fragment instances' reports. The coordinator will ignore any report smaller than or equal to the version in the last report. Testing done: 1. Exhaustive build. 2. Added some targeted test cases for profile serialization failure and RPC retries/timeout. Change-Id: I7638583b433dcac066b87198e448743d90415ebe --- M be/src/benchmarks/expr-benchmark.cc M be/src/catalog/catalog-util.cc M be/src/common/global-flags.cc M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/rpc/CMakeLists.txt M be/src/rpc/jni-thrift-util.h M be/src/rpc/rpc-mgr-kerberized-test.cc M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr-test.h M be/src/rpc/rpc-mgr.h M be/src/rpc/thrift-util-test.cc M be/src/rpc/thrift-util.h M be/src/runtime/backend-client.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/scheduler-test-util.cc M be/src/service/CMakeLists.txt M be/src/service/client-request-state.cc M be/src/service/client-request-state.h A be/src/service/control-service.cc A be/src/service/control-service.h M be/src/service/data-stream-service.cc M be/src/service/data-stream-service.h M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/testutil/in-process-servers.cc M be/src/util/container-util.h A be/src/util/error-util-internal.h M be/src/util/error-util-test.cc M be/src/util/error-util.cc M be/src/util/error-util.h M be/src/util/runtime-profile.cc M be/src/util/uid-util.h M common/protobuf/CMakeLists.txt M common/protobuf/common.proto A common/protobuf/control_service.proto M common/protobuf/data_stream_service.proto M common/protobuf/row_batch.proto M common/protobuf/rpc_test.proto M common/thrift/ImpalaInternalService.thrift M tests/custom_cluster/test_rpc_timeout.py 65 files changed, 1,299 insertions(+), 769 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/10855/21 -- To view, visit http://gerrit.cloudera.org:8080/10855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe Gerrit-Change-Number: 10855 Gerrit-PatchSet: 21 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11615 to look at the new patch set (#7). Change subject: IMPALA-4063: Merge report of query fragment instances per executor .. IMPALA-4063: Merge report of query fragment instances per executor Previously, each fragment instance executing on an executor will independently report its status to the coordinator periodically. This creates a huge amount of RPCs to the coordinator under highly concurrent workloads, causing lock contention in the coordinator's backend states when multiple fragment instances send them at the same time. In addition, due to the lack of coordination between query fragment instances, a query may end without collecting the profiles from all fragment instances when one of them hits an error before another fragment instance manages to finish Prepare(), leading to missing profiles for certain fragment instances. This change fixes the problem above by making a thread per QueryState (started by QueryExecMgr) to be responsible for periodically reporting the status and profiles of all fragment instances of a query running on a backend. As part of this refactoring, each query fragment instance will not report their errors individually. Instead, there is a cumulative status maintained per QueryState. It's set to the error status of the first fragment instance which hits an error or any general error (e.g. failure to start a thread) when starting fragment instances. With this change, the status reporting threads are also removed. Testing done: exhaustive tests This patch is based on a patch by Sailesh Mukil Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/control-service.cc M be/src/service/control-service.h M common/protobuf/control_service.proto M common/thrift/RuntimeProfile.thrift M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test D testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/query_test/test_observability.py M tests/query_test/test_udfs.py 22 files changed, 420 insertions(+), 475 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/7 -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 06:00:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 4: OK. Will do a pass tonight. -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 00:23:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6661 Make NaN values equal for grouping purposes.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11535 ) Change subject: IMPALA-6661 Make NaN values equal for grouping purposes. .. Patch Set 19: There seems to be other test failures in addition to the one you fixed: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3421/ Please run a private build to confirm all newly added tests actually passed. -- To view, visit http://gerrit.cloudera.org:8080/11535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Gerrit-Change-Number: 11535 Gerrit-PatchSet: 19 Gerrit-Owner: Michal Ostrowski Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Oct 2018 22:24:26 + Gerrit-HasComments: No