[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. IMPALA-5199: prevent hang on empty row batch exchange The error path where delivery of "eos" fails now behaves the same as if delivery of a row batch fails. Testing: Added a timeout test where the leaf fragments return 0 rows. Before the change this reproduced the hang. Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Reviewed-on: http://gerrit.cloudera.org:8080/8005 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/runtime/data-stream-mgr.cc M testdata/workloads/functional-query/queries/QueryTest/exchange-delays.test 2 files changed, 23 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1228/ -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 3: Code-Review+2 carry -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: I spoke to Alex. He had some general concerns with whether it could possible cause queries to fail when they didn't before. I think our reasoning is valid that this is a small subset of the queries that would fail anyway in the same circumstance - those with non-empty exchanges. -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Dan Hecht has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Code-Review+2 Please check if Alex is still concerned before committing. -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Code-Review+1 The only concern I had is listed above. However, since this is not going as a part of a release now, we have a good amount of time for testing. So, I think we can go ahead with this and observe the impact. -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Dan Hecht has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Alex/Sailesh, are you okay with this change? -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Yeah I think there are some fundamental problems with the current parallel startup logic - I don't think it's actually possible to make all cases work with the current approach where all state is receiver-side and there's no additional communication (point-to-point or via the coordinator) to disambiguate hangs and delays. IMPALA-3990 -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Dan Hecht has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: Code-Review+1 Making these cases consistent makes sense to me. Really, I think we need to eventually rethink the whole parallel startup stuff, especially now that we have per-query execrpc, etc. It just seems too fragile. But in the mean time, I see no reason the eos and non-eos case inconsistent. -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: The query is definitely cancellable but I'm not sure how a user would necessarily know that it's hung as opposed to slow or what to do about it (this got reported to me by a user as a "slow" query). -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Alex Behm has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: If we hit IMPALA-5199 without this patch, can system be brought back into a good state by cancelling the query? I think it's generally preferable to avoid hangs even if it means we unnecessarily (gracefully) have cancel some queries under load. Whether the latter case is rare is really the key question here. Is there an experiment we can do to get data to make a decision? -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: I think there is a fundamental problem with the "closed streams cache" mechanism and long-running queries with limits in the middle of the plan though - IMPALA-3990 - which can happen even not under load. I think that's the bug you're describing. I guess this change makes IMPALA-3990 possible when 0 rows are sent. To me, it seems better to make behaviour consistent and avoid this hang even if adds another presumably rare case to IMPALA-3990. I don't think we've seen IMPALA-3990 in practice. I can see the argument though for just leaving this alone given the hang also seems to be rare. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: That seems pretty rare but possible. In the common case where the limit is at the top level of the query, I think this would be fine, because the query would hit the limit on the coordinator, return successfully, and ignore any future errors. Those timeout errors would have to arrive after STREAM_EXPIRATION_TIME_MS (since if the closed stream was in the cache, we wouldn't report an error before that time). So I think in general the timeline would need to be: * Limit is hit at t1 * The eos message is received at t2 >= t1 + STREAM_EXPIRATION_TIME_MS and an error is sent to the coordinator * The query fails, but it would have succeeded at t3 > t2 if the error hadn't been received If the limit is not at the top level, it seems possible but unlikely that the error would cancel a query that was on the path to succeeding. I'm not sure if it changes the odds of queries succeeding enough to worry about, given that the cluster is already unhealthy if these timeouts are occurring. -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. Patch Set 2: The fix looks good to me, and should work fine for most cases. One case I'm worried about is that, under heavy load, if a receiver deregisters itself early (due to a LIMIT, etc.), and the sender sends the 'eos' after STREAM_EXPIRATION_TIME_MS, the query would have previously succeeded, even though the sender couldn't find the receiver in CloseSender(). And the query would be right to succeed in this case since the 'eos' here is spurious as the receiver is already gone. Although, I know this counts as succeeding "by mistake". Now, we would be failing these queries. And given that we see the 'DATASTREAM_SENDER_TIMEOUT' fairly often, that would regress a few workloads. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. IMPALA-5199: prevent hang on empty row batch exchange The error path where delivery of "eos" fails now behaves the same as if delivery of a row batch fails. Testing: Added a timeout test where the leaf fragments return 0 rows. Before the change this reproduced the hang. Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 --- M be/src/runtime/data-stream-mgr.cc M testdata/workloads/functional-query/queries/QueryTest/exchange-delays.test 2 files changed, 23 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8005/2 -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/8005 Change subject: IMPALA-5199: prevent hang on empty row batch exchange .. IMPALA-5199: prevent hang on empty row batch exchange The error path where delivery of "eos" fails now behaves the same as if delivery of a row batch fails. Testing: Added a timeout test where the leaf fragments return 0 rows. Before the change this reproduced the hang. Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 --- M be/src/runtime/data-stream-mgr.cc M testdata/workloads/functional-query/queries/QueryTest/exchange-delays.test 2 files changed, 20 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8005/1 -- To view, visit http://gerrit.cloudera.org:8080/8005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong