[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config always contains the local backend. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Reviewed-on: http://gerrit.cloudera.org:8080/5127 Reviewed-by: Bharath VissapragadaTested-by: Internal Jenkins --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc M be/src/scheduling/simple-scheduler.h 3 files changed, 157 insertions(+), 96 deletions(-) Approvals: Bharath Vissapragada: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 10: Code-Review+2 Carrying +2 for Lars. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Hello Marcel Kornacker, Internal Jenkins, Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5127 to look at the new patch set (#10). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config always contains the local backend. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc M be/src/scheduling/simple-scheduler.h 3 files changed, 157 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/10 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 9: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/539/ -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 9: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/538/ -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 9: Code-Review+2 > Uploaded patch set 9: Patch Set 8 was rebased. Carrying +2 for Lars -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 7: (4 comments) Thanks for the review. Addressed the comments and ran git-clang-format. Will rebase next. http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 201: if (topic != incoming_topic_deltas.end()) { > while you're at it, change this to == and return Done http://gerrit.cloudera.org:8080/#/c/5127/7/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 266: // and tell the statestore. We also ensure that it is always registered with itself. > 'registered with itself' sounds a bit odd. '.. that it is part of our backe Done Line 269: local_backend_descriptor_.address.hostname, nullptr)); > odd indentation Done Line 606: BackendConfig coord_only_config; > make this a class member and initialize in c'tor. Done, but needed to initialize it in Init() since we need to lookup the local IP address first, which is done there. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5127 to look at the new patch set (#8). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config always contains the local backend. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc M be/src/scheduling/simple-scheduler.h 3 files changed, 159 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/8 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 7: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 201: if (topic != incoming_topic_deltas.end()) { while you're at it, change this to == and return http://gerrit.cloudera.org:8080/#/c/5127/7/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 266: // and tell the statestore. We also ensure that it is always registered with itself. 'registered with itself' sounds a bit odd. '.. that it is part of our backend config'? Line 269: local_backend_descriptor_.address.hostname, nullptr)); odd indentation Line 606: BackendConfig coord_only_config; make this a class member and initialize in c'tor. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 6: (1 comment) Thanks for the reviews! http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 250: << be_desc.address; > Could use continue; here to be consistent with the above checks. Done -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5127 to look at the new patch set (#7). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config always contains the local backend. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc 2 files changed, 89 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/7 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 6: Code-Review+1 (1 comment) Thanks for your patience, I think this is looking good. http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 250: << be_desc.address; Could use continue; here to be consistent with the above checks. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new patch set (#6). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config always contains the local backend. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc 2 files changed, 91 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/6 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/5127/3/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 204: if (delta.is_delta && delta.topic_entries.empty() && delta.topic_deletions.empty()) { > Can you comment on what this achieves (it's an optimisation to avoid a copy Done in the next PS http://gerrit.cloudera.org:8080/#/c/5127/4/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 261: if (new_backend_config->NumBackends() > 0 > I'm not sure about the NumBackends() > 0 check. The reasoning behind the check was that it prevents empty messages from the statestore (heartbeats) from setting up the local backend as the only known backend. If first an empty heartbeat arrives, then a query, and then the initial statestore message, then the whole query will be assigned to the scheduler. Assuming the first message by the statestore contains the cluster membership, I removed the >0 check. Line 264: new_backend_config->AddBackend(local_backend_descriptor_); > Do we also need to add this to current_membership_? I think we want curren So far current_membership_ has mirrored the statestore's view. Adding the local backend here will prevent the code below from registering it with the statestore. current_membership_ is also not used outside this method. Do you see a way to keep them consistent while still tracking both the local and the statestore views? -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new patch set (#5). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config is either empty or contains the local backend as well. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc 2 files changed, 85 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/5 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 4: (2 comments) Just saw PS4 (which invalidates some of my PS3 comments). http://gerrit.cloudera.org:8080/#/c/5127/4/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 261: if (new_backend_config->NumBackends() > 0 I'm not sure about the NumBackends() > 0 check. I think the only case when we have an empty set of backends now is when there are no other registered backends, which means the cluster includes only the current impalad. In that case I think we want the backend config to include the current backend, rather than waiting for the change to propagate. Line 264: new_backend_config->AddBackend(local_backend_descriptor_); Do we also need to add this to current_membership_? I think we want current_membership_ and the BackendConfig to be consistent. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 3: (4 comments) It would still be nice to remove the situation between the first topic update and the second where we have all of the backends except the local one. I feel like that scenario just makes things hard to reason about. http://gerrit.cloudera.org:8080/#/c/5127/3/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 204: if (delta.is_delta && delta.topic_entries.empty() && delta.topic_deletions.empty()) { Can you comment on what this achieves (it's an optimisation to avoid a copy rather than a correctness thing, right?) Line 251: new_backend_config->AddBackend(be_desc); Maybe we should just ignore updates for the local backend here. Line 255: for (const string& backend_id: delta.topic_deletions) { And here Line 265: if (current_membership_.find(local_backend_id_) == current_membership_.end()) { And then add the local backend directly instead of waiting for it to propagate back from the statestore. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new patch set (#4). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check when handling updates from the statestore to make sure that the backend config is either empty or contains the local backend as well. It also changes scheduling when exec_at_coord is true to always use the local backend, irrespective of whether it is present in the backend config. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc 2 files changed, 77 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/4 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 3: > I think it would make sense to add the local backend to the list of > backends at the same time as we process the initial update. That > way we only have two states: no backend, and all the backends. I > think that's what you were suggesting. Scheduling fails if we have > no backends, right? I agree. To make sure we only have these two states I added the local backend to the backend_state every time it is missing and the backend_state is not empty. A few tests had removed the first backend and now needed to be adjusted to remove a different backend to be effective. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 3: I think it would make sense to add the local backend to the list of backends at the same time as we process the initial update. That way we only have two states: no backend, and all the backends. I think that's what you were suggesting. Scheduling fails if we have no backends, right? -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: Thank you for your comments. I modified the change to always schedule at the coordinator if exec_at_coord is true. However, I think that scheduling at the coordinator if !exec_at_coord and no other backends are registered is not what we would want, since that way a large query can be completely scheduled at the coordinator, just because it has not received the initial update from the statestored. It is also the behavior of Impala 2.6 to abort scheduling in this case. There is still a small time window during daemon startup in which the statestore could have sent the initial update, to which the coordinator will reply by registering itself with the statestore. If we hit that window before the statestore sends back the local backend's registration, then the coordinator will not schedule scan ranges on itself. Should we change that, too? For example, we could add the local backend to the list of backends if it is non-empty but does not contain the local scheduler. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check to ensure that the local backend has been registered with the scheduler before issuing scan ranges when exec_at_coord is set. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc 2 files changed, 55 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/3 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new patch set (#2). Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check to ensure that the local backend has been registered with the scheduler before issuing scan ranges when exec_at_coord is set. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc 2 files changed, 55 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/2 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: (1 comment) It might be best to special-case the local backend. At a high level this is the most intuitive thing to do. It seems like we already special-case the local backend to some extent in UpdateMembership(). Otherwise waiting until the membership update is received may make sense. http://gerrit.cloudera.org:8080/#/c/5127/1/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 300: BackendConfigPtr backend_config = GetBackendConfig(); I think it would make sense to detect at this point if the backend config doesn't have the coordinator in it. Then we could wait for the config to be updated (maybe with a timeout) before failing the query. Even if we don't crash, it seems very strange for a coordinator to not schedule a query on itself in the !exec_on_coord case. -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: Thanks for your feedback. I agree that this is not optimal, and if there was a way to send the equivalent of EAGAIN to the client and make it try again either with this or a different coordinator that could be a workaround. I'd also be happy to add a small timeout of a few seconds before returning an error. I'm rather reluctant to change the way we register the local backend with itself, since that way we will have two mechanisms of updating the membership information and need to handle conflicts as well. That being said, between a) Relying on the client to try again, b) waiting for 5 seconds (or whatever value), and c) reworking the cluster membership information management code in the scheduler, which one do you prefer? -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. Patch Set 1: This is better than crashing, but it also seems like we shouldn't fail queries for this reason. Probably we should wait until we've gotten the statestore update message before trying to schedule queries (either that or make sure the local backend is always present). -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/5127 Change subject: IMPALA-4494: Fix crash in SimpleScheduler .. IMPALA-4494: Fix crash in SimpleScheduler The scheduler maintains a local list of active backends, which is updated through messages from the statestore. Even the local backend enters this list by registering with the statestore and being included in a statestore update message. Thus, during restarts it can happen that a query gets scheduled with exec_at_coord set to true, while the local backend has not been registered with the scheduler. In this case the IP address lookup in the internal BackendConfig fails and an empty IP address is returned, leading to a nullptr dereference down the line. This change adds an additional check to ensure that the local backend has been registered with the scheduler before issuing scan ranges when exec_at_coord is set. Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f --- M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc M common/thrift/generate_error_codes.py 3 files changed, 32 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/1 -- To view, visit http://gerrit.cloudera.org:8080/5127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker