[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config always contains the
local backend. It also changes scheduling when exec_at_coord is true to
always use the local backend, irrespective of whether it is present in
the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Reviewed-on: http://gerrit.cloudera.org:8080/5127
Reviewed-by: Bharath Vissapragada 
Tested-by: Internal Jenkins
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
3 files changed, 157 insertions(+), 96 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 10: Code-Review+2

Carrying +2 for Lars.

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Internal Jenkins, Sailesh Mukil, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5127

to look at the new patch set (#10).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config always contains the
local backend. It also changes scheduling when exec_at_coord is true to
always use the local backend, irrespective of whether it is present in
the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
3 files changed, 157 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 9:

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/539/

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 9: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/538/

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 9: Code-Review+2

> Uploaded patch set 9: Patch Set 8 was rebased.

Carrying +2 for Lars

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 7:

(4 comments)

Thanks for the review. Addressed the comments and ran git-clang-format. Will 
rebase next.

http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 201:   if (topic != incoming_topic_deltas.end()) {
> while you're at it, change this to == and return
Done


http://gerrit.cloudera.org:8080/#/c/5127/7/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 266: // and tell the statestore. We also ensure that it is always 
registered with itself.
> 'registered with itself' sounds a bit odd. '.. that it is part of our backe
Done


Line 269: local_backend_descriptor_.address.hostname, nullptr));
> odd indentation
Done


Line 606:   BackendConfig coord_only_config;
> make this a class member and initialize in c'tor.
Done, but needed to initialize it in Init() since we need to lookup the local 
IP address first, which is done there.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-23 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5127

to look at the new patch set (#8).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config always contains the
local backend. It also changes scheduling when exec_at_coord is true to
always use the local backend, irrespective of whether it is present in
the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
3 files changed, 159 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 7: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 201:   if (topic != incoming_topic_deltas.end()) {
while you're at it, change this to == and return


http://gerrit.cloudera.org:8080/#/c/5127/7/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 266: // and tell the statestore. We also ensure that it is always 
registered with itself.
'registered with itself' sounds a bit odd. '.. that it is part of our backend 
config'?


Line 269: local_backend_descriptor_.address.hostname, nullptr));
odd indentation


Line 606:   BackendConfig coord_only_config;
make this a class member and initialize in c'tor.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-22 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 6:

(1 comment)

Thanks for the reviews!

http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 250: << be_desc.address;
> Could use continue; here to be consistent with the above checks.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-22 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5127

to look at the new patch set (#7).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config always contains the
local backend. It also changes scheduling when exec_at_coord is true to
always use the local backend, irrespective of whether it is present in
the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
2 files changed, 89 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 6: Code-Review+1

(1 comment)

Thanks for your patience, I think this is looking good.

http://gerrit.cloudera.org:8080/#/c/5127/6/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 250: << be_desc.address;
Could use continue; here to be consistent with the above checks.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-21 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#6).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config always contains the
local backend. It also changes scheduling when exec_at_coord is true to
always use the local backend, irrespective of whether it is present in
the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
2 files changed, 91 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5127/3/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 204: if (delta.is_delta && delta.topic_entries.empty() && 
delta.topic_deletions.empty()) {
> Can you comment on what this achieves (it's an optimisation to avoid a copy
Done in the next PS


http://gerrit.cloudera.org:8080/#/c/5127/4/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 261: if (new_backend_config->NumBackends() > 0
> I'm not sure about the NumBackends() > 0 check.
The reasoning behind the check was that it prevents empty messages from the 
statestore (heartbeats) from setting up the local backend as the only known 
backend. If first an empty heartbeat arrives, then a query, and then the 
initial statestore message, then the whole query will be assigned to the 
scheduler. Assuming the first message by the statestore contains the cluster 
membership, I removed the >0 check.


Line 264:   new_backend_config->AddBackend(local_backend_descriptor_);
> Do we also need to add this to current_membership_? I think  we want curren
So far current_membership_ has mirrored the statestore's view. Adding the local 
backend here will prevent the code below from registering it with the 
statestore. current_membership_ is also not used outside this method. Do you 
see a way to keep them consistent while still tracking both the local and the 
statestore views?


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#5).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config is either empty or
contains the local backend as well. It also changes scheduling when
exec_at_coord is true to always use the local backend, irrespective of
whether it is present in the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
2 files changed, 85 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 4:

(2 comments)

Just saw PS4 (which invalidates some of my PS3 comments).

http://gerrit.cloudera.org:8080/#/c/5127/4/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 261: if (new_backend_config->NumBackends() > 0
I'm not sure about the NumBackends() > 0 check.

I think the only case when we have an empty set of backends now is when there 
are no other registered backends, which means the cluster includes only the 
current impalad. In that case I think we want the backend config to include the 
current backend, rather than waiting for the change to propagate.


Line 264:   new_backend_config->AddBackend(local_backend_descriptor_);
Do we also need to add this to current_membership_? I think  we want 
current_membership_ and the BackendConfig to be consistent.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 3:

(4 comments)

It would still be nice to remove the situation between the first topic update 
and the second where we have all of the backends except the local one. I feel 
like that scenario just makes things hard to reason about.

http://gerrit.cloudera.org:8080/#/c/5127/3/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 204: if (delta.is_delta && delta.topic_entries.empty() && 
delta.topic_deletions.empty()) {
Can you comment on what this achieves (it's an optimisation to avoid a copy 
rather than a correctness thing, right?)


Line 251:   new_backend_config->AddBackend(be_desc);
Maybe we should just ignore updates for the local backend here.


Line 255: for (const string& backend_id: delta.topic_deletions) {
And here


Line 265: if (current_membership_.find(local_backend_id_) == 
current_membership_.end()) {
And then add the local backend directly instead of waiting for it to propagate 
back from the statestore.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#4).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check when handling updates from the
statestore to make sure that the backend config is either empty or
contains the local backend as well. It also changes scheduling when
exec_at_coord is true to always use the local backend, irrespective of
whether it is present in the backend config.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
2 files changed, 77 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 3:

> I think it would make sense to add the local backend to the list of
 > backends at the same time as we process the initial update. That
 > way we only have two states: no backend, and all the backends. I
 > think that's what you were suggesting. Scheduling fails if we have
 > no backends, right?

I agree. To make sure we only have these two states I added the local backend 
to the backend_state every time it is missing and the backend_state is not 
empty. A few tests had removed the first backend and now needed to be adjusted 
to remove a different backend to be effective.

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 3:

I think it would make sense to add the local backend to the list of backends at 
the same time as we process the initial update. That way we only have two 
states: no backend, and all the backends. I think that's what you were 
suggesting. Scheduling fails if we have no backends, right?

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

Thank you for your comments. I modified the change to always schedule at the 
coordinator if exec_at_coord is true. However, I think that scheduling at the 
coordinator if !exec_at_coord and no other backends are registered is not what 
we would want, since that way a large query can be completely scheduled at the 
coordinator, just because it has not received the initial update from the 
statestored. It is also the behavior of Impala 2.6 to abort scheduling in this 
case.

There is still a small time window during daemon startup in which the 
statestore could have sent the initial update, to which the coordinator will 
reply by registering itself with the statestore. If we hit that window before 
the statestore sends back the local backend's registration, then the 
coordinator will not schedule scan ranges on itself.

Should we change that, too? For example, we could add the local backend to the 
list of backends if it is non-empty but does not contain the local scheduler.

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#3).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check to ensure that the local backend
has been registered with the scheduler before issuing scan ranges when
exec_at_coord is set.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
2 files changed, 55 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-18 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#2).

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check to ensure that the local backend
has been registered with the scheduler before issuing scan ranges when
exec_at_coord is set.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
2 files changed, 55 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

(1 comment)

It might be best to special-case the local backend. At a high level this is the 
most intuitive thing to do. It seems like we already special-case the local 
backend to some extent in UpdateMembership(). Otherwise waiting until the 
membership update is received may make sense.

http://gerrit.cloudera.org:8080/#/c/5127/1/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 300:   BackendConfigPtr backend_config = GetBackendConfig();
I think it would make sense to detect at this point if the backend config 
doesn't have the coordinator in it. Then we could wait for the config to be 
updated (maybe with a timeout) before failing the query.

Even if we don't crash, it seems very strange for a coordinator to not schedule 
a query on itself in the !exec_on_coord case.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

Thanks for your feedback. I agree that this is not optimal, and if there was a 
way to send the equivalent of EAGAIN to the client and make it try again either 
with this or a different coordinator that could be a workaround. I'd also be 
happy to add a small timeout of a few seconds before returning an error. I'm 
rather reluctant to change the way we register the local backend with itself, 
since that way we will have two mechanisms of updating the membership 
information and need to handle conflicts as well.

That being said, between a) Relying on the client to try again, b) waiting for 
5 seconds (or whatever value), and c) reworking the cluster membership 
information management code in the scheduler, which one do you prefer?

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

This is better than crashing, but it also seems like we shouldn't fail queries 
for this reason. Probably we should wait until we've gotten the statestore 
update message before trying to schedule queries (either that or make sure the 
local backend is always present).

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5127

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check to ensure that the local backend
has been registered with the scheduler before issuing scan ranges when
exec_at_coord is set.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
M common/thrift/generate_error_codes.py
3 files changed, 32 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker