[jira] [Comment Edited] (MESOS-8247) Executor registered message is lost

2018-01-11 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322114#comment-16322114
 ] 

Alexander Rukletsov edited comment on MESOS-8247 at 1/11/18 12:09 PM:
--

{noformat}
Commit: 9c03a463c1ac8f63dc00255945a04016c45f04e9 [9c03a46]
Author: Alexander Rukletsov 
Date: 11 January 2018 at 13:07:29 GMT+1
Committer: Alexander Rukletsov 

Logged socket create/connect failures on warning level.

If a socket create or connect failure occurs during link() or send(),
the reason for error is not propagated to the library user. Also, the
data being sent or queued is silently dropped on the floor. The socket
code does not log itself on a higher level when an error situation
occurs. The only trace is the log entries touched in this patch:
having them at warning level will significantly simplify debugging.

This patch also consistently logs send / link target.

Review: https://reviews.apache.org/r/65048/
{noformat}
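
The idea in the commit message above can be illustrated with a small sketch. This is not libprocess code: the function name `send_or_warn`, the logger name, and the injectable `connect` parameter are assumptions made for illustration. It only shows that when a connect fails and the payload is dropped, a warning-level entry naming the target and the reason is the one trace left behind.

```python
import logging
import socket

# Hypothetical logger name; not a real Mesos or libprocess logger.
logger = logging.getLogger("sketch.send")


def send_or_warn(target, data, connect=socket.create_connection):
    """Try to deliver `data` to `target`; log a warning if the socket fails.

    Returns True when the data was handed to the socket and False when it
    was dropped. `connect` is injectable so the failure path can be
    exercised without a network.
    """
    host, port = target
    try:
        sock = connect((host, port), timeout=1.0)
    except OSError as error:
        # The data is still dropped, but the drop now leaves a visible
        # trace that names the target and the reason.
        logger.warning(
            "Failed to send %d bytes to %s:%d, connect: %s",
            len(data), host, port, error)
        return False
    with sock:
        sock.sendall(data)
    return True
```

With Python's default logging configuration, warnings are emitted while debug-level entries are not, which mirrors why moving these entries to warning level makes a dropped message easier to spot.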


was (Author: alexr):
{noformat}
Commit: 9c03a463c1ac8f63dc00255945a04016c45f04e9 [9c03a46]
Parents: 164d99e1be
Author: Alexander Rukletsov 
Date: 11 January 2018 at 13:07:29 GMT+1
Committer: Alexander Rukletsov 
Labels: HEAD -> master

Logged socket create/connect failures on warning level.

If a socket create or connect failure occurs during link() or send(),
the reason for error is not propagated to the library user. Also, the
data being sent or queued is silently dropped on the floor. The socket
code does not log itself on a higher level when an error situation
occurs. The only trace is the log entries touched in this patch:
having them at warning level will significantly simplify debugging.

This patch also consistently logs send / link target.

Review: https://reviews.apache.org/r/65048/
{noformat}

> Executor registered message is lost
> ---
>
> Key: MESOS-8247
> URL: https://issues.apache.org/jira/browse/MESOS-8247
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Assignee: Andrei Budnik
>
> h3. Brief description of successful agent-executor communication.
> The executor sends a `RegisterExecutorMessage` to the agent during its
> initialization step. The agent responds with an `ExecutorRegisteredMessage`
> from its `registerExecutor()` method. Whenever the executor receives
> `ExecutorRegisteredMessage`, it prints `Executor registered on agent...`
> to its stderr log.
> h3. Problem description.
> The agent launches built-in docker executor, which is stuck in `STAGING` 
> state.
> stderr logs of the docker executor:
> {code}
> I1114 23:03:17.919090 14322 exec.cpp:162] Version: 1.2.3
> {code}
> It doesn't contain a message like `Executor registered on agent...`. At the
> same time, the agent received `RegisterExecutorMessage` and sent a `runTask`
> message to the executor.
> The stdout log consists of the same repeating message:
> {code}
> Received killTask for task ...
> {code}
> Also, the docker executor process has no child processes.
> Currently, the executor
> [doesn't attempt|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L320]
> to launch a task if it is not registered with the agent, while
> [task killing|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L343]
> doesn't have such a check.
> It looks like `ExecutorRegisteredMessage` has been lost.
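
The asymmetry described in the issue (launch is gated on registration, kill is not) can be modelled with a toy state machine. This is a sketch, not the code in `exec.cpp`: the class `ExecutorSketch` and its method names are hypothetical, and only the two log strings are taken from the description.

```python
class ExecutorSketch:
    """Toy model of the registration check described above (not Mesos code)."""

    def __init__(self):
        self.registered = False
        self.launched = []   # task IDs that were actually launched
        self.log = []        # lines the executor would print

    def on_registered(self):
        # Corresponds to receiving `ExecutorRegisteredMessage`.
        self.registered = True
        self.log.append("Executor registered on agent")

    def on_run_task(self, task_id):
        # Launch requests are dropped until registration is confirmed.
        if not self.registered:
            return
        self.launched.append(task_id)

    def on_kill_task(self, task_id):
        # No registration check here: kill requests always reach this
        # handler and are logged, even when no task was ever launched.
        self.log.append(f"Received killTask for task {task_id}")


# The registration confirmation is lost: `on_registered` is never called.
executor = ExecutorSketch()
executor.on_run_task("task-1")
executor.on_kill_task("task-1")
executor.on_kill_task("task-1")
```

In this model the log ends up holding only repeated `Received killTask` lines and no task is ever launched, which matches the symptoms reported in the stdout and stderr logs.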



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-8247) Executor registered message is lost

2017-12-04 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262862#comment-16262862
 ] 

Alexander Rukletsov edited comment on MESOS-8247 at 12/4/17 6:00 PM:
-

There are two problems here.

1. The docker executor does not receive the registration confirmation from the
agent, though the agent sends it out. In other words,
{{ExecutorRegisteredMessage}} is lost. I do not yet know why the message was
lost. All messages except task status updates have an "at-most-once" delivery
policy, so this is theoretically possible. I will continue the investigation
after fixing the problem mentioned below.

2. If the docker executor receives a kill task request and the task has never
been launched, the request is ignored. We now know that the executor never
received the registration confirmation, hence ignored the launch task request,
hence the task never started. This is how the executor enters an idle state,
waiting for registration and ignoring kill task requests. This problem is
captured by MESOS-8297.
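
To make the "at-most-once" point in item 1 concrete, here is a minimal simulation. It assumes a scripted lossy channel; `LossyChannel`, `send_at_most_once`, and `send_until_delivered` are hypothetical names, and the retrying sender is included only as a contrast case, not as a description of how Mesos delivers status updates.

```python
class LossyChannel:
    """Delivers messages unless the scripted drop pattern says otherwise."""

    def __init__(self, drops):
        self.drops = list(drops)  # True means "drop the next message"
        self.delivered = []

    def send(self, message):
        dropped = self.drops.pop(0) if self.drops else False
        if not dropped:
            self.delivered.append(message)
        return not dropped


def send_at_most_once(channel, message):
    # Fire and forget: the sender never learns whether the message arrived.
    channel.send(message)


def send_until_delivered(channel, message, max_attempts=5):
    # Contrast case: retry until the channel reports delivery, standing in
    # for an acknowledgement from the receiver.
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    return None


once = LossyChannel(drops=[True])
send_at_most_once(once, "ExecutorRegisteredMessage")

retried = LossyChannel(drops=[True, True])
attempts = send_until_delivered(retried, "status update")
```

With a single drop, the at-most-once sender leaves `once.delivered` empty and nothing signals the loss, while the retrying sender gets its message through on the third attempt.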


was (Author: alexr):
There are two problems here.

1. The docker executor does not receive the registration confirmation from the
agent, though the agent sends it out. In other words,
{{ExecutorRegisteredMessage}} is lost. I do not yet know why the message was
lost. All messages except task status updates have an "at-most-once" delivery
policy, so this is theoretically possible. I will continue the investigation
after fixing the problem mentioned below.

2. If the docker executor receives a kill task request and the task has never
been launched, the request is ignored. We now know that the executor never
received the registration confirmation, hence ignored the launch task request,
hence the task never started. This is how the executor enters an idle state,
waiting for registration and ignoring kill task requests.

These patches ensure that the driver-based executors react to kill task
requests even if the task has not been launched:
https://reviews.apache.org/r/64032/
https://reviews.apache.org/r/64033/






[jira] [Comment Edited] (MESOS-8247) Executor registered message is lost

2017-11-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262862#comment-16262862
 ] 

Alexander Rukletsov edited comment on MESOS-8247 at 11/23/17 10:41 AM:
---

There are two problems here.

1. The docker executor does not receive the registration confirmation from the
agent, though the agent sends it out. In other words,
{{ExecutorRegisteredMessage}} is lost. I do not yet know why the message was
lost. All messages except task status updates have an "at-most-once" delivery
policy, so this is theoretically possible. I will continue the investigation
after fixing the problem mentioned below.

2. If the docker executor receives a kill task request and the task has never
been launched, the request is ignored. We now know that the executor never
received the registration confirmation, hence ignored the launch task request,
hence the task never started. This is how the executor enters an idle state,
waiting for registration and ignoring kill task requests.

These patches ensure that the driver-based executors react to kill task
requests even if the task has not been launched:
https://reviews.apache.org/r/64032/
https://reviews.apache.org/r/64033/


was (Author: alexr):
These patches ensure that the driver-based executors react to kill task
requests even if the task has not been launched:
https://reviews.apache.org/r/64032/
https://reviews.apache.org/r/64033/



