[jira] [Updated] (MESOS-6821) Override of automatic resources should be by exact match not substring

2017-01-19 Thread Bruce Merry (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Merry updated MESOS-6821:
---
Description: The agent code for auto-detecting resources (cpus, mem, disk) 
assumes that, say, "cpus" has been specified if the string "cpus" appears 
anywhere in the resource string (see 
[here|https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79]).
 This means that using a custom resource called, say, "members", will disable 
auto-detection of the "mem" resource.  (was: The agent code for auto-detecting 
resources (cpus, mem, disk) assumes that, say, "cpus" has been specified if the 
string "cpus" appears anywhere in the resource string (see 
[here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)).
 This means that using a custom resource called, say, "members", will disable 
auto-detection of the "mem" resource.)
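
To illustrate, a minimal standalone C++ sketch (not the actual Mesos code; the resource-string parsing below is simplified and ignores role annotations) contrasting the current substring behaviour with an exact match:

{code}
// Hypothetical stand-in for the agent's check: the substring test misfires
// when a custom resource name embeds a built-in name such as "mem".
#include <iostream>
#include <sstream>
#include <string>

// Substring check, similar in spirit to the current containerizer code.
bool containsSubstring(const std::string& resources, const std::string& name)
{
  return resources.find(name) != std::string::npos;
}

// Exact match: split "name:value;name:value" pairs and compare whole names.
bool containsExact(const std::string& resources, const std::string& name)
{
  std::istringstream stream(resources);
  std::string token;
  while (std::getline(stream, token, ';')) {
    if (token.substr(0, token.find(':')) == name) {
      return true;
    }
  }
  return false;
}

int main()
{
  const std::string resources = "members:100;disk:5000";

  // The substring check wrongly concludes "mem" was specified.
  std::cout << containsSubstring(resources, "mem") << std::endl;  // 1

  // The exact match correctly reports that "mem" was not specified.
  std::cout << containsExact(resources, "mem") << std::endl;      // 0
}
{code}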

> Override of automatic resources should be by exact match not substring
> --
>
> Key: MESOS-6821
> URL: https://issues.apache.org/jira/browse/MESOS-6821
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04 x86_64
>Reporter: Bruce Merry
>Priority: Minor
>  Labels: newbie
>
> The agent code for auto-detecting resources (cpus, mem, disk) assumes that, 
> say, "cpus" has been specified if the string "cpus" appears anywhere in the 
> resource string (see 
> [here|https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79]).
>  This means that using a custom resource called, say, "members", will disable 
> auto-detection of the "mem" resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6821) Override of automatic resources should be by exact match not substring

2017-01-19 Thread Bruce Merry (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831350#comment-15831350
 ] 

Bruce Merry commented on MESOS-6821:


I'm going to look into this now.

> Override of automatic resources should be by exact match not substring
> --
>
> Key: MESOS-6821
> URL: https://issues.apache.org/jira/browse/MESOS-6821
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04 x86_64
>Reporter: Bruce Merry
>Priority: Minor
>  Labels: newbie
>
> The agent code for auto-detecting resources (cpus, mem, disk) assumes that, 
> say, "cpus" has been specified if the string "cpus" appears anywhere in the 
> resource string (see 
> [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)).
>  This means that using a custom resource called, say, "members", will disable 
> auto-detection of the "mem" resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6821) Override of automatic resources should be by exact match not substring

2017-01-19 Thread Bruce Merry (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Merry updated MESOS-6821:
---
Description: The agent code for auto-detecting resources (cpus, mem, disk) 
assumes that, say, "cpus" has been specified if the string "cpus" appears 
anywhere in the resource string (see 
[here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)).
 This means that using a custom resource called, say, "members", will disable 
auto-detection of the "mem" resource.  (was: The agent code for auto-detecting 
resources (cpus, mem, disk) assumes that, say, "cpus" has been specified in the 
string "cpus" appears anywhere in the resource string (see 
[here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)).
 This means that using a custom resource called, say, "members", will disable 
auto-detection of the "mem" resource.)

> Override of automatic resources should be by exact match not substring
> --
>
> Key: MESOS-6821
> URL: https://issues.apache.org/jira/browse/MESOS-6821
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04 x86_64
>Reporter: Bruce Merry
>Priority: Minor
>  Labels: newbie
>
> The agent code for auto-detecting resources (cpus, mem, disk) assumes that, 
> say, "cpus" has been specified if the string "cpus" appears anywhere in the 
> resource string (see 
> [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)).
>  This means that using a custom resource called, say, "members", will disable 
> auto-detection of the "mem" resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish Kumar updated MESOS-6952:
-
Description: 
The task was stuck in the staging state for almost 6 hours, even after its 
executor had terminated on the slave. Because the task was stuck in staging, 
the framework received no update from the mesos-master.

The issue was resolved by restarting the slave, after which the task moved 
from staging to the task-lost state.

In the slave logs I can see "Asked to run task ... which is terminating/terminated":
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097453 107774 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
'ct:148481682:0:foocare_zendesk_round_robin:' which is 
terminating/terminated
{noformat}

Full slave log:
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858510 107759 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858762 107759 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 

[jira] [Commented] (MESOS-6944) Mesos - AD integration Process / Example

2017-01-19 Thread Rahul Bhardwaj (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831294#comment-15831294
 ] 

Rahul Bhardwaj commented on MESOS-6944:
---

Hi [~klueska], [~arojas], 

Thanks a lot for sharing these details. We will look into our options.

It would be good if, in the future, you provided this option in Mesos itself, 
just as Marathon does.

 


Thanks
Rahul

> Mesos - AD integration Process / Example
> 
>
> Key: MESOS-6944
> URL: https://issues.apache.org/jira/browse/MESOS-6944
> Project: Mesos
>  Issue Type: Task
>  Components: modules
>Reporter: Rahul Bhardwaj
>  Labels: mesosphere
>
> Hi Team,
> We are trying to configure AD authentication with Mesos for HTTP endpoints 
> (the UI only). 
> But we couldn't find any clear documentation or example on your site 
> http://mesos.apache.org/ that shows the process of integrating with AD 
> (LDAP). Also, we could not find a reference to any existing LDAP library to 
> use with Mesos on the Modules page.
> Authentication doc: 
> http://mesos.apache.org/documentation/latest/authentication/. 
> Module doc: http://mesos.apache.org/documentation/latest/modules/ 
> (Authentication section).
> Can you please tell us whether this feature is already available; example 
> documentation would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6940) Do not send offers to MULTI_ROLE schedulers if agent does not have MULTI_ROLE capability.

2017-01-19 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-6940:
--

Assignee: Jay Guo

> Do not send offers to MULTI_ROLE schedulers if agent does not have MULTI_ROLE 
> capability.
> -
>
> Key: MESOS-6940
> URL: https://issues.apache.org/jira/browse/MESOS-6940
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Reporter: Benjamin Mahler
>Assignee: Jay Guo
>
> Old agents that do not have the MULTI_ROLE capability cannot correctly 
> receive tasks from schedulers that have the MULTI_ROLE capability *and are 
> using multiple roles*. In this case, we should not send the offer to the 
> scheduler, rather than sending an offer but rejecting the scheduler's 
> operations.
> Note also that since we allow a single role scheduler to upgrade into having 
> the MULTI_ROLE capability (use of the {{FrameworkInfo.roles}} field) so long 
> as they continue to use a single role (in phase 1 of multi-role support the 
> roles cannot be changed), we could continue sending offers if the scheduler 
> is MULTI_ROLE capable but only uses a single role.
> In phase 2 of multi-role support, we cannot safely allow a MULTI_ROLE 
> scheduler to receive resources from a non-MULTI_ROLE agent, so it seems we 
> should simply disallow MULTI_ROLE schedulers from receiving offers from 
> non-MULTI_ROLE agents, regardless of how many roles the scheduler is using.
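
For illustration only, a stand-in sketch of the proposed allocator-side guard (the real check would operate on protobuf capability types inside the Mesos allocator; the structs and the {{shouldOffer}} helper below are invented):

{code}
#include <algorithm>
#include <string>
#include <vector>

// Simplified stand-ins for framework/agent capability metadata.
struct Framework
{
  std::vector<std::string> capabilities;  // e.g., {"MULTI_ROLE"}
  std::vector<std::string> roles;
};

struct Agent
{
  std::vector<std::string> capabilities;
};

bool hasCapability(const std::vector<std::string>& capabilities,
                   const std::string& name)
{
  return std::find(capabilities.begin(), capabilities.end(), name) !=
         capabilities.end();
}

// Decide whether the agent's resources may be offered to the framework.
bool shouldOffer(const Framework& framework, const Agent& agent)
{
  if (hasCapability(framework.capabilities, "MULTI_ROLE") &&
      !hasCapability(agent.capabilities, "MULTI_ROLE")) {
    // Per the phase 1 note above, an upgraded scheduler that still uses a
    // single role could keep receiving offers; multi-role ones could not.
    return framework.roles.size() <= 1;
  }

  return true;
}
{code}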



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6902) Add support for agent capabilities

2017-01-19 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831216#comment-15831216
 ] 

Jay Guo commented on MESOS-6902:


https://reviews.apache.org/r/55710/ Add agent capabilities to v0 master API 
/state

> Add support for agent capabilities
> --
>
> Key: MESOS-6902
> URL: https://issues.apache.org/jira/browse/MESOS-6902
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> Similarly to how we might add support for master capabilities (MESOS-5675), 
> agent capabilities would also make sense: in a mixed cluster, the master 
> might have support for features that are not present on certain agents, and 
> vice versa.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6876) Default "Accept" type for LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_OUTPUT should be streaming type

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6876:
--
Sprint: Mesosphere Sprint 50

> Default "Accept" type for LAUNCH_NESTED_CONTAINER_SESSION and 
> ATTACH_CONTAINER_OUTPUT should be streaming type
> --
>
> Key: MESOS-6876
> URL: https://issues.apache.org/jira/browse/MESOS-6876
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>Priority: Blocker
>
> Right now the default "Accept" type in the HTTP response to 
> LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_OUTPUT is 
> "application/json". This should instead be "application/json+recordio" or 
> whatever we decide the streaming type should be in MESOS-3601.
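
For context, a minimal sketch of RecordIO framing, the kind of streaming encoding the ticket refers to (each record is framed as its length in decimal ASCII, a newline, then the bytes; the JSON payloads below are made up):

{code}
#include <iostream>
#include <string>

// Frame one record: decimal length, newline, then the record bytes.
std::string encode(const std::string& record)
{
  return std::to_string(record.size()) + "\n" + record;
}

int main()
{
  // Two JSON messages framed into a single streamable body.
  std::cout << encode("{\"type\":\"ATTACH_CONTAINER_OUTPUT\"}")
            << encode("{\"data\":\"aGVsbG8=\"}");
}
{code}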



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to launcher->fork()`

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830997#comment-15830997
 ] 

Adam B commented on MESOS-6553:
---

Can you (or your shepherd, [~jieyu]?) add the commits and close with 
appropriate fixVersion?

> Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` 
> to launcher->fork()`
> 
>
> Key: MESOS-6553
> URL: https://issues.apache.org/jira/browse/MESOS-6553
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>  Labels: tech-debt
>
> Currently, we receive a bunch of {{ContainerLaunchInfo}} structs from each of 
> our isolators and extract information from them, which we pass one by one to 
> our {{launcher->fork()}} call in separate parameters.
> Instead, we should construct a new {{ContainerLaunchInfo}} which is the 
> concatenation of the ones returned by each isolator, and pass this new one 
> down to {{launcher->fork()}} instead of building up individual arguments.
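
Conceptually, the change amounts to something like the following sketch (the struct is a stand-in; the real {{ContainerLaunchInfo}} is a protobuf message, where concatenation would boil down to merging repeated fields):

{code}
#include <string>
#include <vector>

// Stand-in for the protobuf message; only two representative fields shown.
struct ContainerLaunchInfo
{
  std::vector<std::string> preExecCommands;
  std::vector<std::string> environment;
};

// Concatenate the launch infos returned by each isolator into a single one,
// which is then passed to launcher->fork() as one parameter.
ContainerLaunchInfo concatenate(
    const std::vector<ContainerLaunchInfo>& launchInfos)
{
  ContainerLaunchInfo merged;

  for (const ContainerLaunchInfo& launchInfo : launchInfos) {
    merged.preExecCommands.insert(
        merged.preExecCommands.end(),
        launchInfo.preExecCommands.begin(),
        launchInfo.preExecCommands.end());

    merged.environment.insert(
        merged.environment.end(),
        launchInfo.environment.begin(),
        launchInfo.environment.end());
  }

  return merged;
}
{code}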



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6639) Update 'io::redirect()' to take an optional vector of callback hooks.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830996#comment-15830996
 ] 

Adam B commented on MESOS-6639:
---

[~jieyu] Want to close this with FixVersion=1.2.0 and the appropriate commit(s) 
in a comment?

> Update 'io::redirect()' to take an optional vector of callback hooks.
> -
>
> Key: MESOS-6639
> URL: https://issues.apache.org/jira/browse/MESOS-6639
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>
> These callback hooks should be invoked before passing any data read from
> the 'from' file descriptor on to the 'to' file descriptor.
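
A minimal synchronous stand-in for the proposed interface (the real {{io::redirect()}} in libprocess is asynchronous and works on file descriptors; the stream-based version below only illustrates where the hooks would fire):

{code}
#include <functional>
#include <istream>
#include <ostream>
#include <string>
#include <vector>

using Hook = std::function<void(const std::string&)>;

// Copy data from 'from' to 'to', invoking every hook on each chunk read
// before forwarding it, as the ticket describes.
void redirect(std::istream& from,
              std::ostream& to,
              const std::vector<Hook>& hooks = {})
{
  std::string chunk(4096, '\0');

  while (from.read(&chunk[0], chunk.size()) || from.gcount() > 0) {
    const std::string data = chunk.substr(0, from.gcount());

    for (const Hook& hook : hooks) {
      hook(data);  // E.g., feed a container logger.
    }

    to.write(data.data(), data.size());
  }
}
{code}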



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6714) Port `slave_tests.cpp`

2017-01-19 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830972#comment-15830972
 ] 

Joseph Wu commented on MESOS-6714:
--

Some progress: 
{code}
commit d56139556ae41d3f47fb5b391e071d409832edb9
Author: Alex Clemmer 
Date:   Wed Jan 18 14:49:24 2017 -0800

Windows: Added more agent tests.

These tests can pass with some minor scripting changes
(changing the sleep command to a Windows-compatible command)
and due to fixes to subprocess lifecycles with Job Objects.

Review: https://reviews.apache.org/r/55314/
{code}

> Port `slave_tests.cpp`
> --
>
> Key: MESOS-6714
> URL: https://issues.apache.org/jira/browse/MESOS-6714
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: microsoft, windows-mvp
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6707) Port `gc_tests.cpp`

2017-01-19 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830967#comment-15830967
 ] 

Joseph Wu edited comment on MESOS-6707 at 1/20/17 1:17 AM:
---

{code}
commit 5b52217f34197a459fffe3c09be9167046be9df6
Author: Alex Clemmer 
Date:   Wed Jan 18 14:53:38 2017 -0800

Windows: Fixed hanging symlink bug in `os::rmdir`.

The Windows implementation of `os::rmdir` will fail to delete "hanging"
symlinks (i.e., symlinks whose targets do not exist). Note that on
Windows this bug is specific to symlinks whose targets are _deleted_,
since it is impossible to create a symlink whose target does not exist.

The primary issue that causes this problem is that it is very difficult
to tell whether a symlink points at a directory or a file unless you
resolve the symlink and determine whether the target is a directory or a
file. In situations where the target does not exist, we can't use this
information, and so `os::rmdir` occasionally mis-routes a symlink to
(what was) a directory to a `::remove` call, which will fail with a
cryptic error.

To fix this behavior, this commit will introduce code that simply tries
to remove the reparse point with both `RemoveDirectory` and
`DeleteFile`, and if either succeeds, we report success for the
operation. This represents a "best effort"; in the case that the reparse
point represents something more exotic than a symlink, we will still
fail, but by choosing not to verify whether the target is a directory or
a file, we simplify the code and still obtain the outcome of having
deleted the directory.

This commit is the primary blocker for MESOS-6707, as deleting the Agent
sandbox will sometimes cause us to delete the latest run directory for
the executor before the symlinked `latest` directory itself. This causes
the delete to fail, and then the GC tests to fail, since they tend to
assert the directory does not exist.

Review: https://reviews.apache.org/r/55327/
{code}
{code}
commit 08e5cd2580a142977b2d8a3abf2a70a398147f01
Author: Alex Clemmer 
Date:   Wed Jan 18 14:59:17 2017 -0800

Windows: Added GC tests to the build.

These tests are fixed by the fix to `os::rmdir` in review #55327.
The tests were failing to delete sandbox folders when the sandbox
was deleted before deleting the symlink to the sandbox.

Review: https://reviews.apache.org/r/55328/
{code}
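
The "best effort" strategy described above boils down to something like this Windows-only sketch (the actual fix lives in stout's {{os::rmdir}}; {{removeReparsePoint}} is an illustrative stand-in):

{code}
#include <windows.h>

#include <string>

// Delete a reparse point (e.g., a symlink whose target may no longer exist)
// without resolving whether it pointed at a directory or a file.
bool removeReparsePoint(const std::wstring& path)
{
  // A directory symlink must be removed with RemoveDirectory...
  if (::RemoveDirectoryW(path.c_str())) {
    return true;
  }

  // ...while a file symlink must be removed with DeleteFile. Trying both and
  // accepting either success avoids resolving a possibly-dangling target.
  return ::DeleteFileW(path.c_str()) != 0;
}
{code}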



> Port `gc_tests.cpp`
> ---
>
> Key: MESOS-6707
> URL: https://issues.apache.org/jira/browse/MESOS-6707
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: microsoft, windows-mvp
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6357:
--
Target Version/s:   (was: 1.1.1, 1.2.0)

> `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.
> 
>
> Key: MESOS-6357
> URL: https://issues.apache.org/jira/browse/MESOS-6357
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.1.0
> Environment: Debian 8 with SSL enabled
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: flaky-test
>
> {noformat}
> [00:21:51] :   [Step 10/10] [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.357839 23530 
> containerizer.cpp:202] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.361143 23530 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.366930 23547 
> containerizer.cpp:557] Recovering containerizer
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.367962 23551 provisioner.cpp:253] 
> Provisioner recovery complete
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.368253 23549 
> containerizer.cpp:954] Starting container 
> 42589936-56b2-4e41-86d8-447bfaba4666 for executor 'executor' of framework 
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.368577 23548 cgroups.cpp:404] 
> Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_458f8018-67e7-4cc6-8126-a535974db35d/42589936-56b2-4e41-86d8-447bfaba4666'
>  for container 42589936-56b2-4e41-86d8-447bfaba4666
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.369863 23544 cpu.cpp:103] Updated 
> 'cpu.shares' to 1024 (cpus 1) for container 
> 42589936-56b2-4e41-86d8-447bfaba4666
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.370384 23545 
> containerizer.cpp:1443] Launching 'mesos-containerizer' with flags 
> '--command="{"shell":true,"value":"read key <&30"}" --help="false" 
> --pipe_read="30" --pipe_write="34" 
> --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" 
> --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666"
>  --unshare_namespace_mnt="false" 
> --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0"'
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.370483 23544 
> linux_launcher.cpp:421] Launching container 
> 42589936-56b2-4e41-86d8-447bfaba4666 and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.374867 23545 
> containerizer.cpp:1480] Checkpointing container's forked pid 14139 to 
> '/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_gzjeKG/meta/slaves/frameworks/executors/executor/runs/42589936-56b2-4e41-86d8-447bfaba4666/pids/forked.pid'
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.376519 23551 
> containerizer.cpp:1648] Starting nested container 
> 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.377296 23549 
> containerizer.cpp:1443] Launching 'mesos-containerizer' with flags 
> '--command="{"shell":true,"value":"sleep 1000"}" --help="false" 
> --pipe_read="30" --pipe_write="34" 
> --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" 
> --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8"
>  --unshare_namespace_mnt="false" 
> --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8"'
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.377424 23548 
> linux_launcher.cpp:421] Launching nested container 
> 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8 and 
> cloning with namespaces CLONE_NEWNS | CLONE_NEWPID
> [00:21:51] :   [Step 10/10] Executing pre-exec command 
> 

[jira] [Commented] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830960#comment-15830960
 ] 

Adam B commented on MESOS-6357:
---

No progress in over a month. Dropping from the 1.2 release until somebody 
updates otherwise.

> `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.
> 
>
> Key: MESOS-6357
> URL: https://issues.apache.org/jira/browse/MESOS-6357
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.1.0
> Environment: Debian 8 with SSL enabled
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: flaky-test
>

[jira] [Commented] (MESOS-6339) Support docker registry that requires basic auth.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830956#comment-15830956
 ] 

Adam B commented on MESOS-6339:
---

If this isn't In Progress yet, I doubt it'll land in time for 1.2, right? Shall 
we drop/defer it?

> Support docker registry that requires basic auth.
> -
>
> Key: MESOS-6339
> URL: https://issues.apache.org/jira/browse/MESOS-6339
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Gilbert Song
>
> Currently, we assume Bearer auth (in the Mesos containerizer) because it's 
> what Docker Hub uses. We also need to support Basic auth for some private 
> registries that people deploy.
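
For reference, a small client-side sketch of what Basic auth involves: unlike Bearer auth, there is no token exchange, and the header is derived directly from the credentials (the {{base64}} helper below is a minimal illustration, not Mesos code):

{code}
#include <string>

// Minimal base64 encoder, sufficient for "user:password" credentials.
std::string base64(const std::string& input)
{
  static const char* alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string output;
  unsigned int buffer = 0;  // Holds at most 13 meaningful low bits.
  int bits = 0;             // Number of meaningful bits in 'buffer'.

  for (unsigned char c : input) {
    buffer = (buffer << 8) | c;
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      output += alphabet[(buffer >> bits) & 0x3F];
      buffer &= (1u << bits) - 1;  // Drop the bits just emitted.
    }
  }

  if (bits > 0) {
    output += alphabet[(buffer << (6 - bits)) & 0x3F];
  }

  while (output.size() % 4 != 0) {
    output += '=';  // Pad to a multiple of four characters.
  }

  return output;
}

// E.g., basicAuthHeader("user", "secret") yields
// "Authorization: Basic dXNlcjpzZWNyZXQ=".
std::string basicAuthHeader(const std::string& user, const std::string& pass)
{
  return "Authorization: Basic " + base64(user + ":" + pass);
}
{code}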



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6506) Show framework info in /state and /frameworks for frameworks that have orphan tasks

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6506:
--
Target Version/s:   (was: 1.2.0)

> Show framework info in /state and /frameworks for frameworks that have orphan 
> tasks
> ---
>
> Key: MESOS-6506
> URL: https://issues.apache.org/jira/browse/MESOS-6506
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> Since Mesos 1.0, the master has access to FrameworkInfo of frameworks that 
> have orphan tasks. So we could expose this information in /state and 
> /frameworks endpoints. Note that this information is already present in the 
> v1 operator API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to launcher->fork()`

2017-01-19 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830954#comment-15830954
 ] 

Kevin Klues commented on MESOS-6553:


This one is already completed too.




> Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` 
> to launcher->fork()`
> 
>
> Key: MESOS-6553
> URL: https://issues.apache.org/jira/browse/MESOS-6553
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>  Labels: tech-debt
>
> Currently, we receive a bunch of {{ContainerLaunchInfo}} structs from each of 
> our isolators and extract information from them, which we pass one by one to 
> our {{launcher->fork()}} call in separate parameters.
> Instead, we should construct a new {{ContainerLaunchInfo}} which is the 
> concatenation of the ones returned by each isolator, and pass this new one 
> down to {{launcher->fork()}} instead of building up individual arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6506) Show framework info in /state and /frameworks for frameworks that have orphan tasks

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830953#comment-15830953
 ] 

Adam B commented on MESOS-6506:
---

No progress in over a month. Dropping from the 1.2 release until somebody 
updates otherwise.

> Show framework info in /state and /frameworks for frameworks that have orphan 
> tasks
> ---
>
> Key: MESOS-6506
> URL: https://issues.apache.org/jira/browse/MESOS-6506
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> Since Mesos 1.0, the master has access to FrameworkInfo of frameworks that 
> have orphan tasks. So we could expose this information in /state and 
> /frameworks endpoints. Note that this information is already present in the 
> v1 operator API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6843) Fetcher should not assume stdout/stderr in the sandbox.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6843:
--
Target Version/s: 1.3.0  (was: 1.2.0)

> Fetcher should not assume stdout/stderr in the sandbox.
> ---
>
> Key: MESOS-6843
> URL: https://issues.apache.org/jira/browse/MESOS-6843
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Jie Yu
>Priority: Critical
>  Labels: mesosphere
>
> If a container logger is used, this assumption might not hold. For instance, 
> a journald logger might redirect all task logs to journald. So in theory, 
> the fetcher log should go to journald as well, rather than being written to 
> sandbox/stdout and sandbox/stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6035) Add non-recursive version of cgroups::get

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6035:
--
Target Version/s:   (was: 1.2.0)

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need to get the top-level cgroups instead of getting 
> all cgroups recursively. Adding a non-recursive version could help avoid 
> traversing unnecessary paths.
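
A rough sketch of what the non-recursive variant could look like, using std::filesystem as a stand-in for the actual cgroups code ({{getTopLevel}} is illustrative only):

{code}
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Return only the immediate child cgroups of the given cgroup directory
// (e.g., "/sys/fs/cgroup/freezer/mesos"), without walking the whole tree.
std::vector<std::string> getTopLevel(const std::string& path)
{
  std::vector<std::string> cgroups;

  for (const fs::directory_entry& entry : fs::directory_iterator(path)) {
    if (entry.is_directory()) {
      cgroups.push_back(entry.path().filename().string());
    }
  }

  return cgroups;
}
{code}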



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6540:
--
Target Version/s:   (was: 1.2.0)

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> In order to do this properly, we should pull the "init" process out of the 
> container and update 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830935#comment-15830935
 ] 

Adam B commented on MESOS-6540:
---

No progress in a month. Dropping from the 1.2 release until somebody updates 
otherwise.

> Pass the forked pid from `containerizer launch` to the agent and checkpoint 
> it.
> ---
>
> Key: MESOS-6540
> URL: https://issues.apache.org/jira/browse/MESOS-6540
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Right now the agent only knows about the pid of the "init" process forked by 
> {{launcher->fork()}}. However, in order to properly enter the namespaces of a 
> task for a nested container, we actually need the pid of the process that 
> gets launched by the {{containerizer launch}} binary.
> Using this pid, isolators can properly enter the namespaces of the actual 
> *task* or *executor* launched by the {{containerizer launch}} binary instead 
> of just the namespaces of the "init" process (which may be different).
> In order to do this properly, we should pull the "init" process out of the 
> container and update 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6639) Update 'io::redirect()' to take an optional vector of callback hooks.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830934#comment-15830934
 ] 

Adam B commented on MESOS-6639:
---

No progress in a month. Dropping from the 1.2 release until somebody updates 
otherwise.

> Update 'io::redirect()' to take an optional vector of callback hooks.
> -
>
> Key: MESOS-6639
> URL: https://issues.apache.org/jira/browse/MESOS-6639
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>
> These callback hooks should be invoked before passing any data read from
> the 'from' file descriptor on to the 'to' file descriptor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6667) Update vendored ZooKeeper to 3.4.9

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6667:
--
Target Version/s: 1.3.0  (was: 1.2.0)

> Update vendored ZooKeeper to 3.4.9
> --
>
> Key: MESOS-6667
> URL: https://issues.apache.org/jira/browse/MESOS-6667
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>  Labels: mesosphere
>
> 3.4.9 has a few notable fixes for the C client library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to launcher->fork()`

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6553:
--
Target Version/s:   (was: 1.2.0)

> Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` 
> to launcher->fork()`
> 
>
> Key: MESOS-6553
> URL: https://issues.apache.org/jira/browse/MESOS-6553
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>  Labels: tech-debt
>
> Currently, we receive a {{ContainerLaunchInfo}} struct from each of our 
> isolators, extract information from it, and pass the pieces one by one to our 
> {{launcher->fork()}} call as separate parameters.
> Instead, we should construct a new {{ContainerLaunchInfo}} which is the 
> concatenation of the ones returned by each isolator, and pass this single 
> message down to {{launcher->fork()}} instead of building up individual 
> arguments.
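> A minimal sketch of the concatenation, assuming {{ContainerLaunchInfo}} is a 
> protobuf message (so {{MergeFrom()}} appends repeated fields):
> {code}
> // 'launchInfos' is assumed to be a std::vector<ContainerLaunchInfo>,
> // with one entry returned per isolator.
> ContainerLaunchInfo merged;
> for (const ContainerLaunchInfo& launchInfo : launchInfos) {
>   merged.MergeFrom(launchInfo);
> }
> // 'merged' is then passed to launcher->fork() as a single argument.
> {code}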



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6542) Pull the current "init" process for a container out of the container.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6542:
--
Target Version/s:   (was: 1.2.0)

> Pull the current "init" process for a container out of the container.
> -
>
> Key: MESOS-6542
> URL: https://issues.apache.org/jira/browse/MESOS-6542
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>
> Currently the mesos agent is in control of the "init" process launched inside 
> of a container. However, in order to properly support things like 
> systemd-in-a-container, we need to allow users to control the init process 
> that ultimately gets launched.
> We will still need to fork a process equivalent to the current "init" 
> process, but it shouldn't be placed inside the container itself (instead, it 
> should be the parent process of whatever init process it is directed to 
> launch).
> In order to do this properly, we will need to rework some of the logic in 
> {{launcher->fork()}} to allow this new parent process to do the namespace 
> entering / cloning instead of {{launcher->fork()}} itself.
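> A minimal sketch, assuming Linux, of how this new parent process could do the 
> cloning itself via {{clone(2)}} (hypothetical helper, not the actual launcher 
> code):
> {code}
> // Requires _GNU_SOURCE on glibc for clone().
> #include <sched.h>
> #include <signal.h>
> #include <unistd.h>
> 
> static int runInit(void* arg)
> {
>   char** argv = static_cast<char**>(arg);
>   execvp(argv[0], argv);
>   return 1; // Only reached if exec() fails.
> }
> 
> // Clones the user-supplied init into fresh mount and PID namespaces;
> // the child becomes pid 1 inside the container.
> pid_t spawnInit(char** argv, char* stackTop)
> {
>   return clone(runInit, stackTop, CLONE_NEWNS | CLONE_NEWPID | SIGCHLD, argv);
> }
> {code}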



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830923#comment-15830923
 ] 

Adam B commented on MESOS-6743:
---

No progress in a month. Dropping from the 1.2 release until somebody updates 
otherwise.

> Docker executor hangs forever if `docker stop` fails.
> -
>
> Key: MESOS-6743
> URL: https://issues.apache.org/jira/browse/MESOS-6743
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.0.1, 1.1.0
>Reporter: Alexander Rukletsov
>  Labels: mesosphere
>
> If {{docker stop}} finishes with an error status, the executor should catch 
> this and react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
> and what to do if {{docker stop}} continues to fail.
> 2. Unmark the task as {{killed}}. This will allow frameworks to retry the kill. 
> However, in this case it is unclear what status updates we should send: 
> {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill 
> a task? or set a specific reason in {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is 
> killed or notify the framework and the operator that the container may still 
> be running.
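> For option (1), a minimal sketch of a bounded retry, assuming a hypothetical 
> blocking {{runDockerStop()}} helper that returns true on success:
> {code}
> bool runDockerStop(); // Hypothetical: shells out to 'docker stop <id>'.
> 
> bool stopWithRetries(int maxAttempts)
> {
>   for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
>     if (runDockerStop()) {
>       return true;
>     }
>   }
>   return false; // Still failing: fall back to option (2) or (3) above.
> }
> {code}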



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6683) Return error from recordio::Reader if data is still buffered when EOF reached.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830925#comment-15830925
 ] 

Adam B commented on MESOS-6683:
---

No progress in a month. Dropping from the 1.2 release until somebody updates 
otherwise.

> Return error from recordio::Reader if data is still buffered when EOF reached.
> --
>
> Key: MESOS-6683
> URL: https://issues.apache.org/jira/browse/MESOS-6683
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Anand Mazumdar
>  Labels: bug, mesosphere
>
> Right now, whenever EOF is reached a {{None()}} is returned to indicate that 
> no more records will be read.
> However, we should only return {{None()}} if we reach EOF and there are no 
> bytes in the reader's internal data buffer. If there are bytes in the buffer, 
> that indicates that a *partial* record has been read, but EOF was reached 
> before reading a full record. We should return an error in this case.
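> A minimal sketch of the intended EOF handling (not the actual 
> {{recordio::Reader}} code), assuming stout's {{Result}} and {{Error}} types:
> {code}
> #include <string>
> 
> #include <stout/error.hpp>
> #include <stout/result.hpp>
> 
> // Called when EOF is reached; 'buffer' holds bytes of a partial record.
> Result<std::string> onEof(const std::string& buffer)
> {
>   if (buffer.empty()) {
>     return Result<std::string>::none(); // Clean end of stream.
>   }
>   return Error(
>       "Premature EOF: " + std::to_string(buffer.size()) +
>       " bytes of a partial record are still buffered");
> }
> {code}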



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6622) NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage is flaky

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6622:
--
Target Version/s:   (was: 1.2.0)

> NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage is flaky
> --
>
> Key: MESOS-6622
> URL: https://issues.apache.org/jira/browse/MESOS-6622
> Project: Mesos
>  Issue Type: Bug
>  Components: flaky, tests
>Affects Versions: 1.1.0
>Reporter: Joseph Wu
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: mesosphere, newbie
> Attachments: gpu-test.log
>
>
> This test occasionally times out after one minute:
> {code}
> I1122 02:07:25.721348  2328 slave.cpp:4263] Received ping from 
> slave-observer(563)@172.16.10.39:45772
> I1122 02:07:25.728559  2324 slave.cpp:5122] Terminating executor 
> ''b5a3a115-27da-4b81-902e-b99602f902a6' of framework 
> 42a4cb0e-aea9-4b9d-8bab-3279ee5a7b8b-' because it did not register within 
> 1mins
> I1122 02:07:25.728667  2330 containerizer.cpp:2038] Destroying container 
> b4711187-157c-421e-a6d9-9fa32a6e263c in PROVISIONING state
> I1122 02:07:25.728734  2330 containerizer.cpp:2093] Waiting for the 
> provisioner to complete provisioning before destroying container 
> b4711187-157c-421e-a6d9-9fa32a6e263c
> {code}
> The test itself has a future that waits for 2 minutes for the executor to 
> start up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6641) Remove deprecated hooks from our module API.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6641:
--
Target Version/s: 1.3.0  (was: 1.2.0)

> Remove deprecated hooks from our module API.
> 
>
> Key: MESOS-6641
> URL: https://issues.apache.org/jira/browse/MESOS-6641
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Till Toenshoff
>Priority: Minor
>  Labels: deprecation, hooks, tech-debt
>
> By now we have at least one deprecated hook in our modules API, namely 
> {{slavePreLaunchDockerHook}}. 
> A new hook is now being introduced which deprecates 
> {{slavePreLaunchDockerEnvironmentDecorator}} as well.
> We need to actually remove those deprecated hooks while making the community 
> aware; this ticket tracks that work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6827) Fix the order in which "self.hpp" is included in "self.cpp".

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6827:
--
Target Version/s:   (was: 1.2.0)

> Fix the order in which "self.hpp" is included in "self.cpp".
> 
>
> Key: MESOS-6827
> URL: https://issues.apache.org/jira/browse/MESOS-6827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: newbie
>
> According to our 
> [styleguide|https://github.com/apache/mesos/blob/master/docs/c%2B%2B-style-guide.md#order-of-includes],
>  each {{.cpp}} file should include the related {{.hpp}} first to ensure that 
> a header file always includes all symbols it requires. However, our codebase 
> does not follow this rule strictly.
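> For example (hypothetical file names), the rule means {{foo.cpp}} should start 
> like this, so a missing include in {{foo.hpp}} fails to compile here rather 
> than in some unrelated translation unit:
> {code}
> // foo.cpp
> #include "foo.hpp"   // The related header comes first.
> 
> #include <string>    // Then standard library headers.
> 
> #include "bar.hpp"   // Then other project headers.
> {code}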



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6959) Separate the mesos-containerizer binary into a static binary, which only depends on stout

2017-01-19 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6959:
-
Component/s: (was: ke)

> Separate the mesos-containerizer binary into a static binary, which only 
> depends on stout
> -
>
> Key: MESOS-6959
> URL: https://issues.apache.org/jira/browse/MESOS-6959
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Joseph Wu
>  Labels: cmake, mesosphere, microsoft
>
> The {{mesos-containerizer}} binary currently has [three 
> commands|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/main.cpp#L46-L48]:
> * 
> [MesosContainerizerLaunch|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/launch.cpp]
> * 
> [MesosContainerizerMount|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/mount.cpp]
> * 
> [NetworkCniIsolatorSetup|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1776-L1997]
> These commands are all heavily dependent on stout, and have no need to be 
> linked to libprocess.  In fact, adding an erroneous call to 
> {{process::initialize}} (either explicitly, or by accidentally using a 
> libprocess method) will break {{mesos-containerizer}} and cause several Mesos 
> containerizer tests to fail.  (The tasks fail to launch, saying {{Failed to 
> synchronize with agent (it's probably exited)}}).
> Because this binary only depends on stout, we can separate it from the other 
> source files and make this a static binary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6959) Separate the mesos-containerizer binary into a static binary, which only depends on stout

2017-01-19 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-6959:


 Summary: Separate the mesos-containerizer binary into a static 
binary, which only depends on stout
 Key: MESOS-6959
 URL: https://issues.apache.org/jira/browse/MESOS-6959
 Project: Mesos
  Issue Type: Task
  Components: ke, cmake
Reporter: Joseph Wu


The {{mesos-containerizer}} binary currently has [three 
commands|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/main.cpp#L46-L48]:

* 
[MesosContainerizerLaunch|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/launch.cpp]
* 
[MesosContainerizerMount|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/mount.cpp]
* 
[NetworkCniIsolatorSetup|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1776-L1997]

These commands are all heavily dependent on stout, and have no need to be 
linked to libprocess.  In fact, adding an erroneous call to 
{{process::initialize}} (either explicitly, or by accidentally using a 
libprocess method) will break {{mesos-containerizer}} and cause several Mesos 
containerizer tests to fail.  (The tasks fail to launch, saying {{Failed to 
synchronize with agent (it's probably exited)}}).

Because this binary only depends on stout, we can separate it from the other 
source files and make this a static binary.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3542) Separate libmesos into compiling from many binaries.

2017-01-19 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3542:
-
Epic Name: lib-breakdown

> Separate libmesos into compiling from many binaries.
> 
>
> Key: MESOS-3542
> URL: https://issues.apache.org/jira/browse/MESOS-3542
> Project: Mesos
>  Issue Type: Epic
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: cmake, mesosphere, microsoft, windows-mvp
>
> Historically, libmesos has been built as one huge monolithic binary. Another 
> idea would be to build it from a set of smaller libraries (_e.g._, libagent, 
> _etc_.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3542) Separate libmesos into compiling from many binaries.

2017-01-19 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3542:
-
Issue Type: Epic  (was: Task)

> Separate libmesos into compiling from many binaries.
> 
>
> Key: MESOS-3542
> URL: https://issues.apache.org/jira/browse/MESOS-3542
> Project: Mesos
>  Issue Type: Epic
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: cmake, mesosphere, microsoft, windows-mvp
>
> Historically, libmesos has been built as one huge monolithic binary. Another 
> idea would be to build it from a set of smaller libraries (_e.g._, libagent, 
> _etc_.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6858) network/cni isolator generates incomplete resolv.conf

2017-01-19 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-6858:
--

Assignee: James Peach

> network/cni isolator generates incomplete resolv.conf
> -
>
> Key: MESOS-6858
> URL: https://issues.apache.org/jira/browse/MESOS-6858
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network
>Reporter: James Peach
>Assignee: James Peach
>
> The CNI [network 
> configuration|https://github.com/containernetworking/cni/blob/master/SPEC.md#network-configuration]
>  dictionary contains the {{/etc/resolv.conf}}-related fields {{nameservers}}, 
> {{domain}}, {{search}} and {{options}}.
> In {{NetworkCniIsolatorProcess::_isolate()}}, the {{network/cni}} isolator 
> only emits the {{nameservers}} field and ignores the remaining ones.
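> A minimal sketch (hypothetical helper, not the isolator code) of emitting all 
> of the DNS fields rather than just the nameservers:
> {code}
> #include <sstream>
> #include <string>
> #include <vector>
> 
> std::string resolvConf(
>     const std::vector<std::string>& nameservers,
>     const std::string& domain,
>     const std::vector<std::string>& search,
>     const std::vector<std::string>& options)
> {
>   std::ostringstream out;
>   for (const std::string& ip : nameservers) {
>     out << "nameserver " << ip << "\n";
>   }
>   if (!domain.empty()) {
>     out << "domain " << domain << "\n";
>   }
>   if (!search.empty()) {
>     out << "search";
>     for (const std::string& s : search) { out << " " << s; }
>     out << "\n";
>   }
>   if (!options.empty()) {
>     out << "options";
>     for (const std::string& o : options) { out << " " << o; }
>     out << "\n";
>   }
>   return out.str();
> }
> {code}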



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend failed to mount rootfs.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6654:

Priority: Critical  (was: Blocker)

> Duplicate image layer ids may make the backend failed to mount rootfs.
> --
>
> Key: MESOS-6654
> URL: https://issues.apache.org/jira/browse/MESOS-6654
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: aufs, backend, containerizer
>
> Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in 
> manifest, which may leave some backends unable to mount the rootfs (e.g., 
> 'aufs' backend). We should make sure that each layer path returned in 
> 'ImageInfo' is unique.
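> A minimal sketch (hypothetical helper) of de-duplicating the layer ids while 
> preserving their order:
> {code}
> #include <set>
> #include <string>
> #include <vector>
> 
> std::vector<std::string> uniqueLayers(const std::vector<std::string>& layers)
> {
>   std::set<std::string> seen;
>   std::vector<std::string> result;
>   for (const std::string& layer : layers) {
>     if (seen.insert(layer).second) { // True only on first occurrence.
>       result.push_back(layer);
>     }
>   }
>   return result;
> }
> {code}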
> Here is an example manifest from 'mesosphere/inky':
> {noformat}
> [20:13:08]W:   [Step 10/10]"name": "mesosphere/inky",
> [20:13:08]W:   [Step 10/10]"tag": "latest",
> [20:13:08]W:   [Step 10/10]"architecture": "amd64",
> [20:13:08]W:   [Step 10/10]"fsLayers": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   }
> [20:13:08]W:   [Step 10/10]],
> [20:13:08]W:   [Step 10/10]"history": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop)
>  ENTRYPOINT 
> [echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> 

[jira] [Updated] (MESOS-6958) Support linux filesystem type detection.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6958:

Priority: Critical  (was: Blocker)

> Support linux filesystem type detection.
> 
>
> Key: MESOS-6958
> URL: https://issues.apache.org/jira/browse/MESOS-6958
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: filesystem, linux
>
> We should support detecting a Linux filesystem type (e.g., xfs, extfs) and 
> its filesystem id mapping.
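> A minimal sketch, assuming Linux, of detecting the type via {{statfs(2)}}, 
> which reports a filesystem magic number in {{f_type}}:
> {code}
> #include <sys/vfs.h>
> 
> #include <string>
> 
> std::string fsType(const std::string& path)
> {
>   struct statfs buf;
>   if (statfs(path.c_str(), &buf) != 0) {
>     return "unknown";
>   }
>   switch (buf.f_type) {
>     case 0x58465342: return "xfs";        // XFS_SUPER_MAGIC
>     case 0xEF53:     return "extfs";      // EXT2/3/4_SUPER_MAGIC
>     case 0x794c7630: return "overlayfs";  // OVERLAYFS_SUPER_MAGIC
>     default:         return "unknown";
>   }
> }
> {code}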



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6653:

Priority: Critical  (was: Blocker)

> Overlayfs backend may fail to mount the rootfs if both container image and 
> image volume are specified.
> --
>
> Key: MESOS-6653
> URL: https://issues.apache.org/jira/browse/MESOS-6653
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: backend, containerizer, overlayfs
>
> Depending on MESOS-6000, we use a symlink to shorten the overlayfs mounting 
> arguments. However, if more than one image needs to be provisioned (e.g., a 
> container image is specified while image volumes are specified for the same 
> container), creating the symlink .../backends/overlay/links fails because it 
> already exists.
> Here is a simple log when we hard-code overlayfs as our default backend:
> {noformat}
> [07:02:45] :   [Step 10/10] [ RUN  ] 
> Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0
> [07:02:46] :   [Step 10/10] I1127 07:02:46.416021  2919 
> containerizer.cpp:207] Using isolation: 
> filesystem/linux,volume/image,docker/runtime,network/cni
> [07:02:46] :   [Step 10/10] I1127 07:02:46.419312  2919 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [07:02:46] :   [Step 10/10] E1127 07:02:46.425336  2919 shell.hpp:107] 
> Command 'hadoop version 2>&1' failed; this is the output:
> [07:02:46] :   [Step 10/10] sh: 1: hadoop: not found
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425379  2919 fetcher.cpp:69] 
> Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to 
> create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
> either not found or exited with a non-zero exit status: 127
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425452  2919 local_puller.cpp:94] 
> Creating local puller with docker registry '/tmp/R6OUei/registry'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427258  2934 
> containerizer.cpp:956] Starting container 
> 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of 
> framework 
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427592  2938 
> metadata_manager.cpp:167] Looking for image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427774  2936 local_puller.cpp:147] 
> Untarring image 'test_image_rootfs' from 
> '/tmp/R6OUei/registry/test_image_rootfs.tar' to 
> '/tmp/R6OUei/store/staging/9krDz2'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512070  2933 local_puller.cpp:167] 
> The repositories JSON file for image 'test_image_rootfs' is 
> '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512279  2933 local_puller.cpp:295] 
> Extracting layer tar ball 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar
>  to rootfs 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617442  2937 
> metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617908  2938 provisioner.cpp:286] 
> Image layers: 1
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617925  2938 provisioner.cpp:296] 
> Should hit here
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617949  2938 provisioner.cpp:315] 
> : bind
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617959  2938 provisioner.cpp:315] 
> : overlay
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617967  2938 provisioner.cpp:315] 
> : copy
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617974  2938 provisioner.cpp:318] 
> Provisioning image rootfs 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7'
>  for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618408  2936 overlay.cpp:175] 
> Created symlink 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links'
>  -> '/tmp/DQ3blT'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618472  2936 overlay.cpp:203] 
> Provisioning image rootfs with overlayfs: 
> 

[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6001:

Priority: Critical  (was: Blocker)

> Aufs backend cannot support the image with numerous layers.
> ---
>
> Key: MESOS-6001
> URL: https://issues.apache.org/jira/browse/MESOS-6001
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any other os with aufs module
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: aufs, backend, containerizer
>
> This issue was exposed in the unit test 
> `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually 
> specifying the `bind` backend. Most likely, mounting aufs with specific 
> options is limited by the option string length.
> {noformat}
> [20:13:07] :   [Step 10/10] [ RUN  ] 
> DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] 
> Opened db in 8.148813ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] 
> Compacted db in 3.126629ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] 
> Created db iterator in 4410ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] 
> Seeked to beginning of db in 763ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 491ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] 
> Starting replica recovery
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] 
> Replica is in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628113 23431 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5852)@172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] 
> Updating replica status to STARTING
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] 
> Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) 
> started on 172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs"
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/OZHDIQ/credentials'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629114 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 

[jira] [Updated] (MESOS-6913) AgentAPIStreamingTest.AttachInputToNestedContainerSession fails on Mac OS.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6913:
--
Fix Version/s: 1.3.0

> AgentAPIStreamingTest.AttachInputToNestedContainerSession fails on Mac OS.
> --
>
> Key: MESOS-6913
> URL: https://issues.apache.org/jira/browse/MESOS-6913
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: Mac OS 10.11.6 with Apple clang-703.0.31
>Reporter: Alexander Rukletsov
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] 
> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
> make[3]: *** [check-local] Illegal instruction: 4
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6954) Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent

2017-01-19 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-6954:
--

Assignee: Kevin Klues

> Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent
> 
>
> Key: MESOS-6954
> URL: https://issues.apache.org/jira/browse/MESOS-6954
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: debugging, mesosphere
>
> Attempting to run {{LAUNCH_NESTED_CONTAINER}} with a parent container that 
> was launched with the docker containerizer causes the agent to crash as 
> below. We should add a safeguard in the handler to fail gracefully instead.
> {noformat}
> I0119 21:41:42.438295  3281 http.cpp:304] HTTP POST for /slave(1)/api/v1 from 
> 10.0.7.194:46700 with User-Agent='python-requests/2.12.4' with 
> X-Forwarded-For='10.0.6.162'
> I0119 21:41:42.441571  3281 http.cpp:465] Processing call 
> LAUNCH_NESTED_CONTAINER_SESSION
> W0119 21:41:42.442286  3281 http.cpp:2251] Failed to launch nested container 
> 62a16556-9c3b-48f2-aa1e-ba1d70093637.09a9d3b0-a245-4aa1-94f1-d10a13526b9b: 
> Unsupported
> F0119 21:41:42.442371  3282 docker.cpp:2013] Check failed: 
> !containerId.has_parent()
> *** Check failure stack trace: ***
> @ 0x7f539aca01ad  google::LogMessage::Fail()
> @ 0x7f539aca1fdd  google::LogMessage::SendToLog()
> @ 0x7f539ac9fd9c  google::LogMessage::Flush()
> @ 0x7f539aca28d9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f539a46e2cd  
> mesos::internal::slave::DockerContainerizerProcess::destroy()
> @ 0x7f539a48a8a7  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIbN5mesos8internal5slave26DockerContainerizerProcessERKNS5_11ContainerIDEbS9_bEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSG_FSE_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f539ac14ca1  process::ProcessManager::resume()
> @ 0x7f539ac1dba7  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f53990a5d73  (unknown)
> @ 0x7f5398ba652c  (unknown)
> @ 0x7f53988e41dd  (unknown)
> {noformat}
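> A minimal sketch of the proposed safeguard: reject nested container ids up 
> front instead of tripping the {{CHECK}} in docker.cpp (a fragment; 
> {{Failure}} is libprocess's error type):
> {code}
> // At the top of the docker containerizer's launch/destroy paths:
> if (containerId.has_parent()) {
>   return Failure(
>       "The docker containerizer does not support nested containers");
> }
> {code}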



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-19 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6780:
---
Target Version/s:   (was: 1.2.0)

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled
> I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master!
> I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar
> I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 4.131072ms
> I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations in 
> 27us; attempting to update the registry
> I1212 17:11:12.421799 14143488 registrar.cpp:506] Successfully updated the 
> registry in 4.10496ms
> I1212 17:11:12.421835 14143488 registrar.cpp:392] Successfully recovered 
> registrar
> I1212 17:11:12.421998 17362944 master.cpp:1684] Recovered 0 agents from the 
> registry (136B); allowing 10mins for agents to re-register
> I1212 17:11:12.422780 3971208128 containerizer.cpp:220] Using isolation: 
> 

[jira] [Comment Edited] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-19 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830838#comment-15830838
 ] 

Kevin Klues edited comment on MESOS-6780 at 1/19/17 11:44 PM:
--

I'm changing this bug to critical rather than blocker for 1.2 because:

1) I'm 99% positive this is a test bug, not an actual API bug

2) If it is an API bug, it should only affect top-level containers launched 
with a {{tty}} and then attached to via the {{ATTACH_CONTAINER_OUTPUT}} and 
{{ATTACH_CONTAINER_INPUT}} agent API calls. We don't have any external tooling 
that exercises these paths at the moment, so these APIs will mostly go unused 
in this release.


was (Author: klueska):
I'm changing this bug to critical rather than blocker for 1.2 because:

1) I'm 99% percent positive this is a test bug, not an actual API bug

2) If it is an API bug, it should only affect top-level containers launched 
with a {{tty}} and then to attached to via the {{ATTACH_CONTAINER_OUTPUT}} and 
{[ATTACH_CONTAINER_INPUT}} agent API calls. We don't have any external tooling 
that exercises these paths at the moment, so these APIs will mostly go unused 
in this release.

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 

[jira] [Updated] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably

2017-01-19 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6780:
---
Priority: Critical  (was: Blocker)

I'm changing this bug to critical rather than blocker for 1.2 because:

1) I'm 99% positive this is a test bug, not an actual API bug

2) If it is an API bug, it should only affect top-level containers launched 
with a {{tty}} and then attached to via the {{ATTACH_CONTAINER_OUTPUT}} and 
{{ATTACH_CONTAINER_INPUT}} agent API calls. We don't have any external tooling 
that exercises these paths at the moment, so these APIs will mostly go unused 
in this release.

> ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
> --
>
> Key: MESOS-6780
> URL: https://issues.apache.org/jira/browse/MESOS-6780
> Project: Mesos
>  Issue Type: Bug
> Environment: Mac OS 10.12, clang version 4.0.0 
> (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) 
> (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), 
> libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46
>Reporter: Benjamin Bannier
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: mesosphere
> Attachments: attach_container_input_no_ssl.log
>
>
> The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both 
> {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized 
> build.
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ContentType/AgentAPIStreamingTest
> [ RUN  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' 
> authorizer
> I1212 17:11:12.393844 17362944 master.cpp:380] Master 
> c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on 
> 172.18.8.114:51059
> I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials"
>  --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master"
>  --zk_session_timeout="10secs"
> I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1212 17:11:12.394691 17362944 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials'
> I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL
> I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled
> I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master!
> I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar
> I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the 
> registry (0B) in 4.131072ms
> I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations 

[jira] [Updated] (MESOS-6948) AgentAPITest.LaunchNestedContainerSession is flaky

2017-01-19 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6948:
---
Priority: Critical  (was: Blocker)

I'm moving this to a Critical bug rather than a blocker because:

1) It only happens very rarely ([~greggomann] can get it to trigger 
periodically inside his CentOS vagrant image, but nowhere else)

2) We haven't seen it manifest in practice with the CLI tool we built around 
these APIs (e.g. I have no problem doing a quick `dcos task exec  printf 
output` and getting the output back).

3) Even if there is an error in the wild, it's very rare and only happens at 
connection time. After the connection is established, things should run smoothly.

> AgentAPITest.LaunchNestedContainerSession is flaky
> --
>
> Key: MESOS-6948
> URL: https://issues.apache.org/jira/browse/MESOS-6948
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: CentOS 7 VM, libevent and SSL enabled
>Reporter: Greg Mann
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: debugging, tests
> Attachments: AgentAPITest.LaunchNestedContainerSession.txt
>
>
> This was observed in a CentOS 7 VM, with libevent and SSL enabled:
> {code}
> I0118 22:17:23.528846  2887 http.cpp:464] Processing call 
> LAUNCH_NESTED_CONTAINER_SESSION
> I0118 22:17:23.530452  2887 containerizer.cpp:1807] Starting nested container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.532265  2887 containerizer.cpp:1831] Trying to chown 
> '/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e'
>  to user 'vagrant'
> I0118 22:17:23.535213  2887 switchboard.cpp:570] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430"
>  --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" 
> --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.537210  2887 switchboard.cpp:600] Created I/O switchboard 
> server (pid: 3335) listening on socket file 
> '/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430' for 
> container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.543665  2887 containerizer.cpp:1540] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"printf output && printf 
> error 
> 1>&2"},"environment":{},"err":{"fd":16,"type":"FD"},"in":{"fd":11,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"vagrant"}"
>  --pipe_read="12" --pipe_write="13" 
> --runtime_directory="/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_QVZGrY/containers/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e"
>  --unshare_namespace_mnt="false"'
> I0118 22:17:23.556032  2887 launcher.cpp:133] Forked child with pid '3337' 
> for container 
> '492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e'
> I0118 22:17:23.563900  2887 fetcher.cpp:349] Starting to fetch URIs for 
> container: 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e, 
> directory: 
> /tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.962441  2887 containerizer.cpp:2481] Container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e has 
> exited
> I0118 22:17:23.962484  2887 containerizer.cpp:2118] Destroying container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e in 
> RUNNING state
> I0118 22:17:23.962715  2887 launcher.cpp:149] Asked to destroy container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.977562  2887 process.cpp:3733] Failed to process request for 
> '/slave(69)/api/v1': Container has or is being destroyed
> W0118 22:17:23.978216  2887 http.cpp:2734] Failed to attach to nested 
> container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e: 
> Container has or is being destroyed
> I0118 22:17:23.978330  2887 process.cpp:1435] Returning '500 Internal 

[jira] [Updated] (MESOS-5931) Support auto backend in Unified Containerizer.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-5931:

Story Points: 8  (was: 3)

> Support auto backend in Unified Containerizer.
> --
>
> Key: MESOS-5931
> URL: https://issues.apache.org/jira/browse/MESOS-5931
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: backend, containerizer, mesosphere
>
> Currently in the Unified Containerizer, the copy backend is selected by 
> default. This is not ideal, especially for production environments: copying 
> a huge container image from the store to the provisioner can take a long 
> time.
> Ideally, we should support an `auto` backend, which would automatically 
> select the optimal backend for the image provisioner if the user does not 
> specify one via the agent flag.
> We should first design the selection logic in this ticket, to determine how 
> we want to choose the right backend (e.g., overlayfs or aufs should be 
> preferred if available from the kernel).
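> A minimal sketch of one possible selection order (hypothetical helper; the 
> real logic would probe the kernel for support):
> {code}
> #include <string>
> 
> std::string autoBackend(bool overlayfsSupported, bool aufsSupported)
> {
>   if (overlayfsSupported) { return "overlay"; }
>   if (aufsSupported)      { return "aufs"; }
>   return "copy"; // Always available, but slow for huge images.
> }
> {code}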



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6653:

Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49)

> Overlayfs backend may fail to mount the rootfs if both container image and 
> image volume are specified.
> --
>
> Key: MESOS-6653
> URL: https://issues.apache.org/jira/browse/MESOS-6653
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: backend, containerizer, overlayfs
>
> Depending on MESOS-6000, we use a symlink to shorten the overlayfs mounting 
> arguments. However, if more than one image needs to be provisioned (e.g., a 
> container image is specified while image volumes are specified for the same 
> container), creating the symlink .../backends/overlay/links fails because it 
> already exists.
> Here is a simple log when we hard-code overlayfs as our default backend:
> {noformat}
> [07:02:45] :   [Step 10/10] [ RUN  ] 
> Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0
> [07:02:46] :   [Step 10/10] I1127 07:02:46.416021  2919 
> containerizer.cpp:207] Using isolation: 
> filesystem/linux,volume/image,docker/runtime,network/cni
> [07:02:46] :   [Step 10/10] I1127 07:02:46.419312  2919 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [07:02:46] :   [Step 10/10] E1127 07:02:46.425336  2919 shell.hpp:107] 
> Command 'hadoop version 2>&1' failed; this is the output:
> [07:02:46] :   [Step 10/10] sh: 1: hadoop: not found
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425379  2919 fetcher.cpp:69] 
> Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to 
> create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
> either not found or exited with a non-zero exit status: 127
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425452  2919 local_puller.cpp:94] 
> Creating local puller with docker registry '/tmp/R6OUei/registry'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427258  2934 
> containerizer.cpp:956] Starting container 
> 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of 
> framework 
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427592  2938 
> metadata_manager.cpp:167] Looking for image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427774  2936 local_puller.cpp:147] 
> Untarring image 'test_image_rootfs' from 
> '/tmp/R6OUei/registry/test_image_rootfs.tar' to 
> '/tmp/R6OUei/store/staging/9krDz2'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512070  2933 local_puller.cpp:167] 
> The repositories JSON file for image 'test_image_rootfs' is 
> '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512279  2933 local_puller.cpp:295] 
> Extracting layer tar ball 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar
>  to rootfs 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617442  2937 
> metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617908  2938 provisioner.cpp:286] 
> Image layers: 1
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617925  2938 provisioner.cpp:296] 
> Should hit here
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617949  2938 provisioner.cpp:315] 
> : bind
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617959  2938 provisioner.cpp:315] 
> : overlay
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617967  2938 provisioner.cpp:315] 
> : copy
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617974  2938 provisioner.cpp:318] 
> Provisioning image rootfs 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7'
>  for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618408  2936 overlay.cpp:175] 
> Created symlink 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links'
>  -> '/tmp/DQ3blT'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618472  2936 overlay.cpp:203] 
> Provisioning image rootfs with 
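A minimal C++ sketch of one way to tolerate the collision described above: treat 
EEXIST on the 'links' symlink as reuse rather than a hard failure. Paths are 
hypothetical, and this is not the actual Mesos patch, which could equally make 
the link unique per rootfs:
{noformat}
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>

#include <unistd.h>  // symlink()

// Create the 'links' symlink only if it does not already exist; on
// EEXIST, reuse the existing link instead of failing the whole
// provisioning.
bool ensureLink(const std::string& target, const std::string& link)
{
  if (::symlink(target.c_str(), link.c_str()) == 0) {
    return true;  // Freshly created.
  }

  if (errno == EEXIST) {
    // A previous provisioning (e.g., for the container image) already
    // created the link; the image volume provisioning can reuse it.
    return true;
  }

  std::cerr << "symlink failed: " << std::strerror(errno) << std::endl;
  return false;
}

int main()
{
  // Mirrors '.../backends/overlay/links' -> '/tmp/DQ3blT' from the log.
  return ensureLink("/tmp/DQ3blT", "/tmp/backends-overlay-links") ? 0 : 1;
}
{noformat}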

[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6001:

Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49)

> Aufs backend cannot support the image with numerous layers.
> ---
>
> Key: MESOS-6001
> URL: https://issues.apache.org/jira/browse/MESOS-6001
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any other os with aufs module
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: aufs, backend, containerizer
>
> This issue was exposed in this unit test 
> `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually 
> specifying the `bind` backend. Most likely, mounting aufs with its specific 
> options is limited by the mount option string length.
> {noformat}
> [20:13:07] :   [Step 10/10] [ RUN  ] 
> DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] 
> Opened db in 8.148813ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] 
> Compacted db in 3.126629ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] 
> Created db iterator in 4410ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] 
> Seeked to beginning of db in 763ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 491ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] 
> Starting replica recovery
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] 
> Replica is in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628113 23431 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5852)@172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] 
> Updating replica status to STARTING
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] 
> Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) 
> started on 172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs"
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/OZHDIQ/credentials'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [20:13:07]W:   
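A minimal C++ sketch of one plausible mitigation for the string-length limit 
noted above: shorten each branch path through a symlink before assembling the 
aufs 'br=' option string, which the kernel copies into roughly one page. The 
paths and exact branch syntax are illustrative, not the actual Mesos code:
{noformat}
#include <cerrno>
#include <iostream>
#include <string>
#include <vector>

#include <unistd.h>  // symlink(), getpagesize()

int main()
{
  // Hypothetical long provisioner layer paths.
  std::vector<std::string> layers = {
    "/var/lib/mesos/provisioner/containers/abc/backends/aufs/layers/0",
    "/var/lib/mesos/provisioner/containers/abc/backends/aufs/layers/1",
  };

  std::string options = "br=";
  for (size_t i = 0; i < layers.size(); i++) {
    // Shorten each branch through a symlink like '/tmp/l0' -> <layer>.
    std::string link = "/tmp/l" + std::to_string(i);
    if (::symlink(layers[i].c_str(), link.c_str()) != 0 && errno != EEXIST) {
      std::cerr << "Failed to create short link " << link << std::endl;
      return 1;
    }

    options += (i == 0 ? "" : ":") + link + "=ro";
  }

  // The kernel copies mount options into a single page; very long
  // branch lists overflow it even after shortening.
  if (options.size() >= static_cast<size_t>(::getpagesize())) {
    std::cerr << "Mount options still too long for one page" << std::endl;
    return 1;
  }

  std::cout << "mount -t aufs -o '" << options << "' none <rootfs>" << std::endl;
  return 0;
}
{noformat}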

[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend fail to mount rootfs.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6654:

Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49)

> Duplicate image layer ids may make the backend fail to mount rootfs.
> --
>
> Key: MESOS-6654
> URL: https://issues.apache.org/jira/browse/MESOS-6654
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: aufs, backend, containerizer
>
> Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in the 
> manifest, which may leave some backends (e.g., the 'aufs' backend) unable to 
> mount the rootfs. We should make sure that each layer path returned in 
> 'ImageInfo' is unique.
> Here is an example manifest from 'mesosphere/inky':
> {noformat}
> [20:13:08]W:   [Step 10/10]"name": "mesosphere/inky",
> [20:13:08]W:   [Step 10/10]"tag": "latest",
> [20:13:08]W:   [Step 10/10]"architecture": "amd64",
> [20:13:08]W:   [Step 10/10]"fsLayers": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   }
> [20:13:08]W:   [Step 10/10]],
> [20:13:08]W:   [Step 10/10]"history": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop)
>  ENTRYPOINT 
> [echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> 
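A minimal C++ sketch of the proposed fix: deduplicate layer ids while preserving 
order, so that each layer path handed to the backend in 'ImageInfo' is unique. 
This is the idea only, using blobSum values from the manifest above:
{noformat}
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Return the layers with duplicates removed, keeping first-occurrence order.
std::vector<std::string> dedupe(const std::vector<std::string>& layers)
{
  std::set<std::string> seen;
  std::vector<std::string> result;

  for (const std::string& layer : layers) {
    if (seen.insert(layer).second) {  // True only on first occurrence.
      result.push_back(layer);
    }
  }

  return result;
}

int main()
{
  std::vector<std::string> layers = {
    "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
    "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
    "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66",
  };

  for (const std::string& layer : dedupe(layers)) {
    std::cout << layer << std::endl;  // Prints the two unique layers.
  }

  return 0;
}
{noformat}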

[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6001:

Priority: Blocker  (was: Major)

> Aufs backend cannot support the image with numerous layers.
> ---
>
> Key: MESOS-6001
> URL: https://issues.apache.org/jira/browse/MESOS-6001
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any other os with aufs module
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: aufs, backend, containerizer
>
> This issue was exposed in this unit test 
> `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually 
> specifying the `bind` backend. Most likely, mounting aufs with its specific 
> options is limited by the mount option string length.
> {noformat}
> [20:13:07] :   [Step 10/10] [ RUN  ] 
> DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] 
> Opened db in 8.148813ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] 
> Compacted db in 3.126629ms
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] 
> Created db iterator in 4410ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] 
> Seeked to beginning of db in 763ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 491ns
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] 
> Starting replica recovery
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] 
> Replica is in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628113 23431 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5852)@172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] 
> Updating replica status to STARTING
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] 
> Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) 
> started on 172.30.2.138:44256
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs"
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/OZHDIQ/credentials'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [20:13:07]W:   [Step 10/10] I0805 20:13:07.629114 23432 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 

[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend fail to mount rootfs.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6654:

Priority: Blocker  (was: Major)

> Duplicate image layer ids may make the backend fail to mount rootfs.
> --
>
> Key: MESOS-6654
> URL: https://issues.apache.org/jira/browse/MESOS-6654
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: aufs, backend, containerizer
>
> Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in the 
> manifest, which may leave some backends (e.g., the 'aufs' backend) unable to 
> mount the rootfs. We should make sure that each layer path returned in 
> 'ImageInfo' is unique.
> Here is an example manifest from 'mesosphere/inky':
> {noformat}
> [20:13:08]W:   [Step 10/10]"name": "mesosphere/inky",
> [20:13:08]W:   [Step 10/10]"tag": "latest",
> [20:13:08]W:   [Step 10/10]"architecture": "amd64",
> [20:13:08]W:   [Step 10/10]"fsLayers": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "blobSum": 
> "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
> [20:13:08]W:   [Step 10/10]   }
> [20:13:08]W:   [Step 10/10]],
> [20:13:08]W:   [Step 10/10]"history": [
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop)
>  ENTRYPOINT 
> [echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n"
> [20:13:08]W:   [Step 10/10]   },
> [20:13:08]W:   [Step 10/10]   {
> [20:13:08]W:   [Step 10/10]  "v1Compatibility": 
> 

[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6653:

Priority: Blocker  (was: Major)

> Overlayfs backend may fail to mount the rootfs if both container image and 
> image volume are specified.
> --
>
> Key: MESOS-6653
> URL: https://issues.apache.org/jira/browse/MESOS-6653
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: backend, containerizer, overlayfs
>
> Depending on MESOS-6000, we use a symlink to shorten the overlayfs mount 
> arguments. However, if more than one image needs to be provisioned (e.g., a 
> container image is specified while image volumes are specified for the same 
> container), the symlink .../backends/overlay/links would fail to be created 
> since it already exists.
> Here is a simple log when we hard code overlayfs as our default backend:
> {noformat}
> [07:02:45] :   [Step 10/10] [ RUN  ] 
> Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0
> [07:02:46] :   [Step 10/10] I1127 07:02:46.416021  2919 
> containerizer.cpp:207] Using isolation: 
> filesystem/linux,volume/image,docker/runtime,network/cni
> [07:02:46] :   [Step 10/10] I1127 07:02:46.419312  2919 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [07:02:46] :   [Step 10/10] E1127 07:02:46.425336  2919 shell.hpp:107] 
> Command 'hadoop version 2>&1' failed; this is the output:
> [07:02:46] :   [Step 10/10] sh: 1: hadoop: not found
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425379  2919 fetcher.cpp:69] 
> Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to 
> create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
> either not found or exited with a non-zero exit status: 127
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425452  2919 local_puller.cpp:94] 
> Creating local puller with docker registry '/tmp/R6OUei/registry'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427258  2934 
> containerizer.cpp:956] Starting container 
> 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of 
> framework 
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427592  2938 
> metadata_manager.cpp:167] Looking for image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427774  2936 local_puller.cpp:147] 
> Untarring image 'test_image_rootfs' from 
> '/tmp/R6OUei/registry/test_image_rootfs.tar' to 
> '/tmp/R6OUei/store/staging/9krDz2'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512070  2933 local_puller.cpp:167] 
> The repositories JSON file for image 'test_image_rootfs' is 
> '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512279  2933 local_puller.cpp:295] 
> Extracting layer tar ball 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar
>  to rootfs 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617442  2937 
> metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617908  2938 provisioner.cpp:286] 
> Image layers: 1
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617925  2938 provisioner.cpp:296] 
> Should hit here
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617949  2938 provisioner.cpp:315] 
> : bind
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617959  2938 provisioner.cpp:315] 
> : overlay
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617967  2938 provisioner.cpp:315] 
> : copy
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617974  2938 provisioner.cpp:318] 
> Provisioning image rootfs 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7'
>  for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618408  2936 overlay.cpp:175] 
> Created symlink 
> '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links'
>  -> '/tmp/DQ3blT'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.618472  2936 overlay.cpp:203] 
> Provisioning image rootfs with overlayfs: 
> 

[jira] [Updated] (MESOS-6504) Use 'geteuid()' for the root privileges check.

2017-01-19 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6504:

Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50  
(was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49)

> Use 'geteuid()' for the root privileges check.
> --
>
> Key: MESOS-6504
> URL: https://issues.apache.org/jira/browse/MESOS-6504
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: backend, isolator, mesosphere, user
>
> Currently, parts of the code in Mesos check for root privileges by comparing 
> os::user() to "root", which is not sufficient, since that compares the real 
> user. When the mesos binary is made 'setuid root', the real user is 
> unchanged, so checks based on it misjudge the privileges the process 
> actually has.
> We should check the effective user id instead in our code. 
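A minimal C++ sketch of the proposed check (not the actual Mesos code): compare 
the effective uid to root rather than the real user. With a 'setuid root' 
binary run by an unprivileged user, getuid() is non-zero but geteuid() is 0:
{noformat}
#include <iostream>

#include <unistd.h>  // getuid(), geteuid()

int main()
{
  // geteuid() reflects setuid bits; getuid() stays the invoking user.
  if (::geteuid() != 0) {
    std::cerr << "Requires root privileges (effective uid is "
              << ::geteuid() << ")" << std::endl;
    return 1;
  }

  std::cout << "Running with effective root privileges"
            << " (real uid " << ::getuid() << ")" << std::endl;
  return 0;
}
{noformat}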



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.

2017-01-19 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830821#comment-15830821
 ] 

Adam B commented on MESOS-6904:
---

Marking it "Critical" for 1.2 so it's not lost in the pool of "Major"s 
(default). We'll keep an eye on it, but won't hold the rc1 for it if all the 
real release-blockers are resolved. I can't imagine we'll be down to 0 Blockers 
before Tuesday.

> Perform batching of allocations to reduce allocator queue backlogging.
> --
>
> Key: MESOS-6904
> URL: https://issues.apache.org/jira/browse/MESOS-6904
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>Priority: Critical
>  Labels: allocator
>
> Per MESOS-3157:
> {quote}
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive-offers message takes too long to come to the head 
> of the queue).
> {quote}
> To remedy the above scenario, it is proposed to perform batching of the 
> enqueued allocation operations so that a single allocation operation can 
> satisfy N enqueued allocations. This should reduce the potential for 
> backlogging in the allocator. See the discussion 
> [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
>  in MESOS-3157.
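A minimal C++ sketch of the batching idea: remember that an allocation run is 
already pending and let one run satisfy all N triggers. This only illustrates 
the coalescing; the real allocator dispatches onto a libprocess queue:
{noformat}
#include <iostream>

class Allocator
{
public:
  // Called on every event that would normally trigger an allocation.
  void trigger()
  {
    if (pending) {
      coalesced++;  // Will be satisfied by the already-pending run.
      return;
    }
    pending = true;
  }

  // Called when the (single) enqueued allocation actually runs.
  void allocate()
  {
    std::cout << "One allocation run satisfying " << (1 + coalesced)
              << " trigger(s)" << std::endl;
    pending = false;
    coalesced = 0;
  }

private:
  bool pending = false;
  int coalesced = 0;
};

int main()
{
  Allocator allocator;

  // Three framework events arrive before the allocator gets to run...
  allocator.trigger();
  allocator.trigger();
  allocator.trigger();

  // ...but only one allocation is performed.
  allocator.allocate();
  return 0;
}
{noformat}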



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6958) Support linux filesystem type detection.

2017-01-19 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-6958:
---

 Summary: Support linux filesystem type detection.
 Key: MESOS-6958
 URL: https://issues.apache.org/jira/browse/MESOS-6958
 Project: Mesos
  Issue Type: Bug
Reporter: Gilbert Song
Assignee: Gilbert Song
Priority: Blocker


We should support detecting a Linux filesystem type (e.g., xfs, extfs) and its 
filesystem id mapping.
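A minimal C++ sketch of such detection using statfs(2), whose 'f_type' field is 
a magic number identifying the filesystem. Only two mappings are shown; a real 
implementation would cover more:
{noformat}
#include <iostream>

#include <sys/statfs.h>   // statfs()
#include <linux/magic.h>  // EXT4_SUPER_MAGIC, XFS_SUPER_MAGIC, ...

int main(int argc, char** argv)
{
  const char* path = argc > 1 ? argv[1] : "/";

  struct statfs buf;
  if (::statfs(path, &buf) != 0) {
    std::cerr << "statfs failed" << std::endl;
    return 1;
  }

  // ext2/3/4 share the same magic number, hence the generic "extfs".
  switch (buf.f_type) {
    case EXT4_SUPER_MAGIC: std::cout << "extfs" << std::endl; break;
    case XFS_SUPER_MAGIC:  std::cout << "xfs" << std::endl;   break;
    default:
      std::cout << "unknown (0x" << std::hex << buf.f_type << ")"
                << std::endl;
  }

  return 0;
}
{noformat}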



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.

2017-01-19 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6904:
--
Priority: Critical  (was: Major)

> Perform batching of allocations to reduce allocator queue backlogging.
> --
>
> Key: MESOS-6904
> URL: https://issues.apache.org/jira/browse/MESOS-6904
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>Priority: Critical
>  Labels: allocator
>
> Per MESOS-3157:
> {quote}
> Our deployment environments have a lot of churn, with many short-lived 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (e.g. a revive-offers message takes too long to come to the head 
> of the queue).
> {quote}
> To remedy the above scenario, it is proposed to perform batching of the 
> enqueued allocation operations so that a single allocation operation can 
> satisfy N enqueued allocations. This should reduce the potential for 
> backlogging in the allocator. See the discussion 
> [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
>  in MESOS-3157.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6900) Add test for framework upgrading to multi-role capability.

2017-01-19 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830754#comment-15830754
 ] 

Benjamin Mahler commented on MESOS-6900:


{noformat}
commit 052fb4414e2cce2b550ce0644f039b6d4a1876fa
Author: Benjamin Bannier 
Date:   Thu Jan 19 14:25:48 2017 -0800

Added a test for framework upgrading to MULTI_ROLE capability.

Review: https://reviews.apache.org/r/55381/
{noformat}

[~bbannier] do you want to add another test that ensures that frameworks can 
upgrade even when tasks are running, and that new tasks can be launched? We can 
do this in a separate ticket as we get closer to having a working 
implementation.

> Add test for framework upgrading to multi-role capability.
> --
>
> Key: MESOS-6900
> URL: https://issues.apache.org/jira/browse/MESOS-6900
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> Frameworks can upgrade to multi-role capability as long as the framework's 
> role remains the same.
> We consider the framework roles unchanged if 
> * a framework which previously didn't specify a {{role}} now has {{roles=()}}, or
> * a framework which previously had {{role=A}} now has {{roles=(A)}}.
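A minimal C++ sketch of this equivalence check, with made-up representations 
(an empty string standing in for an unset {{role}}):
{noformat}
#include <iostream>
#include <string>
#include <vector>

// Roles are unchanged if an unset 'role' maps to 'roles=()' and
// 'role=A' maps to 'roles=(A)'.
bool rolesUnchanged(
    const std::string& oldRole,              // "" means 'role' was not set.
    const std::vector<std::string>& newRoles)
{
  if (oldRole.empty()) {
    return newRoles.empty();                 // role unset  <->  roles=().
  }
  return newRoles.size() == 1 && newRoles[0] == oldRole;  // role=A <-> roles=(A).
}

int main()
{
  std::cout << std::boolalpha
            << rolesUnchanged("", {}) << std::endl            // true
            << rolesUnchanged("A", {"A"}) << std::endl        // true
            << rolesUnchanged("A", {"A", "B"}) << std::endl;  // false
  return 0;
}
{noformat}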



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6957) Timestamp-based task reconciliation

2017-01-19 Thread Shi Lu (JIRA)
Shi Lu created MESOS-6957:
-

 Summary: Timestamp-based task reconciliation
 Key: MESOS-6957
 URL: https://issues.apache.org/jira/browse/MESOS-6957
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Shi Lu


If the mesos master supported timestamp-based task reconciliation, e.g. the 
client sends a reconcile request with a list of taskIDs and a time T, and the 
master streams back the task changes that occurred after T, this could reduce 
the overhead of task reconciliation a lot.
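A minimal C++ sketch of the requested filtering, with made-up types and names 
(this API does not exist in Mesos today):
{noformat}
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct TaskChange
{
  std::string taskId;
  std::string state;
  uint64_t timestamp;  // e.g., seconds since epoch.
};

// Given task ids and a time T, return only changes that happened after T.
std::vector<TaskChange> reconcileSince(
    const std::map<std::string, TaskChange>& latest,
    const std::vector<std::string>& taskIds,
    uint64_t t)
{
  std::vector<TaskChange> changes;
  for (const std::string& id : taskIds) {
    auto it = latest.find(id);
    if (it != latest.end() && it->second.timestamp > t) {
      changes.push_back(it->second);  // Only changes newer than T.
    }
  }
  return changes;
}

int main()
{
  std::map<std::string, TaskChange> latest = {
    {"task-1", {"task-1", "TASK_RUNNING", 100}},
    {"task-2", {"task-2", "TASK_FAILED", 300}},
  };

  // Client asks: what changed for these tasks after T=200?
  for (const TaskChange& c : reconcileSince(latest, {"task-1", "task-2"}, 200)) {
    std::cout << c.taskId << " -> " << c.state << std::endl;  // task-2 only.
  }
  return 0;
}
{noformat}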



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4119) Add support for enabling --3way to apply-reviews.py.

2017-01-19 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830749#comment-15830749
 ] 

Zhitao Li commented on MESOS-4119:
--

https://reviews.apache.org/r/55732/

> Add support for enabling --3way to apply-reviews.py.
> 
>
> Key: MESOS-4119
> URL: https://issues.apache.org/jira/browse/MESOS-4119
> Project: Mesos
>  Issue Type: Task
>Reporter: Artem Harutyunyan
>  Labels: mesosphere, newbie
>
> Currently if {{git apply}} fails during apply-reviews, then the change must 
> be rebased and re-uploaded to reviewboard in order for apply-reviews to 
> succeed.
> However, it is often the case that {{git apply --3way}} will succeed since 
> the blob information is included in the diff. Even if it doesn't succeed it 
> will leave conflict markers, which allows the committer to do a manual 
> conflict resolution if desired, or abort if conflict resolution is too 
> difficult.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6956) Out-of-band task reconciliation

2017-01-19 Thread Shi Lu (JIRA)
Shi Lu created MESOS-6956:
-

 Summary: Out-of-band task reconciliation
 Key: MESOS-6956
 URL: https://issues.apache.org/jira/browse/MESOS-6956
 Project: Mesos
  Issue Type: Task
Reporter: Shi Lu


Can we add a capability in the mesos master for out-of-band task 
reconciliation? That is, the client sends a request to the master with a list 
of taskIDs that it wants to reconcile, and the mesos master returns the state 
of those tasks in the response instead of sending them back via the subscribed 
connection.
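A minimal C++ sketch of such a synchronous handler, with hypothetical names 
(this capability does not exist in Mesos today):
{noformat}
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct ReconcileResponse
{
  std::map<std::string, std::string> states;  // taskId -> state.
};

// Answer a reconcile request directly in the response, rather than
// pushing updates down the subscribed event stream.
ReconcileResponse handleReconcile(
    const std::map<std::string, std::string>& knownTasks,
    const std::vector<std::string>& taskIds)
{
  ReconcileResponse response;
  for (const std::string& id : taskIds) {
    auto it = knownTasks.find(id);
    // Unknown tasks are reported too, so the client can clean them up.
    response.states[id] = it != knownTasks.end() ? it->second : "TASK_UNKNOWN";
  }
  return response;
}

int main()
{
  std::map<std::string, std::string> known = {{"task-1", "TASK_RUNNING"}};

  ReconcileResponse response = handleReconcile(known, {"task-1", "task-2"});
  for (const auto& entry : response.states) {
    std::cout << entry.first << ": " << entry.second << std::endl;
  }
  return 0;
}
{noformat}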



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6955) Add capability to batch acknowledge task updates

2017-01-19 Thread Shi Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shi Lu updated MESOS-6955:
--
Shepherd: Zhitao Li

> Add capability to batch acknowledge task updates 
> -
>
> Key: MESOS-6955
> URL: https://issues.apache.org/jira/browse/MESOS-6955
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Shi Lu
>Priority: Critical
>
> We are building a high-task-throughput framework, and we are not getting 
> offers fast enough because we have to ack all the task updates, each one 
> needing a single HTTP call to the mesos master. If the mesos master could 
> support batch acking of task updates, that would be great.
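A minimal C++ sketch of batching acks on the scheduler side; the single batched 
call stands in for a hypothetical ACKNOWLEDGE_BATCH request that Mesos does not 
have today:
{noformat}
#include <iostream>
#include <string>
#include <vector>

// Buffer acknowledgements and flush them in one call instead of one
// HTTP round trip per status update.
class AckBatcher
{
public:
  explicit AckBatcher(size_t limit) : limit(limit) {}

  void ack(const std::string& taskId, const std::string& uuid)
  {
    pending.push_back(taskId + "/" + uuid);
    if (pending.size() >= limit) {
      flush();
    }
  }

  void flush()
  {
    if (pending.empty()) {
      return;
    }
    // Stand-in for a single hypothetical ACKNOWLEDGE_BATCH HTTP call.
    std::cout << "Acknowledging " << pending.size() << " updates in one call"
              << std::endl;
    pending.clear();
  }

private:
  const size_t limit;
  std::vector<std::string> pending;
};

int main()
{
  AckBatcher batcher(3);
  batcher.ack("task-1", "uuid-1");
  batcher.ack("task-2", "uuid-2");
  batcher.ack("task-3", "uuid-3");  // Triggers one batched call.
  batcher.flush();                  // No-op: nothing pending.
  return 0;
}
{noformat}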



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6955) Add capability to batch acknowledge task updates

2017-01-19 Thread Shi Lu (JIRA)
Shi Lu created MESOS-6955:
-

 Summary: Add capability to batch acknowledge task updates 
 Key: MESOS-6955
 URL: https://issues.apache.org/jira/browse/MESOS-6955
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Shi Lu
Priority: Critical


We are building a high-task-throughput framework, and we are not getting offers 
fast enough because we have to ack all the task updates, each one needing a 
single HTTP call to the mesos master. If the mesos master could support batch 
acking of task updates, that would be great.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6953) A compromised mesos-master node can execute code as root on agents.

2017-01-19 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830550#comment-15830550
 ] 

Anindya Sinha edited comment on MESOS-6953 at 1/19/17 10:00 PM:


To mitigate this, we can add an optional arg in mesos-agent called 
{{whitelisted_users}}, which is a list of users who are authorized to run tasks 
on the agent.
If this list contains the task user, or if this list is empty (or the arg is 
missing), we allow the task to be launched on the agent. Otherwise, the agent 
shall not let the task be launched, and shall send a {{TASK_FAILED}} 
StatusUpdate with a new {{Reason}} denoting that the user is not authorized to 
run the task.
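A minimal C++ sketch of the proposed check, with the {{whitelisted_users}} flag 
modeled as a set (hypothetical; not actual Mesos code):
{noformat}
#include <iostream>
#include <set>
#include <string>

// A task user is allowed if the list is empty (flag unset) or contains
// the user; otherwise the agent would send TASK_FAILED with a new Reason.
bool isAuthorized(
    const std::set<std::string>& whitelistedUsers,
    const std::string& taskUser)
{
  return whitelistedUsers.empty() || whitelistedUsers.count(taskUser) > 0;
}

int main()
{
  std::set<std::string> whitelist = {"alice", "bob"};

  std::cout << std::boolalpha
            << isAuthorized(whitelist, "alice") << std::endl   // true
            << isAuthorized(whitelist, "root") << std::endl    // false
            << isAuthorized({}, "root") << std::endl;          // true (empty list)
  return 0;
}
{noformat}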


was (Author: anindya.sinha):
To mitigate this, we can add an optional arg in mesos-agent called 
{{whitelisted-users}} which is a list of users who are authorized to run tasks 
on the agent.
If this list contains the task user or if this list is empty (or the arg is 
missing), we allow the task to be launched on the agent. Otherwise, agent shall 
not let the task be launched, and send a {{TASK_FAILED}} StatusUpdate with a 
new {{Reason}} denoting that the user is not authorized to run the task.

> A compromised mesos-master node can execute code as root on agents.
> ---
>
> Key: MESOS-6953
> URL: https://issues.apache.org/jira/browse/MESOS-6953
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: security, slave
>
> mesos-master has a `--[no-]root_submissions` flag that controls whether 
> frameworks with the `root` user are admitted to the cluster.
> However, if a mesos-master node is compromised, it can attempt to schedule 
> tasks on an agent as the `root` user. Since the mesos-agent has no check on 
> which users may run tasks on the agent, tasks can get run with `root` 
> privileges within the container on the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish Kumar updated MESOS-6952:
-
Description: 
The task has been stuck in staging for almost 6 hours, even after the slave 
executor terminated.
The mesos master keeps the task in the staging state. Since the task is stuck 
in staging, the framework has not received the update from the mesos-master.

The issue got fixed after a slave restart.

I can see in the slave logs: Asked to run task ' which is terminating/terminated
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097453 107774 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
'ct:148481682:0:foocare_zendesk_round_robin:' which is 
terminating/terminated

Full log of the slave:

mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858510 107759 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858762 107759 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001

[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish Kumar updated MESOS-6952:
-
Description: 
The task has been stuck in staging for almost 6 hours, even after the slave 
executor terminated.
The mesos master keeps the task in the staging state. Since the task is stuck 
in staging, the framework has not received the update from the mesos-master.

The issue got fixed after a slave restart.

I can see in the slave logs: Asked to run task ' which is terminating/terminated
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097453 107774 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
'ct:148481682:0:foocare_zendesk_round_robin:' which is 
terminating/terminated
{noformat}

Full log of the slave:
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858510 107759 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858762 107759 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 

[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish Kumar updated MESOS-6952:
-
Description: 
The task has been stuck in staging for almost 6 hours, even after the slave 
executor terminated.
The mesos master keeps the task in the staging state. Since the task is stuck 
in staging, the framework has not received the update from the mesos-master.

The issue got fixed after a slave restart.

I can see in the slave logs: Asked to run task ' which is terminating/terminated
{noformat}
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097453 107774 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
'ct:148481682:0:foocare_zendesk_round_robin:' which is 
terminating/terminated

Full log of the slave:

mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858510 107759 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858762 107759 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001

[jira] [Comment Edited] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830436#comment-15830436
 ] 

Sathish Kumar edited comment on MESOS-6952 at 1/19/17 9:59 PM:
---

In the log below we can see that it staged after 14:42:17, and until we 
restarted around 20:53 it was in the staging state. 
FYI, no slave reboot/leader election happened.

Please find the attached master logs 

{noformat}

I0119 14:41:13.023109 29504 master.hpp:177] Adding task 
ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; 
mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:13.023146 29504 master.cpp:3589] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at 
scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with 
resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:14.037518 29508 master.cpp:4763] Status update TASK_RUNNING (UUID: 
a3b53759-3c7e-408c-aec9-80b048e38938) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:53.829838 29508 master.cpp:4763] Status update TASK_FAILED (UUID: 
c65c736c-a00b-4ef3-beb6-589793169edb) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:53.893996 29499 master.cpp:6487] Removing task 
ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; 
mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 
on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:54.736646 29495 master.hpp:177] Adding task 
ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; 
mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:54.736698 29495 master.cpp:3589] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at 
scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with 
resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:41:55.708216 29509 master.cpp:4763] Status update TASK_RUNNING (UUID: 
46408395-d5f5-4db2-babf-cefc2145f7f4) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:42:15.185272 29494 master.cpp:4763] Status update TASK_FAILED (UUID: 
5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:42:15.230947 29494 master.cpp:6487] Removing task 
ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; 
mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 
on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:42:15.862722 29500 master.hpp:177] Adding task 
ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; 
mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:42:15.862761 29500 master.cpp:3589] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at 
scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with 
resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 
(distancematrix8.prod-foo-dcos.foobar.net)
I0119 14:42:16.312088 29504 master.cpp:4763] Status update TASK_FAILED (UUID: 
247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 


[jira] [Created] (MESOS-6954) Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent

2017-01-19 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6954:
--

 Summary: Running LAUNCH_NESTED_CONTAINER with a docker container 
id crashes the agent
 Key: MESOS-6954
 URL: https://issues.apache.org/jira/browse/MESOS-6954
 Project: Mesos
  Issue Type: Bug
Reporter: Kevin Klues
Priority: Blocker


Attempting to run {{LAUNCH_NESTED_CONTAINER}} with a parent container that was 
launched with the docker containerizer causes the agent to crash, as shown 
below. We should add a safeguard in the handler so that the request fails 
gracefully instead.

{noformat}
I0119 21:41:42.438295  3281 http.cpp:304] HTTP POST for /slave(1)/api/v1 from 
10.0.7.194:46700 with User-Agent='python-requests/2.12.4' with 
X-Forwarded-For='10.0.6.162'
I0119 21:41:42.441571  3281 http.cpp:465] Processing call 
LAUNCH_NESTED_CONTAINER_SESSION
W0119 21:41:42.442286  3281 http.cpp:2251] Failed to launch nested container 
62a16556-9c3b-48f2-aa1e-ba1d70093637.09a9d3b0-a245-4aa1-94f1-d10a13526b9b: 
Unsupported
F0119 21:41:42.442371  3282 docker.cpp:2013] Check failed: 
!containerId.has_parent()
*** Check failure stack trace: ***
@ 0x7f539aca01ad  google::LogMessage::Fail()
@ 0x7f539aca1fdd  google::LogMessage::SendToLog()
@ 0x7f539ac9fd9c  google::LogMessage::Flush()
@ 0x7f539aca28d9  google::LogMessageFatal::~LogMessageFatal()
@ 0x7f539a46e2cd  
mesos::internal::slave::DockerContainerizerProcess::destroy()
@ 0x7f539a48a8a7  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIbN5mesos8internal5slave26DockerContainerizerProcessERKNS5_11ContainerIDEbS9_bEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSG_FSE_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f539ac14ca1  process::ProcessManager::resume()
@ 0x7f539ac1dba7  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f53990a5d73  (unknown)
@ 0x7f5398ba652c  (unknown)
@ 0x7f53988e41dd  (unknown)
{noformat}
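
A standalone C++ sketch of the kind of safeguard suggested above; the struct 
and helper are illustrative models, not the actual 
{{DockerContainerizerProcess}} code:

{code}
// Hypothetical guard: reject nested container ids up front instead of
// letting a CHECK abort the agent. Standalone model, illustrative only.
#include <string>

struct ContainerID
{
  std::string value;
  const ContainerID* parent = nullptr;  // nested containers have a parent.

  bool has_parent() const { return parent != nullptr; }
};

// Returns an error message instead of crashing when the docker
// containerizer is asked to operate on a nested container.
std::string validateForDocker(const ContainerID& containerId)
{
  if (containerId.has_parent()) {
    return "Nested containers are not supported by the docker containerizer";
  }
  return "";  // empty string: validation passed.
}
{code}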



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.

2017-01-19 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6904:
--
Target Version/s: 1.2.0

[~adam-mesos] I'm trying to land this in the next couple of days to get it 
into 1.2. Should it be a blocker? (It doesn't have to go in, but it would be 
nice if we could get it in.)

> Perform batching of allocations to reduce allocator queue backlogging.
> --
>
> Key: MESOS-6904
> URL: https://issues.apache.org/jira/browse/MESOS-6904
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Jacob Janco
>Assignee: Jacob Janco
>  Labels: allocator
>
> Per MESOS-3157:
> {quote}
> Our deployment environments have a lot of churn, with many short-live 
> frameworks that often revive offers. Running the allocator takes a long time 
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the 
> allocator process to get very long, and the allocator effectively becomes 
> unresponsive (eg. a revive offers message takes too long to come to the head 
> of the queue).
> {quote}
> To remedy the above scenario, it is proposed to perform batching of the 
> enqueued allocation operations so that a single allocation operation can 
> satisfy N enqueued allocations. This should reduce the potential for 
> backlogging in the allocator. See the discussion 
> [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
>  in MESOS-3157.
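
A standalone C++ sketch of the batching idea quoted above; the class, member, 
and queue names are illustrative, not the actual allocator code:

{code}
// Illustrative model of allocation batching: instead of running one
// allocation per triggering event, remember that work is needed and let
// a single deferred run satisfy all accumulated triggers.
#include <functional>
#include <queue>

class Allocator
{
public:
  // Called on every event that may require a new allocation.
  void trigger()
  {
    if (allocationPending) {
      return;  // the already-queued run will cover this trigger too.
    }
    allocationPending = true;
    eventQueue.push([this] { allocate(); });  // deferred, as in an actor.
  }

  // Drains the event queue; stands in for the libprocess actor loop.
  void drain()
  {
    while (!eventQueue.empty()) {
      eventQueue.front()();
      eventQueue.pop();
    }
  }

private:
  void allocate()
  {
    allocationPending = false;
    // ... one (expensive) allocation cycle over all agents ...
  }

  bool allocationPending = false;
  std::queue<std::function<void()>> eventQueue;
};
{code}

In this model, any number of {{trigger()}} calls between drains result in a 
single {{allocate()}} run.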



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6953) A compromised mesos-master node can execute code as root on agents.

2017-01-19 Thread Anindya Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anindya Sinha updated MESOS-6953:
-
Summary: A compromised mesos-master node can execute code as root on 
agents.  (was: A compromised mesos-Master can execute code as root on agents.)

> A compromised mesos-master node can execute code as root on agents.
> ---
>
> Key: MESOS-6953
> URL: https://issues.apache.org/jira/browse/MESOS-6953
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: security, slave
>
> mesos-master has a `--[no-]root_submissions` flag that controls whether 
> frameworks running as the `root` user are admitted to the cluster.
> However, if a mesos-master node is compromised, it can attempt to schedule 
> tasks on an agent as the `root` user. Since mesos-agent performs no check on 
> which users' tasks may run on the agent, such tasks can run with `root` 
> privileges within the container on the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6953) A compromised mesos-Master can execute code as root on agents.

2017-01-19 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830550#comment-15830550
 ] 

Anindya Sinha edited comment on MESOS-6953 at 1/19/17 8:27 PM:
---

To mitigate this, we can add an optional arg to mesos-agent called 
{{whitelisted-users}}: a list of users who are authorized to run tasks 
on the agent.
If this list contains the task's user, or if the list is empty (or the arg is 
missing), we allow the task to be launched on the agent. Otherwise, the agent 
shall not let the task be launched, and will send a {{TASK_FAILED}} 
StatusUpdate with a new {{Reason}} denoting that the user is not authorized 
to run the task.


was (Author: anindya.sinha):
To mitigate this, we can add an optional arg in mesos-agent called 
`whitelisted-users` which is a list of users who are authorized to run tasks on 
the agent.
If this list contains the task user or if this list is empty (or the arg is 
missing), we allow the task to be launched on the agent. Otherwise, agent shall 
not let the task be launched, and send a `TASK_FAILED` StatusUpdate with a new 
`Reason` denoting that the user is not authorized to run the task.

> A compromised mesos-Master can execute code as root on agents.
> --
>
> Key: MESOS-6953
> URL: https://issues.apache.org/jira/browse/MESOS-6953
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: security, slave
>
> mesos-master has a `--[no-]root_submissions` flag that controls whether 
> frameworks running as the `root` user are admitted to the cluster.
> However, if a mesos-master node is compromised, it can attempt to schedule 
> tasks on an agent as the `root` user. Since mesos-agent performs no check on 
> which users' tasks may run on the agent, such tasks can run with `root` 
> privileges within the container on the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6953) A compromised mesos-Master can execute code as root on agents.

2017-01-19 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830550#comment-15830550
 ] 

Anindya Sinha commented on MESOS-6953:
--

To mitigate this, we can add an optional arg to mesos-agent called 
`whitelisted-users`: a list of users who are authorized to run tasks on 
the agent.
If this list contains the task's user, or if the list is empty (or the arg is 
missing), we allow the task to be launched on the agent. Otherwise, the agent 
shall not let the task be launched, and will send a `TASK_FAILED` StatusUpdate 
with a new `Reason` denoting that the user is not authorized to run the task.
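
A standalone C++ sketch of the proposed check; the `whitelisted-users` name 
comes from the comment above, everything else is illustrative:

{code}
// Illustrative model of the proposed `whitelisted-users` check: an
// empty/absent whitelist admits every user (today's behaviour), and a
// non-empty one rejects tasks from any user not listed.
#include <set>
#include <string>

bool isAuthorizedToRunTasks(const std::set<std::string>& whitelistedUsers,
                            const std::string& taskUser)
{
  // Empty list (or flag missing) means no restriction.
  if (whitelistedUsers.empty()) {
    return true;
  }
  return whitelistedUsers.count(taskUser) > 0;
}

// On failure the agent would send TASK_FAILED with a new Reason, e.g.
// REASON_TASK_UNAUTHORIZED (hypothetical name), instead of launching.
{code}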

> A compromised mesos-Master can execute code as root on agents.
> --
>
> Key: MESOS-6953
> URL: https://issues.apache.org/jira/browse/MESOS-6953
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: security, slave
>
> mesos-master has a `--[no-]root_submissions` flag that controls whether 
> frameworks running as the `root` user are admitted to the cluster.
> However, if a mesos-master node is compromised, it can attempt to schedule 
> tasks on an agent as the `root` user. Since mesos-agent performs no check on 
> which users' tasks may run on the agent, such tasks can run with `root` 
> privileges within the container on the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6953) A compromised mesos-Master can execute code as root on agents.

2017-01-19 Thread Anindya Sinha (JIRA)
Anindya Sinha created MESOS-6953:


 Summary: A compromised mesos-Master can execute code as root on 
agents.
 Key: MESOS-6953
 URL: https://issues.apache.org/jira/browse/MESOS-6953
 Project: Mesos
  Issue Type: Bug
  Components: security
Reporter: Anindya Sinha
Assignee: Anindya Sinha


mesos-master has a `--[no-]root_submissions` flag that controls whether 
frameworks running as the `root` user are admitted to the cluster.

However, if a mesos-master node is compromised, it can attempt to schedule 
tasks on an agent as the `root` user. Since mesos-agent performs no check on 
which users' tasks may run on the agent, such tasks can run with `root` 
privileges within the container on the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830529#comment-15830529
 ] 

Kevin Klues commented on MESOS-6952:


Can you please edit the logs you pasted to make them more readable? Just put 
tags around them like:

\{noformat\}
LOGS
\{noformat\}

> Mesos task state was stuck in staging even after executor terminated
> 
>
> Key: MESOS-6952
> URL: https://issues.apache.org/jira/browse/MESOS-6952
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 0.28.2
> Environment: ubuntu 14.04
>Reporter: Sathish Kumar
>
> The task was stuck in staging for almost 6 hours even after the slave 
> executor terminated.
> The Mesos master keeps the task in the staging state. Since the task is 
> stuck in staging, the framework has not received an update from the 
> mesos-master.
> The issue got fixed after a slave restart.
> I can see in the slave logs "Asked to run task '...' which is 
> terminating/terminated"
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
> status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for 
> task ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
> ct:148481682:0:foocare_zendesk_round_robin: for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.097453 107774 slave.cpp:1480] Launching task 
> ct:148481682:0:foocare_zendesk_round_robin: for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
>  14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
> 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
> 'ct:148481682:0:foocare_zendesk_round_robin:' which is 
> terminating/terminated
> Full log of the slave:
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
> (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
> TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE 
> for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) 
> for task ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
> (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
> update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
> acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
> status update TASK_FAILED (UUID: 

[jira] [Updated] (MESOS-6948) AgentAPITest.LaunchNestedContainerSession is flaky

2017-01-19 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6948:
--
Assignee: Kevin Klues

> AgentAPITest.LaunchNestedContainerSession is flaky
> --
>
> Key: MESOS-6948
> URL: https://issues.apache.org/jira/browse/MESOS-6948
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
> Environment: CentOS 7 VM, libevent and SSL enabled
>Reporter: Greg Mann
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: debugging, tests
> Attachments: AgentAPITest.LaunchNestedContainerSession.txt
>
>
> This was observed in a CentOS 7 VM, with libevent and SSL enabled:
> {code}
> I0118 22:17:23.528846  2887 http.cpp:464] Processing call 
> LAUNCH_NESTED_CONTAINER_SESSION
> I0118 22:17:23.530452  2887 containerizer.cpp:1807] Starting nested container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.532265  2887 containerizer.cpp:1831] Trying to chown 
> '/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e'
>  to user 'vagrant'
> I0118 22:17:23.535213  2887 switchboard.cpp:570] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430"
>  --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" 
> --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.537210  2887 switchboard.cpp:600] Created I/O switchboard 
> server (pid: 3335) listening on socket file 
> '/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430' for 
> container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.543665  2887 containerizer.cpp:1540] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"printf output && printf 
> error 
> 1>&2"},"environment":{},"err":{"fd":16,"type":"FD"},"in":{"fd":11,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"vagrant"}"
>  --pipe_read="12" --pipe_write="13" 
> --runtime_directory="/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_QVZGrY/containers/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e"
>  --unshare_namespace_mnt="false"'
> I0118 22:17:23.556032  2887 launcher.cpp:133] Forked child with pid '3337' 
> for container 
> '492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e'
> I0118 22:17:23.563900  2887 fetcher.cpp:349] Starting to fetch URIs for 
> container: 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e, 
> directory: 
> /tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.962441  2887 containerizer.cpp:2481] Container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e has 
> exited
> I0118 22:17:23.962484  2887 containerizer.cpp:2118] Destroying container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e in 
> RUNNING state
> I0118 22:17:23.962715  2887 launcher.cpp:149] Asked to destroy container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.977562  2887 process.cpp:3733] Failed to process request for 
> '/slave(69)/api/v1': Container has or is being destroyed
> W0118 22:17:23.978216  2887 http.cpp:2734] Failed to attach to nested 
> container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e: 
> Container has or is being destroyed
> I0118 22:17:23.978330  2887 process.cpp:1435] Returning '500 Internal Server 
> Error' for '/slave(69)/api/v1' (Container has or is being destroyed)
> ../../src/tests/api_tests.cpp:3960: Failure
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> {code}
> Find attached the full log from a failed run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"

2017-01-19 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6804:
--
Priority: Critical  (was: Blocker)

> Running 'tty' inside a debug container that has a tty reports "Not a tty"
> -
>
> Key: MESOS-6804
> URL: https://issues.apache.org/jira/browse/MESOS-6804
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: debugging, mesosphere
>
> We need to inject `/dev/console` into the container and map it to the slave 
> end of the TTY we are attached to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6906) Introduce a general non-interpreting task check.

2017-01-19 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6906:
---
Sprint: Mesosphere Sprint 50  (was: Mesosphere Sprint 49)

> Introduce a general non-interpreting task check.
> 
>
> Key: MESOS-6906
> URL: https://issues.apache.org/jira/browse/MESOS-6906
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> In addition to the result-interpreting, task-killing health check, there is 
> a requirement from Mesos framework authors for a general check that can 
> execute an arbitrary command or send an HTTP request and pass the result to 
> the scheduler without interpreting it.
> This ticket aims to implement this functionality by introducing a new class 
> of a check in Mesos. Design doc: 
> https://docs.google.com/document/d/1VLdaH7i7UDT3_38aOlzTOtH7lwH-laB8dCwNzte0DkU
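
As a rough standalone illustration of what "non-interpreting" means here 
(using plain {{std::system}}; this is only a model, not the design from the 
doc above):

{code}
// Illustrative model of a non-interpreting command check: run the
// command and forward its raw result without deciding healthy/unhealthy
// and without killing the task.
#include <cstdlib>
#include <iostream>

int main()
{
  // Raw wait status exactly as returned by std::system; interpretation
  // is left entirely to the scheduler.
  const int rawStatus = std::system("/bin/true");
  std::cout << "forwarding raw check status: " << rawStatus << std::endl;
  return 0;
}
{code}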



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6908) Zero health check timeout is interpreted literally.

2017-01-19 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6908:
---
Sprint: Mesosphere Sprint 50  (was: Mesosphere Sprint 49)

> Zero health check timeout is interpreted literally.
> ---
>
> Key: MESOS-6908
> URL: https://issues.apache.org/jira/browse/MESOS-6908
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Minor
>  Labels: health-check, mesosphere
>
> Currently a zero health check timeout is interpreted literally, which is not 
> very helpful since the health check does not even get a chance to finish. We 
> suggest fixing this behaviour by interpreting zero as {{Duration::max()}}, 
> effectively rendering the timeout infinite.
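
A minimal standalone sketch of the suggested interpretation, with 
{{std::numeric_limits}} standing in for {{Duration::max()}}:

{code}
// Illustrative model: treat a configured timeout of zero as "no
// timeout" rather than as a literal zero-second deadline.
#include <limits>

double effectiveTimeoutSeconds(double configuredSeconds)
{
  return configuredSeconds == 0.0
    ? std::numeric_limits<double>::infinity()  // stands in for Duration::max()
    : configuredSeconds;
}
{code}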



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6944) Mesos - AD integration Process / Example

2017-01-19 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830297#comment-15830297
 ] 

Kevin Klues commented on MESOS-6944:


Here is a link to what Alexander is referring to:
https://docs.mesosphere.com/1.8/administration/id-and-access-mgt/ldap/

> Mesos - AD integration Process / Example
> 
>
> Key: MESOS-6944
> URL: https://issues.apache.org/jira/browse/MESOS-6944
> Project: Mesos
>  Issue Type: Task
>  Components: modules
>Reporter: Rahul Bhardwaj
>  Labels: mesosphere
>
> Hi Team,
> We are trying to configure AD authentication with Mesos for HTTP endpoints 
> (UI only). 
> But we couldn't find any clear documentation or example on your site 
> http://mesos.apache.org/ that shows the process of integrating with AD 
> (LDAP). We also could not find a reference to any existing LDAP library to 
> use with Mesos on the Modules page.
> Authentication doc: 
> http://mesos.apache.org/documentation/latest/authentication/. 
> Module doc: http://mesos.apache.org/documentation/latest/modules/ 
> (Authentication section).
> Can you please tell us if this feature is already available? Example 
> documentation would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1582) Improve build time.

2017-01-19 Thread Alex Clemmer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830295#comment-15830295
 ] 

Alex Clemmer commented on MESOS-1582:
-

Fortunately, [~vinodkone], it happens to be the case that this is already 
roadmap-adjacent for [~kaysoky] and me. It would be natural to tackle this when 
we also tackle the "break libmesos up into many binaries" issue: 
https://issues.apache.org/jira/browse/MESOS-3542

I think the next step is to write up a little design doc about the plan.

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>  Labels: microsoft, tech-debt
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830280#comment-15830280
 ] 

Vinod Kone commented on MESOS-6952:
---

Can you paste the corresponding master logs?

> Mesos task state was stuck in staging even after executor terminated
> 
>
> Key: MESOS-6952
> URL: https://issues.apache.org/jira/browse/MESOS-6952
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 0.28.2
> Environment: ubuntu 14.04
>Reporter: Sathish Kumar
>
> The task was stuck in staging for almost 6 hours even after the slave 
> executor terminated.
> The Mesos master keeps the task in the staging state. Since the task is 
> stuck in staging, the framework has not received an update from the 
> mesos-master.
> The issue got fixed after a slave restart.
> I can see in the slave logs "Asked to run task '...' which is 
> terminating/terminated"
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
> status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for 
> task ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
> ct:148481682:0:foocare_zendesk_round_robin: for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.097453 107774 slave.cpp:1480] Launching task 
> ct:148481682:0:foocare_zendesk_round_robin: for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
>  14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
> 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
> 'ct:148481682:0:foocare_zendesk_round_robin:' which is 
> terminating/terminated
> Full log of the slave:
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
> (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
> TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE 
> for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) 
> for task ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
> (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
> update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
> acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
> status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for 
> task 

[jira] [Updated] (MESOS-6355) Improvements to task group support.

2017-01-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6355:
--
Sprint: Mesosphere Sprint 49

> Improvements to task group support.
> ---
>
> Key: MESOS-6355
> URL: https://issues.apache.org/jira/browse/MESOS-6355
> Project: Mesos
>  Issue Type: Epic
>Reporter: Vinod Kone
>  Labels: mesosphere
>
> This is a follow up epic to MESOS-2249 to capture further improvements and 
> changes that need to be made to the MVP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6864) Container Exec should be possible with tasks belonging to a task group

2017-01-19 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6864:
--
Story Points: 5

> Container Exec should be possible with tasks belonging to a task group
> --
>
> Key: MESOS-6864
> URL: https://issues.apache.org/jira/browse/MESOS-6864
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>  Labels: debugging, mesosphere
>
> {{LaunchNestedContainerSession}} currently requires the parent container to 
> be an Executor 
> (https://github.com/apache/mesos/blob/f89f28724f5837ff414dc6cc84e1afb63f3306e5/src/slave/http.cpp#L2189-L2211).
> This works for command tasks, because the task container id is the same as 
> the executor container id.
> But it won't work for pod tasks, whose container id is different from the 
> executor's container id.
> In order to resolve this ticket, we need to allow launching a child container 
> at an arbitrary level.
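
A small standalone sketch of the ancestry walk this implies; the 
{{ContainerID}} parent link mirrors the protobuf's optional parent field, and 
all names are illustrative:

{code}
// Illustrative model: a nested container id carries a chain of parents;
// to find the top-level (executor) container, walk to the root instead
// of requiring the immediate parent to be top-level.
struct ContainerID
{
  const ContainerID* parent = nullptr;  // null for top-level containers.
};

const ContainerID& rootContainer(const ContainerID& id)
{
  const ContainerID* current = &id;
  while (current->parent != nullptr) {
    current = current->parent;
  }
  return *current;
}
{code}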



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6864) Container Exec should be possible with tasks belonging to a task group

2017-01-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821809#comment-15821809
 ] 

Gastón Kleiman edited comment on MESOS-6864 at 1/19/17 4:42 PM:


https://reviews.apache.org/r/55676/
https://reviews.apache.org/r/55722/
https://reviews.apache.org/r/55677/
https://reviews.apache.org/r/55678/
https://reviews.apache.org/r/55679/
https://reviews.apache.org/r/55464/


was (Author: gkleiman):
https://reviews.apache.org/r/55464/

> Container Exec should be possible with tasks belonging to a task group
> --
>
> Key: MESOS-6864
> URL: https://issues.apache.org/jira/browse/MESOS-6864
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Blocker
>  Labels: debugging, mesosphere
>
> {{LaunchNestedContainerSession}} currently requires the parent container to 
> be an Executor 
> (https://github.com/apache/mesos/blob/f89f28724f5837ff414dc6cc84e1afb63f3306e5/src/slave/http.cpp#L2189-L2211).
> This works for command tasks, because the task container id is the same as 
> the executor container id.
> But it won't work for pod tasks, whose container id is different from the 
> executor's container id.
> In order to resolve this ticket, we need to allow launching a child container 
> at an arbitrary level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6947) Fix pailer XSS vulnerability

2017-01-19 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6947:

Description: 
There exists an XSS vulnerability in pailer.html.

{{window.name}} can be set to an external domain serving js which is wrapped in 

[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish Kumar updated MESOS-6952:
-
Description: 
The task was stuck in staging for almost 6 hours even after the slave executor 
terminated.
The Mesos master keeps the task in the staging state. Since the task is stuck 
in staging, the framework has not received an update from the mesos-master.

The issue got fixed after a slave restart.

I can see in the slave logs "Asked to run task '...' which is terminating/terminated"

mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097453 107774 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
'ct:148481682:0:foocare_zendesk_round_robin:' which is 
terminating/terminated

Full log of the slave:

mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858510 107759 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858762 107759 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
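
To make the failure mode concrete: a heavily simplified, hypothetical sketch
(illustrative C++ only, not the actual slave.cpp logic) of the agent-side
flow these warnings point at. If the terminating-executor branch drops the
task without sending a terminal status update, the master never learns the
task will not run and keeps it in TASK_STAGING until the agent restarts,
which matches the behavior above.

{code}
#include <string>

// Illustrative only: names and structure are assumptions, not Mesos source.
enum class ExecutorState { RUNNING, TERMINATING, TERMINATED };

struct Executor
{
  ExecutorState state;
};

void runTask(const std::string& taskId, Executor* executor)
{
  if (executor->state == ExecutorState::TERMINATING ||
      executor->state == ExecutorState::TERMINATED) {
    // Corresponds to the W0119 "Asked to run task ... which is
    // terminating/terminated" warning in the logs above.
    //
    // If the agent returns here without generating a terminal status
    // update (e.g. TASK_LOST), the master's copy of the task remains in
    // TASK_STAGING indefinitely -- the symptom reported in this issue.
    return;
  }
  // ... normal launch path elided ...
  (void)taskId;
}
{code}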

[jira] [Commented] (MESOS-1582) Improve build time.

2017-01-19 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830103#comment-15830103
 ] 

Vinod Kone commented on MESOS-1582:
---

[~hausdorff] I'm a huge +1 for fixing this.

Unfortunately, I don't have cycles to shepherd this myself, but I'm hoping we 
can find a shepherd from our ever-growing committer pool.

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>  Labels: microsoft, tech-debt
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated

2017-01-19 Thread Sathish Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish Kumar updated MESOS-6952:
-
Summary: Mesos task state was stuck in staging even after executor 
terminated  (was: Mesos task state was stuck in staging inspite)

> Mesos task state was stuck in staging even after executor terminated
> 
>
> Key: MESOS-6952
> URL: https://issues.apache.org/jira/browse/MESOS-6952
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 0.28.2
> Environment: ubuntu 14.04
>Reporter: Sathish Kumar
>
> The task is stuck in the staging state even after the slave executor 
> terminated.
> The Mesos master keeps the task in the staging state, and because the task 
> is stuck there, the framework never receives an update from the master.
> The issue was resolved by restarting the slave.
> The slave logs contain "Asked to run task ... which is 
> terminating/terminated":
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
> status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for 
> task ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
> ct:148481682:0:foocare_zendesk_round_robin: for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:17.097453 107774 slave.cpp:1480] Launching task 
> ct:148481682:0:foocare_zendesk_round_robin: for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
>  14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
> 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
> 'ct:148481682:0:foocare_zendesk_round_robin:' which is 
> terminating/terminated
> full Log of slave
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
> (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
> TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE 
> for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) 
> for task ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
> (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
> update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
> acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
> ct:148481682:0:foocare_zendesk_round_robin: of framework 
> 19393553-2061-4d2f-8c05-a0ba688334f4-0001
> mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
>  14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
> status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for 
> task 

[jira] [Created] (MESOS-6952) Mesos task state was stuck in staging inspite

2017-01-19 Thread Sathish Kumar (JIRA)
Sathish Kumar created MESOS-6952:


 Summary: Mesos task state was stuck in staging inspite
 Key: MESOS-6952
 URL: https://issues.apache.org/jira/browse/MESOS-6952
 Project: Mesos
  Issue Type: Bug
  Components: executor
Affects Versions: 0.28.2
 Environment: ubuntu 14.04
Reporter: Sathish Kumar


The task is stuck in the staging state even after the slave executor 
terminated.
The Mesos master keeps the task in the staging state, and because the task is 
stuck there, the framework never receives an update from the master.

The issue was resolved by restarting the slave.

The slave logs contain "Asked to run task ... which is terminating/terminated":

mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097193 107774 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:17.097453 107774 slave.cpp:1480] Launching task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119
 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 
'ct:148481682:0:foocare_zendesk_round_robin:' which is 
terminating/terminated

full Log of slave

mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update 
TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status 
update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update 
acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for 
status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119
 14:42:15.858510 107759 slave.cpp:1361] Got assigned task 
ct:148481682:0:foocare_zendesk_round_robin: for framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001

[jira] [Updated] (MESOS-6946) Make wait status checks consistent.

2017-01-19 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6946:
--
Labels: tech-debt  (was: )

> Make wait status checks consistent.
> ---
>
> Key: MESOS-6946
> URL: https://issues.apache.org/jira/browse/MESOS-6946
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>Priority: Trivial
>  Labels: tech-debt
>
> There are various places that test the {{wait(2)}} exit status in different 
> ways. Clean this up to be consistent and use {{WSTRINGIFY}} to format error 
> messages where appropriate.
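
For readers unfamiliar with the macros involved, a minimal sketch of the
consistent pattern the ticket asks for, using the standard POSIX status
macros. {{WSTRINGIFY}} is the stout helper named above; it is only referenced
here, not reimplemented.

{code}
#include <sys/wait.h>

#include <iostream>

// One consistent way to inspect a wait(2)/waitpid(2) status: check
// WIFEXITED/WIFSIGNALED explicitly instead of comparing raw status values.
void reportChildStatus(int status)
{
  if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
    std::cout << "child exited cleanly" << std::endl;
  } else if (WIFEXITED(status)) {
    std::cerr << "child exited with status "
              << WEXITSTATUS(status) << std::endl;
  } else if (WIFSIGNALED(status)) {
    std::cerr << "child killed by signal "
              << WTERMSIG(status) << std::endl;
  }
  // In Mesos itself the error-message formatting would go through
  // WSTRINGIFY(status) where appropriate, per the ticket.
}
{code}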



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6917) Segfault when the executor sets an invalid UUID when sending a status update.

2017-01-19 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6917:
--
Fix Version/s: 1.0.3

> Segfault when the executor sets an invalid UUID when sending a status update.
> --
>
> Key: MESOS-6917
> URL: https://issues.apache.org/jira/browse/MESOS-6917
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0
>Reporter: Aaron Wood
>Assignee: Aaron Wood
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.1, 1.2.0, 1.0.3
>
>
> A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and 
> sends it off to the agent:
> {code}
> ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state 
> == ERROR: Not a valid UUID
> *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are 
> using GNU date ***
> PC: @ 0x7efeb6101428 (unknown)
> *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 
> 14007; stack trace: ***
> @ 0x7efeb64a6390 (unknown)
> @ 0x7efeb6101428 (unknown)
> @ 0x7efeb610302a (unknown)
> @ 0x560df739fa6e _Abort()
> @ 0x560df739fa9c _Abort()
> @ 0x7efebb53a5ad Try<>::get()
> @ 0x7efebb5363d6 Try<>::get()
> @ 0x7efebbd84809 
> mesos::internal::slave::validation::executor::call::validate()
> @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor()
> @ 0x7efebbc773b8 
> _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_
> @ 0x7efebbcb5808 
> _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_
> @ 0x7efebbfb2aea std::function<>::operator()()
> @ 0x7efebcb158b8 
> _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb
> @ 0x7efebcb1a10a 
> _ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv
> @ 0x7efebcb1c5f8 
> _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7efebb5ce8ca std::function<>::operator()()
> @ 0x7efebb5c4b27 
> _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_
> @ 0x7efebb5d4e1e 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7efebcb30baf std::function<>::operator()()
> @ 0x7efebcb13fd6 process::ProcessBase::visit()
> @ 0x7efebcb1f3c8 process::DispatchEvent::visit()
> @ 0x7efebb3ab2ea process::ProcessBase::serve()
> @ 0x7efebcb0fe8a process::ProcessManager::resume()
> @ 0x7efebcb0c5a3 
> _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
> @ 0x7efebcb1ea34 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7efebcb1e98a 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
> @ 0x7efebcb1e91a 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7efeb6980c80 (unknown)
> @ 0x7efeb649c6ba start_thread
> @ 0x7efeb61d282d (unknown)
> Aborted (core dumped)
> {code}
> https://reviews.apache.org/r/55480/
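
The abort message points at {{Try::get()}} being called on a {{Try}} holding
an error. A minimal sketch of the guard the fix needs, assuming stout's
{{UUID::fromBytes}} returns a {{Try<UUID>}} here (which the trace suggests);
the surrounding function name is illustrative.

{code}
#include <string>

#include <stout/error.hpp>
#include <stout/none.hpp>
#include <stout/option.hpp>
#include <stout/try.hpp>
#include <stout/uuid.hpp>

// Guarded pattern: check isError() before get(), so executor-supplied
// garbage becomes a validation error instead of aborting the agent with
// "Try::get() but state == ERROR".
Option<Error> validateStatusUuid(const std::string& bytes)
{
  Try<UUID> uuid = UUID::fromBytes(bytes);
  if (uuid.isError()) {
    return Error("Not a valid UUID: " + uuid.error());
  }
  return None();
}
{code}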



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6944) Mesos - AD integration Process / Example

2017-01-19 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830048#comment-15830048
 ] 

Alexander Rojas commented on MESOS-6944:


Apache Mesos doesn't provide integration with LDAP or AD out of the box. It 
does provide file-based authentication using the Basic authentication scheme. 
It is left to Mesos users to build their own modules to extend the basic Mesos 
feature set. Some companies have created products that provide proprietary 
LDAP integrations (DC/OS Enterprise being one example).
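
To make the module route concrete, a bare-bones conceptual sketch of a custom
HTTP authenticator. The types and method names below are simplified
stand-ins, not the real libprocess/Mesos module interfaces; consult the
modules documentation referenced in this issue for the actual signatures.

{code}
#include <string>

// Conceptual sketch only: the real interface is loaded through the Mesos
// module machinery (http://mesos.apache.org/documentation/latest/modules/).
struct HttpRequest
{
  std::string authorizationHeader;  // "Basic <base64(user:pass)>"
};

struct AuthResult
{
  bool authenticated;
  std::string principal;  // set when authenticated
};

class LdapBasicAuthenticator
{
public:
  AuthResult authenticate(const HttpRequest& request)
  {
    // 1. Decode the Basic credentials from the Authorization header.
    // 2. Bind against AD/LDAP with the decoded credentials (e.g. via
    //    OpenLDAP's ldap_sasl_bind_s); a successful bind maps to an
    //    authenticated principal.
    // 3. On failure, return an unauthenticated result so the endpoint
    //    replies 401.
    (void)request;  // LDAP plumbing elided in this sketch
    return AuthResult{false, ""};
  }

  std::string scheme() const { return "Basic"; }
};
{code}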

> Mesos - AD integration Process / Example
> 
>
> Key: MESOS-6944
> URL: https://issues.apache.org/jira/browse/MESOS-6944
> Project: Mesos
>  Issue Type: Task
>  Components: modules
>Reporter: Rahul Bhardwaj
>  Labels: mesosphere
>
> Hi Team,
> We are trying to configure AD authentication with Mesos for HTTP endpoints 
> (only UI). 
> But we couldn't find any clear documentation or example on your site 
> http://mesos.apache.org/ that shows the process of integrating with AD 
> (LDAP). We also could not find a reference to any existing LDAP library to 
> use with Mesos on the Modules page.
> Authentication doc: 
> http://mesos.apache.org/documentation/latest/authentication/. 
> Module doc: http://mesos.apache.org/documentation/latest/modules/ 
> (Authentication section).
> Can you please tell us whether this feature is already available? Example 
> documentation would help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

