[jira] [Updated] (MESOS-6821) Override of automatic resources should be by exact match not substring
[ https://issues.apache.org/jira/browse/MESOS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Merry updated MESOS-6821: --- Description: The agent code for auto-detecting resources (cpus, mem, disk) assumes that, say, "cpus" has been specified if the string "cpus" appears anywhere in the resource string (see [here|https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79]). This means that using a custom resource called, say, "members", will disable auto-detection of the "mem" resource. (was: The agent code for auto-detecting resources (cpus, mem, disk) assumes that, say, "cpus" has been specified if the string "cpus" appears anywhere in the resource string (see [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)). This means that using a custom resource called, say, "members", will disable auto-detection of the "mem" resource.) > Override of automatic resources should be by exact match not substring > -- > > Key: MESOS-6821 > URL: https://issues.apache.org/jira/browse/MESOS-6821 > Project: Mesos > Issue Type: Improvement > Components: agent >Affects Versions: 1.1.0 > Environment: Ubuntu 16.04 x86_64 >Reporter: Bruce Merry >Priority: Minor > Labels: newbie > > The agent code for auto-detecting resources (cpus, mem, disk) assumes that, > say, "cpus" has been specified if the string "cpus" appears anywhere in the > resource string (see > [here|https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79]). > This means that using a custom resource called, say, "members", will disable > auto-detection of the "mem" resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
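The substring pitfall described in MESOS-6821 can be illustrated with a short sketch. This is not Mesos code; the semicolon-separated "name:value" resource-string format and the function names are assumptions for illustration only:

```python
def specified_by_substring(resources, name):
    # Mirrors the problematic check: a resource counts as "specified"
    # if its name appears anywhere in the resource string.
    return name in resources

def specified_by_exact_match(resources, name):
    # Parse "name:value" pairs and compare resource names exactly.
    names = {part.split(":", 1)[0].strip()
             for part in resources.split(";") if part.strip()}
    return name in names

resources = "members:10;disk:1024"
# The substring check wrongly concludes "mem" was specified, which
# would disable auto-detection of the "mem" resource...
assert specified_by_substring(resources, "mem")
# ...while the exact-match check correctly reports it was not.
assert not specified_by_exact_match(resources, "mem")
```

The fix the ticket asks for amounts to switching from the first predicate to the second.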
[jira] [Commented] (MESOS-6821) Override of automatic resources should be by exact match not substring
[ https://issues.apache.org/jira/browse/MESOS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831350#comment-15831350 ] Bruce Merry commented on MESOS-6821: I'm going to look into this now. > Override of automatic resources should be by exact match not substring > -- > > Key: MESOS-6821 > URL: https://issues.apache.org/jira/browse/MESOS-6821 > Project: Mesos > Issue Type: Improvement > Components: agent >Affects Versions: 1.1.0 > Environment: Ubuntu 16.04 x86_64 >Reporter: Bruce Merry >Priority: Minor > Labels: newbie > > The agent code for auto-detecting resources (cpus, mem, disk) assumes that, > say, "cpus" has been specified if the string "cpus" appears anywhere in the > resource string (see > [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)). > This means that using a custom resource called, say, "members", will disable > auto-detection of the "mem" resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6821) Override of automatic resources should be by exact match not substring
[ https://issues.apache.org/jira/browse/MESOS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Merry updated MESOS-6821: --- Description: The agent code for auto-detecting resources (cpus, mem, disk) assumes that, say, "cpus" has been specified if the string "cpus" appears anywhere in the resource string (see [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)). This means that using a custom resource called, say, "members", will disable auto-detection of the "mem" resource. (was: The agent code for auto-detecting resources (cpus, mem, disk) assumes that, say, "cpus" has been specified in the string "cpus" appears anywhere in the resource string (see [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)). This means that using a custom resource called, say, "members", will disable auto-detection of the "mem" resource.) > Override of automatic resources should be by exact match not substring > -- > > Key: MESOS-6821 > URL: https://issues.apache.org/jira/browse/MESOS-6821 > Project: Mesos > Issue Type: Improvement > Components: agent >Affects Versions: 1.1.0 > Environment: Ubuntu 16.04 x86_64 >Reporter: Bruce Merry >Priority: Minor > Labels: newbie > > The agent code for auto-detecting resources (cpus, mem, disk) assumes that, > say, "cpus" has been specified if the string "cpus" appears anywhere in the > resource string (see > [here](https://github.com/apache/mesos/blob/1.1.0/src/slave/containerizer/containerizer.cpp#L79)). > This means that using a custom resource called, say, "members", will disable > auto-detection of the "mem" resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish Kumar updated MESOS-6952: - Description: The task was stuck in the staging state for almost 6 hours even after its executor terminated on the slave. Because the task was stuck in staging, the framework never received an update from the mesos-master. The issue was resolved by restarting the slave, after which the task moved from staging to the task-lost state. The slave logs show the warning "Asked to run task ... which is terminating/terminated" (see below): {noformat} mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:148481682:0:foocare_zendesk_round_robin:' which is terminating/terminated {noformat} Full slave log: {noformat} mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED 
(UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858762 107759 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework
[jira] [Commented] (MESOS-6944) Mesos - AD integration Process / Example
[ https://issues.apache.org/jira/browse/MESOS-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831294#comment-15831294 ] Rahul Bhardwaj commented on MESOS-6944: --- Hi [~klueska], [~arojas], Thanks a lot for sharing these details. We will look into our options. It would be good if, in the future, you provided this option with Mesos just as Marathon does. Thanks Rahul > Mesos - AD integration Process / Example > > > Key: MESOS-6944 > URL: https://issues.apache.org/jira/browse/MESOS-6944 > Project: Mesos > Issue Type: Task > Components: modules >Reporter: Rahul Bhardwaj > Labels: mesosphere > > Hi Team, > We are trying to configure AD authentication with Mesos for HTTP endpoints > (only UI). > But we couldn't find any clear documentation or example on your site > http://mesos.apache.org/ that shows the process of integration with AD > (ldap). Also we could not find a reference to any existing LDAP library to use > with Mesos on the Modules page. > Authentication doc: > http://mesos.apache.org/documentation/latest/authentication/. > Module doc:http://mesos.apache.org/documentation/latest/modules/ > (Authentication section). > Can you please tell us whether this feature is already available? Example > documentation would help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6940) Do not send offers to MULTI_ROLE schedulers if agent does not have MULTI_ROLE capability.
[ https://issues.apache.org/jira/browse/MESOS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Guo reassigned MESOS-6940: -- Assignee: Jay Guo > Do not send offers to MULTI_ROLE schedulers if agent does not have MULTI_ROLE > capability. > - > > Key: MESOS-6940 > URL: https://issues.apache.org/jira/browse/MESOS-6940 > Project: Mesos > Issue Type: Task > Components: allocation, master >Reporter: Benjamin Mahler >Assignee: Jay Guo > > Old agents that do not have the MULTI_ROLE capability cannot correctly > receive tasks from schedulers that have the MULTI_ROLE capability *and are > using multiple roles*. In this case, we should not send the offer to the > scheduler, rather than sending an offer but rejecting the scheduler's > operations. > Note also that since we allow a single role scheduler to upgrade into having > the MULTI_ROLE capability (use of the {{FrameworkInfo.roles}} field) so long > as they continue to use a single role (in phase 1 of multi-role support the > roles cannot be changed), we could continue sending offers if the scheduler > is MULTI_ROLE capable but only uses a single role. > In phase 2 of multi-role support, we cannot safely allow a MULTI_ROLE > scheduler to receive resources from a non-MULTI_ROLE agent, so it seems we > should simply disallow MULTI_ROLE schedulers from receiving offers from > non-MULTI_ROLE agents, regardless of how many roles the scheduler is using. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
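The behavior proposed in MESOS-6940 can be sketched as a simple predicate. The names and data shapes below are illustrative, not the actual allocator API; the single-role relaxation reflects the "phase 1" note in the ticket:

```python
def should_send_offer(framework_capabilities, framework_roles,
                      agent_capabilities):
    # A MULTI_ROLE scheduler must not receive offers from a
    # non-MULTI_ROLE agent -- except, per the phase 1 note, when the
    # scheduler still uses only a single role.
    if ("MULTI_ROLE" in framework_capabilities
            and "MULTI_ROLE" not in agent_capabilities):
        return len(framework_roles) <= 1
    return True

# A multi-role scheduler gets no offer from an old agent...
assert not should_send_offer({"MULTI_ROLE"}, ["dev", "prod"], set())
# ...but a single-role scheduler that merely upgraded its capability may.
assert should_send_offer({"MULTI_ROLE"}, ["dev"], set())
# Capable agents are unaffected.
assert should_send_offer({"MULTI_ROLE"}, ["dev", "prod"], {"MULTI_ROLE"})
```

In phase 2, per the ticket's last paragraph, the role-count check would be dropped and the offer withheld unconditionally.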
[jira] [Commented] (MESOS-6902) Add support for agent capabilities
[ https://issues.apache.org/jira/browse/MESOS-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831216#comment-15831216 ] Jay Guo commented on MESOS-6902: https://reviews.apache.org/r/55710/ Add agent capabilities to v0 master API /state > Add support for agent capabilities > -- > > Key: MESOS-6902 > URL: https://issues.apache.org/jira/browse/MESOS-6902 > Project: Mesos > Issue Type: Improvement > Components: agent >Reporter: Neil Conway >Assignee: Jay Guo > Labels: mesosphere > > Similarly to how we might add support for master capabilities (MESOS-5675), > agent capabilities would also make sense: in a mixed cluster, the master > might have support for features that are not present on certain agents, and > vice versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6876) Default "Accept" type for LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_OUTPUT should be streaming type
[ https://issues.apache.org/jira/browse/MESOS-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6876: -- Sprint: Mesosphere Sprint 50 > Default "Accept" type for LAUNCH_NESTED_CONTAINER_SESSION and > ATTACH_CONTAINER_OUTPUT should be streaming type > -- > > Key: MESOS-6876 > URL: https://issues.apache.org/jira/browse/MESOS-6876 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Anand Mazumdar >Priority: Blocker > > Right now the default "Accept" type in the HTTP response to > LAUNCH_NESTED_CONTAINER_SESSION and ATTACH_CONTAINER_OUTPUT is > "application/json". This should be instead "application/json+recordio" or > whatever we decide the streaming type should be in MESOS-3601. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
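For context on the streaming type mentioned above: Mesos's streaming responses use the RecordIO framing, in which each record is prefixed by its length in bytes followed by a newline. A minimal sketch of that framing (not the Mesos implementation):

```python
def recordio_encode(record):
    # Frame a record as b"<decimal length>\n<record bytes>".
    return str(len(record)).encode() + b"\n" + record

def recordio_decode_one(data):
    # Split off one framed record; return (record, remaining bytes).
    length, _, rest = data.partition(b"\n")
    n = int(length)
    return rest[:n], rest[n:]

framed = recordio_encode(b'{"type":"HEARTBEAT"}')
record, rest = recordio_decode_one(framed)
assert record == b'{"type":"HEARTBEAT"}' and rest == b""
```

Because records are length-prefixed, a client can consume events incrementally from a chunked HTTP response, which is why a plain "application/json" default is inappropriate for these streaming calls.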
[jira] [Commented] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to launcher->fork()`
[ https://issues.apache.org/jira/browse/MESOS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830997#comment-15830997 ] Adam B commented on MESOS-6553: --- Can you (or your shepherd, [~jieyu]?) add the commits and close with appropriate fixVersion? > Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` > to launcher->fork()` > > > Key: MESOS-6553 > URL: https://issues.apache.org/jira/browse/MESOS-6553 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues > Labels: tech-debt > > Currently, we receive a bunch of {{ContainerLaunchInfo}} structs from each of > our isolators and extract information from them, which we pass one by one to > our {{launcher->fork()}} call in separate parameters. > Instead, we should construct a new {{ContainerLaunchInfo}} which is the > concatenation of the ones returned by each isolator, and pass this new one > down to {{launcher->fork()}} instead of building up individual arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
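The refactoring suggested above — merging the per-isolator launch infos into one combined struct instead of unpacking them into separate fork() arguments — can be sketched as follows. The field names are illustrative, not the real ContainerLaunchInfo schema, and dicts of lists stand in for repeated protobuf fields:

```python
def merge_launch_infos(launch_infos):
    # Concatenate the repeated fields contributed by each isolator
    # into a single combined launch info.
    merged = {}
    for info in launch_infos:
        for field, values in info.items():
            merged.setdefault(field, []).extend(values)
    return merged

isolator_infos = [
    {"pre_exec_commands": ["mount_sandbox"], "environment": ["A=1"]},
    {"environment": ["B=2"], "namespaces": ["NET"]},
]
merged = merge_launch_infos(isolator_infos)
assert merged["environment"] == ["A=1", "B=2"]
assert merged["namespaces"] == ["NET"]
```

The launcher then receives one argument whose repeated fields already hold every isolator's contribution, in isolator order.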
[jira] [Commented] (MESOS-6639) Update 'io::redirect()' to take an optional vector of callback hooks.
[ https://issues.apache.org/jira/browse/MESOS-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830996#comment-15830996 ] Adam B commented on MESOS-6639: --- [~jieyu] Want to close this with FixVersion=1.2.0 and the appropriate commit(s) in a comment? > Update 'io::redirect()' to take an optional vector of callback hooks. > - > > Key: MESOS-6639 > URL: https://issues.apache.org/jira/browse/MESOS-6639 > Project: Mesos > Issue Type: Improvement >Reporter: Kevin Klues >Assignee: Kevin Klues > > These callback hooks should be invoked before passing any data read from > the 'from' file descriptor on to the 'to' file descriptor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
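The hook mechanism described above amounts to invoking each callback on every chunk before forwarding it. A minimal sketch — not the actual libprocess io::redirect signature, which operates on file descriptors:

```python
def redirect(read_chunk, write_chunk, hooks=()):
    # Pump data from source to sink, calling each hook with the chunk
    # before it is forwarded.
    while True:
        data = read_chunk()
        if not data:
            break
        for hook in hooks:
            hook(data)
        write_chunk(data)

import io
src, dst, seen = io.BytesIO(b"hello world"), io.BytesIO(), []
redirect(lambda: src.read(4), dst.write, hooks=[seen.append])
assert dst.getvalue() == b"hello world"
assert seen[0] == b"hell"
```

Each hook observes the data stream without altering it, which is the stated use case: act on data from the 'from' descriptor before it reaches the 'to' descriptor.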
[jira] [Commented] (MESOS-6714) Port `slave_tests.cpp`
[ https://issues.apache.org/jira/browse/MESOS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830972#comment-15830972 ] Joseph Wu commented on MESOS-6714: -- Some progress: {code} commit d56139556ae41d3f47fb5b391e071d409832edb9 Author: Alex Clemmer Date: Wed Jan 18 14:49:24 2017 -0800 Windows: Added more agent tests. These tests can pass with some minor scripting changes (changing the sleep command to a Windows-compatible command) and fixes to subprocess lifecycles with Job Objects. Review: https://reviews.apache.org/r/55314/ {code} > Port `slave_tests.cpp` > -- > > Key: MESOS-6714 > URL: https://issues.apache.org/jira/browse/MESOS-6714 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Alex Clemmer >Assignee: Alex Clemmer > Labels: microsoft, windows-mvp > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6707) Port `gc_tests.cpp`
[ https://issues.apache.org/jira/browse/MESOS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830967#comment-15830967 ] Joseph Wu edited comment on MESOS-6707 at 1/20/17 1:17 AM: --- {code} commit 5b52217f34197a459fffe3c09be9167046be9df6 Author: Alex Clemmer Date: Wed Jan 18 14:53:38 2017 -0800 Windows: Fixed hanging symlink bug in `os::rmdir`. The Windows implementation of `os::rmdir` will fail to delete "hanging" symlinks (i.e., symlinks whose targets do not exist). Note that on Windows this bug is specific to symlinks whose targets are _deleted_, since it is impossible to create a symlink whose target does not exist. The primary issue that causes this problem is that it is very difficult to tell whether a symlink points at a directory or a file unless you resolve the symlink and determine whether the target is a directory or a file. In situations where the target does not exist, we can't use this information, and so `os::rmdir` occasionally mis-routes a symlink to (what was) a directory to a `::remove` call, which will fail with a cryptic error. To fix this behavior, this commit will introduce code that simply tries to remove the reparse point with both `RemoveDirectory` and `DeleteFile`, and if either succeeds, we report success for the operation. This represents a "best effort"; in the case that the reparse point represents something more exotic than a symlink, we will still fail, but by choosing not to verify whether the target is a directory or a file, we simplify the code and still obtain the outcome of having deleted the directory. This commit is the primary blocker for MESOS-6707, as deleting the Agent sandbox will sometimes cause us to delete the latest run directory for the executor before the symlinked `latest` directory itself. This causes the delete to fail, and then the GC tests to fail, since they tend to assert the directory does not exist. 
Review: https://reviews.apache.org/r/55327/ {code} {code} commit 08e5cd2580a142977b2d8a3abf2a70a398147f01 Author: Alex Clemmer Date: Wed Jan 18 14:59:17 2017 -0800 Windows: Added GC tests to the build. These tests are fixed by the fix to `os::rmdir` in review #55327. The tests were failing to delete sandbox folders when the sandbox was deleted before deleting the symlink to the sandbox. Review: https://reviews.apache.org/r/55328/ {code} > Port `gc_tests.cpp` > --- > > Key: MESOS-6707 > URL: https://issues.apache.org/jira/browse/MESOS-6707 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Alex Clemmer >Assignee: Alex Clemmer > Labels: microsoft, windows-mvp > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.
[ https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6357: -- Target Version/s: (was: 1.1.1, 1.2.0) > `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8. > > > Key: MESOS-6357 > URL: https://issues.apache.org/jira/browse/MESOS-6357 > Project: Mesos > Issue Type: Bug > Components: tests >Affects Versions: 1.1.0 > Environment: Debian 8 with SSL enabled >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: flaky-test > > {noformat} > [00:21:51] : [Step 10/10] [ RUN ] > NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit > [00:21:51]W: [Step 10/10] I1008 00:21:51.357839 23530 > containerizer.cpp:202] Using isolation: > cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image > [00:21:51]W: [Step 10/10] I1008 00:21:51.361143 23530 > linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [00:21:51]W: [Step 10/10] I1008 00:21:51.366930 23547 > containerizer.cpp:557] Recovering containerizer > [00:21:51]W: [Step 10/10] I1008 00:21:51.367962 23551 provisioner.cpp:253] > Provisioner recovery complete > [00:21:51]W: [Step 10/10] I1008 00:21:51.368253 23549 > containerizer.cpp:954] Starting container > 42589936-56b2-4e41-86d8-447bfaba4666 for executor 'executor' of framework > [00:21:51]W: [Step 10/10] I1008 00:21:51.368577 23548 cgroups.cpp:404] > Creating cgroup at > '/sys/fs/cgroup/cpu,cpuacct/mesos_test_458f8018-67e7-4cc6-8126-a535974db35d/42589936-56b2-4e41-86d8-447bfaba4666' > for container 42589936-56b2-4e41-86d8-447bfaba4666 > [00:21:51]W: [Step 10/10] I1008 00:21:51.369863 23544 cpu.cpp:103] Updated > 'cpu.shares' to 1024 (cpus 1) for container > 42589936-56b2-4e41-86d8-447bfaba4666 > [00:21:51]W: [Step 10/10] I1008 00:21:51.370384 23545 > containerizer.cpp:1443] Launching 'mesos-containerizer' with flags > '--command="{"shell":true,"value":"read key <&30"}" --help="false" > --pipe_read="30" 
--pipe_write="34" > --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount > -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" > --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666" > --unshare_namespace_mnt="false" > --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0"' > [00:21:51]W: [Step 10/10] I1008 00:21:51.370483 23544 > linux_launcher.cpp:421] Launching container > 42589936-56b2-4e41-86d8-447bfaba4666 and cloning with namespaces CLONE_NEWNS > | CLONE_NEWPID > [00:21:51]W: [Step 10/10] I1008 00:21:51.374867 23545 > containerizer.cpp:1480] Checkpointing container's forked pid 14139 to > '/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_gzjeKG/meta/slaves/frameworks/executors/executor/runs/42589936-56b2-4e41-86d8-447bfaba4666/pids/forked.pid' > [00:21:51]W: [Step 10/10] I1008 00:21:51.376519 23551 > containerizer.cpp:1648] Starting nested container > 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8 > [00:21:51]W: [Step 10/10] I1008 00:21:51.377296 23549 > containerizer.cpp:1443] Launching 'mesos-containerizer' with flags > '--command="{"shell":true,"value":"sleep 1000"}" --help="false" > --pipe_read="30" --pipe_write="34" > --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount > -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" > 
--runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8" > --unshare_namespace_mnt="false" > --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8"' > [00:21:51]W: [Step 10/10] I1008 00:21:51.377424 23548 > linux_launcher.cpp:421] Launching nested container > 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8 and > cloning with namespaces CLONE_NEWNS | CLONE_NEWPID > [00:21:51] : [Step 10/10] Executing pre-exec command >
[jira] [Commented] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.
[ https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830960#comment-15830960 ] Adam B commented on MESOS-6357: --- No progress in over a month. Dropping from the 1.2 release until somebody updates otherwise. > `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8. > > > Key: MESOS-6357 > URL: https://issues.apache.org/jira/browse/MESOS-6357 > Project: Mesos > Issue Type: Bug > Components: tests >Affects Versions: 1.1.0 > Environment: Debian 8 with SSL enabled >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: flaky-test > > {noformat} > [00:21:51] : [Step 10/10] [ RUN ] > NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit > [00:21:51]W: [Step 10/10] I1008 00:21:51.357839 23530 > containerizer.cpp:202] Using isolation: > cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image > [00:21:51]W: [Step 10/10] I1008 00:21:51.361143 23530 > linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [00:21:51]W: [Step 10/10] I1008 00:21:51.366930 23547 > containerizer.cpp:557] Recovering containerizer > [00:21:51]W: [Step 10/10] I1008 00:21:51.367962 23551 provisioner.cpp:253] > Provisioner recovery complete > [00:21:51]W: [Step 10/10] I1008 00:21:51.368253 23549 > containerizer.cpp:954] Starting container > 42589936-56b2-4e41-86d8-447bfaba4666 for executor 'executor' of framework > [00:21:51]W: [Step 10/10] I1008 00:21:51.368577 23548 cgroups.cpp:404] > Creating cgroup at > '/sys/fs/cgroup/cpu,cpuacct/mesos_test_458f8018-67e7-4cc6-8126-a535974db35d/42589936-56b2-4e41-86d8-447bfaba4666' > for container 42589936-56b2-4e41-86d8-447bfaba4666 > [00:21:51]W: [Step 10/10] I1008 00:21:51.369863 23544 cpu.cpp:103] Updated > 'cpu.shares' to 1024 (cpus 1) for container > 42589936-56b2-4e41-86d8-447bfaba4666 > [00:21:51]W: [Step 10/10] I1008 00:21:51.370384 23545 > containerizer.cpp:1443] Launching 'mesos-containerizer' with 
flags > '--command="{"shell":true,"value":"read key <&30"}" --help="false" > --pipe_read="30" --pipe_write="34" > --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount > -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" > --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666" > --unshare_namespace_mnt="false" > --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0"' > [00:21:51]W: [Step 10/10] I1008 00:21:51.370483 23544 > linux_launcher.cpp:421] Launching container > 42589936-56b2-4e41-86d8-447bfaba4666 and cloning with namespaces CLONE_NEWNS > | CLONE_NEWPID > [00:21:51]W: [Step 10/10] I1008 00:21:51.374867 23545 > containerizer.cpp:1480] Checkpointing container's forked pid 14139 to > '/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_gzjeKG/meta/slaves/frameworks/executors/executor/runs/42589936-56b2-4e41-86d8-447bfaba4666/pids/forked.pid' > [00:21:51]W: [Step 10/10] I1008 00:21:51.376519 23551 > containerizer.cpp:1648] Starting nested container > 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8 > [00:21:51]W: [Step 10/10] I1008 00:21:51.377296 23549 > containerizer.cpp:1443] Launching 'mesos-containerizer' with flags > '--command="{"shell":true,"value":"sleep 1000"}" --help="false" > --pipe_read="30" --pipe_write="34" > --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount > -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" > 
--runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8" > --unshare_namespace_mnt="false" > --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8"' > [00:21:51]W: [Step 10/10] I1008 00:21:51.377424 23548 > linux_launcher.cpp:421] Launching nested container > 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8 and > cloning with namespaces CLONE_NEWNS | CLONE_NEWPID > [00:21:51] : [Step 10/10] Executing pre-exec command >
[jira] [Commented] (MESOS-6339) Support docker registry that requires basic auth.
[ https://issues.apache.org/jira/browse/MESOS-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830956#comment-15830956 ] Adam B commented on MESOS-6339: --- If this isn't In Progress yet, I doubt it'll land in time for 1.2, right? Shall we drop/defer it? > Support docker registry that requires basic auth. > - > > Key: MESOS-6339 > URL: https://issues.apache.org/jira/browse/MESOS-6339 > Project: Mesos > Issue Type: Improvement >Reporter: Jie Yu >Assignee: Gilbert Song > > Currently, we assume Bearer auth (in Mesos containerizer) because it's what > docker hub uses. We also need to support basic auth for some private registries > that people deploy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
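For context on the difference: Bearer auth requires first fetching a token from the registry's auth server, while basic auth only needs a static {{Authorization}} header built from the credentials. A minimal sketch of building that header, with a hand-rolled base64 encoder so the example is self-contained (this is not the Mesos registry-client code):

```cpp
#include <cstdint>
#include <string>

// Minimal base64 encoder, just enough to build an HTTP Basic
// Authorization header per RFC 7617.
std::string base64(const std::string& in) {
  static const char* alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  size_t i = 0;
  while (i + 2 < in.size()) {  // Encode all full 3-byte groups.
    uint32_t n = (uint8_t(in[i]) << 16) | (uint8_t(in[i + 1]) << 8) |
                 uint8_t(in[i + 2]);
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += alphabet[n & 63];
    i += 3;
  }
  if (i + 1 == in.size()) {          // One trailing byte: pad with "==".
    uint32_t n = uint8_t(in[i]) << 16;
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += "==";
  } else if (i + 2 == in.size()) {   // Two trailing bytes: pad with "=".
    uint32_t n = (uint8_t(in[i]) << 16) | (uint8_t(in[i + 1]) << 8);
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// "Authorization: Basic <base64(user:pass)>" -- sent on every request,
// with no token round-trip.
std::string basicAuthHeader(const std::string& user, const std::string& pass) {
  return "Basic " + base64(user + ":" + pass);
}
```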
[jira] [Updated] (MESOS-6506) Show framework info in /state and /frameworks for frameworks that have orphan tasks
[ https://issues.apache.org/jira/browse/MESOS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6506: -- Target Version/s: (was: 1.2.0) > Show framework info in /state and /frameworks for frameworks that have orphan > tasks > --- > > Key: MESOS-6506 > URL: https://issues.apache.org/jira/browse/MESOS-6506 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Vinod Kone >Assignee: Vinod Kone > > Since Mesos 1.0, the master has access to FrameworkInfo of frameworks that > have orphan tasks. So we could expose this information in /state and > /frameworks endpoints. Note that this information is already present in the > v1 operator API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to launcher->fork()`
[ https://issues.apache.org/jira/browse/MESOS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830954#comment-15830954 ] Kevin Klues commented on MESOS-6553: This one is already completed too. > Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` > to launcher->fork()` > > > Key: MESOS-6553 > URL: https://issues.apache.org/jira/browse/MESOS-6553 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues > Labels: tech-debt > > Currently, we receive a bunch of {{ContainerLaunchInfo}} structs from each of > our isolators and extract information from them, which we pass one by one to > our {{launcher->fork()}} call in separate parameters. > Instead, we should construct a new {{ContainerLaunchInfo}} which is the > concatenation of the ones returned by each isolator, and pass this new one > down to {{launcher->fork()}} instead of building up individual arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6506) Show framework info in /state and /frameworks for frameworks that have orphan tasks
[ https://issues.apache.org/jira/browse/MESOS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830953#comment-15830953 ] Adam B commented on MESOS-6506: --- No progress in over a month. Dropping from the 1.2 release until somebody updates otherwise. > Show framework info in /state and /frameworks for frameworks that have orphan > tasks > --- > > Key: MESOS-6506 > URL: https://issues.apache.org/jira/browse/MESOS-6506 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Vinod Kone >Assignee: Vinod Kone > > Since Mesos 1.0, the master has access to FrameworkInfo of frameworks that > have orphan tasks. So we could expose this information in /state and > /frameworks endpoints. Note that this information is already present in the > v1 operator API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6843) Fetcher should not assume stdout/stderr in the sandbox.
[ https://issues.apache.org/jira/browse/MESOS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6843: -- Target Version/s: 1.3.0 (was: 1.2.0) > Fetcher should not assume stdout/stderr in the sandbox. > --- > > Key: MESOS-6843 > URL: https://issues.apache.org/jira/browse/MESOS-6843 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.0.2, 1.1.0 >Reporter: Jie Yu >Priority: Critical > Labels: mesosphere > > If container logger is used, this assumption might not be true. For instance, > a journald logger might redirect all task logs to journald. So in theory, the > fetcher log should go to journald as well, rather than writing to > sandbox/stdout and sandbox/stderr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6035) Add non-recursive version of cgroups::get
[ https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6035: -- Target Version/s: (was: 1.2.0) > Add non-recursive version of cgroups::get > - > > Key: MESOS-6035 > URL: https://issues.apache.org/jira/browse/MESOS-6035 > Project: Mesos > Issue Type: Improvement > Components: cgroups >Reporter: haosdent >Assignee: haosdent >Priority: Minor > > In some cases, we only need to get the top level cgroups instead of to get > all cgroups recursively. Add a non-recursive version could help to avoid > unnecessary paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
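A sketch of what a non-recursive flag on {{cgroups::get}} could look like, using {{std::filesystem}} as a stand-in for the real cgroupfs traversal; the signature is an assumption modeled on the ticket, not Mesos's actual API:

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// List cgroups (directories) under `hierarchy`. The non-recursive form
// visits only the top level, avoiding a walk of every nested cgroup
// when callers don't need them.
std::vector<std::string> getCgroups(
    const std::string& hierarchy, bool recursive = true) {
  std::vector<std::string> cgroups;
  if (recursive) {
    for (const auto& entry : fs::recursive_directory_iterator(hierarchy)) {
      if (entry.is_directory()) {
        cgroups.push_back(
            entry.path().lexically_relative(hierarchy).string());
      }
    }
  } else {
    for (const auto& entry : fs::directory_iterator(hierarchy)) {
      if (entry.is_directory()) {
        cgroups.push_back(entry.path().filename().string());
      }
    }
  }
  return cgroups;
}
```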
[jira] [Updated] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6540: -- Target Version/s: (was: 1.2.0) > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > In order to do this properly, we should pull the "init" process out of the > container and update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
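The checkpointing half of this is conceptually simple: persist the pid so it survives an agent restart and can be used for recovery and namespace entry. A minimal sketch with hypothetical helper names — the real agent checkpoints to a path like the {{.../pids/forked.pid}} seen in agent logs, via its own checkpoint utilities:

```cpp
#include <fstream>
#include <string>
#include <sys/types.h>

// Persist a pid to a checkpoint file so it can be recovered after an
// agent restart. Helper names and error handling are illustrative.
bool checkpointPid(pid_t pid, const std::string& path) {
  std::ofstream out(path);
  out << pid;
  return static_cast<bool>(out);
}

// Read a previously checkpointed pid; returns -1 if the checkpoint is
// missing or unreadable.
pid_t recoverPid(const std::string& path) {
  std::ifstream in(path);
  pid_t pid = -1;
  in >> pid;
  return in ? pid : -1;
}
```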
[jira] [Commented] (MESOS-6540) Pass the forked pid from `containerizer launch` to the agent and checkpoint it.
[ https://issues.apache.org/jira/browse/MESOS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830935#comment-15830935 ] Adam B commented on MESOS-6540: --- No progress in a month. Dropping from the 1.2 release until somebody updates otherwise. > Pass the forked pid from `containerizer launch` to the agent and checkpoint > it. > --- > > Key: MESOS-6540 > URL: https://issues.apache.org/jira/browse/MESOS-6540 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: debugging, mesosphere > > Right now the agent only knows about the pid of the "init" process forked by > {{launcher->fork()}}. However, in order to properly enter the namespaces of a > task for a nested container, we actually need the pid of the process that > gets launched by the {{containerizer launch}} binary. > Using this pid, isolators can properly enter the namespaces of the actual > *task* or *executor* launched by the {{containerizer launch}} binary instead > of just the namespaces of the "init" process (which may be different). > In order to do this properly, we should pull the "init" process out of the > container and update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6639) Update 'io::redirect()' to take an optional vector of callback hooks.
[ https://issues.apache.org/jira/browse/MESOS-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830934#comment-15830934 ] Adam B commented on MESOS-6639: --- No progress in a month. Dropping from the 1.2 release until somebody updates otherwise. > Update 'io::redirect()' to take an optional vector of callback hooks. > - > > Key: MESOS-6639 > URL: https://issues.apache.org/jira/browse/MESOS-6639 > Project: Mesos > Issue Type: Improvement >Reporter: Kevin Klues >Assignee: Kevin Klues > > These callback hooks should be invoked before passing any data read from > the 'from' file descriptor on to the 'to' file descriptor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6667) Update vendored ZooKeeper to 3.4.9
[ https://issues.apache.org/jira/browse/MESOS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6667: -- Target Version/s: 1.3.0 (was: 1.2.0) > Update vendored ZooKeeper to 3.4.9 > -- > > Key: MESOS-6667 > URL: https://issues.apache.org/jira/browse/MESOS-6667 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway > Labels: mesosphere > > 3.4.9 has a few notable fixes for the C client library. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6553) Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` to launcher->fork()`
[ https://issues.apache.org/jira/browse/MESOS-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6553: -- Target Version/s: (was: 1.2.0) > Update `MesosContainerizerProcess::_launch()` to pass `ContainerLaunchInfo` > to launcher->fork()` > > > Key: MESOS-6553 > URL: https://issues.apache.org/jira/browse/MESOS-6553 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues > Labels: tech-debt > > Currently, we receive a bunch of {{ContainerLaunchInfo}} structs from each of > our isolators and extract information from them, which we pass one by one to > our {{launcher->fork()}} call in separate parameters. > Instead, we should construct a new {{ContainerLaunchInfo}} which is the > concatenation of the ones returned by each isolator, and pass this new one > down to {{launcher->fork()}} instead of building up individual arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6542) Pull the current "init" process for a container out of the container.
[ https://issues.apache.org/jira/browse/MESOS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6542: -- Target Version/s: (was: 1.2.0) > Pull the current "init" process for a container out of the container. > - > > Key: MESOS-6542 > URL: https://issues.apache.org/jira/browse/MESOS-6542 > Project: Mesos > Issue Type: Task >Reporter: Kevin Klues > > Currently the mesos agent is in control of the "init" process launched inside > of a container. However, in order to properly support things like > systemd-in-a-container, we need to allow users to control the init process > that ultimately gets launched. > We will still need to fork a process equivalent to the current "init" > process, but it shouldn't be placed inside the container itself (instead, it > should be the parent process of whatever init process it is directed to > launch). > In order to do this properly, we will need to rework some of the logic in > {{launcher->fork()}} to allow this new parent process to do the namespace > entering / cloning instead of {{launcher->fork()}} itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.
[ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830923#comment-15830923 ] Adam B commented on MESOS-6743: --- No progress in a month. Dropping from the 1.2 release until somebody updates otherwise. > Docker executor hangs forever if `docker stop` fails. > - > > Key: MESOS-6743 > URL: https://issues.apache.org/jira/browse/MESOS-6743 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.1, 1.1.0 >Reporter: Alexander Rukletsov > Labels: mesosphere > > If {{docker stop}} finishes with an error status, the executor should catch > this and react instead of indefinitely waiting for {{reaped}} to return. > An interesting question is _how_ to react. Here are possible solutions. > 1. Retry {{docker stop}}. In this case it is unclear how many times to retry > and what to do if {{docker stop}} continues to fail. > 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill. > However, in this case it is unclear what status updates we should send: > {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill > a task? or set a specific reason in {{TASK_KILLING}}? > 3. Clean up and exit. In this case we should make sure the task container is > killed or notify the framework and the operator that the container may still > be running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6683) Return error from recordio::Reader if data is still buffered when EOF reached.
[ https://issues.apache.org/jira/browse/MESOS-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830925#comment-15830925 ] Adam B commented on MESOS-6683: --- No progress in a month. Dropping from the 1.2 release until somebody updates otherwise. > Return error from recordio::Reader if data is still buffered when EOF reached. > -- > > Key: MESOS-6683 > URL: https://issues.apache.org/jira/browse/MESOS-6683 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Anand Mazumdar > Labels: bug, mesosphere > > Right now, whenever EOF is reached a {{None()}} is returned to indicate that > no more records will be read. > However, we should only return {{None()}} if we reach EOF and there are no > bytes in the readers internal data buffer. If there are bytes in the buffer, > that indicates that a *partial* record has been read, but EOF was reached > before reading a full record. We should return an error in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
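The requested behavior — an error rather than {{None()}} when EOF arrives with a partial record still buffered — can be sketched against the RecordIO framing ({{<length>\n<data>}}). Here {{decode}} is a simplified batch stand-in for the incremental reader, returning {{false}} for the error case:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a complete RecordIO stream ("<length>\n<data>" framing).
// Returns false -- the error case the ticket asks for -- when EOF is
// reached while a partial record is still buffered.
bool decode(const std::string& input, std::vector<std::string>* records) {
  size_t pos = 0;
  while (pos < input.size()) {
    size_t newline = input.find('\n', pos);
    if (newline == std::string::npos) {
      return false;  // Partial length header buffered at EOF.
    }
    size_t length = std::stoul(input.substr(pos, newline - pos));
    if (newline + 1 + length > input.size()) {
      return false;  // Partial record body buffered at EOF.
    }
    records->push_back(input.substr(newline + 1, length));
    pos = newline + 1 + length;
  }
  return true;  // EOF with an empty buffer: a clean end of stream.
}
```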
[jira] [Updated] (MESOS-6622) NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage is flaky
[ https://issues.apache.org/jira/browse/MESOS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6622: -- Target Version/s: (was: 1.2.0) > NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage is flaky > -- > > Key: MESOS-6622 > URL: https://issues.apache.org/jira/browse/MESOS-6622 > Project: Mesos > Issue Type: Bug > Components: flaky, tests >Affects Versions: 1.1.0 >Reporter: Joseph Wu >Assignee: Kevin Klues >Priority: Minor > Labels: mesosphere, newbie > Attachments: gpu-test.log > > > This test occasionally times out after one minute: > {code} > I1122 02:07:25.721348 2328 slave.cpp:4263] Received ping from > slave-observer(563)@172.16.10.39:45772 > I1122 02:07:25.728559 2324 slave.cpp:5122] Terminating executor > ''b5a3a115-27da-4b81-902e-b99602f902a6' of framework > 42a4cb0e-aea9-4b9d-8bab-3279ee5a7b8b-' because it did not register within > 1mins > I1122 02:07:25.728667 2330 containerizer.cpp:2038] Destroying container > b4711187-157c-421e-a6d9-9fa32a6e263c in PROVISIONING state > I1122 02:07:25.728734 2330 containerizer.cpp:2093] Waiting for the > provisioner to complete provisioning before destroying container > b4711187-157c-421e-a6d9-9fa32a6e263c > {code} > The test itself has a future that waits for 2 minutes for the executor to > start up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6641) Remove deprecated hooks from our module API.
[ https://issues.apache.org/jira/browse/MESOS-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6641: -- Target Version/s: 1.3.0 (was: 1.2.0) > Remove deprecated hooks from our module API. > > > Key: MESOS-6641 > URL: https://issues.apache.org/jira/browse/MESOS-6641 > Project: Mesos > Issue Type: Improvement > Components: modules >Reporter: Till Toenshoff >Priority: Minor > Labels: deprecation, hooks, tech-debt > > By now we have at least one deprecated hook in our modules API which is > {{slavePreLaunchDockerHook}}. > There is a new one coming in now which is deprecating > {{slavePreLaunchDockerEnvironmentDecorator}}. > We need to actually remove those deprecations while making the community > aware - this ticket is meant for tracking this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6827) Fix the order in which "self.hpp" is included in "self.cpp".
[ https://issues.apache.org/jira/browse/MESOS-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6827: -- Target Version/s: (was: 1.2.0) > Fix the order in which "self.hpp" is included in "self.cpp". > > > Key: MESOS-6827 > URL: https://issues.apache.org/jira/browse/MESOS-6827 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Priority: Minor > Labels: newbie > > According to our > [styleguide|https://github.com/apache/mesos/blob/master/docs/c%2B%2B-style-guide.md#order-of-includes], > each {{.cpp}} file should include the related {{.hpp}} first to ensure that > a header file always includes all symbols it requires. However, our codebase > does not follow this rule strictly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6959) Separate the mesos-containerizer binary into a static binary, which only depends on stout
[ https://issues.apache.org/jira/browse/MESOS-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-6959: - Component/s: (was: ke) > Separate the mesos-containerizer binary into a static binary, which only > depends on stout > - > > Key: MESOS-6959 > URL: https://issues.apache.org/jira/browse/MESOS-6959 > Project: Mesos > Issue Type: Task > Components: cmake >Reporter: Joseph Wu > Labels: cmake, mesosphere, microsoft > > The {{mesos-containerizer}} binary currently has [three > commands|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/main.cpp#L46-L48]: > * > [MesosContainerizerLaunch|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/launch.cpp] > * > [MesosContainerizerMount|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/mount.cpp] > * > [NetworkCniIsolatorSetup|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1776-L1997] > These commands are all heavily dependent on stout, and have no need to be > linked to libprocess. In fact, adding an erroneous call to > {{process::initialize}} (either explicitly, or by accidentally using a > libprocess method) will break {{mesos-containerizer}} and cause several Mesos > containerizer tests to fail. (The tasks fail to launch, saying {{Failed to > synchronize with agent (it's probably exited)}}). > Because this binary only depends on stout, we can separate it from the other > source files and make this a static binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6959) Separate the mesos-containerizer binary into a static binary, which only depends on stout
Joseph Wu created MESOS-6959: Summary: Separate the mesos-containerizer binary into a static binary, which only depends on stout Key: MESOS-6959 URL: https://issues.apache.org/jira/browse/MESOS-6959 Project: Mesos Issue Type: Task Components: ke, cmake Reporter: Joseph Wu The {{mesos-containerizer}} binary currently has [three commands|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/main.cpp#L46-L48]: * [MesosContainerizerLaunch|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/launch.cpp] * [MesosContainerizerMount|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/mount.cpp] * [NetworkCniIsolatorSetup|https://github.com/apache/mesos/blob/6cf3a94a52e87a593c9cba373bf433cfc4178639/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1776-L1997] These commands are all heavily dependent on stout, and have no need to be linked to libprocess. In fact, adding an erroneous call to {{process::initialize}} (either explicitly, or by accidentally using a libprocess method) will break {{mesos-containerizer}} and can cause several Mesos containerizer tests to fail. (The tasks fail to launch, saying {{Failed to synchronize with agent (it's probably exited)}}.) Because this binary only depends on stout, we can separate it from the other source files and make this a static binary.
[jira] [Updated] (MESOS-3542) Separate libmesos into compiling from many binaries.
[ https://issues.apache.org/jira/browse/MESOS-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3542: - Epic Name: lib-breakdown > Separate libmesos into compiling from many binaries. > > > Key: MESOS-3542 > URL: https://issues.apache.org/jira/browse/MESOS-3542 > Project: Mesos > Issue Type: Epic > Components: cmake >Reporter: Alex Clemmer >Assignee: Alex Clemmer > Labels: cmake, mesosphere, microsoft, windows-mvp > > Historically libmesos is built as a huge monolithic binary. Another idea > would be to build it from a bunch of smaller libraries (_e.g._, libagent, > _etc_.).
[jira] [Updated] (MESOS-3542) Separate libmesos into compiling from many binaries.
[ https://issues.apache.org/jira/browse/MESOS-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3542: - Issue Type: Epic (was: Task) > Separate libmesos into compiling from many binaries. > > > Key: MESOS-3542 > URL: https://issues.apache.org/jira/browse/MESOS-3542 > Project: Mesos > Issue Type: Epic > Components: cmake >Reporter: Alex Clemmer >Assignee: Alex Clemmer > Labels: cmake, mesosphere, microsoft, windows-mvp > > Historically libmesos is built as a huge monolithic binary. Another idea > would be to build it from a bunch of smaller libraries (_e.g._, libagent, > _etc_.).
[jira] [Assigned] (MESOS-6858) network/cni isolator generates incomplete resolv.conf
[ https://issues.apache.org/jira/browse/MESOS-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-6858: -- Assignee: James Peach > network/cni isolator generates incomplete resolv.conf > - > > Key: MESOS-6858 > URL: https://issues.apache.org/jira/browse/MESOS-6858 > Project: Mesos > Issue Type: Bug > Components: isolation, network >Reporter: James Peach >Assignee: James Peach > > The CNI [network > configuration|https://github.com/containernetworking/cni/blob/master/SPEC.md#network-configuration] > dictionary contains entries for the {{/etc/resolv.conf}} fields {{nameservers}}, > {{domain}}, {{search}} and {{options}}. > In {{NetworkCniIsolatorProcess::_isolate()}}, the {{network/cni}} isolator > only emits {{nameservers}} and ignores the remaining fields.
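The fix described above amounts to rendering all four DNS fields from the CNI configuration, not just {{nameservers}}. As a minimal sketch (in Python for illustration; the real isolator is C++, and the dictionary shape here simply mirrors the CNI spec's {{dns}} object):

```python
def render_resolv_conf(dns):
    """Render an /etc/resolv.conf body from a CNI-style dns dictionary.

    Emits all four fields the CNI spec defines (nameservers, domain,
    search, options) instead of nameservers alone.
    """
    lines = []
    if dns.get("domain"):
        lines.append("domain %s" % dns["domain"])
    if dns.get("search"):
        lines.append("search %s" % " ".join(dns["search"]))
    for server in dns.get("nameservers", []):
        lines.append("nameserver %s" % server)
    if dns.get("options"):
        lines.append("options %s" % " ".join(dns["options"]))
    return "\n".join(lines) + "\n"
```
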
[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend failed to mount rootfs.
[ https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6654: Priority: Critical (was: Blocker) > Duplicate image layer ids may make the backend failed to mount rootfs. > -- > > Key: MESOS-6654 > URL: https://issues.apache.org/jira/browse/MESOS-6654 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Critical > Labels: aufs, backend, containerizer > > Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in > manifest, which may cause some backends unable to mount the rootfs (e.g., > 'aufs' backend). We should make sure that each layer path returned in > 'ImageInfo' is unique. > Here is an example manifest from 'mesosphere/inky': > {noformat} > [20:13:08]W: [Step 10/10]"name": "mesosphere/inky", > [20:13:08]W: [Step 10/10]"tag": "latest", > [20:13:08]W: [Step 10/10]"architecture": "amd64", > [20:13:08]W: [Step 10/10]"fsLayers": [ > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > 
"sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] } > [20:13:08]W: [Step 10/10]], > [20:13:08]W: [Step 10/10]"history": [ > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "v1Compatibility": > "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop) > ENTRYPOINT > 
[echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "v1Compatibility": >
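The suggested fix in MESOS-6654 is to ensure each layer path returned in 'ImageInfo' is unique. The core of that is order-preserving deduplication of the layer ids from the manifest, sketched here in Python (the actual change would live in Mesos's C++ provisioner code):

```python
def unique_layers(layer_ids):
    """Return layer ids with duplicates removed, preserving first-seen order.

    Manifests like 'mesosphere/inky' above repeat the same blobSum many
    times; mounting backends such as aufs cannot handle duplicate paths.
    """
    seen = set()
    result = []
    for layer in layer_ids:
        if layer not in seen:
            seen.add(layer)
            result.append(layer)
    return result
```
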
[jira] [Updated] (MESOS-6958) Support linux filesystem type detection.
[ https://issues.apache.org/jira/browse/MESOS-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6958: Priority: Critical (was: Blocker) > Support linux filesystem type detection. > > > Key: MESOS-6958 > URL: https://issues.apache.org/jira/browse/MESOS-6958 > Project: Mesos > Issue Type: Bug >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Critical > Labels: filesystem, linux > > We should support detecting a linux filesystem type (e.g., xfs, extfs) and > its filesystem id mapping.
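On Linux, filesystem type detection typically means calling statfs(2) and mapping the returned {{f_type}} magic number to a name. A sketch of that mapping (magic values from the kernel's magic.h; only a small illustrative subset, and note Python's {{os.statvfs}} does not expose {{f_type}}, so a real caller would use statfs via C or ctypes):

```python
# Common Linux filesystem magic numbers as reported in statfs(2)'s f_type.
FS_MAGIC = {
    0xEF53: "extfs",       # ext2/ext3/ext4 share one magic number
    0x58465342: "xfs",     # ASCII "XFSB"
    0x9123683E: "btrfs",
    0x01021994: "tmpfs",
}

def fs_type_name(f_type):
    """Map a statfs f_type magic number to a human-readable name."""
    return FS_MAGIC.get(f_type, "unknown(0x%x)" % f_type)
```
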
[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.
[ https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6653: Priority: Critical (was: Blocker) > Overlayfs backend may fail to mount the rootfs if both container image and > image volume are specified. > -- > > Key: MESOS-6653 > URL: https://issues.apache.org/jira/browse/MESOS-6653 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Critical > Labels: backend, containerizer, overlayfs > > Depending on MESOS-6000, we use symlink to shorten the overlayfs mounting > arguments. However, if more than one image need to be provisioned (e.g., a > container image is specified while image volumes are specified for the same > container), the symlink .../backends/overlay/links would fail to be created > since it exists already. > Here is a simple log when we hard code overlayfs as our default backend: > {noformat} > [07:02:45] : [Step 10/10] [ RUN ] > Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0 > [07:02:46] : [Step 10/10] I1127 07:02:46.416021 2919 > containerizer.cpp:207] Using isolation: > filesystem/linux,volume/image,docker/runtime,network/cni > [07:02:46] : [Step 10/10] I1127 07:02:46.419312 2919 > linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [07:02:46] : [Step 10/10] E1127 07:02:46.425336 2919 shell.hpp:107] > Command 'hadoop version 2>&1' failed; this is the output: > [07:02:46] : [Step 10/10] sh: 1: hadoop: not found > [07:02:46] : [Step 10/10] I1127 07:02:46.425379 2919 fetcher.cpp:69] > Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to > create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was > either not found or exited with a non-zero exit status: 127 > [07:02:46] : [Step 10/10] I1127 07:02:46.425452 2919 local_puller.cpp:94] > Creating local puller with docker registry 
'/tmp/R6OUei/registry' > [07:02:46] : [Step 10/10] I1127 07:02:46.427258 2934 > containerizer.cpp:956] Starting container > 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of > framework > [07:02:46] : [Step 10/10] I1127 07:02:46.427592 2938 > metadata_manager.cpp:167] Looking for image 'test_image_rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.427774 2936 local_puller.cpp:147] > Untarring image 'test_image_rootfs' from > '/tmp/R6OUei/registry/test_image_rootfs.tar' to > '/tmp/R6OUei/store/staging/9krDz2' > [07:02:46] : [Step 10/10] I1127 07:02:46.512070 2933 local_puller.cpp:167] > The repositories JSON file for image 'test_image_rootfs' is > '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}' > [07:02:46] : [Step 10/10] I1127 07:02:46.512279 2933 local_puller.cpp:295] > Extracting layer tar ball > '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar > to rootfs > '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.617442 2937 > metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.617908 2938 provisioner.cpp:286] > Image layers: 1 > [07:02:46] : [Step 10/10] I1127 07:02:46.617925 2938 provisioner.cpp:296] > Should hit here > [07:02:46] : [Step 10/10] I1127 07:02:46.617949 2938 provisioner.cpp:315] > : bind > [07:02:46] : [Step 10/10] I1127 07:02:46.617959 2938 provisioner.cpp:315] > : overlay > [07:02:46] : [Step 10/10] I1127 07:02:46.617967 2938 provisioner.cpp:315] > : copy > [07:02:46] : [Step 10/10] I1127 07:02:46.617974 2938 provisioner.cpp:318] > Provisioning image rootfs > 
'/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7' > for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend > [07:02:46] : [Step 10/10] I1127 07:02:46.618408 2936 overlay.cpp:175] > Created symlink > '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links' > -> '/tmp/DQ3blT' > [07:02:46] : [Step 10/10] I1127 07:02:46.618472 2936 overlay.cpp:203] > Provisioning image rootfs with overlayfs: >
[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.
[ https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6001: Priority: Critical (was: Blocker) > Aufs backend cannot support the image with numerous layers. > --- > > Key: MESOS-6001 > URL: https://issues.apache.org/jira/browse/MESOS-6001 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 14, Ubuntu 12 > Or any other os with aufs module >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Critical > Labels: aufs, backend, containerizer > > This issue was exposed in this unit test > `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually > specifying the `bind` backend. Most likely mounting the aufs with specific > options is limited by string length. > {noformat} > [20:13:07] : [Step 10/10] [ RUN ] > DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller > [20:13:07]W: [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] > Creating default 'local' authorizer > [20:13:07]W: [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] > Opened db in 8.148813ms > [20:13:07]W: [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] > Compacted db in 3.126629ms > [20:13:07]W: [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] > Created db iterator in 4410ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] > Seeked to beginning of db in 763ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] > Iterated through 0 keys in the db in 491ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [20:13:07]W: [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] > Starting replica recovery > [20:13:07]W: [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] > Replica is in EMPTY status > [20:13:07]W: [Step 10/10] I0805 20:13:07.628113 23431 
replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > __req_res__(5852)@172.30.2.138:44256 > [20:13:07]W: [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] > Received a recover response from a replica in EMPTY status > [20:13:07]W: [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] > Updating replica status to STARTING > [20:13:07]W: [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] > Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) > started on 172.30.2.138:44256 > [20:13:07]W: [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags > at startup: --acls="" --agent_ping_timeout="15secs" > --agent_reregister_timeout="10mins" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" > --registry_strict="true" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs" > [20:13:07]W: [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] > Master only allowing authenticated frameworks to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] > Master only allowing 
authenticated agents to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] > Master only allowing authenticated HTTP frameworks to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] > Loading credentials for authentication from '/tmp/OZHDIQ/credentials' > [20:13:07]W: [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using > default 'crammd5' authenticator > [20:13:07]W: [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using > default 'basic' HTTP authenticator for realm 'mesos-master-readonly' > [20:13:07]W: [Step 10/10] I0805 20:13:07.629114 23432 http.cpp:883] Using > default 'basic' HTTP authenticator for realm
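The "limited by string length" suspicion in MESOS-6001 matches a known constraint: mount(2)'s {{data}} argument (which carries the aufs {{br=...}} branch list) is limited to one page on Linux, so an image with many long layer paths overflows it. A Python sketch of checking that limit before mounting (illustrative only; the branch-option format shown assumes aufs's {{br=/path=ro:...}} syntax):

```python
PAGE_SIZE = 4096  # mount(2)'s 'data' argument is limited to one page on Linux

def aufs_branch_option(layers):
    """Build the aufs 'br=' mount option for a list of layer directories."""
    return "br=" + ":".join("%s=ro" % layer for layer in layers)

def fits_in_mount_data(option):
    """True if the option string fits in mount(2)'s single-page data buffer."""
    return len(option) + 1 <= PAGE_SIZE  # +1 for the trailing NUL
```
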
[jira] [Updated] (MESOS-6913) AgentAPIStreamingTest.AttachInputToNestedContainerSession fails on Mac OS.
[ https://issues.apache.org/jira/browse/MESOS-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6913: -- Fix Version/s: 1.3.0 > AgentAPIStreamingTest.AttachInputToNestedContainerSession fails on Mac OS. > -- > > Key: MESOS-6913 > URL: https://issues.apache.org/jira/browse/MESOS-6913 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: Mac OS 10.11.6 with Apple clang-703.0.31 >Reporter: Alexander Rukletsov >Assignee: Kevin Klues >Priority: Critical > Labels: mesosphere > > {noformat} > [ RUN ] > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 > make[3]: *** [check-local] Illegal instruction: 4 > make[2]: *** [check-am] Error 2 > make[1]: *** [check] Error 2 > make: *** [check-recursive] Error 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6954) Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent
[ https://issues.apache.org/jira/browse/MESOS-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues reassigned MESOS-6954: -- Assignee: Kevin Klues > Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent > > > Key: MESOS-6954 > URL: https://issues.apache.org/jira/browse/MESOS-6954 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Kevin Klues >Priority: Blocker > Labels: debugging, mesosphere > > Attempting to run {{LAUNCH_NESTED_CONTAINER}} with a parent container that > was launched with the docker containerizer causes the agent to crash as > below. We should add a safeguard in the handler to fail gracefully instead. > {noformat} > I0119 21:41:42.438295 3281 http.cpp:304] HTTP POST for /slave(1)/api/v1 from > 10.0.7.194:46700 with User-Agent='python-requests/2.12.4' with > X-Forwarded-For='10.0.6.162' > I0119 21:41:42.441571 3281 http.cpp:465] Processing call > LAUNCH_NESTED_CONTAINER_SESSION > W0119 21:41:42.442286 3281 http.cpp:2251] Failed to launch nested container > 62a16556-9c3b-48f2-aa1e-ba1d70093637.09a9d3b0-a245-4aa1-94f1-d10a13526b9b: > Unsupported > F0119 21:41:42.442371 3282 docker.cpp:2013] Check failed: > !containerId.has_parent() > *** Check failure stack trace: *** > @ 0x7f539aca01ad google::LogMessage::Fail() > @ 0x7f539aca1fdd google::LogMessage::SendToLog() > @ 0x7f539ac9fd9c google::LogMessage::Flush() > @ 0x7f539aca28d9 google::LogMessageFatal::~LogMessageFatal() > @ 0x7f539a46e2cd > mesos::internal::slave::DockerContainerizerProcess::destroy() > @ 0x7f539a48a8a7 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIbN5mesos8internal5slave26DockerContainerizerProcessERKNS5_11ContainerIDEbS9_bEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSG_FSE_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x7f539ac14ca1 process::ProcessManager::resume() > @ 0x7f539ac1dba7 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 
0x7f53990a5d73 (unknown) > @ 0x7f5398ba652c (unknown) > @ 0x7f53988e41dd (unknown) > {noformat}
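The safeguard proposed above is to validate the request in the handler and fail gracefully, instead of letting the docker containerizer's {{CHECK}} on {{containerId.has_parent()}} abort the agent. A rough Python model of that guard (the names {{supports_nesting}} and the dictionary shapes are hypothetical stand-ins for the real C++ interfaces):

```python
class UnsupportedError(Exception):
    """Raised instead of crashing when a containerizer cannot nest."""

def launch_nested_container(containerizer, container_id):
    """Reject nested launches up front for containerizers without nesting
    support, returning an error to the API caller rather than hitting a
    fatal CHECK deeper in the destroy path."""
    if container_id.get("parent") and not containerizer.get("supports_nesting"):
        raise UnsupportedError(
            "Containerizer does not support nested containers")
    return "launched %s" % container_id["value"]
```
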
[jira] [Updated] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
[ https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6780: --- Target Version/s: (was: 1.2.0) > ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably > -- > > Key: MESOS-6780 > URL: https://issues.apache.org/jira/browse/MESOS-6780 > Project: Mesos > Issue Type: Bug > Environment: Mac OS 10.12, clang version 4.0.0 > (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) > (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), > libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46 >Reporter: Benjamin Bannier >Assignee: Kevin Klues >Priority: Critical > Labels: mesosphere > Attachments: attach_container_input_no_ssl.log > > > The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both > {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized > build. > {code} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. 
> [--] 1 test from ContentType/AgentAPIStreamingTest > [ RUN ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0 > I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' > authorizer > I1212 17:11:12.393844 17362944 master.cpp:380] Master > c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on > 172.18.8.114:51059 > I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master" > --zk_session_timeout="10secs" > I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing > authenticated frameworks to register > I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing > authenticated agents to register > I1212 17:11:12.394691 17362944 
master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials' > I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' > authenticator > I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL > I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled > I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master! > I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar > I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the > registry (0B) in 4.131072ms > I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations in > 27us; attempting to update the registry > I1212 17:11:12.421799 14143488 registrar.cpp:506] Successfully updated the > registry in 4.10496ms > I1212 17:11:12.421835 14143488 registrar.cpp:392] Successfully recovered > registrar > I1212 17:11:12.421998 17362944 master.cpp:1684] Recovered 0 agents from the > registry (136B); allowing 10mins for agents to re-register > I1212 17:11:12.422780 3971208128 containerizer.cpp:220] Using isolation: >
[jira] [Comment Edited] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
[ https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830838#comment-15830838 ] Kevin Klues edited comment on MESOS-6780 at 1/19/17 11:44 PM: -- I'm changing this bug to critical rather than blocker for 1.2 because: 1) I'm 99% positive this is a test bug, not an actual API bug 2) If it is an API bug, it should only affect top-level containers launched with a {{tty}} and then attached to via the {{ATTACH_CONTAINER_OUTPUT}} and {{ATTACH_CONTAINER_INPUT}} agent API calls. We don't have any external tooling that exercises these paths at the moment, so these APIs will mostly go unused in this release. was (Author: klueska): I'm changing this bug to critical rather than blocker for 1.2 because: 1) I'm 99% positive this is a test bug, not an actual API bug 2) If it is an API bug, it should only affect top-level containers launched with a {{tty}} and then attached to via the {{ATTACH_CONTAINER_OUTPUT}} and {[ATTACH_CONTAINER_INPUT}} agent API calls. We don't have any external tooling that exercises these paths at the moment, so these APIs will mostly go unused in this release. > ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably > -- > > Key: MESOS-6780 > URL: https://issues.apache.org/jira/browse/MESOS-6780 > Project: Mesos > Issue Type: Bug > Environment: Mac OS 10.12, clang version 4.0.0 > (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) > (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), > libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46 >Reporter: Benjamin Bannier >Assignee: Kevin Klues >Priority: Critical > Labels: mesosphere > Attachments: attach_container_input_no_ssl.log > > > The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both > {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized > build. > {code} > [==] Running 1 test from 1 test case.
> [--] Global test environment set-up. > [--] 1 test from ContentType/AgentAPIStreamingTest > [ RUN ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0 > I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' > authorizer > I1212 17:11:12.393844 17362944 master.cpp:380] Master > c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on > 172.18.8.114:51059 > I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master" > --zk_session_timeout="10secs" > I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing > authenticated frameworks to register > I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing > authenticated agents to register > I1212 
17:11:12.394691 17362944 master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials' > I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' > authenticator > I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL > I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm
[jira] [Updated] (MESOS-6780) ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably
[ https://issues.apache.org/jira/browse/MESOS-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6780: --- Priority: Critical (was: Blocker) I'm changing this bug to critical rather than blocker for 1.2 because: 1) I'm 99% positive this is a test bug, not an actual API bug 2) If it is an API bug, it should only affect top-level containers launched with a {{tty}} and then attached to via the {{ATTACH_CONTAINER_OUTPUT}} and {{ATTACH_CONTAINER_INPUT}} agent API calls. We don't have any external tooling that exercises these paths at the moment, so these APIs will mostly go unused in this release. > ContentType/AgentAPIStreamingTest.AttachContainerInput test fails reliably > -- > > Key: MESOS-6780 > URL: https://issues.apache.org/jira/browse/MESOS-6780 > Project: Mesos > Issue Type: Bug > Environment: Mac OS 10.12, clang version 4.0.0 > (http://llvm.org/git/clang 88800602c0baafb8739cb838c2fa3f5fb6cc6968) > (http://llvm.org/git/llvm 25801f0f22e178343ee1eadfb4c6cc058628280e), > libc++-513447dbb91dd555ea08297dbee6a1ceb6abdc46 >Reporter: Benjamin Bannier >Assignee: Kevin Klues >Priority: Critical > Labels: mesosphere > Attachments: attach_container_input_no_ssl.log > > > The test {{ContentType/AgentAPIStreamingTest.AttachContainerInput}} (both > {{/0}} and {{/1}}) fail consistently for me in an SSL-enabled, optimized > build. > {code} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. 
> [--] 1 test from ContentType/AgentAPIStreamingTest > [ RUN ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0 > I1212 17:11:12.371175 3971208128 cluster.cpp:160] Creating default 'local' > authorizer > I1212 17:11:12.393844 17362944 master.cpp:380] Master > c752777c-d947-4a86-b382-643463866472 (172.18.8.114) started on > 172.18.8.114:51059 > I1212 17:11:12.393899 17362944 master.cpp:382] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/master" > --zk_session_timeout="10secs" > I1212 17:11:12.394670 17362944 master.cpp:432] Master only allowing > authenticated frameworks to register > I1212 17:11:12.394682 17362944 master.cpp:446] Master only allowing > authenticated agents to register > I1212 17:11:12.394691 17362944 
master.cpp:459] Master only allowing > authenticated HTTP frameworks to register > I1212 17:11:12.394701 17362944 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/6t/yp_xgc8d6k32rpp0bsbfqm9mgp/T/F46yYV/credentials' > I1212 17:11:12.394959 17362944 master.cpp:504] Using default 'crammd5' > authenticator > I1212 17:11:12.394996 17362944 authenticator.cpp:519] Initializing server SASL > I1212 17:11:12.411406 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1212 17:11:12.411571 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1212 17:11:12.411682 17362944 http.cpp:922] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1212 17:11:12.411775 17362944 master.cpp:584] Authorization enabled > I1212 17:11:12.413318 16289792 master.cpp:2045] Elected as the leading master! > I1212 17:11:12.413377 16289792 master.cpp:1568] Recovering from registrar > I1212 17:11:12.417582 14143488 registrar.cpp:362] Successfully fetched the > registry (0B) in 4.131072ms > I1212 17:11:12.417667 14143488 registrar.cpp:461] Applied 1 operations
[jira] [Updated] (MESOS-6948) AgentAPITest.LaunchNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6948: --- Priority: Critical (was: Blocker) I'm moving this to a Critical bug rather than a blocker because: 1) It only happens very rarely ([~greggomann] can get it to trigger periodically inside his CentOS vagrant image, but nowhere else) 2) We haven't seen it manifest in practice with the CLI tool we built around these APIs (e.g. I have no problem doing a quick `dcos task exec printf output` and getting the output back). 3) Even if there is an error in the wild, it's very rare and only happens at connection time. After the connection is established, things should run smoothly. > AgentAPITest.LaunchNestedContainerSession is flaky > -- > > Key: MESOS-6948 > URL: https://issues.apache.org/jira/browse/MESOS-6948 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 7 VM, libevent and SSL enabled >Reporter: Greg Mann >Assignee: Kevin Klues >Priority: Critical > Labels: debugging, tests > Attachments: AgentAPITest.LaunchNestedContainerSession.txt > > > This was observed in a CentOS 7 VM, with libevent and SSL enabled: > {code} > I0118 22:17:23.528846 2887 http.cpp:464] Processing call > LAUNCH_NESTED_CONTAINER_SESSION > I0118 22:17:23.530452 2887 containerizer.cpp:1807] Starting nested container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.532265 2887 containerizer.cpp:1831] Trying to chown > '/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e' > to user 'vagrant' > I0118 22:17:23.535213 2887 switchboard.cpp:570] Launching > 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" > --help="false" > 
--socket_address="/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430" > --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" > --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" > --wait_for_connection="true"' for container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.537210 2887 switchboard.cpp:600] Created I/O switchboard > server (pid: 3335) listening on socket file > '/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430' for > container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.543665 2887 containerizer.cpp:1540] Launching > 'mesos-containerizer' with flags '--help="false" > --launch_info="{"command":{"shell":true,"value":"printf output && printf > error > 1>&2"},"environment":{},"err":{"fd":16,"type":"FD"},"in":{"fd":11,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"vagrant"}" > --pipe_read="12" --pipe_write="13" > --runtime_directory="/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_QVZGrY/containers/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e" > --unshare_namespace_mnt="false"' > I0118 22:17:23.556032 2887 launcher.cpp:133] Forked child with pid '3337' > for container > '492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e' > I0118 22:17:23.563900 2887 fetcher.cpp:349] Starting to fetch URIs for > container: > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e, > directory: > /tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.962441 2887 containerizer.cpp:2481] Container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e has > exited > I0118 
22:17:23.962484 2887 containerizer.cpp:2118] Destroying container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e in > RUNNING state > I0118 22:17:23.962715 2887 launcher.cpp:149] Asked to destroy container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.977562 2887 process.cpp:3733] Failed to process request for > '/slave(69)/api/v1': Container has or is being destroyed > W0118 22:17:23.978216 2887 http.cpp:2734] Failed to attach to nested > container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e: > Container has or is being destroyed > I0118 22:17:23.978330 2887 process.cpp:1435] Returning '500 Internal
[jira] [Updated] (MESOS-5931) Support auto backend in Unified Containerizer.
[ https://issues.apache.org/jira/browse/MESOS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-5931: Story Points: 8 (was: 3) > Support auto backend in Unified Containerizer. > -- > > Key: MESOS-5931 > URL: https://issues.apache.org/jira/browse/MESOS-5931 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: backend, containerizer, mesosphere > > Currently in the Unified Containerizer, the copy backend is selected by default. > This is not ideal, especially in a production environment: it can take a > long time to prepare a huge container image by copying it from the store to the > provisioner. > Ideally, we should support an `auto backend`, which would > automatically/intelligently select the best/optimal backend for the image > provisioner if the user does not specify one via the agent flag. > We should design the selection logic first in this ticket, to determine how we want > to choose the right backend (e.g., overlayfs or aufs should be preferred if > available from the kernel). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
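The selection logic proposed above (prefer overlayfs or aufs when the kernel supports them, otherwise fall back to copy) can be sketched roughly as follows. This is an illustrative Python sketch, not the Mesos C++ implementation; the function names and the idea of parsing `/proc/filesystems` to detect kernel support are assumptions.

```python
def supported_filesystems(proc_filesystems_text):
    """Parse /proc/filesystems content: each line is an optional 'nodev'
    tag, whitespace, then the filesystem name."""
    return {line.split()[-1]
            for line in proc_filesystems_text.splitlines()
            if line.strip()}


def select_backend(kernel_filesystems, preference=("overlay", "aufs", "copy")):
    """Return the first preferred backend the kernel can support.
    'copy' needs no kernel filesystem support, so it always matches."""
    for backend in preference:
        if backend == "copy" or backend in kernel_filesystems:
            return backend
    return "copy"


# Example: a kernel that exposes overlayfs.
proc = "nodev\toverlay\n\text4\nnodev\taufs\n"
print(select_backend(supported_filesystems(proc)))  # overlay
```

A real implementation would also have to handle backend-specific constraints (e.g. the aufs branch limits discussed in MESOS-6001), but the probe-then-prefer shape is the core of the idea.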
[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.
[ https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6653: Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50 (was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49) > Overlayfs backend may fail to mount the rootfs if both container image and > image volume are specified. > -- > > Key: MESOS-6653 > URL: https://issues.apache.org/jira/browse/MESOS-6653 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: backend, containerizer, overlayfs > > Building on MESOS-6000, we use a symlink to shorten the overlayfs mount > arguments. However, if more than one image needs to be provisioned (e.g., a > container image is specified while image volumes are specified for the same > container), the symlink .../backends/overlay/links would fail to be created > since it already exists. > Here is a simple log when we hard-code overlayfs as our default backend: > {noformat} > [07:02:45] : [Step 10/10] [ RUN ] > Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0 > [07:02:46] : [Step 10/10] I1127 07:02:46.416021 2919 > containerizer.cpp:207] Using isolation: > filesystem/linux,volume/image,docker/runtime,network/cni > [07:02:46] : [Step 10/10] I1127 07:02:46.419312 2919 > linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [07:02:46] : [Step 10/10] E1127 07:02:46.425336 2919 shell.hpp:107] > Command 'hadoop version 2>&1' failed; this is the output: > [07:02:46] : [Step 10/10] sh: 1: hadoop: not found > [07:02:46] : [Step 10/10] I1127 07:02:46.425379 2919 fetcher.cpp:69] > Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to > create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was > either not found or exited with a non-zero exit status: 127 > [07:02:46] : 
[Step 10/10] I1127 07:02:46.425452 2919 local_puller.cpp:94] > Creating local puller with docker registry '/tmp/R6OUei/registry' > [07:02:46] : [Step 10/10] I1127 07:02:46.427258 2934 > containerizer.cpp:956] Starting container > 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of > framework > [07:02:46] : [Step 10/10] I1127 07:02:46.427592 2938 > metadata_manager.cpp:167] Looking for image 'test_image_rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.427774 2936 local_puller.cpp:147] > Untarring image 'test_image_rootfs' from > '/tmp/R6OUei/registry/test_image_rootfs.tar' to > '/tmp/R6OUei/store/staging/9krDz2' > [07:02:46] : [Step 10/10] I1127 07:02:46.512070 2933 local_puller.cpp:167] > The repositories JSON file for image 'test_image_rootfs' is > '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}' > [07:02:46] : [Step 10/10] I1127 07:02:46.512279 2933 local_puller.cpp:295] > Extracting layer tar ball > '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar > to rootfs > '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.617442 2937 > metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.617908 2938 provisioner.cpp:286] > Image layers: 1 > [07:02:46] : [Step 10/10] I1127 07:02:46.617925 2938 provisioner.cpp:296] > Should hit here > [07:02:46] : [Step 10/10] I1127 07:02:46.617949 2938 provisioner.cpp:315] > : bind > [07:02:46] : [Step 10/10] I1127 07:02:46.617959 2938 provisioner.cpp:315] > : overlay > [07:02:46] : [Step 10/10] I1127 07:02:46.617967 2938 provisioner.cpp:315] > : copy > [07:02:46] : [Step 10/10] I1127 07:02:46.617974 2938 provisioner.cpp:318] > Provisioning image rootfs > 
'/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7' > for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend > [07:02:46] : [Step 10/10] I1127 07:02:46.618408 2936 overlay.cpp:175] > Created symlink > '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links' > -> '/tmp/DQ3blT' > [07:02:46] : [Step 10/10] I1127 07:02:46.618472 2936 overlay.cpp:203] > Provisioning image rootfs with
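One possible shape of a fix for the collision described above is to treat the `links` symlink as create-once and reuse it for every subsequent image provisioned for the same container, instead of failing because it already exists. A minimal Python sketch under those assumptions (`ensure_links_symlink` is a hypothetical helper for illustration, not the actual provisioner code):

```python
import os
import tempfile


def ensure_links_symlink(links_path):
    """Return the short directory that 'links_path' points at.

    The first provisioned image creates the symlink; later images for the
    same container (e.g. image volumes on top of a container image) reuse
    it instead of failing with EEXIST."""
    if os.path.islink(links_path):
        return os.readlink(links_path)  # reuse the earlier shortened path
    target = tempfile.mkdtemp(prefix="ol-")  # short path for mount args
    os.symlink(target, links_path)
    return target
```

Calling the helper twice with the same `links_path` returns the same target, which is exactly the property the overlayfs backend needs when both a container image and image volumes are provisioned.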
[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.
[ https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6001: Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50 (was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49) > Aufs backend cannot support the image with numerous layers. > --- > > Key: MESOS-6001 > URL: https://issues.apache.org/jira/browse/MESOS-6001 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 14, Ubuntu 12 > Or any other os with aufs module >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: aufs, backend, containerizer > > This issue was exposed in this unit test > `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually > specifying the `bind` backend. Most likely, mounting aufs with specific > options is limited by the mount-data string length. > {noformat} > [20:13:07] : [Step 10/10] [ RUN ] > DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller > [20:13:07]W: [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] > Creating default 'local' authorizer > [20:13:07]W: [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] > Opened db in 8.148813ms > [20:13:07]W: [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] > Compacted db in 3.126629ms > [20:13:07]W: [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] > Created db iterator in 4410ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] > Seeked to beginning of db in 763ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] > Iterated through 0 keys in the db in 491ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [20:13:07]W: [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] > Starting replica recovery > [20:13:07]W: [Step 10/10] I0805 20:13:07.627800 23437 
recover.cpp:477] > Replica is in EMPTY status > [20:13:07]W: [Step 10/10] I0805 20:13:07.628113 23431 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > __req_res__(5852)@172.30.2.138:44256 > [20:13:07]W: [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] > Received a recover response from a replica in EMPTY status > [20:13:07]W: [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] > Updating replica status to STARTING > [20:13:07]W: [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] > Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) > started on 172.30.2.138:44256 > [20:13:07]W: [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags > at startup: --acls="" --agent_ping_timeout="15secs" > --agent_reregister_timeout="10mins" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" > --registry_strict="true" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs" > [20:13:07]W: [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] > Master only allowing authenticated frameworks to register > 
[20:13:07]W: [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] > Master only allowing authenticated agents to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] > Master only allowing authenticated HTTP frameworks to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] > Loading credentials for authentication from '/tmp/OZHDIQ/credentials' > [20:13:07]W: [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using > default 'crammd5' authenticator > [20:13:07]W: [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using > default 'basic' HTTP authenticator for realm 'mesos-master-readonly' > [20:13:07]W:
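The suspected limit can be illustrated with rough arithmetic: aufs takes all branches in a single `br=dir1=ro:dir2=ro:...` mount-data string, and the data passed to mount(2) is conventionally capped at one page (4096 bytes), so with long provisioner paths the string overflows after on the order of a hundred layers. A hedged Python sketch (the exact option syntax and the one-page cap are assumptions, and the paths are made up):

```python
PAGE_SIZE = 4096  # conventional mount(2) data limit: one page


def aufs_mount_data(layer_dirs):
    """Build the single aufs mount-data string that stacks all layers,
    e.g. 'br=/path/layer2=ro:/path/layer1=ro'."""
    return "br=" + ":".join(d + "=ro" for d in layer_dirs)


def exceeds_mount_limit(layer_dirs):
    """True if the branch string no longer fits in one page of mount data."""
    return len(aufs_mount_data(layer_dirs)) >= PAGE_SIZE


# With ~50-character provisioner paths, 100 layers already overflow.
layers = ["/tmp/provisioner/backends/aufs/scratch/layer%04d" % i
          for i in range(100)]
print(exceeds_mount_limit(layers))  # True
```

This is also why MESOS-6000-style path shortening (symlinking the layer directories to short names) helps: it shrinks every branch entry rather than reducing the number of layers.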
[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend fail to mount rootfs.
[ https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6654: Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50 (was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49) > Duplicate image layer ids may make the backend fail to mount rootfs. > -- > > Key: MESOS-6654 > URL: https://issues.apache.org/jira/browse/MESOS-6654 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: aufs, backend, containerizer > > Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in the > manifest, which may cause some backends to be unable to mount the rootfs (e.g., the > 'aufs' backend). We should make sure that each layer path returned in > 'ImageInfo' is unique. > Here is an example manifest from 'mesosphere/inky': > {noformat} > [20:13:08]W: [Step 10/10]"name": "mesosphere/inky", > [20:13:08]W: [Step 10/10]"tag": "latest", > [20:13:08]W: [Step 10/10]"architecture": "amd64", > [20:13:08]W: [Step 10/10]"fsLayers": [ > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: 
[Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] } > [20:13:08]W: [Step 10/10]], > [20:13:08]W: [Step 10/10]"history": [ > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "v1Compatibility": > "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop) > ENTRYPOINT > 
[echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "v1Compatibility": >
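Making "each layer path returned in 'ImageInfo'" unique amounts to an order-preserving de-duplication of the layer list before handing it to the backend, since aufs (and similar stacked backends) reject duplicate branches. A small illustrative Python sketch (names and the truncated ids are hypothetical, not Mesos code):

```python
def dedup_layers(layer_ids):
    """Drop repeated layer ids while preserving stacking order, so every
    branch handed to a backend such as aufs is unique."""
    seen = set()
    unique = []
    for layer in layer_ids:
        if layer not in seen:
            seen.add(layer)
            unique.append(layer)
    return unique


# The 'mesosphere/inky' manifest above repeats one empty-layer blob many
# times; de-duplicating keeps the first occurrence of each id.
layers = ["a3ed95ca", "a3ed95ca", "1db09adb", "a3ed95ca"]
print(dedup_layers(layers))  # ['a3ed95ca', '1db09adb']
```

The repeated `a3ed95ca...` blob is the well-known empty-layer digest, so dropping duplicates does not lose filesystem content.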
[jira] [Updated] (MESOS-6001) Aufs backend cannot support the image with numerous layers.
[ https://issues.apache.org/jira/browse/MESOS-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6001: Priority: Blocker (was: Major) > Aufs backend cannot support the image with numerous layers. > --- > > Key: MESOS-6001 > URL: https://issues.apache.org/jira/browse/MESOS-6001 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 14, Ubuntu 12 > Or any other os with aufs module >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: aufs, backend, containerizer > > This issue was exposed in this unit test > `ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller` by manually > specifying the `bind` backend. Most likely, mounting aufs with specific > options is limited by the mount-data string length. > {noformat} > [20:13:07] : [Step 10/10] [ RUN ] > DockerRuntimeIsolatorTest.ROOT_CURL_INTERNET_DockerDefaultEntryptRegistryPuller > [20:13:07]W: [Step 10/10] I0805 20:13:07.615844 23416 cluster.cpp:155] > Creating default 'local' authorizer > [20:13:07]W: [Step 10/10] I0805 20:13:07.624106 23416 leveldb.cpp:174] > Opened db in 8.148813ms > [20:13:07]W: [Step 10/10] I0805 20:13:07.627252 23416 leveldb.cpp:181] > Compacted db in 3.126629ms > [20:13:07]W: [Step 10/10] I0805 20:13:07.627275 23416 leveldb.cpp:196] > Created db iterator in 4410ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627282 23416 leveldb.cpp:202] > Seeked to beginning of db in 763ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627287 23416 leveldb.cpp:271] > Iterated through 0 keys in the db in 491ns > [20:13:07]W: [Step 10/10] I0805 20:13:07.627301 23416 replica.cpp:776] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [20:13:07]W: [Step 10/10] I0805 20:13:07.627563 23434 recover.cpp:451] > Starting replica recovery > [20:13:07]W: [Step 10/10] I0805 20:13:07.627800 23437 recover.cpp:477] > Replica is in EMPTY status > [20:13:07]W: [Step 10/10] I0805 20:13:07.628113 23431 
replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > __req_res__(5852)@172.30.2.138:44256 > [20:13:07]W: [Step 10/10] I0805 20:13:07.628243 23430 recover.cpp:197] > Received a recover response from a replica in EMPTY status > [20:13:07]W: [Step 10/10] I0805 20:13:07.628365 23437 recover.cpp:568] > Updating replica status to STARTING > [20:13:07]W: [Step 10/10] I0805 20:13:07.628744 23432 master.cpp:375] > Master dd755a55-0dd1-4d2d-9a49-812a666015cb (ip-172-30-2-138.mesosphere.io) > started on 172.30.2.138:44256 > [20:13:07]W: [Step 10/10] I0805 20:13:07.628758 23432 master.cpp:377] Flags > at startup: --acls="" --agent_ping_timeout="15secs" > --agent_reregister_timeout="10mins" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --authenticate_http_readwrite="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/OZHDIQ/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" > --registry_strict="true" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/OZHDIQ/master" --zk_session_timeout="10secs" > [20:13:07]W: [Step 10/10] I0805 20:13:07.628893 23432 master.cpp:427] > Master only allowing authenticated frameworks to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628900 23432 master.cpp:441] > Master only allowing 
authenticated agents to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628902 23432 master.cpp:454] > Master only allowing authenticated HTTP frameworks to register > [20:13:07]W: [Step 10/10] I0805 20:13:07.628906 23432 credentials.hpp:37] > Loading credentials for authentication from '/tmp/OZHDIQ/credentials' > [20:13:07]W: [Step 10/10] I0805 20:13:07.628999 23432 master.cpp:499] Using > default 'crammd5' authenticator > [20:13:07]W: [Step 10/10] I0805 20:13:07.629041 23432 http.cpp:883] Using > default 'basic' HTTP authenticator for realm 'mesos-master-readonly' > [20:13:07]W: [Step 10/10] I0805 20:13:07.629114 23432 http.cpp:883] Using > default 'basic' HTTP authenticator for realm
[jira] [Updated] (MESOS-6654) Duplicate image layer ids may make the backend fail to mount rootfs.
[ https://issues.apache.org/jira/browse/MESOS-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6654: Priority: Blocker (was: Major) > Duplicate image layer ids may make the backend fail to mount rootfs. > -- > > Key: MESOS-6654 > URL: https://issues.apache.org/jira/browse/MESOS-6654 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: aufs, backend, containerizer > > Some images (e.g., 'mesosphere/inky') may contain duplicate layer ids in the > manifest, which may cause some backends to be unable to mount the rootfs (e.g., the > 'aufs' backend). We should make sure that each layer path returned in > 'ImageInfo' is unique. > Here is an example manifest from 'mesosphere/inky': > {noformat} > [20:13:08]W: [Step 10/10]"name": "mesosphere/inky", > [20:13:08]W: [Step 10/10]"tag": "latest", > [20:13:08]W: [Step 10/10]"architecture": "amd64", > [20:13:08]W: [Step 10/10]"fsLayers": [ > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > 
"sha256:1db09adb5ddd7f1a07b6d585a7db747a51c7bd17418d47e91f901bdf420abd66" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "blobSum": > "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" > [20:13:08]W: [Step 10/10] } > [20:13:08]W: [Step 10/10]], > [20:13:08]W: [Step 10/10]"history": [ > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "v1Compatibility": > "{\"id\":\"e28617c6dd2169bfe2b10017dfaa04bd7183ff840c4f78ebe73fca2a89effeb6\",\"parent\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"created\":\"2014-08-15T00:31:36.407713553Z\",\"container\":\"5d55401ff99c7508c9d546926b711c78e3ccb36e39a848024b623b2aef4c2c06\",\"container_config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"/bin/sh\",\"-c\",\"#(nop) > ENTRYPOINT > 
[echo]\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"docker_version\":\"1.1.2\",\"author\":\"supp...@mesosphere.io\",\"config\":{\"Hostname\":\"f7d939e68b5a\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"PortSpecs\":null,\"ExposedPorts\":null,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"HOME=/\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Cmd\":[\"inky\"],\"Image\":\"be4ce2753831b8952a5b797cf45b2230e1befead6f5db0630bcb24a5f554255e\",\"Volumes\":null,\"VolumeDriver\":\"\",\"WorkingDir\":\"\",\"Entrypoint\":[\"echo\"],\"NetworkDisabled\":false,\"MacAddress\":\"\",\"OnBuild\":[],\"Labels\":null},\"architecture\":\"amd64\",\"os\":\"linux\",\"Size\":0}\n" > [20:13:08]W: [Step 10/10] }, > [20:13:08]W: [Step 10/10] { > [20:13:08]W: [Step 10/10] "v1Compatibility": >
[jira] [Updated] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.
[ https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6653: Priority: Blocker (was: Major) > Overlayfs backend may fail to mount the rootfs if both container image and > image volume are specified. > -- > > Key: MESOS-6653 > URL: https://issues.apache.org/jira/browse/MESOS-6653 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: backend, containerizer, overlayfs > > Following MESOS-6000, we use a symlink to shorten the overlayfs mount > arguments. However, if more than one image needs to be provisioned (e.g., a > container image is specified while image volumes are specified for the same > container), creating the symlink .../backends/overlay/links would fail > since it already exists. > Here is a simple log when we hard-code overlayfs as our default backend: > {noformat} > [07:02:45] : [Step 10/10] [ RUN ] > Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0 > [07:02:46] : [Step 10/10] I1127 07:02:46.416021 2919 > containerizer.cpp:207] Using isolation: > filesystem/linux,volume/image,docker/runtime,network/cni > [07:02:46] : [Step 10/10] I1127 07:02:46.419312 2919 > linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [07:02:46] : [Step 10/10] E1127 07:02:46.425336 2919 shell.hpp:107] > Command 'hadoop version 2>&1' failed; this is the output: > [07:02:46] : [Step 10/10] sh: 1: hadoop: not found > [07:02:46] : [Step 10/10] I1127 07:02:46.425379 2919 fetcher.cpp:69] > Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to > create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was > either not found or exited with a non-zero exit status: 127 > [07:02:46] : [Step 10/10] I1127 07:02:46.425452 2919 local_puller.cpp:94] > Creating local puller with docker registry 
'/tmp/R6OUei/registry' > [07:02:46] : [Step 10/10] I1127 07:02:46.427258 2934 > containerizer.cpp:956] Starting container > 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of > framework > [07:02:46] : [Step 10/10] I1127 07:02:46.427592 2938 > metadata_manager.cpp:167] Looking for image 'test_image_rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.427774 2936 local_puller.cpp:147] > Untarring image 'test_image_rootfs' from > '/tmp/R6OUei/registry/test_image_rootfs.tar' to > '/tmp/R6OUei/store/staging/9krDz2' > [07:02:46] : [Step 10/10] I1127 07:02:46.512070 2933 local_puller.cpp:167] > The repositories JSON file for image 'test_image_rootfs' is > '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}' > [07:02:46] : [Step 10/10] I1127 07:02:46.512279 2933 local_puller.cpp:295] > Extracting layer tar ball > '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar > to rootfs > '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.617442 2937 > metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs' > [07:02:46] : [Step 10/10] I1127 07:02:46.617908 2938 provisioner.cpp:286] > Image layers: 1 > [07:02:46] : [Step 10/10] I1127 07:02:46.617925 2938 provisioner.cpp:296] > Should hit here > [07:02:46] : [Step 10/10] I1127 07:02:46.617949 2938 provisioner.cpp:315] > : bind > [07:02:46] : [Step 10/10] I1127 07:02:46.617959 2938 provisioner.cpp:315] > : overlay > [07:02:46] : [Step 10/10] I1127 07:02:46.617967 2938 provisioner.cpp:315] > : copy > [07:02:46] : [Step 10/10] I1127 07:02:46.617974 2938 provisioner.cpp:318] > Provisioning image rootfs > 
'/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/rootfses/c71e83d2-5dbe-4eb7-a2fc-b8cc826771f7' > for container 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 using overlay backend > [07:02:46] : [Step 10/10] I1127 07:02:46.618408 2936 overlay.cpp:175] > Created symlink > '/mnt/teamcity/temp/buildTmp/Nesting_VolumeImageIsolatorTest_ROOT_ImageInVolumeWithRootFilesystem_0_1fMo0c/provisioner/containers/9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330/backends/overlay/links' > -> '/tmp/DQ3blT' > [07:02:46] : [Step 10/10] I1127 07:02:46.618472 2936 overlay.cpp:203] > Provisioning image rootfs with overlayfs: >
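The failure mode above is that the provisioner unconditionally creates the `.../backends/overlay/links` symlink, which fails with EEXIST once a second image (e.g. an image volume) is provisioned for the same container. A minimal sketch of an idempotent creation, with a hypothetical helper name (not Mesos code):

```python
import os

def ensure_links_symlink(link_path, target):
    """Create link_path -> target, tolerating a pre-existing identical link."""
    try:
        os.symlink(target, link_path)
    except FileExistsError:
        # A previous provisioning round already created the link; accept it
        # only if it points at the same target, otherwise re-raise.
        if not os.path.islink(link_path) or os.readlink(link_path) != target:
            raise
    return link_path
```

With this shape, provisioning a second image for the same container reuses the existing link instead of aborting.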
[jira] [Updated] (MESOS-6504) Use 'geteuid()' for the root privileges check.
[ https://issues.apache.org/jira/browse/MESOS-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-6504: Sprint: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 50 (was: Mesosphere Sprint 47, Mesosphere Sprint 48, Mesosphere Sprint 49) > Use 'geteuid()' for the root privileges check. > -- > > Key: MESOS-6504 > URL: https://issues.apache.org/jira/browse/MESOS-6504 > Project: Mesos > Issue Type: Bug > Components: isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: backend, isolator, mesosphere, user > > Currently, parts of code in Mesos check the root privileges using os::user() > to compare to "root", which is not sufficient, since it compares the real > user. When people change the mesos binary by 'setuid root', the process may > not have the right permission to execute. > We should check the effective user id instead in our code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
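The real-versus-effective UID distinction this ticket draws can be demonstrated outside of Mesos; a sketch in Python, whose os module exposes the same getuid()/geteuid() calls as C:

```python
import os

def has_root_privileges() -> bool:
    """True when the process can act as root.

    Kernel permission checks are governed by the *effective* UID, so a
    setuid-root binary started by an unprivileged user has
    os.getuid() != 0 but os.geteuid() == 0. Comparing the login name or
    real UID to "root", as the check criticized in this ticket does,
    would wrongly report such a process as unprivileged.
    """
    return os.geteuid() == 0
```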
[jira] [Commented] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.
[ https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830821#comment-15830821 ] Adam B commented on MESOS-6904: --- Marking it "Critical" for 1.2 so it's not lost in the pool of "Major"s (default). We'll keep an eye on it, but won't hold the rc1 for it if all the real release-blockers are resolved. I can't imagine we'll be down to 0 Blockers before Tuesday. > Perform batching of allocations to reduce allocator queue backlogging. > -- > > Key: MESOS-6904 > URL: https://issues.apache.org/jira/browse/MESOS-6904 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Jacob Janco >Assignee: Jacob Janco >Priority: Critical > Labels: allocator > > Per MESOS-3157: > {quote} > Our deployment environments have a lot of churn, with many short-live > frameworks that often revive offers. Running the allocator takes a long time > (from seconds up to minutes). > In this situation, event-triggered allocation causes the event queue in the > allocator process to get very long, and the allocator effectively becomes > unresponsive (eg. a revive offers message takes too long to come to the head > of the queue). > {quote} > To remedy the above scenario, it is proposed to perform batching of the > enqueued allocation operations so that a single allocation operation can > satisfy N enqueued allocations. This should reduce the potential for > backlogging in the allocator. See the discussion > [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377] > in MESOS-3157. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6958) Support Linux filesystem type detection.
Gilbert Song created MESOS-6958: --- Summary: Support Linux filesystem type detection. Key: MESOS-6958 URL: https://issues.apache.org/jira/browse/MESOS-6958 Project: Mesos Issue Type: Bug Reporter: Gilbert Song Assignee: Gilbert Song Priority: Blocker We should support detecting a Linux filesystem type (e.g., xfs, extfs) and its filesystem ID mapping. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
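One way to detect the filesystem type backing a path on Linux, without guessing at the eventual Mesos implementation (which would more likely wrap statfs(2) and its f_type magic numbers), is to resolve the longest matching mount point in /proc/self/mounts; a rough sketch:

```python
import os

def fs_type(path: str) -> str:
    """Return the filesystem type (e.g. 'xfs', 'ext4') of the mount backing path."""
    path = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    with open("/proc/self/mounts") as mounts:
        for line in mounts:
            # Format per line: <device> <mountpoint> <fstype> <options> <dump> <pass>
            fields = line.split()
            if len(fields) < 3:
                continue
            mountpoint, fstype = fields[1], fields[2]
            # Choose the longest mount point that is a prefix of the path.
            covers = path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/")
            if covers and len(mountpoint) > len(best_mount):
                best_mount, best_type = mountpoint, fstype
    return best_type
```

Note this sketch ignores octal-escaped spaces in mount points and only works where procfs is mounted.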
[jira] [Updated] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.
[ https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6904: -- Priority: Critical (was: Major) > Perform batching of allocations to reduce allocator queue backlogging. > -- > > Key: MESOS-6904 > URL: https://issues.apache.org/jira/browse/MESOS-6904 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Jacob Janco >Assignee: Jacob Janco >Priority: Critical > Labels: allocator > > Per MESOS-3157: > {quote} > Our deployment environments have a lot of churn, with many short-live > frameworks that often revive offers. Running the allocator takes a long time > (from seconds up to minutes). > In this situation, event-triggered allocation causes the event queue in the > allocator process to get very long, and the allocator effectively becomes > unresponsive (eg. a revive offers message takes too long to come to the head > of the queue). > {quote} > To remedy the above scenario, it is proposed to perform batching of the > enqueued allocation operations so that a single allocation operation can > satisfy N enqueued allocations. This should reduce the potential for > backlogging in the allocator. See the discussion > [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377] > in MESOS-3157. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
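The proposed remedy, letting a single allocation pass satisfy N enqueued allocation triggers, can be sketched with a "pending" flag guarding the event queue (illustrative Python with hypothetical names, not the libprocess-based implementation):

```python
from collections import deque

class BatchingAllocator:
    """Coalesce bursts of allocation triggers into one allocation pass."""

    def __init__(self):
        self.queue = deque()           # stands in for the actor's event queue
        self.allocation_pending = False
        self.passes = 0                # number of actual allocation passes run

    def trigger_allocation(self):
        # Without batching, every trigger would enqueue its own pass,
        # backlogging the queue under churn.
        if self.allocation_pending:
            return                     # an already-enqueued pass covers this trigger
        self.allocation_pending = True
        self.queue.append(self._allocate)

    def _allocate(self):
        self.allocation_pending = False
        self.passes += 1               # one pass satisfies all coalesced triggers

    def drain(self):
        while self.queue:
            self.queue.popleft()()
```

Under this scheme, 100 triggers arriving before the queue drains result in a single allocation pass, so latency-sensitive messages (e.g. revive offers) are not stuck behind 100 queued passes.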
[jira] [Commented] (MESOS-6900) Add test for framework upgrading to multi-role capability.
[ https://issues.apache.org/jira/browse/MESOS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830754#comment-15830754 ] Benjamin Mahler commented on MESOS-6900: {noformat} commit 052fb4414e2cce2b550ce0644f039b6d4a1876fa Author: Benjamin Bannier Date: Thu Jan 19 14:25:48 2017 -0800 Added a test for framework upgrading to MULTI_ROLE capability. Review: https://reviews.apache.org/r/55381/ {noformat} [~bbannier] do you want to add another test that ensures that frameworks can upgrade even when tasks are running, and that new tasks can be launched? We can do this in a separate ticket as we get closer to having a working implementation. > Add test for framework upgrading to multi-role capability. > -- > > Key: MESOS-6900 > URL: https://issues.apache.org/jira/browse/MESOS-6900 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > Frameworks can upgrade to multi-role capability as long as the framework's > role remains the same. > We consider the framework roles unchanged if > * a framework that previously didn't specify a {{role}} now has {{roles=()}}, or > * a framework that previously had {{role=A}} now has {{roles=(A)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6957) Timestamp-based task reconciliation
Shi Lu created MESOS-6957: - Summary: Timestamp-based task reconciliation Key: MESOS-6957 URL: https://issues.apache.org/jira/browse/MESOS-6957 Project: Mesos Issue Type: Task Components: master Reporter: Shi Lu If the Mesos master supported timestamp-based task reconciliation, e.g. the client sends a reconcile request with a list of taskIDs and a time T, and the master streams back the task changes that occurred after T, this could reduce the overhead of task reconciliation considerably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
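The proposed protocol, where the client supplies task IDs and a time T and the master returns only status changes after T, amounts to a timestamp filter over the master's task state; a toy sketch with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class TaskStatus:
    task_id: str
    state: str
    timestamp: float  # seconds since epoch of the last state change

def reconcile_since(statuses, task_ids, t):
    """Return statuses for the requested tasks whose state changed after time t."""
    wanted = set(task_ids)
    return [s for s in statuses if s.task_id in wanted and s.timestamp > t]
```

A client that reconciled at time T only receives the deltas, rather than the full state of every listed task.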
[jira] [Commented] (MESOS-4119) Add support for enabling --3way to apply-reviews.py.
[ https://issues.apache.org/jira/browse/MESOS-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830749#comment-15830749 ] Zhitao Li commented on MESOS-4119: -- https://reviews.apache.org/r/55732/ > Add support for enabling --3way to apply-reviews.py. > > > Key: MESOS-4119 > URL: https://issues.apache.org/jira/browse/MESOS-4119 > Project: Mesos > Issue Type: Task >Reporter: Artem Harutyunyan > Labels: mesosphere, newbie > > Currently if {{git apply}} fails during apply-reviews, then the change must > be rebased and re-uploaded to reviewboard in order for apply-reviews to > succeed. > However, it is often the case that {{git apply --3way}} will succeed since > the blob information is included in the diff. Even if it doesn't succeed it > will leave conflict markers, which allows the committer to do a manual > conflict resolution if desired, or abort if conflict resolution is too > difficult. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6956) Out-of-band task reconciliation
Shi Lu created MESOS-6956: - Summary: Out-of-band task reconciliation Key: MESOS-6956 URL: https://issues.apache.org/jira/browse/MESOS-6956 Project: Mesos Issue Type: Task Reporter: Shi Lu Can we add a capability to the Mesos master for out-of-band task reconciliation? That is, the client sends a request to the master with a list of taskIDs that it wants to reconcile, and the master returns the state of those tasks in the response instead of sending them back via the subscribed connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6955) Add capability to batch acknowledge task updates
[ https://issues.apache.org/jira/browse/MESOS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shi Lu updated MESOS-6955: -- Shepherd: Zhitao Li > Add capability to batch acknowledge task updates > - > > Key: MESOS-6955 > URL: https://issues.apache.org/jira/browse/MESOS-6955 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Shi Lu >Priority: Critical > > We are building a high-task-throughput framework, and we are not getting > offers fast enough, because we have to acknowledge all the task updates, and > each one needs a separate HTTP call to the Mesos master. It would be great if > the Mesos master could support batch-acknowledging task updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6955) Add capability to batch acknowledge task updates
Shi Lu created MESOS-6955: - Summary: Add capability to batch acknowledge task updates Key: MESOS-6955 URL: https://issues.apache.org/jira/browse/MESOS-6955 Project: Mesos Issue Type: Task Components: master Reporter: Shi Lu Priority: Critical We are building a high-task-throughput framework, and we are not getting offers fast enough, because we have to acknowledge all the task updates, and each one needs a separate HTTP call to the Mesos master. It would be great if the Mesos master could support batch-acknowledging task updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6953) A compromised mesos-master node can execute code as root on agents.
[ https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830550#comment-15830550 ] Anindya Sinha edited comment on MESOS-6953 at 1/19/17 10:00 PM: To mitigate this, we can add an optional arg in mesos-agent called {{whitelisted_users}} which is a list of users who are authorized to run tasks on the agent. If this list contains the task user or if this list is empty (or the arg is missing), we allow the task to be launched on the agent. Otherwise, agent shall not let the task be launched, and send a {{TASK_FAILED}} StatusUpdate with a new {{Reason}} denoting that the user is not authorized to run the task. was (Author: anindya.sinha): To mitigate this, we can add an optional arg in mesos-agent called {{whitelisted-users}} which is a list of users who are authorized to run tasks on the agent. If this list contains the task user or if this list is empty (or the arg is missing), we allow the task to be launched on the agent. Otherwise, agent shall not let the task be launched, and send a {{TASK_FAILED}} StatusUpdate with a new {{Reason}} denoting that the user is not authorized to run the task. > A compromised mesos-master node can execute code as root on agents. > --- > > Key: MESOS-6953 > URL: https://issues.apache.org/jira/browse/MESOS-6953 > Project: Mesos > Issue Type: Bug > Components: security >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: security, slave > > mesos-master has a `--[no-]root_submissions` flag that controls whether > frameworks with `root` user are admitted to the cluster. > However, if a mesos-master node is compromised, it can attempt to schedule > tasks on agent as the `root` user. Since mesos-agent has no check against > tasks running on the agent as specific users, tasks can run with `root` > privileges within the container on the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
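The proposed {{whitelisted_users}} check is simple set membership with an "empty means allow all" escape hatch; a sketch of the agent-side gate (names come from the proposal above, not shipped Mesos code):

```python
def is_user_authorized(task_user, whitelisted_users):
    """Allow the task if no whitelist is configured, or the user is listed."""
    if not whitelisted_users:      # arg missing or empty: preserve old behavior
        return True
    return task_user in whitelisted_users
```

An agent would call this before launch and, on False, send the proposed TASK_FAILED update with the new Reason instead of launching.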
[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish Kumar updated MESOS-6952: - Description: The task was stuck in the staging state for almost 6 hours even after the slave executor terminated; the Mesos master kept the task in the staging state. Since the task was stuck in staging, the framework never got the update from the mesos-master. The issue was fixed by a slave restart. In the slave logs I can see "Asked to run task '...' which is terminating/terminated": {noformat} mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:148481682:0:foocare_zendesk_round_robin:' which is terminating/terminated {noformat} full Log of slave {noformat} mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 
5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858762 107759 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish Kumar updated MESOS-6952: - Description: The task was stuck in the staging state for almost 6 hours even after the slave executor terminated; the Mesos master kept the task in the staging state. Since the task was stuck in staging, the framework never got the update from the mesos-master. The issue was fixed by a slave restart. In the slave logs I can see "Asked to run task '...' which is terminating/terminated": {noformat} mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:148481682:0:foocare_zendesk_round_robin:' which is terminating/terminated {noformat} full Log of slave {noformat} mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 
5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858762 107759 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework
[jira] [Comment Edited] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830436#comment-15830436 ] Sathish Kumar edited comment on MESOS-6952 at 1/19/17 9:59 PM: --- In the log below we can see that the task was staged after 14:42:17, and it remained in the staging state until we restarted around 20:53. FYI, no slave reboot or leader election happened. Please find the attached master logs. {noformat} I0119 14:41:13.023109 29504 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:13.023146 29504 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:14.037518 29508 master.cpp:4763] Status update TASK_RUNNING (UUID: a3b53759-3c7e-408c-aec9-80b048e38938) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:53.829838 29508 master.cpp:4763] Status update TASK_FAILED (UUID: c65c736c-a00b-4ef3-beb6-589793169edb) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:53.893996 29499 master.cpp:6487] Removing task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:54.736646 29495 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:54.736698 29495 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:55.708216 29509 master.cpp:4763] Status update TASK_RUNNING (UUID: 46408395-d5f5-4db2-babf-cefc2145f7f4) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.185272 29494 master.cpp:4763] Status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.230947 29494 master.cpp:6487] Removing task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.862722 29500 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; 
mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.862761 29500 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:16.312088 29504 master.cpp:4763] Status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051
[jira] [Created] (MESOS-6954) Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent
Kevin Klues created MESOS-6954: -- Summary: Running LAUNCH_NESTED_CONTAINER with a docker container id crashes the agent Key: MESOS-6954 URL: https://issues.apache.org/jira/browse/MESOS-6954 Project: Mesos Issue Type: Bug Reporter: Kevin Klues Priority: Blocker Attempting to run {{LAUNCH_NESTED_CONTAINER}} with a parent container that was launched with the docker containerizer causes the agent to crash as below. We should add a safeguard in the handler to fail gracefully instead. {noformat} I0119 21:41:42.438295 3281 http.cpp:304] HTTP POST for /slave(1)/api/v1 from 10.0.7.194:46700 with User-Agent='python-requests/2.12.4' with X-Forwarded-For='10.0.6.162' I0119 21:41:42.441571 3281 http.cpp:465] Processing call LAUNCH_NESTED_CONTAINER_SESSION W0119 21:41:42.442286 3281 http.cpp:2251] Failed to launch nested container 62a16556-9c3b-48f2-aa1e-ba1d70093637.09a9d3b0-a245-4aa1-94f1-d10a13526b9b: Unsupported F0119 21:41:42.442371 3282 docker.cpp:2013] Check failed: !containerId.has_parent() *** Check failure stack trace: *** @ 0x7f539aca01ad google::LogMessage::Fail() @ 0x7f539aca1fdd google::LogMessage::SendToLog() @ 0x7f539ac9fd9c google::LogMessage::Flush() @ 0x7f539aca28d9 google::LogMessageFatal::~LogMessageFatal() @ 0x7f539a46e2cd mesos::internal::slave::DockerContainerizerProcess::destroy() @ 0x7f539a48a8a7 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIbN5mesos8internal5slave26DockerContainerizerProcessERKNS5_11ContainerIDEbS9_bEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSG_FSE_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7f539ac14ca1 process::ProcessManager::resume() @ 0x7f539ac1dba7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x7f53990a5d73 (unknown) @ 0x7f5398ba652c (unknown) @ 0x7f53988e41dd (unknown) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
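The safeguard suggested above could look like the sketch below. This is a hypothetical illustration, not the actual Mesos handler: the real ContainerID and error types come from mesos.proto and libprocess, and the names here (validateLaunchNested, parentSupportsNesting, the stand-in structs) are invented for the example.

```cpp
#include <string>

// Hypothetical stand-in for the protobuf ContainerID from mesos.proto.
struct ContainerID {
  std::string value;
  bool hasParent;
  bool has_parent() const { return hasParent; }
};

// Minimal result type standing in for Try<> / process::Failure.
struct Result {
  bool ok;
  std::string error;
};

// Instead of CHECK(!containerId.has_parent()), which aborts the whole
// agent, validate up front and return an error that the HTTP handler
// can turn into a failed response.
Result validateLaunchNested(const ContainerID& containerId,
                            bool parentSupportsNesting) {
  if (containerId.has_parent() && !parentSupportsNesting) {
    return {false,
            "Parent container was launched by a containerizer that "
            "does not support nested containers"};
  }
  return {true, ""};
}
```

The point of the sketch is only the control flow: a nested container id whose parent came from a containerizer without nesting support (e.g. the docker containerizer) is rejected gracefully rather than tripping a fatal CHECK.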
[jira] [Updated] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.
[ https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6904: -- Target Version/s: 1.2.0 [~adam-mesos] trying to land this in the next couple of days to get it into 1.2. Should it be a blocker? (It doesn't have to go in but it would be nice if we could) > Perform batching of allocations to reduce allocator queue backlogging. > -- > > Key: MESOS-6904 > URL: https://issues.apache.org/jira/browse/MESOS-6904 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Jacob Janco >Assignee: Jacob Janco > Labels: allocator > > Per MESOS-3157: > {quote} > Our deployment environments have a lot of churn, with many short-live > frameworks that often revive offers. Running the allocator takes a long time > (from seconds up to minutes). > In this situation, event-triggered allocation causes the event queue in the > allocator process to get very long, and the allocator effectively becomes > unresponsive (eg. a revive offers message takes too long to come to the head > of the queue). > {quote} > To remedy the above scenario, it is proposed to perform batching of the > enqueued allocation operations so that a single allocation operation can > satisfy N enqueued allocations. This should reduce the potential for > backlogging in the allocator. See the discussion > [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377] > in MESOS-3157. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
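The batching idea amounts to coalescing: while an allocation pass is already pending, further triggers fold into it instead of enqueueing additional passes, so N enqueued requests are satisfied by one run. A minimal sketch of that invariant (illustrative only — the real allocator batches via libprocess dispatch, and this class and its method names are invented for the example):

```cpp
// Illustrative coalescing of allocation triggers: any number of
// triggers that arrive while a pass is pending collapse into a
// single allocation run.
class BatchedAllocator {
 public:
  // Called on every event that would normally enqueue an allocation
  // (framework added, offers revived, resources recovered, ...).
  void trigger() {
    if (!pending_) {
      pending_ = true;  // first trigger schedules a pass
    }
    // Subsequent triggers are satisfied by the already-pending pass,
    // so the queue cannot back up with redundant allocation events.
  }

  // Called when the scheduled pass actually runs.
  void runPass() {
    if (pending_) {
      pending_ = false;
      ++passes_;
    }
  }

  int passes() const { return passes_; }

 private:
  bool pending_ = false;
  int passes_ = 0;
};
```

Under this scheme a burst of revive-offers messages costs one allocation run instead of one per message, which is exactly the backlog reduction the ticket is after.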
[jira] [Updated] (MESOS-6953) A compromised mesos-master node can execute code as root on agents.
[ https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anindya Sinha updated MESOS-6953: - Summary: A compromised mesos-master node can execute code as root on agents. (was: A compromised mesos-Master can execute code as root on agents.) > A compromised mesos-master node can execute code as root on agents. > --- > > Key: MESOS-6953 > URL: https://issues.apache.org/jira/browse/MESOS-6953 > Project: Mesos > Issue Type: Bug > Components: security >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: security, slave > > mesos-master has a `--[no-]root_submissions` flag that controls whether > frameworks with `root` user are admitted to the cluster. > However, if a mesos-master node is compromised, it can attempt to schedule > tasks on the agent as the `root` user. Since mesos-agent has no check against > tasks running on the agent for specific users, tasks can get run with `root` > privileges within the container on the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6953) A compromised mesos-Master can execute code as root on agents.
[ https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830550#comment-15830550 ] Anindya Sinha edited comment on MESOS-6953 at 1/19/17 8:27 PM: --- To mitigate this, we can add an optional arg in mesos-agent called {{whitelisted-users}} which is a list of users who are authorized to run tasks on the agent. If this list contains the task user or if this list is empty (or the arg is missing), we allow the task to be launched on the agent. Otherwise, agent shall not let the task be launched, and send a {{TASK_FAILED}} StatusUpdate with a new {{Reason}} denoting that the user is not authorized to run the task. was (Author: anindya.sinha): To mitigate this, we can add an optional arg in mesos-agent called `whitelisted-users` which is a list of users who are authorized to run tasks on the agent. If this list contains the task user or if this list is empty (or the arg is missing), we allow the task to be launched on the agent. Otherwise, agent shall not let the task be launched, and send a `TASK_FAILED` StatusUpdate with a new `Reason` denoting that the user is not authorized to run the task. > A compromised mesos-Master can execute code as root on agents. > -- > > Key: MESOS-6953 > URL: https://issues.apache.org/jira/browse/MESOS-6953 > Project: Mesos > Issue Type: Bug > Components: security >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: security, slave > > mesos-master has a `--[no-]root_submissions` flag that controls whether > frameworks with `root` user are admitted to the cluster. > However, if a mesos-master node is compromised, it can attempt to schedule > tasks on the agent as the `root` user. Since mesos-agent has no check against > tasks running on the agent for specific users, tasks can get run with `root` > privileges within the container on the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6953) A compromised mesos-Master can execute code as root on agents.
[ https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830550#comment-15830550 ] Anindya Sinha commented on MESOS-6953: -- To mitigate this, we can add an optional arg in mesos-agent called `whitelisted-users` which is a list of users who are authorized to run tasks on the agent. If this list contains the task user or if this list is empty (or the arg is missing), we allow the task to be launched on the agent. Otherwise, agent shall not let the task be launched, and send a `TASK_FAILED` StatusUpdate with a new `Reason` denoting that the user is not authorized to run the task. > A compromised mesos-Master can execute code as root on agents. > -- > > Key: MESOS-6953 > URL: https://issues.apache.org/jira/browse/MESOS-6953 > Project: Mesos > Issue Type: Bug > Components: security >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: security, slave > > mesos-master has a `--[no-]root_submissions` flag that controls whether > frameworks with `root` user are admitted to the cluster. > However, if a mesos-master node is compromised, it can attempt to schedule > tasks on the agent as the `root` user. Since mesos-agent has no check against > tasks running on the agent for specific users, tasks can get run with `root` > privileges within the container on the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
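The proposed check reduces to a small predicate: an empty or absent whitelist admits every user, otherwise the task's user must appear in it. A sketch under those assumptions (the `whitelisted-users` flag is only proposed in this comment, and the helper below is illustrative, not actual Mesos code):

```cpp
#include <set>
#include <string>

// Sketch of the proposed agent-side check. An empty set models the
// flag being absent or empty, which admits everyone (preserving
// today's behavior); otherwise the task user must be whitelisted.
bool isUserAuthorized(const std::set<std::string>& whitelistedUsers,
                      const std::string& taskUser) {
  return whitelistedUsers.empty() ||
         whitelistedUsers.count(taskUser) > 0;
}
```

On a negative result the agent would decline to launch the task and send the TASK_FAILED update with the new Reason, as described above.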
[jira] [Created] (MESOS-6953) A compromised mesos-Master can execute code as root on agents.
Anindya Sinha created MESOS-6953: Summary: A compromised mesos-Master can execute code as root on agents. Key: MESOS-6953 URL: https://issues.apache.org/jira/browse/MESOS-6953 Project: Mesos Issue Type: Bug Components: security Reporter: Anindya Sinha Assignee: Anindya Sinha mesos-master has a `--[no-]root_submissions` flag that controls whether frameworks with `root` user are admitted to the cluster. However, if a mesos-master node is compromised, it can attempt to schedule tasks on the agent as the `root` user. Since mesos-agent has no check against tasks running on the agent for specific users, tasks can get run with `root` privileges within the container on the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830529#comment-15830529 ] Kevin Klues commented on MESOS-6952: Can you please edit the logs you pasted to make them more readable. Just put tags around them like: \{noformat\} LOGS \{noformat\} > Mesos task state was stuck in staging even after executor terminated > > > Key: MESOS-6952 > URL: https://issues.apache.org/jira/browse/MESOS-6952 > Project: Mesos > Issue Type: Bug > Components: executor >Affects Versions: 0.28.2 > Environment: ubuntu 14.04 >Reporter: Sathish Kumar > > Task is stuck in staging for almost 6 hours even after the slave executor is > terminated. > Mesos master keeps the task in the staging state. Since the task is stuck > in staging, the framework has not got the update from mesos-master. > The issue got fixed after a slave restart. > I can see in the slave logs Asked to run task ' which is > terminating/terminated > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for > status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for > task ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.097193 107774 slave.cpp:1361] Got assigned task > ct:148481682:0:foocare_zendesk_round_robin: for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.097453 107774 slave.cpp:1480] Launching task > ct:148481682:0:foocare_zendesk_round_robin: for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 > 14:42:17.097527 
107774 slave.cpp:1673] Asked to run task > 'ct:148481682:0:foocare_zendesk_round_robin:' for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor > 'ct:148481682:0:foocare_zendesk_round_robin:' which is > terminating/terminated > full Log of slave > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED > (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update > TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE > for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) > for task ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED > (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status > update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > 
ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update > acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for > status update TASK_FAILED (UUID:
[jira] [Updated] (MESOS-6948) AgentAPITest.LaunchNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-6948: -- Assignee: Kevin Klues > AgentAPITest.LaunchNestedContainerSession is flaky > -- > > Key: MESOS-6948 > URL: https://issues.apache.org/jira/browse/MESOS-6948 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 7 VM, libevent and SSL enabled >Reporter: Greg Mann >Assignee: Kevin Klues >Priority: Blocker > Labels: debugging, tests > Attachments: AgentAPITest.LaunchNestedContainerSession.txt > > > This was observed in a CentOS 7 VM, with libevent and SSL enabled: > {code} > I0118 22:17:23.528846 2887 http.cpp:464] Processing call > LAUNCH_NESTED_CONTAINER_SESSION > I0118 22:17:23.530452 2887 containerizer.cpp:1807] Starting nested container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.532265 2887 containerizer.cpp:1831] Trying to chown > '/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e' > to user 'vagrant' > I0118 22:17:23.535213 2887 switchboard.cpp:570] Launching > 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" > --help="false" > --socket_address="/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430" > --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" > --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" > --wait_for_connection="true"' for container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.537210 2887 switchboard.cpp:600] Created I/O switchboard > server (pid: 3335) listening on socket file > '/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430' for > container > 
492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.543665 2887 containerizer.cpp:1540] Launching > 'mesos-containerizer' with flags '--help="false" > --launch_info="{"command":{"shell":true,"value":"printf output && printf > error > 1>&2"},"environment":{},"err":{"fd":16,"type":"FD"},"in":{"fd":11,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"vagrant"}" > --pipe_read="12" --pipe_write="13" > --runtime_directory="/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_QVZGrY/containers/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e" > --unshare_namespace_mnt="false"' > I0118 22:17:23.556032 2887 launcher.cpp:133] Forked child with pid '3337' > for container > '492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e' > I0118 22:17:23.563900 2887 fetcher.cpp:349] Starting to fetch URIs for > container: > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e, > directory: > /tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.962441 2887 containerizer.cpp:2481] Container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e has > exited > I0118 22:17:23.962484 2887 containerizer.cpp:2118] Destroying container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e in > RUNNING state > I0118 22:17:23.962715 2887 launcher.cpp:149] Asked to destroy container > 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e > I0118 22:17:23.977562 2887 process.cpp:3733] Failed to process request for > '/slave(69)/api/v1': Container has or is being destroyed > W0118 22:17:23.978216 2887 http.cpp:2734] Failed to attach to nested > container > 
492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e: > Container has or is being destroyed > I0118 22:17:23.978330 2887 process.cpp:1435] Returning '500 Internal Server > Error' for '/slave(69)/api/v1' (Container has or is being destroyed) > ../../src/tests/api_tests.cpp:3960: Failure > Value of: (response).get().status > Actual: "500 Internal Server Error" > Expected: http::OK().status > Which is: "200 OK" > {code} > Find attached the full log from a failed run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"
[ https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-6804: -- Priority: Critical (was: Blocker) > Running 'tty' inside a debug container that has a tty reports "Not a tty" > - > > Key: MESOS-6804 > URL: https://issues.apache.org/jira/browse/MESOS-6804 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Kevin Klues >Priority: Critical > Labels: debugging, mesosphere > > We need to inject `/dev/console` into the container and map it to the slave > end of the TTY we are attached to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830436#comment-15830436 ] Sathish Kumar edited comment on MESOS-6952 at 1/19/17 7:26 PM: --- In below log we can see that after 14:42:17. its staged and until we restarted around 20.53 it was in staging state. FYI no slave reboot/leader election happened Please find the attached master logs I0119 14:41:13.023109 29504 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:13.023146 29504 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:14.037518 29508 master.cpp:4763] Status update TASK_RUNNING (UUID: a3b53759-3c7e-408c-aec9-80b048e38938) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:53.829838 29508 master.cpp:4763] Status update TASK_FAILED (UUID: c65c736c-a00b-4ef3-beb6-589793169edb) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:53.893996 29499 master.cpp:6487] Removing task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 on slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:54.736646 29495 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:54.736698 29495 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:55.708216 29509 master.cpp:4763] Status update TASK_RUNNING (UUID: 46408395-d5f5-4db2-babf-cefc2145f7f4) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.185272 29494 master.cpp:4763] Status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.230947 29494 master.cpp:6487] Removing task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.862722 29500 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 
22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.862761 29500 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:16.312088 29504 master.cpp:4763] Status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051
[jira] [Commented] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830436#comment-15830436 ] Sathish Kumar commented on MESOS-6952: -- In below log we can see that after 14:42:17. its staged and until we restarted around 20.53 it was in staging state. Please find the attached master logs I0119 14:41:13.023109 29504 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:13.023146 29504 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:14.037518 29508 master.cpp:4763] Status update TASK_RUNNING (UUID: a3b53759-3c7e-408c-aec9-80b048e38938) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:53.829838 29508 master.cpp:4763] Status update TASK_FAILED (UUID: c65c736c-a00b-4ef3-beb6-589793169edb) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:53.893996 29499 master.cpp:6487] Removing task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at 
slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:54.736646 29495 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:54.736698 29495 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:41:55.708216 29509 master.cpp:4763] Status update TASK_RUNNING (UUID: 46408395-d5f5-4db2-babf-cefc2145f7f4) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.185272 29494 master.cpp:4763] Status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.230947 29494 master.cpp:6487] Removing task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.862722 29500 master.hpp:177] Adding task ct:148481682:0:foocare_zendesk_round_robin: with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 
(distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:15.862761 29500 master.cpp:3589] Launching task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 (chronos-2.4.0) at scheduler-ebdbe062-3bc9-4683-94ce-96d7003a7fcc@10.14.23.221:55368 with resources cpus(*):0.4; mem(*):1024; disk(*):256 on slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:16.312088 29504 master.cpp:4763] Status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from slave 22c4f06b-d107-4cf4-86b1-81a6cce5441a-S56 at slave(1)@10.14.38.239:5051 (distancematrix8.prod-foo-dcos.foobar.net) I0119 14:42:17.093446 29504 master.cpp:6487] Removing task
[jira] [Updated] (MESOS-6906) Introduce a general non-interpreting task check.
[ https://issues.apache.org/jira/browse/MESOS-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6906: --- Sprint: Mesosphere Sprint 50 (was: Mesosphere Sprint 49) > Introduce a general non-interpreting task check. > > > Key: MESOS-6906 > URL: https://issues.apache.org/jira/browse/MESOS-6906 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: health-check, mesosphere > > In addition to result-interpreting, killing health check, there is a > requirement from Mesos framework authors for a general check that can execute > an arbitrary command or send an HTTP request and pass the result to the > scheduler without interpreting it. > This ticket aims to implement this functionality by introducing a new class > of a check in Mesos. Design doc: > https://docs.google.com/document/d/1VLdaH7i7UDT3_38aOlzTOtH7lwH-laB8dCwNzte0DkU -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6908) Zero health check timeout is interpreted literally.
[ https://issues.apache.org/jira/browse/MESOS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-6908: --- Sprint: Mesosphere Sprint 50 (was: Mesosphere Sprint 49) > Zero health check timeout is interpreted literally. > --- > > Key: MESOS-6908 > URL: https://issues.apache.org/jira/browse/MESOS-6908 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.2, 1.1.0 >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: health-check, mesosphere > > Currently a zero health check timeout is interpreted literally, which is not > very helpful since the health check never even gets a chance to finish. We > suggest fixing this behaviour by interpreting zero as {{Duration::max()}}, > effectively rendering the timeout infinite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
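The semantics proposed for MESOS-6908 can be sketched in a few lines of C++. This is a minimal stand-in using std::chrono, not Mesos' actual Duration type; the name effectiveTimeout is illustrative. The idea: a configured timeout of zero maps to the largest representable duration, i.e. "no timeout".

```cpp
#include <cassert>
#include <chrono>

// Stand-in for Mesos' Duration (assumption: names here are illustrative,
// not the actual Mesos API).
using Duration = std::chrono::duration<double>;

// A configured timeout of zero means "never time the check out";
// any positive timeout is honored literally.
Duration effectiveTimeout(Duration configured) {
  if (configured == Duration::zero()) {
    return Duration::max();
  }
  return configured;
}
```

This keeps zero as a sentinel rather than rejecting it, so existing task definitions that set a zero timeout start working instead of failing health checks immediately.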
[jira] [Commented] (MESOS-6944) Mesos - AD integration Process / Example
[ https://issues.apache.org/jira/browse/MESOS-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830297#comment-15830297 ] Kevin Klues commented on MESOS-6944: Here is a link to what Alexander is referring to: https://docs.mesosphere.com/1.8/administration/id-and-access-mgt/ldap/ > Mesos - AD integration Process / Example > > > Key: MESOS-6944 > URL: https://issues.apache.org/jira/browse/MESOS-6944 > Project: Mesos > Issue Type: Task > Components: modules >Reporter: Rahul Bhardwaj > Labels: mesosphere > > Hi Team, > We are trying to configure AD authentication with Mesos for HTTP endpoints > (only UI). > But we couldn't find any clear documentation or example on your site > http://mesos.apache.org/ that shows the process of integration with AD > (LDAP). We also could not find a reference to any existing LDAP library to use > with Mesos on the Modules page. > Authentication doc: > http://mesos.apache.org/documentation/latest/authentication/. > Module doc: http://mesos.apache.org/documentation/latest/modules/ > (Authentication section). > Can you please tell us if this feature is already available? Example > documentation would help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1582) Improve build time.
[ https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830295#comment-15830295 ] Alex Clemmer commented on MESOS-1582: - Fortunately, [~vinodkone], this is already roadmap-adjacent for [~kaysoky] and me. It would be natural to tackle this when we also tackle the "break libmesos up into many binaries" issue: https://issues.apache.org/jira/browse/MESOS-3542 I think the next step is to write up a little design doc about the plan. > Improve build time. > --- > > Key: MESOS-1582 > URL: https://issues.apache.org/jira/browse/MESOS-1582 > Project: Mesos > Issue Type: Epic > Components: build >Reporter: Benjamin Hindman > Labels: microsoft, tech-debt > > The build takes a ridiculously long time unless you have a large, parallel > machine. This is a combination of many factors, all of which we'd like to > discuss and track here. > I'd also love to actually track build times so we can get an appreciation of > the improvements. Please leave a comment below with your build times! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830280#comment-15830280 ] Vinod Kone commented on MESOS-6952: --- Can you paste the corresponding master logs? > Mesos task state was stuck in staging even after executor terminated > > > Key: MESOS-6952 > URL: https://issues.apache.org/jira/browse/MESOS-6952 > Project: Mesos > Issue Type: Bug > Components: executor >Affects Versions: 0.28.2 > Environment: ubuntu 14.04 >Reporter: Sathish Kumar > > Task is stuck at staging almost 6hours in stage even after slave executor is > terminated. > Mesos master keeps the task state in staging state. Since the task is stuck > at staging framework have not got the update from mesos-master > The issue got fixed after slave restart. > I can see in the slave logs Asked to run task ' which is > terminating/terminated > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for > status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for > task ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.097193 107774 slave.cpp:1361] Got assigned task > ct:148481682:0:foocare_zendesk_round_robin: for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.097453 107774 slave.cpp:1480] Launching task > ct:148481682:0:foocare_zendesk_round_robin: for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 > 14:42:17.097527 107774 slave.cpp:1673] Asked to run task > 
'ct:148481682:0:foocare_zendesk_round_robin:' for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor > 'ct:148481682:0:foocare_zendesk_round_robin:' which is > terminating/terminated > full Log of slave > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED > (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update > TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE > for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) > for task ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED > (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status > update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of 
framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update > acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task > ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for > status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for > task
[jira] [Updated] (MESOS-6355) Improvements to task group support.
[ https://issues.apache.org/jira/browse/MESOS-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gastón Kleiman updated MESOS-6355: -- Sprint: Mesosphere Sprint 49 > Improvements to task group support. > --- > > Key: MESOS-6355 > URL: https://issues.apache.org/jira/browse/MESOS-6355 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone > Labels: mesosphere > > This is a follow up epic to MESOS-2249 to capture further improvements and > changes that need to be made to the MVP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6864) Container Exec should be possible with tasks belonging to a task group
[ https://issues.apache.org/jira/browse/MESOS-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6864: -- Story Points: 5 > Container Exec should be possible with tasks belonging to a task group > -- > > Key: MESOS-6864 > URL: https://issues.apache.org/jira/browse/MESOS-6864 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman >Priority: Blocker > Labels: debugging, mesosphere > > {{LaunchNestedContainerSession}} currently requires the parent container to > be an Executor > (https://github.com/apache/mesos/blob/f89f28724f5837ff414dc6cc84e1afb63f3306e5/src/slave/http.cpp#L2189-L2211). > This works for command tasks, because the task container id is the same as > the executor container id. > But it won't work for pod tasks whose container id is different from > executor’s container id. > In order to resolve this ticket, we need to allow launching a child container > at an arbitrary level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
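The nesting problem described in MESOS-6864 can be illustrated with a minimal sketch. The struct below is a hypothetical stand-in for Mesos' ContainerID (the real one is a protobuf with an optional parent field): since a pod task's container sits below the executor's container, resolving the executor container must walk the parent chain to an arbitrary depth rather than assume the parent is the executor.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Mesos' ContainerID; not the actual protobuf.
struct ContainerID {
  std::string value;
  const ContainerID* parent;  // nullptr at the root (executor) level
};

// Resolve the root container of an arbitrarily nested child by walking
// up the parent chain until there is no parent left.
const ContainerID& rootContainer(const ContainerID& id) {
  const ContainerID* current = &id;
  while (current->parent != nullptr) {
    current = current->parent;
  }
  return *current;
}
```

A debug container attached to a pod task would then be two levels below the executor (executor → task → debug), which is exactly the case the current one-level check rejects.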
[jira] [Comment Edited] (MESOS-6864) Container Exec should be possible with tasks belonging to a task group
[ https://issues.apache.org/jira/browse/MESOS-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821809#comment-15821809 ] Gastón Kleiman edited comment on MESOS-6864 at 1/19/17 4:42 PM: https://reviews.apache.org/r/55676/ https://reviews.apache.org/r/55722/ https://reviews.apache.org/r/55677/ https://reviews.apache.org/r/55678/ https://reviews.apache.org/r/55679/ https://reviews.apache.org/r/55464/ was (Author: gkleiman): https://reviews.apache.org/r/55464/ > Container Exec should be possible with tasks belonging to a task group > -- > > Key: MESOS-6864 > URL: https://issues.apache.org/jira/browse/MESOS-6864 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman >Priority: Blocker > Labels: debugging, mesosphere > > {{LaunchNestedContainerSession}} currently requires the parent container to > be an Executor > (https://github.com/apache/mesos/blob/f89f28724f5837ff414dc6cc84e1afb63f3306e5/src/slave/http.cpp#L2189-L2211). > This works for command tasks, because the task container id is the same as > the executor container id. > But it won't work for pod tasks whose container id is different from > executor’s container id. > In order to resolve this ticket, we need to allow launching a child container > at an arbitrary level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6947) Fix pailer XSS vulnerability
[ https://issues.apache.org/jira/browse/MESOS-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-6947: Description: There exists an XSS vulnerability in pailer.html. {{window.name}} can be set to an external domain serving js which is wrapped in
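One mitigation for this class of pailer XSS can be sketched as follows. This is an assumption about the shape of a fix, not the actual Mesos patch: treat {{window.name}} as untrusted input and accept only a plain same-origin path, never anything that could reference an external script.

```javascript
// Validate an untrusted window.name value before using it as a pailer
// target. Accepts only simple absolute paths like "/slave/log"; rejects
// external URLs, inline script, and non-strings. The function name is
// illustrative.
function safePailerTarget(name) {
  if (typeof name === 'string' && /^\/[\w\/.\-]*$/.test(name)) {
    return name;
  }
  return null;
}
```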
[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish Kumar updated MESOS-6952: - Description: Task is stuck at staging almost 6hours in stage even after slave executor is terminated. Mesos master keeps the task state in staging state. Since the task is stuck at staging framework have not got the update from mesos-master The issue got fixed after slave restart. I can see in the slave logs Asked to run task ' which is terminating/terminated mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:148481682:0:foocare_zendesk_round_robin:' which is terminating/terminated full Log of slave mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 slave.cpp:3012] Handling status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task 
ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 
mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858762 107759 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
[jira] [Commented] (MESOS-1582) Improve build time.
[ https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830103#comment-15830103 ] Vinod Kone commented on MESOS-1582: --- [~hausdorff] I'm a huge +1 on fixing this. Unfortunately, I don't have cycles to shepherd this myself, but I'm hoping we can find someone from our ever-growing committer pool. > Improve build time. > --- > > Key: MESOS-1582 > URL: https://issues.apache.org/jira/browse/MESOS-1582 > Project: Mesos > Issue Type: Epic > Components: build >Reporter: Benjamin Hindman > Labels: microsoft, tech-debt > > The build takes a ridiculously long time unless you have a large, parallel > machine. This is a combination of many factors, all of which we'd like to > discuss and track here. > I'd also love to actually track build times so we can get an appreciation of > the improvements. Please leave a comment below with your build times! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6952) Mesos task state was stuck in staging even after executor terminated
[ https://issues.apache.org/jira/browse/MESOS-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish Kumar updated MESOS-6952: - Summary: Mesos task state was stuck in staging even after executor terminated (was: Mesos task state was stuck in staging inspite) > Mesos task state was stuck in staging even after executor terminated > > > Key: MESOS-6952 > URL: https://issues.apache.org/jira/browse/MESOS-6952 > Project: Mesos > Issue Type: Bug > Components: executor >Affects Versions: 0.28.2 > Environment: ubuntu 14.04 >Reporter: Sathish Kumar > > Task is stuck at staging stage even after slave executor is terminated. > Mesos master keeps the task state in staging state. Since the task is stuck > at staging framework have not got the update from mesos-master > The issue got fixed after slave restart. > I can see in the slave logs Asked to run task ' which is > terminating/terminated > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for > status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for > task ct:148481682:0:foocare_zendesk_round_robin: of framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.097193 107774 slave.cpp:1361] Got assigned task > ct:148481682:0:foocare_zendesk_round_robin: for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 > 14:42:17.097453 107774 slave.cpp:1480] Launching task > ct:148481682:0:foocare_zendesk_round_robin: for framework > 19393553-2061-4d2f-8c05-a0ba688334f4-0001 > mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 > 14:42:17.097527 107774 slave.cpp:1673] Asked to run task > 
[jira] [Created] (MESOS-6952) Mesos task state was stuck in staging in spite
Sathish Kumar created MESOS-6952: Summary: Mesos task state was stuck in staging in spite Key: MESOS-6952 URL: https://issues.apache.org/jira/browse/MESOS-6952 Project: Mesos Issue Type: Bug Components: executor Affects Versions: 0.28.2 Environment: ubuntu 14.04 Reporter: Sathish Kumar The task is stuck in the staging state even after the slave executor has terminated; the Mesos master keeps the task state as staging. Because the task is stuck in staging, the framework has not received the update from the mesos-master. The issue was fixed after a slave restart. In the slave logs I can see "Asked to run task '…' which is terminating/terminated". Full log of slave: mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.089251 107759 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 247bbeed-1d60-4d33-ac1e-9282266c54ee) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097193 107774 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:17.097453 107774 slave.cpp:1480] Launching task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:W0119 14:42:17.097527 107774 slave.cpp:1673] Asked to run task 'ct:148481682:0:foocare_zendesk_round_robin:' for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 with executor 'ct:148481682:0:foocare_zendesk_round_robin:' which is terminating/terminated full Log of slave mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.066277 107763 
slave.cpp:3012] Handling status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 from executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134692 107766 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.134753 107766 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142010 107767 slave.cpp:3410] Forwarding the update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to master@10.14.23.181:5050 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.142119 107767 slave.cpp:3320] Sending acknowledgement for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 to executor(1)@10.14.38.239:43937 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226682 107761 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 
19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.226759 107761 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e4147e8-f11c-4950-ba7b-c4e7f8bc5932) for task ct:148481682:0:foocare_zendesk_round_robin: of framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001 mesos-slave.distancematrix8.prod-foo-dcos.foobar.net.invalid-user.log.INFO.20161004-154315.107733:I0119 14:42:15.858510 107759 slave.cpp:1361] Got assigned task ct:148481682:0:foocare_zendesk_round_robin: for framework 19393553-2061-4d2f-8c05-a0ba688334f4-0001
[jira] [Updated] (MESOS-6946) Make wait status checks consistent.
[ https://issues.apache.org/jira/browse/MESOS-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-6946: -- Labels: tech-debt (was: ) > Make wait status checks consistent. > --- > > Key: MESOS-6946 > URL: https://issues.apache.org/jira/browse/MESOS-6946 > Project: Mesos > Issue Type: Bug >Reporter: James Peach >Assignee: James Peach >Priority: Trivial > Labels: tech-debt > > There are various places that test the {{wait(2)}} exit status in different > ways. Clean this up to be consistent and use {{WSTRINGIFY}} to format error > messages where appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6917) Segfault when the executor sets an invalid UUID when sending a status update.
[ https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-6917: -- Fix Version/s: 1.0.3 > Segfault when the executor sets an invalid UUID when sending a status update. > -- > > Key: MESOS-6917 > URL: https://issues.apache.org/jira/browse/MESOS-6917 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0 >Reporter: Aaron Wood >Assignee: Aaron Wood >Priority: Blocker > Labels: mesosphere > Fix For: 1.1.1, 1.2.0, 1.0.3 > > > A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and > sends it off to the agent: > {code} > ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state > == ERROR: Not a valid UUID > *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are > using GNU date *** > PC: @ 0x7efeb6101428 (unknown) > *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID > 14007; stack trace: *** > @ 0x7efeb64a6390 (unknown) > @ 0x7efeb6101428 (unknown) > @ 0x7efeb610302a (unknown) > @ 0x560df739fa6e _Abort() > @ 0x560df739fa9c _Abort() > @ 0x7efebb53a5ad Try<>::get() > @ 0x7efebb5363d6 Try<>::get() > @ 0x7efebbd84809 > mesos::internal::slave::validation::executor::call::validate() > @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() > @ 0x7efebbc773b8 > _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ > @ 0x7efebbcb5808 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ > @ 0x7efebbfb2aea std::function<>::operator()() > @ 0x7efebcb158b8 > _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb > @ 0x7efebcb1a10a > 
_ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv > @ 0x7efebcb1c5f8 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data > @ 0x7efebb5ce8ca std::function<>::operator()() > @ 0x7efebb5c4b27 > _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_ > @ 0x7efebb5d4e1e > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ > @ 0x7efebcb30baf std::function<>::operator()() > @ 0x7efebcb13fd6 process::ProcessBase::visit() > @ 0x7efebcb1f3c8 process::DispatchEvent::visit() > @ 0x7efebb3ab2ea process::ProcessBase::serve() > @ 0x7efebcb0fe8a process::ProcessManager::resume() > @ 0x7efebcb0c5a3 > _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv > @ 0x7efebcb1ea34 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x7efebcb1e98a > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv > @ 0x7efebcb1e91a > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7efeb6980c80 (unknown) > @ 0x7efeb649c6ba start_thread > @ 0x7efeb61d282d (unknown) > Aborted (core dumped) > {code} > https://reviews.apache.org/r/55480/ -- This message was sent by Atlassian JIRA 
(v6.3.4#6332)
[jira] [Commented] (MESOS-6944) Mesos - AD integration Process / Example
[ https://issues.apache.org/jira/browse/MESOS-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830048#comment-15830048 ] Alexander Rojas commented on MESOS-6944: Apache Mesos doesn't provide integration with LDAP or AD out of the box. It does provide authentication based on configuration files, which uses the Basic authentication scheme. It is left to Mesos users to build their own modules to extend the basic Mesos feature set. Some companies have created products that provide proprietary LDAP integrations (DC/OS Enterprise being an example). > Mesos - AD integration Process / Example > > > Key: MESOS-6944 > URL: https://issues.apache.org/jira/browse/MESOS-6944 > Project: Mesos > Issue Type: Task > Components: modules >Reporter: Rahul Bhardwaj > Labels: mesosphere > > Hi Team, > We are trying to configure AD authentication with Mesos for HTTP endpoints > (only the UI). > But we couldn't find any clear documentation or example on your site > http://mesos.apache.org/ that shows the process of integrating with AD > (LDAP). We also could not find a reference to any existing LDAP library to use > with Mesos on the Modules page. > Authentication doc: > http://mesos.apache.org/documentation/latest/authentication/. > Module doc: http://mesos.apache.org/documentation/latest/modules/ > (Authentication section). > Can you please tell us whether this feature is already available? Example > documentation would help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
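For reference, a custom authenticator module (once written as a shared library) is loaded through the master/agent {{--modules}} flag, which takes a JSON description. The library path and module name below are placeholders for illustration; Mesos does not ship an LDAP module:

```json
{
  "libraries": [
    {
      "file": "/usr/lib/mesos/modules/libldap_http_authenticator.so",
      "modules": [
        {
          "name": "org_example_LDAPBasicAuthenticator"
        }
      ]
    }
  ]
}
```

The loaded module would then be selected by name, e.g. via the master's {{--http_authenticators}} flag (here, `--http_authenticators=org_example_LDAPBasicAuthenticator`).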