[jira] [Commented] (MESOS-4930) Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI
[ https://issues.apache.org/jira/browse/MESOS-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194826#comment-15194826 ] Jay Guo commented on MESOS-4930: Review Board link: https://reviews.apache.org/r/44836/
> Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI
>
> Key: MESOS-4930
> URL: https://issues.apache.org/jira/browse/MESOS-4930
> Project: Mesos
> Issue Type: Improvement
> Components: framework, webui
> Reporter: Jay Guo
> Assignee: Jay Guo
> Priority: Trivial
>
> Frameworks should assign a fixed number of digits to tasks as the TaskId, which will then be sorted lexically by the WebUI in the correct order.
> For instance, `1`, `2`, `10`, `11` will be sorted to `1`, `10`, `11`, `2`.
> But `001`, `002`, `010`, `011` will be sorted in ascending order.
> /src/examples/long_lived_framework.cpp should be updated
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
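For illustration, a minimal sketch (not the example frameworks' actual code) of generating zero-padded TaskIds so that lexical sorting in the WebUI matches numeric order; the width of 3 is an arbitrary example value:
{code}
// Zero-pad the numeric task index so that lexical order equals numeric order.
#include <iomanip>
#include <sstream>
#include <string>

std::string paddedTaskId(int task, int width = 3)
{
  std::ostringstream out;
  out << std::setw(width) << std::setfill('0') << task;
  return out.str();  // 1 -> "001", 10 -> "010", 11 -> "011"
}
{code}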
[jira] [Assigned] (MESOS-4930) Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI
[ https://issues.apache.org/jira/browse/MESOS-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Guo reassigned MESOS-4930: -- Assignee: Jay Guo
> Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI
>
> Key: MESOS-4930
> URL: https://issues.apache.org/jira/browse/MESOS-4930
> Project: Mesos
> Issue Type: Improvement
> Components: framework, webui
> Reporter: Jay Guo
> Assignee: Jay Guo
> Priority: Trivial
>
> Frameworks should assign a fixed number of digits to tasks as the TaskId, which will then be sorted lexically by the WebUI in the correct order.
> For instance, `1`, `2`, `10`, `11` will be sorted to `1`, `10`, `11`, `2`.
> But `001`, `002`, `010`, `011` will be sorted in ascending order.
> /src/examples/long_lived_framework.cpp should be updated
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4946) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand flaky on Debian 8
Zhitao Li created MESOS-4946: Summary: ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand flaky on Debian 8 Key: MESOS-4946 URL: https://issues.apache.org/jira/browse/MESOS-4946 Project: Mesos Issue Type: Bug Components: containerization, test Environment: Debian 8 (VM) Kernel version: 3.18.27 Reporter: Zhitao Li
While testing 0.28.1-rc2, this test fails when running as root on my Debian 8 EC2 instance. Verbose log: https://gist.github.com/zhitaoli/95436f4ea2df13c4b137
It seems like the second call to {{os::su()}} in {{src/launcher/executor.cpp}} failed:
{code}
if (user.isSome()) {
  Try<Nothing> su = os::su(user.get());
  if (su.isError()) {
    cerr << "Failed to change user to '" << user.get() << "': "
         << su.error() << endl;
    abort();
  }
}
{code}
Additional debug logging suggests that {{getpwnam_r(3)}} failed with errno {{ENOENT}} in {{os::getgid}}. Not sure what the cause could be.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
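A small standalone check (an illustrative sketch, not part of Mesos) can help narrow this down by confirming whether the user name is resolvable in the environment where the executor runs; {{getpwnam_r(3)}} returning 0 with a NULL result is the "no such user" case:
{code}
// Minimal sketch: verify that a user name resolves via getpwnam_r(3). Run it
// in the same environment (e.g. inside the container rootfs) where os::su()
// is called, to see whether the lookup itself fails there.
#include <pwd.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char** argv)
{
  const char* name = argc > 1 ? argv[1] : "nobody";

  long size = sysconf(_SC_GETPW_R_SIZE_MAX);
  std::vector<char> buffer(size > 0 ? static_cast<size_t>(size) : 16384);

  struct passwd pwd;
  struct passwd* result = nullptr;
  int err = getpwnam_r(name, &pwd, buffer.data(), buffer.size(), &result);

  if (result == nullptr) {
    // err == 0 means "user not found"; otherwise err is an error code.
    std::printf("lookup of '%s' failed: %s\n",
                name, err == 0 ? "no such user" : std::strerror(err));
    return 1;
  }

  std::printf("user '%s' -> uid=%u gid=%u\n",
              name, (unsigned) pwd.pw_uid, (unsigned) pwd.pw_gid);
  return 0;
}
{code}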
[jira] [Updated] (MESOS-2198) Document that TaskIDs should not be reused
[ https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-2198: --- Shepherd: Jie Yu > Document that TaskIDs should not be reused > -- > > Key: MESOS-2198 > URL: https://issues.apache.org/jira/browse/MESOS-2198 > Project: Mesos > Issue Type: Bug > Components: documentation, framework >Reporter: Robert Lacroix >Assignee: Neil Conway > Labels: documentation > > Let's update the documentation for TaskID to indicate that reuse is not > recommended, as per the discussion below. > - > Old Summary: Scheduler#statusUpdate should not be called multiple times for > the same status update > Currently Scheduler#statusUpdate can be called multiple times for the same > status update, for example when the slave retransmits a status update because > it's not acknowledged in time. Especially for terminal status updates this > can lead to unexpected scheduler behavior when task id's are being reused. > Consider this scenario: > * Scheduler schedules task > * Task fails, slave sends TASK_FAILED > * Scheduler is busy and libmesos doesn't acknowledge update in time > * Slave retransmits TASK_FAILED > * Scheduler eventually receives first TASK_FAILED and reschedules task > * Second TASK_FAILED triggers statusUpdate again and the scheduler can't > determine if the TASK_FAILED belongs to the first or second run of the task. > It would be a lot better if libmesos would dedupe status updates and only > call Scheduler#statusUpdate once per status update it received. Retries with > the same UUID shouldn't cause Scheduler#statusUpdate to be executed again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
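For illustration, a minimal sketch (names are illustrative, not Mesos API) of one way a framework can avoid TaskID reuse: derive each TaskID from a base name plus a per-launch unique suffix, here the scheduler start time and a counter:
{code}
// Generate a TaskID value that is never reused across task launches.
#include <atomic>
#include <chrono>
#include <string>

std::string uniqueTaskId(const std::string& base)
{
  static const long long epochMs =
      std::chrono::duration_cast<std::chrono::milliseconds>(
          std::chrono::system_clock::now().time_since_epoch()).count();
  static std::atomic<long long> counter{0};

  // e.g. "my-task-1457990000000-42": a re-launched task gets a fresh suffix,
  // so a late, duplicated terminal update cannot be confused with the new run.
  return base + "-" + std::to_string(epochMs) + "-" +
         std::to_string(counter.fetch_add(1));
}
{code}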
[jira] [Assigned] (MESOS-2198) Document that TaskIDs should not be reused
[ https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-2198: -- Assignee: Neil Conway (was: Qian Zhang) > Document that TaskIDs should not be reused > -- > > Key: MESOS-2198 > URL: https://issues.apache.org/jira/browse/MESOS-2198 > Project: Mesos > Issue Type: Bug > Components: documentation, framework >Reporter: Robert Lacroix >Assignee: Neil Conway > Labels: documentation > > Let's update the documentation for TaskID to indicate that reuse is not > recommended, as per the discussion below. > - > Old Summary: Scheduler#statusUpdate should not be called multiple times for > the same status update > Currently Scheduler#statusUpdate can be called multiple times for the same > status update, for example when the slave retransmits a status update because > it's not acknowledged in time. Especially for terminal status updates this > can lead to unexpected scheduler behavior when task id's are being reused. > Consider this scenario: > * Scheduler schedules task > * Task fails, slave sends TASK_FAILED > * Scheduler is busy and libmesos doesn't acknowledge update in time > * Slave retransmits TASK_FAILED > * Scheduler eventually receives first TASK_FAILED and reschedules task > * Second TASK_FAILED triggers statusUpdate again and the scheduler can't > determine if the TASK_FAILED belongs to the first or second run of the task. > It would be a lot better if libmesos would dedupe status updates and only > call Scheduler#statusUpdate once per status update it received. Retries with > the same UUID shouldn't cause Scheduler#statusUpdate to be executed again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3902) The Location header when non-leading master redirects to leading master is incomplete.
[ https://issues.apache.org/jira/browse/MESOS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Murthy reassigned MESOS-3902: Assignee: Ashwin Murthy (was: Ashwin Murthy) > The Location header when non-leading master redirects to leading master is > incomplete. > -- > > Key: MESOS-3902 > URL: https://issues.apache.org/jira/browse/MESOS-3902 > Project: Mesos > Issue Type: Bug > Components: HTTP API, master >Affects Versions: 0.25.0 > Environment: 3 masters, 10 slaves >Reporter: Ben Whitehead >Assignee: Ashwin Murthy > Labels: mesosphere > > The master now sets a location header, but it's incomplete. The path of the > URL isn't set. Consider an example: > {code} > > cat /tmp/subscribe-1072944352375841456 | httpp POST > > 127.1.0.3:5050/api/v1/scheduler Content-Type:application/x-protobuf > POST /api/v1/scheduler HTTP/1.1 > Accept: application/json > Accept-Encoding: gzip, deflate > Connection: keep-alive > Content-Length: 123 > Content-Type: application/x-protobuf > Host: 127.1.0.3:5050 > User-Agent: HTTPie/0.9.0 > +-+ > | NOTE: binary data not shown in terminal | > +-+ > HTTP/1.1 307 Temporary Redirect > Content-Length: 0 > Date: Fri, 26 Feb 2016 00:54:41 GMT > Location: //127.1.0.1:5050 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4945) Garbage collect unused docker layers in the store.
[ https://issues.apache.org/jira/browse/MESOS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194616#comment-15194616 ] Guangya Liu commented on MESOS-4945: This may have some relationship with the "pre-fetch" feature: pre-fetching will pull some large images before they are used, while this ticket is about removing unused layers/images.
> Garbage collect unused docker layers in the store.
>
> Key: MESOS-4945
> URL: https://issues.apache.org/jira/browse/MESOS-4945
> Project: Mesos
> Issue Type: Improvement
> Reporter: Jie Yu
>
> Right now, we don't have any garbage collection in place for docker layers.
> It's not straightforward to implement because we don't know what container is currently using the layer. We probably need a way to track the current usage of layers.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
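One possible direction, shown here only as an illustrative sketch (the types and hooks are assumptions, not existing Mesos interfaces): reference-count layers by the containers that use them and only collect layers whose count drops to zero:
{code}
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

class LayerUsageTracker
{
public:
  // Record that a container was provisioned from these image layers.
  void onProvision(const std::string& containerId,
                   const std::vector<std::string>& layerIds)
  {
    for (const std::string& layer : layerIds) {
      refs[layer].insert(containerId);
    }
  }

  // Record that a container was destroyed.
  void onDestroy(const std::string& containerId)
  {
    for (auto& entry : refs) {
      entry.second.erase(containerId);
    }
  }

  // Layers no running container references; candidates for removal.
  std::vector<std::string> unusedLayers() const
  {
    std::vector<std::string> unused;
    for (const auto& entry : refs) {
      if (entry.second.empty()) {
        unused.push_back(entry.first);
      }
    }
    return unused;
  }

private:
  std::unordered_map<std::string, std::set<std::string>> refs;
};
{code}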
[jira] [Commented] (MESOS-4744) mesos-execute should allow setting role
[ https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194573#comment-15194573 ] Shuai Lin commented on MESOS-4744: -- Hi [~qiujian], not sure whether it's worth another ticket, after all it's a rather small change. > mesos-execute should allow setting role > --- > > Key: MESOS-4744 > URL: https://issues.apache.org/jira/browse/MESOS-4744 > Project: Mesos > Issue Type: Bug > Components: cli >Reporter: Jian Qiu >Assignee: Jian Qiu >Priority: Minor > > It will be quite useful if we can set role when running mesos-execute -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4923) Treat revocable resources as a separate pool when considering fairness
[ https://issues.apache.org/jira/browse/MESOS-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangya Liu updated MESOS-4923: --- Description: The current logic of the roleSorter is that when it sorts roles, the resources it accounts for include both regular resources and revocable resources, and this may not be accurate in some cases. Take the following case as an instance:
1) framework1 and framework2.
2) framework1 got 1 reserved cpu and 9 revocable cpus: cpu(r1):1;cpu(*){REV}:9
3) framework2 got 9 reserved cpus: cpu(r1):9
When the allocator allocates resources in the next cycle, framework2 will be handled first as it has fewer SCALAR resources than framework1, but this may not be right in some cases, as framework1 is using only 1 reserved resource and its other resources are revocable, which can easily be evicted.
A proposal here is to treat revocable resources as a separate pool when considering fairness; this can be achieved by introducing a new sorter for revocable resources so as to distinguish the sorter for regular resources from the one for revocable resources. For the built-in allocator, the logic would be:
1) Quota Role Sorter
2) non-revocable Role Sorter
3) Revocable Role Sorter

was: The current logic of the roleSorter is that when it sorts roles, the resources it accounts for include both regular resources and revocable resources, and this may not be accurate in some cases. Take the following case as an instance:
1) framework1 and framework2.
2) framework1 got 1 reserved cpu and 9 revocable cpus: cpu(r1):1;cpu(*){REV}:9
3) framework2 got 9 reserved cpus: cpu(r1):9
When the allocator allocates resources in the next cycle, framework2 will be handled first as it has fewer SCALAR resources than framework1, but this may not be right in some cases, as framework1 is using only 1 reserved resource and its other resources are revocable, which can easily be evicted.
A proposal here is to introduce a new sorter for revocable resources so as to distinguish the sorter for regular resources from the one for revocable resources. For the built-in allocator, the logic would be:
1) Quota Role Sorter
2) non-revocable Role Sorter
3) Revocable Role Sorter

> Treat revocable resources as a separate pool when considering fairness
>
> Key: MESOS-4923
> URL: https://issues.apache.org/jira/browse/MESOS-4923
> Project: Mesos
> Issue Type: Bug
> Reporter: Guangya Liu
> Assignee: Klaus Ma
>
> The current logic of the roleSorter is that when it sorts roles, the resources it accounts for include both regular resources and revocable resources, and this may not be accurate in some cases. Take the following case as an instance:
> 1) framework1 and framework2.
> 2) framework1 got 1 reserved cpu and 9 revocable cpus: cpu(r1):1;cpu(*){REV}:9
> 3) framework2 got 9 reserved cpus: cpu(r1):9
> When the allocator allocates resources in the next cycle, framework2 will be handled first as it has fewer SCALAR resources than framework1, but this may not be right in some cases, as framework1 is using only 1 reserved resource and its other resources are revocable, which can easily be evicted.
> A proposal here is to treat revocable resources as a separate pool when considering fairness; this can be achieved by introducing a new sorter for revocable resources so as to distinguish the sorter for regular resources from the one for revocable resources. For the built-in allocator, the logic would be:
> 1) Quota Role Sorter
> 2) non-revocable Role Sorter
> 3) Revocable Role Sorter
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
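A rough sketch of the proposed ordering, with a hypothetical Sorter interface standing in for the allocator's role sorters (this is not the existing allocator code); the point is only that revocable resources form their own fairness pool, handled after the quota and non-revocable stages:
{code}
#include <string>
#include <vector>

struct Sorter
{
  virtual ~Sorter() {}

  // Roles ordered by their current (dominant) share within this pool.
  virtual std::vector<std::string> sort() const = 0;
};

void allocate(const Sorter& quotaRoleSorter,
              const Sorter& nonRevocableRoleSorter,
              const Sorter& revocableRoleSorter)
{
  // Stage 1: satisfy roles with quota first.
  for (const std::string& role : quotaRoleSorter.sort()) {
    (void) role;  // ... offer non-revocable resources towards the quota ...
  }

  // Stage 2: fair-share the remaining non-revocable resources.
  for (const std::string& role : nonRevocableRoleSorter.sort()) {
    (void) role;  // ... offer regular (non-revocable) resources ...
  }

  // Stage 3: fair-share revocable resources as a separate pool, so holding
  // revocable resources does not hurt a role's position for regular ones.
  for (const std::string& role : revocableRoleSorter.sort()) {
    (void) role;  // ... offer revocable resources ...
  }
}
{code}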
[jira] [Comment Edited] (MESOS-4944) Improve overlay backend so that it's writable
[ https://issues.apache.org/jira/browse/MESOS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194538#comment-15194538 ] Shuai Lin edited comment on MESOS-4944 at 3/15/16 2:01 AM: --- bq. We can use an empty directory from the container sandbox to act as the upper layer so that it's writable. -With the same workaround we can also make overlay backend work for images which has only 1 layer. Currently using the overlay backend means giving up on using 1-layer images like alpine.- This would also make overlay backend usable for 1-layer images like alpine. was (Author: lins05): bq. We can use an empty directory from the container sandbox to act as the upper layer so that it's writable. With the same workaround we can also make overlay backend work for images which has only 1 layer. Currently using the overlay backend means giving up on using 1-layer images like alpine. > Improve overlay backend so that it's writable > - > > Key: MESOS-4944 > URL: https://issues.apache.org/jira/browse/MESOS-4944 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Shuai Lin > > Currently, the overlay backend will provision a read-only FS. We can use an > empty directory from the container sandbox to act as the upper layer so that > it's writable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4923) Treat revocable resources as a separate pool when considering fairness
[ https://issues.apache.org/jira/browse/MESOS-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangya Liu updated MESOS-4923: --- Summary: Treat revocable resources as a separate pool when considering fairness (was: Add a new sorter for revocable resources in allocator) > Treat revocable resources as a separate pool when considering fairness > -- > > Key: MESOS-4923 > URL: https://issues.apache.org/jira/browse/MESOS-4923 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Klaus Ma > > The current logic of roleSorter is that when it do role sorter, the resources > in it will include both regular resources and revocable resources, and this > may not accurate for some cases, take the following case as an instance: > 1) framework1 and framework2. > 2) framework1 got 1 reserved cpu and 9 revocable cpu. cpu(r1):1;cpu(*){REV}:9 > 3) framework2 got 9 reserved cpus: cpu(r1):9 > When allocator allocate resources in next cycle, framework2 will be handled > first as it has less SCALAR resources than framework1, but this may not be > right for some cases as framework1 is using only 1 reserved resources and > other resources are revocable which can be easily got evicted. > A proposal here is introducing a new sorter for revocable resources so as to > distinguish the sorter for regular resources and revocable resources. To the > built in allocator, the logic would be as this: > 1) Quota Role Sorter > 2) non-revocable Role Sorter > 3) Revocable Role Sorter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4744) mesos-execute should allow setting role
[ https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194552#comment-15194552 ] Jian Qiu commented on MESOS-4744: - Thanks [~lins05], maybe we should create a separate ticket for adding URIs in CommandInfo for mesos-execute?
> mesos-execute should allow setting role
>
> Key: MESOS-4744
> URL: https://issues.apache.org/jira/browse/MESOS-4744
> Project: Mesos
> Issue Type: Bug
> Components: cli
> Reporter: Jian Qiu
> Assignee: Jian Qiu
> Priority: Minor
>
> It will be quite useful if we can set role when running mesos-execute
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4944) Improve overlay backend so that it's writable
[ https://issues.apache.org/jira/browse/MESOS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Lin reassigned MESOS-4944: Assignee: Shuai Lin > Improve overlay backend so that it's writable > - > > Key: MESOS-4944 > URL: https://issues.apache.org/jira/browse/MESOS-4944 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Shuai Lin > > Currently, the overlay backend will provision a read-only FS. We can use an > empty directory from the container sandbox to act as the upper layer so that > it's writable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4944) Improve overlay backend so that it's writable
[ https://issues.apache.org/jira/browse/MESOS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194538#comment-15194538 ] Shuai Lin commented on MESOS-4944: -- bq. We can use an empty directory from the container sandbox to act as the upper layer so that it's writable.
With the same workaround we can also make the overlay backend work for images which have only 1 layer. Currently using the overlay backend means giving up on using 1-layer images like alpine.
> Improve overlay backend so that it's writable
>
> Key: MESOS-4944
> URL: https://issues.apache.org/jira/browse/MESOS-4944
> Project: Mesos
> Issue Type: Task
> Reporter: Jie Yu
>
> Currently, the overlay backend will provision a read-only FS. We can use an empty directory from the container sandbox to act as the upper layer so that it's writable.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4744) mesos-execute should allow setting role
[ https://issues.apache.org/jira/browse/MESOS-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Qiu updated MESOS-4744: Shepherd: Michael Park > mesos-execute should allow setting role > --- > > Key: MESOS-4744 > URL: https://issues.apache.org/jira/browse/MESOS-4744 > Project: Mesos > Issue Type: Bug > Components: cli >Reporter: Jian Qiu >Assignee: Jian Qiu >Priority: Minor > > It will be quite useful if we can set role when running mesos-execute -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4945) Garbage collect unused docker layers in the store.
Jie Yu created MESOS-4945: - Summary: Garbage collect unused docker layers in the store. Key: MESOS-4945 URL: https://issues.apache.org/jira/browse/MESOS-4945 Project: Mesos Issue Type: Improvement Reporter: Jie Yu Right now, we don't have any garbage collection in place for docker layers. It's not straightforward to implement because we don't know what container is currently using the layer. We probably need a way to track the current usage of layers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4944) Improve overlay backend so that it's writable
Jie Yu created MESOS-4944: - Summary: Improve overlay backend so that it's writable Key: MESOS-4944 URL: https://issues.apache.org/jira/browse/MESOS-4944 Project: Mesos Issue Type: Task Reporter: Jie Yu Currently, the overlay backend will provision a read-only FS. We can use an empty directory from the container sandbox to act as the upper layer so that it's writable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
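For reference, a writable upper layer maps onto the standard overlayfs mount options (lowerdir, upperdir, workdir). A minimal sketch using mount(2) follows; the paths are placeholders, not the backend's actual layout:
{code}
#include <sys/mount.h>
#include <cstdio>
#include <string>

// Mount an overlay rootfs whose upper (writable) layer is an empty directory
// taken from the container sandbox.
int mountOverlay(const std::string& lowerdirs,  // colon-separated image layers
                 const std::string& upperdir,   // empty dir in the sandbox
                 const std::string& workdir,    // empty dir on the same fs as upperdir
                 const std::string& target)     // where the rootfs is provisioned
{
  const std::string options =
    "lowerdir=" + lowerdirs + ",upperdir=" + upperdir + ",workdir=" + workdir;

  if (::mount("overlay", target.c_str(), "overlay", 0, options.c_str()) != 0) {
    std::perror("mount(overlay)");
    return -1;
  }

  return 0;
}
{code}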
[jira] [Created] (MESOS-4943) Reduce the size of LinuxRootfs in tests.
Jie Yu created MESOS-4943: - Summary: Reduce the size of LinuxRootfs in tests. Key: MESOS-4943 URL: https://issues.apache.org/jira/browse/MESOS-4943 Project: Mesos Issue Type: Improvement Reporter: Jie Yu
Right now, LinuxRootfs copies files from the host filesystem to construct a chroot-able rootfs. We copy a lot of unnecessary files, making it very large. We can potentially strip a lot of files.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4886) Support mesos containerizer force_pull_image option.
[ https://issues.apache.org/jira/browse/MESOS-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4886: -- Assignee: Guangya Liu
> Support mesos containerizer force_pull_image option.
>
> Key: MESOS-4886
> URL: https://issues.apache.org/jira/browse/MESOS-4886
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Reporter: Gilbert Song
> Assignee: Guangya Liu
> Labels: containerizer
>
> Currently for the unified containerizer, images that are already cached by the metadata manager cannot be updated. The user has to delete the corresponding images in the store if an update is needed. We should support a `force_pull_image` option for the unified containerizer, to provide an override when a cached image already exists.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4355) Implement isolator for Docker volume
[ https://issues.apache.org/jira/browse/MESOS-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4355: -- Labels: mesosphere (was: )
> Implement isolator for Docker volume
>
> Key: MESOS-4355
> URL: https://issues.apache.org/jira/browse/MESOS-4355
> Project: Mesos
> Issue Type: Improvement
> Components: docker, isolation
> Reporter: Qian Zhang
> Assignee: Guangya Liu
> Labels: mesosphere
>
> In Docker, a user can create a volume with the Docker CLI, e.g., {{docker volume create --name my-volume}}; we need to implement an isolator so that containers launched by the MesosContainerizer can use such a volume.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4942) Docker runtime isolator tests may cause disk issue.
Gilbert Song created MESOS-4942: --- Summary: Docker runtime isolator tests may cause disk issue. Key: MESOS-4942 URL: https://issues.apache.org/jira/browse/MESOS-4942 Project: Mesos Issue Type: Bug Components: containerization Reporter: Gilbert Song Assignee: Gilbert Song Fix For: 0.29.0
Currently the slave working directory is used as the docker store dir and archive dir, which is problematic. Because the slave work dir is exactly `environment->mkdtemp()`, it will not get cleaned up until the end of the whole test run. But the runtime isolator local puller tests cp the host's rootfs, whose size is relatively big, so cleanup has to be done in each test's tear down.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-3870: --- Shepherd: Benjamin Mahler
> Prevent out-of-order libprocess message delivery
>
> Key: MESOS-3870
> URL: https://issues.apache.org/jira/browse/MESOS-3870
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Neil Conway
> Assignee: Neil Conway
> Priority: Minor
> Labels: mesosphere
>
> I was under the impression that {{send()}} provided in-order, unreliable message delivery. So if P1 sends two messages to P2, P2 might see neither message, only the first, only the second, or both in order — but not the two messages out of order.
> I suspect much of the code makes a similar assumption. However, it appears that this behavior is not guaranteed. slave.cpp:2217 has the following comment:
> {noformat}
> // TODO(jieyu): Here we assume that CheckpointResourcesMessages are
> // ordered (i.e., slave receives them in the same order master sends
> // them). This should be true in most of the cases because TCP
> // enforces in order delivery per connection. However, the ordering
> // is technically not guaranteed because master creates multiple
> // connections to the slave in some cases (e.g., persistent socket
> // to slave breaks and master uses ephemeral socket). This could
> // potentially be solved by using a version number and rejecting
> // stale messages according to the version number.
> {noformat}
> We can improve this situation by _either_: (1) fixing libprocess to guarantee ordered message delivery, e.g., by adding a sequence number, or (2) clarifying that ordered message delivery is not guaranteed, and ideally providing a tool to force messages to be delivered out-of-order.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
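An illustrative sketch of the "version number" idea from the TODO (not existing libprocess or slave code): tag each such message with a sequence number and drop anything older than what has already been applied:
{code}
#include <cstdint>
#include <mutex>

// Accepts a message's sequence number if it is newer than anything applied
// so far; stale (out-of-order or duplicate) messages are rejected.
class SequenceGuard
{
public:
  bool accept(uint64_t sequence)
  {
    std::lock_guard<std::mutex> lock(mutex);
    if (sequence <= applied) {
      return false;  // stale or duplicate; ignore it
    }
    applied = sequence;
    return true;
  }

private:
  std::mutex mutex;
  uint64_t applied = 0;  // assumes senders number messages starting at 1
};
{code}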
[jira] [Assigned] (MESOS-3870) Prevent out-of-order libprocess message delivery
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-3870: -- Assignee: Neil Conway
> Prevent out-of-order libprocess message delivery
>
> Key: MESOS-3870
> URL: https://issues.apache.org/jira/browse/MESOS-3870
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Neil Conway
> Assignee: Neil Conway
> Priority: Minor
> Labels: mesosphere
>
> I was under the impression that {{send()}} provided in-order, unreliable message delivery. So if P1 sends two messages to P2, P2 might see neither message, only the first, only the second, or both in order — but not the two messages out of order.
> I suspect much of the code makes a similar assumption. However, it appears that this behavior is not guaranteed. slave.cpp:2217 has the following comment:
> {noformat}
> // TODO(jieyu): Here we assume that CheckpointResourcesMessages are
> // ordered (i.e., slave receives them in the same order master sends
> // them). This should be true in most of the cases because TCP
> // enforces in order delivery per connection. However, the ordering
> // is technically not guaranteed because master creates multiple
> // connections to the slave in some cases (e.g., persistent socket
> // to slave breaks and master uses ephemeral socket). This could
> // potentially be solved by using a version number and rejecting
> // stale messages according to the version number.
> {noformat}
> We can improve this situation by _either_: (1) fixing libprocess to guarantee ordered message delivery, e.g., by adding a sequence number, or (2) clarifying that ordered message delivery is not guaranteed, and ideally providing a tool to force messages to be delivered out-of-order.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3505) Support specifying Docker image by Image ID.
[ https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangya Liu updated MESOS-3505: --- Assignee: (was: Guangya Liu) > Support specifying Docker image by Image ID. > > > Key: MESOS-3505 > URL: https://issues.apache.org/jira/browse/MESOS-3505 > Project: Mesos > Issue Type: Story >Reporter: Yan Xu > Labels: mesosphere > > A common way to specify a Docker image with the docker engine is through > {{repo:tag}}, which is convenient and sufficient for most people in most > scenarios. However this combination is neither precise nor immutable. > For this reason, it's possible when an image with a {{repo:tag}} already > cached locally on an agent host and a task requiring this {{repo:tag}} > arrives, it's using an image that's different than the one the user intended. > Docker CLI already supports referring to an image by {{repo@id}}, where the > ID can have two forms: > * v1 Image ID > * digest > Native Mesos provisioner should support the same for Docker images. IMO it's > fine if image discovery by ID is not supported (and thus still requiring > {{repo:tag}} to be specified) (looks like [v2 > registry|http://docs.docker.com/registry/spec/api/] does support it) but the > user can optionally specify an image ID and match it against the cached / > newly pulled image. If the ID doesn't match the cached image, the store can > re-pull it; if the ID doesn't match the newly pulled image (manifest), the > provisioner can fail the request without having the user unknowingly running > its task on the wrong image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2824) Support pre-fetching images
[ https://issues.apache.org/jira/browse/MESOS-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194360#comment-15194360 ] Guangya Liu commented on MESOS-2824: Yes, I was planning to work on this. My thinking for this issue is to introduce an HTTP endpoint in the slave so that end users can pre-fetch those images. The concern is that the end user may want to write some scripts to pre-fetch images if there are thousands of Mesos agents. [~jieyu] any comments on this?
> Support pre-fetching images
>
> Key: MESOS-2824
> URL: https://issues.apache.org/jira/browse/MESOS-2824
> Project: Mesos
> Issue Type: Improvement
> Components: isolation
> Affects Versions: 0.23.0
> Reporter: Ian Downes
> Assignee: Guangya Liu
> Priority: Minor
> Labels: mesosphere, twitter
>
> Default container images can be specified with the --default_container_info flag to the slave. This may be a large image that will take a long time to initially fetch/hash/extract when the first container is provisioned. Add optional support to start fetching the image when the slave starts and consider not registering until the fetch is complete.
> To extend that, we should support an operator endpoint so that operators can specify images to pre-fetch.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4629) Implement fault tolerance tests for the HTTP Scheduler API.
[ https://issues.apache.org/jira/browse/MESOS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194333#comment-15194333 ] Anand Mazumdar commented on MESOS-4629: --- Review chain: https://reviews.apache.org/r/44729/ > Implement fault tolerance tests for the HTTP Scheduler API. > --- > > Key: MESOS-4629 > URL: https://issues.apache.org/jira/browse/MESOS-4629 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > > Currently, the HTTP V1 API does not have fault tolerance tests similar to the > one in {{src/tests/fault_tolerance_tests.cpp}}. > For more information see MESOS-3355. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4630) Implement partition tests for the HTTP Scheduler API.
[ https://issues.apache.org/jira/browse/MESOS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194332#comment-15194332 ] Anand Mazumdar commented on MESOS-4630: --- All the current tests in {{src/tests/partition_tests.cpp}} are around master <-> agent partitions, except {{PartitionTest.PartitionedSlave}}, which checks that a scheduler gets a slave-lost message for a partitioned slave. It would be fine to add another test around this for HTTP-based schedulers to the existing file. Other than that, we are at parity with the old driver-based interface. Hence, no other tests are needed, unlike MESOS-4629, where we had to add separate tests for the v1 HTTP API.
> Implement partition tests for the HTTP Scheduler API.
>
> Key: MESOS-4630
> URL: https://issues.apache.org/jira/browse/MESOS-4630
> Project: Mesos
> Issue Type: Task
> Reporter: Anand Mazumdar
> Assignee: Anand Mazumdar
> Labels: mesosphere
>
> Currently, the HTTP V1 API does not have partition tests similar to the one in src/tests/partition_tests.cpp.
> For more information see MESOS-3355.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
[ https://issues.apache.org/jira/browse/MESOS-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song reassigned MESOS-4912: --- Assignee: Gilbert Song > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails. > -- > > Key: MESOS-4912 > URL: https://issues.apache.org/jira/browse/MESOS-4912 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.28.0 > Environment: CenOS 7, SSL >Reporter: Bernd Mathiske >Assignee: Gilbert Song > Labels: mesosphere > > Observed on our CI: > {noformat} > [09:34:15] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers > [09:34:19]W: [Step 11/11] I0309 09:34:19.906719 2357 linux.cpp:81] Making > '/tmp/MLVLnv' a shared mount > [09:34:19]W: [Step 11/11] I0309 09:34:19.923548 2357 > linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [09:34:19]W: [Step 11/11] I0309 09:34:19.924705 2376 > containerizer.cpp:666] Starting container > 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of > framework '' > [09:34:19]W: [Step 11/11] I0309 09:34:19.925355 2371 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:19]W: [Step 11/11] I0309 09:34:19.925881 2377 copy.cpp:127] Copying > layer path '/tmp/MLVLnv/test_image1' to rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > [09:34:30]W: [Step 11/11] I0309 09:34:30.835127 2376 linux.cpp:355] Bind > mounting work directory from > '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b' > to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.835392 2376 linux.cpp:683] > Changing the ownership of the persistent volume at > '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid > 0 > [09:34:30]W: [Step 11/11] I0309 09:34:30.840425 2376 linux.cpp:723] > Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' > for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of > container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.843878 2374 > linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS > [09:34:30]W: [Step 11/11] I0309 09:34:30.848302 2371 > containerizer.cpp:666] Starting container > 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of > framework '' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848758 2371 > containerizer.cpp:1392] Destroying container > 'da610f7f-a709-4de8-94d3-74f4a520619b' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848865 2373 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087 > [09:34:30]W: [Step 11/11] I0309 09:34:30.849449 2375 copy.cpp:127] Copying > layer path 
'/tmp/MLVLnv/test_image2' to rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > [09:34:30]W: [Step 11/11] I0309 09:34:30.854038 2374 cgroups.cpp:2427] > Freezing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.856693 2372 cgroups.cpp:1409] > Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after > 2.608128ms > [09:34:30]W: [Step 11/11] I0309 09:34:30.859237 2377 cgroups.cpp:2445] > Thawing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.861454 2377 cgroups.cpp:1438] > Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us > [09:34:30]W: [Step 11/11] I0309 09:34:30.934608 2378 > containerizer.cpp:1608] Executor for container > 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited > [09:34:30]W: [Step 11/11] I0309 09:34:30.937692 2372 linux.cpp:798] > Unmounting volume > '/tmp/MLVLnv/
[jira] [Updated] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
[ https://issues.apache.org/jira/browse/MESOS-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-4912: Shepherd: Jie Yu > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails. > -- > > Key: MESOS-4912 > URL: https://issues.apache.org/jira/browse/MESOS-4912 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.28.0 > Environment: CenOS 7, SSL >Reporter: Bernd Mathiske >Assignee: Gilbert Song > Labels: mesosphere > > Observed on our CI: > {noformat} > [09:34:15] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers > [09:34:19]W: [Step 11/11] I0309 09:34:19.906719 2357 linux.cpp:81] Making > '/tmp/MLVLnv' a shared mount > [09:34:19]W: [Step 11/11] I0309 09:34:19.923548 2357 > linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [09:34:19]W: [Step 11/11] I0309 09:34:19.924705 2376 > containerizer.cpp:666] Starting container > 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of > framework '' > [09:34:19]W: [Step 11/11] I0309 09:34:19.925355 2371 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:19]W: [Step 11/11] I0309 09:34:19.925881 2377 copy.cpp:127] Copying > layer path '/tmp/MLVLnv/test_image1' to rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > [09:34:30]W: [Step 11/11] I0309 09:34:30.835127 2376 linux.cpp:355] Bind > mounting work directory from > '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b' > to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.835392 2376 linux.cpp:683] > Changing the ownership of the persistent volume at > '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid > 0 > [09:34:30]W: [Step 11/11] I0309 09:34:30.840425 2376 linux.cpp:723] > Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' > for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of > container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.843878 2374 > linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS > [09:34:30]W: [Step 11/11] I0309 09:34:30.848302 2371 > containerizer.cpp:666] Starting container > 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of > framework '' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848758 2371 > containerizer.cpp:1392] Destroying container > 'da610f7f-a709-4de8-94d3-74f4a520619b' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848865 2373 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087 > [09:34:30]W: [Step 11/11] I0309 09:34:30.849449 2375 copy.cpp:127] Copying > layer path 
'/tmp/MLVLnv/test_image2' to rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > [09:34:30]W: [Step 11/11] I0309 09:34:30.854038 2374 cgroups.cpp:2427] > Freezing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.856693 2372 cgroups.cpp:1409] > Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after > 2.608128ms > [09:34:30]W: [Step 11/11] I0309 09:34:30.859237 2377 cgroups.cpp:2445] > Thawing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.861454 2377 cgroups.cpp:1438] > Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us > [09:34:30]W: [Step 11/11] I0309 09:34:30.934608 2378 > containerizer.cpp:1608] Executor for container > 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited > [09:34:30]W: [Step 11/11] I0309 09:34:30.937692 2372 linux.cpp:798] > Unmounting volume > '/tmp/MLVLnv/provisioner/c
[jira] [Created] (MESOS-4941) Support update existing quota
Zhitao Li created MESOS-4941: Summary: Support update existing quota Key: MESOS-4941 URL: https://issues.apache.org/jira/browse/MESOS-4941 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Zhitao Li Assignee: Zhitao Li We want to support updating an existing quota without the cycle of delete and recreate. This avoids the possible starvation risk of losing the quota between delete and recreate, and also makes the interface friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194050#comment-15194050 ] Kevin Cox commented on MESOS-2043: -- I've uploaded matching logs from the master and one slave. This occurred after restarting the slave. Note that the IPs have been changed. > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, > mesos-master.20141104-1606-1706.log, slave.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fail > due to time out: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of same framework is > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happen with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Cox updated MESOS-2043: - Attachment: slave.log Mesos slave log. > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, > mesos-master.20141104-1606-1706.log, slave.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fail > due to time out: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of same framework is > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happen with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Cox updated MESOS-2043: - Attachment: master.log Mesos Master log fragment. > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, > mesos-master.20141104-1606-1706.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fail > due to time out: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of same framework is > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happen with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3368) Add device support in cgroups abstraction
[ https://issues.apache.org/jira/browse/MESOS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194028#comment-15194028 ] Abhishek Dasgupta commented on MESOS-3368: -- Please review: https://reviews.apache.org/r/44439/ https://reviews.apache.org/r/44796/ https://reviews.apache.org/r/44797/
> Add device support in cgroups abstraction
>
> Key: MESOS-3368
> URL: https://issues.apache.org/jira/browse/MESOS-3368
> Project: Mesos
> Issue Type: Task
> Reporter: Niklas Quarfot Nielsen
> Assignee: Abhishek Dasgupta
>
> Add support for [device cgroups|https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt] to aid isolators in controlling access to devices.
> In the future, we could think about how to enumerate and control access to devices as a resource or a task/container policy.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
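For context, the kernel's device cgroup (see the linked devices.txt) is driven by writing whitelist entries such as {{c 1:3 rwm}} to the cgroup's {{devices.allow}} or {{devices.deny}} files. A minimal sketch of such a write; the cgroup path is a placeholder, not the abstraction proposed here:
{code}
#include <fstream>
#include <iostream>
#include <string>

// Append a device whitelist entry (e.g. "c 1:3 rwm" for /dev/null) to the
// devices.allow file of the given device cgroup.
bool allowDevice(const std::string& cgroupPath, const std::string& entry)
{
  std::ofstream allow(cgroupPath + "/devices.allow");
  if (!allow.is_open()) {
    std::cerr << "Failed to open devices.allow under " << cgroupPath << std::endl;
    return false;
  }

  allow << entry << std::endl;
  return static_cast<bool>(allow);
}

// Example: allowDevice("/sys/fs/cgroup/devices/mesos/<containerId>", "c 1:3 rwm");
{code}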
[jira] [Commented] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193909#comment-15193909 ] Kevin Cox commented on MESOS-2043: -- I haven't restarted my nodes for a while but if it occurs again I will be sure to grab some logs. They look very similar to the ones already attached. Also if I can find time I will try to trigger it on purpose and grab the logs but I have been incredibly busy lately. > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, > mesos-master.20141104-1606-1706.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fail > due to time out: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of same framework is > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happen with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193835#comment-15193835 ] haosdent edited comment on MESOS-4869 at 3/14/16 6:27 PM: -- No, I mean run {{mesos-health-check}} in shell by hand. Because I could not reproduce your problem in my machine. I want to make sure whether this problem is related to your environment. was (Author: haosd...@gmail.com): No, I mean run {{mesos-health-check}} in shell by hand. Because > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 
kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837
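For reference, one way to follow this suggestion is to locate the health-check helper that was launched for a task and re-run it by hand; the exact flags vary by Mesos version, so the sketch below only finds the process and asks the binary itself what it supports (the binary path is the one from this ticket's title).
{noformat}
# Find the health-check helper launched for a task on this agent:
ps awwux | grep [m]esos-health-check

# Inspect the flags this build supports before re-running it manually:
/usr/libexec/mesos/mesos-health-check --help
{noformat}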
[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193835#comment-15193835 ] haosdent commented on MESOS-4869: - No, I mean run {{mesos-health-check}} in shell by hand. Because > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 
GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837 MB / 94.21 kB > {noformat} > Not much else is running on the slave, yet the used memory doesn't map to the > tasks memory: > {noformat} > Mem:16047M used:13340M buffers:1139M cache:776M > {noformat} > If I exec into the container (*java:8* image), I can see correctly the
[jira] [Comment Edited] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193813#comment-15193813 ] Anthony Scalisi edited comment on MESOS-4869 at 3/14/16 6:17 PM: - Sorry [~haosd...@gmail.com] are you asking me to the run the curl by hand ? If so, we already did but doesn't do much to memory issues (the endpoint is just returns "ok"). was (Author: scalp42): Sorry [~haosd...@gmail.com] are you asking me to the run the curl by hand ? If so, we already did but doesn't do much to memory issues (the endpoint is just return "ok"). > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 
kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 GB 8.102 MB / 61.44 kB > f7382f241fce
[jira] [Updated] (MESOS-4879) Update glog patch to support PowerPC LE
[ https://issues.apache.org/jira/browse/MESOS-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4879: --- Summary: Update glog patch to support PowerPC LE (was: Update glog patch to suport PowerPC LE) > Update glog patch to support PowerPC LE > --- > > Key: MESOS-4879 > URL: https://issues.apache.org/jira/browse/MESOS-4879 > Project: Mesos > Issue Type: Improvement >Reporter: Chen Zhiwei >Assignee: Chen Zhiwei > > This is a part of PowerPC LE porting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193813#comment-15193813 ] Anthony Scalisi commented on MESOS-4869: Sorry [~haosd...@gmail.com] are you asking me to the run the curl by hand ? If so, we already did but doesn't do much to memory issues (the endpoint is just return "ok"). > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 
45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837 MB / 94.21 kB > {noformat} > Not much else is running on the slave, yet the used memory doesn't map to the > tasks memory: > {noformat} > Mem:16047M u
[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193805#comment-15193805 ] haosdent commented on MESOS-4869: - How about run this single command? Because health-check and task are located in different processes when running. > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > 
b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837 MB / 94.21 kB > {noformat} > Not much else is running on the slave, yet the used memory doesn't map to the > tasks memory: > {noformat} > Mem:16047M used:13340M buffers:1139M cache:776M > {noformat} > If I exec into the co
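To separate the check command from the health-check helper, the configured curl can also be exercised directly from the agent host against the host port Docker mapped for a given container (port 31926 for container 4f7c0aa8d03a in the {{docker ps}} output above); looping it roughly mimics the configured {{intervalSeconds: 2}}.
{noformat}
# Run the configured health-check command by hand against one task
# (host port taken from the docker ps output above):
curl --silent -f -X GET http://localhost:31926/ops/ping > /dev/null; echo $?

# Repeat every 2 seconds, like intervalSeconds: 2, while watching memory:
while true; do
  curl --silent -f -X GET http://localhost:31926/ops/ping > /dev/null
  sleep 2
done
{noformat}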
[jira] [Updated] (MESOS-3193) Implement AppC image discovery.
[ https://issues.apache.org/jira/browse/MESOS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-3193: -- Assignee: (was: Jojy Varghese) > Implement AppC image discovery. > --- > > Key: MESOS-3193 > URL: https://issues.apache.org/jira/browse/MESOS-3193 > Project: Mesos > Issue Type: Task >Reporter: Yan Xu > Labels: mesosphere, twitter, unified-containerizer-mvp > > Appc spec specifies two image discovery mechanisms: simple and meta > discovery. We need to have an abstraction for image discovery in AppcStore. > For MVP, we can implement the simple discovery first. > https://reviews.apache.org/r/34139/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3427) Support image dependencies in Appc store.
[ https://issues.apache.org/jira/browse/MESOS-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-3427: -- Assignee: Jojy Varghese > Support image dependencies in Appc store. > - > > Key: MESOS-3427 > URL: https://issues.apache.org/jira/browse/MESOS-3427 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Yan Xu >Assignee: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > Fix For: 0.28.0 > > > The current version of Appc store doesn't support image dependencies. We > should implement it and ideally we should parallelize fetching these > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4815) Test private registry with authentication.
[ https://issues.apache.org/jira/browse/MESOS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4815: -- Summary: Test private registry with authentication. (was: Implement private registry test with authentication.) > Test private registry with authentication. > -- > > Key: MESOS-4815 > URL: https://issues.apache.org/jira/browse/MESOS-4815 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song > Labels: containerizer > > Unified containerizer using docker images, with authentication to test > private registry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.
[ https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193794#comment-15193794 ] Jie Yu commented on MESOS-3505: --- [~gyliu] are you working on this ticket? Unassign if you're no longer working on it. > Support specifying Docker image by Image ID. > > > Key: MESOS-3505 > URL: https://issues.apache.org/jira/browse/MESOS-3505 > Project: Mesos > Issue Type: Story >Reporter: Yan Xu >Assignee: Guangya Liu > Labels: mesosphere > > A common way to specify a Docker image with the docker engine is through > {{repo:tag}}, which is convenient and sufficient for most people in most > scenarios. However this combination is neither precise nor immutable. > For this reason, it's possible when an image with a {{repo:tag}} already > cached locally on an agent host and a task requiring this {{repo:tag}} > arrives, it's using an image that's different than the one the user intended. > Docker CLI already supports referring to an image by {{repo@id}}, where the > ID can have two forms: > * v1 Image ID > * digest > Native Mesos provisioner should support the same for Docker images. IMO it's > fine if image discovery by ID is not supported (and thus still requiring > {{repo:tag}} to be specified) (looks like [v2 > registry|http://docs.docker.com/registry/spec/api/] does support it) but the > user can optionally specify an image ID and match it against the cached / > newly pulled image. If the ID doesn't match the cached image, the store can > re-pull it; if the ID doesn't match the newly pulled image (manifest), the > provisioner can fail the request without having the user unknowingly running > its task on the wrong image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
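For comparison, the two reference forms the Docker CLI already accepts look like this; the digest value below is a placeholder, not a real image ID.
{noformat}
# Mutable reference: whatever the tag currently points to.
docker pull ubuntu:14.04

# Immutable reference by content digest (the repo@id form discussed above);
# <digest-of-the-exact-image> is a placeholder:
docker pull ubuntu@sha256:<digest-of-the-exact-image>
{noformat}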
[jira] [Commented] (MESOS-2824) Support pre-fetching images
[ https://issues.apache.org/jira/browse/MESOS-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193780#comment-15193780 ] Jie Yu commented on MESOS-2824: --- [~gyliu] Are you still working on this? Can you unassign it if you're no longer working on that. > Support pre-fetching images > --- > > Key: MESOS-2824 > URL: https://issues.apache.org/jira/browse/MESOS-2824 > Project: Mesos > Issue Type: Improvement > Components: isolation >Affects Versions: 0.23.0 >Reporter: Ian Downes >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, twitter > > Default container images can be specified with the --default_container_info > flag to the slave. This may be a large image that will take a long time to > initially fetch/hash/extract when the first container is provisioned. Add > optional support to start fetching the image when the slave starts and > consider not registering until the fetch is complete. > To extend that, we should support an operator endpoint so that operators can > specify images to pre-fetch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
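As a rough illustration of the flag mentioned in the description, the agent is pointed at a default image via {{--default_container_info}}; the proposal is to start fetching that image when the agent starts rather than when the first container is provisioned. The JSON payload is left as a placeholder here since its exact shape follows the ContainerInfo schema of the running version.
{noformat}
# Sketch only: the JSON value is a placeholder, see the ContainerInfo protobuf.
mesos-slave --master=zk://host:2181/mesos \
  --default_container_info='<JSON ContainerInfo naming the image to pre-fetch>'
{noformat}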
[jira] [Updated] (MESOS-3193) Implement AppC image discovery.
[ https://issues.apache.org/jira/browse/MESOS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-3193: -- Description: Appc spec specifies two image discovery mechanisms: simple and meta discovery. We need to have an abstraction for image discovery in AppcStore. For MVP, we can implement the simple discovery first. Update: simple discovery is removed from the spec. Meta discovery is the only discovery mechanism right now in the spec. Simple discovery is already shipped (we support an arbitrary operator specified [URI prefix|https://github.com/apache/mesos/blob/master/docs/container-image.md#appc-support-and-current-limitations]). So this ticket should focus on implementing Meta discovery. was: Appc spec specifies two image discovery mechanisms: simple and meta discovery. We need to have an abstraction for image discovery in AppcStore. For MVP, we can implement the simple discovery first. https://reviews.apache.org/r/34139/ > Implement AppC image discovery. > --- > > Key: MESOS-3193 > URL: https://issues.apache.org/jira/browse/MESOS-3193 > Project: Mesos > Issue Type: Task >Reporter: Yan Xu > Labels: mesosphere, twitter, unified-containerizer-mvp > > Appc spec specifies two image discovery mechanisms: simple and meta > discovery. We need to have an abstraction for image discovery in AppcStore. > For MVP, we can implement the simple discovery first. > Update: simple discovery is removed from the spec. Meta discovery is the only > discovery mechanism right now in the spec. Simple discovery is already > shipped (we support an arbitrary operator specified [URI > prefix|https://github.com/apache/mesos/blob/master/docs/container-image.md#appc-support-and-current-limitations]). > So this ticket should focus on implementing Meta discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
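As a rough sketch of what meta discovery involves (per my reading of the Appc spec; the domain and URL template below are made up), the store fetches the image name over HTTPS with {{ac-discovery=1}} and reads the download template out of a {{meta}} tag:
{noformat}
# Sketch of Appc meta discovery (example.com and the template are illustrative):
curl -s 'https://example.com/worker?ac-discovery=1' | grep 'ac-discovery'
# Expected to return something like:
#   <meta name="ac-discovery"
#         content="example.com https://storage.example.com/{name}-{version}-{os}-{arch}.{ext}">
# The store then substitutes name/version/os/arch into the template and
# fetches the resulting ACI URL.
{noformat}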
[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193789#comment-15193789 ] Anthony Scalisi commented on MESOS-4869: I'd like to add that it's really due to the mesos-health-check process (slaves are fine when Marathon doing the health checks). > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 45.64% > 1.451 GB / 1.399 GB 573.4 kB / 
94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837 MB / 94.21 kB > {noformat} > Not much else is running on the slave, yet the used memory doesn't map to the > tasks memory: > {noformat} > Mem:16047M used:13340M buffers:1139M cache:776M > {nofor
[jira] [Comment Edited] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193774#comment-15193774 ] Anthony Scalisi edited comment on MESOS-4869 at 3/14/16 6:06 PM: - What do you mean ? Without having Mesos doing the health checks, on a host with 6 tasks for example: {noformat} scalp@mesos-slave-i-d00b6017 $ free -m total used free sharedbuffers cached Mem: 16047 15306740 0 3174 2547 -/+ buffers/cache: 9583 6463 Swap:0 0 0 root@mesos-slave-i-d00b6017 # docker stats --no-stream CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O 33cb349404e13.23% 897.8 MB / 1.611 GB 55.74% 4.859 GB / 4.625 GB 53.25 kB / 61.44 kB 61eba49cf71d3.22% 1.166 GB / 1.611 GB 72.41% 5.49 GB / 5.155 GB106.5 kB / 118.8 kB 630739e120323.76% 1.163 GB / 1.611 GB 72.22% 3.891 GB / 3.657 GB 348.2 kB / 118.8 kB b5b9da9facfb2.84% 901.9 MB / 1.611 GB 55.99% 2.254 GB / 2.153 GB 0 B / 118.8 kB dcd2a73f71a93.55% 1.29 GB / 1.611 GB80.10% 2.726 GB / 2.672 GB 0 B / 118.8 kB de923d88a7813.17% 889.5 MB / 1.611 GB 55.23% 3.817 GB / 3.645 GB 36.86 kB / 61.44 kB {noformat} Or another with 11 tasks: {noformat} root@mesos-slave-i-0fe036d7 # free -m total used free sharedbuffers cached Mem: 16047 15189857 0 1347688 -/+ buffers/cache: 13153 2893 Swap:0 root@mesos-slave-i-0fe036d7 # docker stats --no-stream CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O 1527ccec35620.39% 46.75 MB / 134.2 MB 34.83% 318.5 MB / 283.5 MB 634.9 kB / 0 B 16c0afe372f13.12% 1.139 GB / 1.611 GB 70.69% 5.443 GB / 5.139 GB 1.757 MB / 118.8 kB 2aaac6a34f3b3.50% 1.34 GB / 1.611 GB83.18% 9.928 GB / 9.006 GB 2.646 MB / 118.8 kB 4bda58242e662.57% 875.5 MB / 1.611 GB 54.36% 4.853 GB / 4.632 GB 135.2 kB / 61.44 kB 67ed575e6f442.14% 1.171 GB / 1.611 GB 72.73% 3.878 GB / 3.664 GB 4.739 MB / 118.8 kB 87010c4fa5474.23% 1.208 GB / 1.611 GB 74.99% 313.5 MB / 419.1 MB 213 kB / 94.21 kB 8ca7c160b1961.73% 730.4 MB / 1.611 GB 45.35% 305.6 MB / 447.7 MB 0 B / 61.44 kB cbac44b2663c4.66% 1.088 GB / 1.611 GB 67.53% 16.48 GB / 14.91 GB 262.1 kB / 61.44 kB d0fe165aecac3.02% 901.2 MB / 1.611 GB 55.95% 1.573 GB / 1.555 GB 106.5 kB / 61.44 kB df668f59a1493.57% 1.143 GB / 1.611 GB 70.98% 2.732 GB / 2.681 GB 1.888 MB / 118.8 kB e0fc97fa33cf3.43% 1.034 GB / 1.611 GB 64.21% 3.823 GB / 3.655 GB 2.433 MB / 61.44 kB {noformat} If you were referring to the actual Mesos processes: {noformat} root@mesos-slave-i-0fe036d7 # ps awwuxf | egrep "mesos-docker|mesos-slave" | egrep -v "grep|node" root 27470 0.3 0.3 962568 51020 ?Ssl Mar11 14:46 /usr/sbin/mesos-slave --master=zk://10.92.21.247:2181,10.92.31.170:2181,10.92.41.178:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --docker_stop_timeout=30secs --executor_registration_timeout=5mins --executor_shutdown_grace_period=90secs --gc_delay=1weeks --hostname=mesos-slave-i-0fe036d7.example.com --ip=10.92.22.241 --isolation=cgroups/cpu,cgroups/mem --logbufsecs=1 --recover=reconnect --strict=false --work_dir=/opt/mesos --attributes=az:us-west-2a --resources=cpus:4;mem:16047;ports:[31000-32000] root 27511 0.0 0.0 5916 596 ?SMar11 0:00 \_ logger -p user.info -t mesos-slave[27470] root 27512 0.0 0.0 5916 1884 ?SMar11 0:00 \_ logger -p user.err -t mesos-slave[27470] root 28907 0.1 0.0 802068 5360 ?Ssl Mar11 7:02 \_ mesos-docker-executor --container=mesos-29e183be-f611-41b4-824c-2d05b052231b-S3.f552977a-040c-41a2-bb60-0e441c6491ef --docker=docker --docker_socket=/var/run/docker.sock --help=false --launcher_dir=/usr/libexec/mesos --mapped_directory=/mnt/mesos/sandbox 
--sandbox_directory=/opt/mesos/slaves/29e183be-f611-41b4-824c-2d05b052231b-S3/frameworks/8ace1cd7-5a79-40f6-99cd-62c87ce2ef49-0001/executors/prod_talkk_metric-green.cac70614-e7d1-11e5-a617-02429957d388/runs/f552977a-040c-41a2-bb60-0e441c6491
[jira] [Comment Edited] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193774#comment-15193774 ] Anthony Scalisi edited comment on MESOS-4869 at 3/14/16 6:05 PM: - What do you mean ? Without having Mesos doing the health checks, on a host with 6 tasks for example: {noformat} scalp@mesos-slave-i-d00b6017 $ free -m total used free sharedbuffers cached Mem: 16047 15306740 0 3174 2547 -/+ buffers/cache: 9583 6463 Swap:0 0 0 root@mesos-slave-i-d00b6017 # docker stats --no-stream CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O 33cb349404e13.23% 897.8 MB / 1.611 GB 55.74% 4.859 GB / 4.625 GB 53.25 kB / 61.44 kB 61eba49cf71d3.22% 1.166 GB / 1.611 GB 72.41% 5.49 GB / 5.155 GB106.5 kB / 118.8 kB 630739e120323.76% 1.163 GB / 1.611 GB 72.22% 3.891 GB / 3.657 GB 348.2 kB / 118.8 kB b5b9da9facfb2.84% 901.9 MB / 1.611 GB 55.99% 2.254 GB / 2.153 GB 0 B / 118.8 kB dcd2a73f71a93.55% 1.29 GB / 1.611 GB80.10% 2.726 GB / 2.672 GB 0 B / 118.8 kB de923d88a7813.17% 889.5 MB / 1.611 GB 55.23% 3.817 GB / 3.645 GB 36.86 kB / 61.44 kB {noformat} Or another with 11 tasks: {noformat} root@mesos-slave-i-0fe036d7 # free -m total used free sharedbuffers cached Mem: 16047 15189857 0 1347688 -/+ buffers/cache: 13153 2893 Swap:0 root@mesos-slave-i-0fe036d7 # docker stats --no-stream CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O 1527ccec35620.39% 46.75 MB / 134.2 MB 34.83% 318.5 MB / 283.5 MB 634.9 kB / 0 B 16c0afe372f13.12% 1.139 GB / 1.611 GB 70.69% 5.443 GB / 5.139 GB 1.757 MB / 118.8 kB 2aaac6a34f3b3.50% 1.34 GB / 1.611 GB83.18% 9.928 GB / 9.006 GB 2.646 MB / 118.8 kB 4bda58242e662.57% 875.5 MB / 1.611 GB 54.36% 4.853 GB / 4.632 GB 135.2 kB / 61.44 kB 67ed575e6f442.14% 1.171 GB / 1.611 GB 72.73% 3.878 GB / 3.664 GB 4.739 MB / 118.8 kB 87010c4fa5474.23% 1.208 GB / 1.611 GB 74.99% 313.5 MB / 419.1 MB 213 kB / 94.21 kB 8ca7c160b1961.73% 730.4 MB / 1.611 GB 45.35% 305.6 MB / 447.7 MB 0 B / 61.44 kB cbac44b2663c4.66% 1.088 GB / 1.611 GB 67.53% 16.48 GB / 14.91 GB 262.1 kB / 61.44 kB d0fe165aecac3.02% 901.2 MB / 1.611 GB 55.95% 1.573 GB / 1.555 GB 106.5 kB / 61.44 kB df668f59a1493.57% 1.143 GB / 1.611 GB 70.98% 2.732 GB / 2.681 GB 1.888 MB / 118.8 kB e0fc97fa33cf3.43% 1.034 GB / 1.611 GB 64.21% 3.823 GB / 3.655 GB 2.433 MB / 61.44 kB {noformat} If you were referring to the actual Mesos processes: {noformat} root@mesos-slave-i-0fe036d7 # ps awwuxf | egrep "mesos-docker|mesos-slave" | egrep -v "grep|node" root 27470 0.3 0.3 962568 51020 ?Ssl Mar11 14:46 /usr/sbin/mesos-slave --master=zk://10.92.21.247:2181,10.92.31.170:2181,10.92.41.178:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --docker_stop_timeout=30secs --executor_registration_timeout=5mins --executor_shutdown_grace_period=90secs --gc_delay=1weeks --hostname=mesos-slave-i-0fe036d7.gz-prod.us-west-2a.gearzero.us --ip=10.92.22.241 --isolation=cgroups/cpu,cgroups/mem --logbufsecs=1 --recover=reconnect --strict=false --work_dir=/opt/mesos --attributes=az:us-west-2a --resources=cpus:4;mem:16047;ports:[31000-32000] root 27511 0.0 0.0 5916 596 ?SMar11 0:00 \_ logger -p user.info -t mesos-slave[27470] root 27512 0.0 0.0 5916 1884 ?SMar11 0:00 \_ logger -p user.err -t mesos-slave[27470] root 28907 0.1 0.0 802068 5360 ?Ssl Mar11 7:02 \_ mesos-docker-executor --container=mesos-29e183be-f611-41b4-824c-2d05b052231b-S3.f552977a-040c-41a2-bb60-0e441c6491ef --docker=docker --docker_socket=/var/run/docker.sock --help=false --launcher_dir=/usr/libexec/mesos 
--mapped_directory=/mnt/mesos/sandbox --sandbox_directory=/opt/mesos/slaves/29e183be-f611-41b4-824c-2d05b052231b-S3/frameworks/8ace1cd7-5a79-40f6-99cd-62c87ce2ef49-0001/executors/prod_talkk_metric-green.cac70614-e7d1-11e5-a617-02429957d388/runs/f552977a-040c-4
[jira] [Comment Edited] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193762#comment-15193762 ] Anthony Scalisi edited comment on MESOS-4869 at 3/14/16 6:06 PM: - We could see the increase immediately after launching a task. was (Author: scalp42): We could get the increase immediately after launching a task. > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 
45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 9.344 GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837 MB / 94.21 kB > {noformat} > Not much else is running on the slave, yet the used memory doesn't map to the > tasks memory: >
[jira] [Updated] (MESOS-4814) Test private registry with ssl enabled/disabled.
[ https://issues.apache.org/jira/browse/MESOS-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4814: -- Assignee: (was: Gilbert Song) > Test private registry with ssl enabled/disabled. > > > Key: MESOS-4814 > URL: https://issues.apache.org/jira/browse/MESOS-4814 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song > Labels: containerizer > > Test unified containerizer using docker images, have ssl enabled to test the > private registry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4815) Implement private registry test with authentication.
[ https://issues.apache.org/jira/browse/MESOS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4815: -- Assignee: (was: Gilbert Song) > Implement private registry test with authentication. > > > Key: MESOS-4815 > URL: https://issues.apache.org/jira/browse/MESOS-4815 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song > Labels: containerizer > > Unified containerizer using docker images, with authentication to test > private registry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193774#comment-15193774 ] Anthony Scalisi commented on MESOS-4869: What do you mean ? Without having Mesos doing the health checks, on a host with 6 tasks for example: {noformat} scalp@mesos-slave-i-d00b6017 $ free -m total used free sharedbuffers cached Mem: 16047 15306740 0 3174 2547 -/+ buffers/cache: 9583 6463 Swap:0 0 0 root@mesos-slave-i-d00b6017 # docker stats --no-stream CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O 33cb349404e13.23% 897.8 MB / 1.611 GB 55.74% 4.859 GB / 4.625 GB 53.25 kB / 61.44 kB 61eba49cf71d3.22% 1.166 GB / 1.611 GB 72.41% 5.49 GB / 5.155 GB106.5 kB / 118.8 kB 630739e120323.76% 1.163 GB / 1.611 GB 72.22% 3.891 GB / 3.657 GB 348.2 kB / 118.8 kB b5b9da9facfb2.84% 901.9 MB / 1.611 GB 55.99% 2.254 GB / 2.153 GB 0 B / 118.8 kB dcd2a73f71a93.55% 1.29 GB / 1.611 GB80.10% 2.726 GB / 2.672 GB 0 B / 118.8 kB de923d88a7813.17% 889.5 MB / 1.611 GB 55.23% 3.817 GB / 3.645 GB 36.86 kB / 61.44 kB {noformat} Or another with 11 tasks: {noformat} root@mesos-slave-i-0fe036d7 # free -m total used free sharedbuffers cached Mem: 16047 15189857 0 1347688 -/+ buffers/cache: 13153 2893 Swap:0 root@mesos-slave-i-0fe036d7 # docker stats --no-stream CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O 1527ccec35620.39% 46.75 MB / 134.2 MB 34.83% 318.5 MB / 283.5 MB 634.9 kB / 0 B 16c0afe372f13.12% 1.139 GB / 1.611 GB 70.69% 5.443 GB / 5.139 GB 1.757 MB / 118.8 kB 2aaac6a34f3b3.50% 1.34 GB / 1.611 GB83.18% 9.928 GB / 9.006 GB 2.646 MB / 118.8 kB 4bda58242e662.57% 875.5 MB / 1.611 GB 54.36% 4.853 GB / 4.632 GB 135.2 kB / 61.44 kB 67ed575e6f442.14% 1.171 GB / 1.611 GB 72.73% 3.878 GB / 3.664 GB 4.739 MB / 118.8 kB 87010c4fa5474.23% 1.208 GB / 1.611 GB 74.99% 313.5 MB / 419.1 MB 213 kB / 94.21 kB 8ca7c160b1961.73% 730.4 MB / 1.611 GB 45.35% 305.6 MB / 447.7 MB 0 B / 61.44 kB cbac44b2663c4.66% 1.088 GB / 1.611 GB 67.53% 16.48 GB / 14.91 GB 262.1 kB / 61.44 kB d0fe165aecac3.02% 901.2 MB / 1.611 GB 55.95% 1.573 GB / 1.555 GB 106.5 kB / 61.44 kB df668f59a1493.57% 1.143 GB / 1.611 GB 70.98% 2.732 GB / 2.681 GB 1.888 MB / 118.8 kB e0fc97fa33cf3.43% 1.034 GB / 1.611 GB 64.21% 3.823 GB / 3.655 GB 2.433 MB / 61.44 kB {noformat} > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! 
> Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8
[jira] [Commented] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193762#comment-15193762 ] Anthony Scalisi commented on MESOS-4869: We could get the increase immediately after launching a task. > /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory > --- > > Key: MESOS-4869 > URL: https://issues.apache.org/jira/browse/MESOS-4869 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.1 >Reporter: Anthony Scalisi >Priority: Critical > > We switched our health checks in Marathon from HTTP to COMMAND: > {noformat} > "healthChecks": [ > { > "protocol": "COMMAND", > "path": "/ops/ping", > "command": { "value": "curl --silent -f -X GET > http://$HOST:$PORT0/ops/ping > /dev/null" }, > "gracePeriodSeconds": 90, > "intervalSeconds": 2, > "portIndex": 0, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 3 > } > ] > {noformat} > All our applications have the same health check (and /ops/ping endpoint). > Even though we have the issue on all our Meos slaves, I'm going to focus on a > particular one: *mesos-slave-i-e3a9c724*. > The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks: > !https://i.imgur.com/gbRf804.png! > Here is a *docker ps* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker ps > CONTAINER IDIMAGE COMMAND CREATED >STATUS PORTS NAMES > 4f7c0aa8d03ajava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31926->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d > 66f2fc8f8056java:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31939->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a > f7382f241fcejava:8 "/bin/sh -c 'JAVA_OPT" 6 hours ago >Up 6 hours 0.0.0.0:31656->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d > 880934c0049ejava:8 "/bin/sh -c 'JAVA_OPT" 24 hours ago >Up 24 hours 0.0.0.0:31371->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 > 5eab1f8dac4ajava:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31500->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7 > b63740fe56e7java:8 "/bin/sh -c 'JAVA_OPT" 46 hours ago >Up 46 hours 0.0.0.0:31382->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe > 5c7a9ea77b0ejava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31186->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 > 53065e7a31adjava:8 "/bin/sh -c 'JAVA_OPT" 2 days ago >Up 2 days 0.0.0.0:31839->8080/tcp > mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c > {noformat} > Here is a *docker stats* on it: > {noformat} > root@mesos-slave-i-e3a9c724 # docker stats > CONTAINER CPU % MEM USAGE / LIMIT MEM % > NET I/O BLOCK I/O > 4f7c0aa8d03a2.93% 797.3 MB / 1.611 GB 49.50% > 1.277 GB / 1.189 GB 155.6 kB / 151.6 kB > 53065e7a31ad8.30% 738.9 MB / 1.611 GB 45.88% > 419.6 MB / 554.3 MB 98.3 kB / 61.44 kB > 5c7a9ea77b0e4.91% 1.081 GB / 1.611 GB 67.10% > 423 MB / 526.5 MB 3.219 MB / 61.44 kB > 5eab1f8dac4a3.13% 1.007 GB / 1.611 GB 62.53% > 2.737 GB / 2.564 GB 6.566 MB / 118.8 kB > 66f2fc8f80563.15% 768.1 MB / 1.611 GB 47.69% > 258.5 MB / 252.8 MB 1.86 MB / 151.6 kB > 880934c0049e10.07% 735.1 MB / 1.611 GB 45.64% > 1.451 GB / 1.399 GB 573.4 kB / 94.21 kB > b63740fe56e712.04% 629 MB / 1.611 GB 39.06% > 10.29 GB / 
9.344 GB 8.102 MB / 61.44 kB > f7382f241fce6.21% 505 MB / 1.611 GB 31.36% > 153.4 MB / 151.9 MB 5.837 MB / 94.21 kB > {noformat} > Not much else is running on the slave, yet the used memory doesn't map to the > tasks memory: > {noformat} > Mem:16047M used:13340M buffers:1139M cache:776M > {noformat} > If I exec into the container (*java:8* image), I can see cor
[jira] [Updated] (MESOS-4819) Add documentation for Appc image discovery.
[ https://issues.apache.org/jira/browse/MESOS-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4819: -- Assignee: (was: Jojy Varghese) > Add documentation for Appc image discovery. > --- > > Key: MESOS-4819 > URL: https://issues.apache.org/jira/browse/MESOS-4819 > Project: Mesos > Issue Type: Documentation > Components: containerization >Reporter: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > > Add documentation for the Appc image discovery feature that covers: > - Use case > - Implementation detail (Simple discovery). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4678) Upgrade vendored Protobuf to 2.6.1
[ https://issues.apache.org/jira/browse/MESOS-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4678: -- Sprint: Mesosphere Sprint 31 > Upgrade vendored Protobuf to 2.6.1 > -- > > Key: MESOS-4678 > URL: https://issues.apache.org/jira/browse/MESOS-4678 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Neil Conway >Assignee: Chen Zhiwei > Labels: 3rdParty, mesosphere, protobuf, tech-debt > > We currently vendor Protobuf 2.5.0. We should upgrade to Protobuf 2.6.1. This > introduces various bugfixes, performance improvements, and at least one new > feature we might want to eventually take advantage of ({{map}} data type). > AFAIK there should be no backward compatibility concerns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4803) Update vendored libev to 4.22
[ https://issues.apache.org/jira/browse/MESOS-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4803: -- Sprint: Mesosphere Sprint 31 > Update vendored libev to 4.22 > - > > Key: MESOS-4803 > URL: https://issues.apache.org/jira/browse/MESOS-4803 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Chen Zhiwei > > The motivation is that libev 4.22 has officially supported IBM Power > (ppc64le), so this is needed by > [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4802) Update leveldb patch file to support PowerPC LE
[ https://issues.apache.org/jira/browse/MESOS-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4802: -- Sprint: Mesosphere Sprint 31 > Update leveldb patch file to support PowerPC LE > -- > > Key: MESOS-4802 > URL: https://issues.apache.org/jira/browse/MESOS-4802 > Project: Mesos > Issue Type: Improvement >Reporter: Qian Zhang >Assignee: Chen Zhiwei > > See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements / > bug fixes. > The motivation is that leveldb 1.18 officially supports IBM Power > (ppc64le), so this is needed by > [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312]. > Update: Since someone already updated leveldb to 1.4, I will only update the patch file > to support PowerPC LE, because I don't think upgrading the 3rdparty library > frequently is a good thing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4886) Support mesos containerizer force_pull_image option.
[ https://issues.apache.org/jira/browse/MESOS-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4886: -- Assignee: (was: Gilbert Song) > Support mesos containerizer force_pull_image option. > > > Key: MESOS-4886 > URL: https://issues.apache.org/jira/browse/MESOS-4886 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Gilbert Song > Labels: containerizer > > Currently for the unified containerizer, images that are already cached by > the metadata manager cannot be updated. The user has to delete the corresponding images > in the store if an update is needed. We should support a `force_pull_image` option > for the unified containerizer, to provide an override when an image already exists in the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
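For context, the Docker containerizer already exposes a comparable knob on {{DockerInfo}}; a sketch of that existing usage is below (field names as I recall them from {{mesos.proto}}, values illustrative), and the ask here is an equivalent behavior for images cached by the unified containerizer's store.
{noformat}
// Existing Docker containerizer analog (sketch, not the proposed API):
"container": {
  "type": "DOCKER",
  "docker": {
    "image": "ubuntu:14.04",
    "force_pull_image": true
  }
}
{noformat}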
[jira] [Updated] (MESOS-4902) Add authentication to remaining agent endpoints
[ https://issues.apache.org/jira/browse/MESOS-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-4902: -- Shepherd: Adam B > Add authentication to remaining agent endpoints > --- > > Key: MESOS-4902 > URL: https://issues.apache.org/jira/browse/MESOS-4902 > Project: Mesos > Issue Type: Improvement > Components: HTTP API >Reporter: Greg Mann > Labels: authentication, http, mesosphere, security > > In addition to the endpoints addressed by MESOS-4850, the following endpoints > would also benefit from HTTP authentication: > * {{/files/*}} > * {{/profiler/*}} > * {{/logging/toggle}} > * {{/metrics/snapshot}} > * {{/monitor/statistics}} > * {{/system/stats.json}} > Adding HTTP authentication to these endpoints is a bit more complicated: some > endpoints are defined at the libprocess level, while others are defined in > code that is shared by the master and agent. > While working on MESOS-4850, it became apparent that since our tests use the > same instance of libprocess for both master and agent, different default > authentication realms must be used for master/agent so that HTTP > authentication can be independently enabled/disabled for each. > We should establish a mechanism for making an endpoint authenticated that > allows us to: > 1) Install an endpoint like {{/files}}, whose code is shared by the master > and agent, with different authentication realms for the master and agent > 2) Avoid hard-coding a default authentication realm into libprocess, to > permit the use of different authentication realms for the master and agent > and to keep application-level concerns from leaking into libprocess > Another option would be to use a single default authentication realm and > always enable or disable HTTP authentication for *both* the master and agent > in tests. However, this wouldn't allow us to test scenarios where HTTP > authentication is enabled on one but disabled on the other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
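As a quick illustration of what "authenticated" would mean for these endpoints, assuming the same basic HTTP authentication used for the endpoints covered by MESOS-4850 (host, principal, and secret below are placeholders):
{noformat}
# Without credentials the endpoint should be rejected once auth is enabled:
curl -i http://agent.example.com:5051/metrics/snapshot
# expect 401 Unauthorized

# With credentials matching the agent's configured realm:
curl -i -u some-principal:some-secret http://agent.example.com:5051/metrics/snapshot
# expect 200 OK
{noformat}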
[jira] [Created] (MESOS-4939) Support specifying per-container docker registry.
Jie Yu created MESOS-4939: - Summary: Support specifying per-container docker registry. Key: MESOS-4939 URL: https://issues.apache.org/jira/browse/MESOS-4939 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Gilbert Song Currently, we only support a per-agent flag to specify the docker registry. We should instead allow people to specify the registry as part of the docker image name (like `docker pull` does). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4939) Support specifying per-container docker registry.
[ https://issues.apache.org/jira/browse/MESOS-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4939: -- Story Points: 3 > Support specifying per-container docker registry. > - > > Key: MESOS-4939 > URL: https://issues.apache.org/jira/browse/MESOS-4939 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Gilbert Song > > Currently, we only support a per-agent flag to specify the docker registry. > We should instead allow people to specify the registry as part of the docker > image name (like `docker pull` does). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
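A before/after sketch of what this would mean for users; the registry host and image names are placeholders, and the per-agent flag name ({{--docker_registry}}) is given as I recall it, so verify against the agent's flag help.
{noformat}
# Today: the registry is fixed per agent via a flag, e.g.
mesos-slave --docker_registry=https://registry.example.com:5000 ...

# Proposed: encode the registry in the image reference itself, as docker does:
docker pull registry.example.com:5000/myteam/myimage:1.2.3
{noformat}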
[jira] [Commented] (MESOS-4879) Update glog patch to support PowerPC LE
[ https://issues.apache.org/jira/browse/MESOS-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193699#comment-15193699 ] Abhishek Dasgupta commented on MESOS-4879: -- Hi, I think we should update upstream glog to build on Power and then use that version of glog in Mesos. That way, it will also be easier to maintain in the future. I am currently working on upstreaming the glog patch for PowerPC support in the glog project. > Update glog patch to support PowerPC LE > -- > > Key: MESOS-4879 > URL: https://issues.apache.org/jira/browse/MESOS-4879 > Project: Mesos > Issue Type: Improvement >Reporter: Chen Zhiwei >Assignee: Chen Zhiwei > > This is part of the PowerPC LE porting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")
[ https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4877: -- Story Points: 3 > Mesos containerizer can't handle top level docker image like "alpine" (must > use "library/alpine") > - > > Key: MESOS-4877 > URL: https://issues.apache.org/jira/browse/MESOS-4877 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.27.0, 0.27.1 >Reporter: Shuai Lin >Assignee: Shuai Lin > > This can be demonstrated with the {{mesos-execute}} command: > # Docker containerizer with image {{alpine}}: success > {code} > sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=docker > --name=just-a-test --command="sleep 1000" --master=localhost:5050 > {code} > # Mesos containerizer with image {{alpine}}: failure > {code} > sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=mesos > --name=just-a-test --command="sleep 1000" --master=localhost:5050 > {code} > # Mesos containerizer with image {{library/alpine}}: success > {code} > sudo ./build/src/mesos-execute --docker_image=library/alpine > --containerizer=mesos --name=just-a-test --command="sleep 1000" > --master=localhost:5050 > {code} > In the slave logs: > {code} > ea-4460-83 > 9c-838da86af34c-0007' > I0306 16:32:41.418269 3403 metadata_manager.cpp:159] Looking for image > 'alpine:latest' > I0306 16:32:41.418699 3403 registry_puller.cpp:194] Pulling image > 'alpine:latest' from > 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to > '/tmp/mesos-test > /store/docker/staging/ka7MlQ' > E0306 16:32:43.098131 3400 slave.cpp:3773] Container > '4bf9132d-9a57-4baa-a78c-e7164e93ace6' for executor 'just-a-test' of > framework 4f055c6f-1bea-4460-839c-838da86af34c-0 > 007 failed to start: Collect failed: Unexpected HTTP response '401 > Unauthorized > {code} > curl command executed: > {code} > $ sudo sysdig -A -p "*%evt.time %proc.cmdline" evt.type=execve and > proc.name=curl >16:42:53.198998042 curl -s -S -L -D - > https://registry-1.docker.io:443/v2/alpine/manifests/latest > 16:42:53.784958541 curl -s -S -L -D - > https://auth.docker.io/token?service=registry.docker.io&scope=repository:alpine:pull > 16:42:54.294192024 curl -s -S -L -D - -H Authorization: Bearer > 
eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCIsIng1YyI6WyJNSUlDTHpDQ0FkU2dBd0lCQWdJQkFEQUtCZ2dxaGtqT1BRUURBakJHTVVRd1FnWURWUVFERXp0Uk5Gb3pPa2RYTjBrNldGUlFSRHBJVFRSUk9rOVVWRmc2TmtGRlF6cFNUVE5ET2tGU01rTTZUMFkzTnpwQ1ZrVkJPa2xHUlVrNlExazFTekFlRncweE5UQTJNalV4T1RVMU5EWmFGdzB4TmpBMk1qUXhPVFUxTkRaYU1FWXhSREJDQmdOVkJBTVRPMGhHU1UwNldGZFZWam8yUVZkSU9sWlpUVEk2TTFnMVREcFNWREkxT2s5VFNrbzZTMVExUmpwWVRsSklPbFJMTmtnNlMxUkxOanBCUVV0VU1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXl2UzIvdEI3T3JlMkVxcGRDeFdtS1NqV1N2VmJ2TWUrWGVFTUNVMDByQjI0akNiUVhreFdmOSs0MUxQMlZNQ29BK0RMRkIwVjBGZGdwajlOWU5rL2pxT0JzakNCcnpBT0JnTlZIUThCQWY4RUJBTUNBSUF3RHdZRFZSMGxCQWd3QmdZRVZSMGxBREJFQmdOVkhRNEVQUVE3U0VaSlRUcFlWMVZXT2paQlYwZzZWbGxOTWpveldEVk1PbEpVTWpVNlQxTktTanBMVkRWR09saE9Va2c2VkVzMlNEcExWRXMyT2tGQlMxUXdSZ1lEVlIwakJEOHdQWUE3VVRSYU16cEhWemRKT2xoVVVFUTZTRTAwVVRwUFZGUllPalpCUlVNNlVrMHpRenBCVWpKRE9rOUdOemM2UWxaRlFUcEpSa1ZKT2tOWk5Vc3dDZ1lJS29aSXpqMEVBd0lEU1FBd1JnSWhBTXZiT2h4cHhrTktqSDRhMFBNS0lFdXRmTjZtRDFvMWs4ZEJOVGxuWVFudkFpRUF0YVJGSGJSR2o4ZlVSSzZ4UVJHRURvQm1ZZ3dZelR3Z3BMaGJBZzNOUmFvPSJdfQ.eyJhY2Nlc3MiOltdLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuaW8iLCJleHAiOjE0NTcyODI4NzQsImlhdCI6MTQ1NzI4MjU3NCwiaXNzIjoiYXV0aC5kb2NrZXIuaW8iLCJqdGkiOiJaOGtyNXZXNEJMWkNIRS1IcVJIaCIsIm5iZiI6MTQ1NzI4MjU3NCwic3ViIjoiIn0.C2wtJq_P-m0buPARhmQjDfh6ztIAhcvgN3tfWIZEClSgXlVQ_sAQXAALNZKwAQL2Chj7NpHX--0GW-aeL_28Aw > https://registry-1.docker.io:443/v2/alpine/manifests/latest > {code} > Also got the same result with {{ubuntu}} docker image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
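The 401 in the log above comes down to the repository name used against Docker Hub, which stores official images under the 'library/' namespace. A minimal standalone C++ sketch of the normalization a registry puller needs (not the actual Mesos fix):

{code}
// Standalone sketch (not Mesos code): Docker Hub keeps "official" images in
// the "library/" namespace, so a top-level name like "alpine" has to be
// expanded to "library/alpine" before building the manifest URL.
#include <iostream>
#include <string>

std::string normalizeDockerHubRepository(const std::string& repository)
{
  // Only single-component names (no '/') need the implicit namespace.
  if (repository.find('/') == std::string::npos) {
    return "library/" + repository;
  }
  return repository;
}

int main()
{
  // .../v2/library/alpine/manifests/latest works, while
  // .../v2/alpine/manifests/latest returns the '401 Unauthorized' seen above.
  std::cout << normalizeDockerHubRepository("alpine") << std::endl;         // library/alpine
  std::cout << normalizeDockerHubRepository("library/alpine") << std::endl; // library/alpine
  return 0;
}
{code}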
[jira] [Updated] (MESOS-4938) Support docker registry authentication
[ https://issues.apache.org/jira/browse/MESOS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4938: -- Labels: mesosphere (was: ) > Support docker registry authentication > -- > > Key: MESOS-4938 > URL: https://issues.apache.org/jira/browse/MESOS-4938 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Gilbert Song > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4938) Support docker registry authentication
[ https://issues.apache.org/jira/browse/MESOS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4938: -- Assignee: Gilbert Song Sprint: Mesosphere Sprint 31 We first need to implement the authentication (HTTP Basic) for getting the auth token. Then, we need to figure out how to pass the credentials. > Support docker registry authentication > -- > > Key: MESOS-4938 > URL: https://issues.apache.org/jira/browse/MESOS-4938 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Gilbert Song > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
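For context, a minimal standalone libcurl sketch of the two-step token flow described in the comment above: fetch the auth token with HTTP Basic credentials, then present it as a Bearer token to the registry. The credentials are placeholders, JSON parsing and error handling are omitted, and this is not the Mesos registry puller.

{code}
// Standalone sketch (not Mesos code) of the Docker registry v2 token flow:
// 1. GET the token from the auth server using HTTP Basic credentials.
// 2. Use the token as a Bearer token when fetching from the registry.
// Build with: g++ token_flow.cpp -lcurl   (file name is hypothetical)
#include <curl/curl.h>

#include <iostream>
#include <string>

static size_t append(char* data, size_t size, size_t nmemb, void* userdata)
{
  static_cast<std::string*>(userdata)->append(data, size * nmemb);
  return size * nmemb;
}

std::string get(
    const std::string& url,
    const std::string& userpwd,      // "" for no basic auth.
    const std::string& bearerToken)  // "" for no bearer token.
{
  std::string body;
  CURL* curl = curl_easy_init();
  if (curl == nullptr) {
    return body;
  }

  struct curl_slist* headers = nullptr;

  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

  if (!userpwd.empty()) {
    curl_easy_setopt(curl, CURLOPT_USERPWD, userpwd.c_str()); // HTTP Basic.
  }

  if (!bearerToken.empty()) {
    headers = curl_slist_append(
        headers, ("Authorization: Bearer " + bearerToken).c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  }

  curl_easy_perform(curl);
  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  return body;
}

int main()
{
  // 1. Get a pull token for the repository (placeholder credentials).
  std::string tokenResponse = get(
      "https://auth.docker.io/token"
      "?service=registry.docker.io&scope=repository:library/alpine:pull",
      "username:password",
      "");

  // 2. Extract the "token" field from the JSON response (parsing omitted).
  std::string token = "...";

  // 3. Use the token when fetching the manifest from the registry.
  std::string manifest = get(
      "https://registry-1.docker.io/v2/library/alpine/manifests/latest",
      "",
      token);

  std::cout << manifest << std::endl;
  return 0;
}
{code}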
[jira] [Updated] (MESOS-4938) Support docker registry authentication
[ https://issues.apache.org/jira/browse/MESOS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4938: -- Story Points: 5 > Support docker registry authentication > -- > > Key: MESOS-4938 > URL: https://issues.apache.org/jira/browse/MESOS-4938 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Gilbert Song > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4938) Support docker registry authentication
Jie Yu created MESOS-4938: - Summary: Support docker registry authentication Key: MESOS-4938 URL: https://issues.apache.org/jira/browse/MESOS-4938 Project: Mesos Issue Type: Task Reporter: Jie Yu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4877) Mesos containerizer can't handle top level docker image like "alpine" (must use "library/alpine")
[ https://issues.apache.org/jira/browse/MESOS-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4877: -- Sprint: Mesosphere Sprint 31 > Mesos containerizer can't handle top level docker image like "alpine" (must > use "library/alpine") > - > > Key: MESOS-4877 > URL: https://issues.apache.org/jira/browse/MESOS-4877 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.27.0, 0.27.1 >Reporter: Shuai Lin >Assignee: Shuai Lin > > This can be demonstrated with the {{mesos-execute}} command: > # Docker containerizer with image {{alpine}}: success > {code} > sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=docker > --name=just-a-test --command="sleep 1000" --master=localhost:5050 > {code} > # Mesos containerizer with image {{alpine}}: failure > {code} > sudo ./build/src/mesos-execute --docker_image=alpine --containerizer=mesos > --name=just-a-test --command="sleep 1000" --master=localhost:5050 > {code} > # Mesos containerizer with image {{library/alpine}}: success > {code} > sudo ./build/src/mesos-execute --docker_image=library/alpine > --containerizer=mesos --name=just-a-test --command="sleep 1000" > --master=localhost:5050 > {code} > In the slave logs: > {code} > ea-4460-83 > 9c-838da86af34c-0007' > I0306 16:32:41.418269 3403 metadata_manager.cpp:159] Looking for image > 'alpine:latest' > I0306 16:32:41.418699 3403 registry_puller.cpp:194] Pulling image > 'alpine:latest' from > 'docker-manifest://registry-1.docker.io:443alpine?latest#https' to > '/tmp/mesos-test > /store/docker/staging/ka7MlQ' > E0306 16:32:43.098131 3400 slave.cpp:3773] Container > '4bf9132d-9a57-4baa-a78c-e7164e93ace6' for executor 'just-a-test' of > framework 4f055c6f-1bea-4460-839c-838da86af34c-0 > 007 failed to start: Collect failed: Unexpected HTTP response '401 > Unauthorized > {code} > curl command executed: > {code} > $ sudo sysdig -A -p "*%evt.time %proc.cmdline" evt.type=execve and > proc.name=curl >16:42:53.198998042 curl -s -S -L -D - > https://registry-1.docker.io:443/v2/alpine/manifests/latest > 16:42:53.784958541 curl -s -S -L -D - > https://auth.docker.io/token?service=registry.docker.io&scope=repository:alpine:pull > 16:42:54.294192024 curl -s -S -L -D - -H Authorization: Bearer > 
eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCIsIng1YyI6WyJNSUlDTHpDQ0FkU2dBd0lCQWdJQkFEQUtCZ2dxaGtqT1BRUURBakJHTVVRd1FnWURWUVFERXp0Uk5Gb3pPa2RYTjBrNldGUlFSRHBJVFRSUk9rOVVWRmc2TmtGRlF6cFNUVE5ET2tGU01rTTZUMFkzTnpwQ1ZrVkJPa2xHUlVrNlExazFTekFlRncweE5UQTJNalV4T1RVMU5EWmFGdzB4TmpBMk1qUXhPVFUxTkRaYU1FWXhSREJDQmdOVkJBTVRPMGhHU1UwNldGZFZWam8yUVZkSU9sWlpUVEk2TTFnMVREcFNWREkxT2s5VFNrbzZTMVExUmpwWVRsSklPbFJMTmtnNlMxUkxOanBCUVV0VU1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXl2UzIvdEI3T3JlMkVxcGRDeFdtS1NqV1N2VmJ2TWUrWGVFTUNVMDByQjI0akNiUVhreFdmOSs0MUxQMlZNQ29BK0RMRkIwVjBGZGdwajlOWU5rL2pxT0JzakNCcnpBT0JnTlZIUThCQWY4RUJBTUNBSUF3RHdZRFZSMGxCQWd3QmdZRVZSMGxBREJFQmdOVkhRNEVQUVE3U0VaSlRUcFlWMVZXT2paQlYwZzZWbGxOTWpveldEVk1PbEpVTWpVNlQxTktTanBMVkRWR09saE9Va2c2VkVzMlNEcExWRXMyT2tGQlMxUXdSZ1lEVlIwakJEOHdQWUE3VVRSYU16cEhWemRKT2xoVVVFUTZTRTAwVVRwUFZGUllPalpCUlVNNlVrMHpRenBCVWpKRE9rOUdOemM2UWxaRlFUcEpSa1ZKT2tOWk5Vc3dDZ1lJS29aSXpqMEVBd0lEU1FBd1JnSWhBTXZiT2h4cHhrTktqSDRhMFBNS0lFdXRmTjZtRDFvMWs4ZEJOVGxuWVFudkFpRUF0YVJGSGJSR2o4ZlVSSzZ4UVJHRURvQm1ZZ3dZelR3Z3BMaGJBZzNOUmFvPSJdfQ.eyJhY2Nlc3MiOltdLCJhdWQiOiJyZWdpc3RyeS5kb2NrZXIuaW8iLCJleHAiOjE0NTcyODI4NzQsImlhdCI6MTQ1NzI4MjU3NCwiaXNzIjoiYXV0aC5kb2NrZXIuaW8iLCJqdGkiOiJaOGtyNXZXNEJMWkNIRS1IcVJIaCIsIm5iZiI6MTQ1NzI4MjU3NCwic3ViIjoiIn0.C2wtJq_P-m0buPARhmQjDfh6ztIAhcvgN3tfWIZEClSgXlVQ_sAQXAALNZKwAQL2Chj7NpHX--0GW-aeL_28Aw > https://registry-1.docker.io:443/v2/alpine/manifests/latest > {code} > Also got the same result with {{ubuntu}} docker image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4814) Test private registry with ssl enabled/disabled.
[ https://issues.apache.org/jira/browse/MESOS-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4814: -- Sprint: (was: Mesosphere Sprint 30) > Test private registry with ssl enabled/disabled. > > > Key: MESOS-4814 > URL: https://issues.apache.org/jira/browse/MESOS-4814 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > > Test the unified containerizer using Docker images, with SSL enabled, against > the private registry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4815) Implement private registry test with authentication.
[ https://issues.apache.org/jira/browse/MESOS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4815: -- Sprint: (was: Mesosphere Sprint 30) > Implement private registry test with authentication. > > > Key: MESOS-4815 > URL: https://issues.apache.org/jira/browse/MESOS-4815 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > > Test the unified containerizer using Docker images against a private registry > that requires authentication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4937) Investigate container security options for Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4937: -- Story Points: 5 > Investigate container security options for Mesos containerizer > -- > > Key: MESOS-4937 > URL: https://issues.apache.org/jira/browse/MESOS-4937 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu >Assignee: Jojy Varghese > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer and come up with a list of features that we want to > support in MVP. > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4937) Investigate container security options for Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4937: -- Sprint: Mesosphere Sprint 31 > Investigate container security options for Mesos containerizer > -- > > Key: MESOS-4937 > URL: https://issues.apache.org/jira/browse/MESOS-4937 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer and come up with a list of features that we want to > support in MVP. > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4937) Investigate container security options for Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4937: -- Assignee: Jojy Varghese > Investigate container security options for Mesos containerizer > -- > > Key: MESOS-4937 > URL: https://issues.apache.org/jira/browse/MESOS-4937 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu >Assignee: Jojy Varghese > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer and come up with a list of features that we want to > support in MVP. > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4937) Investigate container security options for Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4937: -- Shepherd: Jie Yu > Investigate container security options for Mesos containerizer > -- > > Key: MESOS-4937 > URL: https://issues.apache.org/jira/browse/MESOS-4937 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu >Assignee: Jojy Varghese > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer and come up with a list of features that we want to > support in MVP. > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4936) Improve container security for Mesos containerizer.
[ https://issues.apache.org/jira/browse/MESOS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4936: -- Labels: mesosphere (was: ) > Improve container security for Mesos containerizer. > --- > > Key: MESOS-4936 > URL: https://issues.apache.org/jira/browse/MESOS-4936 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer: > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4936) Improve container security for Mesos containerizer.
[ https://issues.apache.org/jira/browse/MESOS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4936: -- Component/s: containerization > Improve container security for Mesos containerizer. > --- > > Key: MESOS-4936 > URL: https://issues.apache.org/jira/browse/MESOS-4936 > Project: Mesos > Issue Type: Epic > Components: containerization >Reporter: Jie Yu > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer: > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4937) Investigate container security options for Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4937: -- Labels: mesosphere (was: ) > Investigate container security options for Mesos containerizer > -- > > Key: MESOS-4937 > URL: https://issues.apache.org/jira/browse/MESOS-4937 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer and come up with a list of features that we want to > support in MVP. > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4937) Investigate container security options for Mesos containerizer
Jie Yu created MESOS-4937: - Summary: Investigate container security options for Mesos containerizer Key: MESOS-4937 URL: https://issues.apache.org/jira/browse/MESOS-4937 Project: Mesos Issue Type: Task Reporter: Jie Yu We should investigate the following to improve the container security for Mesos containerizer and come up with a list of features that we want to support in MVP. 1) Capabilities 2) User namespace 3) Seccomp 4) SELinux 5) AppArmor We should investigate what other container systems are doing regarding security: 1) [k8s| https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] 2) [docker|https://docs.docker.com/engine/security/security/] 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4937) Investigate container security options for Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4937: -- Component/s: containerization > Investigate container security options for Mesos containerizer > -- > > Key: MESOS-4937 > URL: https://issues.apache.org/jira/browse/MESOS-4937 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jie Yu > Labels: mesosphere > > We should investigate the following to improve the container security for > Mesos containerizer and come up with a list of features that we want to > support in MVP. > 1) Capabilities > 2) User namespace > 3) Seccomp > 4) SELinux > 5) AppArmor > We should investigate what other container systems are doing regarding > security: > 1) [k8s| > https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] > 2) [docker|https://docs.docker.com/engine/security/security/] > 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4936) Improve container security for Mesos containerizer.
Jie Yu created MESOS-4936: - Summary: Improve container security for Mesos containerizer. Key: MESOS-4936 URL: https://issues.apache.org/jira/browse/MESOS-4936 Project: Mesos Issue Type: Epic Reporter: Jie Yu We should investigate the following to improve the container security for Mesos containerizer: 1) Capabilities 2) User namespace 3) Seccomp 4) SELinux 5) AppArmor We should investigate what other container systems are doing regarding security: 1) [k8s| https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2905] 2) [docker|https://docs.docker.com/engine/security/security/] 3) [oci|https://github.com/opencontainers/specs/blob/master/config.md] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
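As a concrete taste of item 1) in the list above (capabilities), a minimal standalone Linux sketch of dropping a capability from the bounding set before exec'ing a task. It only illustrates the kernel mechanism and says nothing about how the Mesos containerizer would expose it.

{code}
// Standalone Linux sketch (not Mesos code): drop CAP_NET_RAW from the
// bounding set so that nothing exec'ed afterwards can re-acquire it.
// Build on Linux with: g++ drop_cap.cpp   (needs CAP_SETPCAP, e.g. run as root)
#include <linux/capability.h>
#include <sys/prctl.h>
#include <unistd.h>

#include <cstdio>

int main()
{
  // Remove CAP_NET_RAW from the bounding set for this process tree.
  if (prctl(PR_CAPBSET_DROP, CAP_NET_RAW, 0, 0, 0) != 0) {
    std::perror("prctl(PR_CAPBSET_DROP)");
    return 1;
  }

  // 'ping' traditionally needs CAP_NET_RAW (or a privileged helper), so it
  // is expected to fail inside the restricted environment.
  execlp("ping", "ping", "-c", "1", "127.0.0.1", (char*) NULL);
  std::perror("execlp");
  return 1;
}
{code}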
[jira] [Updated] (MESOS-4814) Test private registry with ssl enabled/disabled.
[ https://issues.apache.org/jira/browse/MESOS-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4814: -- Summary: Test private registry with ssl enabled/disabled. (was: Implement private registry test with ssl.) > Test private registry with ssl enabled/disabled. > > > Key: MESOS-4814 > URL: https://issues.apache.org/jira/browse/MESOS-4814 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > > Test unified containerizer using docker images, have ssl enabled to test the > private registry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4814) Implement private registry test with ssl.
[ https://issues.apache.org/jira/browse/MESOS-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193629#comment-15193629 ] Jie Yu commented on MESOS-4814: --- SSL is still needed, but 'curl' will handle it automatically. So Mesos does not have to be built with ssl enabled. > Implement private registry test with ssl. > - > > Key: MESOS-4814 > URL: https://issues.apache.org/jira/browse/MESOS-4814 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > > Test unified containerizer using docker images, have ssl enabled to test the > private registry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4781) Executor env variables should not be leaked to the command task.
[ https://issues.apache.org/jira/browse/MESOS-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4781: -- Description: Currently, the command task inherits the env variables of the command executor. This is not ideal because the command executor's environment variables include some Mesos-internal env variables like MESOS_XXX and LIBPROCESS_XXX. Also, this behavior does not match what the Docker containerizer does. We should construct the env variables from scratch for the command task, rather than relying on inheriting the env variables from the command executor. (was: Currently the Mesos command executor just inherits all environment variables from the slave. This can be problematic when the unified containerizer launches Docker images like mongo or jenkins: some of the environment variables (such as `PATH`) are duplicated, which prevents those Docker images from executing properly. We should prevent the slave's environment variables from being exposed to the command executor.) > Executor env variables should not be leaked to the command task. > > > Key: MESOS-4781 > URL: https://issues.apache.org/jira/browse/MESOS-4781 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > Currently, the command task inherits the env variables of the command executor. > This is not ideal because the command executor's environment variables include > some Mesos-internal env variables like MESOS_XXX and LIBPROCESS_XXX. Also, > this behavior does not match what the Docker containerizer does. We should > construct the env variables from scratch for the command task, rather than > relying on inheriting the env variables from the command executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4781) Executor env variables should not be leaked to the command task.
[ https://issues.apache.org/jira/browse/MESOS-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4781: -- Summary: Executor env variables should not be leaked to the command task. (was: Forbid command executor to inherit environment variables from slave.) > Executor env variables should not be leaked to the command task. > > > Key: MESOS-4781 > URL: https://issues.apache.org/jira/browse/MESOS-4781 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > Currently the Mesos command executor just inherits all environment variables from > the slave. This can be problematic when the unified containerizer > launches Docker images like mongo or jenkins: some of the environment > variables (such as `PATH`) are duplicated, which prevents those Docker > images from executing properly. We should prevent the slave's environment > variables from being exposed to the command executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
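A minimal standalone sketch of the 'construct the env variables from scratch' approach described in MESOS-4781 above. The variable values are placeholders and this is not the command executor code.

{code}
// Standalone sketch (not Mesos code): launch the task with an environment
// built from scratch instead of inheriting the executor's environment
// (which contains MESOS_* and LIBPROCESS_* variables).
#include <unistd.h>

#include <cstdio>

int main()
{
  char* const argv[] = {
    const_cast<char*>("/bin/sh"),
    const_cast<char*>("-c"),
    const_cast<char*>("env"),
    nullptr
  };

  // Only the variables the task should see; nothing is inherited.
  char* const envp[] = {
    const_cast<char*>(
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"),
    const_cast<char*>("HOME=/root"),
    nullptr
  };

  execve(argv[0], argv, envp);
  std::perror("execve"); // Only reached if execve fails.
  return 1;
}
{code}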
[jira] [Commented] (MESOS-3902) The Location header when non-leading master redirects to leading master is incomplete.
[ https://issues.apache.org/jira/browse/MESOS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193585#comment-15193585 ] Ashwin Murthy commented on MESOS-3902: -- Not actively, since busy with higher priority work. If this is blocking, please reassign. If not, would be interested in looking at this soon. > The Location header when non-leading master redirects to leading master is > incomplete. > -- > > Key: MESOS-3902 > URL: https://issues.apache.org/jira/browse/MESOS-3902 > Project: Mesos > Issue Type: Bug > Components: HTTP API, master >Affects Versions: 0.25.0 > Environment: 3 masters, 10 slaves >Reporter: Ben Whitehead >Assignee: Ashwin Murthy > Labels: mesosphere > > The master now sets a location header, but it's incomplete. The path of the > URL isn't set. Consider an example: > {code} > > cat /tmp/subscribe-1072944352375841456 | httpp POST > > 127.1.0.3:5050/api/v1/scheduler Content-Type:application/x-protobuf > POST /api/v1/scheduler HTTP/1.1 > Accept: application/json > Accept-Encoding: gzip, deflate > Connection: keep-alive > Content-Length: 123 > Content-Type: application/x-protobuf > Host: 127.1.0.3:5050 > User-Agent: HTTPie/0.9.0 > +-+ > | NOTE: binary data not shown in terminal | > +-+ > HTTP/1.1 307 Temporary Redirect > Content-Length: 0 > Date: Fri, 26 Feb 2016 00:54:41 GMT > Location: //127.1.0.1:5050 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
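For clarity, a small standalone sketch of what a complete Location header would carry compared with the bare '//127.1.0.1:5050' in the transcript above; the function name is made up.

{code}
// Standalone sketch (not Mesos code): the redirect should preserve the
// request path (and query) instead of pointing at the bare leader address.
#include <iostream>
#include <string>

std::string locationHeader(
    const std::string& leaderHostport,
    const std::string& requestPath)
{
  // Scheme-relative URL, as in the transcript, but with the path kept.
  return "//" + leaderHostport + requestPath;
}

int main()
{
  // Prints "Location: //127.1.0.1:5050/api/v1/scheduler"
  // rather than "Location: //127.1.0.1:5050".
  std::cout << "Location: "
            << locationHeader("127.1.0.1:5050", "/api/v1/scheduler")
            << std::endl;
  return 0;
}
{code}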
[jira] [Commented] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
[ https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193573#comment-15193573 ] Jie Yu commented on MESOS-4810: --- {noformat} [09:46:48]W: [Step 11/11] Failed to exec: No such file or directory {noformat} The above is the suspicious logging. Looks like the command executor cannot find the command in its path. > ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails. > -- > > Key: MESOS-4810 > URL: https://issues.apache.org/jira/browse/MESOS-4810 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.28.0 > Environment: CentOS 7 on AWS, both with or without SSL. >Reporter: Bernd Mathiske >Assignee: Jie Yu > Labels: docker, mesosphere, test > > {noformat} > [09:46:46] : [Step 11/11] [ RUN ] > ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand > [09:46:46]W: [Step 11/11] I0229 09:46:46.628413 1166 leveldb.cpp:174] > Opened db in 4.242882ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.629926 1166 leveldb.cpp:181] > Compacted db in 1.483621ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.629966 1166 leveldb.cpp:196] > Created db iterator in 15498ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.629977 1166 leveldb.cpp:202] > Seeked to beginning of db in 1405ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.629984 1166 leveldb.cpp:271] > Iterated through 0 keys in the db in 239ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.630015 1166 replica.cpp:779] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [09:46:46]W: [Step 11/11] I0229 09:46:46.630470 1183 recover.cpp:447] > Starting replica recovery > [09:46:46]W: [Step 11/11] I0229 09:46:46.630702 1180 recover.cpp:473] > Replica is in EMPTY status > [09:46:46]W: [Step 11/11] I0229 09:46:46.631767 1182 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > (14567)@172.30.2.124:37431 > [09:46:46]W: [Step 11/11] I0229 09:46:46.632115 1183 recover.cpp:193] > Received a recover response from a replica in EMPTY status > [09:46:46]W: [Step 11/11] I0229 09:46:46.632450 1186 recover.cpp:564] > Updating replica status to STARTING > [09:46:46]W: [Step 11/11] I0229 09:46:46.633476 1186 master.cpp:375] > Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) > started on 172.30.2.124:37431 > [09:46:46]W: [Step 11/11] I0229 09:46:46.633491 1186 master.cpp:377] Flags > at startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" > --zk_session_timeout="10secs" > [09:46:46]W: [Step 11/11] I0229 09:46:46.633677 1186 master.cpp:422] > Master only allowing authenticated 
frameworks to register > [09:46:46]W: [Step 11/11] I0229 09:46:46.633685 1186 master.cpp:427] > Master only allowing authenticated slaves to register > [09:46:46]W: [Step 11/11] I0229 09:46:46.633692 1186 credentials.hpp:35] > Loading credentials for authentication from '/tmp/4UxXoW/credentials' > [09:46:46]W: [Step 11/11] I0229 09:46:46.633851 1183 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 1.191043ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.633873 1183 replica.cpp:320] > Persisted replica status to STARTING > [09:46:46]W: [Step 11/11] I0229 09:46:46.633894 1186 master.cpp:467] Using > default 'crammd5' authenticator > [09:46:46]W: [Step 11/11] I0229 09:46:46.634003 1186 master.cpp:536] Using > default 'basic' HTTP authenticator > [09:46:46]W: [Step 11/11] I0229 09:46:46.634062 1184 recover.cpp:473] > Replica is in STARTING status > [09:46:46]W: [Step 11/11] I0229 09:46:46.634109 1186 master.cpp:570] > Authorization enabled > [09:46:46]W: [Step 11/11] I0229 09:46:46.634249 1187 > whitelist_watcher.cpp:77] No whitelist given > [09:46:46]W: [Step 11/11] I0229 09:
[jira] [Updated] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules
[ https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Hindman updated MESOS-4610: Sprint: Mesosphere Sprint 29 (was: Mesosphere Sprint 29, Mesosphere Sprint 30) > MasterContender/MasterDetector should be loadable as modules > > > Key: MESOS-4610 > URL: https://issues.apache.org/jira/browse/MESOS-4610 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Mark Cavage >Assignee: Mark Cavage > > Currently mesos depends on Zookeeper for leader election and notification to > slaves, although there is a C++ hierarchy in the code to support alternatives > (e.g., unit tests use an in-memory implementation). From an operational > perspective, many organizations/users do not want to take a dependency on > Zookeeper, and use an alternative solution to implementing leader election. > Our organization in particular, very much wants this, and as a reference > there have been several requests from the community (see referenced tickets) > to replace with etcd/consul/etc. > This ticket will serve as the work effort to modularize the > MasterContender/MasterDetector APIs such that integrators can build a > pluggable solution of their choice; this ticket will not fold in any > implementations such as etcd et al., but simply move this hierarchy to be > fully pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
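To illustrate the 'fully pluggable' goal, a minimal standalone sketch of the underlying plugin pattern: an abstract contender interface plus a factory symbol resolved with dlopen()/dlsym(). This is deliberately not the Mesos module API, and every name in it is hypothetical.

{code}
// Standalone sketch (not Mesos code, not the Mesos module API): the generic
// plugin pattern behind "loadable as modules" -- an abstract interface and
// a loader that resolves a factory symbol from a shared library.
// Build with: g++ loader.cpp -ldl   (file name is hypothetical)
#include <dlfcn.h>

#include <cstdio>
#include <string>

class MasterContender
{
public:
  virtual ~MasterContender() {}
  virtual void contend() = 0;
};

typedef MasterContender* (*Factory)();

MasterContender* load(const std::string& library, const std::string& symbol)
{
  void* handle = dlopen(library.c_str(), RTLD_NOW);
  if (handle == nullptr) {
    std::fprintf(stderr, "dlopen: %s\n", dlerror());
    return nullptr;
  }

  Factory factory = reinterpret_cast<Factory>(dlsym(handle, symbol.c_str()));
  if (factory == nullptr) {
    std::fprintf(stderr, "dlsym: %s\n", dlerror());
    return nullptr;
  }

  return factory();
}

int main()
{
  // An etcd- or consul-backed implementation would ship as its own .so
  // exporting the factory symbol; both names below are made up.
  MasterContender* contender =
    load("libetcd_contender.so", "createMasterContender");

  if (contender != nullptr) {
    contender->contend();
  }
  return 0;
}
{code}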
[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4053: -- Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 31 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > MemoryPressureMesosTest tests fail on CentOS 6.6 > > > Key: MESOS-4053 > URL: https://issues.apache.org/jira/browse/MESOS-4053 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann >Assignee: Benjamin Hindman > Labels: mesosphere, test-failure > > {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and > {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It > seems that mounted cgroups are not properly cleaned up after previous tests, > so multiple hierarchies are detected and thus an error is produced: > {code} > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. > - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms) > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. > - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
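The failure message asks the operator to unmount the extra hierarchy ('/cgroup' in the output above). A standalone Linux sketch of doing that programmatically, assuming root and the paths shown in the test output:

{code}
// Standalone Linux sketch (not Mesos code): unmount the stray '/cgroup'
// hierarchy reported in the failure above, leaving the test hierarchy under
// /tmp/mesos_test_cgroup alone. Must run as root.
#include <sys/mount.h>

#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

int main()
{
  std::ifstream mounts("/proc/mounts");
  std::string line;

  while (std::getline(mounts, line)) {
    std::istringstream fields(line);
    std::string device, mountpoint, fstype;
    fields >> device >> mountpoint >> fstype;

    // Only touch cgroup mounts under the stray '/cgroup' hierarchy.
    if (fstype == "cgroup" && mountpoint.rfind("/cgroup", 0) == 0) {
      if (umount(mountpoint.c_str()) != 0) {
        std::perror(mountpoint.c_str()); // e.g. EBUSY if cgroups are in use.
      } else {
        std::printf("unmounted %s\n", mountpoint.c_str());
      }
    }
  }
  return 0;
}
{code}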
[jira] [Assigned] (MESOS-4924) MAC OS build failed
[ https://issues.apache.org/jira/browse/MESOS-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Niemitz reassigned MESOS-4924: Assignee: Steve Niemitz > MAC OS build failed > --- > > Key: MESOS-4924 > URL: https://issues.apache.org/jira/browse/MESOS-4924 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Steve Niemitz >Priority: Blocker > > Seems caused by https://reviews.apache.org/r/41049/ [~SteveNiemitz] > {code} > -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 > -Qunused-arguments -I/usr/local/opt/subversion/include/subversion-1 > -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Qunused-arguments > build/temp.macosx-10.10-intel-2.7/src/mesos/executor/mesos_executor_driver_impl.o > build/temp.macosx-10.10-intel-2.7/src/mesos/executor/module.o > build/temp.macosx-10.10-intel-2.7/src/mesos/executor/proxy_executor.o > /Users/gyliu/git/mesos/build/src/.libs/libmesos_no_3rdparty.a > /Users/gyliu/git/mesos/build/3rdparty/libprocess/.libs/libprocess.a > /Users/gyliu/git/mesos/build/3rdparty/leveldb-1.4/libleveldb.a > /Users/gyliu/git/mesos/build/3rdparty/zookeeper-3.4.5/src/c/.libs/libzookeeper_mt.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/.libs/libprotobuf.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a > -o build/lib.macosx-10.10-intel-2.7/mesos/executor/_executor.so > -Wl,--as-needed -L/usr/local/opt/subversion/lib -lsasl2 -lsvn_delta-1 > -lsvn_subr-1 -lapr-1 -lcurl -lz > ld: unknown option: --as-needed > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > error: command 'g++' failed with exit status 1 > make[2]: *** [python/dist/mesos.executor-0.29.0-py2.7-macosx-10.10-intel.egg] > Error 1 > make[2]: *** Waiting for unfinished jobs > g++ -bundle -undefined dynamic_lookup -arch x86_64 -arch i386 -Wl,-F. 
> -L/usr/local/opt/subversion/lib -g -O0 -g -O0 -Wno-unused-local-typedef > -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 > -Qunused-arguments -I/usr/local/opt/subversion/include/subversion-1 > -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Qunused-arguments > build/temp.macosx-10.10-intel-2.7/src/mesos/scheduler/mesos_scheduler_driver_impl.o > build/temp.macosx-10.10-intel-2.7/src/mesos/scheduler/module.o > build/temp.macosx-10.10-intel-2.7/src/mesos/scheduler/proxy_scheduler.o > /Users/gyliu/git/mesos/build/src/.libs/libmesos_no_3rdparty.a > /Users/gyliu/git/mesos/build/3rdparty/libprocess/.libs/libprocess.a > /Users/gyliu/git/mesos/build/3rdparty/leveldb-1.4/libleveldb.a > /Users/gyliu/git/mesos/build/3rdparty/zookeeper-3.4.5/src/c/.libs/libzookeeper_mt.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/.libs/libprotobuf.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a > -o build/lib.macosx-10.10-intel-2.7/mesos/scheduler/_scheduler.so > -Wl,--as-needed -L/usr/local/opt/subversion/lib -lsasl2 -lsvn_delta-1 > -lsvn_subr-1 -lapr-1 -lcurl -lz > ld: unknown option: --as-needed > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > error: command 'g++' failed with exit status 1 > make[2]: *** > [python/dist/mesos.scheduler-0.29.0-py2.7-macosx-10.10-intel.egg] Error 1 > make[1]: *** [all] Error 2 > make: *** [all-recursive] Error 1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4924) MAC OS build failed
[ https://issues.apache.org/jira/browse/MESOS-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193354#comment-15193354 ] Steve Niemitz commented on MESOS-4924: -- Review up at https://reviews.apache.org/r/44785/ > MAC OS build failed > --- > > Key: MESOS-4924 > URL: https://issues.apache.org/jira/browse/MESOS-4924 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Priority: Blocker > > Seems caused by https://reviews.apache.org/r/41049/ [~SteveNiemitz] > {code} > -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 > -Qunused-arguments -I/usr/local/opt/subversion/include/subversion-1 > -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Qunused-arguments > build/temp.macosx-10.10-intel-2.7/src/mesos/executor/mesos_executor_driver_impl.o > build/temp.macosx-10.10-intel-2.7/src/mesos/executor/module.o > build/temp.macosx-10.10-intel-2.7/src/mesos/executor/proxy_executor.o > /Users/gyliu/git/mesos/build/src/.libs/libmesos_no_3rdparty.a > /Users/gyliu/git/mesos/build/3rdparty/libprocess/.libs/libprocess.a > /Users/gyliu/git/mesos/build/3rdparty/leveldb-1.4/libleveldb.a > /Users/gyliu/git/mesos/build/3rdparty/zookeeper-3.4.5/src/c/.libs/libzookeeper_mt.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/.libs/libprotobuf.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a > -o build/lib.macosx-10.10-intel-2.7/mesos/executor/_executor.so > -Wl,--as-needed -L/usr/local/opt/subversion/lib -lsasl2 -lsvn_delta-1 > -lsvn_subr-1 -lapr-1 -lcurl -lz > ld: unknown option: --as-needed > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > error: command 'g++' failed with exit status 1 > make[2]: *** [python/dist/mesos.executor-0.29.0-py2.7-macosx-10.10-intel.egg] > Error 1 > make[2]: *** Waiting for unfinished jobs > g++ -bundle -undefined dynamic_lookup -arch x86_64 -arch i386 -Wl,-F. 
> -L/usr/local/opt/subversion/lib -g -O0 -g -O0 -Wno-unused-local-typedef > -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 > -Qunused-arguments -I/usr/local/opt/subversion/include/subversion-1 > -I/usr/include/apr-1 -I/usr/include/apr-1.0 -Qunused-arguments > build/temp.macosx-10.10-intel-2.7/src/mesos/scheduler/mesos_scheduler_driver_impl.o > build/temp.macosx-10.10-intel-2.7/src/mesos/scheduler/module.o > build/temp.macosx-10.10-intel-2.7/src/mesos/scheduler/proxy_scheduler.o > /Users/gyliu/git/mesos/build/src/.libs/libmesos_no_3rdparty.a > /Users/gyliu/git/mesos/build/3rdparty/libprocess/.libs/libprocess.a > /Users/gyliu/git/mesos/build/3rdparty/leveldb-1.4/libleveldb.a > /Users/gyliu/git/mesos/build/3rdparty/zookeeper-3.4.5/src/c/.libs/libzookeeper_mt.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/.libs/libprotobuf.a > > /Users/gyliu/git/mesos/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a > -o build/lib.macosx-10.10-intel-2.7/mesos/scheduler/_scheduler.so > -Wl,--as-needed -L/usr/local/opt/subversion/lib -lsasl2 -lsvn_delta-1 > -lsvn_subr-1 -lapr-1 -lcurl -lz > ld: unknown option: --as-needed > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > error: command 'g++' failed with exit status 1 > make[2]: *** > [python/dist/mesos.scheduler-0.29.0-py2.7-macosx-10.10-intel.egg] Error 1 > make[1]: *** [all] Error 2 > make: *** [all-recursive] Error 1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4935) Docker executor does not track containers killed out-of-band.
Alexander Rukletsov created MESOS-4935: -- Summary: Docker executor does not track containers killed out-of-band. Key: MESOS-4935 URL: https://issues.apache.org/jira/browse/MESOS-4935 Project: Mesos Issue Type: Improvement Components: docker Reporter: Alexander Rukletsov If a docker container is killed out-of-band (using docker CLI), the docker executor does not exit and hence does not report a terminal state to Mesos. It happens because the docker executor does not monitor the state of the underlying container and waits for an explicit {{killTask()}} or {{shutdown()}}: https://github.com/apache/mesos/blob/69b2ad528dd79979a8ee113a8edbbab2669e32e6/src/docker/executor.cpp#L278 We should consider introducing a container observer actor, similar to {{ReaperProcess}} we have for command executor-based tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
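A minimal standalone sketch of the observer idea: poll the Docker daemon and notice when the container stops or disappears out-of-band. It shells out to the Docker CLI for simplicity; a real fix inside the executor would likely use the remote API, and the container name and polling interval below are placeholders.

{code}
// Standalone sketch (not Mesos code): a container observer that polls
// `docker inspect` and reports when the container is no longer running.
#include <unistd.h>

#include <cstdio>
#include <string>

// Returns "true\n", "false\n", or "" (container unknown) -- the output of
// `docker inspect --format {{.State.Running}} <container>`.
std::string containerRunning(const std::string& container)
{
  std::string command =
    "docker inspect --format '{{.State.Running}}' " + container +
    " 2>/dev/null";

  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    return "";
  }

  char buffer[16] = {0};
  std::string output;
  while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
    output += buffer;
  }
  pclose(pipe);
  return output;
}

int main(int argc, char** argv)
{
  const std::string container = argc > 1 ? argv[1] : "my-container";

  while (true) {
    if (containerRunning(container).find("true") == std::string::npos) {
      // The container was stopped or removed outside of Mesos; a real
      // executor would now send a terminal status update and exit.
      std::printf("container '%s' is gone\n", container.c_str());
      return 0;
    }
    sleep(1); // Placeholder polling interval.
  }
}
{code}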
[jira] [Commented] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193349#comment-15193349 ] Vinod Kone commented on MESOS-2043: --- [~kevincox] Can you attach the logs with 0.27.1? > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, > mesos-master.20141104-1606-1706.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fail > due to time out: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of same framework is > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happen with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4312) Porting Mesos on Power (ppc64le)
[ https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193344#comment-15193344 ] Vinod Kone commented on MESOS-4312: --- I updated the linked tickets to issue tickets instead. Can you update *each* ticket to its correct state (e.g., Reviewable) and add the required details (e.g., review link)? > Porting Mesos on Power (ppc64le) > > > Key: MESOS-4312 > URL: https://issues.apache.org/jira/browse/MESOS-4312 > Project: Mesos > Issue Type: Epic >Reporter: Qian Zhang >Assignee: Chen Zhiwei > > The goal of this ticket is to make IBM Power (ppc64le) a supported > hardware platform for Mesos. Currently the latest Mesos code cannot be > successfully built on ppc64le; we will resolve the build errors in this > ticket, and also make sure the Mesos test suite ("make check") can be run > successfully on ppc64le. > The review list: > * Protobuf: https://reviews.apache.org/r/44257/ > * Glog: https://reviews.apache.org/r/44252/ > * Libev: https://reviews.apache.org/r/44378/ > * Http-parser: https://reviews.apache.org/r/44372/ > * Zookeeper: https://reviews.apache.org/r/44376/ > * Leveldb: https://reviews.apache.org/r/44382/ > * Mesos: https://reviews.apache.org/r/42551/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4312) Porting Mesos on Power (ppc64le)
[ https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4312: -- Epic Name: PowerPC (was: Power) > Porting Mesos on Power (ppc64le) > > > Key: MESOS-4312 > URL: https://issues.apache.org/jira/browse/MESOS-4312 > Project: Mesos > Issue Type: Epic >Reporter: Qian Zhang >Assignee: Chen Zhiwei > > The goal of this ticket is to make IBM Power (ppc64le) a supported > hardware platform for Mesos. Currently the latest Mesos code cannot be > successfully built on ppc64le; we will resolve the build errors in this > ticket, and also make sure the Mesos test suite ("make check") can be run > successfully on ppc64le. > The review list: > * Protobuf: https://reviews.apache.org/r/44257/ > * Glog: https://reviews.apache.org/r/44252/ > * Libev: https://reviews.apache.org/r/44378/ > * Http-parser: https://reviews.apache.org/r/44372/ > * Zookeeper: https://reviews.apache.org/r/44376/ > * Leveldb: https://reviews.apache.org/r/44382/ > * Mesos: https://reviews.apache.org/r/42551/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4897) Update test cases to support PowerPC LE
[ https://issues.apache.org/jira/browse/MESOS-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4897: -- Shepherd: Vinod Kone > Update test cases to support PowerPC LE > --- > > Key: MESOS-4897 > URL: https://issues.apache.org/jira/browse/MESOS-4897 > Project: Mesos > Issue Type: Improvement >Reporter: Chen Zhiwei >Assignee: Chen Zhiwei > > Some Docker-related test cases fail on PowerPC LE, since the Docker > image 'alpine' cannot run on the PowerPC LE platform. > On PowerPC LE, the test cases can use the Docker image 'ppc64le/busybox' instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
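A minimal standalone sketch of the image selection this describes, using a compile-time architecture check; the actual test change may pick the image differently.

{code}
// Standalone sketch (not Mesos code): pick the test image per architecture
// so the same test can run on x86_64 and on PowerPC LE.
#include <iostream>
#include <string>

std::string testImage()
{
#if defined(__powerpc64__)
  return "ppc64le/busybox"; // Per this ticket, 'alpine' cannot run on ppc64le.
#else
  return "alpine";
#endif
}

int main()
{
  std::cout << "using test image: " << testImage() << std::endl;
  return 0;
}
{code}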
[jira] [Updated] (MESOS-4879) Update glog patch to support PowerPC LE
[ https://issues.apache.org/jira/browse/MESOS-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4879: -- Shepherd: Vinod Kone > Update glog patch to support PowerPC LE > -- > > Key: MESOS-4879 > URL: https://issues.apache.org/jira/browse/MESOS-4879 > Project: Mesos > Issue Type: Improvement >Reporter: Chen Zhiwei >Assignee: Chen Zhiwei > > This is a part of the PowerPC LE porting effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4612) Update vendored ZooKeeper to 3.4.8
[ https://issues.apache.org/jira/browse/MESOS-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4612: -- Shepherd: Vinod Kone > Update vendored ZooKeeper to 3.4.8 > -- > > Key: MESOS-4612 > URL: https://issues.apache.org/jira/browse/MESOS-4612 > Project: Mesos > Issue Type: Improvement >Reporter: Cody Maloney >Assignee: Chen Zhiwei > Labels: mesosphere, tech-debt, zookeeper > > See: http://zookeeper.apache.org/doc/r3.4.8/releasenotes.html for > improvements / bug fixes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4678) Upgrade vendored Protobuf to 2.6.1
[ https://issues.apache.org/jira/browse/MESOS-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4678: -- Shepherd: Vinod Kone Summary: Upgrade vendored Protobuf to 2.6.1 (was: Upgrade vendored Protobuf) > Upgrade vendored Protobuf to 2.6.1 > -- > > Key: MESOS-4678 > URL: https://issues.apache.org/jira/browse/MESOS-4678 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Neil Conway >Assignee: Chen Zhiwei > Labels: 3rdParty, mesosphere, protobuf, tech-debt > > We currently vendor Protobuf 2.5.0. We should upgrade to Protobuf 2.6.1. This > introduces various bugfixes, performance improvements, and at least one new > feature we might want to eventually take advantage of ({{map}} data type). > AFAIK there should be no backward compatibility concerns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)