[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot
[ https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162232#comment-16162232 ]

Mao Geng commented on MESOS-5482:

[~chhsia0] The problem happened when the agent lost its connection to the master and re-registered; no one was actually shutting down Marathon. MESOS-7215 looks like the root cause. When the agent re-registered, it shut down all executors of non-partition-aware frameworks, including the Marathon task. Meanwhile, Marathon tried to launch a new task on the agent, and the agent ignored the launch because it thought the framework was shutting down, so the task got stuck in the "staging" state. Marathon then tried to kill the task because it was overdue on deployment, and the agent ignored that request too. Restarting the agent resolves the issue, though.

> mesos/marathon task stuck in staging after slave reboot
> ---
>
> Key: MESOS-5482
> URL: https://issues.apache.org/jira/browse/MESOS-5482
> Project: Mesos
> Issue Type: Bug
> Reporter: lutful karim
> Labels: tech-debt
> Attachments: marathon-mesos-masters_after-reboot.log,
> mesos-masters_mesos.log, mesos_slaves_after_reboot.log,
> tasks_running_before_rebooot.marathon
>
>
> The main idea of mesos/marathon is to sleep well, but after node reboot mesos
> task gets stuck in staging for about 4 hours.
> To reproduce the issue:
> - setup a mesos cluster in HA mode with systemd enabled mesos-master and
> mesos-slave service.
> - run docker registry (https://hub.docker.com/_/registry/ ) with mesos
> constraint (hostname:LIKE:mesos-slave-1) in one node. Reboot the node and
> notice that task getting stuck in staging.
> Possible workaround: service mesos-slave restart fixes the issue.
> OS: centos 7.2
> mesos version: 0.28.1
> marathon: 1.1.1
> zookeeper: 3.4.8
> docker: 1.9.1 dockerAPIversion: 1.21
> error message:
> May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013
> 909 slave.cpp:2018] Ignoring kill task
> docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor
> 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework
> 8517fcb7-f2d0-47ad-ae02-837570bef929- is terminating/terminated

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-817) CHECK is Future.get() can fail.
[ https://issues.apache.org/jira/browse/MESOS-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154819#comment-16154819 ]

Mao Geng commented on MESOS-817:

Hit a similar check failure when using mesos_http health check recently. Marathon 1.4.3 and Mesos 1.2.0
{code}
F0816 00:40:43.62808276 future.hpp:1104] Check failed: !isPending() Future was in PENDING after await()
*** Check failure stack trace: ***
@ 0x7f055f54d9dd google::LogMessage::Fail()
@ 0x7f055f54f65d google::LogMessage::SendToLog()
@ 0x7f055f54d5a2 google::LogMessage::Flush()
@ 0x7f055f550049 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f055edf5ae1 process::Future<>::get()
@ 0x7f055efb145c ZooKeeper::get()
@ 0x7f055f462d76 mesos::state::ZooKeeperStorageProcess::doGet()
@ 0x7f055f46366d mesos::state::ZooKeeperStorageProcess::get()
@ 0x7f055f469620 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI6OptionIN5mesos8internal5state5EntryEENS6_5state23ZooKeeperStorageProcessERKSsSsEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSJ_FSH_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f055f4db924 process::ProcessManager::resume()
@ 0x7f055f4dbc57 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f0658dfd970 (unknown)
@ 0x7f065a9c6064 start_thread
@ 0x7f065a0cd62d (unknown)
Aborted
I0816 00:40:44.243842 111060 health_checker.cpp:165] Health checking stopped
W0816 00:40:44.243842 111057 logging.cpp:91] RAW: Received signal SIGTERM from process 45015 of user 0; exiting
{code}

> CHECK is Future.get() can fail.
> ---
>
> Key: MESOS-817
> URL: https://issues.apache.org/jira/browse/MESOS-817
> Project: Mesos
> Issue Type: Bug
> Environment: Linux gcc 4.2.1
> Reporter: Jie Yu
> Assignee: Jie Yu
>
> template <typename T>
> T Future<T>::get() const
> {
>   if (!isReady()) {
>     await();
>   }
>   CHECK(!isPending()) << "Future was in PENDING after await()";
>   if (!isReady()) {
>     if (isFailed()) {
>       std::cerr << "Future::get() but state == FAILED: "
>                 << failure() << std::endl;
>     } else if (isDiscarded()) {
>       std::cerr << "Future::get() but state == DISCARDED" << std::endl;
>     }
>     abort();
>   }
>   assert(data->t != NULL);
>   return *data->t;
> }
> This CHECK can fail:
> CHECK(!isPending()) << "Future was in PENDING after await()";
[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot
[ https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121944#comment-16121944 ]

Mao Geng commented on MESOS-5482:

Hit this issue on mesos 1.2.0 and marathon 1.4.3 too. The agent timed out the ping for 75secs, then reconnected
{quote}
I0810 13:18:43.142431 18394 slave.cpp:4378] No pings from master received within 75secs
I0810 13:18:43.142588 18393 slave.cpp:920] Re-detecting master
I0810 13:18:43.142614 18393 slave.cpp:966] Detecting new master
I0810 13:18:43.142674 18407 status_update_manager.cpp:177] Pausing sending status updates
I0810 13:18:43.142755 18420 status_update_manager.cpp:177] Pausing sending status updates
I0810 13:18:43.142813 18415 slave.cpp:931] New master detected at master@10.1.36.4:5050
I0810 13:18:43.142840 18415 slave.cpp:955] No credentials provided. Attempting to register without authentication
I0810 13:18:43.142853 18415 slave.cpp:966] Detecting new master
I0810 13:18:44.431833 18415 slave.cpp:1242] Re-registered with master master@10.1.36.4:5050
I0810 13:18:44.431874 18415 slave.cpp:1279] Forwarding total oversubscribed resources {}
I0810 13:18:44.431895 18398 status_update_manager.cpp:184] Resuming sending status updates
I0810 13:18:44.433912 18386 slave.cpp:2683] Shutting down framework f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:44.433939 18386 slave.cpp:5083] Shutting down executor 'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework f853458f-b07b-4b79-8192-24953f474369- at executor(1)@10.1.98.251:33041
W0810 13:18:44.435637 18440 slave.cpp:2823] Ignoring updating pid for framework f853458f-b07b-4b79-8192-24953f474369- because it is terminating
I0810 13:18:46.878993 18408 slave.cpp:1625] Got assigned task 'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:46.879406 18408 slave.cpp:1785] Launching task 'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework f853458f-b07b-4b79-8192-24953f474369-
W0810 13:18:46.879436 18408 slave.cpp:1853] Ignoring running task 'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' of framework f853458f-b07b-4b79-8192-24953f474369- because the framework is terminating
I0810 13:18:47.613224 18415 slave.cpp:3816] Handling status update TASK_KILLED (UUID: af78fc5c-8552-4aee-abae-cda3d0ec2909) for task metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5 of framework f853458f-b07b-4b79-8192-24953f474369- from executor(1)@10.1.98.251:33041
W0810 13:18:47.613261 18415 slave.cpp:3885] Ignoring status update TASK_KILLED (UUID: af78fc5c-8552-4aee-abae-cda3d0ec2909) for task metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5 of framework f853458f-b07b-4b79-8192-24953f474369- for terminating framework f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:48.618629 18409 slave.cpp:4388] Got exited event for executor(1)@10.1.98.251:33041
I0810 13:18:48.713826 18390 docker.cpp:2358] Executor for container 1f351db2-1011-4244-83c2-1854c44d7b65 has exited
I0810 13:18:48.713850 18390 docker.cpp:2052] Destroying container 1f351db2-1011-4244-83c2-1854c44d7b65
I0810 13:18:48.713892 18390 docker.cpp:2179] Running docker stop on container 1f351db2-1011-4244-83c2-1854c44d7b65
I0810 13:18:48.714363 18411 slave.cpp:4769] Executor 'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework f853458f-b07b-4b79-8192-24953f474369- exited with status 0
I0810 13:18:48.714390 18411 slave.cpp:4869] Cleaning up executor 'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework f853458f-b07b-4b79-8192-24953f474369- at executor(1)@10.1.98.251:33041
I0810 13:18:48.714589 18411 slave.cpp:4957] Cleaning up framework f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:48.714607 18432 gc.cpp:55] Scheduling '/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5/runs/1f351db2-1011-4244-83c2-1854c44d7b65' for gc 6.9173026667days in the future
I0810 13:18:48.714669 18410 status_update_manager.cpp:285] Closing status update streams for framework f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:48.714679 18432 gc.cpp:55] Scheduling '/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' for gc 6.9172979259days in the future
I0810 13:18:48.714709 18432 gc.cpp:55] Scheduling '/mnt/mesos/meta/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5/runs/1f351db2-1011-4244-83c2-1854c44d7b65' for gc 6.9172953778days in the future
I0810 13:18:48.714725 18432 gc.cpp:55] Scheduling '/mnt/mesos/meta/slaves/508bde0b-4661-
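
The sequence in the agent log above (a "Shutting down framework" entry followed by an "Ignoring running task ... because the framework is terminating" warning for the same framework) can be searched for mechanically. The following is only a minimal sketch, assuming the log wording of this particular excerpt; the function name `find_ignored_launches` and both regular expressions are hypothetical helpers, not part of Mesos, and the message text may differ in other Mesos versions.

```python
import re

# Patterns copied from the wording of the log excerpt; other versions may differ.
SHUTDOWN = re.compile(r"Shutting down framework (\S+)")
IGNORED = re.compile(
    r"Ignoring running task '([^']+)' of framework (\S+) "
    r"because the framework is terminating"
)


def find_ignored_launches(lines):
    """Return (task_id, framework_id) pairs ignored after a framework shutdown."""
    terminating = set()
    hits = []
    for line in lines:
        match = SHUTDOWN.search(line)
        if match:
            terminating.add(match.group(1))
            continue
        match = IGNORED.search(line)
        if match and match.group(2) in terminating:
            hits.append((match.group(1), match.group(2)))
    return hits


# Two entries taken from the log excerpt.
sample = [
    "I0810 13:18:44.433912 18386 slave.cpp:2683] Shutting down framework "
    "f853458f-b07b-4b79-8192-24953f474369-",
    "W0810 13:18:46.879436 18408 slave.cpp:1853] Ignoring running task "
    "'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' of framework "
    "f853458f-b07b-4b79-8192-24953f474369- because the framework is terminating",
]
print(find_ignored_launches(sample))
```

Any task reported this way is a candidate for the "stuck in staging" symptom, since the agent never started it.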
[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060492#comment-16060492 ]

Mao Geng commented on MESOS-7692:

[~tillt] [~jieyu] sorry the second issue about tfmesos is a false alarm. I tested again with mesos 1.3.0-2.0.3 multiple times and couldn't reproduce. Mesos can launch the containerizer correctly with what tfmesos specified. Most likely I messed up the test environment previously. Sorry for misleading, and thanks for fixing the environment issue!

> Default environment variables defined in docker image are not available in
> mesos containerizer
> ---
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.3.0
> Reporter: Mao Geng
> Assignee: Till Toenshoff
> Priority: Blocker
> Fix For: 1.3.1
>
>
> Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined
> by ENV statements in dockerfile are not available in mesos containerizer any
> more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image,
> JAVA_HOME of java:8 image, etc.
> The env vars are available in mesos containerizer in 1.2.0. Looks like a
> regression to me, isn't it?
[jira] [Issue Comment Deleted] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng updated MESOS-7692: Comment: was deleted (was: [~tillt] The framework always use {{-C DOCKER}}, however somehow mesos launched a container like this: {code} I0620 20:29:03.457880 70274 containerizer.cpp:1524] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"clone_namespaces":[131072],"command":{"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"},"environment":{"variables":[{"name":"LIBPROCESS_PORT","type":"VALUE","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","type":"VALUE","value":"10.1.160.40:5051"},{"name":"MESOS_CHECKPOINT","type":"VALUE","value":"0"},{"name":"MESOS_DIRECTORY","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"MESOS_EXECUTOR_ID","type":"VALUE","value":"10"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","type":"VALUE","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","type":"VALUE","value":"609ef166-7000-4c8d-a6ed-909e4d504eaa-0049"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","type":"VALUE","value":"0"},{"name":"MESOS_NATIVE_JAVA_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_NATIVE_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_SLAVE_ID","type":"VALUE","value":"f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8"},{"name":"MESOS_SLAVE_PID","type":"VALUE","value":"slave(1)@10.1.160.40:5051"},{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"PYTHONPATH","type":"VALUE","value":"\/:\/usr\/lib\/python2.7:\/usr\/lib\/python2.7\/plat-x86_64
-linux-gnu:\/usr\/lib\/python2.7\/lib-tk:\/usr\/lib\/python2.7\/lib-old:\/usr\/lib\/python2.7\/lib-dynload:\/usr\/local\/lib\/python2.7\/dist-packages:\/usr\/lib\/python2.7\/dist-packages"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}],"user":"root","working_directory":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"}" --pipe_read="29" --pipe_write="30" --runtime_directory="/var/run/mesos/containers/c7e77d1e-e411-4703-805f-10bbf9a0eaf8" --unshare_namespace_mnt="false"' {code} Same framework with {{-C DOCKER}} option can launch docker container correctly with Mesos 1.2.0) > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng >Assignee: Till Toenshoff >Priority: Blocker > Fix For: 1.3.1 > > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng updated MESOS-7692: Comment: was deleted (was: Here is one of the ACCEPT messages sent from framework (capture via tcpdump and formatted): {code} { "accept": { "offer_ids": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854" }, "operations": [ { "launch": { "task_infos": [ { "agent_id": { "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11" }, "command": { "environment": { "variables": [ { "name": "PYTHONPATH", "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages" } ] }, "shell": true, "value": "/usr/bin/python -m tfmesos.server 1 :" }, "container": { "docker": { "image": "", "parameters": [ { "key": "memory-swap", "value": "-1" } ] }, "type": "DOCKER", "volumes": [ { "container_path": "/etc/passwd", "host_path": "/etc/passwd", "mode": "RO" }, { "container_path": "/etc/group", "host_path": "/etc/group", "mode": "RO" } ] }, "name": "/job:worker/task:0", "resources": [ { "name": "cpus", "scalar": { "value": 5.0 }, "type": "SCALAR" }, { "name": "mem", "scalar": { "value": 8192.0 }, "type": "SCALAR" } ], "task_id": { "value": "1" } } ] }, "type": "LAUNCH" } ] }, "framework_id": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-0017" }, "type": "ACCEPT" } {code}) > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng >Assignee: Till Toenshoff >Priority: Blocker > Fix For: 1.3.1 > > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos 
containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060492#comment-16060492 ] Mao Geng edited comment on MESOS-7692 at 6/23/17 7:14 AM: -- [~tillt] [~jieyu] sorry the second issue about tfmesos is a false alarm. I tested again with mesos 1.3.0-2.0.3 multiple times and couldn't reproduce. Mesos can launch the containerizer correctly with what tfmesos specified. Most likely I messed up the test environment previously. Sorry for misleading, and thanks for fixing the environment variable issue! was (Author: gengmao): [~tillt] [~jieyu] sorry the second issue about tfmesos is a false alarm. I tested again with mesos 1.3.0-2.0.3 multiple times and couldn't reproduce. Mesos can launch the containerizer correctly with what tfmesos specified. Most likely I messed up the test environment previously. Sorry for misleading, and thanks for fixing the environment issue! > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng >Assignee: Till Toenshoff >Priority: Blocker > Fix For: 1.3.1 > > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057090#comment-16057090 ] Mao Geng edited comment on MESOS-7692 at 6/21/17 7:34 AM: -- Here is one of the ACCEPT messages sent from framework (capture via tcpdump and formatted): {code} { "accept": { "offer_ids": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854" }, "operations": [ { "launch": { "task_infos": [ { "agent_id": { "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11" }, "command": { "environment": { "variables": [ { "name": "PYTHONPATH", "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages" } ] }, "shell": true, "value": "/usr/bin/python -m tfmesos.server 1 :" }, "container": { "docker": { "image": "", "parameters": [ { "key": "memory-swap", "value": "-1" } ] }, "type": "DOCKER", "volumes": [ { "container_path": "/etc/passwd", "host_path": "/etc/passwd", "mode": "RO" }, { "container_path": "/etc/group", "host_path": "/etc/group", "mode": "RO" } ] }, "name": "/job:worker/task:0", "resources": [ { "name": "cpus", "scalar": { "value": 5.0 }, "type": "SCALAR" }, { "name": "mem", "scalar": { "value": 8192.0 }, "type": "SCALAR" } ], "task_id": { "value": "1" } } ] }, "type": "LAUNCH" } ] }, "framework_id": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-0017" }, "type": "ACCEPT" } {code} was (Author: gengmao): Here is one of the ACCEPT messages sent from framework (capture via tcpdump and formatted): {code} { "accept": { "offer_ids": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854" }, "operations": [ { "launch": { "task_infos": [ { "agent_id": { "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11" }, "command": { "environment": { "variables": [ { "name": "PYTHONPATH", "value": 
"/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages"
[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057090#comment-16057090 ] Mao Geng commented on MESOS-7692: - Here is one of the ACCEPT messages sent from framework (capture via tcpdump and formatted): {code} { "accept": { "offer_ids": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854" }, "operations": [ { "launch": { "task_infos": [ { "agent_id": { "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11" }, "command": { "environment": { "variables": [ { "name": "PYTHONPATH", "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages" } ] }, "shell": true, "value": "/usr/bin/python -m tfmesos.server 1 :" }, "container": { "docker": { "image": "", "parameters": [ { "key": "memory-swap", "value": "-1" } ] }, "type": "DOCKER", "volumes": [ { "container_path": "/etc/passwd", "host_path": "/etc/passwd", "mode": "RO" }, { "container_path": "/etc/group", "host_path": "/etc/group", "mode": "RO" } ] }, "name": "/job:worker/task:0", "resources": [ { "name": "cpus", "scalar": { "value": 5.0 }, "type": "SCALAR" }, { "name": "mem", "scalar": { "value": 8192.0 }, "type": "SCALAR" } ], "task_id": { "value": "1" } } ] }, "framework_id": { "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-0017" }, "type": "ACCEPT" } {code} > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng >Assignee: Till Toenshoff >Priority: Blocker > Fix For: 1.3.1 > > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos 
containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
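
When comparing captured ACCEPT messages like the one above, it can help to extract just the fields under discussion: which container type each task requests, and whether each operation carries a "type" field (the message as posted in this comment has none on its operation, while the edited version of the comment includes "type": "LAUNCH"). The following is only a minimal sketch; `summarize_accept` is a hypothetical helper, the sample is a trimmed-down stand-in for the captured message, and whether the missing field matters to the master is not established here.

```python
import json


def summarize_accept(message):
    """Return (indexes of operations lacking "type", container type per task id)."""
    missing_type = []
    container_types = {}
    operations = message.get("accept", {}).get("operations", [])
    for index, operation in enumerate(operations):
        if "type" not in operation:
            missing_type.append(index)
        for task in operation.get("launch", {}).get("task_infos", []):
            task_id = task["task_id"]["value"]
            # None means the task did not specify a container at all.
            container_types[task_id] = task.get("container", {}).get("type")
    return missing_type, container_types


# Trimmed-down stand-in for the captured message (most fields omitted).
accept = json.loads("""
{
  "type": "ACCEPT",
  "accept": {
    "operations": [
      {
        "launch": {
          "task_infos": [
            {"task_id": {"value": "1"}, "container": {"type": "DOCKER"}}
          ]
        }
      }
    ]
  }
}
""")
print(summarize_accept(accept))
```

Running the same summary over a capture taken on 1.2.0 and one taken on 1.3.0 would show quickly whether the framework's request itself changed between the two.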
[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056888#comment-16056888 ] Mao Geng commented on MESOS-7692: - [~tillt] The framework always use {{-C DOCKER}}, however somehow mesos launched a container like this: {code} I0620 20:29:03.457880 70274 containerizer.cpp:1524] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"clone_namespaces":[131072],"command":{"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"},"environment":{"variables":[{"name":"LIBPROCESS_PORT","type":"VALUE","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","type":"VALUE","value":"10.1.160.40:5051"},{"name":"MESOS_CHECKPOINT","type":"VALUE","value":"0"},{"name":"MESOS_DIRECTORY","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"MESOS_EXECUTOR_ID","type":"VALUE","value":"10"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","type":"VALUE","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","type":"VALUE","value":"609ef166-7000-4c8d-a6ed-909e4d504eaa-0049"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","type":"VALUE","value":"0"},{"name":"MESOS_NATIVE_JAVA_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_NATIVE_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_SLAVE_ID","type":"VALUE","value":"f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8"},{"name":"MESOS_SLAVE_PID","type":"VALUE","value":"slave(1)@10.1.160.40:5051"},{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"PYTHONPATH","type":"VALUE","value":"\/:\/usr\/lib\/python2.7:\/usr\/
lib\/python2.7\/plat-x86_64-linux-gnu:\/usr\/lib\/python2.7\/lib-tk:\/usr\/lib\/python2.7\/lib-old:\/usr\/lib\/python2.7\/lib-dynload:\/usr\/local\/lib\/python2.7\/dist-packages:\/usr\/lib\/python2.7\/dist-packages"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}],"user":"root","working_directory":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"}" --pipe_read="29" --pipe_write="30" --runtime_directory="/var/run/mesos/containers/c7e77d1e-e411-4703-805f-10bbf9a0eaf8" --unshare_namespace_mnt="false"' {code} Same framework with {{-C DOCKER}} option can launch docker container correctly with Mesos 1.2.0 > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng >Assignee: Till Toenshoff >Priority: Blocker > Fix For: 1.3.1 > > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056740#comment-16056740 ]

Mao Geng commented on MESOS-7692:

[~tillt] I found another issue today; I'm not sure whether it has the same root cause. Tfmesos, a framework based on the HTTP Scheduler API that runs TensorFlow in Docker on Mesos, can no longer submit tasks to the Docker containerizer. Previously I could run https://github.com/douban/tfmesos/blob/master/script/tfrun#L17 with {{-C DOCKER}}, which launched tasks via the Docker containerizer. On Mesos 1.3.0-2.0.3, however, the tasks were launched by the Mesos containerizer even with the same option and code. Could you please help troubleshoot? Should I open a new ticket for that?

> Default environment variables defined in docker image are not available in
> mesos containerizer
> ---
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.3.0
> Reporter: Mao Geng
> Assignee: Till Toenshoff
> Priority: Blocker
> Fix For: 1.3.1
>
>
> Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined
> by ENV statements in dockerfile are not available in mesos containerizer any
> more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image,
> JAVA_HOME of java:8 image, etc.
> The env vars are available in mesos containerizer in 1.2.0. Looks like a
> regression to me, isn't it?
[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053320#comment-16053320 ]

Mao Geng commented on MESOS-7692:

The problem also occurs with Mesos containers submitted via Marathon (tested with 1.4.3), not just with mesos-execute.

> Default environment variables defined in docker image are not available in
> mesos containerizer
> ---
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.3.0
> Reporter: Mao Geng
> Priority: Blocker
>
> Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined
> by ENV statements in dockerfile are not available in mesos containerizer any
> more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image,
> JAVA_HOME of java:8 image, etc.
> The env vars are available in mesos containerizer in 1.2.0. Looks like a
> regression to me, isn't it?
[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053269#comment-16053269 ] Mao Geng edited comment on MESOS-7692 at 6/18/17 5:48 PM: -- Tested on a host with following command: {code} /usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 --command="env" {code} Output of the task: {code}Executing pre-exec command '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}' Executing pre-exec command '{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}' Received SUBSCRIBED event Subscribed executor on Received LAUNCH event Starting task java8 Running '/usr/libexec/mesos/mesos-containerizer launch ' Forked command at 122347 Changing root to /mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46 MESOS_EXECUTOR_ID=java8 MESOS_CHECKPOINT=0 MESOS_HTTP_COMMAND_EXECUTOR=0 MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs LIBPROCESS_PORT=0 MESOS_AGENT_ENDPOINT=10.1.100.89:5051 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin MESOS_SANDBOX=/mnt/mesos/sandbox MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005 MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3 
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76 PWD=/mnt/mesos/sandbox MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051 Command exited with status 0 (pid: 122347) Received SHUTDOWN event Shutting down{code} Package version: {code}apt-cache policy mesos mesos: Installed: 1.3.0-2.0.3 Candidate: 1.3.0-2.0.3 Version table: *** 1.3.0-2.0.3 0 500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages 100 /var/lib/dpkg/status{code} was (Author: gengmao): Tested on a host with following command: {code} /usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 --command="env" {code} Output of the task: {code}Executing pre-exec command '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}' Executing pre-exec command '{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}' Received SUBSCRIBED event Subscribed executor on Received LAUNCH event Starting task java8 Running '/usr/libexec/mesos/mesos-containerizer launch ' Forked command at 122347 Changing root to /mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46 MESOS_EXECUTOR_ID=java8 MESOS_CHECKPOINT=0 MESOS_HTTP_COMMAND_EXECUTOR=0 MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs LIBPROCESS_PORT=0 MESOS_AGENT_ENDPOINT=10.1.100.89:5051 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin MESOS_SANDBOX=/mnt/mesos/sandbox 
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005 MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3 MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76 PWD=/mnt/mesos/sandbox MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051 Command exited with status 0 (pid: 122347) Received SHUTDOWN event Shutting down{code} Package version: {quote}apt-cache policy mesos mesos: Installed: 1.3.0-2.0.3 Candidate: 1.3.0-2.0.3 Version table: *** 1.3.0-2.0.3 0 500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages 100 /var/lib/dpkg/status{quote} > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 >
[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053269#comment-16053269 ] Mao Geng commented on MESOS-7692: - Tested on a host with following command: {code} /usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 --command="env" {code} Output of the task: {quote}Executing pre-exec command '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}' Executing pre-exec command '{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}' Received SUBSCRIBED event Subscribed executor on Received LAUNCH event Starting task java8 Running '/usr/libexec/mesos/mesos-containerizer launch ' Forked command at 122347 Changing root to /mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46 MESOS_EXECUTOR_ID=java8 MESOS_CHECKPOINT=0 MESOS_HTTP_COMMAND_EXECUTOR=0 MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs LIBPROCESS_PORT=0 MESOS_AGENT_ENDPOINT=10.1.100.89:5051 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin MESOS_SANDBOX=/mnt/mesos/sandbox MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005 MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3 
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76 PWD=/mnt/mesos/sandbox MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051 Command exited with status 0 (pid: 122347) Received SHUTDOWN event Shutting down{quote} Package version: {quote}apt-cache policy mesos mesos: Installed: 1.3.0-2.0.3 Candidate: 1.3.0-2.0.3 Version table: *** 1.3.0-2.0.3 0 500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages 100 /var/lib/dpkg/status{quote} > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng >Priority: Blocker > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
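A quick way to confirm the symptom reported in MESOS-7692 is to scan a task's `env` output for the variables the image's ENV statements should have set. The following is only a sketch in Python, not Mesos code: the helper name `missing_env_vars` and the shortened sample text are illustrative assumptions, and JAVA_HOME is the variable the report names for the java:8 image.

```python
def missing_env_vars(env_output, expected_names):
    """Return the expected variable names that do not appear in `env` output."""
    present = set()
    for line in env_output.splitlines():
        # Each line of `env` output has the form NAME=VALUE.
        name, separator, _value = line.partition("=")
        if separator:
            present.add(name)
    return [name for name in expected_names if name not in present]


# Shortened sample based on the task output quoted in the comment above.
sample_output = (
    "MESOS_EXECUTOR_ID=java8\n"
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n"
    "MESOS_SANDBOX=/mnt/mesos/sandbox\n"
    "PWD=/mnt/mesos/sandbox\n"
)

print(missing_env_vars(sample_output, ["JAVA_HOME", "PATH"]))  # prints ['JAVA_HOME']
```

Running the same check against the output of a 1.2.0 agent would be one way to support the claim that this is a regression.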
[jira] [Updated] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng updated MESOS-7692: Description: Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined by ENV statements in dockerfile are not available in mesos containerizer any more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, JAVA_HOME of java:8 image, etc. The env vars are available in mesos containerizer in 1.2.0. Looks like a regression to me, isn't it? was: Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined by ENV statements in dockerfile are not available in mesos containerizer any more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, JAVA_HOME of java:8 image, etc. The env vars are in mesos containerizer in 1.2.0. Looks like a regression to me, isn't it? > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are available in mesos containerizer in 1.2.0. Looks like a > regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer any more
Mao Geng created MESOS-7692: --- Summary: Default environment variables defined in docker image are not available in mesos containerizer any more Key: MESOS-7692 URL: https://issues.apache.org/jira/browse/MESOS-7692 Project: Mesos Issue Type: Bug Components: containerization Affects Versions: 1.3.0 Reporter: Mao Geng Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined by ENV statements in dockerfile are not available in mesos containerizer any more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, JAVA_HOME of java:8 image, etc. The env vars are in mesos containerizer in 1.2.0. Looks like a regression to me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng updated MESOS-7692: Summary: Default environment variables defined in docker image are not available in mesos containerizer (was: Default environment variables defined in docker image are not available in mesos containerizer any more) > Default environment variables defined in docker image are not available in > mesos containerizer > -- > > Key: MESOS-7692 > URL: https://issues.apache.org/jira/browse/MESOS-7692 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.3.0 >Reporter: Mao Geng > > Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined > by ENV statements in dockerfile are not available in mesos containerizer any > more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, > JAVA_HOME of java:8 image, etc. > The env vars are in mesos containerizer in 1.2.0. Looks like a regression to > me, isn't it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7522) Mesos containerizer to support docker credential helpers for private docker registries
[ https://issues.apache.org/jira/browse/MESOS-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng reassigned MESOS-7522: --- Assignee: Mao Geng > Mesos containerizer to support docker credential helpers for private docker > registries > -- > > Key: MESOS-7522 > URL: https://issues.apache.org/jira/browse/MESOS-7522 > Project: Mesos > Issue Type: Wish > Components: containerization >Reporter: Mao Geng >Assignee: Mao Geng > Labels: mesos-containerizer > > At Pinterest, we use Amazon ECR as our docker registry and use > https://github.com/awslabs/amazon-ecr-credential-helper to let the docker engine > get an auth token automatically. > It works well with the docker containerizer, as long as "credsStore" is set in > .docker/config.json and --docker_config is configured > for mesos-agent. > However, this doesn't work for the mesos containerizer. Meanwhile we want to use > the mesos containerizer's GPU support, so we have to run a separate docker > registry over http and without auth, purely for the mesos containerizer. > I think it would be good if the mesos containerizer supported > https://github.com/docker/docker-credential-helpers by default, since that > would address a pain point for users who rely on credential helpers > with private registries on ECR, GCR, quay, dockerhub etc. > This might be related to MESOS-7088 > CC [~jieyu] [~gilbert] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7522) Mesos containerizer to support docker credential helpers for private docker registries
[ https://issues.apache.org/jira/browse/MESOS-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng updated MESOS-7522: Description: At Pinterest, we use Amazon ECR as our docker registry and use https://github.com/awslabs/amazon-ecr-credential-helper to let the docker engine get an auth token automatically. It works well with the docker containerizer, as long as "credsStore" is set in .docker/config.json and --docker_config is configured for mesos-agent. However, this doesn't work for the mesos containerizer. Meanwhile we want to use the mesos containerizer's GPU support, so we have to run a separate docker registry over http and without auth, purely for the mesos containerizer. I think it would be good if the mesos containerizer supported https://github.com/docker/docker-credential-helpers by default, since that would address a pain point for users who rely on credential helpers with private registries on ECR, GCR, quay, dockerhub etc. This might be related to MESOS-7088 CC [~jieyu] [~gilbert] was: At Pinterest, we use Amazon ECR as our docker registry and use https://github.com/awslabs/amazon-ecr-credential-helper to let the docker engine get an auth token automatically. It works well with the docker containerizer, as long as "credsStore" is set in .docker/config.json and --docker_config is configured for mesos-agent. However, this doesn't work for the mesos containerizer. Meanwhile we want to use the mesos containerizer's GPU support, so we have to run a separate docker registry over http and without auth, purely for the mesos containerizer. I think it would be good if the mesos containerizer supported https://github.com/docker/docker-credential-helpers by default, since that would address a pain point for users who rely on credential helpers with private registries on ECR, GCR, quay, dockerhub etc. 
This might be related to MESOS-7088 > Mesos containerizer to support docker credential helpers for private docker > registries > -- > > Key: MESOS-7522 > URL: https://issues.apache.org/jira/browse/MESOS-7522 > Project: Mesos > Issue Type: Wish > Components: containerization >Reporter: Mao Geng > Labels: mesos-containerizer > > At Pinterest, we use Amazon ECR as our docker registry and use > https://github.com/awslabs/amazon-ecr-credential-helper to let the docker engine > get an auth token automatically. > It works well with the docker containerizer, as long as "credsStore" is set in > .docker/config.json and --docker_config is configured > for mesos-agent. > However, this doesn't work for the mesos containerizer. Meanwhile we want to use > the mesos containerizer's GPU support, so we have to run a separate docker > registry over http and without auth, purely for the mesos containerizer. > I think it would be good if the mesos containerizer supported > https://github.com/docker/docker-credential-helpers by default, since that > would address a pain point for users who rely on credential helpers > with private registries on ECR, GCR, quay, dockerhub etc. > This might be related to MESOS-7088 > CC [~jieyu] [~gilbert] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7522) Mesos containerizer to support docker credential helpers for private docker registries
Mao Geng created MESOS-7522: --- Summary: Mesos containerizer to support docker credential helpers for private docker registries Key: MESOS-7522 URL: https://issues.apache.org/jira/browse/MESOS-7522 Project: Mesos Issue Type: Wish Components: containerization Reporter: Mao Geng At Pinterest, we use Amazon ECR as our docker registry and use https://github.com/awslabs/amazon-ecr-credential-helper to let the docker engine get an auth token automatically. It works well with the docker containerizer, as long as "credsStore" is set in .docker/config.json and --docker_config is configured for mesos-agent. However, this doesn't work for the mesos containerizer. Meanwhile we want to use the mesos containerizer's GPU support, so we have to run a separate docker registry over http and without auth, purely for the mesos containerizer. I think it would be good if the mesos containerizer supported https://github.com/docker/docker-credential-helpers by default, since that would address a pain point for users who rely on credential helpers with private registries on ECR, GCR, quay, dockerhub etc. This might be related to MESOS-7088 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (MESOS-7088) Support private registry credential per container.
[ https://issues.apache.org/jira/browse/MESOS-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966894#comment-15966894 ] Mao Geng edited comment on MESOS-7088 at 4/13/17 12:04 AM: --- [~gilbert] Would you consider supporting https://github.com/docker/docker-credential-helpers in general? We are using a helper for ECR (https://github.com/awslabs/amazon-ecr-credential-helper). Basically docker-engine can invoke a helper with one of four commands ("erase", "store", "get", "list"), send the request via stdin, and read the response from stdout. To pull an image, issuing the "get" command to obtain the auth should be good enough. I would be glad to see this considered in the design doc, as many users leverage credential helpers to use private registries in clouds. Thanks was (Author: gengmao): [~gilbert] Would you consider supporting https://github.com/docker/docker-credential-helpers in general? We are using a helper for ECR (https://github.com/awslabs/amazon-ecr-credential-helper). Basically docker-engine can invoke a helper with one of four commands ("erase", "store", "get", "list"), send the request via stdin, and read the response from stdout. To pull an image, issuing the "get" command to obtain the auth should be good enough. I would be glad to see this considered in the design doc, as many users leverage credential helpers to use private registries in clouds. > Support private registry credential per container. > -- > > Key: MESOS-7088 > URL: https://issues.apache.org/jira/browse/MESOS-7088 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7088) Support private registry credential per container.
[ https://issues.apache.org/jira/browse/MESOS-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966894#comment-15966894 ] Mao Geng commented on MESOS-7088: - [~gilbert] Would you consider supporting https://github.com/docker/docker-credential-helpers in general? We are using a helper for ECR (https://github.com/awslabs/amazon-ecr-credential-helper). Basically docker-engine can invoke a helper with one of four commands ("erase", "store", "get", "list"), send the request via stdin, and read the response from stdout. To pull an image, issuing the "get" command to obtain the auth should be good enough. I would be glad to see this considered in the design doc, as many users leverage credential helpers to use private registries in clouds. > Support private registry credential per container. > -- > > Key: MESOS-7088 > URL: https://issues.apache.org/jira/browse/MESOS-7088 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: containerizer > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
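For reference, a minimal Python sketch of the "get" exchange mentioned in the comment above. It assumes the docker-credential-helpers convention as I understand it: the helper binary is named docker-credential-<store>, the action is passed as a command-line argument, the registry server URL is written to stdin, and a JSON object with ServerURL, Username and Secret is printed on stdout. The function names are hypothetical and none of this is Mesos code.

```python
import json
import subprocess


def helper_command(store_name, action="get"):
    """Build the argv for a helper, e.g. ['docker-credential-ecr-login', 'get']."""
    return ["docker-credential-" + store_name, action]


def parse_get_response(stdout_text):
    """Extract (username, secret) from the JSON a helper prints for "get"."""
    payload = json.loads(stdout_text)
    return payload["Username"], payload["Secret"]


def get_credentials(store_name, server_url):
    """Ask the helper for credentials; the helper binary must be on PATH."""
    result = subprocess.run(
        helper_command(store_name),
        input=server_url,      # the registry URL is the request payload on stdin
        capture_output=True,
        text=True,
        check=True,            # raise if the helper exits with a non-zero status
    )
    return parse_get_response(result.stdout)
```

On a host with the ECR helper installed, `get_credentials("ecr-login", registry_url)` would be the only call a puller needs before fetching an image, which is why the comment argues that "get" alone is sufficient.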
[jira] [Assigned] (MESOS-7232) Add support to auto-load /dev/nvidia-uvm in the GPU isolator
[ https://issues.apache.org/jira/browse/MESOS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng reassigned MESOS-7232: --- Assignee: Kevin Klues (was: Mao Geng) > Add support to auto-load /dev/nvidia-uvm in the GPU isolator > > > Key: MESOS-7232 > URL: https://issues.apache.org/jira/browse/MESOS-7232 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Kevin Klues >Assignee: Kevin Klues > > Loading /dev/nvidia-uvm (and installing a script to make sure it loads on > reboot) is not technically part of the official Nvidia driver installation > process. The rationale being that CUDA applications typically load this > device on-demand if they need it. Unfortunately, it can't load it if mesos > hasn't made it available to the container running the CUDA application though. > We should add support to have the mesos agent auto-load this device when > running the GPU isolator. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-7232) Add support to auto-load /dev/nvidia-uvm in the GPU isolator
[ https://issues.apache.org/jira/browse/MESOS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng reassigned MESOS-7232: --- Assignee: Mao Geng (was: Kevin Klues) > Add support to auto-load /dev/nvidia-uvm in the GPU isolator > > > Key: MESOS-7232 > URL: https://issues.apache.org/jira/browse/MESOS-7232 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Kevin Klues >Assignee: Mao Geng > > Loading /dev/nvidia-uvm (and installing a script to make sure it loads on > reboot) is not technically part of the official Nvidia driver installation > process. The rationale being that CUDA applications typically load this > device on-demand if they need it. Unfortunately, it can't load it if mesos > hasn't made it available to the container running the CUDA application though. > We should add support to have the mesos agent auto-load this device when > running the GPU isolator. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems
[ https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516903#comment-15516903 ] Mao Geng commented on MESOS-5909: - [~kaysoky] Thanks for shepherding. I addressed your review comments in https://reviews.apache.org/r/52048/, can you please check? > Stout "OsTest.User" test can fail on some systems > - > > Key: MESOS-5909 > URL: https://issues.apache.org/jira/browse/MESOS-5909 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Kapil Arya >Assignee: Mao Geng > Labels: mesosphere > Attachments: MESOS-5909-fix.diff > > > The libc call {{getgrouplist}} doesn't return the {{gid}} list in sorted order > (in my case, it's returning "471 100") ... whereas {{id -G}} returns a sorted > list ("100 471" in my case), causing the validation inside the loop to fail. > We should sort both lists before comparing the values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
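The proposed fix (sort both lists before comparing) can be illustrated with a few lines of Python. This is only a sketch of the idea, not the stout test itself, which is C++; the function name `same_group_ids` is made up for the example, and the values are the ones quoted in the description.

```python
def same_group_ids(from_getgrouplist, from_id_command):
    """Compare two gid lists while ignoring the order they were returned in."""
    return sorted(from_getgrouplist) == sorted(from_id_command)


# getgrouplist returned "471 100" while `id -G` printed "100 471".
print([471, 100] == [100, 471])                # prints False
print(same_group_ids([471, 100], [100, 471]))  # prints True
```

Sorting rather than converting to sets keeps the comparison strict about duplicates, which seems the safer default for a test.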
[jira] [Commented] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper
[ https://issues.apache.org/jira/browse/MESOS-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505909#comment-15505909 ] Mao Geng commented on MESOS-6169: - Sorry. The error only occurs when the .docker/config.json has no "auths". I corrected the description accordingly. When I set up config.json with "auths" like below (and restarted the mesos agent), it actually works well with https://github.com/awslabs/amazon-ecr-credential-helper. {code} { "credsStore": "ecr-login", "auths": { ".dkr.ecr.us-east-1.amazonaws.com": { } } } {code} > --docker_config doesn't work with amazon-ecr-credential-helper > -- > > Key: MESOS-6169 > URL: https://issues.apache.org/jira/browse/MESOS-6169 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.0 >Reporter: Mao Geng >Assignee: Gilbert Song > > We are using AWS ECR as our docker registry and using > https://github.com/awslabs/amazon-ecr-credential-helper to get credentials > automatically. > As amazon-ecr-credential-helper requires, we set up a .docker/config.json file > like below: > {code} > { > "credsStore": "ecr-login" > } > {code} > Based on the "credsStore" field, the docker engine invokes the > "docker-credential-ecr-login" command (which we've installed into /usr/bin/) > to get registry credentials whenever required, for example when executing > docker pull/push. > This works fine when we tar the .docker/config.json and use the uris parameter > to pull the tar.gz file for every task using a docker image. > But when I try the new --docker_config option, it doesn't work. The task > fails to pull the image from ECR. 
The error message is: > {code} > Failed to launch container: Failed to run 'docker -H > unix:///var/run/docker.sock pull > .dkr.ecr.us-east-1.amazonaws.com/:latest': exited > with status 1; stderr='WARNING: Error loading config > file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: > authentication required ' > {code} > I checked the source at > https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't > understand why the above error message mentions loading a temp .dockercfg file, > which doesn't exist btw. I assume mesos should pull the image using the > config.json file I passed to --docker_config, right? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
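Since the failure turned out to depend on whether config.json carries an "auths" section, a pre-flight check on the file passed to --docker_config could catch the problem before any task is launched. The sketch below is a Python illustration under that assumption, not part of Mesos; the function name `cred_store_without_auths` is hypothetical, and treating an empty "auths" object the same as a missing one is my own guess.

```python
import json


def cred_store_without_auths(config_text):
    """True if the config names a credsStore but has no (or an empty) "auths"."""
    config = json.loads(config_text)
    # Assumption: an empty "auths" object is as unhelpful as a missing one.
    return "credsStore" in config and not config.get("auths")


failing = '{"credsStore": "ecr-login"}'
working = '{"credsStore": "ecr-login", "auths": {".dkr.ecr.us-east-1.amazonaws.com": {}}}'

print(cred_store_without_auths(failing))  # prints True
print(cred_store_without_auths(working))  # prints False
```

The two sample strings mirror the configurations quoted in the description and in the comment, including the elided registry host.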
[jira] [Updated] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper
[ https://issues.apache.org/jira/browse/MESOS-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mao Geng updated MESOS-6169: Description: We are using AWS ECR as our docker registry and using https://github.com/awslabs/amazon-ecr-credential-helper to get credentials automatically. As amazon-ecr-credential-helper requires, we set up a .docker/config.json file like below: {code} { "credsStore": "ecr-login" } {code} Based on the "credsStore" field, the docker engine invokes the "docker-credential-ecr-login" command (which we've installed into /usr/bin/) to get registry credentials whenever required, for example when executing docker pull/push. This works fine when we tar the .docker/config.json and use the uris parameter to pull the tar.gz file for every task using a docker image. But when I try the new --docker_config option, it doesn't work. The task fails to pull the image from ECR. The error message is: {code} Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock pull .dkr.ecr.us-east-1.amazonaws.com/:latest': exited with status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: authentication required ' {code} I checked the source at https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't understand why the above error message mentions loading a temp .dockercfg file, which doesn't exist btw. I assume mesos should pull the image using the config.json file I passed to --docker_config, right? was: We are using AWS ECR as our docker registry and using https://github.com/awslabs/amazon-ecr-credential-helper to get credentials automatically. 
As amazon-ecr-credential-helper requires, we set up a .docker/config.json file like below: {code} { "credsStore": "ecr-login", "auths": { ".dkr.ecr.us-east-1.amazonaws.com": { } } } {code} Based on the "credsStore" field, the docker engine invokes the "docker-credential-ecr-login" command (which we've installed into /usr/bin/) to get registry credentials whenever required, for example when executing docker pull/push. This works fine when we tar the .docker/config.json and use the uris parameter to pull the tar.gz file for every task using a docker image. But when I try the new --docker_config option, it doesn't work. The task fails to pull the image from ECR. The error message is: {code} Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock pull .dkr.ecr.us-east-1.amazonaws.com/:latest': exited with status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: authentication required ' {code} I checked the source at https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't understand why the above error message mentions loading a temp .dockercfg file, which doesn't exist btw. I assume mesos should pull the image using the config.json file I passed to --docker_config, right? > --docker_config doesn't work with amazon-ecr-credential-helper > -- > > Key: MESOS-6169 > URL: https://issues.apache.org/jira/browse/MESOS-6169 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.0 >Reporter: Mao Geng >Assignee: Gilbert Song > > We are using AWS ECR as our docker registry and using > https://github.com/awslabs/amazon-ecr-credential-helper to get credentials > automatically. 
> As amazon-ecr-credential-helper requires, we set up a .docker/config.json file > like below: > {code} > { > "credsStore": "ecr-login" > } > {code} > Based on the "credsStore" field, the docker engine invokes the > "docker-credential-ecr-login" command (which we've installed into /usr/bin/) > to get registry credentials whenever required, for example when executing > docker pull/push. > This works fine when we tar the .docker/config.json and use the uris parameter > to pull the tar.gz file for every task using a docker image. > But when I try the new --docker_config option, it doesn't work. The task > fails to pull the image from ECR. The error message is: > {code} > Failed to launch container: Failed to run 'docker -H > unix:///var/run/docker.sock pull > .dkr.ecr.us-east-1.amazonaws.com/:latest': exited > with status 1; stderr='WARNING: Error loading config > file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: > authentication required ' > {code} > I checked the source at > https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't > understand why the above error message mentions loading a temp .dockercfg file, > which doesn't exist btw. I assume mesos should pull the image using the > config.json file I passed to --docker_config, right?
[jira] [Updated] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper
[ https://issues.apache.org/jira/browse/MESOS-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mao Geng updated MESOS-6169:

Description:

We are using AWS ECR as our docker registry and https://github.com/awslabs/amazon-ecr-credential-helper to get credentials automatically.

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file like the one below:

{code}
{
  "credsStore": "ecr-login"
}
{code}

According to the "credsStore" field, the docker engine will invoke a "docker-credential-ecr-login" command (which we've installed into /usr/bin/) to get registry credentials whenever required, for example when executing docker pull/push.

This works fine when we tar the .docker/config.json and use the uris parameter to pull the tar.gz file for every task that uses a docker image.

But when I try the new --docker_config option, it doesn't work. The task failed to pull the image from ECR. The error message is:

{code}
Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock pull .dkr.ecr.us-east-1.amazonaws.com/:latest': exited with status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but I don't understand why the above error message refers to loading a temp .dockercfg file, which doesn't exist btw. I assume mesos should pull the image using the config.json file I set with --docker_config, right?

was:

We are using AWS ECR as our docker registry and https://github.com/awslabs/amazon-ecr-credential-helper to get credentials automatically.

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file like the one below:

{code}
{
  "credsStore": "ecr-login"
{code}

According to the "credsStore" field, the docker engine will invoke a "docker-credential-ecr-login" command (which we've installed into /usr/bin/) to get registry credentials whenever required, for example when executing docker pull/push.

This works fine when we tar the .docker/config.json and use the uris parameter to pull the tar.gz file for every task that uses a docker image.

But when I try the new --docker_config option, it doesn't work. The task failed to pull the image from ECR. The error message is:

{code}
Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock pull .dkr.ecr.us-east-1.amazonaws.com/:latest': exited with status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but I don't understand why the above error message refers to loading a temp .dockercfg file, which doesn't exist btw. I assume mesos should pull the image using the config.json file I set with --docker_config, right?

> --docker_config doesn't work with amazon-ecr-credential-helper
>
> Key: MESOS-6169
> URL: https://issues.apache.org/jira/browse/MESOS-6169
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 1.0.0
> Reporter: Mao Geng
> Assignee: Gilbert Song

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems
[ https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504625#comment-15504625 ]

Mao Geng commented on MESOS-5909:

Got it. Thanks! I look forward to addressing the review comments.

> Stout "OsTest.User" test can fail on some systems
>
> Key: MESOS-5909
> URL: https://issues.apache.org/jira/browse/MESOS-5909
> Project: Mesos
> Issue Type: Bug
> Components: stout
> Reporter: Kapil Arya
> Assignee: Gilbert Song
> Labels: mesosphere
> Attachments: MESOS-5909-fix.diff
>
> The libc call {{getgrouplist}} doesn't return the {{gid}} list in sorted order (in my case, it returns "471 100"), whereas {{id -G}} returns a sorted list ("100 471" in my case), causing the validation inside the loop to fail.
> We should sort both lists before comparing the values.
[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems
[ https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504307#comment-15504307 ]

Mao Geng commented on MESOS-5909:

Thanks [~kaysoky]! I created https://reviews.apache.org/r/52048/ and added [~gilbert] and [~karya] as reviewers. May I add you as a reviewer too?

> Stout "OsTest.User" test can fail on some systems
>
> Key: MESOS-5909
> URL: https://issues.apache.org/jira/browse/MESOS-5909
> Project: Mesos
> Issue Type: Bug
> Components: stout
> Reporter: Kapil Arya
> Assignee: Gilbert Song
> Labels: mesosphere
> Attachments: MESOS-5909-fix.diff
>
> The libc call {{getgrouplist}} doesn't return the {{gid}} list in sorted order (in my case, it returns "471 100"), whereas {{id -G}} returns a sorted list ("100 471" in my case), causing the validation inside the loop to fail.
> We should sort both lists before comparing the values.
[jira] [Updated] (MESOS-5909) Stout "OsTest.User" test can fail on some systems
[ https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mao Geng updated MESOS-5909:

Attachment: MESOS-5909-fix.diff

I just hit this error when running `make check`. I figured out a fix to os_tests.cpp (as [~karya] said, sort both lists, then compare). See the attached diff.

I am new to the Mesos community. If I want to submit a patch, should I create a PR on https://github.com/apache/mesos (I saw someone did), or just follow http://mesos.apache.org/documentation/latest/submitting-a-patch/ to create a review using post-reviews.py?

cc [~karya] [~gilbert]

> Stout "OsTest.User" test can fail on some systems
>
> Key: MESOS-5909
> URL: https://issues.apache.org/jira/browse/MESOS-5909
> Project: Mesos
> Issue Type: Bug
> Components: stout
> Reporter: Kapil Arya
> Assignee: Gilbert Song
> Labels: mesosphere
> Attachments: MESOS-5909-fix.diff
>
> The libc call {{getgrouplist}} doesn't return the {{gid}} list in sorted order (in my case, it returns "471 100"), whereas {{id -G}} returns a sorted list ("100 471" in my case), causing the validation inside the loop to fail.
> We should sort both lists before comparing the values.
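The approach discussed in this thread (sort both gid lists, then compare) can be sketched roughly as follows. This is only an illustration of the idea, not the contents of the attached MESOS-5909-fix.diff; the helper name `sameGroups` and its parameter names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

#include <sys/types.h>  // gid_t

// Hypothetical helper (not taken from the attached patch): returns true
// when both lists hold the same gids, regardless of reporting order.
// The vectors are passed by value so the callers' copies stay untouched.
bool sameGroups(std::vector<gid_t> fromLibc, std::vector<gid_t> fromIdCommand)
{
  std::sort(fromLibc.begin(), fromLibc.end());
  std::sort(fromIdCommand.begin(), fromIdCommand.end());

  // After sorting, element-wise equality is order-insensitive equality.
  return fromLibc == fromIdCommand;
}
```

With the values quoted in the issue, `sameGroups({471, 100}, {100, 471})` evaluates to true, whereas an element-by-element comparison of the unsorted lists would fail.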
[jira] [Created] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper
Mao Geng created MESOS-6169:

Summary: --docker_config doesn't work with amazon-ecr-credential-helper
Key: MESOS-6169
URL: https://issues.apache.org/jira/browse/MESOS-6169
Project: Mesos
Issue Type: Bug
Components: docker
Affects Versions: 1.0.0
Reporter: Mao Geng

We are using AWS ECR as our docker registry and https://github.com/awslabs/amazon-ecr-credential-helper to get credentials automatically.

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file like the one below:

{code}
{
  "credsStore": "ecr-login",
  "auths": {
    ".dkr.ecr.us-east-1.amazonaws.com": { }
  }
}
{code}

According to the "credsStore" field, the docker engine will invoke a "docker-credential-ecr-login" command (which we've installed into /usr/bin/) to get registry credentials whenever required, for example when executing docker pull/push.

This works fine when we tar the .docker/config.json and use the uris parameter to pull the tar.gz file for every task that uses a docker image.

But when I try the new --docker_config option, it doesn't work. The task failed to pull the image from ECR. The error message is:

{code}
Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock pull .dkr.ecr.us-east-1.amazonaws.com/:latest': exited with status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but I don't understand why the above error message refers to loading a temp .dockercfg file, which doesn't exist btw. I assume mesos should pull the image using the config.json file I set with --docker_config, right?
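The "credsStore" behaviour described in this report follows Docker's naming convention for credential helpers: the helper binary is named `docker-credential-` followed by the configured value. A minimal sketch of that naming rule, assuming a hypothetical function name `credentialHelperName`; it does not show how Docker or Mesos actually load the config file.

```cpp
#include <string>

// Hypothetical helper for illustration only: builds the name of the
// credential helper binary from the "credsStore" value in config.json.
std::string credentialHelperName(const std::string& credsStore)
{
  return "docker-credential-" + credsStore;
}
```

For the config in this report, `credentialHelperName("ecr-login")` yields `docker-credential-ecr-login`, the command the reporter installed into /usr/bin/.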