[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot

2017-09-11 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162232#comment-16162232
 ] 

Mao Geng commented on MESOS-5482:
-

[~chhsia0] the problem happened when the agent lost its connection to the 
master and re-registered; no one was actually shutting down marathon. 
MESOS-7215 looks like the root cause. 
When the agent re-registered, it shut down all executors of non 
partition-aware frameworks, including the marathon task. Meanwhile marathon 
tried to launch a new task on the agent, and the agent ignored the task 
because it thought the framework was shutting down, hence the task got stuck 
in the "staging" state. Then marathon tried to kill the task because it was 
overdue on deployment, and the agent ignored that request too. 
Restarting the agent resolves the issue, though.
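
A rough sketch of that sequence as a toy Python model, in case it helps to see 
why both the launch and the kill get dropped. This is not Mesos code: the class 
and method names (ToyAgent, reregister, run_task, kill_task) are made up for 
illustration, and the model only captures the "framework is terminating" check 
described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class FrameworkState(Enum):
    RUNNING = "running"
    TERMINATING = "terminating"


@dataclass
class ToyAgent:
    """Toy model of the agent behaviour described above; not Mesos code."""
    framework_state: FrameworkState = FrameworkState.RUNNING
    running_tasks: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def reregister(self, partition_aware: bool) -> None:
        # Assumption from the comment: executors of frameworks that are not
        # partition-aware are shut down when the agent re-registers.
        if not partition_aware:
            self.framework_state = FrameworkState.TERMINATING
            self.running_tasks.clear()
            self.log.append("Shutting down framework")

    def run_task(self, task_id: str) -> bool:
        # A launch that arrives while the framework is terminating is dropped.
        if self.framework_state is FrameworkState.TERMINATING:
            self.log.append(f"Ignoring running task '{task_id}'")
            return False
        self.running_tasks.add(task_id)
        return True

    def kill_task(self, task_id: str) -> bool:
        # The later kill is dropped for the same reason.
        if self.framework_state is FrameworkState.TERMINATING:
            self.log.append(f"Ignoring kill task '{task_id}'")
            return False
        self.running_tasks.discard(task_id)
        return True


agent = ToyAgent()
agent.reregister(partition_aware=False)
launched = agent.run_task("new-task")  # dropped, so the scheduler keeps seeing "staging"
killed = agent.kill_task("new-task")   # dropped as well, so the task stays stuck
```

Both calls return False in this model, which matches the two "Ignoring ..." 
warnings seen in the agent log.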

> mesos/marathon task stuck in staging after slave reboot
> ---
>
> Key: MESOS-5482
> URL: https://issues.apache.org/jira/browse/MESOS-5482
> Project: Mesos
>  Issue Type: Bug
>Reporter: lutful karim
>  Labels: tech-debt
> Attachments: marathon-mesos-masters_after-reboot.log, 
> mesos-masters_mesos.log, mesos_slaves_after_reboot.log, 
> tasks_running_before_rebooot.marathon
>
>
> The main idea of mesos/marathon is to sleep well, but after a node reboot the 
> mesos task gets stuck in staging for about 4 hours.
> To reproduce the issue: 
> - set up a mesos cluster in HA mode with systemd-enabled mesos-master and 
> mesos-slave services.
> - run docker registry (https://hub.docker.com/_/registry/ ) with a mesos 
> constraint (hostname:LIKE:mesos-slave-1) on one node. Reboot the node and 
> notice that the task gets stuck in staging.
> Possible workaround: service mesos-slave restart fixes the issue.
> OS: centos 7.2
> mesos version: 0.28.1
> marathon: 1.1.1
> zookeeper: 3.4.8
> docker: 1.9.1 dockerAPIversion: 1.21
> error message:
> May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013   
> 909 slave.cpp:2018] Ignoring kill task 
> docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor 
> 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework 
> 8517fcb7-f2d0-47ad-ae02-837570bef929- is terminating/terminated



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-817) CHECK is Future.get() can fail.

2017-09-05 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154819#comment-16154819
 ] 

Mao Geng commented on MESOS-817:


Recently hit a similar check failure when using the mesos_http health check, 
with Marathon 1.4.3 and Mesos 1.2.0:
{code}
F0816 00:40:43.628082    76 future.hpp:1104] Check failed: !isPending() Future 
was in PENDING after await()
*** Check failure stack trace: ***
@ 0x7f055f54d9dd  google::LogMessage::Fail()
@ 0x7f055f54f65d  google::LogMessage::SendToLog()
@ 0x7f055f54d5a2  google::LogMessage::Flush()
@ 0x7f055f550049  google::LogMessageFatal::~LogMessageFatal()
@ 0x7f055edf5ae1  process::Future<>::get()
@ 0x7f055efb145c  ZooKeeper::get()
@ 0x7f055f462d76  mesos::state::ZooKeeperStorageProcess::doGet()
@ 0x7f055f46366d  mesos::state::ZooKeeperStorageProcess::get()
@ 0x7f055f469620  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI6OptionIN5mesos8internal5state5EntryEENS6_5state23ZooKeeperStorageProcessERKSsSsEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSJ_FSH_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f055f4db924  process::ProcessManager::resume()
@ 0x7f055f4dbc57  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f0658dfd970  (unknown)
@ 0x7f065a9c6064  start_thread
@ 0x7f065a0cd62d  (unknown)
Aborted
I0816 00:40:44.243842 111060 health_checker.cpp:165] Health checking stopped
W0816 00:40:44.243842 111057 logging.cpp:91] RAW: Received signal SIGTERM from 
process 45015 of user 0; exiting
{code}

> CHECK is Future.get() can fail.
> ---
>
> Key: MESOS-817
> URL: https://issues.apache.org/jira/browse/MESOS-817
> Project: Mesos
>  Issue Type: Bug
> Environment: Linux gcc 4.2.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> template <typename T>
> T Future<T>::get() const
> {
>   if (!isReady()) {
>     await();
>   }
>   CHECK(!isPending()) << "Future was in PENDING after await()";
>   if (!isReady()) {
>     if (isFailed()) {
>       std::cerr << "Future::get() but state == FAILED: "
>                 << failure() << std::endl;
>     } else if (isDiscarded()) {
>       std::cerr << "Future::get() but state == DISCARDED" << std::endl;
>     }
>     abort();
>   }
>   assert(data->t != NULL);
>   return *data->t;
> }
> This CHECK can fail:
> CHECK(!isPending()) << "Future was in PENDING after await()";





[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot

2017-08-10 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121944#comment-16121944
 ] 

Mao Geng commented on MESOS-5482:
-

Hit this issue on mesos 1.2.0 and marathon 1.4.3 too. 
The agent received no pings from the master for 75secs, then re-detected the 
master and re-registered:
{quote}
I0810 13:18:43.142431 18394 slave.cpp:4378] No pings from master received 
within 75secs
I0810 13:18:43.142588 18393 slave.cpp:920] Re-detecting master
I0810 13:18:43.142614 18393 slave.cpp:966] Detecting new master
I0810 13:18:43.142674 18407 status_update_manager.cpp:177] Pausing sending 
status updates
I0810 13:18:43.142755 18420 status_update_manager.cpp:177] Pausing sending 
status updates
I0810 13:18:43.142813 18415 slave.cpp:931] New master detected at 
master@10.1.36.4:5050
I0810 13:18:43.142840 18415 slave.cpp:955] No credentials provided. Attempting 
to register without authentication
I0810 13:18:43.142853 18415 slave.cpp:966] Detecting new master
I0810 13:18:44.431833 18415 slave.cpp:1242] Re-registered with master 
master@10.1.36.4:5050
I0810 13:18:44.431874 18415 slave.cpp:1279] Forwarding total oversubscribed 
resources {}
I0810 13:18:44.431895 18398 status_update_manager.cpp:184] Resuming sending 
status updates
I0810 13:18:44.433912 18386 slave.cpp:2683] Shutting down framework 
f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:44.433939 18386 slave.cpp:5083] Shutting down executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework 
f853458f-b07b-4b79-8192-24953f474369- at executor(1)@10.1.98.251:33041
W0810 13:18:44.435637 18440 slave.cpp:2823] Ignoring updating pid for framework 
f853458f-b07b-4b79-8192-24953f474369- because it is terminating
I0810 13:18:46.878993 18408 slave.cpp:1625] Got assigned task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework 
f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:46.879406 18408 slave.cpp:1785] Launching task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework 
f853458f-b07b-4b79-8192-24953f474369-
W0810 13:18:46.879436 18408 slave.cpp:1853] Ignoring running task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' of framework 
f853458f-b07b-4b79-8192-24953f474369- because the framework is terminating
I0810 13:18:47.613224 18415 slave.cpp:3816] Handling status update TASK_KILLED 
(UUID: af78fc5c-8552-4aee-abae-cda3d0ec2909) for task 
metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5 of framework 
f853458f-b07b-4b79-8192-24953f474369- from executor(1)@10.1.98.251:33041
W0810 13:18:47.613261 18415 slave.cpp:3885] Ignoring status update TASK_KILLED 
(UUID: af78fc5c-8552-4aee-abae-cda3d0ec2909) for task 
metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5 of framework 
f853458f-b07b-4b79-8192-24953f474369- for terminating framework 
f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:48.618629 18409 slave.cpp:4388] Got exited event for 
executor(1)@10.1.98.251:33041
I0810 13:18:48.713826 18390 docker.cpp:2358] Executor for container 
1f351db2-1011-4244-83c2-1854c44d7b65 has exited
I0810 13:18:48.713850 18390 docker.cpp:2052] Destroying container 
1f351db2-1011-4244-83c2-1854c44d7b65
I0810 13:18:48.713892 18390 docker.cpp:2179] Running docker stop on container 
1f351db2-1011-4244-83c2-1854c44d7b65
I0810 13:18:48.714363 18411 slave.cpp:4769] Executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework 
f853458f-b07b-4b79-8192-24953f474369- exited with status 0
I0810 13:18:48.714390 18411 slave.cpp:4869] Cleaning up executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework 
f853458f-b07b-4b79-8192-24953f474369- at executor(1)@10.1.98.251:33041
I0810 13:18:48.714589 18411 slave.cpp:4957] Cleaning up framework 
f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:48.714607 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5/runs/1f351db2-1011-4244-83c2-1854c44d7b65'
 for gc 6.9173026667days in the future
I0810 13:18:48.714669 18410 status_update_manager.cpp:285] Closing status 
update streams for framework f853458f-b07b-4b79-8192-24953f474369-
I0810 13:18:48.714679 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5'
 for gc 6.9172979259days in the future
I0810 13:18:48.714709 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/meta/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5/runs/1f351db2-1011-4244-83c2-1854c44d7b65'
 for gc 6.9172953778days in the future
I0810 13:18:48.714725 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/meta/slaves/508bde0b-4661-
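
To pull warnings such as "Ignoring running task ..." out of an agent log like 
the excerpt above, something along these lines might help. It is only a sketch: 
the helper names (join_wrapped, ignored_events) are made up, it assumes the glog 
line prefix seen in the excerpt, and it assumes that wrapped lines can be 
re-joined by plain concatenation (i.e. the wrap kept the trailing spaces).

```python
import re

# glog prefix as seen in the excerpt: severity letter + MMDD, time with
# fractional seconds, thread id, then "file:line] ".
GLOG_PREFIX = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ [\w.]+:\d+\] ")


def join_wrapped(lines):
    """Re-join log records that were hard-wrapped across several lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if GLOG_PREFIX.match(line) or not records:
            # A new record starts whenever a line carries the glog prefix.
            records.append(line)
        else:
            # Otherwise treat the line as a continuation of the previous record.
            records[-1] += line
    return records


def ignored_events(records):
    """Return warning records in which the agent reports ignoring something."""
    return [r for r in records if r.startswith("W") and "] Ignoring " in r]


sample = [
    "W0810 13:18:46.879436 18408 slave.cpp:1853] Ignoring running task ",
    "'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' of framework ",
    "f853458f-b07b-4b79-8192-24953f474369- because the framework is terminating",
    "I0810 13:18:48.618629 18409 slave.cpp:4388] Got exited event for ",
    "executor(1)@10.1.98.251:33041",
]
records = join_wrapped(sample)
warnings = ignored_events(records)
```

Running it over the full agent log should list the dropped launch, the dropped 
status update and the "Ignoring updating pid" warning in one place.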

[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-23 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060492#comment-16060492
 ] 

Mao Geng commented on MESOS-7692:
-

[~tillt] [~jieyu] sorry, the second issue about tfmesos was a false alarm. I 
tested again with mesos 1.3.0-2.0.3 multiple times and couldn't reproduce it. 
Mesos launches tasks with the containerizer that tfmesos specified. Most 
likely I messed up the test environment previously. Sorry for the confusion, 
and thanks for fixing the environment issue!

> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Mao Geng
>Assignee: Till Toenshoff
>Priority: Blocker
> Fix For: 1.3.1
>
>
> Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined 
> by ENV statements in dockerfile are not available in mesos containerizer any 
> more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, 
> JAVA_HOME of java:8 image, etc. 
> The env vars are available in mesos containerizer in 1.2.0. Looks like a 
> regression to me, isn't it? 





[jira] [Issue Comment Deleted] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-23 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-7692:

Comment: was deleted

(was: [~tillt] The framework always uses {{-C DOCKER}}; however, mesos somehow 
launched a container like this:
{code}
I0620 20:29:03.457880 70274 containerizer.cpp:1524] Launching 
'mesos-containerizer' with flags '--help="false" 
--launch_info="{"clone_namespaces":[131072],"command":{"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"},"environment":{"variables":[{"name":"LIBPROCESS_PORT","type":"VALUE","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","type":"VALUE","value":"10.1.160.40:5051"},{"name":"MESOS_CHECKPOINT","type":"VALUE","value":"0"},{"name":"MESOS_DIRECTORY","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"MESOS_EXECUTOR_ID","type":"VALUE","value":"10"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","type":"VALUE","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","type":"VALUE","value":"609ef166-7000-4c8d-a6ed-909e4d504eaa-0049"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","type":"VALUE","value":"0"},{"name":"MESOS_NATIVE_JAVA_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_NATIVE_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_SLAVE_ID","type":"VALUE","value":"f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8"},{"name":"MESOS_SLAVE_PID","type":"VALUE","value":"slave(1)@10.1.160.40:5051"},{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"PYTHONPATH","type":"VALUE","value":"\/:\/usr\/lib\/python2.7:\/usr\/lib\/python2.7\/plat-x86_64-linux-gnu:\/usr\/lib\/python2.7\/lib-tk:\/usr\/lib\/python2.7\/lib-old:\/usr\/lib\/python2.7\/lib-dynload:\/usr\/local\/lib\/python2.7\/dist-packages:\/usr\/lib\/python2.7\/dist-packages"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}],"user":"root","working_directory":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"}"
 --pipe_read="29" --pipe_write="30" 
--runtime_directory="/var/run/mesos/containers/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"
 --unshare_namespace_mnt="false"'
{code}
The same framework with the {{-C DOCKER}} option can launch docker containers 
correctly with Mesos 1.2.0.)



[jira] [Issue Comment Deleted] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-23 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-7692:

Comment: was deleted

(was: Here is one of the ACCEPT messages sent from the framework (captured via 
tcpdump and formatted): 
{code}
{
  "accept": {
    "offer_ids": {
      "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854"
    },
    "operations": [
      {
        "launch": {
          "task_infos": [
            {
              "agent_id": {
                "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11"
              },
              "command": {
                "environment": {
                  "variables": [
                    {
                      "name": "PYTHONPATH",
                      "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages"
                    }
                  ]
                },
                "shell": true,
                "value": "/usr/bin/python -m tfmesos.server 1 :"
              },
              "container": {
                "docker": {
                  "image": "",
                  "parameters": [
                    {
                      "key": "memory-swap",
                      "value": "-1"
                    }
                  ]
                },
                "type": "DOCKER",
                "volumes": [
                  {
                    "container_path": "/etc/passwd",
                    "host_path": "/etc/passwd",
                    "mode": "RO"
                  },
                  {
                    "container_path": "/etc/group",
                    "host_path": "/etc/group",
                    "mode": "RO"
                  }
                ]
              },
              "name": "/job:worker/task:0",
              "resources": [
                {
                  "name": "cpus",
                  "scalar": {
                    "value": 5.0
                  },
                  "type": "SCALAR"
                },
                {
                  "name": "mem",
                  "scalar": {
                    "value": 8192.0
                  },
                  "type": "SCALAR"
                }
              ],
              "task_id": {
                "value": "1"
              }
            }
          ]
        },
        "type": "LAUNCH"
      }
    ]
  },
  "framework_id": {
    "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-0017"
  },
  "type": "ACCEPT"
}
{code})



[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-23 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060492#comment-16060492
 ] 

Mao Geng edited comment on MESOS-7692 at 6/23/17 7:14 AM:
--

[~tillt] [~jieyu] sorry, the second issue about tfmesos was a false alarm. I 
tested again with mesos 1.3.0-2.0.3 multiple times and couldn't reproduce it. 
Mesos launches tasks with the containerizer that tfmesos specified. Most 
likely I messed up the test environment previously. Sorry for the confusion, 
and thanks for fixing the environment variable issue!


was (Author: gengmao):
[~tillt] [~jieyu] sorry, the second issue about tfmesos was a false alarm. I 
tested again with mesos 1.3.0-2.0.3 multiple times and couldn't reproduce it. 
Mesos launches tasks with the containerizer that tfmesos specified. Most 
likely I messed up the test environment previously. Sorry for the confusion, 
and thanks for fixing the environment issue!



[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-21 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057090#comment-16057090
 ] 

Mao Geng edited comment on MESOS-7692 at 6/21/17 7:34 AM:
--

Here is one of the ACCEPT messages sent from the framework (captured via 
tcpdump and formatted): 
{code}
{
  "accept": {
    "offer_ids": {
      "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854"
    },
    "operations": [
      {
        "launch": {
          "task_infos": [
            {
              "agent_id": {
                "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11"
              },
              "command": {
                "environment": {
                  "variables": [
                    {
                      "name": "PYTHONPATH",
                      "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages"
                    }
                  ]
                },
                "shell": true,
                "value": "/usr/bin/python -m tfmesos.server 1 :"
              },
              "container": {
                "docker": {
                  "image": "",
                  "parameters": [
                    {
                      "key": "memory-swap",
                      "value": "-1"
                    }
                  ]
                },
                "type": "DOCKER",
                "volumes": [
                  {
                    "container_path": "/etc/passwd",
                    "host_path": "/etc/passwd",
                    "mode": "RO"
                  },
                  {
                    "container_path": "/etc/group",
                    "host_path": "/etc/group",
                    "mode": "RO"
                  }
                ]
              },
              "name": "/job:worker/task:0",
              "resources": [
                {
                  "name": "cpus",
                  "scalar": {
                    "value": 5.0
                  },
                  "type": "SCALAR"
                },
                {
                  "name": "mem",
                  "scalar": {
                    "value": 8192.0
                  },
                  "type": "SCALAR"
                }
              ],
              "task_id": {
                "value": "1"
              }
            }
          ]
        },
        "type": "LAUNCH"
      }
    ]
  },
  "framework_id": {
    "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-0017"
  },
  "type": "ACCEPT"
}
{code}


was (Author: gengmao):
Here is one of the ACCEPT messages sent from the framework (captured via 
tcpdump and formatted): 
{code}
{
  "accept": {
    "offer_ids": {
      "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854"
    },
    "operations": [
      {
        "launch": {
          "task_infos": [
            {
              "agent_id": {
                "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11"
              },
              "command": {
                "environment": {
                  "variables": [
                    {
                      "name": "PYTHONPATH",
                      "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages"

[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-21 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057090#comment-16057090
 ] 

Mao Geng commented on MESOS-7692:
-

Here is one of the ACCEPT messages sent from the framework (captured via 
tcpdump and formatted): 
{code}
{
  "accept": {
    "offer_ids": {
      "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-O100854"
    },
    "operations": [
      {
        "launch": {
          "task_infos": [
            {
              "agent_id": {
                "value": "f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S11"
              },
              "command": {
                "environment": {
                  "variables": [
                    {
                      "name": "PYTHONPATH",
                      "value": "/usr/local/bin:/:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages"
                    }
                  ]
                },
                "shell": true,
                "value": "/usr/bin/python -m tfmesos.server 1 :"
              },
              "container": {
                "docker": {
                  "image": "",
                  "parameters": [
                    {
                      "key": "memory-swap",
                      "value": "-1"
                    }
                  ]
                },
                "type": "DOCKER",
                "volumes": [
                  {
                    "container_path": "/etc/passwd",
                    "host_path": "/etc/passwd",
                    "mode": "RO"
                  },
                  {
                    "container_path": "/etc/group",
                    "host_path": "/etc/group",
                    "mode": "RO"
                  }
                ]
              },
              "name": "/job:worker/task:0",
              "resources": [
                {
                  "name": "cpus",
                  "scalar": {
                    "value": 5.0
                  },
                  "type": "SCALAR"
                },
                {
                  "name": "mem",
                  "scalar": {
                    "value": 8192.0
                  },
                  "type": "SCALAR"
                }
              ],
              "task_id": {
                "value": "1"
              }
            }
          ]
        },
        "type": "LAUNCH"
      }
    ]
  },
  "framework_id": {
    "value": "55541916-22e8-41c5-a7fa-acf9a0f5836a-0017"
  },
  "type": "ACCEPT"
}
{code}
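
For checking which containerizer type a captured ACCEPT call actually asks for, 
a small Python sketch like the following could be used. The function name 
container_types is made up for illustration, and the field names are taken only 
from the capture above, not from the Scheduler API definition; the sample call 
is cut down to the fields that matter here.

```python
import json


def container_types(accept_call: dict) -> dict:
    """Map each task_id to the container type requested in an ACCEPT call."""
    result = {}
    for operation in accept_call.get("accept", {}).get("operations", []):
        # Only operations that carry a "launch" section contain task_infos.
        for task in operation.get("launch", {}).get("task_infos", []):
            task_id = task["task_id"]["value"]
            result[task_id] = task.get("container", {}).get("type", "<none>")
    return result


# Cut-down version of the captured message (assumption: other fields omitted).
captured = json.loads("""
{
  "accept": {
    "operations": [
      {
        "launch": {
          "task_infos": [
            {
              "container": {"type": "DOCKER"},
              "name": "/job:worker/task:0",
              "task_id": {"value": "1"}
            }
          ]
        },
        "type": "LAUNCH"
      }
    ]
  },
  "type": "ACCEPT"
}
""")
types = container_types(captured)
```

If the result shows DOCKER for every task, the framework side is asking for the 
docker containerizer and any mismatch would have to come from the agent side.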



[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-20 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056888#comment-16056888
 ] 

Mao Geng commented on MESOS-7692:
-

[~tillt] The framework always uses {{-C DOCKER}}; however, mesos somehow 
launched a container like this:
{code}
I0620 20:29:03.457880 70274 containerizer.cpp:1524] Launching 
'mesos-containerizer' with flags '--help="false" 
--launch_info="{"clone_namespaces":[131072],"command":{"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"},"environment":{"variables":[{"name":"LIBPROCESS_PORT","type":"VALUE","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","type":"VALUE","value":"10.1.160.40:5051"},{"name":"MESOS_CHECKPOINT","type":"VALUE","value":"0"},{"name":"MESOS_DIRECTORY","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"MESOS_EXECUTOR_ID","type":"VALUE","value":"10"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","type":"VALUE","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","type":"VALUE","value":"609ef166-7000-4c8d-a6ed-909e4d504eaa-0049"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","type":"VALUE","value":"0"},{"name":"MESOS_NATIVE_JAVA_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_NATIVE_LIBRARY","type":"VALUE","value":"\/usr\/lib\/libmesos-1.3.0.so"},{"name":"MESOS_SLAVE_ID","type":"VALUE","value":"f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8"},{"name":"MESOS_SLAVE_PID","type":"VALUE","value":"slave(1)@10.1.160.40:5051"},{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"},{"name":"PYTHONPATH","type":"VALUE","value":"\/:\/usr\/lib\/python2.7:\/usr\/lib\/python2.7\/plat-x86_64-linux-gnu:\/usr\/lib\/python2.7\/lib-tk:\/usr\/lib\/python2.7\/lib-old:\/usr\/lib\/python2.7\/lib-dynload:\/usr\/local\/lib\/python2.7\/dist-packages:\/usr\/lib\/python2.7\/dist-packages"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}],"user":"root","working_directory":"\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S8\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0049\/executors\/10\/runs\/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"}"
 --pipe_read="29" --pipe_write="30" 
--runtime_directory="/var/run/mesos/containers/c7e77d1e-e411-4703-805f-10bbf9a0eaf8"
 --unshare_namespace_mnt="false"'
{code}
The same framework with the {{-C DOCKER}} option can launch docker containers 
correctly with Mesos 1.2.0.



[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-20 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056740#comment-16056740
 ] 

Mao Geng commented on MESOS-7692:
-

[~tillt] I found another issue today; not sure if it has the same root cause. 
Tfmesos, a framework based on the HTTP Scheduler API that runs tensorflow in 
docker on mesos, can't submit tasks to the Docker containerizer any more. 
Previously I could run 
https://github.com/douban/tfmesos/blob/master/script/tfrun#L17 with 
{{-C DOCKER}}, which would launch tasks with the docker containerizer; however, 
on mesos 1.3.0-2.0.3 the tasks were launched by the mesos containerizer even 
with the same option and code. 
Could you please help troubleshoot? Should I open a new ticket for that? 



[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-18 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053320#comment-16053320
 ] 

Mao Geng commented on MESOS-7692:
-

The problem also affects mesos containers submitted via marathon (tested with 
1.4.3), not just mesos-execute. 

> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Mao Geng
>Priority: Blocker
>
> Found an unexpected change in 1.3.0-2.0.3 - the environment variables defined 
> by ENV statements in dockerfile are not available in mesos containerizer any 
> more. For example LD_LIBRARY_PATH of tensorflow/tensorflow:latest-gpu image, 
> JAVA_HOME of java:8 image, etc. 
> The env vars are available in mesos containerizer in 1.2.0. Looks like a 
> regression to me, isn't it? 





[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-18 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053269#comment-16053269
 ] 

Mao Geng edited comment on MESOS-7692 at 6/18/17 5:48 PM:
--

Tested on a host with the following command:
{code}
/usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 
--command="env"
{code}

Output of the task:
{code}Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
Executing pre-exec command 
'{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
Received SUBSCRIBED event
Subscribed executor on 
Received LAUNCH event
Starting task java8
Running '/usr/libexec/mesos/mesos-containerizer launch 
'
Forked command at 122347
Changing root to 
/mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46
MESOS_EXECUTOR_ID=java8
MESOS_CHECKPOINT=0
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.1.100.89:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76
PWD=/mnt/mesos/sandbox
MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051
Command exited with status 0 (pid: 122347)
Received SHUTDOWN event
Shutting down{code}

Package version:
{code}apt-cache policy mesos
mesos:
  Installed: 1.3.0-2.0.3
  Candidate: 1.3.0-2.0.3
  Version table:
 *** 1.3.0-2.0.3 0
500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages
100 /var/lib/dpkg/status{code}


was (Author: gengmao):
Tested on a host with the following command:
{code}
/usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 
--command="env"
{code}

Output of the task:
{code}Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
Executing pre-exec command 
'{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
Received SUBSCRIBED event
Subscribed executor on 
Received LAUNCH event
Starting task java8
Running '/usr/libexec/mesos/mesos-containerizer launch 
'
Forked command at 122347
Changing root to 
/mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46
MESOS_EXECUTOR_ID=java8
MESOS_CHECKPOINT=0
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.1.100.89:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76
PWD=/mnt/mesos/sandbox
MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051
Command exited with status 0 (pid: 122347)
Received SHUTDOWN event
Shutting down{code}

Package version:
{quote}apt-cache policy mesos
mesos:
  Installed: 1.3.0-2.0.3
  Candidate: 1.3.0-2.0.3
  Version table:
 *** 1.3.0-2.0.3 0
500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages
100 /var/lib/dpkg/status{quote}

> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
>

[jira] [Commented] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-18 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053269#comment-16053269
 ] 

Mao Geng commented on MESOS-7692:
-

Tested on a host with the following command:
{code}
/usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 
--command="env"
{code}

Output of the task:
{quote}Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
Executing pre-exec command 
'{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
Received SUBSCRIBED event
Subscribed executor on 
Received LAUNCH event
Starting task java8
Running '/usr/libexec/mesos/mesos-containerizer launch 
'
Forked command at 122347
Changing root to 
/mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46
MESOS_EXECUTOR_ID=java8
MESOS_CHECKPOINT=0
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.1.100.89:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76
PWD=/mnt/mesos/sandbox
MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051
Command exited with status 0 (pid: 122347)
Received SHUTDOWN event
Shutting down{quote}

Package version:
{quote}apt-cache policy mesos
mesos:
  Installed: 1.3.0-2.0.3
  Candidate: 1.3.0-2.0.3
  Version table:
 *** 1.3.0-2.0.3 0
500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages
100 /var/lib/dpkg/status{quote}
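
The check above (run `env` in the container and look for the image's ENV variables) could be scripted. The following is a minimal sketch, assuming the task's stdout has been captured as text; `parse_env_lines`, `missing_env_vars`, and the shortened sample output are hypothetical names and data for illustration, not part of Mesos.

```python
def parse_env_lines(output: str) -> dict:
    """Collect NAME=value lines from captured `env` output, ignoring log lines."""
    env = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        # Keep only lines that look like shell variable assignments.
        if sep and name.isidentifier():
            env[name] = value
    return env


def missing_env_vars(output: str, expected: list) -> list:
    """Return the expected variable names that do not appear in the output."""
    env = parse_env_lines(output)
    return [name for name in expected if name not in env]


# Shortened excerpt in the shape of the task output quoted above.
sample_output = """Starting task java8
MESOS_SANDBOX=/mnt/mesos/sandbox
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Command exited with status 0 (pid: 122347)
"""

print(missing_env_vars(sample_output, ["JAVA_HOME", "PATH"]))  # prints ['JAVA_HOME']
```

With the java:8 image, JAVA_HOME is one of the variables expected from the image, so a non-empty result would correspond to the behavior reported here.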

> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Mao Geng
>Priority: Blocker
>
> Found an unexpected change in 1.3.0-2.0.3: the environment variables defined 
> by ENV statements in the Dockerfile are no longer available in the Mesos 
> containerizer. Examples include LD_LIBRARY_PATH in the 
> tensorflow/tensorflow:latest-gpu image and JAVA_HOME in the java:8 image. 
> The env vars are available in the Mesos containerizer in 1.2.0, so this looks 
> like a regression to me. 





[jira] [Comment Edited] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-18 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053269#comment-16053269
 ] 

Mao Geng edited comment on MESOS-7692 at 6/18/17 5:47 PM:
--

Tested on a host with the following command:
{code}
/usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 
--command="env"
{code}

Output of the task:
{code}Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
Executing pre-exec command 
'{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
Received SUBSCRIBED event
Subscribed executor on 
Received LAUNCH event
Starting task java8
Running '/usr/libexec/mesos/mesos-containerizer launch 
'
Forked command at 122347
Changing root to 
/mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46
MESOS_EXECUTOR_ID=java8
MESOS_CHECKPOINT=0
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.1.100.89:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76
PWD=/mnt/mesos/sandbox
MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051
Command exited with status 0 (pid: 122347)
Received SHUTDOWN event
Shutting down{code}

Package version:
{quote}apt-cache policy mesos
mesos:
  Installed: 1.3.0-2.0.3
  Candidate: 1.3.0-2.0.3
  Version table:
 *** 1.3.0-2.0.3 0
500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages
100 /var/lib/dpkg/status{quote}


was (Author: gengmao):
Tested on a host with the following command:
{code}
/usr/bin/mesos-execute --master= --name=java8 --docker_image=java:8 
--command="env"
{code}

Output of the task:
{quote}Executing pre-exec command 
'{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-containerizer"}'
Executing pre-exec command 
'{"arguments":["mount","-n","--rbind","\/mnt\/mesos\/slaves\/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3\/frameworks\/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005\/executors\/java8\/runs\/4a381932-6bc2-4e52-a044-697491694d76","\/mnt\/mesos\/provisioner\/containers\/4a381932-6bc2-4e52-a044-697491694d76\/backends\/overlay\/rootfses\/4d202d5d-42f9-4904-b67f-b995c7dfab46\/mnt\/mesos\/sandbox"],"shell":false,"value":"mount"}'
Received SUBSCRIBED event
Subscribed executor on 
Received LAUNCH event
Starting task java8
Running '/usr/libexec/mesos/mesos-containerizer launch 
'
Forked command at 122347
Changing root to 
/mnt/mesos/provisioner/containers/4a381932-6bc2-4e52-a044-697491694d76/backends/overlay/rootfses/4d202d5d-42f9-4904-b67f-b995c7dfab46
MESOS_EXECUTOR_ID=java8
MESOS_CHECKPOINT=0
MESOS_HTTP_COMMAND_EXECUTOR=0
MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
LIBPROCESS_PORT=0
MESOS_AGENT_ENDPOINT=10.1.100.89:5051
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MESOS_SANDBOX=/mnt/mesos/sandbox
MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_FRAMEWORK_ID=609ef166-7000-4c8d-a6ed-909e4d504eaa-0005
MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.3.0.so
MESOS_SLAVE_ID=f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3
MESOS_DIRECTORY=/mnt/mesos/slaves/f2bcc63d-e887-4e25-b2c0-3772dfb40fb0-S3/frameworks/609ef166-7000-4c8d-a6ed-909e4d504eaa-0005/executors/java8/runs/4a381932-6bc2-4e52-a044-697491694d76
PWD=/mnt/mesos/sandbox
MESOS_SLAVE_PID=slave(1)@10.1.100.89:5051
Command exited with status 0 (pid: 122347)
Received SHUTDOWN event
Shutting down{quote}

Package version:
{quote}apt-cache policy mesos
mesos:
  Installed: 1.3.0-2.0.3
  Candidate: 1.3.0-2.0.3
  Version table:
 *** 1.3.0-2.0.3 0
500 http://repos.mesosphere.io/ubuntu/ trusty/main amd64 Packages
100 /var/lib/dpkg/status{quote}

> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
>

[jira] [Updated] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-18 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-7692:

Description: 
Found an unexpected change in 1.3.0-2.0.3: the environment variables defined 
by ENV statements in the Dockerfile are no longer available in the Mesos 
containerizer. Examples include LD_LIBRARY_PATH in the 
tensorflow/tensorflow:latest-gpu image and JAVA_HOME in the java:8 image. 

The env vars are available in the Mesos containerizer in 1.2.0, so this looks 
like a regression to me. 

  was:
Found an unexpected change in 1.3.0-2.0.3: the environment variables defined 
by ENV statements in the Dockerfile are no longer available in the Mesos 
containerizer. Examples include LD_LIBRARY_PATH in the 
tensorflow/tensorflow:latest-gpu image and JAVA_HOME in the java:8 image. 

The env vars are in the Mesos containerizer in 1.2.0, so this looks like a 
regression to me. 


> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Mao Geng
>
> Found an unexpected change in 1.3.0-2.0.3: the environment variables defined 
> by ENV statements in the Dockerfile are no longer available in the Mesos 
> containerizer. Examples include LD_LIBRARY_PATH in the 
> tensorflow/tensorflow:latest-gpu image and JAVA_HOME in the java:8 image. 
> The env vars are available in the Mesos containerizer in 1.2.0, so this looks 
> like a regression to me. 





[jira] [Created] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer any more

2017-06-18 Thread Mao Geng (JIRA)
Mao Geng created MESOS-7692:
---

 Summary: Default environment variables defined in docker image are 
not available in mesos containerizer any more
 Key: MESOS-7692
 URL: https://issues.apache.org/jira/browse/MESOS-7692
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 1.3.0
Reporter: Mao Geng


Found an unexpected change in 1.3.0-2.0.3: the environment variables defined 
by ENV statements in the Dockerfile are no longer available in the Mesos 
containerizer. Examples include LD_LIBRARY_PATH in the 
tensorflow/tensorflow:latest-gpu image and JAVA_HOME in the java:8 image. 

The env vars are in the Mesos containerizer in 1.2.0, so this looks like a 
regression to me. 





[jira] [Updated] (MESOS-7692) Default environment variables defined in docker image are not available in mesos containerizer

2017-06-18 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-7692:

Summary: Default environment variables defined in docker image are not 
available in mesos containerizer  (was: Default environment variables defined 
in docker image are not available in mesos containerizer any more)

> Default environment variables defined in docker image are not available in 
> mesos containerizer
> --
>
> Key: MESOS-7692
> URL: https://issues.apache.org/jira/browse/MESOS-7692
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
>Reporter: Mao Geng
>
> Found an unexpected change in 1.3.0-2.0.3: the environment variables defined 
> by ENV statements in the Dockerfile are no longer available in the Mesos 
> containerizer. Examples include LD_LIBRARY_PATH in the 
> tensorflow/tensorflow:latest-gpu image and JAVA_HOME in the java:8 image. 
> The env vars are in the Mesos containerizer in 1.2.0, so this looks like a 
> regression to me. 





[jira] [Assigned] (MESOS-7522) Mesos containerizer to support docker credential helpers for private docker registries

2017-05-18 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng reassigned MESOS-7522:
---

Assignee: Mao Geng

> Mesos containerizer to support docker credential helpers for private docker 
> registries
> --
>
> Key: MESOS-7522
> URL: https://issues.apache.org/jira/browse/MESOS-7522
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Mao Geng
>Assignee: Mao Geng
>  Labels: mesos-containerizer
>
> At Pinterest, we use Amazon ECR as our Docker registry and use 
> https://github.com/awslabs/amazon-ecr-credential-helper to let the Docker 
> engine get auth tokens automatically. 
> This works well with the Docker containerizer, as long as 
> .docker/config.json sets "credsStore" and --docker_config is configured 
> for mesos-agent. 
> However, this doesn't work for the Mesos containerizer. Since we also want to 
> use the Mesos containerizer's GPU support, we have to run a separate Docker 
> registry over HTTP without auth, purely for the Mesos containerizer. 
> It would be good if the Mesos containerizer supported 
> https://github.com/docker/docker-credential-helpers by default; that would 
> address a pain point for users who rely on credential helpers 
> with private registries on ECR, GCR, Quay, Docker Hub, etc. 
> This might be related to MESOS-7088.
> CC [~jieyu] [~gilbert]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7522) Mesos containerizer to support docker credential helpers for private docker registries

2017-05-18 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-7522:

Description: 
At Pinterest, we use Amazon ECR as our Docker registry and use 
https://github.com/awslabs/amazon-ecr-credential-helper to let the Docker engine 
get auth tokens automatically. 

This works well with the Docker containerizer, as long as 
.docker/config.json sets "credsStore" and --docker_config is configured for 
mesos-agent. 

However, this doesn't work for the Mesos containerizer. Since we also want to use 
the Mesos containerizer's GPU support, we have to run a separate Docker registry 
over HTTP without auth, purely for the Mesos containerizer. 

It would be good if the Mesos containerizer supported 
https://github.com/docker/docker-credential-helpers by default; that would 
address a pain point for users who rely on credential helpers with 
private registries on ECR, GCR, Quay, Docker Hub, etc. 

This might be related to MESOS-7088.

CC [~jieyu] [~gilbert]

  was:
At Pinterest, we use Amazon ECR as our Docker registry and use 
https://github.com/awslabs/amazon-ecr-credential-helper to let the Docker engine 
get auth tokens automatically. 

This works well with the Docker containerizer, as long as 
.docker/config.json sets "credsStore" and --docker_config is configured for 
mesos-agent. 

However, this doesn't work for the Mesos containerizer. Since we also want to use 
the Mesos containerizer's GPU support, we have to run a separate Docker registry 
over HTTP without auth, purely for the Mesos containerizer. 

It would be good if the Mesos containerizer supported 
https://github.com/docker/docker-credential-helpers by default; that would 
address a pain point for users who rely on credential helpers with 
private registries on ECR, GCR, Quay, Docker Hub, etc. 

This might be related to MESOS-7088.


> Mesos containerizer to support docker credential helpers for private docker 
> registries
> --
>
> Key: MESOS-7522
> URL: https://issues.apache.org/jira/browse/MESOS-7522
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Mao Geng
>  Labels: mesos-containerizer
>
> At Pinterest, we use Amazon ECR as our Docker registry and use 
> https://github.com/awslabs/amazon-ecr-credential-helper to let the Docker 
> engine get auth tokens automatically. 
> This works well with the Docker containerizer, as long as 
> .docker/config.json sets "credsStore" and --docker_config is configured 
> for mesos-agent. 
> However, this doesn't work for the Mesos containerizer. Since we also want to 
> use the Mesos containerizer's GPU support, we have to run a separate Docker 
> registry over HTTP without auth, purely for the Mesos containerizer. 
> It would be good if the Mesos containerizer supported 
> https://github.com/docker/docker-credential-helpers by default; that would 
> address a pain point for users who rely on credential helpers 
> with private registries on ECR, GCR, Quay, Docker Hub, etc. 
> This might be related to MESOS-7088.
> CC [~jieyu] [~gilbert]





[jira] [Created] (MESOS-7522) Mesos containerizer to support docker credential helpers for private docker registries

2017-05-18 Thread Mao Geng (JIRA)
Mao Geng created MESOS-7522:
---

 Summary: Mesos containerizer to support docker credential helpers 
for private docker registries
 Key: MESOS-7522
 URL: https://issues.apache.org/jira/browse/MESOS-7522
 Project: Mesos
  Issue Type: Wish
  Components: containerization
Reporter: Mao Geng


At Pinterest, we use Amazon ECR as our Docker registry and use 
https://github.com/awslabs/amazon-ecr-credential-helper to let the Docker engine 
get auth tokens automatically. 

This works well with the Docker containerizer, as long as 
.docker/config.json sets "credsStore" and --docker_config is configured for 
mesos-agent. 

However, this doesn't work for the Mesos containerizer. Since we also want to use 
the Mesos containerizer's GPU support, we have to run a separate Docker registry 
over HTTP without auth, purely for the Mesos containerizer. 

It would be good if the Mesos containerizer supported 
https://github.com/docker/docker-credential-helpers by default; that would 
address a pain point for users who rely on credential helpers with 
private registries on ECR, GCR, Quay, Docker Hub, etc. 

This might be related to MESOS-7088.





[jira] [Comment Edited] (MESOS-7088) Support private registry credential per container.

2017-04-12 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966894#comment-15966894
 ] 

Mao Geng edited comment on MESOS-7088 at 4/13/17 12:04 AM:
---

[~gilbert] Would you consider supporting 
https://github.com/docker/docker-credential-helpers in general? We are using a 
helper for ECR (https://github.com/awslabs/amazon-ecr-credential-helper). 
Basically, docker-engine invokes a helper with one of four commands ("erase", 
"store", "get", "list"), sends the request via stdin, and reads the response 
from stdout. To pull an image, issuing the "get" command to obtain the auth 
should be enough. 
I would be glad to see this considered in the design doc, as many users rely on 
credential helpers to use private registries in clouds. Thanks


was (Author: gengmao):
[~gilbert] Would you consider supporting 
https://github.com/docker/docker-credential-helpers in general? We are using a 
helper for ECR (https://github.com/awslabs/amazon-ecr-credential-helper). 
Basically, docker-engine invokes a helper with one of four commands ("erase", 
"store", "get", "list"), sends the request via stdin, and reads the response 
from stdout. To pull an image, issuing the "get" command to obtain the auth 
should be enough. 
I would be glad to see this considered in the design doc, as many users rely on 
credential helpers to use private registries in clouds. 

> Support private registry credential per container.
> --
>
> Key: MESOS-7088
> URL: https://issues.apache.org/jira/browse/MESOS-7088
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
>






[jira] [Commented] (MESOS-7088) Support private registry credential per container.

2017-04-12 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966894#comment-15966894
 ] 

Mao Geng commented on MESOS-7088:
-

[~gilbert] Would you consider supporting 
https://github.com/docker/docker-credential-helpers in general? We are using a 
helper for ECR (https://github.com/awslabs/amazon-ecr-credential-helper). 
Basically, docker-engine invokes a helper with one of four commands ("erase", 
"store", "get", "list"), sends the request via stdin, and reads the response 
from stdout. To pull an image, issuing the "get" command to obtain the auth 
should be enough. 
I would be glad to see this considered in the design doc, as many users rely on 
credential helpers to use private registries in clouds. 
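
To illustrate the "get" exchange described above, here is a minimal Python sketch, not Mesos code. It assumes the docker-credential-helpers convention that the helper binary is named `docker-credential-<store>`, takes the action as its first argument, reads the registry URL from stdin, and prints a JSON object with `Username` and `Secret` fields. The function names are hypothetical.

```python
import json
import subprocess


def helper_command(store: str, action: str) -> list:
    """Build the argv for a helper; binaries are named docker-credential-<store>."""
    if action not in {"get", "store", "erase", "list"}:
        raise ValueError(f"unsupported action: {action}")
    return [f"docker-credential-{store}", action]


def parse_get_response(stdout: str) -> tuple:
    """Extract (username, secret) from the JSON a helper prints for `get`."""
    payload = json.loads(stdout)
    return payload["Username"], payload["Secret"]


def get_credentials(store: str, server_url: str) -> tuple:
    """Run `<helper> get`, writing the registry URL to the helper's stdin.

    Requires the helper binary to be installed and on PATH.
    """
    result = subprocess.run(
        helper_command(store, "get"),
        input=server_url,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_get_response(result.stdout)
```

With amazon-ecr-credential-helper installed, a call such as `get_credentials("ecr-login", registry_host)` would return the username and token to use for the pull, where `registry_host` stands for the ECR registry address.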

> Support private registry credential per container.
> --
>
> Key: MESOS-7088
> URL: https://issues.apache.org/jira/browse/MESOS-7088
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
>






[jira] [Assigned] (MESOS-7232) Add support to auto-load /dev/nvidia-uvm in the GPU isolator

2017-03-11 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng reassigned MESOS-7232:
---

Assignee: Kevin Klues  (was: Mao Geng)

> Add support to auto-load /dev/nvidia-uvm in the GPU isolator
> 
>
> Key: MESOS-7232
> URL: https://issues.apache.org/jira/browse/MESOS-7232
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>
> Loading /dev/nvidia-uvm (and installing a script to make sure it loads on 
> reboot) is not technically part of the official Nvidia driver installation 
> process. The rationale is that CUDA applications typically load this 
> device on demand if they need it. Unfortunately, an application can't load it 
> if Mesos hasn't made it available to the container running the application.
> We should add support to have the Mesos agent auto-load this device when 
> running the GPU isolator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7232) Add support to auto-load /dev/nvidia-uvm in the GPU isolator

2017-03-11 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng reassigned MESOS-7232:
---

Assignee: Mao Geng  (was: Kevin Klues)

> Add support to auto-load /dev/nvidia-uvm in the GPU isolator
> 
>
> Key: MESOS-7232
> URL: https://issues.apache.org/jira/browse/MESOS-7232
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Kevin Klues
>Assignee: Mao Geng
>
> Loading /dev/nvidia-uvm (and installing a script to make sure it loads on 
> reboot) is not technically part of the official Nvidia driver installation 
> process. The rationale is that CUDA applications typically load this 
> device on demand if they need it. Unfortunately, an application can't load it 
> if Mesos hasn't made it available to the container running the application.
> We should add support to have the Mesos agent auto-load this device when 
> running the GPU isolator.





[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems

2016-09-23 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516903#comment-15516903
 ] 

Mao Geng commented on MESOS-5909:
-

[~kaysoky] Thanks for shepherding. I addressed your review comments in 
https://reviews.apache.org/r/52048/. Could you please take another look? 

> Stout "OsTest.User" test can fail on some systems
> -
>
> Key: MESOS-5909
> URL: https://issues.apache.org/jira/browse/MESOS-5909
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Kapil Arya
>Assignee: Mao Geng
>  Labels: mesosphere
> Attachments: MESOS-5909-fix.diff
>
>
> The libc call {{getgrouplist}} doesn't return the {{gid}} list in sorted order 
> (in my case, it returns "471 100"), whereas {{id -G}} returns a sorted 
> list ("100 471" in my case), causing the validation inside the loop to fail.
> We should sort both lists before comparing the values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper

2016-09-20 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505909#comment-15505909
 ] 

Mao Geng commented on MESOS-6169:
-

Sorry, the error only occurs when .docker/config.json has no "auths" entry. I 
corrected the description accordingly. 
When I add "auths" to config.json as shown below (and restart the Mesos agent), 
it actually works well with 
https://github.com/awslabs/amazon-ecr-credential-helper. 
{code}
{
  "credsStore": "ecr-login",
  "auths": {
    ".dkr.ecr.us-east-1.amazonaws.com": {
    }
  }
}
{code}
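
A config.json can be checked for this condition before the agent is restarted. This is a minimal sketch based only on the observation in this comment; `config_problems` is a hypothetical helper, and treating an empty "auths" object as missing is an assumption.

```python
import json


def config_problems(config_text: str) -> list:
    """List reasons a config.json may hit the failure described in this comment."""
    config = json.loads(config_text)
    problems = []
    if "credsStore" not in config:
        problems.append('no "credsStore" entry')
    # Assumption: an empty "auths" object is treated the same as a missing one.
    if not config.get("auths"):
        problems.append('no "auths" entry')
    return problems


print(config_problems('{"credsStore": "ecr-login"}'))  # prints ['no "auths" entry']
```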

> --docker_config doesn't work with amazon-ecr-credential-helper
> --
>
> Key: MESOS-6169
> URL: https://issues.apache.org/jira/browse/MESOS-6169
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.0.0
>Reporter: Mao Geng
>Assignee: Gilbert Song
>
> We are using AWS ECR as our Docker registry and 
> https://github.com/awslabs/amazon-ecr-credential-helper to obtain credentials 
> automatically. 
> As amazon-ecr-credential-helper requires, we set up a .docker/config.json file 
> like the one below: 
> {code}
> {
> "credsStore": "ecr-login"
> }
> {code}
> Based on the "credsStore" field, the Docker engine invokes the 
> "docker-credential-ecr-login" command (which we've installed into /usr/bin/) 
> to get registry credentials whenever required, for example when executing 
> docker pull/push. 
> This works fine when we tar the .docker/config.json and use the uris parameter 
> to pull the tar.gz file for every task that uses a Docker image. 
> But when I try the new --docker_config option, it doesn't work. The task 
> fails to pull the image from ECR. The error message is: 
> {code}
> Failed to launch container: Failed to run 'docker -H 
> unix:///var/run/docker.sock pull 
> .dkr.ecr.us-east-1.amazonaws.com/:latest': exited 
> with status 1; stderr='WARNING: Error loading config 
> file:/tmp/28Vd2O/.dockercfg - The Auth config file is empty unauthorized: 
> authentication required '
> {code}
> I checked the source at 
> https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't 
> understand why the above error message mentions loading a temp .dockercfg 
> file, which doesn't exist, by the way. I assume Mesos should pull the image 
> using the config.json file I passed to --docker_config, right? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper

2016-09-20 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-6169:

Description: 
We are using AWS ECR as our Docker registry, with 
https://github.com/awslabs/amazon-ecr-credential-helper to obtain credentials 
automatically. 

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file 
like the one below: 
{code}
{
"credsStore": "ecr-login"
{code}
According to the "credsStore" field, the Docker engine invokes the 
"docker-credential-ecr-login" command (which we've installed into /usr/bin/) to 
get registry credentials whenever required, for example when executing docker 
pull/push. 
This works fine when we tar the .docker/config.json and use the uris parameter 
to pull the tar.gz file for every task that uses a Docker image. 

But when I try the new --docker_config option, it doesn't work. The task failed 
to pull the image from ECR. The error message is 
{code}
Failed to launch container: Failed to run 'docker -H 
unix:///var/run/docker.sock pull 
.dkr.ecr.us-east-1.amazonaws.com/:latest': exited with 
status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - 
The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at 
https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't 
understand why the error message above says it is loading a temp .dockercfg 
file, which doesn't exist btw. I assume Mesos should pull the image using the 
config.json file I passed to --docker_config, right? 

  was:
We are using AWS ECR as our Docker registry, with 
https://github.com/awslabs/amazon-ecr-credential-helper to obtain credentials 
automatically. 

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file 
like the one below: 
{code}
{
"credsStore": "ecr-login",
"auths": {
".dkr.ecr.us-east-1.amazonaws.com": {
}
}
}
{code}
According to the "credsStore" field, the Docker engine invokes the 
"docker-credential-ecr-login" command (which we've installed into /usr/bin/) to 
get registry credentials whenever required, for example when executing docker 
pull/push. 
This works fine when we tar the .docker/config.json and use the uris parameter 
to pull the tar.gz file for every task that uses a Docker image. 

But when I try the new --docker_config option, it doesn't work. The task failed 
to pull the image from ECR. The error message is 
{code}
Failed to launch container: Failed to run 'docker -H 
unix:///var/run/docker.sock pull 
.dkr.ecr.us-east-1.amazonaws.com/:latest': exited with 
status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - 
The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at 
https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't 
understand why the error message above says it is loading a temp .dockercfg 
file, which doesn't exist btw. I assume Mesos should pull the image using the 
config.json file I passed to --docker_config, right? 



[jira] [Updated] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper

2016-09-20 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-6169:

Description: 
We are using AWS ECR as our Docker registry, with 
https://github.com/awslabs/amazon-ecr-credential-helper to obtain credentials 
automatically. 

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file 
like the one below: 
{code}
{
"credsStore": "ecr-login"
}
{code}
According to the "credsStore" field, the Docker engine invokes the 
"docker-credential-ecr-login" command (which we've installed into /usr/bin/) to 
get registry credentials whenever required, for example when executing docker 
pull/push. 
This works fine when we tar the .docker/config.json and use the uris parameter 
to pull the tar.gz file for every task that uses a Docker image. 

But when I try the new --docker_config option, it doesn't work. The task failed 
to pull the image from ECR. The error message is 
{code}
Failed to launch container: Failed to run 'docker -H 
unix:///var/run/docker.sock pull 
.dkr.ecr.us-east-1.amazonaws.com/:latest': exited with 
status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - 
The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at 
https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't 
understand why the error message above says it is loading a temp .dockercfg 
file, which doesn't exist btw. I assume Mesos should pull the image using the 
config.json file I passed to --docker_config, right? 

  was:
We are using AWS ECR as our Docker registry, with 
https://github.com/awslabs/amazon-ecr-credential-helper to obtain credentials 
automatically. 

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file 
like the one below: 
{code}
{
"credsStore": "ecr-login"
{code}
According to the "credsStore" field, the Docker engine invokes the 
"docker-credential-ecr-login" command (which we've installed into /usr/bin/) to 
get registry credentials whenever required, for example when executing docker 
pull/push. 
This works fine when we tar the .docker/config.json and use the uris parameter 
to pull the tar.gz file for every task that uses a Docker image. 

But when I try the new --docker_config option, it doesn't work. The task failed 
to pull the image from ECR. The error message is 
{code}
Failed to launch container: Failed to run 'docker -H 
unix:///var/run/docker.sock pull 
.dkr.ecr.us-east-1.amazonaws.com/:latest': exited with 
status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - 
The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at 
https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't 
understand why the error message above says it is loading a temp .dockercfg 
file, which doesn't exist btw. I assume Mesos should pull the image using the 
config.json file I passed to --docker_config, right? 







[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems

2016-09-19 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504625#comment-15504625
 ] 

Mao Geng commented on MESOS-5909:
-

Got it. Thanks! Look forward to addressing review comments. 

> Stout "OsTest.User" test can fail on some systems
> -
>
> Key: MESOS-5909
> URL: https://issues.apache.org/jira/browse/MESOS-5909
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Kapil Arya
>Assignee: Gilbert Song
>  Labels: mesosphere
> Attachments: MESOS-5909-fix.diff
>
>
> The libc call {{getgrouplist}} doesn't return the {{gid}} list in sorted 
> order (in my case, it returns "471 100"), whereas {{id -G}} returns a sorted 
> list ("100 471" in my case), causing the validation inside the loop to fail.
> We should sort both lists before comparing the values.





[jira] [Commented] (MESOS-5909) Stout "OsTest.User" test can fail on some systems

2016-09-19 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504307#comment-15504307
 ] 

Mao Geng commented on MESOS-5909:
-

Thanks [~kaysoky]! I created https://reviews.apache.org/r/52048/ and added 
[~gilbert]  [~karya] as reviewers. May I add you as a reviewer too? 






[jira] [Updated] (MESOS-5909) Stout "OsTest.User" test can fail on some systems

2016-09-18 Thread Mao Geng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao Geng updated MESOS-5909:

Attachment: MESOS-5909-fix.diff

Just hit this error when running `make check`. Figured out a fix to 
os_tests.cpp (as [~karya] said, sort both lists, then compare). See attached. 
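
The idea behind the fix can be sketched in a few lines. This is only an illustration in Python, not the attached diff: the function name `same_groups` is hypothetical, and the gid values are the ones quoted in the issue description.

```python
def same_groups(libc_gids, id_gids):
    """Compare two gid lists while ignoring the order of their entries."""
    return sorted(libc_gids) == sorted(id_gids)


# Values reported in the issue: getgrouplist gave "471 100",
# while `id -G` printed "100 471".
libc_gids = [471, 100]
id_gids = [int(gid) for gid in "100 471".split()]

# An element-by-element comparison fails on these lists,
# but the sorted comparison treats them as the same set of groups.
print(libc_gids == id_gids)
print(same_groups(libc_gids, id_gids))
```

Sorting copies of both lists keeps the check independent of whatever order the libc call happens to return.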

I am new to the Mesos community. If I want to submit a patch, should I create a 
PR on https://github.com/apache/mesos (I saw someone did), or follow 
http://mesos.apache.org/documentation/latest/submitting-a-patch/ and create a 
review using post-reviews.py? 

cc [~karya] [~gilbert]






[jira] [Created] (MESOS-6169) --docker_config doesn't work with amazon-ecr-credential-helper

2016-09-15 Thread Mao Geng (JIRA)
Mao Geng created MESOS-6169:
---

 Summary: --docker_config doesn't work with 
amazon-ecr-credential-helper
 Key: MESOS-6169
 URL: https://issues.apache.org/jira/browse/MESOS-6169
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 1.0.0
Reporter: Mao Geng


We are using AWS ECR as our Docker registry, with 
https://github.com/awslabs/amazon-ecr-credential-helper to obtain credentials 
automatically. 

As amazon-ecr-credential-helper requires, we set up a .docker/config.json file 
like the one below: 
{code}
{
"credsStore": "ecr-login",
"auths": {
".dkr.ecr.us-east-1.amazonaws.com": {
}
}
}
{code}
According to the "credsStore" field, the Docker engine invokes the 
"docker-credential-ecr-login" command (which we've installed into /usr/bin/) to 
get registry credentials whenever required, for example when executing docker 
pull/push. 
This works fine when we tar the .docker/config.json and use the uris parameter 
to pull the tar.gz file for every task that uses a Docker image. 
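
To make the "credsStore" lookup concrete, here is a minimal sketch of how the helper name follows from the config: the value of "credsStore" is appended to the `docker-credential-` prefix. The function name `credential_helper` is hypothetical and the config text is inlined for illustration; this is not what Mesos or Docker run internally, only a quick way to check that the helper a config refers to is on the PATH.

```python
import json
import shutil


def credential_helper(config_text):
    """Return the helper binary name implied by "credsStore", or None."""
    config = json.loads(config_text)
    store = config.get("credsStore")
    return f"docker-credential-{store}" if store else None


# Same content as the .docker/config.json shown above.
config_text = '{"credsStore": "ecr-login"}'
helper = credential_helper(config_text)

# shutil.which returns None when the helper is not found on PATH.
print(helper, "->", shutil.which(helper) or "not found on PATH")
```

If the helper is reported as missing for the user that runs the agent, the pull would fail for reasons unrelated to --docker_config, so this is worth ruling out first.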

But when I try the new --docker_config option, it doesn't work. The task failed 
to pull the image from ECR. The error message is 
{code}
Failed to launch container: Failed to run 'docker -H 
unix:///var/run/docker.sock pull 
.dkr.ecr.us-east-1.amazonaws.com/:latest': exited with 
status 1; stderr='WARNING: Error loading config file:/tmp/28Vd2O/.dockercfg - 
The Auth config file is empty unauthorized: authentication required '
{code}

I checked the source at 
https://github.com/apache/mesos/blob/master/src/docker/docker.cpp, but don't 
understand why the error message above says it is loading a temp .dockercfg 
file, which doesn't exist btw. I assume Mesos should pull the image using the 
config.json file I passed to --docker_config, right? 


