[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048240#comment-15048240
 ] 

Mark Hindess commented on MESOS-3738:
-------------------------------------

Has this fix been backported to a 0.23.x release? I'm using the latest 0.23.1 
Debian package and it is still broken.

In case it helps anyone else upgrade smoothly to a working release, I am using 
a workaround of creating a mesos-health-check wrapper that execs the real 
mesos-health-check. That is:

{code}
    bash$ cat <<'EOF' >mesos-health-check
    > #!/bin/sh
    > exec /usr/libexec/mesos/mesos-health-check "$@"
    > EOF
    bash$ chmod 0755 mesos-health-check
    bash$ fakeroot sh -c "chown root:root mesos-health-check; \
                   tar cf - mesos-health-check | gzip -9 >mesos-health-check.tar.gz"
    bash$ tar tvzf mesos-health-check.tar.gz
    -rwxr-xr-x root/root        56 2015-12-09 07:44 mesos-health-check
    bash$ # deploy mesos-health-check.tar.gz to your mesos-slaves (I used ansible)
    bash$ # if using docker, restart your slaves with mesos-health-check.tar.gz
    bash$ # mounted as volume into your mesos-slave container
    bash$ # add file:///path/to/mesos-health-check.tar.gz to uris in app json
{code}
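
The same steps can be scripted. This is only a rough sketch, assuming GNU tar (for --owner/--group) so that fakeroot is not needed, and assuming the real binary lives at /usr/libexec/mesos/mesos-health-check as in my package; the REAL variable and the temporary work directory are just illustrative names.

```shell
#!/bin/sh
# Sketch: build mesos-health-check.tar.gz containing a wrapper script.
# Assumes GNU tar; REAL is the assumed path of the packaged binary.
set -eu

REAL=/usr/libexec/mesos/mesos-health-check

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# The single-quoted format keeps "$@" literal, so the wrapper forwards
# every argument it receives to the real binary.
printf '#!/bin/sh\nexec %s "$@"\n' "$REAL" > "$workdir/mesos-health-check"
chmod 0755 "$workdir/mesos-health-check"

# Record root ownership in the archive without needing root or fakeroot.
tar -C "$workdir" --owner=0 --group=0 --numeric-owner \
    -cf "$workdir/mesos-health-check.tar" mesos-health-check
gzip -9 -c "$workdir/mesos-health-check.tar" > mesos-health-check.tar.gz

# List the archive to confirm the mode and ownership.
tar tvzf mesos-health-check.tar.gz
```

The resulting mesos-health-check.tar.gz is deployed and referenced in uris exactly as described in the comments above.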

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>         Environment: Docker 1.8.0:
> Client:
>  Version:      1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:        Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:      linux/amd64
> Server:
>  Version:      1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:        Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:      linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>            Reporter: Yong Tang
>            Assignee: haosdent
>             Fix For: 0.26.0
>
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> When the Mesos slave runs inside a Docker container, the COMMAND health check 
> from Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used to locate the health check binary. 
> This results in an error in the container.
> Command to invoke the Mesos slave container:
> {noformat}
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro \
>   -v /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro \
>   -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos \
>   mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos \
>   --executor_registration_timeout=5mins --docker_stop_timeout=10secs \
>   --launcher=posix
> {noformat}
> Marathon JSON file:
> {code}
> {
>   "id": "ubuntu",
>   "container":
>   {
>     "type": "DOCKER",
>     "docker":
>     {
>       "image": "ubuntu",
>       "network": "BRIDGE",
>       "parameters": []
>     }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
>     {
>       "protocol": "COMMAND",
>       "command": { "value": "echo Success" },
>       "gracePeriodSeconds": 3000,
>       "intervalSeconds": 5,
>       "timeoutSeconds": 5,
>       "maxConsecutiveFailures": 300
>     }
>   ],
>   "instances": 1
> }
> {code}
> {noformat}
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-0000/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-0000/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-0000/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.127950    56 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.130627    62 exec.cpp:208] Executor registered on slave 
> e20f8959-cd9f-40ae-987d-809401309361-S0
> WARNING: Your kernel does not support swap limit capabilities, memory limited 
> without swap.
> ABORT: 
> (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177): 
> Failed to os::execvpe in childMain: No such file or directory*** Aborted at 
> 1444864558 (unix time) try "date -d @1444864558" if you are using GNU date ***
> PC: @     0x7fc8c5975107 (unknown)
> *** SIGABRT (@0x5e) received by PID 94 (TID 0x7fc8bee5e700) from PID 94; 
> stack trace: ***
>     @     0x7fc8c5cf88d0 (unknown)
>     @     0x7fc8c5975107 (unknown)
>     @     0x7fc8c59764e8 (unknown)
>     @           0x419142 _Abort()
>     @           0x41917c _Abort()
>     @     0x7fc8c7745780 process::childMain()
>     @     0x7fc8c7747a49 std::_Function_handler<>::_M_invoke()
>     @     0x7fc8c774561c process::defaultClone()
>     @     0x7fc8c7745f81 process::subprocess()
>     @           0x43c58d 
> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>     @     0x7fc8c771b424 process::ProcessManager::resume()
>     @     0x7fc8c771b74f process::internal::schedule()
>     @     0x7fc8c64d3970 (unknown)
>     @     0x7fc8c5cf10a4 start_thread
>     @     0x7fc8c5a2604d (unknown)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
