[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.
[ https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204284#comment-15204284 ]

Sergey Galkin commented on MESOS-4977:
--------------------------------------

While creating the cluster in Marathon, I dumped the traffic on port 5050 and found no differences between the requests for the failed task b4ee1f97bf56980fbc0891a83e3652a4.b7b6bf11-ef5a-11e5-89d2-6805ca32e0f0 and the running task b4ee1f97bf56980fbc0891a83e3652a4.fd840243-ef5a-11e5-89d2-6805ca32e0f0.

> Sometime Cmd":["-c","echo 'No such file or directory'] in task.
> ----------------------------------------------------------------
>
>                 Key: MESOS-4977
>                 URL: https://issues.apache.org/jira/browse/MESOS-4977
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.2
>         Environment: 189 mesos slaves on Ubuntu 14.04.3 LTS
>            Reporter: Sergey Galkin
>
> mesos - 0.27.0
> marathon - 0.15.2
> I am trying to launch one simple Docker application (nginx) with 500
> instances on a cluster with 189 hardware nodes through Marathon:
> {code}
> ID /1f532267a08494e3081c1acb42d273b7
> Command Unspecified
> Constraints Unspecified
> Dependencies Unspecified
> Labels Unspecified
> Resource Roles Unspecified
> Container
> {
>   "type": "DOCKER",
>   "volumes": [],
>   "docker": {
>     "image": "nginx",
>     "network": "BRIDGE",
>     "portMappings": [
>       {
>         "containerPort": 80,
>         "hostPort": 0,
>         "servicePort": 1,
>         "protocol": "tcp"
>       }
>     ],
>     "privileged": false,
>     "parameters": [],
>     "forcePullImage": false
>   }
> }
> CPUs 1
> Environment Unspecified
> Executor Unspecified
> Health Checks
> [
>   {
>     "path": "/",
>     "protocol": "HTTP",
>     "portIndex": 0,
>     "gracePeriodSeconds": 300,
>     "intervalSeconds": 60,
>     "timeoutSeconds": 20,
>     "maxConsecutiveFailures": 3,
>     "ignoreHttp1xx": false
>   }
> ]
> Instances 500
> IP Address Unspecified
> Memory 256 MiB
> Disk Space 50 MiB
> Ports 1
> Backoff Factor 1.15
> Backoff 1 seconds
> Max Launch Delay 3600 seconds
> URIs Unspecified
> User Unspecified
> {code}
> The deployment stopped in the Delayed state; only about 360-370 of the 500
> instances started successfully. In the stdout of the failed Mesos tasks I
> see "No such file or directory".
> As I can see in /var/log/upstart/docker.log with debug enabled, Mesos
> sometimes tries to start containers with a strange Cmd
> ("Cmd":["-c","echo 'No such file or directory'; exit 1"]), and such tasks
> fail. Sometimes everything is OK: "Cmd":null and the task is in the
> RUNNING state.
> Part of the log is available at http://paste.openstack.org/show/491122/
> I have successfully started 700 nginx Docker applications with 10 instances
> simultaneously in this cluster.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.
[ https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201802#comment-15201802 ]

haosdent commented on MESOS-4977:
---------------------------------

Your error log is related to this part of the code:
{code:title=slave.cpp|borderStyle=solid}
if (task.has_command()) {
  ...
  executor.mutable_command()->set_value(
      "echo '" +
      (path.isError() ? path.error() : "No such file or directory") +
      "'; exit 1");
  ...
}
{code}
The weird thing is that it goes into this part even though you don't have a command in the task. If you don't use Marathon and instead use, for example, {{mesos-execute}} to simulate this case, does it still happen?
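To make the branch quoted from slave.cpp easier to follow, here is a minimal standalone sketch of how such a fallback command string is assembled. It is not the Mesos source: the `Lookup` struct and the `executorCommand` function are hypothetical names introduced only for illustration, and the sketch assumes that `path` in the quoted snippet is the result of a file lookup that can succeed, fail with an error message, or find nothing.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the lookup result used in the quoted snippet:
// a resolved path on success, an error message on failure, or neither
// when the file simply was not found.
struct Lookup {
  std::optional<std::string> path;
  std::optional<std::string> error;
};

// Builds the shell command for the executor (assumed behaviour).
// If the lookup did not yield a path, the command only echoes the reason
// and exits with status 1, which matches the string reported in
// docker.log as "Cmd":["-c","echo 'No such file or directory'; exit 1"].
std::string executorCommand(const Lookup& lookup) {
  if (lookup.path) {
    return *lookup.path;
  }
  const std::string reason =
      lookup.error ? *lookup.error : "No such file or directory";
  return "echo '" + reason + "'; exit 1";
}
```

Under these assumptions, a lookup with neither a path nor an error produces exactly the command seen in the failed tasks, which suggests that the failing launches reach this fallback rather than receiving the command from Marathon.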
[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.
[ https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201767#comment-15201767 ]

SERGEY GALKIN commented on MESOS-4977:
--------------------------------------

Mesos Slaves HW
HP ProLiant DL380 Gen9
CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
RAM - 264G
Storage - 3.0T on RAID on HP Smart Array P840 Controller, HDD - 12 x HP EH0600JDYTL
Network - 2 x Intel Corporation Ethernet 10G2P X710
[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.
[ https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201757#comment-15201757 ]

SERGEY GALKIN commented on MESOS-4977:
--------------------------------------

Logs from mesos-master

1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 - with "Cmd":["-c","echo 'No such file or directory'; exit 1"] (failed)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.224059 2638 master.hpp:176] Adding task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.224105 2638 master.cpp:3621] Launching task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:W0318 15:14:33.154769 2656 master.cpp:4885] Ignoring unknown exited executor '1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0' of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:33.156250 2639 master.cpp:4789] Status update TASK_FAILED (UUID: 7c90d238-fcc4-4ede-9238-200744693449) for task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)

1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 - with "Cmd":null (running)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.223767 2638 master.hpp:176] Adding task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.223814 2638 master.cpp:3621] Launching task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:33.200388 2648 master.cpp:4789] Status update TASK_RUNNING (UUID: 563864b0-8780-4fd3-a106-041600599e2e) for task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)

> Sometime Cmd":["-c","echo 'No such file or directory'] in task.
> ----------------------------------------------------------------
>
>                 Key: MESOS-4977
>                 URL: https://issues.apache.org/jira/browse/MESOS-4977
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.2
>         Environment: 189 mesos slaves on Ubuntu 14.04.3 LTS
>            Reporter: SERGEY GALKIN
>
> mesos - 0.27.0
> marathon - 0.15.2
> I am trying to launch 1 simple docker application with nginx with 500
> instances on cluster with 189 HW nodes through Marathon
> {code}
> ID /1f532267a08494e3081c1acb42d273b7
> Command Unspecified
> Constraints Unspecified
> Dependencies Unspecified
> Labels Unspecified
> Resource Roles Unspecified
> Container
> {
>   "type": "DOCKER",
>   "volumes": [],
>   "docker": {
>     "image": "nginx",
>     "network": "BRIDGE",
>     "portMappings": [
>       {
>         "containerPort": 80,
>         "hostPort": 0,
>         "servicePort": 1,
>         "protocol": "tcp"
>       }
>     ],
>     "privileged": false,
>     "parameters": [],
>     "forcePullImage": false
>   }
> }
> CPUs 1
> Environment Unspecified
> Executor Unspecified
> Health Checks
> [
>   {
>     "path": "/",
>     "protocol": "HTTP",
>     "portIndex": 0,
>     "gracePeriodSeconds": 300,
>     "intervalSeconds": 60,
>     "timeoutSeconds": 20,
>     "maxConsecutiveFailures": 3,
>     "ignoreHttp1xx": false
>   }
> ]
> Instances 500
> IP Address Unspecified
> M