[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.

2016-03-21 Thread Sergey Galkin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204284#comment-15204284
 ] 

Sergey Galkin commented on MESOS-4977:
--

While creating the cluster in Marathon, I dumped the traffic on port 5050 and 
did not find any differences in the requests between the failed task
b4ee1f97bf56980fbc0891a83e3652a4.b7b6bf11-ef5a-11e5-89d2-6805ca32e0f0
and the running task
b4ee1f97bf56980fbc0891a83e3652a4.fd840243-ef5a-11e5-89d2-6805ca32e0f0
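
A comparison like this can be made mechanical by masking the per-instance UUIDs before comparing the captured request bodies. The following is only a minimal sketch, assuming the bodies are already available as strings; `maskUuids` and `sameAfterMasking` are hypothetical helper names and are not part of Mesos or Marathon.

```cpp
#include <regex>
#include <string>

// Replaces every UUID-shaped token (8-4-4-4-12 lowercase hex digits)
// with the placeholder "<uuid>", so that per-instance IDs do not show
// up as differences.
std::string maskUuids(const std::string& text) {
  static const std::regex uuid(
      "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");
  return std::regex_replace(text, uuid, "<uuid>");
}

// Returns true when two captured payloads differ only in their UUIDs.
bool sameAfterMasking(const std::string& a, const std::string& b) {
  return maskUuids(a) == maskUuids(b);
}
```

Note that this masks every UUID-shaped token, not only task IDs, so a difference hidden inside some other UUID field would go unnoticed.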



> Sometime Cmd":["-c","echo 'No such file or directory'] in task.
> ---
>
> Key: MESOS-4977
> URL: https://issues.apache.org/jira/browse/MESOS-4977
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.2
> Environment: 189 mesos slaves on Ubuntu 14.04.3 LTS
>Reporter: Sergey Galkin
>
> mesos - 0.27.0
> marathon - 0.15.2
> I am trying to launch one simple Docker application (nginx) with 500 
> instances on a cluster of 189 hardware nodes through Marathon:
> {code}
> ID /1f532267a08494e3081c1acb42d273b7
> Command Unspecified
> Constraints Unspecified
> Dependencies Unspecified
> Labels Unspecified
> Resource Roles Unspecified
> Container
> {
>   "type": "DOCKER",
>   "volumes": [],
>   "docker": {
>     "image": "nginx",
>     "network": "BRIDGE",
>     "portMappings": [
>       {
>         "containerPort": 80,
>         "hostPort": 0,
>         "servicePort": 1,
>         "protocol": "tcp"
>       }
>     ],
>     "privileged": false,
>     "parameters": [],
>     "forcePullImage": false
>   }
> }
> CPUs 1
> Environment Unspecified
> Executor Unspecified
> Health Checks 
> [
>   {
>     "path": "/",
>     "protocol": "HTTP",
>     "portIndex": 0,
>     "gracePeriodSeconds": 300,
>     "intervalSeconds": 60,
>     "timeoutSeconds": 20,
>     "maxConsecutiveFailures": 3,
>     "ignoreHttp1xx": false
>   }
> ]
> Instances 500
> IP Address Unspecified
> Memory 256 MiB
> Disk Space 50 MiB
> Ports 1
> Backoff Factor 1.15
> Backoff 1 seconds
> Max Launch Delay 3600 seconds
> URIs Unspecified
> User Unspecified
> {code}
> The deployment stopped in the Delayed state; only about 360-370 of the 500 
> instances started successfully. In the stdout of the failed Mesos tasks I see 
> "No such file or directory".
> As I see in /var/log/upstart/docker.log with debug enabled, Mesos sometimes 
> tries to start containers with a strange Cmd ("Cmd":["-c","echo 'No such file 
> or directory'; exit 1"]), and these tasks fail. At other times everything is 
> fine: "Cmd":null and the task is in the RUNNING state.
> Part of the log is available at http://paste.openstack.org/show/491122/
> I have successfully started 700 nginx Docker applications with 10 instances 
> simultaneously in this cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.

2016-03-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201802#comment-15201802
 ] 

haosdent commented on MESOS-4977:
-

Your error log is related to this part of the code:
{code:title=slave.cpp|borderStyle=solid}
  if (task.has_command()) {
    ...
    executor.mutable_command()->set_value(
        "echo '" +
        (path.isError() ? path.error() : "No such file or directory") +
        "'; exit 1");
    ...
  }
{code}

The weird thing is that it goes into this part even though you don't have a 
command in the task. If you don't use Marathon and instead use, for example, 
{{mesos-execute}} to simulate this case, does it still happen?
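
To illustrate how a command such as "echo 'No such file or directory'; exit 1" can come out of that branch, here is a minimal, self-contained sketch. It is not the Mesos code: `PathResult` is a hypothetical stand-in for the type of `path` in the excerpt, and `fallbackCommand` is a made-up helper name. It assumes, as the excerpt suggests, that when the lookup behind `path` fails, `path` either carries an error message or carries nothing at all.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the result type of `path` in the excerpt:
// it either carries an error message or carries nothing at all.
struct PathResult {
  std::optional<std::string> errorMessage;

  bool isError() const { return errorMessage.has_value(); }
  const std::string& error() const { return *errorMessage; }
};

// Builds the shell command with the same expression as the excerpt:
// the error message if there is one, otherwise the fixed text
// "No such file or directory", wrapped in "echo '...'; exit 1".
std::string fallbackCommand(const PathResult& path) {
  return "echo '" +
         (path.isError() ? path.error()
                         : std::string("No such file or directory")) +
         "'; exit 1";
}
```

With no error message set, `fallbackCommand(PathResult{})` returns "echo 'No such file or directory'; exit 1", which is the second element of the Cmd array reported for the failing containers.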



[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.

2016-03-19 Thread SERGEY GALKIN (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201767#comment-15201767
 ] 

SERGEY GALKIN commented on MESOS-4977:
--

Mesos slave hardware:

HP ProLiant DL380 Gen9
CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48 cores with hyperthreading)
RAM - 264G
Storage - 3.0T on RAID on an HP Smart Array P840 controller
HDD - 12 x HP EH0600JDYTL
Network - 2 x Intel Corporation Ethernet 10G2P X710




[jira] [Commented] (MESOS-4977) Sometime Cmd":["-c","echo 'No such file or directory'] in task.

2016-03-18 Thread SERGEY GALKIN (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201757#comment-15201757
 ] 

SERGEY GALKIN commented on MESOS-4977:
--

Logs from mesos-master

1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 - with 
"Cmd":["-c","echo 'No such file or directory'; exit 1"] (failed)

mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.224059  2638 master.hpp:176] Adding task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.224105  2638 master.cpp:3621] Launching task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[19743-19743] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:W0318 15:14:33.154769  2656 master.cpp:4885] Ignoring unknown exited executor '1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0' of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:33.156250  2639 master.cpp:4789] Status update TASK_FAILED (UUID: 7c90d238-fcc4-4ede-9238-200744693449) for task 1f532267a08494e3081c1acb42d273b7.e25466eb-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)


1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 - with 
"Cmd":null (running)

mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.223767  2638 master.hpp:176] Adding task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:27.223814  2638 master.cpp:3621] Launching task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- (marathon) at scheduler-f59022ec-3650-4212-beea-38f50ce6e427@172.20.9.50:56418 with resources cpus(*):1; mem(*):256; disk(*):50; ports(*):[9016-9016] on slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
mesos-master.729039-comp-disk-280.invalid-user.log.INFO.20160318-151426.2595:I0318 15:14:33.200388  2648 master.cpp:4789] Status update TASK_RUNNING (UUID: 563864b0-8780-4fd3-a106-041600599e2e) for task 1f532267a08494e3081c1acb42d273b7.e2548d07-ed1b-11e5-89d2-6805ca32e0f0 of framework 5445dbdc-c58a-4f78-aef2-9ab129a640fa- from slave 5445dbdc-c58a-4f78-aef2-9ab129a640fa-S60 at slave(1)@172.20.9.205:5051 (172.20.9.205)
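
For tallying failed versus running tasks across a larger master log, the "Status update" records above can be picked out with a regular expression. This is only a rough sketch: it assumes each log record has first been joined onto a single line with single spaces between words, and `StatusUpdate`, `parseStatusUpdate` and `countByState` are hypothetical names, not part of Mesos.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// One "Status update" record pulled out of a master log line.
struct StatusUpdate {
  std::string state;   // e.g. TASK_FAILED or TASK_RUNNING
  std::string taskId;  // e.g. <app id>.<instance uuid>
};

// Extracts the state and task ID from a line containing
// "Status update <STATE> (UUID: <uuid>) for task <task id> of".
// Returns std::nullopt for any line that does not match.
std::optional<StatusUpdate> parseStatusUpdate(const std::string& line) {
  static const std::regex pattern(
      R"(Status update (TASK_[A-Z]+) \(UUID: [^)]+\) for task (\S+) of)");
  std::smatch match;
  if (!std::regex_search(line, match, pattern)) {
    return std::nullopt;
  }
  return StatusUpdate{match[1].str(), match[2].str()};
}

// Counts how many status updates were seen per state.
std::map<std::string, int> countByState(const std::vector<std::string>& lines) {
  std::map<std::string, int> counts;
  for (const std::string& line : lines) {
    if (auto update = parseStatusUpdate(line)) {
      ++counts[update->state];
    }
  }
  return counts;
}
```

Lines such as "Adding task" or "Launching task" do not match the pattern and are skipped, so only the final status records are counted.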

