[ https://issues.apache.org/jira/browse/MESOS-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201767#comment-15201767 ]
SERGEY GALKIN edited comment on MESOS-4977 at 3/18/16 4:55 PM:
---------------------------------------------------------------

Mesos Slaves HW (189 nodes)
HP ProLiant DL380 Gen9,
CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
RAM - 264G,
Storage - 3.0T on RAID on HP Smart Array P840 Controller,
HDD - 12 x HP EH0600JDYTL
Network - 2 x Intel Corporation Ethernet 10G 2P X710,

was (Author: sergeygals):
Mesos Slaves HW
HP ProLiant DL380 Gen9,
CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
RAM - 264G,
Storage - 3.0T on RAID on HP Smart Array P840 Controller,
HDD - 12 x HP EH0600JDYTL
Network - 2 x Intel Corporation Ethernet 10G 2P X710,

> Sometime Cmd":["-c","echo 'No such file or directory'] in task.
> ---------------------------------------------------------------
>
>                 Key: MESOS-4977
>                 URL: https://issues.apache.org/jira/browse/MESOS-4977
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.2
>         Environment: 189 mesos slaves on Ubuntu 14.04.3 LTS
>            Reporter: SERGEY GALKIN
>
> mesos - 0.27.0
> marathon - 0.15.2
> I am trying to launch one simple Docker application (nginx) with 500
> instances through Marathon on a cluster with 189 HW nodes.
> {code}
> ID                /1f532267a08494e3081c1acb42d273b7
> Command           Unspecified
> Constraints       Unspecified
> Dependencies      Unspecified
> Labels            Unspecified
> Resource Roles    Unspecified
> Container
> {
>   "type": "DOCKER",
>   "volumes": [],
>   "docker": {
>     "image": "nginx",
>     "network": "BRIDGE",
>     "portMappings": [
>       {
>         "containerPort": 80,
>         "hostPort": 0,
>         "servicePort": 10000,
>         "protocol": "tcp"
>       }
>     ],
>     "privileged": false,
>     "parameters": [],
>     "forcePullImage": false
>   }
> }
> CPUs              1
> Environment       Unspecified
> Executor          Unspecified
> Health Checks
> [
>   {
>     "path": "/",
>     "protocol": "HTTP",
>     "portIndex": 0,
>     "gracePeriodSeconds": 300,
>     "intervalSeconds": 60,
>     "timeoutSeconds": 20,
>     "maxConsecutiveFailures": 3,
>     "ignoreHttp1xx": false
>   }
> ]
> Instances         500
> IP Address        Unspecified
> Memory            256 MiB
> Disk Space        50 MiB
> Ports             10000
> Backoff Factor    1.15
> Backoff           1 seconds
> Max Launch Delay  3600 seconds
> URIs              Unspecified
> User              Unspecified
> {code}
> The deployment stopped in the Delayed state; only about 360-370 of the 500
> instances started successfully. In the stdout of the failed Mesos tasks I
> see "No such file or directory".
> As I see in /var/log/upstart/docker.log with debug enabled, Mesos sometimes
> tries to start containers with a strange Cmd
> ("Cmd":["-c","echo 'No such file or directory'; exit 1"]), and the task
> fails. Sometimes everything is OK: "Cmd":null and the task is in the
> RUNNING state.
> Part of the log is available at http://paste.openstack.org/show/491122/
> I have successfully started 700 nginx Docker applications with 10 instances
> simultaneously in this cluster.
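
To get a rough count of how often the strange Cmd shows up compared with the normal "Cmd":null case, the Docker debug log can be scanned for the "Cmd" field. The following is only a minimal sketch: the function name classify_cmds and the regular expression are illustrative assumptions, and the only thing taken from the report is that the log contains either "Cmd":null or a "Cmd":[...] array holding the "No such file or directory" stub. The exact layout of the rest of each log line is not assumed.

```python
import re
from collections import Counter

# Assumption: the debug log contains the container config as JSON, so the
# field appears literally as "Cmd":null or "Cmd":[...] somewhere in a line.
CMD_RE = re.compile(r'"Cmd":(null|\[[^\]]*\])')


def classify_cmds(lines):
    """Count the "Cmd" values seen in an iterable of log lines.

    Buckets (names are hypothetical, chosen for this sketch):
      null        - "Cmd":null, the case reported as working
      error_stub  - Cmd array containing 'No such file or directory'
      other       - any other Cmd array
    """
    counts = Counter()
    for line in lines:
        for match in CMD_RE.finditer(line):
            raw = match.group(1)
            if raw == "null":
                counts["null"] += 1
            elif "No such file or directory" in raw:
                counts["error_stub"] += 1
            else:
                counts["other"] += 1
    return counts
```

The function accepts any iterable of strings, so an open file handle on the Docker log can be passed directly, for example classify_cmds(open(path, errors="replace")), where path is wherever the debug log lives on the slave.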