Re: Docker odd behavior
Hi Eduardo,

There is a known defect in Mesos that matches your description:

https://issues.apache.org/jira/browse/MESOS-1915
https://issues.apache.org/jira/browse/MESOS-1884

A fix will be included in the next release:

https://reviews.apache.org/r/26486

You see the killTask because the default --task_launch_timeout value for Marathon is 60 seconds. I created an issue to make the logging around this better:

https://github.com/mesosphere/marathon/issues/732

-- Connor

On Oct 22, 2014, at 16:18, Eduardo Jiménez wrote:

> Hi,
>
> I've started experimenting with mesos using the docker containerizer, and
> running a simple example got into a very strange state.
>
> I have mesos-0.20.1, marathon-0.7 set up on EC2, using Amazon Linux:
>
> Linux 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014
> x86_64 x86_64 x86_64 GNU/Linux
>
> Docker version 1.2.0, build fa7b24f/1.2.0
>
> I start the mesos slave with these relevant options:
>
> --cgroups_hierarchy=/cgroup
> --containerizers=docker,mesos
> --executor_registration_timeout=5mins
> --isolation=cgroups/cpu,cgroups/mem
>
> I launched a very simple app, which is from the mesosphere examples:
>
> {
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "libmesos/ubuntu"
>     }
>   },
>   "id": "ubuntu-docker2",
>   "instances": "1",
>   "cpus": "0.5",
>   "mem": "512",
>   "uris": [],
>   "cmd": "while sleep 10; do date -u +%T; done"
> }
>
> The app launches, but then mesos states the task is KILLED, yet the docker
> container is STILL running. Here's the sequence of logs from that mesos-slave.
>
> 1) Task gets created and assigned:
>
> I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-
> I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-
> I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework '20141017-172055-3489660938-5050-1603-
> I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework '20141017-172055-3489660938-5050-1603-'
>
> So far so good. The log statements right after "Starting container" are:
>
> I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-
> I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603- from @0.0.0.0:0
> W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 20141017-172055-3489660938-5050-1603- because it has no tasks
> E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for terminal task, destroying container: No container found
>
> After this, there are several log messages like this:
>
> I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-
> I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
> W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-
> I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
>
> Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as
> killed. By then, the process should be dead, but it's not.
>
> $ sudo docker ps
> CONTA
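As a quick sanity check (mine, not from the thread): the gap between the quoted "Starting container" entry (17:44:13.989893) and the "Asked to kill task" entry (17:45:14.893309) is just over 60 seconds, which matches the 60-second default --task_launch_timeout described above.

```shell
# Sanity check: the kill arrives ~60s after the container launch, consistent
# with Marathon's default --task_launch_timeout of 60 seconds.
# The two timestamps are copied verbatim from the slave log quoted above.
gap=$(awk 'BEGIN { printf "%.6f", (17*3600 + 45*60 + 14.893309) - (17*3600 + 44*60 + 13.989893) }')
echo "$gap"  # 60.903416
```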
Docker odd behavior
Hi,

I've started experimenting with mesos using the docker containerizer, and running a simple example got into a very strange state.

I have mesos-0.20.1, marathon-0.7 set up on EC2, using Amazon Linux:

Linux 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Docker version 1.2.0, build fa7b24f/1.2.0

I start the mesos slave with these relevant options:

--cgroups_hierarchy=/cgroup
--containerizers=docker,mesos
--executor_registration_timeout=5mins
--isolation=cgroups/cpu,cgroups/mem

I launched a very simple app, which is from the mesosphere examples:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "libmesos/ubuntu"
    }
  },
  "id": "ubuntu-docker2",
  "instances": "1",
  "cpus": "0.5",
  "mem": "512",
  "uris": [],
  "cmd": "while sleep 10; do date -u +%T; done"
}

The app launches, but then mesos states the task is KILLED, yet the docker container is STILL running. Here's the sequence of logs from that mesos-slave.

1) Task gets created and assigned:

I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-
I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-
I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework '20141017-172055-3489660938-5050-1603-
I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework '20141017-172055-3489660938-5050-1603-'

So far so good. The log statements right after "Starting container" are:

I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-
I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603- from @0.0.0.0:0
W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 20141017-172055-3489660938-5050-1603- because it has no tasks
E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for terminal task, destroying container: No container found

After this, there are several log messages like this:

I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-
I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050
W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-
I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603- to master@10.0.0.147:5050

Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as killed. By then, the process should be dead, but it's not.

$ sudo docker ps
CONTAINER ID   IMAGE                    COMMAND                CREATED       STATUS       PORTS   NAMES
f76784e1af8b   libmesos/ubuntu:latest   "/bin/sh -c 'while s   5 hours ago   Up 5 hours           mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978

The container shows in the UI like this:

ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799  ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799  KILLED  5 hours ago  5 hours ago

And it's been running the whole time. There's no other logging indicating why killTask was invoked, which makes this extremely frustrating to debug. Has anyone seen something similar?

Thanks,
Eduardo
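A hypothetical mitigation, following the reply elsewhere in this thread that traces the killTask to Marathon's 60-second default launch timeout: give slow `docker pull`s more time before Marathon gives up. The flag name comes from that reply; the unit (milliseconds, so 300000 = 5 minutes) is my assumption for Marathon 0.7.x, so verify against `marathon --help` for your version.

```shell
# Hypothetical sketch: raise Marathon's task launch timeout so a slow
# `docker pull` does not trip the kill seen above. The milliseconds unit
# (300000 = 5 minutes) is an assumption; check `marathon --help`.
marathon <your-existing-flags> --task_launch_timeout 300000
```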
Re: Do i really need HDFS?
I haven't got as far as deploying a FS yet - still weighing up the options. Our Mesos cluster is just a PaaS at the moment, but I think the option to use capacity for ad hoc distributed computing alongside the web workloads is a killer feature. We're soon to Dockerize as well, so some option that can be reached from containers is pretty important too.

Ceph is a strong candidate because of the S3 compatibility, since I know that will be usable from within Docker without any trouble when we need non-DB persistence. That and its resilience seem a good match to Mesos' own. I need some real-world war-story-type research before I can really say it's a good alternative though.

As I'm a Spark newbie I don't want to run before I can walk, so I'll probably start with an HDFS deployment on the test systems to get the feel of it first.

On 22 October 2014 17:40, CCAAT wrote:
> Ok so,
>
> I'd be curious to know your final architecture (D. Davies)?
>
> I was looking to put Ceph on top of the (3) btrfs nodes in case we need a
> DFS at some later point. We're not really sure what software will be
> in our final mix. Certainly installing Ceph does not hurt anything (?);
> and I'm not sure we want to use ceph from userspace only. We have had
> excellent success using btrfs, so that is firm for us, short of some
> gaping problem emerging. Growing the cluster size will happen, once
> we establish the basic functionality of the cluster.
>
> Right now, there is a focus on subsurface fluid simulations for carbon
> sequestration, but also using the cluster for general (cron-chronos) batch
> jobs is a secondary appeal to us. So, I guess my question is, knowing that
> we want to avoid the hdfs/hadoop setup entirely, will localFS/DFS with
> btrfs/ceph be sufficiently robust to test not only mesos+spark but many
> other related software, such as but not limited to R, scala, sparkR,
> database(sql) and many others? We're just trying to avoid some
> common mistakes as we move forward with mesos.
>
> James
>
> On 10/22/14 02:29, Dick Davies wrote:
>>
>> Be interested to know what that is, if you don't mind sharing.
>>
>> We're thinking of deploying a Ceph cluster for another project anyway,
>> it seems to remove some of the chokepoints/points of failure HDFS suffers
>> from, but I've no idea how well it can interoperate with the usual HDFS
>> clients (Spark in my particular case but I'm trying to keep this general).
>>
>> On 21 October 2014 13:16, David Greenberg wrote:
>>>
>>> We use spark without HDFS--in our case, we just use ansible to copy the
>>> spark executors onto all hosts at the same path. We also load and store
>>> our spark data from non-HDFS sources.
>>>
>>> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies wrote:
>>>>
>>>> I think Spark needs a way to send jobs to/from the workers - the Spark
>>>> distro itself will pull down the executor ok, but in my (very basic)
>>>> tests I got stuck without HDFS.
>>>>
>>>> So basically it depends on the framework. I think in Spark's case they
>>>> assume most users are migrating from an existing Hadoop deployment, so
>>>> HDFS is sort of assumed.
>>>>
>>>> On 20 October 2014 23:18, CCAAT wrote:
>>>>>
>>>>> On 10/20/14 11:46, Steven Schlansker wrote:
>>>>>
>>>>>> We are running Mesos entirely without HDFS with no problems. We use
>>>>>> Docker to distribute our application to slave nodes, and keep no
>>>>>> state on individual nodes.
>>>>>
>>>>> Background: I'm building up a 3 node cluster to run mesos and spark. No
>>>>> legacy Hadoop needed or wanted. I am using btrfs for the local file
>>>>> system, with (2) drives set up for raid1 on each system.
>>>>>
>>>>> So you are suggesting that I can install mesos + spark + docker
>>>>> and not a DFS on these (3) machines?
>>>>>
>>>>> Will I need any other software? My application is a geophysical
>>>>> fluid simulator, so scala, R, and all sorts of advanced math will
>>>>> be required on the cluster for the Finite Element Methods.
>>>>>
>>>>> James
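On the S3-compatibility point above: Ceph exposes an S3-compatible API through the Ceph Object Gateway (radosgw), so any ordinary S3 client inside a container can reach it. A hypothetical sketch (the gateway hostname, bucket, and file are all made up; credentials come from your radosgw user):

```shell
# Hypothetical sketch: talk to a Ceph Object Gateway with a stock S3 client
# (s3cmd here). The endpoint and bucket names below are invented examples.
s3cmd --host=radosgw.example.internal \
      --host-bucket='%(bucket)s.radosgw.example.internal' \
      put backup.tar s3://my-bucket/backup.tar
```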
Re: Do i really need HDFS?
Ok so,

I'd be curious to know your final architecture (D. Davies)?

I was looking to put Ceph on top of the (3) btrfs nodes in case we need a DFS at some later point. We're not really sure what software will be in our final mix. Certainly installing Ceph does not hurt anything (?); and I'm not sure we want to use ceph from userspace only. We have had excellent success using btrfs, so that is firm for us, short of some gaping problem emerging. Growing the cluster size will happen, once we establish the basic functionality of the cluster.

Right now, there is a focus on subsurface fluid simulations for carbon sequestration, but also using the cluster for general (cron-chronos) batch jobs is a secondary appeal to us. So, I guess my question is, knowing that we want to avoid the hdfs/hadoop setup entirely, will localFS/DFS with btrfs/ceph be sufficiently robust to test not only mesos+spark but many other related software, such as but not limited to R, scala, sparkR, database(sql) and many others? We're just trying to avoid some common mistakes as we move forward with mesos.

James

On 10/22/14 02:29, Dick Davies wrote:
> Be interested to know what that is, if you don't mind sharing.
>
> We're thinking of deploying a Ceph cluster for another project anyway; it
> seems to remove some of the chokepoints/points of failure HDFS suffers
> from, but I've no idea how well it can interoperate with the usual HDFS
> clients (Spark in my particular case but I'm trying to keep this general).
>
> On 21 October 2014 13:16, David Greenberg wrote:
>> We use spark without HDFS--in our case, we just use ansible to copy the
>> spark executors onto all hosts at the same path. We also load and store
>> our spark data from non-HDFS sources.
>>
>> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies wrote:
>>> I think Spark needs a way to send jobs to/from the workers - the Spark
>>> distro itself will pull down the executor ok, but in my (very basic)
>>> tests I got stuck without HDFS.
>>>
>>> So basically it depends on the framework. I think in Spark's case they
>>> assume most users are migrating from an existing Hadoop deployment, so
>>> HDFS is sort of assumed.
>>>
>>> On 20 October 2014 23:18, CCAAT wrote:
>>>> On 10/20/14 11:46, Steven Schlansker wrote:
>>>>> We are running Mesos entirely without HDFS with no problems. We use
>>>>> Docker to distribute our application to slave nodes, and keep no
>>>>> state on individual nodes.
>>>>
>>>> Background: I'm building up a 3 node cluster to run mesos and spark. No
>>>> legacy Hadoop needed or wanted. I am using btrfs for the local file
>>>> system, with (2) drives set up for raid1 on each system.
>>>>
>>>> So you are suggesting that I can install mesos + spark + docker
>>>> and not a DFS on these (3) machines?
>>>>
>>>> Will I need any other software? My application is a geophysical
>>>> fluid simulator, so scala, R, and all sorts of advanced math will
>>>> be required on the cluster for the Finite Element Methods.
>>>>
>>>> James
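The "copy the executors onto all hosts at the same path" approach described above maps onto Spark's Mesos settings roughly as follows. This is a hedged sketch, not the posters' actual configuration: the paths and the Spark tarball name are invented, and `SPARK_EXECUTOR_URI` is the mechanism Spark-on-Mesos used in that era to tell slaves where to fetch the executor from without HDFS.

```shell
# Hypothetical spark-env.sh fragment for Spark on Mesos without HDFS:
# point the executor URI at a tarball every slave can already reach
# (pre-copied local file here; an http:// URL also works). Paths invented.
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=file:///opt/spark/spark-1.1.0-bin-hadoop2.4.tgz
```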
Re: Do i really need HDFS?
If it's locally mounted via fuse then there is no issue. Also, there are tickets open about volume mounting in the sandbox; that would be the ideal solution.

Cheers,
Tim

----- Original Message -----
> From: "Dick Davies"
> To: user@mesos.apache.org
> Sent: Wednesday, October 22, 2014 2:29:20 AM
> Subject: Re: Do i really need HDFS?
>
> Be interested to know what that is, if you don't mind sharing.
>
> We're thinking of deploying a Ceph cluster for another project anyway,
> it seems to remove some of the chokepoints/points of failure HDFS suffers
> from, but I've no idea how well it can interoperate with the usual HDFS
> clients (Spark in my particular case but I'm trying to keep this general).
>
> On 21 October 2014 13:16, David Greenberg wrote:
> > We use spark without HDFS--in our case, we just use ansible to copy the
> > spark executors onto all hosts at the same path. We also load and store our
> > spark data from non-HDFS sources.
> >
> > On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies wrote:
> >>
> >> I think Spark needs a way to send jobs to/from the workers - the Spark
> >> distro itself will pull down the executor ok, but in my (very basic)
> >> tests I got stuck without HDFS.
> >>
> >> So basically it depends on the framework. I think in Spark's case they
> >> assume most users are migrating from an existing Hadoop deployment, so
> >> HDFS is sort of assumed.
> >>
> >> On 20 October 2014 23:18, CCAAT wrote:
> >> > On 10/20/14 11:46, Steven Schlansker wrote:
> >> >
> >> >> We are running Mesos entirely without HDFS with no problems. We use
> >> >> Docker to distribute our application to slave nodes, and keep no
> >> >> state on individual nodes.
> >> >
> >> > Background: I'm building up a 3 node cluster to run mesos and spark. No
> >> > legacy Hadoop needed or wanted. I am using btrfs for the local file
> >> > system, with (2) drives set up for raid1 on each system.
> >> >
> >> > So you are suggesting that I can install mesos + spark + docker
> >> > and not a DFS on these (3) machines?
> >> >
> >> > Will I need any other software? My application is a geophysical
> >> > fluid simulator, so scala, R, and all sorts of advanced math will
> >> > be required on the cluster for the Finite Element Methods.
> >> >
> >> > James

--
Cheers,
Timothy St. Clair
Red Hat Inc.
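For concreteness, "locally mounted via fuse" with Ceph could look like the sketch below: mount CephFS on every slave, and tasks (including Docker containers, via a bind mount) see it as an ordinary directory. This is my illustration, not Tim's setup; the monitor address and paths are made up.

```shell
# Hypothetical sketch: mount CephFS via fuse on a slave, then expose it to a
# container as a plain directory. Monitor host and mount point are invented.
sudo mkdir -p /mnt/cephfs
sudo ceph-fuse -m mon1.example.internal:6789 /mnt/cephfs

# Bind-mount it into a container so the task can use it for persistence:
docker run -v /mnt/cephfs:/data libmesos/ubuntu ls /data
```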
Re: Do i really need HDFS?
We use lustre and a couple of internal data storage services. I wouldn't recommend lustre much; it's got an SPOF, which is a problem at scale. I just wanted to point out that you can skip hdfs if you so choose.

On Wednesday, October 22, 2014, Dick Davies wrote:
> Be interested to know what that is, if you don't mind sharing.
>
> We're thinking of deploying a Ceph cluster for another project anyway,
> it seems to remove some of the chokepoints/points of failure HDFS suffers
> from, but I've no idea how well it can interoperate with the usual HDFS
> clients (Spark in my particular case but I'm trying to keep this general).
>
> On 21 October 2014 13:16, David Greenberg wrote:
> > We use spark without HDFS--in our case, we just use ansible to copy the
> > spark executors onto all hosts at the same path. We also load and store
> > our spark data from non-HDFS sources.
> >
> > On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies wrote:
> >>
> >> I think Spark needs a way to send jobs to/from the workers - the Spark
> >> distro itself will pull down the executor ok, but in my (very basic)
> >> tests I got stuck without HDFS.
> >>
> >> So basically it depends on the framework. I think in Spark's case they
> >> assume most users are migrating from an existing Hadoop deployment, so
> >> HDFS is sort of assumed.
> >>
> >> On 20 October 2014 23:18, CCAAT wrote:
> >> > On 10/20/14 11:46, Steven Schlansker wrote:
> >> >
> >> >> We are running Mesos entirely without HDFS with no problems. We use
> >> >> Docker to distribute our application to slave nodes, and keep no
> >> >> state on individual nodes.
> >> >
> >> > Background: I'm building up a 3 node cluster to run mesos and spark. No
> >> > legacy Hadoop needed or wanted. I am using btrfs for the local file
> >> > system, with (2) drives set up for raid1 on each system.
> >> >
> >> > So you are suggesting that I can install mesos + spark + docker
> >> > and not a DFS on these (3) machines?
> >> >
> >> > Will I need any other software? My application is a geophysical
> >> > fluid simulator, so scala, R, and all sorts of advanced math will
> >> > be required on the cluster for the Finite Element Methods.
> >> >
> >> > James
Re: Do i really need HDFS?
Be interested to know what that is, if you don't mind sharing.

We're thinking of deploying a Ceph cluster for another project anyway, it seems to remove some of the chokepoints/points of failure HDFS suffers from but I've no idea how well it can interoperate with the usual HDFS clients (Spark in my particular case but I'm trying to keep this general).

On 21 October 2014 13:16, David Greenberg wrote:
> We use spark without HDFS--in our case, we just use ansible to copy the
> spark executors onto all hosts at the same path. We also load and store our
> spark data from non-HDFS sources.
>
> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies wrote:
>>
>> I think Spark needs a way to send jobs to/from the workers - the Spark
>> distro itself will pull down the executor ok, but in my (very basic)
>> tests I got stuck without HDFS.
>>
>> So basically it depends on the framework. I think in Sparks case they
>> assume most users are migrating from an existing Hadoop deployment, so
>> HDFS is sort of assumed.
>>
>> On 20 October 2014 23:18, CCAAT wrote:
>> > On 10/20/14 11:46, Steven Schlansker wrote:
>> >
>> >> We are running Mesos entirely without HDFS with no problems. We use
>> >> Docker to distribute our application to slave nodes, and keep no
>> >> state on individual nodes.
>> >
>> > Background: I'm building up a 3 node cluster to run mesos and spark. No
>> > legacy Hadoop needed or wanted. I am using btrfs for the local file
>> > system, with (2) drives set up for raid1 on each system.
>> >
>> > So you are suggesting that I can install mesos + spark + docker
>> > and not a DFS on these (3) machines?
>> >
>> > Will I need any other softwares? My application is a geophysical
>> > fluid simulator, so scala, R, and all sorts of advanced math will
>> > be required on the cluster for the Finite Element Methods.
>> >
>> > James