Re: Alternate HDFS Filesystems + Hadoop on Mesos
Okay, I guess MapRFS is protocol-compatible with HDFS, but not URI-compatible. I know the MapR guys have gotten MapR on Mesos working; they may have more answers for you on how they accomplished this.

"Why hard code the file prefixes?" We allow any URI, so we need handlers coded for each protocol group. So far that includes hdfs/hftp/s3/s3n, which use hdfs::copyToLocal; http/https/ftp/ftps, which use net::download; and file:// or an absolute/relative path for files pre-populated on the machine (uses 'cp'). MapRFS (and Tachyon) would probably fit into the hdfs::copyToLocal group so easily that it would be a one-line fix each.

"I really think the hdfs vs other prefixes should be looked at": I agree. Could you file a JIRA with your request? It should be an easy enough change for us to pick up. I would also like to see Tachyon as a possible filesystem for the fetcher.

On Fri, Aug 15, 2014 at 5:16 PM, John Omernik j...@omernik.com wrote: I tried hdfs:/// and hdfs://cldbnode:7222/ Neither worked (examples below). I really think the hdfs vs other prefixes should be looked at. Like I said above, the Tachyon project just added an env variable to address this.
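The protocol grouping Adam describes can be sketched as a tiny dispatcher. This is a simplified shell illustration of the behavior, not the actual fetcher.cpp code; the maprfs:// case shows why MapRFS URIs currently fall through:

```shell
#!/bin/sh
# Illustrative sketch of the fetcher's per-scheme dispatch (not the real
# fetcher.cpp logic). Given a URI, print the handler group it would use.
fetch_handler() {
  case "$1" in
    hdfs://*|hftp://*|s3://*|s3n://*)    echo "hdfs::copyToLocal" ;;
    http://*|https://*|ftp://*|ftps://*) echo "net::download" ;;
    file://*|/*|./*)                     echo "cp" ;;
    *)                                   echo "unsupported" ;;
  esac
}

fetch_handler "hdfs://namenode:9000/path/app.tgz"    # hdfs::copyToLocal
fetch_handler "maprfs://cldb:7222/path/app.tgz"      # unsupported
```

An env-variable-driven list of extra prefixes, as John suggests, would amount to extending the first pattern at runtime.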
hdfs://cldbnode:7222/

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed: hadoop fs -copyToLocal 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
-copyToLocal: Wrong FS: maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>
Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Failed to synchronize with slave (it's probably exited)

hdfs:///

I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed: hadoop fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
-copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>
Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Failed to synchronize with slave (it's probably exited)

On Fri, Aug 15, 2014 at 5:38 PM, John Omernik j...@omernik.com wrote: I am away from my cluster right now. I tried doing a hadoop fs -ls maprfs:// and that worked. When I tried hadoop fs -ls hdfs:/// it failed with wrong FS type. With that error I didn't try it in the mapred-site. I will try it. Still... why hard code the file prefixes? I guess I am curious how glusterfs would work, or others as they pop up.

On Aug 15, 2014 5:04 PM, Adam Bordelon a...@mesosphere.io wrote: Can't you just use the hdfs:// protocol for maprfs? That should work just fine.
On Fri, Aug 15, 2014 at 2:50 PM, John Omernik j...@omernik.com wrote: Thanks all. I realized MapR has a workaround for me that I will try soon: I have the MapR filesystem NFS-mounted on each node, i.e. I should be able to get the tar from there. That said, perhaps someone with better coding skills than me could add an env variable where a user could supply the HDFS prefixes to try. I know we did
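John's NFS workaround can be sketched as follows (paths are illustrative): with the MapR cluster NFS-mounted, e.g. under /mapr/&lt;cluster&gt;, the tarball is reachable as an ordinary local file, so the fetcher's plain-path/'cp' handler applies and no hdfs:// URI is needed.

```shell
#!/bin/sh
# Sketch of the NFS workaround. With MapR's NFS export mounted, a cluster
# file is just a local path, e.g.:
#   /mapr/my.cluster/mesos/hadoop-0.20.2-mapr-4.0.0.tgz   (example path)
fetch_via_nfs() {
  src="$1"   # file on the NFS mount
  dst="$2"   # destination, e.g. the executor sandbox
  cp "$src" "$dst"
}
```

The executor URI in mapred-site would then be the plain path, which Mesos fetches with 'cp'.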
Re: MesosCon attendee introduction thread
Hello friends, I'm Adam from Mesosphere (adam-mesos), also an Apache Mesos committer, and lately I've been working on a Kubernetes-Mesos framework with Niklas and Connor. I'm excited to meet the rest of the community and discuss how we can make the Mesos ecosystem even more awesome and get the whole world using Mesos. Along with Connor, I will be leading the Mesos Frameworks SDK workshop and providing tips on building and running frameworks on top of Mesos. I'll be sticking around for the hackathon and staying in Chicago through the weekend, so don't hesitate to invite me out to drinks and a game of pool. There's gotta be a good dive bar around Chicago somewhere. :) -Adam- mesosphere.io

On Fri, Aug 15, 2014 at 11:14 PM, mohit soni mohitsoni1...@gmail.com wrote: I'm Mohit Soni. I work for eBay. I have been hacking things around Mesos for a while now. I am excited to talk about running YARN alongside Mesos, along with Renan DelValle. Looking forward to meeting everyone at MesosCon! -Mohit @mohitsoni

On Thu, Aug 14, 2014 at 9:06 AM, Dave Lester daveles...@gmail.com wrote: Hi All, I thought it would be nice to kick off a thread for folks to introduce themselves in advance of #MesosCon http://events.linuxfoundation.org/events/mesoscon, so here goes: My name is Dave Lester, and I am Open Source Advocate at Twitter. Twitter is an organizing sponsor for #MesosCon, and I've worked closely with Chris Aniszczyk, the Linux Foundation, and a great team of volunteers to hopefully make this an awesome community event. I'm interested in meeting more companies using Mesos that we can add to our #PoweredByMesos list http://mesos.apache.org/documentation/latest/powered-by-mesos/, and chatting with folks about Apache Aurora http://aurora.incubator.apache.org. Right now my Thursday and Friday evenings are free, so let's grab a beer and chat more. I'm also on Twitter: @davelester Next!
Mesos + storm on top of Docker
Hi, I have created a Docker-based Mesos setup, including Chronos, Marathon, and Storm. Following advice I saw previously on this mailing list, I have run all frameworks directly on the Mesos master (is this correct? is it guaranteed that there is only one master at any given time?). Chronos and Marathon work perfectly, but Storm doesn't. The UI works, but it seems like supervisors are not able to communicate with Nimbus. I can deploy topologies, but the executors fail. Here's the project on GitHub: https://github.com/yaronr/docker-mesos I've spent over a week on this and I'm hitting a wall. Thanks! (Y)
Re: Alternate HDFS Filesystems + Hadoop on Mesos
Adam - I am new to using JIRA properly. (I couldn't find the JIRA for the Tachyon change as an example, so I linked to the code... is that ok?) I created https://issues.apache.org/jira/browse/MESOS-1711 If you wouldn't mind taking a quick look to make sure I filled things out correctly, I'd appreciate it. If you want to hit me up off-list with any recommendations on how to do it better in the future, I'd appreciate that as well. Thanks! John

On Mon, Aug 18, 2014 at 4:43 AM, Adam Bordelon a...@mesosphere.io wrote: Okay, I guess MapRFS is protocol-compatible with HDFS, but not URI-compatible. ...
Re: Mesos + storm on top of Docker
Can you paste the slave/executor log related to the executor failure? @vinodkone

On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com wrote: Hi I have created a Docker based Mesos setup, including chronos, marathon, and storm. ...
Re: MesosCon attendee introduction thread
Hi All, My name is Nic Grayson (@nicgrayson). I'm an infrastructure engineer at Banno (banno.com). Zach Cox from Banno will also be attending. We are in the process of migrating the hosting of our web applications and API to Docker on Mesos with Marathon. We are really looking forward to seeing how this stack is used elsewhere and making sure we get it set up correctly the first time. Managing incoming access to the cluster is our current focus. Nic

On Thu, Aug 14, 2014 at 6:05 PM, Dave Lester daveles...@gmail.com wrote: Hi All, I thought it would be nice to kick off a thread for folks to introduce themselves in advance of #MesosCon ...
Re: Mesos + storm on top of Docker
Hi @vinodkone, nimbus log:

2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive

for all the executors. On the Mesos slave, there are no Storm-related logs, which leads me to believe that there's no supervisor to be found, even though there's obviously an executor assigned to the job. My understanding is that Mesos is responsible for spawning the supervisors (although that's not explicitly stated anywhere). The documentation is not very clear. But if I run the supervisors myself, then Mesos can't do the resource allocation as it's supposed to. (Y)

On Aug 18, 2014, at 6:13 PM, Vinod Kone vinodk...@gmail.com wrote: Can you paste the slave/executor log related to the executor failure? @vinodkone ...
Re: Mesos + storm on top of Docker
@vinodkone Finally found some relevant logs. Let's start with the slave:

slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537] Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002'
slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.708044 13 launcher.cpp:117] Forked child with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
slave_1 | I0818 16:18:51.717427 11 mesos_containerizer.cpp:647] Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using command '/usr/local/libexec/mesos/mesos-fetcher'
slave_1 | I0818 16:19:01.109644 14 slave.cpp:2873] Current usage 37.40%. Max allowed age: 3.681899907883981 days
slave_1 | I0818 16:19:09.766845 12 slave.cpp:2355] Monitoring executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002' in container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
slave_1 | I0818 16:19:10.765058 14 mesos_containerizer.cpp:1112] Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
slave_1 | I0818 16:19:10.765388 14 mesos_containerizer.cpp:996] Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'

So the executor gets started, and then exits.
Found the stderr of the framework/run:

I0818 16:23:53.427016 50 fetcher.cpp:61] Extracted resource '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz' into '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
--2014-08-18 16:23:54-- http://7df8d3d507a1:41765/conf/storm.yaml
Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
wget: unable to resolve host address '7df8d3d507a1'

So the problem is with host resolution. It's trying to resolve 7df8d3d507a1 and fails. Obviously this node is not in /etc/hosts. Why would it be able to resolve it? (Y)

On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum yaron.rosenb...@gmail.com wrote: Hi @vinodkone nimbus log: ...
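The wget failure above can be reproduced outside the executor. The sketch below (a diagnostic suggestion, not part of the thread's setup) checks whether a name like 7df8d3d507a1, a Docker-generated container ID used as a hostname, resolves from the current machine; getent consults the same NSS sources wget does:

```shell
#!/bin/sh
# Check whether the hostname this node advertises is resolvable; an
# unresolvable name breaks any URL the slave hands to executors, e.g.
# http://7df8d3d507a1:41765/conf/storm.yaml from the log above.
can_resolve() { getent hosts "$1" > /dev/null 2>&1; }

if can_resolve "$(hostname)"; then
  echo "hostname resolves"
else
  echo "hostname does not resolve; pass --hostname to docker run or add an /etc/hosts entry"
fi
```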
Re: Mesos + storm on top of Docker
Is the hostname set correctly on the machine running nimbus? It looks like it may not be.

On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum yaron.rosenb...@gmail.com wrote: @vinodkone Finally found some relevant logs. Let's start with the slave: ...
Re: Mesos + storm on top of Docker
Including --hostname=host in your docker run command should help with the resolution problem (so long as the host name is resolvable).

On Mon, Aug 18, 2014 at 9:42 AM, Brenden Matthews brenden.matth...@airbedandbreakfast.com wrote: Is the hostname set correctly on the machine running nimbus? It looks like that may not be correct. ...
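Brenden's --hostname advice can be wrapped in a small guard. This is a sketch (the image name is a placeholder, and the function echoes the docker command rather than executing it): refuse to start the container unless the chosen hostname actually resolves, since an unresolvable name reproduces the wget failure seen earlier in the thread.

```shell
#!/bin/sh
# Guard for 'docker run --hostname=...': only emit the command if the
# hostname resolves on this machine. 'example/mesos-slave' is a
# placeholder image name.
start_slave() {
  host="$1"
  if ! getent hosts "$host" > /dev/null 2>&1; then
    echo "unresolvable hostname: $host" >&2
    return 1
  fi
  # echo instead of exec'ing docker, so the sketch is side-effect free
  echo docker run --hostname="$host" example/mesos-slave
}

start_slave "$(hostname)"
```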
Re: mesos scheduling
Mesos also provides the ability to reserve resources, if you need guarantees about the resources available to a particular framework. For now, resources can be reserved at the per-slave level, and they will *only* be offered to the role that has them reserved.

On Mon, Aug 18, 2014 at 2:13 AM, Adam Bordelon a...@mesosphere.io wrote: That's correct (for now). We're looking into features that would support preemption of running tasks, but currently a user/admin would have to manually kill long-running tasks to scale down an over-provisioned framework. Marathon also has a nice API (web or REST) for scaling down the number of instances of a long-running service.

On Mon, Aug 18, 2014 at 1:43 AM, Jun Feng Liu liuj...@cn.ibm.com wrote: Thanks, Adam. Sounds like it is going to be pretty effective when all the frameworks run short tasks: Mesos can then quickly rebalance resource allocation among the frameworks based on DRF. If one framework happens to run some long tasks and takes too many resources, Mesos has to wait until some resources are freed up before assigning them to another framework. Is that correct? Best Regards, Jun Feng Liu, IBM China Systems Technology Laboratory in Beijing

On 2014/08/18 16:26, Adam Bordelon a...@mesosphere.io wrote (Subject: Re: mesos scheduling): Mesos uses a fair-sharing algorithm [1] to ensure that each framework registered with Mesos receives its fair share of resources.
If you want more control over the groupings and weights of different frameworks, check out the roles and weights parameters: mesos-master --roles=services,batch and --weights=services=2,batch=1 as described at http://mesosphere.io/docs/mesos/deep-dive/mesos-master/ Mesos uses these algorithms and parameters to decide which framework gets the next offer, so it won't affect already-running tasks if one framework is already hogging the cluster when you start a new framework. But if you start killing tasks from the over-provisioned framework, those resources will be offered to the new framework(s) until each reaches its fair share. [1] http://static.usenix.org/event/nsdi11/tech/full_papers/Ghodsi.pdf On Sun, Aug 17, 2014 at 7:06 PM, Jun Feng Liu liuj...@cn.ibm.com wrote: Thanks Jay. Does it mean that if one scheduler/framework needs a lot of resources and keeps asking Mesos for more, it will make it hard for other frameworks/schedulers to get resources? Is there any way I can configure Mesos to set up a resource-consumption boundary for each framework? Best Regards *Jun Feng Liu* IBM China Systems Technology Laboratory in Beijing -- *Phone:* 86-10-82452683 *E-mail:* liuj...@cn.ibm.com BLD 28, ZGC Software Park, No.8 Rd. Dong Bei Wang West, Dist. Haidian, Beijing 100193, China *Jay Buffington* m...@jaybuff.com Sent by: jaybuffing...@gmail.com 2014/08/18 02:44 Please respond to user@mesos.apache.org To user@mesos.apache.org, cc Subject Re: mesos scheduling On Sun, Aug 17, 2014 at 6:13 AM, Jun Feng Liu liuj...@cn.ibm.com wrote: I am trying to better understand how the Mesos allocator works. In the offer resource model, will Mesos send the same offer to multiple frameworks?
Or does it just send all resources to one framework, wait for that framework's response, and then try the next one? Mesos sends an offer to one scheduler (a scheduler is part of a framework) at a time. That scheduler holds the offer until it uses it, gives it back, or Mesos rescinds it. This strategy was referred to as "pessimistic" by Google's Omega paper [1] and has drawbacks. To address these points, a new type of offer, an Optimistic Offer, is being considered. See https://issues.apache.org/jira/browse/MESOS-1607 Jay [1] http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
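The weighted DRF allocation order discussed in this thread can be sketched in a few lines. This is an illustrative model only, not Mesos's actual allocator code; the cluster totals, framework names, and weights below are made-up examples (the weights mirror the --weights=services=2,batch=1 flag mentioned above).

```python
# Illustrative sketch of weighted Dominant Resource Fairness (DRF), the
# fair-sharing algorithm behind the Mesos allocator. All names and numbers
# here are hypothetical placeholders.

TOTAL = {"cpus": 9.0, "mem": 18.0}          # total cluster capacity
WEIGHTS = {"services": 2.0, "batch": 1.0}   # cf. --weights=services=2,batch=1

def dominant_share(allocated, total=TOTAL):
    # A framework's dominant share is its largest fraction of any one resource.
    return max(allocated[r] / total[r] for r in total)

def next_offer(allocations, weights=WEIGHTS):
    # Offer resources next to the framework whose weight-adjusted
    # dominant share is currently the smallest.
    return min(allocations, key=lambda f: dominant_share(allocations[f]) / weights[f])

allocations = {
    "services": {"cpus": 3.0, "mem": 3.0},  # dominant share 3/9 = 0.33 (cpus)
    "batch":    {"cpus": 1.0, "mem": 8.0},  # dominant share 8/18 = 0.44 (mem)
}
print(next_offer(allocations))  # -> services
```

This also illustrates the answer given above about long-running tasks: a framework that already holds a large dominant share simply stops receiving new offers first; nothing it is running gets preempted.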
Re: [VOTE] Release Apache Mesos 0.20.0 (rc1)
OK, I can confirm it is a bug due to the new Docker stuff. Partly my bad for not testing it on Mac. I need to cut an rc2 with the bug fix. I'll submit a bug fix shortly. - Jie On Mon, Aug 18, 2014 at 1:53 PM, Vinod Kone vinodk...@gmail.com wrote: make check succeeded on CentOS 5.5 but the Python framework test failed on OSX Mavericks. environment details: ➜ mesos-0.20.0 gcc --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn) Target: x86_64-apple-darwin13.3.0 Thread model: posix ➜ mesos-0.20.0 g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn) Target: x86_64-apple-darwin13.3.0 Thread model: posix ➜ mesos-0.20.0 python --version Python 2.7.7 [ RUN ] ExamplesTest.PythonFramework Using temporary directory '/tmp/ExamplesTest_PythonFramework_nX85Jw' Traceback (most recent call last): File "/tmp/mesos-0.20.0/src/examples/python/test_framework.py", line 25, in <module> import mesos.native File "build/bdist.macosx-10.9-x86_64/egg/mesos/native/__init__.py", line 17, in <module> File "build/bdist.macosx-10.9-x86_64/egg/mesos/native/_mesos.py", line 7, in <module> File "build/bdist.macosx-10.9-x86_64/egg/mesos/native/_mesos.py", line 6, in __bootstrap__ ImportError: dlopen(/Users/vinod/.python-eggs/mesos.native-0.20.0-py2.7-macosx-10.9-x86_64.egg-tmp/mesos/native/_mesos.so, 2): Symbol not found: __ZN7cgroups9hierarchyERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIc Referenced from: /Users/vinod/.python-eggs/mesos.native-0.20.0-py2.7-macosx-10.9-x86_64.egg-tmp/mesos/native/_mesos.so Expected in: flat namespace in /Users/vinod/.python-eggs/mesos.native-0.20.0-py2.7-macosx-10.9-x86_64.egg-tmp/mesos/native/_mesos.so tests/script.cpp:83: Failure Failed python_framework_test.sh exited with status 1 [ FAILED ]
ExamplesTest.PythonFramework (674 ms) On Sun, Aug 17, 2014 at 12:05 AM, Jie Yu yujie@gmail.com wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos 0.20.0. This release includes a lot of cool new features. The major ones are listed below:

* Docker support in Mesos.
  * Users can now launch executors/tasks within Docker containers.
  * Mesos now supports running multiple containerizers simultaneously. The slave can dynamically choose a containerizer to launch containers based on the configuration of executors/tasks.
* Container-level network monitoring for the Mesos containerizer.
  * Network statistics for each active container can be retrieved through the /monitor/statistics.json endpoint on the slave.
  * Completely transparent to the tasks running on the slave. No need to change the service discovery mechanism for tasks.
* Framework authorization.
  * Allows frameworks to (re-)register with authorized roles.
  * Allows frameworks to launch tasks/executors as authorized users.
  * Allows authorized principals to shut down framework(s) through an HTTP endpoint.
* Framework rate limiting.
  * In a multi-framework environment, this feature aims to protect the throughput of high-SLA (e.g., production, service) frameworks by having the master throttle messages from other (e.g., development, batch) frameworks.
* Enable building against installed third-party dependencies.

This release also includes several bug fixes and stability improvements.
The candidate for Mesos 0.20.0 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc1/mesos-0.20.0.tar.gz

The tag to be voted on is 0.20.0-rc1:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.20.0-rc1

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc1/mesos-0.20.0.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc1/mesos-0.20.0.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1028

Please vote on releasing this package as Apache Mesos 0.20.0! The vote is open until Wed Aug 20 00:03:55 PDT 2014 and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.20.0
[ ] -1 Do not release this package because ...

Thanks,
- Jie
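Verifying the tarball against the published MD5 checksum is one of the usual steps before voting. A minimal sketch of that check, using a small throwaway file rather than the real release tarball (the file name and contents here are placeholders):

```python
# Sketch: compute a file's MD5 digest for comparison against a published
# .md5 checksum, as done when verifying a release candidate tarball.
# "sample.bin" and its contents are illustrative placeholders.
import hashlib

def md5_of(path, chunk_size=1 << 20):
    # Read in chunks so large tarballs don't need to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with open("sample.bin", "wb") as f:
    f.write(b"hello")
print(md5_of("sample.bin"))  # -> 5d41402abc4b2a76b9719d911017c592
```

The printed digest would then be compared with the contents of the corresponding .md5 file; the .asc signature is additionally checked with the PGP key from the KEYS file.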
Re: Struggling with task controller Permissions on Hadoop Mesos
On Sat, Aug 16, 2014 at 4:26 AM, John Omernik j...@omernik.com wrote: I've confirmed on the package I am using that when I untar it using tar zxf as root, the task-controller does NOT lose the setuid bit. But on the lost tasks in Mesos I get the error below. What's interesting is that if I drill down to the directory, the owner is root:root; only the setuid bit is missing. What user is the slave running as? root?
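One way to narrow this down is to confirm that the mode bits, including setuid, are actually recorded inside the tarball itself: if they are, the bit is being lost at extraction time or by a later chown/chmod step in the slave, not in the package. A self-contained sketch (the "task-controller" file below is a placeholder, not the real binary):

```python
# Sketch: verify that a tarball records the setuid bit on its members.
# If the bit is present in the archive header but missing on disk after
# the fetcher runs, the loss happens during or after extraction.
import os
import stat
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "task-controller")
    with open(src, "w") as f:
        f.write("placeholder binary")
    os.chmod(src, 0o6050)  # setuid+setgid, r-x for group, as task-controller expects

    tgz = os.path.join(d, "pkg.tgz")
    with tarfile.open(tgz, "w:gz") as t:
        t.add(src, arcname="task-controller")

    # Read the mode back from the archive header, not from the filesystem.
    with tarfile.open(tgz) as t:
        mode = t.getmember("task-controller").mode

assert mode & stat.S_ISUID, "setuid bit not recorded in the archive"
print(oct(mode & stat.S_ISUID))  # -> 0o4000
```

Note that even when the bit survives in the archive, a chown after extraction clears setuid on most systems, which would match the root:root-but-no-setuid symptom described above.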