Re: Alternate HDFS Filesystems + Hadoop on Mesos

2014-08-18 Thread Adam Bordelon
Okay, I guess MapRFS is protocol-compatible with HDFS, but not
URI-compatible. I know the MapR guys have gotten MapR on Mesos working;
they may have more answers for you on how they accomplished this.

> why hard code the file prefixes?
We allow any URI, so we need handlers coded for each protocol group. So far
that includes hdfs/hftp/s3/s3n, which use hdfs::copyToLocal; http/https/ftp/ftps,
which use net::download; and file:// or an absolute/relative path for files
pre-populated on the machine (which uses 'cp'). MapRFS (and Tachyon) would
probably fit into the hdfs::copyToLocal group so easily that each would be a
one-line fix.
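The dispatch described above can be sketched in shell. This is a hypothetical illustration of the scheme-to-handler mapping only, not the actual Mesos fetcher code (which is C++):

```shell
# Hypothetical sketch of the fetcher's scheme-to-handler dispatch.
# Not actual Mesos code; it only mirrors the grouping described above.
uri_handler() {
  case "$1" in
    hdfs://*|hftp://*|s3://*|s3n://*)
      echo "hdfs::copyToLocal" ;;   # fetched via 'hadoop fs -copyToLocal'
    http://*|https://*|ftp://*|ftps://*)
      echo "net::download" ;;       # fetched over the network
    *)
      echo "cp" ;;                  # file:// or a path already on the machine
  esac
}

uri_handler "hdfs://namenode:9000/pkg.tgz"   # prints: hdfs::copyToLocal
uri_handler "https://example.com/pkg.tgz"    # prints: net::download
uri_handler "maprfs://cldb:7222/pkg.tgz"     # prints: cp (unhandled scheme)
```

Note that in this sketch maprfs:// falls through to the local-copy branch, which is exactly why adding it to the first pattern would be a one-line fix.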

> I really think the hdfs vs other prefixes should be looked at
I agree. Could you file a JIRA with your request? It should be an easy
enough change for us to pick up. I would also like to see Tachyon as a
possible filesystem for the fetcher.


On Fri, Aug 15, 2014 at 5:16 PM, John Omernik j...@omernik.com wrote:

 I tried hdfs:/// and hdfs://cldbnode:7222/. Neither worked (examples below).
 I really think the hdfs vs. other prefixes should be looked at. Like I said
 above, the Tachyon project just added an env variable to address this.



 hdfs://cldbnode:7222/

 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
 -copyToLocal: Wrong FS: 
 maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: 
 hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)




 hdfs:///




 I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
 -copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: 
 hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)



 On Fri, Aug 15, 2014 at 5:38 PM, John Omernik j...@omernik.com wrote:

 I am away from my cluster right now. I tried doing a hadoop fs -ls
 maprfs:// and that worked.  When I tried hadoop fs -ls hdfs:/// it failed
 with wrong fs type.  With that error I didn't try it in the mapred-site.  I
 will try it.  Still... why hard-code the file prefixes? I guess I am curious
 how glusterfs would work, or others as they pop up.
  On Aug 15, 2014 5:04 PM, Adam Bordelon a...@mesosphere.io wrote:

 Can't you just use the hdfs:// protocol for maprfs? That should work
 just fine.


 On Fri, Aug 15, 2014 at 2:50 PM, John Omernik j...@omernik.com wrote:

 Thanks all.

 I realized MapR has a workaround for me that I will try soon, in that I
 have MapR FS NFS-mounted on each node, i.e. I should be able to get the tar
 from there.

 That said, perhaps someone with better coding skills than I could
 add an env variable where a user could specify the HDFS prefixes to
 try. I know we did 

Re: MesosCon attendee introduction thread

2014-08-18 Thread Adam Bordelon
Hello friends,

I'm Adam from Mesosphere (adam-mesos), also an Apache Mesos committer, and
lately I've been working on a Kubernetes-Mesos framework with Niklas and
Connor. I'm excited to meet the rest of the community and discuss how we
can make the Mesos ecosystem even more awesome and get the whole world
using Mesos. Along with Connor, I will be leading the Mesos Frameworks SDK
workshop and providing tips on building and running frameworks on top of
Mesos. I'll be sticking around for the hackathon and staying in Chicago
through the weekend, so don't hesitate to invite me out to drinks and a
game of pool. There's gotta be a good dive bar around Chicago somewhere. :)

-Adam-
mesosphere.io


On Fri, Aug 15, 2014 at 11:14 PM, mohit soni mohitsoni1...@gmail.com
wrote:

 I'm Mohit Soni. I work for eBay. I have been hacking things around
 Mesos for a while now.

 I am excited to talk about running YARN alongside Mesos, along with
 Renan DelValle.

 Looking forward to meeting everyone at MesosCon!

 -Mohit
 @mohitsoni

 On Thu, Aug 14, 2014 at 9:06 AM, Dave Lester daveles...@gmail.com wrote:
 
  Hi All,
 
  I thought it would be nice to kick off a thread for folks to introduce
  themselves in advance of #MesosCon
  http://events.linuxfoundation.org/events/mesoscon, so here goes:
 
  My name is Dave Lester, and I am Open Source Advocate at Twitter. Twitter
  is an organizing sponsor for #MesosCon, and I've worked closely with
 Chris
  Aniszczyk, the Linux Foundation, and a great team of volunteers to
  hopefully make this an awesome community event.
 
  I'm interested in meeting more companies using Mesos that we can add
  to our #PoweredByMesos
  list http://mesos.apache.org/documentation/latest/powered-by-mesos/,
 and
  chatting with folks about Apache Aurora 
 http://aurora.incubator.apache.org.
  Right now my Thursday and Friday evenings are free, so let's grab a beer
  and chat more.
 
  I'm also on Twitter: @davelester
 
  Next!
 
 



Mesos + storm on top of Docker

2014-08-18 Thread Yaron Rosenbaum
Hi

I have created a Docker-based Mesos setup, including Chronos, Marathon, and 
Storm.
Following advice I saw previously on this mailing list, I have run all 
frameworks directly on the Mesos master (is this correct? Is it guaranteed that 
there is only one master at any given time?)

Chronos and Marathon work perfectly, but Storm doesn't. The UI works, but it seems 
like the supervisors are not able to communicate with nimbus. I can deploy 
topologies, but the executors fail.

Here's the project on github:
https://github.com/yaronr/docker-mesos

I've spent over a week on this and I'm hitting a wall.


Thanks!

(Y)



Re: Alternate HDFS Filesystems + Hadoop on Mesos

2014-08-18 Thread John Omernik
Adam - I am new to using JIRA properly. (I couldn't find the JIRA for the
Tachyon change as an example, so I linked to the code... is that OK?)

I created

https://issues.apache.org/jira/browse/MESOS-1711

If you wouldn't mind taking a quick look to make sure I filled things out
correctly so it gets addressed, I'd appreciate it. If you want to hit me up off
list with any recommendations on how to make it better in the future, I'd
appreciate that as well.

Thanks!

John



On Mon, Aug 18, 2014 at 4:43 AM, Adam Bordelon a...@mesosphere.io wrote:

 Okay, I guess MapRFS is protocol compatible with HDFS, but not
 uri-compatible. I know the MapR guys have gotten MapR on Mesos working.
 They may have more answers for you on how they accomplished this.

 > why hard code the file prefixes?
 We allow any uri, so we need to have handlers coded for each type of
 protocol group, which so far includes hdfs/hftp/s3/s3n which use
 hdfs::copyToLocal, or http/https/ftp/ftps which use net::download, or
 file:// or an absolute/relative path for files pre-populated on the machine
 (uses 'cp'). MapRFS (and Tachyon) would probably fit into the
 hdfs::copyToLocal group so easily that it would be a one-line fix each.

 > I really think the hdfs vs other prefixes should be looked at
 I agree. Could you file a JIRA with your request? It should be an easy
 enough change for us to pick up. I would also like to see Tachyon as a
 possible filesystem for the fetcher.


 On Fri, Aug 15, 2014 at 5:16 PM, John Omernik j...@omernik.com wrote:

 I tried hdfs:/// and hdfs://cldbnode:7222/ Neither worked (examples
 below) I really think the hdfs vs other prefixes should be looked at. Like
 I said above, the tachyon project just added a env variable to address
 this.



 hdfs://cldbnode:7222/

 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
 files.
 -copyToLocal: Wrong FS: 
 maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: 
 hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)




 hdfs:///





 I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
 files.
 -copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, 
 expected: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)



 On Fri, Aug 15, 2014 at 5:38 PM, John Omernik j...@omernik.com wrote:

 I am away from my cluster right now, I trued doing a hadoop fs -ls
 maprfs:// and that worked.   When I tries hadoop fs -ls hdfs:/// it failed
 with wrong fs type.  With that error I didn't try it in the mapred-site.  I
 will try it.  Still...why hard code the file prefixes? I guess I am curious
 on how glusterfs would work, or others as they 

Re: Mesos + storm on top of Docker

2014-08-18 Thread Vinod Kone
Can you paste the slave/executor log related to the executor failure?

@vinodkone

 On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com wrote:
 
 Hi
 
 I have created a Docker based Mesos setup, including chronos, marathon, and 
 storm.
 Following advice I saw previously on this mailing list, I have run all 
 frameworks directly on the Mesos master (is this correct? is it guaranteed to 
 have only one master at any given time?)
 
 Chronos and marathon work perfectly, but storm doesn't. UI works, but it 
 seems like supervisors are not able to communicate with nimbus. I can deploy 
 topologies, but the executors fail.
 
 Here's the project on github:
 https://github.com/yaronr/docker-mesos
 
 I've spent over a week on this and I'm hitting a wall.
 
 
 Thanks!
 
 (Y)
 


Re: MesosCon attendee introduction thread

2014-08-18 Thread Nic Grayson
Hi All,

My name is Nic Grayson (@nicgrayson). I'm an infrastructure engineer at
Banno (banno.com). Zach Cox from Banno will also be attending.

We are in the process of migrating the hosting of our web applications and API
to Docker on Mesos with Marathon. We are really looking forward to seeing how
this stack is used elsewhere and making sure we get it set up correctly the
first time. Managing incoming access to the cluster is our current
focus.

Nic


On Thu, Aug 14, 2014 at 6:05 PM, Dave Lester daveles...@gmail.com wrote:

 Hi All,

 I thought it would be nice to kick off a thread for folks to introduce
 themselves in advance of #MesosCon
 http://events.linuxfoundation.org/events/mesoscon, so here goes:

 My name is Dave Lester, and I am Open Source Advocate at Twitter. Twitter
 is an organizing sponsor for #MesosCon, and I've worked closely with Chris
 Aniszczyk, the Linux Foundation, and a great team of volunteers to
 hopefully make this an awesome community event.

 I'm interested in meeting more companies using Mesos that we can add to
 our #PoweredByMesos list
 http://mesos.apache.org/documentation/latest/powered-by-mesos/, and
 chatting with folks about Apache Aurora
 http://aurora.incubator.apache.org. Right now my Thursday and Friday
 evenings are free, so let's grab a beer and chat more.

 I'm also on Twitter: @davelester

 Next!



Re: Mesos + storm on top of Docker

2014-08-18 Thread Yaron Rosenbaum
Hi @vinodkone

nimbus log:
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] 
not alive
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] 
not alive
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] 
not alive
2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] 
not alive

for all the executors.
On the mesos slave, there are no storm related logs.
Which leads me to believe that there's no supervisor to be found, even though 
there's obviously an executor assigned to the job.

My understanding is that Mesos is responsible for spawning the supervisors 
(although that's not explicitly stated anywhere). The documentation is not very 
clear. But if I run the supervisors, then Mesos can't do the resource 
allocation as it's supposed to.

(Y)

On Aug 18, 2014, at 6:13 PM, Vinod Kone vinodk...@gmail.com wrote:

 Can you paste the slave/executor log related to the executor failure?
 
 @vinodkone
 
 On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com wrote:
 
 Hi
 
 I have created a Docker based Mesos setup, including chronos, marathon, and 
 storm.
 Following advice I saw previously on this mailing list, I have run all 
 frameworks directly on the Mesos master (is this correct? is it guaranteed 
 to have only one master at any given time?)
 
 Chronos and marathon work perfectly, but storm doesn't. UI works, but it 
 seems like supervisors are not able to communicate with nimbus. I can deploy 
 topologies, but the executors fail.
 
 Here's the project on github:
 https://github.com/yaronr/docker-mesos
 
 I've spent over a week on this and I'm hitting a wall.
 
 
 Thanks!
 
 (Y)
 



Re: Mesos + storm on top of Docker

2014-08-18 Thread Yaron Rosenbaum
@vinodkone

Finally found some relevant logs..
Let's start with the slave:

slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task 
82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task 
'82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework 
'20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537] Starting 
container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor 
'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002'
slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task 
82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task 
'82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework 
'20140818-161802-2214597036-5050-10-0002
slave_1 | I0818 16:18:51.708044 13 launcher.cpp:117] Forked child with 
pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
slave_1 | I0818 16:18:51.717427 11 mesos_containerizer.cpp:647] Fetching 
URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using command 
'/usr/local/libexec/mesos/mesos-fetcher'
slave_1 | I0818 16:19:01.109644 14 slave.cpp:2873] Current usage 37.40%. 
Max allowed age: 3.681899907883981days
slave_1 | I0818 16:19:09.766845 12 slave.cpp:2355] Monitoring executor 
'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002' 
in container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
slave_1 | I0818 16:19:10.765058 14 mesos_containerizer.cpp:1112] 
Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
slave_1 | I0818 16:19:10.765388 14 mesos_containerizer.cpp:996] 
Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'

So the executor gets started, and then exits.
I found the stderr of the framework run:
I0818 16:23:53.427016 50 fetcher.cpp:61] Extracted resource 
'/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz'
 into 
'/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
--2014-08-18 16:23:54--  http://7df8d3d507a1:41765/conf/storm.yaml
Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
wget: unable to resolve host address '7df8d3d507a1'

So the problem is with host resolution: it's trying to resolve 7df8d3d507a1 and 
failing.
Obviously this node is not in /etc/hosts. Why would it be able to resolve 
it?

(Y)

On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum yaron.rosenb...@gmail.com wrote:

 Hi @vinodkone
 
 nimbus log:
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] 
 not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] 
 not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] 
 not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] 
 not alive
 
 for all the executors.
 On the mesos slave, there are no storm related logs.
 Which leads me to believe that there's no supervisor to be found, even-though 
 there's obviously an executor that's assigned to the job.
 
 My understanding is that Mesos is responsible for spawning the supervisors 
 (although that's not explicitly stated anywhere). The documentation is not 
 very clear. But if I run the supervisors, then Mesos can't do the resource 
 allocation as it's supposed to.
 
 (Y)
 
 On Aug 18, 2014, at 6:13 PM, Vinod Kone vinodk...@gmail.com wrote:
 
 Can you paste the slave/executor log related to the executor failure?
 
 @vinodkone
 
 On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com 
 wrote:
 
 Hi
 
 I have created a Docker based Mesos setup, including chronos, marathon, and 
 storm.
 Following advice I saw previously on this mailing list, I have run all 
 frameworks directly on the Mesos master (is this correct? is it guaranteed 
 to have only one master at any given time?)
 
 Chronos and marathon work perfectly, but storm doesn't. UI works, but it 
 seems like supervisors are not able to communicate with nimbus. I can 
 deploy topologies, but the executors fail.
 
 Here's the project on github:
 https://github.com/yaronr/docker-mesos
 
 I've spent over a week on this and I'm hitting a wall.
 
 
 Thanks!
 
 (Y)
 
 



Re: Mesos + storm on top of Docker

2014-08-18 Thread Brenden Matthews
Is the hostname set correctly on the machine running nimbus?  It looks like
that may not be correct.


On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum yaron.rosenb...@gmail.com
wrote:

 @vinodkone

 Finally found some relevant logs..
 Let's start with the slave:

 slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task
 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task
 '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework
 '20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537]
 Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor
 'wordcount-1-1408378726' of framework
 '20140818-161802-2214597036-5050-10-0002'
 slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task
 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task
 '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework
 '20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.70804413 launcher.cpp:117] Forked child
 with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
 slave_1 | I0818 16:18:51.71742711 mesos_containerizer.cpp:647]
 Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using
 command '/usr/local/libexec/mesos/mesos-fetcher'
 slave_1 | I0818 16:19:01.10964414 slave.cpp:2873] Current usage
 37.40%. Max allowed age: 3.681899907883981days
 slave_1 | I0818 16:19:09.76684512 slave.cpp:2355] Monitoring
 executor 'wordcount-1-1408378726' of framework
 '20140818-161802-2214597036-5050-10-0002' in container
 '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
 slave_1 | I0818 16:19:10.76505814 mesos_containerizer.cpp:1112]
 Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
 slave_1 | I0818 16:19:10.76538814 mesos_containerizer.cpp:996]
 Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'

 So the executor gets started, and then exists.
 Found the stderr of the framework/run
 I0818 16:23:53.42701650 fetcher.cpp:61] Extracted resource
 '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz'
 into
 '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
 --2014-08-18 16:23:54--  http://7df8d3d507a1:41765/conf/storm.yaml
 Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
 wget: unable to resolve host address '7df8d3d507a1'

 So the problem is with host resolution. It's trying to resolve
 7df8d3d507a1 and fails.
 Obviously this node is not in the /etc/hosts. Why would it be able to
 resolve it?

 (Y)

 On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum yaron.rosenb...@gmail.com
 wrote:

 Hi @vinodkone

 nimbus log:
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2
 2] not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2
 2] not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3
 3] not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3
 3] not alive

 for all the executors.
 On the mesos slave, there are no storm related logs.
 Which leads me to believe that there's no supervisor to be found,
 even-though there's obviously an executor that's assigned to the job.

 My understanding is that Mesos is responsible for spawning the supervisors
 (although that's not explicitly stated anywhere). The documentation is not
 very clear. But if I run the supervisors, then Mesos can't do the resource
 allocation as it's supposed to.

 (Y)

 On Aug 18, 2014, at 6:13 PM, Vinod Kone vinodk...@gmail.com wrote:

 Can you paste the slave/executor log related to the executor failure?

 @vinodkone

 On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com
 wrote:

 Hi

 I have created a Docker based Mesos setup, including chronos, marathon,
 and storm.
 Following advice I saw previously on this mailing list, I have run all
 frameworks directly on the Mesos master (is this correct? is it guaranteed
 to have only one master at any given time?)

 Chronos and marathon work perfectly, but storm doesn't. UI works, but it
 seems like supervisors are not able to communicate with nimbus. I can
 deploy topologies, but the executors fail.

 Here's the project on github:
 https://github.com/yaronr/docker-mesos

 I've spent over a week on this and I'm hitting a wall.


 Thanks!

 (Y)






Re: Mesos + storm on top of Docker

2014-08-18 Thread Michael Babineau
Including --hostname=host in your docker run command should help with the
resolution problem (so long as the hostname is resolvable).
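For example, something along these lines (container and host names are hypothetical; this assumes the chosen hostname is resolvable by the other containers, e.g. via container links or /etc/hosts entries):

```shell
# Hypothetical sketch: give the slave container a hostname the other
# containers can resolve, so executors can fetch
# http://hostname:port/conf/storm.yaml successfully.
docker run -d \
  --hostname=mesos-slave1.example.com \
  --name=mesos-slave1 \
  my-mesos-slave-image \
  mesos-slave --master=zk://zookeeper.example.com:2181/mesos
```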


On Mon, Aug 18, 2014 at 9:42 AM, Brenden Matthews 
brenden.matth...@airbedandbreakfast.com wrote:

 Is the hostname set correctly on the machine running nimbus?  It looks
 like that may not be correct.


 On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum 
 yaron.rosenb...@gmail.com wrote:

 @vinodkone

 Finally found some relevant logs..
 Let's start with the slave:

 slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task
 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task
 '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework
 '20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537]
 Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor
 'wordcount-1-1408378726' of framework
 '20140818-161802-2214597036-5050-10-0002'
 slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task
 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task
 '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework
 '20140818-161802-2214597036-5050-10-0002
 slave_1 | I0818 16:18:51.70804413 launcher.cpp:117] Forked child
 with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
 slave_1 | I0818 16:18:51.71742711 mesos_containerizer.cpp:647]
 Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using
 command '/usr/local/libexec/mesos/mesos-fetcher'
 slave_1 | I0818 16:19:01.10964414 slave.cpp:2873] Current usage
 37.40%. Max allowed age: 3.681899907883981days
 slave_1 | I0818 16:19:09.76684512 slave.cpp:2355] Monitoring
 executor 'wordcount-1-1408378726' of framework
 '20140818-161802-2214597036-5050-10-0002' in container
 '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
 slave_1 | I0818 16:19:10.76505814 mesos_containerizer.cpp:1112]
 Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
 slave_1 | I0818 16:19:10.76538814 mesos_containerizer.cpp:996]
 Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'

 So the executor gets started, and then exists.
 Found the stderr of the framework/run
 I0818 16:23:53.42701650 fetcher.cpp:61] Extracted resource
 '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz'
 into
 '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
 --2014-08-18 16:23:54--  http://7df8d3d507a1:41765/conf/storm.yaml
 Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not
 known.
 wget: unable to resolve host address '7df8d3d507a1'

 So the problem is with host resolution. It's trying to resolve
 7df8d3d507a1 and fails.
 Obviously this node is not in the /etc/hosts. Why would it be able to
 resolve it?

 (Y)

 On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum yaron.rosenb...@gmail.com
 wrote:

 Hi @vinodkone

 nimbus log:
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
 wordcount-1-1408376868:[2 2] not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
 wordcount-1-1408376868:[2 2] not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
 wordcount-1-1408376868:[3 3] not alive
 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
 wordcount-1-1408376868:[3 3] not alive

 for all the executors.
 On the mesos slave, there are no storm related logs.
 Which leads me to believe that there's no supervisor to be found,
 even-though there's obviously an executor that's assigned to the job.

 My understanding is that Mesos is responsible for spawning the
 supervisors (although that's not explicitly stated anywhere). The
 documentation is not very clear. But if I run the supervisors, then Mesos
 can't do the resource allocation as it's supposed to.

 (Y)

 On Aug 18, 2014, at 6:13 PM, Vinod Kone vinodk...@gmail.com wrote:

 Can you paste the slave/executor log related to the executor failure?

 @vinodkone

 On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com
 wrote:

 Hi

 I have created a Docker based Mesos setup, including chronos, marathon,
 and storm.
 Following advice I saw previously on this mailing list, I have run all
 frameworks directly on the Mesos master (is this correct? is it guaranteed
 to have only one master at any given time?)

 Chronos and marathon work perfectly, but storm doesn't. UI works, but it
 seems like supervisors are not able to communicate with nimbus. I can
 deploy topologies, but the executors fail.

 Here's the project on github:
 https://github.com/yaronr/docker

Re: mesos scheduling

2014-08-18 Thread Benjamin Mahler
Mesos also provides the ability to reserve resources, if you need
guarantees about the resources available to a particular framework.

For now, resources can be reserved at the per-slave level and they will
*only* be offered to frameworks registered in the role that has them reserved.
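As a sketch of how such a per-slave (static) reservation is expressed: resources are tagged with a role name in the slave's --resources flag (the role name and amounts below are hypothetical):

```shell
# Hypothetical sketch: reserve 4 CPUs and 8 GB on this slave for the
# "batch" role; the rest stays in the default role "*" for all frameworks.
mesos-slave --master=zk://zookeeper.example.com:2181/mesos \
  --resources="cpus(batch):4;mem(batch):8192;cpus(*):4;mem(*):8192"
```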


On Mon, Aug 18, 2014 at 2:13 AM, Adam Bordelon a...@mesosphere.io wrote:

 That's correct (for now). We're looking into features that would support
 preemption of running tasks, but currently a user/admin would have to
 manually kill long-running tasks to scale down an over-provisioned
 framework. Marathon also has a nice API (web or REST) for scaling down the
 number of instances of a long-running service.


 On Mon, Aug 18, 2014 at 1:43 AM, Jun Feng Liu liuj...@cn.ibm.com wrote:

 Thanks, Adam. Sounds like this is going to be pretty effective when all
 the frameworks run short tasks, since Mesos can then quickly rebalance the
 resource allocation among the frameworks based on DRF. But if one framework
 happens to run some long tasks and takes too many resources, Mesos has to
 wait until some resources are freed up before assigning them to other
 frameworks. Is that correct?

 Best Regards


 *Jun Feng Liu*
 IBM China Systems & Technology Laboratory in Beijing
 Phone: 86-10-82452683
 E-mail: liuj...@cn.ibm.com
 BLD 28, ZGC Software Park
 No.8 Rd. Dong Bei Wang West, Dist. Haidian, Beijing 100193
 China





 Adam Bordelon a...@mesosphere.io
 2014/08/18 16:26
 To: user@mesos.apache.org
 Cc: Jay Buffington jaybuffing...@gmail.com
 Subject: Re: mesos scheduling

 Mesos uses a fair-sharing algorithm [1] to ensure that each framework
 registered with Mesos receives its fair share of resources. If you want
 more control over the groupings and weights of different frameworks, check
 out the roles and weights parameters: mesos-master --roles=services,batch
 and --weights=services=2,batch=1, as described at
 http://mesosphere.io/docs/mesos/deep-dive/mesos-master/

 Mesos uses these algorithms and parameters to decide which framework gets
 the next offer, so it won't affect already running tasks if one framework
 is already hogging the cluster when you start a new framework. But if you
 start killing tasks from the over-provisioned framework, those resources
 will be offered to the new framework(s) until it reaches its fair share.

 [1] http://static.usenix.org/event/nsdi11/tech/full_papers/Ghodsi.pdf
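For intuition, here is a toy sketch (not Mesos allocator code; the cluster capacities and framework allocations are made-up numbers) of how DRF picks who gets the next offer — each framework's dominant share is the largest fraction it holds of any single resource, and the allocator favors the framework with the smallest dominant share:

```python
# Toy illustration of Dominant Resource Fairness (DRF) -- not actual
# Mesos code. Capacities and allocations below are invented.

TOTAL = {"cpus": 9.0, "mem": 18.0}  # hypothetical cluster capacity

def dominant_share(allocated, total=TOTAL):
    """A framework's dominant share: the max fraction it holds of any resource."""
    return max(allocated[r] / total[r] for r in total)

def next_offer(allocations):
    """DRF offers next to the framework with the smallest dominant share."""
    return min(allocations, key=lambda fw: dominant_share(allocations[fw]))

allocations = {
    "frameworkA": {"cpus": 3.0, "mem": 3.0},  # dominant share 3/9 ~ 0.33 (cpus)
    "frameworkB": {"cpus": 1.0, "mem": 4.0},  # dominant share 4/18 ~ 0.22 (mem)
}
print(next_offer(allocations))  # frameworkB: lower dominant share, offered next
```

This also shows why killing tasks from a hogging framework helps: shrinking its allocation lowers its dominant share, so it starts winning offers again only once it is back below the others.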


 On Sun, Aug 17, 2014 at 7:06 PM, Jun Feng Liu liuj...@cn.ibm.com wrote:
 Thanks, Jay. Does it mean that if one scheduler/framework needs a lot of
 resources and keeps asking Mesos for more, it will make it hard for the
 other frameworks/schedulers to get resources? Is there any way I can
 configure Mesos to set up a resource-consumption boundary for each
 framework?
 Best Regards






 Jay Buffington m...@jaybuff.com
 Sent by: jaybuffing...@gmail.com
 2014/08/18 02:44
 To: user@mesos.apache.org
 Subject: Re: mesos scheduling






 On Sun, Aug 17, 2014 at 6:13 AM, Jun Feng Liu liuj...@cn.ibm.com wrote:
 I am trying to better understand how the Mesos allocator works. In the
 offer resource model, will Mesos send the same offer to multiple
 frameworks? Or does it just send all resources to one framework, wait for
 that framework's response, and then try the next one?

 Mesos sends an offer to one scheduler (a scheduler is part of a
 framework) at a time.  That scheduler holds the offer until it uses it,
 gives it back, or Mesos rescinds it.

 This strategy was referred to as pessimistic by Google's Omega paper
 [1] and has drawbacks.  To address these points, a new type of offer, an
 optimistic offer, is being considered.  See
 https://issues.apache.org/jira/browse/MESOS-1607
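As a rough illustration of the pessimistic model described above (a toy model, not the real Mesos scheduler API; all names here are invented), an offer is held exclusively by one scheduler until it is accepted, declined, or rescinded:

```python
# Toy model of pessimistic (exclusive) resource offers -- not the real
# Mesos API; class and method names are invented for illustration.

class Offer:
    def __init__(self, offer_id, resources):
        self.id = offer_id
        self.resources = resources
        self.holder = None  # at most one scheduler holds an offer at a time

class Master:
    def __init__(self):
        self.outstanding = {}

    def send_offer(self, offer, scheduler):
        # Pessimistic: an offer cannot be held by two schedulers at once.
        assert offer.holder is None, "offer already held by another scheduler"
        offer.holder = scheduler
        self.outstanding[offer.id] = offer

    def decline(self, offer_id):
        # The scheduler gives the offer back; it can be re-offered elsewhere.
        offer = self.outstanding.pop(offer_id)
        offer.holder = None
        return offer

    def rescind(self, offer_id):
        # The master takes the offer back, e.g. after an offer timeout.
        return self.decline(offer_id)

master = Master()
o = Offer("o1", {"cpus": 2, "mem": 1024})
master.send_offer(o, "scheduler-1")
master.decline("o1")                  # scheduler-1 gives it back...
master.send_offer(o, "scheduler-2")   # ...so it can now go to scheduler-2
```

The drawback the Omega paper points out falls out of the assert: while one scheduler sits on an offer, no other scheduler can even consider those resources.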

 Jay

 [1]
 http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf





Re: [VOTE] Release Apache Mesos 0.20.0 (rc1)

2014-08-18 Thread Jie Yu
OK, I can confirm it is a bug due to the new Docker stuff. Partially my bad
for not testing it on Mac.

I need to have rc2 with the bug fix. I'll submit a bug fix shortly.

- Jie


On Mon, Aug 18, 2014 at 1:53 PM, Vinod Kone vinodk...@gmail.com wrote:

 make check succeeded on CentOS 5.5, but the Python framework test failed on
 OSX Mavericks.

 environment details:

 ➜  mesos-0.20.0  gcc --version

 Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
 --with-gxx-include-dir=/usr/include/c++/4.2.1

 Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)

 Target: x86_64-apple-darwin13.3.0

 Thread model: posix

 ➜  mesos-0.20.0  g++ --version

 Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
 --with-gxx-include-dir=/usr/include/c++/4.2.1

 Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)

 Target: x86_64-apple-darwin13.3.0

 Thread model: posix

 ➜  mesos-0.20.0  python --version

 Python 2.7.7




 [ RUN  ] ExamplesTest.PythonFramework

 Using temporary directory '/tmp/ExamplesTest_PythonFramework_nX85Jw'

 Traceback (most recent call last):

   File /tmp/mesos-0.20.0/src/examples/python/test_framework.py, line 25,
 in module

 import mesos.native

   File build/bdist.macosx-10.9-x86_64/egg/mesos/native/__init__.py, line
 17, in module

   File build/bdist.macosx-10.9-x86_64/egg/mesos/native/_mesos.py, line
 7, in module

   File build/bdist.macosx-10.9-x86_64/egg/mesos/native/_mesos.py, line
 6, in __bootstrap__

 ImportError:
 dlopen(/Users/vinod/.python-eggs/mesos.native-0.20.0-py2.7-macosx-10.9-x86_64.egg-tmp/mesos/native/_mesos.so,
 2): Symbol not found:
 __ZN7cgroups9hierarchyERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIc

   Referenced from:
 /Users/vinod/.python-eggs/mesos.native-0.20.0-py2.7-macosx-10.9-x86_64.egg-tmp/mesos/native/_mesos.so

   Expected in: flat namespace

  in
 /Users/vinod/.python-eggs/mesos.native-0.20.0-py2.7-macosx-10.9-x86_64.egg-tmp/mesos/native/_mesos.so

 tests/script.cpp:83: Failure

 Failed

 python_framework_test.sh exited with status 1

 [  FAILED  ] ExamplesTest.PythonFramework (674 ms)








 On Sun, Aug 17, 2014 at 12:05 AM, Jie Yu yujie@gmail.com wrote:

 Hi all,

 Please vote on releasing the following candidate as Apache Mesos 0.20.0.


 0.20.0 includes the following:

 
 This release includes a lot of new cool features. The major new features
 are
 listed below:

  * Docker support in Mesos.
    * Users can now launch executors/tasks within Docker containers.
    * Mesos now supports running multiple containerizers simultaneously.
      The slave can dynamically choose a containerizer to launch containers
      based on the configuration of executors/tasks.

  * Container-level network monitoring for the Mesos containerizer.
    * Network statistics for each active container can be retrieved through
      the /monitor/statistics.json endpoint on the slave.
    * Completely transparent to the tasks running on the slave. No need to
      change the service discovery mechanism for tasks.

  * Framework authorization.
    * Allows frameworks to (re-)register with authorized roles.
    * Allows frameworks to launch tasks/executors as authorized users.
    * Allows authorized principals to shut down framework(s) through an HTTP
      endpoint.

  * Framework rate limiting.
    * In a multi-framework environment, this feature aims to protect the
      throughput of high-SLA (e.g., production, service) frameworks by
      having the master throttle messages from other (e.g., development,
      batch) frameworks.

  * Enable building against installed third-party dependencies.

  This release also includes several bug fixes and stability improvements.

 
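As a rough sketch of exercising the headline features (hostnames, ports, and values below are made up; flag and endpoint names as documented for this release):

```
# Let a slave run both containerizers; it picks one per task/executor.
mesos-slave --master=master:5050 --containerizers=docker,mesos

# Per-container network statistics from the slave's monitoring endpoint.
curl http://slave:5051/monitor/statistics.json

# Throttle messages from a low-SLA framework principal at the master.
mesos-master --rate_limits='{"limits":[{"principal":"batch","qps":10}]}'
```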

 The candidate for Mesos 0.20.0 release is available at:

 https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc1/mesos-0.20.0.tar.gz

 The tag to be voted on is 0.20.0-rc1:
 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.20.0-rc1

 The MD5 checksum of the tarball can be found at:

 https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc1/mesos-0.20.0.tar.gz.md5

 The signature of the tarball can be found at:

 https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc1/mesos-0.20.0.tar.gz.asc

 The PGP key used to sign the release is here:
 https://dist.apache.org/repos/dist/release/mesos/KEYS

 The JAR is up in Maven in a staging repository here:
 https://repository.apache.org/content/repositories/orgapachemesos-1028

 Please vote on releasing this package as Apache Mesos 0.20.0!

 The vote is open until Wed Aug 20 00:03:55 PDT 2014 and passes if a
 majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Mesos 0.20.0
 [ ] -1 Do not release this package because ...

 Thanks,
 - Jie





Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-18 Thread Vinod Kone
On Sat, Aug 16, 2014 at 4:26 AM, John Omernik j...@omernik.com wrote:

 I've confirmed on the package I am using that when I untar it using 'tar
 zxf' as root, the task-controller does NOT lose the setuid bit.  But on
 the lost tasks in Mesos I get the error below.  What's interesting is that
 if I drill down to the directory, the owner is root:root, but just the
 setuid bit is missing.
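For reference — a sketch, with the path shortened and the group name (mapr) an assumption for a MapR build — the Hadoop task-controller normally needs to be root-owned with the setuid/setgid bits set (mode 6050); you can check and restore it in the executor's run directory:

```
# Inspect the binary after the fetcher extracts it (path shortened):
ls -l .../bin/task-controller
# expected something like: ---Sr-s--- 1 root mapr ... task-controller

# If the setuid/setgid bits were stripped during extraction, restore them:
chown root:mapr .../bin/task-controller
chmod 6050 .../bin/task-controller
```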


What user is the slave running as? root?