Hi Tim,

Do you have any materials or blog posts on running Spark in a container in a Mesos cluster environment? I have googled it but couldn't find any info. The Spark documentation says it is possible, but no details are provided. Please help.
Thanks,
Sathish

On Mon, Sep 21, 2015 at 11:54 AM Tim Chen <t...@mesosphere.io> wrote:

> Hi John,
>
> There is no other blog post yet; I'm thinking of doing a series of posts,
> but so far I haven't had time to do that.
>
> Running Spark in Docker containers makes distributing Spark versions easy:
> it's simple to upgrade, and the image is automatically cached on the
> slaves, so the same image runs right away. Most of the Docker performance
> overhead is usually related to the network and the filesystem, but with
> the recent changes in Spark that make the Mesos sandbox the default temp
> dir, the filesystem won't be a big concern, as Spark is mostly writing to
> the mounted-in Mesos sandbox. Also, Mesos uses host networking by default,
> so the network isn't affected much.
>
> The main cluster-mode limitation is that you need to make the Spark job
> files available somewhere that all the slaves can access remotely (http,
> s3, hdfs, etc.) or available on all slaves locally by path.
>
> I'll try to put more effort into docs once my existing patches and
> testing-infra work are done.
>
> Let me know if you have more questions,
>
> Tim
>
> On Sat, Sep 19, 2015 at 5:42 AM, John Omernik <j...@omernik.com> wrote:
>
>> I was searching the 1.5.0 docs on the Docker-on-Mesos capabilities and
>> just found you CAN run it this way. Are there any user posts, blog posts,
>> etc. on why and how you'd do this?
>>
>> Basically, at first I was questioning why you'd run Spark in a Docker
>> container, i.e., if you run with a tarballed executor, what are you
>> really gaining? And in this setup, are you losing out on performance
>> somehow? (I am guessing smarter people than I have figured that out.)
>>
>> Then I came across a situation where I wanted to use a Python library
>> with Spark, and it had to be installed on every node, and I realized one
>> big advantage of dockerized Spark would be that Spark apps that needed
>> other libraries could be contained and built well.
>>
>> OK, that's huge, let's do that. Which leads to my next point: I have a
>> lot of questions about how this actually works. Does cluster mode/client
>> mode apply here? If so, how? Is there a good walkthrough on getting this
>> set up? Limitations? Gotchas? Should I just dive in and start working
>> with it? Has anyone written up any stories/rough documentation? This
>> seems like a really helpful feature for scaling out Spark and letting
>> developers truly build what they need without tons of admin overhead, so
>> I really want to explore it.
>>
>> Thanks!
>>
>> John
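
For anyone finding this thread later, here is a minimal sketch of what Tim
describes, assuming Spark 1.5, a ZooKeeper-backed Mesos master at
zk://zk1:2181/mesos (hypothetical host), an application jar already uploaded
to HDFS, and a Docker image name of your own. The
spark.mesos.executor.docker.image property is the documented setting that
tells the Mesos scheduler to launch executors inside a Docker image:

    # Client mode: the driver runs locally, executors run inside the named
    # Docker image on the Mesos slaves. The jar must be reachable by every
    # slave, hence the hdfs:// URL (http:// or s3:// would also work).
    spark-submit \
      --master mesos://zk://zk1:2181/mesos \
      --conf spark.mesos.executor.docker.image=myrepo/spark:1.5.0 \
      --class com.example.MyApp \
      hdfs://namenode:8020/jobs/my-app.jar

For cluster mode you would point --master at a running MesosClusterDispatcher
instead (e.g. mesos://dispatcher-host:7077) and add --deploy-mode cluster;
the same "job files must be remotely reachable" constraint Tim mentions
applies there too.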
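
And a sketch of the Python-library use case John raises, assuming a
hypothetical base image myrepo/spark:1.5.0 that already contains Spark, its
Mesos integration, and pip. The point is that the dependency is baked into
the image once instead of being installed on every node:

    # Dockerfile: extend a Spark image with an extra Python dependency.
    # numpy is just an example library; the base image is an assumption.
    FROM myrepo/spark:1.5.0
    RUN pip install numpy

    # Build and push so every Mesos slave can pull the image:
    #   docker build -t myrepo/spark-numpy:1.5.0 .
    #   docker push myrepo/spark-numpy:1.5.0

Then submit with --conf spark.mesos.executor.docker.image=myrepo/spark-numpy:1.5.0
and the library is available to every executor with no per-node install.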