Hi Tim

Do you have any materials/blog posts on running Spark in a container in a
Mesos cluster environment? I have googled it but couldn't find any info on
it. The Spark documentation says it is possible, but no details are
provided. Please help


Thanks

Sathish



On Mon, Sep 21, 2015 at 11:54 AM Tim Chen <t...@mesosphere.io> wrote:

> Hi John,
>
> There is no other blog post yet. I'm thinking of doing a series of posts,
> but so far I haven't had time to get to it.
>
> Running Spark in Docker containers makes distributing Spark versions easy:
> it's simple to upgrade, and the image is automatically cached on the slaves,
> so the same image runs right away on later launches. Most of the Docker
> performance overhead is usually in the network and the filesystem, but with
> the recent changes in Spark that make the Mesos sandbox the default temp
> dir, the filesystem shouldn't be a big concern, since Spark is mostly
> writing to the mounted-in Mesos sandbox. Also, Mesos uses host networking by
> default, so the network isn't affected much.
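>
> For example, here is a rough sketch of pointing the Mesos executors at a
> Docker image (the master URL, image name, and paths below are just
> placeholders, and the image is assumed to already have Spark installed):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val conf = new SparkConf()
>       .setAppName("DockerizedSparkJob")
>       // Placeholder Mesos master URL; point this at your ZooKeeper/Mesos endpoint.
>       .setMaster("mesos://zk://zk1:2181/mesos")
>       // Placeholder image name; the image should contain a Spark install
>       // at the executor home configured below.
>       .set("spark.mesos.executor.docker.image", "mycompany/spark:1.5.0")
>       .set("spark.mesos.executor.home", "/opt/spark")
>
>     val sc = new SparkContext(conf)
>
> The same properties can also go in spark-defaults.conf or be passed with
> --conf to spark-submit.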
>
> The main cluster mode limitation is that you need to make the Spark job
> files available somewhere all the slaves can reach remotely (http, s3,
> hdfs, etc.) or have them present locally on every slave at the same path.
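>
> For instance (the HDFS path below is just a placeholder), the job jar can
> be referenced by a remote URL when building the conf, or passed the same
> way as the application jar to spark-submit:
>
>     import org.apache.spark.SparkConf
>
>     // Placeholder HDFS location; any URL the slaves can fetch (http://, s3n://,
>     // hdfs://) or a path that exists on every slave will work.
>     val conf = new SparkConf()
>       .setAppName("ClusterModeJob")
>       .setJars(Seq("hdfs://namenode:8020/jobs/my-spark-app.jar"))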
>
> I'll try to put more effort into the docs once I get my existing patches
> and testing infra work done.
>
> Let me know if you have more questions,
>
> Tim
>
> On Sat, Sep 19, 2015 at 5:42 AM, John Omernik <j...@omernik.com> wrote:
>
>> I was searching the 1.5.0 docs on the Docker-on-Mesos capabilities and
>> just found you CAN run it this way.  Are there any user posts, blog posts,
>> etc. on why and how you'd do this?
>>
>> Basically, at first I was questioning why you'd run Spark in a Docker
>> container, i.e., if you run with a tarballed executor, what are you really
>> gaining?  And in this setup, are you losing out on performance somehow? (I
>> am guessing smarter people than I have figured that out.)
>>
>> Then I came across a situation where I wanted to use a Python library with
>> Spark, and it had to be installed on every node, and I realized one big
>> advantage of dockerized Spark would be that Spark apps that needed other
>> libraries could be contained and built cleanly.
>>
>> OK, that's huge, let's do that.  Which brings me to my next point: there
>> are a lot of questions I have about how this actually works.  Does cluster
>> mode/client mode apply here? If so, how?  Is there a good walkthrough on
>> getting this set up? Limitations? Gotchas?  Should I just dive in and start
>> working with it? Has anyone written up any stories/rough documentation?
>> This seems like a really helpful feature for scaling out Spark and letting
>> developers truly build what they need without tons of admin overhead, so I
>> really want to explore it.
>>
>> Thanks!
>>
>> John
>>
>
>
