Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Tim Chen
Hi Eran,

I need to investigate, but that may well be true: we're using SPARK_JAVA_OPTS
to pass all the options and not --conf.

I'll take a look at the bug; in the meantime, could you try the workaround and
see if that fixes your problem?

Tim

On Thu, Mar 10, 2016 at 10:08 AM, Eran Chinthaka Withana <
eran.chinth...@gmail.com> wrote:

> Hi Timothy
>
> What version of spark are you guys running?
>>
>
> I'm using Spark 1.6.0. You can see the Dockerfile I used here:
> https://github.com/echinthaka/spark-mesos-docker/blob/master/docker/mesos-spark/Dockerfile
>
>
>
>> And also did you set the working dir in your image to be spark home?
>>
>
> Yes I did. You can see it here: https://goo.gl/8PxtV8
>
> Can it be because of this:
> https://issues.apache.org/jira/browse/SPARK-13258, as Guillaume pointed
> out above? As you can see, I'm passing in the docker image URI through
> spark-submit (--conf
> spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6)
>
> Thanks,
> Eran
>
>
>


Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Tim Chen
Here is an example Dockerfile; although it's a bit dated now, if you build
it today it should still work:

https://github.com/tnachen/spark/tree/dockerfile/mesos_docker

Tim

On Thu, Mar 10, 2016 at 8:06 AM, Ashish Soni  wrote:

> Hi Tim ,
>
> Can you please share your dockerfiles and configuration, as it will help a
> lot; I am planning to publish a blog post on the same.
>
> Ashish
>
> On Thu, Mar 10, 2016 at 10:34 AM, Timothy Chen  wrote:
>
>> No, you don't need to install Spark on each slave; we have been running
>> this setup at Mesosphere without any problem so far. I think it's most
>> likely a configuration problem, though there's a chance something is
>> missing in the code to handle some cases.
>>
>> What version of spark are you guys running? And also did you set the
>> working dir in your image to be spark home?
>>
>> Tim
>>
>>
>> On Mar 10, 2016, at 3:11 AM, Ashish Soni  wrote:
>>
>> You need to install Spark on each Mesos slave and then, while starting the
>> container, set the workdir to your Spark home so that it can find the
>> spark-class script.
>>
>> Ashish
>>
>> On Mar 10, 2016, at 5:22 AM, Guillaume Eynard Bontemps <
>> g.eynard.bonte...@gmail.com> wrote:
>>
>> For an answer to my question see this:
>> http://stackoverflow.com/a/35660466?noredirect=1.
>>
>> But for your problem, did you define the spark.mesos.executor.home (or a
>> similarly named) property?
>>
>> Le jeu. 10 mars 2016 04:26, Eran Chinthaka Withana <
>> eran.chinth...@gmail.com> a écrit :
>>
>>> Hi
>>>
>>> I'm also having this issue and can not get the tasks to work inside
>>> mesos.
>>>
>>> In my case, the spark-submit command is the following.
>>>
>>> $SPARK_HOME/bin/spark-submit \
>>>   --class com.mycompany.SparkStarter \
>>>   --master mesos://mesos-dispatcher:7077 \
>>>   --name SparkStarterJob \
>>>   --driver-memory 1G \
>>>   --executor-memory 4G \
>>>   --deploy-mode cluster \
>>>   --total-executor-cores 1 \
>>>   --conf spark.mesos.executor.docker.image=echinthaka/mesos-spark:0.23.1-1.6.0-2.6 \
>>>   http://abc.com/spark-starter.jar
>>>
>>>
>>> And the error I'm getting is the following.
>>>
>>> I0310 03:13:11.417009 131594 exec.cpp:132] Version: 0.23.1
>>> I0310 03:13:11.419452 131601 exec.cpp:206] Executor registered on slave 
>>> 20160223-000314-3439362570-5050-631-S0
>>> sh: 1: /usr/spark-1.6.0-bin-hadoop2.6/bin/spark-class: not found
>>>
>>>
>>> (Looked into Spark JIRA and I found that
>>> https://issues.apache.org/jira/browse/SPARK-11759 is marked as closed
>>> since https://issues.apache.org/jira/browse/SPARK-12345 is marked as
>>> resolved)
>>>
>>> Really appreciate if I can get some help here.
>>>
>>> Thanks,
>>> Eran Chinthaka Withana
>>>
>>> On Wed, Feb 17, 2016 at 2:00 PM, g.eynard.bonte...@gmail.com <
>>> g.eynard.bonte...@gmail.com> wrote:
>>>
 Hi everybody,

 I am testing the use of Docker for executing Spark algorithms on MESOS. I
 managed to execute Spark in client mode with executors inside Docker, but I
 wanted to go further and also have my Driver running in a Docker container.
 Here I ran into a behavior that I'm not sure is normal; let me try to
 explain.

 I submit my spark application through MesosClusterDispatcher using a
 command like:
 $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
 mesos://spark-master-1:7077 --deploy-mode cluster --conf
 spark.mesos.executor.docker.image=myuser/myimage:0.0.2
 https://storage.googleapis.com/some-bucket/spark-examples-1.5.2-hadoop2.6.0.jar
 10

 My driver is running fine, inside its docker container, but the executors
 fail:
 "sh: /some/spark/home/bin/spark-class: No such file or directory"

 Looking at the MESOS slaves' logs, I think that the executors do not run
 inside docker: "docker.cpp:775] No container info found, skipping launch".
 As my Mesos slaves do not have spark installed, it fails.

 *It seems that the spark conf that I gave in the first spark-submit is not
 transmitted to the Driver-submitted conf* when launched in the docker
 container. The only workaround I found is to modify my Docker image in order
 to define, inside its spark conf, the spark.mesos.executor.docker.image
 property. This way, my executors get the conf well and are launched inside
 docker on Mesos. This seems a little complicated to me, and I feel the
 configuration passed to the early spark-submit should be transmitted to the
 Driver submit...
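 (A minimal sketch of that workaround, assuming the Docker image keeps Spark
 under /opt/spark -- the image tag and path are placeholders: bake the
 property into the image's conf/spark-defaults.conf so the driver container
 picks it up.)

 # Hypothetical step while building the driver/executor image; adjust the
 # Spark home and image tag to your own setup.
 echo "spark.mesos.executor.docker.image  myuser/myimage:0.0.2" \
   >> /opt/spark/conf/spark-defaults.conf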




Re: Spark 1.5 on Mesos

2016-03-03 Thread Tim Chen
Ah I see. I think it's because you've launched the Mesos slave in a docker
container, and when you also launch the executor in a container it's not
able to mount the sandbox into the other container, since the slave is in a
chroot.

Can you try mounting in a volume from the host for the slave's workdir when
you launch the slave? For example:

docker run -v /tmp/mesos/slave:/tmp/mesos/slave mesos_image mesos-slave \
  --work_dir=/tmp/mesos/slave

Tim

On Thu, Mar 3, 2016 at 4:42 AM, Ashish Soni <asoni.le...@gmail.com> wrote:

> Hi Tim ,
>
>
> I think I know the problem but I do not have a solution. *The Mesos
> slave is supposed to download the jar from the specified URI and place it
> in the $MESOS_SANDBOX location, but it is not downloading it; not sure
> why* .. see the logs below.
>
> My command looks like below
>
> docker run -it --rm -m 2g -e SPARK_MASTER="mesos://10.0.2.15:7077"  -e
> SPARK_IMAGE="spark_driver:latest" spark_driver:latest ./bin/spark-submit
> --deploy-mode cluster --class org.apache.spark.examples.SparkPi
> http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar
>
> [root@Mindstorm spark-1.6.0]# docker logs d22d8e897b79
> *Warning: Local jar
> /mnt/mesos/sandbox/spark-examples-1.6.0-hadoop2.6.0.jar does not exist,
> skipping.*
> java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> When I do docker inspect I see the command below gets issued:
>
> "Cmd": [
> "-c",
> "./bin/spark-submit --name org.apache.spark.examples.SparkPi
> --master mesos://10.0.2.15:5050 --driver-cores 1.0 --driver-memory 1024M
> --class org.apache.spark.examples.SparkPi 
> $*MESOS_SANDBOX*/spark-examples-1.6.0-hadoop2.6.0.jar
> "
>
>
>
> On Thu, Mar 3, 2016 at 12:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>
>> You shouldn't need to specify --jars at all since you only have one jar.
>>
>> The error is pretty odd as it suggests it's trying to load
>> /opt/spark/Example but that doesn't really seem to be anywhere in your
>> image or command.
>>
>> Can you paste your stdout from the driver task launched by the cluster
>> dispatcher, that shows you the spark-submit command it eventually ran?
>>
>>
>> Tim
>>
>>
>>
>> On Wed, Mar 2, 2016 at 5:42 PM, Ashish Soni <asoni.le...@gmail.com>
>> wrote:
>>
>>> See below, and attached is the Dockerfile to build the Spark image
>>> (btw, I just upgraded to 1.6).
>>>
>>> I am running below setup -
>>>
>>> Mesos Master - Docker Container
>>> Mesos Slave 1 - Docker Container
>>> Mesos Slave 2 - Docker Container
>>> Marathon - Docker Container
>>> Spark MESOS Dispatcher - Docker Container
>>>
>>> When I submit the Spark Pi example job using the command below:
>>>
>>> docker run -it --rm -m 2g -e SPARK_MASTER="mesos://10.0.2.15:7077" \
>>>   -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest \
>>>   ./bin/spark-submit --deploy-mode cluster --name "PI Example" \
>>>   --class org.apache.spark.examples.SparkPi \
>>>   http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar \
>>>   --jars /opt/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar --verbose
>>>
>>> Below is the ERROR
>>> Error: Cannot load main class from JAR file:/opt/spark/Example
>>> Run with --help for usage help or --verbose for debug output
>>>
>>>
>>> When i docker Inspect for the stopped / dead container i see below
>>> output what is interesting to see is some one or executor replaced by
>>> original 

Re: Spark 1.5 on Mesos

2016-03-02 Thread Tim Chen
You shouldn't need to specify --jars at all since you only have one jar.

The error is pretty odd as it suggests it's trying to load
/opt/spark/Example but that doesn't really seem to be anywhere in your
image or command.

Can you paste your stdout from the driver task launched by the cluster
dispatcher, that shows you the spark-submit command it eventually ran?


Tim



On Wed, Mar 2, 2016 at 5:42 PM, Ashish Soni <asoni.le...@gmail.com> wrote:

> See below, and attached is the Dockerfile to build the Spark image (btw,
> I just upgraded to 1.6).
>
> I am running below setup -
>
> Mesos Master - Docker Container
> Mesos Slave 1 - Docker Container
> Mesos Slave 2 - Docker Container
> Marathon - Docker Container
> Spark MESOS Dispatcher - Docker Container
>
> When I submit the Spark Pi example job using the command below:
>
> docker run -it --rm -m 2g -e SPARK_MASTER="mesos://10.0.2.15:7077" \
>   -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest \
>   ./bin/spark-submit --deploy-mode cluster --name "PI Example" \
>   --class org.apache.spark.examples.SparkPi \
>   http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar \
>   --jars /opt/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar --verbose
>
> Below is the ERROR
> Error: Cannot load main class from JAR file:/opt/spark/Example
> Run with --help for usage help or --verbose for debug output
>
>
> When I docker inspect the stopped / dead container I see the output below.
> What is interesting is that someone, or the executor, replaced the original
> command with the one highlighted below, and I do not see the executor
> downloading the JAR -- is this a bug I am hitting, or is it supposed to
> work this way and I am missing some configuration?
>
> "Env": [
> "SPARK_IMAGE=spark_driver:latest",
> "SPARK_SCALA_VERSION=2.10",
> "SPARK_VERSION=1.6.0",
> "SPARK_EXECUTOR_URI=
> http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz;,
> "MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-0.25.0.so",
> "SPARK_MASTER=mesos://10.0.2.15:7077",
>
> "SPARK_EXECUTOR_OPTS=-Dspark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/
> libmesos-0.25.0.so -Dspark.jars=
> http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar
> -Dspark.mesos.mesosExecutor.cores=0.1 -Dspark.driver.supervise=false -
> Dspark.app.name=PI Example -Dspark.mesos.uris=
> http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar
> -Dspark.mesos.executor.docker.image=spark_driver:latest
> -Dspark.submit.deployMode=cluster -Dspark.master=mesos://10.0.2.15:7077
> -Dspark.driver.extraClassPath=/opt/spark/custom/lib/*
> -Dspark.executor.extraClassPath=/opt/spark/custom/lib/*
> -Dspark.executor.uri=
> http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
> -Dspark.mesos.executor.home=/opt/spark",
> "MESOS_SANDBOX=/mnt/mesos/sandbox",
>
> "MESOS_CONTAINER_NAME=mesos-e47f8d4c-5ee1-4d01-ad07-0d9a03ced62d-S1.43c08f82-e508-4d57-8c0b-fa05bee77fd6",
>
> "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
> "HADOOP_VERSION=2.6",
> "SPARK_HOME=/opt/spark"
> ],
> "Cmd": [
> "-c",
>* "./bin/spark-submit --name PI Example --master
> mesos://10.0.2.15:5050 <http://10.0.2.15:5050> --driver-cores 1.0
> --driver-memory 1024M --class org.apache.spark.examples.SparkPi
> $MESOS_SANDBOX/spark-examples-1.6.0-hadoop2.6.0.jar --jars
> /opt/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar --verbose"*
> ],
> "Image": "spark_driver:latest",
>
>
>
>
>
>
>
>
>
>
>
>
> On Wed, Mar 2, 2016 at 5:49 PM, Charles Allen <
> charles.al...@metamarkets.com> wrote:
>
>> @Tim yes, this is asking about 1.5 though
>>
>> On Wed, Mar 2, 2016 at 2:35 PM Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Hi Charles,
>>>
>>> I thought that's fixed with your patch in latest master now right?
>>>
>>> Ashish, yes please give me your docker image name (if it's in the public
>>> registry) and what you've tried and I can see what's wrong. I think it's
>>> most likely just the configuration of where the Spark home folder is in the
>>> image.
>>>
>>> Tim
>>>
>>> On Wed, Mar 2, 2016 at 2:28 PM, Charles Allen <
>>> charles.al...@metamarkets.com> wrote:
>>>
>

Re: Spark 1.5 on Mesos

2016-03-02 Thread Tim Chen
Hi Charles,

I thought that was fixed with your patch in the latest master now, right?

Ashish, yes, please give me your docker image name (if it's in the public
registry) and what you've tried, and I can see what's wrong. I think it's
most likely just the configuration of where the Spark home folder is in the
image.
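(For what it's worth, a minimal sketch of passing that configuration
explicitly at submit time, reusing the host, image, and jar values from the
command quoted below; spark.mesos.executor.home tells the executor where
Spark is installed inside the image:)

docker run -it --rm -m 2g -e SPARK_MASTER="mesos://10.0.2.15:7077" \
  -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest \
  ./bin/spark-submit --deploy-mode cluster \
  --conf spark.mesos.executor.docker.image=spark_driver:latest \
  --conf spark.mesos.executor.home=/opt/spark \
  --class org.apache.spark.examples.SparkPi \
  http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar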

Tim

On Wed, Mar 2, 2016 at 2:28 PM, Charles Allen <charles.al...@metamarkets.com
> wrote:

> Re: Spark on Mesos Warning regarding disk space:
> https://issues.apache.org/jira/browse/SPARK-12330
>
> That's a spark flaw I encountered on a very regular basis on mesos. That
> and a few other annoyances are fixed in
> https://github.com/metamx/spark/tree/v1.5.2-mmx
>
> Here's another mild annoyance I've encountered:
> https://issues.apache.org/jira/browse/SPARK-11714
>
> On Wed, Mar 2, 2016 at 1:31 PM Ashish Soni <asoni.le...@gmail.com> wrote:
>
>> I have had no luck, and I would like to ask the Spark committers: will
>> this ever be designed to run on Mesos?
>>
>> A Spark app as a docker container is not working at all on Mesos; if
>> anyone would like the code, I can send it over to have a look.
>>
>> Ashish
>>
>> On Wed, Mar 2, 2016 at 12:23 PM, Sathish Kumaran Vairavelu <
>> vsathishkuma...@gmail.com> wrote:
>>
>>> Try passing jar using --jars option
>>>
>>> On Wed, Mar 2, 2016 at 10:17 AM Ashish Soni <asoni.le...@gmail.com>
>>> wrote:
>>>
>>>> I made some progress but now I am stuck at this point. Please help, as
>>>> it looks like I am close to getting it working.
>>>>
>>>> I have everything running in docker containers, including the mesos
>>>> slave and master.
>>>>
>>>> When I try to submit the Pi example I get the error below:
>>>> *Error: Cannot load main class from JAR file:/opt/spark/Example*
>>>>
>>>> Below is the command I use to submit as a docker container:
>>>>
>>>> docker run -it --rm -e SPARK_MASTER="mesos://10.0.2.15:7077"  -e
>>>> SPARK_IMAGE="spark_driver:latest" spark_driver:latest ./bin/spark-submit
>>>> --deploy-mode cluster --name "PI Example" --class
>>>> org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory
>>>> 512m --executor-cores 1
>>>> http://10.0.2.15/spark-examples-1.6.0-hadoop2.6.0.jar
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 2:59 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Can you go through the Mesos UI and look at the driver/executor log
>>>>> from the stderr file and see what the problem is?
>>>>>
>>>>> Tim
>>>>>
>>>>> On Mar 1, 2016, at 8:05 AM, Ashish Soni <asoni.le...@gmail.com> wrote:
>>>>>
>>>>> Not sure what the issue is, but I am getting the error below when I try
>>>>> to run the Spark Pi example:
>>>>>
>>>>> Blacklisting Mesos slave value: "5345asdasdasdkas234234asdasdasdasd"
>>>>>due to too many failures; is Spark installed on it?
>>>>> WARN TaskSchedulerImpl: Initial job has not accepted any resources; 
>>>>> check your cluster UI to ensure that workers are registered and have 
>>>>> sufficient resources
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 1:39 PM, Sathish Kumaran Vairavelu <
>>>>> vsathishkuma...@gmail.com> wrote:
>>>>>
>>>>>> Maybe the Mesos executor couldn't find the Spark image or the
>>>>>> constraints are not satisfied. Check your Mesos UI to see if the Spark
>>>>>> application appears in the Frameworks tab.
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 12:23 PM Ashish Soni <asoni.le...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What is the best practice? I have everything running as docker
>>>>>>> containers on a single host (mesos and marathon also as docker
>>>>>>> containers), and everything comes up fine, but when I try to launch the
>>>>>>> spark shell I get the error below.
>>>>>>>
>>>>>>>
>>>>>>> SQL context available as sqlContext.
>>>>>>>
>>>>>>> scala> val data = sc.parallelize(1 to 100)
>>>>>>> data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at
>>>>>>> parallelize at :27
>>>>>>>
>>>>>

Re: Spark 1.5 on Mesos

2016-02-29 Thread Tim Chen
No you don't have to run Mesos in docker containers to run Spark in docker
containers.

Once you have a Mesos cluster running, you can then specify the Spark
configuration in your Spark job (i.e.
spark.mesos.executor.docker.image=mesosphere/spark:1.6)
and Mesos will automatically launch docker containers for you.
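For example, a minimal client-mode submission along those lines (the
ZooKeeper-based master URL and the example jar path are placeholders):

./bin/spark-submit \
  --master mesos://zk://zk1:2181/mesos \
  --conf spark.mesos.executor.docker.image=mesosphere/spark:1.6 \
  --class org.apache.spark.examples.SparkPi \
  /path/to/spark-examples.jar 100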

Tim

On Mon, Feb 29, 2016 at 7:36 AM, Ashish Soni <asoni.le...@gmail.com> wrote:

> Yes, I read that, and there are not many details there.
>
> Is it true that we need to have Spark installed on each Mesos docker
> container (master and slave)?
>
> Ashish
>
> On Fri, Feb 26, 2016 at 2:14 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> https://spark.apache.org/docs/latest/running-on-mesos.html should be the
>> best source, what problems were you running into?
>>
>> Tim
>>
>> On Fri, Feb 26, 2016 at 11:06 AM, Yin Yang <yy201...@gmail.com> wrote:
>>
>>> Have you read this ?
>>> https://spark.apache.org/docs/latest/running-on-mesos.html
>>>
>>> On Fri, Feb 26, 2016 at 11:03 AM, Ashish Soni <asoni.le...@gmail.com>
>>> wrote:
>>>
>>>> Hi All ,
>>>>
>>>> Is there any proper documentation on how to run Spark on Mesos? I have
>>>> been trying for the last few days and am not able to make it work.
>>>>
>>>> Please help
>>>>
>>>> Ashish
>>>>
>>>
>>>
>>
>


Re: Spark 1.5 on Mesos

2016-02-26 Thread Tim Chen
https://spark.apache.org/docs/latest/running-on-mesos.html should be the
best source, what problems were you running into?

Tim

On Fri, Feb 26, 2016 at 11:06 AM, Yin Yang  wrote:

> Have you read this ?
> https://spark.apache.org/docs/latest/running-on-mesos.html
>
> On Fri, Feb 26, 2016 at 11:03 AM, Ashish Soni 
> wrote:
>
>> Hi All ,
>>
>> Is there any proper documentation on how to run Spark on Mesos? I have
>> been trying for the last few days and am not able to make it work.
>>
>> Please help
>>
>> Ashish
>>
>
>


Re: Standalone vs. Mesos for production installation on a smallish cluster

2016-02-26 Thread Tim Chen
Mesos does provide some benefits and features, such as the ability to
launch all the Spark pieces in Docker and also Mesos resource scheduling
features (weights, roles), and if you plan to also use HDFS/Cassandra there
are existing frameworks that are actively maintained by us.

That said, when there are just 5 nodes and you just want to use Spark without
any other frameworks and without adding complexity, I would also suggest using
Standalone.

Tim

On Fri, Feb 26, 2016 at 3:51 AM, Igor Berman  wrote:

> IMHO most production clusters are standalone. There was a presentation from
> Spark Summit with some stats inside (can't find it right now), and standalone
> was in 1st place. It was from Matei:
> https://databricks.com/resources/slides
>
> On 26 February 2016 at 13:40, Petr Novak  wrote:
>
>> Hi all,
>> I believe that it used to be in documentation that Standalone mode is not
>> for production. I'm either wrong or it was already removed.
>>
>> For a small cluster of 5-10 nodes, is Standalone recommended for
>> production? I would like to go with Mesos, but the question is whether
>> there is real added value for production, mainly from a stability
>> perspective.
>>
>> Can I expect that adding Mesos will improve stability compared to
>> Standalone to an extent that justifies the somewhat increased complexity?
>>
>> I know it is hard to answer because Mesos layer itself is going to add
>> some bugs as well.
>>
>> Are there unique features enabled by Mesos specific to Spark? E.g.
>> adaptive resources for jobs or whatever?
>>
>> In the future, once the cluster grows and more services run on Mesos, we
>> plan to use Mesos. The question is whether it is worth going with it
>> immediately, even if its utility is not directly needed at this point.
>>
>> Many thanks,
>> Petr
>>
>
>


Re: can't kill spark job in supervise mode

2016-01-30 Thread Tim Chen
Hi Duc,

Are you running Spark on Mesos in cluster mode? What does your cluster-mode
submission look like, and which version of Spark are you running?

Tim

On Sat, Jan 30, 2016 at 8:19 AM, PhuDuc Nguyen 
wrote:

> I have a spark job running on Mesos in multi-master and supervise mode. If
> I kill it, it is resilient as expected and respawns on another node.
> However, I cannot kill it when I need to. I have tried 2 methods:
>
> 1) ./bin/spark-class org.apache.spark.deploy.Client kill
>  
>
> 2) ./bin/spark-submit --master mesos:// --kill 
>
> Method 2 accepts the kill request, but the job is respawned on another node.
> Ultimately, I can't get either method to kill the job. I suspect I have
> the wrong port in the master URL during the kill request for method 1?
> I've tried every combination of IP and port I can think of, is there one I
> am missing?
>
> Ports I've tried:
> 5050 = mesos UI
> 8080 = marathon
> 7077 = spark dispatcher
> 8081 = spark drivers UI
> 4040 = spark job UI
>
> thanks,
> Duc
>


Re: Docker/Mesos with Spark

2016-01-19 Thread Tim Chen
Hi Sathish,

Sorry about that. I think that's a good idea, and I'll write up a section in
the Spark documentation page to explain how it can work. We (Mesosphere)
have been doing this for our DCOS Spark for our past releases and it has been
working well so far.

Thanks!

Tim

On Tue, Jan 19, 2016 at 12:28 PM, Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Hi Tim
>
> Do you have any materials/blog for running Spark in a container in Mesos
> cluster environment? I have googled it but couldn't find info on it. Spark
> documentation says it is possible, but no details provided.. Please help
>
>
> Thanks
>
> Sathish
>
>
>
>
> On Mon, Sep 21, 2015 at 11:54 AM Tim Chen <t...@mesosphere.io> wrote:
>
>> Hi John,
>>
>> There is no other blog post yet; I'm thinking of doing a series of posts
>> but so far haven't had time to do that yet.
>>
>> Running Spark in docker containers makes distributing Spark versions
>> easy: it's simple to upgrade, and the image is automatically cached on the
>> slaves so the same image just runs right away. Most of the Docker perf cost
>> is usually related to network and filesystem overheads, but I think with
>> the recent changes in Spark to make the Mesos sandbox the default temp dir,
>> the filesystem won't be a big concern, as it's mostly writing to the
>> mounted-in Mesos sandbox. Also, Mesos uses the host network by default, so
>> the network isn't affected much.
>>
>> The main cluster mode limitation is that you need to make the Spark job
>> files available somewhere that all the slaves can access remotely (http,
>> s3, hdfs, etc.) or available on all slaves locally by path.
>>
>> I'll try to make more doc efforts once I get my existing patches and
>> testing infra work done.
>>
>> Let me know if you have more questions,
>>
>> Tim
>>
>> On Sat, Sep 19, 2015 at 5:42 AM, John Omernik <j...@omernik.com> wrote:
>>
>>> I was searching in the 1.5.0 docs on the Docker on Mesos capabilities
>>> and just found you CAN run it this way.  Are there any user posts, blog
>>> posts, etc on why and how you'd do this?
>>>
>>> Basically, at first I was questioning why you'd run spark in a docker
>>> container, i.e., if you run with tar balled executor, what are you really
>>> gaining?  And in this setup, are you losing out on performance somehow? (I
>>> am guessing smarter people than I have figured that out).
>>>
>>> Then I came along a situation where I wanted to use a python library
>>> with spark, and it had to be installed on every node, and I realized one
>>> big advantage of dockerized spark would be that spark apps that needed
>>> other libraries could be contained and built well.
>>>
>>> OK, that's huge, let's do that.  For my next question, there are a lot of
>>> "questions" I have on how this actually works.  Does cluster mode/client mode
>>> apply here? If so, how?  Is there a good walk through on getting this
>>> setup? Limitations? Gotchas?  Should I just dive in an start working with
>>> it? Has anyone done any stories/rough documentation? This seems like a
>>> really helpful feature to scaling out spark, and letting developers truly
>>> build what they need without tons of admin overhead, so I really want to
>>> explore.
>>>
>>> Thanks!
>>>
>>> John
>>>
>>
>>


Re: Weird Spark Dispatcher Offers?

2015-10-02 Thread Tim Chen
Do you have jobs enqueued? If none of the jobs matches an offer, it
will just decline it.

What are your job resource specifications?

Tim

On Fri, Oct 2, 2015 at 11:34 AM, Alan Braithwaite 
wrote:

> Hey All,
>
> Using spark with mesos and docker.
>
> I'm wondering if anybody's seen the behavior of spark dispatcher where it
> just continually requests resources and immediately declines the offer.
>
> https://gist.github.com/anonymous/41e7c91899b0122b91a7
>
> I'm trying to debug some issues with spark and I'm having trouble figuring
> out if this is part of the problem or if it's safe to ignore it.
>
> Any help or pointers would be appreciated.
>
> Thanks!
> - Alan
>


Re: Weird Spark Dispatcher Offers?

2015-10-02 Thread Tim Chen
Hi Alan,

The dispatcher is a Mesos framework, and all frameworks in Mesos receive
offers from the master. Mesos is different from most schedulers in that
we don't issue containers based on requests; instead, we offer available
resources to all frameworks and they in turn decide whether they want to use
those resources.

In the Mesos dispatcher's case, we just decline offers coming in so they're
available to other frameworks.

Tim

On Fri, Oct 2, 2015 at 11:51 AM, Alan Braithwaite <a...@cloudflare.com>
wrote:

> So if there is no jobs to run the dispatcher will decline all offers by
>> default.
>>
>
> So would this be a bug in mesos then?  I'm not sure I understand how this
> offer is appearing in the first place.  It only shows up in the master logs
> when I start the dispatcher.
>
>
>> Also we list all the jobs enqueued and it's specifications in the Spark
>> dispatcher UI, you should see the port in the dispatcher logs itself.
>
>
> Yes, this job is not listed under that UI.  Hence my confusion.
>
> Thanks,
> - Alan
>
> On Fri, Oct 2, 2015 at 11:49 AM, Tim Chen <t...@mesosphere.io> wrote:
>
>> So if there is no jobs to run the dispatcher will decline all offers by
>> default.
>>
>> Also we list all the jobs enqueued and it's specifications in the Spark
>> dispatcher UI, you should see the port in the dispatcher logs itself.
>>
>> Tim
>>
>> On Fri, Oct 2, 2015 at 11:46 AM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>>> This happened right after blowing away /var/lib/mesos zk://mesos and
>>> zk://spark_mesos_dispatcher and before I've submitted anything new to it so
>>> I _shouldn't_ have anything enqueued.  Unless there's state being stored
>>> somewhere besides those places that I don't know about.
>>>
>>> I'm not sure what the resource specifications are for this one because I
>>> didn't submit it directly.  If you have a way for me to grab a specific
>>> offer configuration, I'd be delighted to provide it.  I just can't seem to
>>> figure out how to get that information after digging through the mesos docs
>>> :-(
>>>
>>> Also, I can't read the docker logs because:
>>>
>>> Oct 02 11:39:59 sparky docker[556]:
>>> time="2015-10-02T11:39:59.165474049-07:00" level=error msg="Error streaming
>>> logs: invalid character '\\x00' looking for beginning of value"
>>>
>>> (that's coming from the spark-dispatcher docker).
>>>
>>> Thanks!
>>> - Alan
>>>
>>> On Fri, Oct 2, 2015 at 11:36 AM, Tim Chen <t...@mesosphere.io> wrote:
>>>
>>>> Do you have jobs enqueued? And if none of the jobs matches any offer it
>>>> will just decline it.
>>>>
>>>> What's your job resource specifications?
>>>>
>>>> Tim
>>>>
>>>> On Fri, Oct 2, 2015 at 11:34 AM, Alan Braithwaite <a...@cloudflare.com>
>>>> wrote:
>>>>
>>>>> Hey All,
>>>>>
>>>>> Using spark with mesos and docker.
>>>>>
>>>>> I'm wondering if anybody's seen the behavior of spark dispatcher where
>>>>> it just continually requests resources and immediately declines the offer.
>>>>>
>>>>> https://gist.github.com/anonymous/41e7c91899b0122b91a7
>>>>>
>>>>> I'm trying to debug some issues with spark and I'm having trouble
>>>>> figuring out if this is part of the problem or if it's safe to ignore it.
>>>>>
>>>>> Any help or pointers would be appreciated.
>>>>>
>>>>> Thanks!
>>>>> - Alan
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Weird Spark Dispatcher Offers?

2015-10-02 Thread Tim Chen
So if there are no jobs to run, the dispatcher will decline all offers by
default.

Also, we list all the enqueued jobs and their specifications in the Spark
dispatcher UI; you should see the port in the dispatcher logs itself.

Tim

On Fri, Oct 2, 2015 at 11:46 AM, Alan Braithwaite <a...@cloudflare.com>
wrote:

> This happened right after blowing away /var/lib/mesos zk://mesos and
> zk://spark_mesos_dispatcher and before I've submitted anything new to it so
> I _shouldn't_ have anything enqueued.  Unless there's state being stored
> somewhere besides those places that I don't know about.
>
> I'm not sure what the resource specifications are for this one because I
> didn't submit it directly.  If you have a way for me to grab a specific
> offer configuration, I'd be delighted to provide it.  I just can't seem to
> figure out how to get that information after digging through the mesos docs
> :-(
>
> Also, I can't read the docker logs because:
>
> Oct 02 11:39:59 sparky docker[556]:
> time="2015-10-02T11:39:59.165474049-07:00" level=error msg="Error streaming
> logs: invalid character '\\x00' looking for beginning of value"
>
> (that's coming from the spark-dispatcher docker).
>
> Thanks!
> - Alan
>
> On Fri, Oct 2, 2015 at 11:36 AM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Do you have jobs enqueued? And if none of the jobs matches any offer it
>> will just decline it.
>>
>> What's your job resource specifications?
>>
>> Tim
>>
>> On Fri, Oct 2, 2015 at 11:34 AM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>>> Hey All,
>>>
>>> Using spark with mesos and docker.
>>>
>>> I'm wondering if anybody's seen the behavior of spark dispatcher where
>>> it just continually requests resources and immediately declines the offer.
>>>
>>> https://gist.github.com/anonymous/41e7c91899b0122b91a7
>>>
>>> I'm trying to debug some issues with spark and I'm having trouble
>>> figuring out if this is part of the problem or if it's safe to ignore it.
>>>
>>> Any help or pointers would be appreciated.
>>>
>>> Thanks!
>>> - Alan
>>>
>>
>>
>


Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-01 Thread Tim Chen
Hi Utkarsh,

I replied earlier asking what your task assignment looks like in fine vs.
coarse grain mode.

Tim

On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:

> Bumping it up; it's not really a blocking issue.
> But fine grain mode eats up an uncertain number of resources in mesos and
> launches tons of tasks, so I would prefer using the coarse grained mode if
> only it didn't run out of memory.
>
> Thanks,
> -Utkarsh
>
> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
> wrote:
>
>> Hi Tim,
>>
>> 1. spark.mesos.coarse:false (fine grain mode)
>> This is the data dump for config and executors assigned:
>> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>>
>> 2. spark.mesos.coarse:true (coarse grain mode)
>> Dump for coarse mode:
>> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>>
>> As you can see, exactly the same code works fine in fine grained, goes
>> out of memory in coarse grained mode. First an executor was lost and then
>> the driver went out of memory.
>> So I am trying to understand what is different in fine grained vs coarse
>> mode other than allocation of multiple mesos tasks vs 1 mesos task. Clearly
>> spark is not managing memory in the same way.
>>
>> Thanks,
>> -Utkarsh
>>
>>
>> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Hi Utkarsh,
>>>
>>> What is your job placement like when you run fine grain mode? You said
>>> coarse grain mode only ran with one node right?
>>>
>>> And when the job is running could you open the Spark webui and get stats
>>> about the heap size and other java settings?
>>>
>>> Tim
>>>
>>> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>>> wrote:
>>>
>>>> Bumping this one up, any suggestions on the stacktrace?
>>>> spark.mesos.coarse=true is not working and the driver crashed with the
>>>> error.
>>>>
>>>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Missed to do a reply-all.
>>>>>
>>>>> Tim,
>>>>>
>>>>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
>>>>> works (sorry there was a typo in my last email, I meant "when I do
>>>>> "spark.mesos.coarse=false", the job works like a charm. ").
>>>>>
>>>>> I get this exception with spark.mesos.coarse = true:
>>>>>
>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={
>>>>> "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" :
>>>>> "55af5a61e8a42806f47546c1"}
>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={
>>>>> "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-25 Thread Tim Chen
>> I0922 20:18:17.794598 171 sched.cpp:1591] Asked to stop the driver
>> I0922 20:18:17.794739 143 sched.cpp:835] Stopping framework
>> '20150803-224832-1577534986-5050-1614-0016'
>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned
>> with code DRIVER_STOPPED
>> 15/09/22 20:18:17 INFO MapOutputTrackerMasterEndpoint:
>> MapOutputTrackerMasterEndpoint stopped!
>> 15/09/22 20:18:17 INFO Utils: path =
>> /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252/blockmgr-0e0e1a1c-894e-4e79-beac-ead0dff43166,
>> already present as root for deletion.
>> 15/09/22 20:18:17 INFO MemoryStore: MemoryStore cleared
>> 15/09/22 20:18:17 INFO BlockManager: BlockManager stopped
>> 15/09/22 20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 15/09/22 20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
>> OutputCommitCoordinator stopped!
>> 15/09/22 20:18:17 INFO SparkContext: Successfully stopped SparkContext
>> 15/09/22 20:18:17 INFO Utils: Shutdown hook called
>> 15/09/22 20:18:17 INFO Utils: Deleting directory
>> /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252
>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting
>> down remote daemon.
>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote
>> daemon shut down; proceeding with flushing remote transports.
>>
>>
>>
>>
>> On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Hi Utkarsh,
>>>
>>> Just to be sure you originally set coarse to false but then to true? Or
>>> is it the other way around?
>>>
>>> Also what's the exception/stack trace when the driver crashed?
>>>
>>> Coarse grain mode per-starts all the Spark executor backends, so has the
>>> least overhead comparing to fine grain. There is no single answer for which
>>> mode you should use, otherwise we would have removed one of those modes
>>> since it depends on your use case.
>>>
>>> There are quite some factor why there could be huge G

Re: Mesos Tasks only run on one node

2015-09-22 Thread Tim Chen
What configuration have you used, and what is the slaves' configuration?

Possibly all the other nodes either don't have enough resources, or are using
another role that's preventing the executor from being launched.

Tim

On Mon, Sep 21, 2015 at 1:58 PM, John Omernik  wrote:

> I have a happy healthy Mesos cluster (0.24) running in my lab.  I've
> compiled spark-1.5.0 and it seems to be working fine, except for one small
> issue, my tasks all seem to run on one node. (I have 6 in the cluster).
>
> Basically, I have a directory of compressed text files.  Compressed, these
> 25 files add up to 1.2 GB of data. In bin/pyspark I do:
>
> txtfiles = sc.textFile("/path/to/my/data/*")
> txtfiles.count()
>
> This goes through and gives me the correct count, but all my tasks (25 of
> them) run on one node, let's call it node4.
>
> Interesting.
>
> So I was running spark from node4, but I would have thought it would have
> hit up more nodes.
>
> So I ran it on node5.  In executors tab on the spark UI, there is only one
> registered, and it's node4, and once again all tasks ran on node4.
>
> I am running in fine grain mode... is there a setting somewhere to allow
> for more executors? This seems weird. I've been away from Spark since 1.2.x
> but I don't seem to remember this...
>
>
>


Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-22 Thread Tim Chen
Hi Utkarsh,

Just to be sure: you originally set coarse to false but then to true? Or is
it the other way around?

Also, what's the exception/stack trace from when the driver crashed?

Coarse grain mode pre-starts all the Spark executor backends, so it has the
least overhead compared to fine grain. There is no single answer for which
mode you should use (otherwise we would have removed one of those modes);
it depends on your use case.
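(For reference, the mode is just a per-job boolean; a minimal sketch of
toggling it at submit time, with the master URL, class, and jar as
placeholders:)

# Coarse grain mode: one long-running Mesos task per slave hosting the
# Spark executor backend.
./bin/spark-submit --master mesos://mesos-master:5050 \
  --conf spark.mesos.coarse=true \
  --class com.example.MyApp my-app.jar

# Fine grain mode: each Spark task is launched as its own Mesos task.
./bin/spark-submit --master mesos://mesos-master:5050 \
  --conf spark.mesos.coarse=false \
  --class com.example.MyApp my-app.jar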

There are quite a few factors that can cause huge GC pauses, but I don't
think your GC pauses would go away if you switched to standalone.

Tim

On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar 
wrote:

> I am running Spark 1.4.1 on mesos.
>
> The spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of
> size 100, 100, 7 and 1 respectively. Let's call it productRDD.
>
> Creation of "aRdd" needs data pull from multiple data sources, merging it
> and creating a tuple of JavaRdd, finally aRDD looks something like this:
> JavaRDD>
> bRdd, cRdd and dRdds are just List<> of values.
>
> Then I apply a transformation on productRDD and finally call "saveAsTextFile"
> to save the result of my transformation.
>
> Problem:
> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine but
> driver crashes while doing the cartesian but when I do
> "spark.mesos.coarse=true", the job works like a charm. I am running spark
> on mesos.
>
> Comments:
> So I wanted to understand what role does "spark.mesos.coarse=true" plays
> in terms of memory and compute performance. My findings look counter
> intuitive since:
>
>1. "spark.mesos.coarse=true" just runs on 1 mesos task, so there
>should be an overhead of spinning up mesos tasks which should impact the
>performance.
>2. What config for "spark.mesos.coarse" recommended for running spark
>on mesos? Or there is no best answer and it depends on usecase?
>3. Also by setting "spark.mesos.coarse=true", I notice that I get huge
>GC pauses even with small dataset but a long running job (but this can be a
>separate discussion).
>
> Let me know if I am missing something obvious, we are learning spark
> tuning as we move forward :)
>
> --
> Thanks,
> -Utkarsh
>


Re: Python Packages in Spark w/Mesos

2015-09-21 Thread Tim Chen
Hi John,

Sorry haven't get time to respond to your questions over the weekend.

If you're running client mode, to use the Docker/Mesos integration
minimally you just need to set the image configuration
'spark.mesos.executor.docker.image' as stated in the documentation, and
Spark will use this image to run each Spark executor.

Therefore, if you want to include your python dependencies, you can also
pre-install them in that image, and Spark should be able to find them if you
set the PYTHON env variables to point to them. I'm not very familiar with
python, but I recently got Mesos cluster mode with python to work and it's
merged into master.
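For example, a minimal sketch of that setup (the image name and the
/opt/pylibs install path are hypothetical; spark.executorEnv.* sets
environment variables on the executors):

./bin/pyspark \
  --master mesos://mesos-master:5050 \
  --conf spark.mesos.executor.docker.image=myrepo/spark-with-nltk:1.5.0 \
  --conf spark.executorEnv.PYTHONPATH=/opt/pylibs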

Tim

On Mon, Sep 21, 2015 at 8:34 AM, John Omernik  wrote:

> Hey all -
>
> Curious at the best way to include python packages in my Spark
> installation. (Such as NLTK). Basically I am running on Mesos, and would
> like to find a way to include the package in the binary distribution in
> that I don't want to install packages on all nodes.  We should be able to
> include in the distribution, right?.
>
> I thought of using the Docker Mesos integration, but I have been unable to
> find information on this (see my other question on Docker/Mesos/Spark).
> Any other thoughts on the best way to include packages in Spark WITHOUT
> installing on each node would be appreciated!
>
> John
>


Re: Docker/Mesos with Spark

2015-09-21 Thread Tim Chen
Hi John,

There is no other blog post yet; I'm thinking of doing a series of posts but
so far haven't had time to do that yet.

Running Spark in docker containers makes distributing Spark versions easy:
it's simple to upgrade, and the image is automatically cached on the slaves
so the same image just runs right away. Most of the Docker perf cost is
usually related to network and filesystem overheads, but I think with the
recent changes in Spark to make the Mesos sandbox the default temp dir, the
filesystem won't be a big concern, as it's mostly writing to the mounted-in
Mesos sandbox. Also, Mesos uses the host network by default, so the network
isn't affected much.

The main cluster mode limitation is that you need to make the Spark job
files available somewhere that all the slaves can access remotely (http,
s3, hdfs, etc.) or available on all slaves locally by path.
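For example, minimal cluster-mode submissions for the two cases (the
dispatcher URL, jar locations, and class are placeholders):

# Jar reachable by every slave over HTTP (could also be s3 or hdfs):
./bin/spark-submit --master mesos://spark-dispatcher.mesos:7077 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  http://some-host/jars/spark-examples.jar 100

# Or a path that already exists on every slave:
./bin/spark-submit --master mesos://spark-dispatcher.mesos:7077 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/jobs/spark-examples.jar 100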

I'll try to make more doc efforts once I get my existing patches and
testing infra work done.

Let me know if you have more questions,

Tim

On Sat, Sep 19, 2015 at 5:42 AM, John Omernik  wrote:

> I was searching in the 1.5.0 docs on the Docker on Mesos capabilities and
> just found you CAN run it this way.  Are there any user posts, blog posts,
> etc on why and how you'd do this?
>
> Basically, at first I was questioning why you'd run spark in a docker
> container, i.e., if you run with tar balled executor, what are you really
> gaining?  And in this setup, are you losing out on performance somehow? (I
> am guessing smarter people than I have figured that out).
>
> Then I came along a situation where I wanted to use a python library with
> spark, and it had to be installed on every node, and I realized one big
> advantage of dockerized spark would be that spark apps that needed other
> libraries could be contained and built well.
>
> OK, that's huge, let's do that.  For my next question, there are a lot of
> "questions" I have on how this actually works.  Does cluster mode/client mode
> apply here? If so, how?  Is there a good walk through on getting this
> setup? Limitations? Gotchas?  Should I just dive in an start working with
> it? Has anyone done any stories/rough documentation? This seems like a
> really helpful feature to scaling out spark, and letting developers truly
> build what they need without tons of admin overhead, so I really want to
> explore.
>
> Thanks!
>
> John
>


Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-19 Thread Tim Chen
I guess I need a bit more clarification: what kind of assumptions was the
dispatcher making?

Tim


On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com>
wrote:

> Hi Tim,
>
> Thanks for the follow up.  It's not so much that I expect the executor to
> inherit the configuration of the dispatcher as that I *don't* expect the
> dispatcher to make assumptions about the system environment of the executor
> (since it lives in a docker).  I could potentially see a case where you
> might want to explicitly forbid the defaults, but I can't think of any
> right now.
>
> Otherwise, I'm confused as to why the defaults in the docker image for the
> executor are just ignored.  I suppose that it's the dispatcher's job to
> ensure the *exact* configuration of the executor, regardless of the
> defaults set on the executor's machine?  Is that the assumption being made?
> I can understand that in contexts which aren't docker driven since jobs
> could be rolling out in the middle of a config update.  Trying to think of
> this outside the terms of just mesos/docker (since I'm fully aware that
> docker doesn't rule the world yet).
>
> So I can see this from both perspectives now and passing in the properties
> file will probably work just fine for me, but for my better understanding:
> When the executor starts, will it read any of the environment that it's
> executing in or will it just take only the properties given to it by the
> dispatcher and nothing more?
>
> Lemme know if anything needs more clarification and thanks for your mesos
> contribution to spark!
>
> - Alan
>
> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>
>> Hi Alan,
>>
>> If I understand correctly, you are setting executor home when you launch
>> the dispatcher and not on the configuration when you submit job, and expect
>> it to inherit that configuration?
>>
>> When I worked on the dispatcher I was assuming all configuration is
>> passed to the dispatcher to launch the job exactly how you will need to
>> launch it with client mode.
>>
>> But indeed it shouldn't crash dispatcher, I'll take a closer look when I
>> get a chance.
>>
>> Can you recommend changes on the documentation, either in email or a PR?
>>
>> Thanks!
>>
>> Tim
>>
>> Sent from my iPhone
>>
>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>> Hey All,
>>
>> To bump this thread once again, I'm having some trouble using the
>> dispatcher as well.
>>
>> I'm using Mesos Cluster Manager with Docker Executors.  I've deployed the
>> dispatcher as Marathon job.  When I submit a job using spark submit, the
>> dispatcher writes back that the submission was successful and then promptly
>> dies in marathon.  Looking at the logs reveals it was hitting the following
>> line:
>>
>> 398:  throw new SparkException("Executor Spark home
>> `spark.mesos.executor.home` is not set!")
>>
>> Which is odd because it's set in multiple places (SPARK_HOME,
>> spark.mesos.executor.home, spark.home, etc).  Reading the code, it
>> appears that the driver desc pulls only from the request and disregards any
>> other properties that may be configured.  Testing by passing --conf
>> spark.mesos.executor.home=/usr/local/spark on the command line to
>> spark-submit confirms this.  We're trying to isolate the number of places
>> where we have to set properties within spark and were hoping that it will
>> be possible to have this pull in the spark-defaults.conf from somewhere, or
>> at least allow the user to inform the dispatcher through spark-submit that
>> those properties will be available once the job starts.
>>
>> Finally, I don't think the dispatcher should crash in this event.  It
>> seems not exceptional that a job is misconfigured when submitted.
>>
>> Please direct me on the right path if I'm headed in the wrong direction.
>> Also let me know if I should open some tickets for these issues.
>>
>> Thanks,
>> - Alan
>>
>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Yes you can create an issue, or actually contribute a patch to update it
>>> :)
>>>
>>> Sorry the docs is a bit light, I'm going to make it more complete along
>>> the way.
>>>
>>> Tim
>>>
>>>
>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
>>> tomwa...@cisco.com> wrote:
>>>
>>>> Tim,
>>>>
>>>> Than

Re: Trouble using dynamic allocation and shuffle service.

2015-09-14 Thread Tim Chen
Hi Philip,

I've included documentation in the Spark/Mesos doc (
http://spark.apache.org/docs/latest/running-on-mesos.html): you can
start the MesosShuffleService with the sbin/start-mesos-shuffle-service.sh
script.

The shuffle service needs to be started manually for Mesos on each slave
(one way is via Marathon with a unique hostname constraint), and then you
need to enable the dynamic allocation and shuffle service flags on the driver
and it should work.
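(A minimal sketch of the two pieces, with the master URL, class, and jar as
placeholders: the shuffle service started on each slave, and the two flags
enabled on the driver side.)

# On every Mesos slave (e.g. launched via Marathon with a unique hostname
# constraint):
$SPARK_HOME/sbin/start-mesos-shuffle-service.sh

# On the driver side:
./bin/spark-submit --master mesos://mesos-master:5050 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --class com.example.MyApp my-app.jar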

Let me know if that's not clear.

Tim

On Mon, Sep 14, 2015 at 11:36 AM, Philip Weaver 
wrote:

> Hello, I am trying to use dynamic allocation which requires the shuffle
> service. I am running Spark on mesos.
>
> Whenever I set spark.shuffle.service.enabled=true, my Spark driver fails
> with an error like this:
>
> Caused by: java.net.ConnectException: Connection refused: devspark1/
> 172.26.21.70:7337
>
> It's not clear from the documentation if the shuffle service starts
> automatically just by having it enabled, or if I need to do something else.
> There are instructions for running the shuffle service in YARN, but not
> mesos.
>
> - Philip
>
>


Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-11 Thread Tim Chen
Yes you can create an issue, or actually contribute a patch to update it :)

Sorry the docs are a bit light; I'm going to make them more complete along
the way.

Tim


On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
tomwa...@cisco.com> wrote:

> Tim,
>
> Thank you for the explanation.  You are correct, my Mesos experience is
> very light, and I haven’t deployed anything via Marathon yet.  What you
> have stated here makes sense, I will look into doing this.
>
> Adding this info to the docs would be great.  Is the appropriate action to
> create an issue regarding improvement of the docs?  For those of us who are
> gaining the experience having such a pointer is very helpful.
>
> Tom
>
> From: Tim Chen <t...@mesosphere.io>
> Date: Thursday, September 10, 2015 at 10:25 AM
> To: Tom Waterhouse <tomwa...@cisco.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>
> Hi Tom,
>
> Sorry the documentation isn't really rich, since it's probably assuming
> users understands how Mesos and framework works.
>
> First I need explain the rationale of why create the dispatcher. If you're
> not familiar with Mesos yet, each node in your datacenter is installed a
> Mesos slave where it's responsible for publishing resources and
> running/watching tasks, and Mesos master is responsible for taking the
> aggregated resources and scheduling them among frameworks.
>
> Frameworks are not managed by Mesos, as Mesos master/slave doesn't launch
> and maintain framework but assume they're launched and kept running on its
> own. All the existing frameworks in the ecosystem therefore all have their
> own ways to deploy, HA and persist state (e.g: Aurora, Marathon, etc).
>
> Therefore, to introduce cluster mode with Mesos, we must create a
> framework that is long running that can be running in your datacenter, and
> can handle launching spark drivers on demand and handle HA, etc. This is
> what the dispatcher is all about.
>
> So the idea is that you should launch the dispatcher not on the client,
> but on a machine in your datacenter. In Mesosphere's DCOS we launch all
> frameworks and long running services with Marathon, and you can use
> Marathon to launch the Spark dispatcher.
>
> Then all clients instead of specifying the Mesos master URL (e.g:
> mesos://mesos.master:2181), then just talks to the dispatcher only
> (mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start
> and watch the driver for you.
>
> Tim
>
>
>
> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
> tomwa...@cisco.com> wrote:
>
>> After spending most of yesterday scouring the Internet for sources of
>> documentation for submitting Spark jobs in cluster mode to a Spark cluster
>> managed by Mesos I was able to do just that, but I am not convinced that
>> how I have things set up is correct.
>>
>> I used the Mesos published
>> <https://open.mesosphere.com/getting-started/datacenter/install/>
>> instructions for setting up my Mesos cluster.  I have three Zookeeper
>> instances, three Mesos master instances, and three Mesos slave instances.
>> This is all running in Openstack.
>>
>> The documentation on the Spark documentation site states that “To use
>> cluster mode, you must start the MesosClusterDispatcher in your cluster via
>> the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master
>> url (e.g: mesos://host:5050).”  That is it, no more information than
>> that.  So that is what I did: I have one machine that I use as the Spark
>> client for submitting jobs.  I started the Mesos dispatcher with the script
>> as described, and submitted the job using the client machine’s IP address
>> and port as the target.
>>
>> The job is currently running in Mesos as expected.  This is not, however,
>> how I would have expected to configure the system.  As it runs now, there is
>> one instance of the Spark Mesos dispatcher running outside of Mesos, so it
>> is not part of the sphere of Mesos resource management.
>>
>> I used the following Stack Overflow posts as guidelines:
>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>
>> There must be better documentation on how to deploy Spark in Mesos with
>> jobs able to be deployed in cluster mode.
>>
>> I can follow up with more specific information regarding my deployment
>> if necessary.
>>
>> Tom
>>
>
>


Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-10 Thread Tim Chen
Hi Tom,

Sorry the documentation isn't really rich, since it's probably assuming
users understand how Mesos and frameworks work.

First I need to explain the rationale for creating the dispatcher. If you're
not familiar with Mesos yet, each node in your datacenter runs a Mesos slave
that is responsible for publishing resources and running/watching tasks, and
the Mesos master is responsible for taking the aggregated resources and
scheduling them among frameworks.

Frameworks are not managed by Mesos, as the Mesos master/slave doesn't
launch and maintain frameworks but assumes they're launched and kept running
on their own. All the existing frameworks in the ecosystem therefore have
their own ways to deploy, provide HA, and persist state (e.g. Aurora,
Marathon, etc.).

Therefore, to introduce cluster mode with Mesos, we must create a
long-running framework that runs in your datacenter and can handle launching
Spark drivers on demand, HA, etc. This is what the dispatcher is all about.

So the idea is that you should launch the dispatcher not on the client, but
on a machine in your datacenter. In Mesosphere's DCOS we launch all
frameworks and long running services with Marathon, and you can use
Marathon to launch the Spark dispatcher.

Then all clients, instead of specifying the Mesos master URL (e.g.
mesos://mesos.master:2181), just talk to the dispatcher
(mesos://spark-dispatcher.mesos:7077), and the dispatcher will then start
and watch the driver for you.
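
As a rough sketch (host names, ports and the jar URL below are placeholders,
not a recommended layout), the Marathon app just runs the dispatcher start
script, and clients then submit against the dispatcher URL:

  # Command for the Marathon app (long-running, somewhere in the datacenter):
  /path/to/spark/sbin/start-mesos-dispatcher.sh --master mesos://zk://zk1:2181/mesos

  # Client side: cluster mode against the dispatcher, not the Mesos master:
  ./bin/spark-submit \
    --deploy-mode cluster \
    --master mesos://spark-dispatcher.mesos:7077 \
    --class com.example.MyApp \
    http://repo.example.com/my-app.jar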

Tim



On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
tomwa...@cisco.com> wrote:

> After spending most of yesterday scouring the Internet for sources of
> documentation for submitting Spark jobs in cluster mode to a Spark cluster
> managed by Mesos I was able to do just that, but I am not convinced that
> how I have things set up is correct.
>
> I used the Mesos published
> 
> instructions for setting up my Mesos cluster.  I have three Zookeeper
> instances, three Mesos master instances, and three Mesos slave instances.
> This is all running in Openstack.
>
> The documentation on the Spark documentation site states that “To use
> cluster mode, you must start the MesosClusterDispatcher in your cluster via
> the sbin/start-mesos-dispatcher.sh script, passing in the Mesos master
> url (e.g: mesos://host:5050).”  That is it, no more information than
> that.  So that is what I did: I have one machine that I use as the Spark
> client for submitting jobs.  I started the Mesos dispatcher with the script
> as described, and submitted the job using the client machine’s IP address
> and port as the target.
>
> The job is currently running in Mesos as expected.  This is not, however,
> how I would have expected to configure the system.  As it runs now, there is
> one instance of the Spark Mesos dispatcher running outside of Mesos, so it
> is not part of the sphere of Mesos resource management.
>
> I used the following Stack Overflow posts as guidelines:
> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>
> There must be better documentation on how to deploy Spark in Mesos with
> jobs able to be deployed in cluster mode.
>
> I can follow up with more specific information regarding my deployment if
> necessary.
>
> Tom
>


Re: JNI issues with mesos

2015-09-09 Thread Tim Chen
Hi Adrian,

Spark is expecting a specific naming of the tgz and also the folder name
inside, as this is generated by running make-distribution.sh --tgz in the
Spark source folder.

If you use a Spark 1.4 tgz generated with that script, upload it to HDFS
again under the same name, and fix the URI, then it should work.
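
Roughly (the --name value below just mirrors your 1.5 build, and the HDFS path
is only an example):

  # In the Spark 1.4.1 source tree:
  ./make-distribution.sh --tgz --name os1 -Phadoop-2.6
  # Upload the resulting tgz under its generated name, e.g.:
  hdfs dfs -put spark-1.4.1-bin-os1.tgz /apps/spark/spark-1.4.1-bin-os1.tgz
  # And point the executor URI at it:
  --conf spark.executor.uri=hdfs:///apps/spark/spark-1.4.1-bin-os1.tgz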

Tim

On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett 
wrote:

> 5mins later...
>
> Trying 1.5 with a fairly plain build:
> ./make-distribution.sh --tgz --name os1 -Phadoop-2.6
>
> and on my first attempt stderr showed:
> I0909 15:16:49.392144  1619 fetcher.cpp:441] Fetched
> 'hdfs:///apps/spark/spark15.tgz' to
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S1/frameworks/20150826-133446-3217621258-5050-4064-211204/executors/20150826-133446-3217621258-5050-4064-S1/runs/43026ba8-6624-4817-912c-3d7573433102/spark15.tgz'
> sh: 1: cd: can't cd to spark15.tgz
> sh: 1: ./bin/spark-class: not found
>
> Aha, let's rename the file in hdfs (and the two configs) from spark15.tgz
> to spark-1.5.0-bin-os1.tgz...
> Success!!!
>
> The same trick with 1.4 doesn't work, but now that I have something that
> does I can make progress.
>
> Hopefully this helps someone else :-)
>
> Adrian
>
>
> On 09/09/2015 16:59, Adrian Bridgett wrote:
>
> I'm trying to run spark (1.4.1) on top of mesos (0.23).  I've followed the
> instructions (uploaded spark tarball to HDFS, set executor uri in both
> places etc) and yet on the slaves it's failing to launch even the SparkPi
> example with a JNI error.  It does run with a local master.  A day of
> debugging later and it's time to ask for help!
>
>  bin/spark-submit --master mesos://10.1.201.191:5050 --class
> org.apache.spark.examples.SparkPi /tmp/examples.jar
>
> (I'm putting the jar outside hdfs  - on both client box + slave (turned
> off other slaves for debugging) - due to
> 
> http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html.
> I should note that I had the same JNI errors when using the mesos cluster
> dispatcher).
>
> I'm using Oracle Java 8 (no other java - even openjdk - is installed)
>
> As you can see, the slave is downloading the framework fine (you can even
> see it extracted on the slave).  Can anyone shed some light on what's going
> on - e.g. how is it attempting to run the executor?
>
> I'm going to try a different JVM (and try a custom spark distribution) but
> I suspect that the problem is much more basic. Maybe it can't find the
> hadoop native libs?
>
> Any light would be much appreciated :)  I've included the slaves's stderr
> below:
>
> I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
> I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info:
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
> I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI
> 'hdfs:///apps/spark/spark.tgz'
> I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly into the
> sandbox directory
> I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI
> 'hdfs:///apps/spark/spark.tgz'
> I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource with
> Hadoop client from 'hdfs:///apps/spark/spark.tgz' to
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with command: tar
> -C
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
> -xf
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> into
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
> W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of extracting
> resource from URI with 'extract' 

Re: Controlling number of executors on Mesos vs YARN

2015-08-12 Thread Tim Chen
You're referring to both fine grain and coarse grain?

A desirable number of executors per node could be interesting, but it can't
be guaranteed (or we could try to, and abort the job when that fails).

How would you imagine this new option to actually work?


Tim

On Wed, Aug 12, 2015 at 11:48 AM, Ajay Singal asinga...@gmail.com wrote:

 Hi Tim,

 An option like spark.mesos.executor.max to cap the number of executors
 per node/application would be very useful.  However, having an option like
 spark.mesos.executor.num to specify the desired number of executors per node
 would provide even better control.

 Thanks,
 Ajay

 On Wed, Aug 12, 2015 at 4:18 AM, Tim Chen t...@mesosphere.io wrote:

 Yes the options are not that configurable yet, but I think it's not hard
 to change.

 I actually have a patch out specifically to configure the amount of CPUs
 per executor in coarse-grained mode, hopefully merged in the next release.

 I think the open question now is whether, for fine-grained mode, we can limit
 the maximum number of concurrent executors, and I think we can definitely
 just add a new option like spark.mesos.executor.max to cap it.

 I'll file a JIRA and hopefully get this change in soon too.

 Tim



 On Tue, Aug 11, 2015 at 6:21 AM, Haripriya Ayyalasomayajula 
 aharipriy...@gmail.com wrote:

 Spark evolved as an example framework for Mesos - that's how I know it.
 It is surprising to see that the options provided by mesos in this case are
 less. Tweaking the source code, haven't done it yet but I would love to see
 what options could be there!

 On Tue, Aug 11, 2015 at 5:42 AM, Jerry Lam chiling...@gmail.com wrote:

 My experience with Mesos + Spark is not great. I saw one executor with
 30 CPU and the other executor with 6. So I don't think you can easily
 configure it without some tweaking at the source code.

 Sent from my iPad

 On 2015-08-11, at 2:38, Haripriya Ayyalasomayajula 
 aharipriy...@gmail.com wrote:

 Hi Tim,

 Spark on Yarn allows us to do it using --num-executors and
 --executor-cores command-line arguments. I just got a chance to look at a
 similar spark user list mail, but no answer yet. So does mesos allow
 setting the number of executors and cores? Is there a default number it
 assumes?

 On Mon, Jan 5, 2015 at 5:07 PM, Tim Chen t...@mesosphere.io wrote:

 Forgot to hit reply-all.

 -- Forwarded message --
 From: Tim Chen t...@mesosphere.io
 Date: Sun, Jan 4, 2015 at 10:46 PM
 Subject: Re: Controlling number of executors on Mesos vs YARN
 To: mvle m...@us.ibm.com


 Hi Mike,

 You're correct there is no such setting for Mesos coarse-grained mode,
 since the assumption is that each node is launched with one container
 and Spark is launching multiple tasks in that container.

 In fine-grain mode there isn't a setting like that, as it currently
 will launch an executor as long as it satisfies the minimum container
 resource requirement.

 I've created a JIRA earlier about capping the number of executors or
 better distributing the # of executors launched on each node. Since the
 decision of choosing which node to launch containers on is all on the Spark
 scheduler side, it's very easy to modify.

 Btw, what's the configuration to set the # of executors on YARN side?

 Thanks,

 Tim



 On Sun, Jan 4, 2015 at 9:37 PM, mvle m...@us.ibm.com wrote:

 I'm trying to compare the performance of Spark running on Mesos vs
 YARN.
 However, I am having problems being able to configure the Spark
 workload to
 run in a similar way on Mesos and YARN.

 When running Spark on YARN, you can specify the number of executors
 per
 node. So if I have a node with 4 CPUs, I can specify 6 executors on
 that
 node. When running Spark on Mesos, there doesn't seem to be an
 equivalent
 way to specify this. In Mesos, you can somewhat force this by
 specifying the
 number of CPU resources to be 6 when running the slave daemon.
 However, this
 seems to be a static configuration of the Mesos cluster rather
 something
 that can be configured in the Spark framework.

 So here is my question:

 For Spark on Mesos, am I correct that there is no way to control the
 number
 of executors per node (assuming an idle cluster)? For Spark on Mesos
 coarse-grained mode, there is a way to specify max_cores but that is
 still
 not equivalent to specifying the number of executors per node as when
 Spark
 is run on YARN.

 If I am correct, then it seems Spark might be at a disadvantage
 running on
 Mesos compared to YARN (since it lacks the fine tuning ability
 provided by
 YARN).

 Thanks,
 Mike



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Controlling-number-of-executors-on-Mesos-vs-YARN-tp20966.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






 --
 Regards

Re: Spark - Standalone Vs YARN Vs Mesos

2015-08-12 Thread Tim Chen
I'm not sure what you're looking for, since you can't really compare
Standalone with YARN or Mesos: Standalone assumes the Spark workers/master
own the cluster, while YARN/Mesos try to share the cluster among different
applications/frameworks.

And when you refer to resource utilization, what exactly does it mean to
you? Is it the ability to maximize the usage of your resources with
multiple applications in mind, or just how much configuration Spark allows
you in each mode?

Tim

On Wed, Aug 12, 2015 at 2:16 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

 Do we have any comparisons in terms of resource utilization, scheduling of
 running Spark in the below three modes
 1) Standalone
 2) over YARN
 3) over Mesos

 Can some one share resources (thoughts/URLs) on this area.


 --
 Deepak




Re: Controlling number of executors on Mesos vs YARN

2015-08-12 Thread Tim Chen
Yes the options are not that configurable yet, but I think it's not hard to
change.

I actually have a patch out specifically to configure the amount of CPUs
per executor in coarse-grained mode, hopefully merged in the next release.

I think the open question now is whether, for fine-grained mode, we can limit
the maximum number of concurrent executors, and I think we can definitely
just add a new option like spark.mesos.executor.max to cap it.

I'll file a JIRA and hopefully get this change in soon too.
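
For comparison, with the options that exist today (the numbers are arbitrary,
and spark.mesos.executor.max above is only a proposed name, not a real setting
yet):

  # YARN: executors per application are explicit
  ./bin/spark-submit --master yarn --num-executors 6 --executor-cores 4 ...

  # Mesos coarse-grained: only a total core cap is available
  ./bin/spark-submit --master mesos://master:5050 --conf spark.cores.max=24 ...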

Tim



On Tue, Aug 11, 2015 at 6:21 AM, Haripriya Ayyalasomayajula 
aharipriy...@gmail.com wrote:

 Spark evolved as an example framework for Mesos - that's how I know it. It
 is surprising to see that the options provided by mesos in this case are
 less. Tweaking the source code, haven't done it yet but I would love to see
 what options could be there!

 On Tue, Aug 11, 2015 at 5:42 AM, Jerry Lam chiling...@gmail.com wrote:

 My experience with Mesos + Spark is not great. I saw one executor with 30
 CPU and the other executor with 6. So I don't think you can easily
 configure it without some tweaking at the source code.

 Sent from my iPad

 On 2015-08-11, at 2:38, Haripriya Ayyalasomayajula 
 aharipriy...@gmail.com wrote:

 Hi Tim,

 Spark on Yarn allows us to do it using --num-executors and
 --executor-cores command-line arguments. I just got a chance to look at a
 similar spark user list mail, but no answer yet. So does mesos allow
 setting the number of executors and cores? Is there a default number it
 assumes?

 On Mon, Jan 5, 2015 at 5:07 PM, Tim Chen t...@mesosphere.io wrote:

 Forgot to hit reply-all.

 -- Forwarded message --
 From: Tim Chen t...@mesosphere.io
 Date: Sun, Jan 4, 2015 at 10:46 PM
 Subject: Re: Controlling number of executors on Mesos vs YARN
 To: mvle m...@us.ibm.com


 Hi Mike,

 You're correct there is no such setting for Mesos coarse-grained mode,
 since the assumption is that each node is launched with one container and
 Spark is launching multiple tasks in that container.

 In fine-grain mode there isn't a setting like that, as it currently will
 launch an executor as long as it satisfies the minimum container resource
 requirement.

 I've created a JIRA earlier about capping the number of executors or
 better distributing the # of executors launched on each node. Since the
 decision of choosing which node to launch containers on is all on the Spark
 scheduler side, it's very easy to modify.

 Btw, what's the configuration to set the # of executors on YARN side?

 Thanks,

 Tim



 On Sun, Jan 4, 2015 at 9:37 PM, mvle m...@us.ibm.com wrote:

 I'm trying to compare the performance of Spark running on Mesos vs YARN.
 However, I am having problems being able to configure the Spark
 workload to
 run in a similar way on Mesos and YARN.

 When running Spark on YARN, you can specify the number of executors per
 node. So if I have a node with 4 CPUs, I can specify 6 executors on that
 node. When running Spark on Mesos, there doesn't seem to be an
 equivalent
 way to specify this. In Mesos, you can somewhat force this by
 specifying the
 number of CPU resources to be 6 when running the slave daemon. However,
 this
 seems to be a static configuration of the Mesos cluster rather something
 that can be configured in the Spark framework.

 So here is my question:

 For Spark on Mesos, am I correct that there is no way to control the
 number
 of executors per node (assuming an idle cluster)? For Spark on Mesos
 coarse-grained mode, there is a way to specify max_cores but that is
 still
 not equivalent to specifying the number of executors per node as when
 Spark
 is run on YARN.

 If I am correct, then it seems Spark might be at a disadvantage running
 on
 Mesos compared to YARN (since it lacks the fine tuning ability provided
 by
 YARN).

 Thanks,
 Mike



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Controlling-number-of-executors-on-Mesos-vs-YARN-tp20966.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com
 .

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






 --
 Regards,
 Haripriya Ayyalasomayajula




 --
 Regards,
 Haripriya Ayyalasomayajula




Re: Spark WebUI link problem in Mesos Master

2015-07-29 Thread Tim Chen
Hi Anton,

For client mode we haven't populated the web UI link; we only did so for
cluster mode.

If you like you can open a JIRA, and it should be an easy ticket for anyone
to work on.

Tim

On Wed, Jul 29, 2015 at 4:27 AM, Anton Kirillov antonv.kiril...@gmail.com
wrote:

  Hi everyone,

 I’m trying to get access to the Spark web UI from the Mesos Master but with no
 success: the host name is displayed properly, but the link is not active, just
 text. Maybe it’s a well-known issue or I misconfigured something, but this
 problem is really annoying.

 Jobs are executed in client mode, framework registers successfully and
 execution happens as expected. Spark UI is available on client node but in
 Mesos Master’s interface hostname is just text. When launching in cluster
 mode (dispatcher is already launched) there’s only drivers information
 available with no reference to driver UI.

 I’m using Mesos 0.23.0, Spark 1.4.1 with binary executor (I’m using
 precompiled one) deployed to S3. Here’s the contents of spark-env.sh:

 export MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so
 export SPARK_EXECUTOR_URI=https://S3 Path/spark-1.4.1-bin-hadoop2.6.tgz
 export MASTER=mesos://zk://zookeeper hosts/mesos

 export SPARK_PUBLIC_DNS=client machine EC2 public DNS name

 Maybe I miss something, but really appreciate any help!

 --
 Anton Kirillov
 Sent with Sparrow http://www.sparrowmailapp.com/?sig




Re: Spark on Mesos - Shut down failed while running spark-shell

2015-07-28 Thread Tim Chen
Hi Haripriya,

Your master has registered its public IP as 127.0.0.1:5050, which can't be
reached by the slave node.

If Mesos didn't pick up the right IP you can specify one yourself via the
--ip flag.
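
Roughly (the addresses below are just taken from your log output for
illustration):

  # On the master host, advertise a routable address instead of 127.0.0.1:
  mesos-master --ip=192.168.0.10 ...

  # And on the node running spark-shell, as the warning suggests:
  export LIBPROCESS_IP=10.142.0.140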

Tim

On Mon, Jul 27, 2015 at 8:32 PM, Haripriya Ayyalasomayajula 
aharipriy...@gmail.com wrote:

 Hi all,

 I am running Spark 1.4.1 on mesos 0.23.0


 While I am able to start spark-shell on the node with mesos-master
 running, it works fine. But when I try to start spark-shell on mesos-slave
 nodes, I'm encounter this error. I greatly appreciate any help.



 15/07/27 22:14:44 INFO Utils: Successfully started service 'SparkUI' on
 port 4040.

 15/07/27 22:14:44 INFO SparkUI: Started SparkUI at
 http://10.142.0.140:4040

 Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY
 instead. Future releases will not support JNI bindings via
 MESOS_NATIVE_LIBRARY.

 Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY
 instead. Future releases will not support JNI bindings via
 MESOS_NATIVE_LIBRARY.

 WARNING: Logging before InitGoogleLogging() is written to STDERR

 W0727 22:14:45.091286 33441 sched.cpp:1326]

 **

 Scheduler driver bound to loopback interface! Cannot communicate with
 remote master(s). You might want to set 'LIBPROCESS_IP' environment
 variable to use a routable IP address.

 **

 2015-07-27 22:14:45,091:33222(0x7fff9e1fc700):ZOO_INFO@log_env@712:
 Client environment:zookeeper.version=zookeeper C client 3.4.5

 2015-07-27 22:14:45,091:33222(0x7fff9e1fc700):ZOO_INFO@log_env@716:
 Client environment:host.name=nid00011

 I0727 22:14:45.091995 33441 sched.cpp:157] Version: 0.23.0

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@log_env@723:
 Client environment:os.name=Linux

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@log_env@724:
 Client environment:os.arch=2.6.32-431.el6_1..8785-cray_ari_athena_c_cos

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@log_env@725:
 Client environment:os.version=#1 SMP Wed Jun 24 19:34:50 UTC 2015

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@log_env@733:
 Client environment:user.name=root

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@log_env@741:
 Client environment:user.home=/root

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@log_env@753:
 Client environment:user.dir=/opt/spark-1.4.1/spark-source

 2015-07-27 22:14:45,092:33222(0x7fff9e1fc700):ZOO_INFO@zookeeper_init@786:
 Initiating client connection, host=192.168.0.10:2181 sessionTimeout=1
 watcher=0x7fffb561a8e0 sessionId=0
 sessionPasswd=nullcontext=0x7ffdd930 flags=0

 2015-07-27 22:14:45,092:33222(0x7fff6ebfd700):ZOO_INFO@check_events@1703:
 initiated connection to server [192.168.0.10:2181]

 2015-07-27 22:14:45,096:33222(0x7fff6ebfd700):ZOO_INFO@check_events@1750:
 session establishment complete on server [192.168.0.10:2181],
 sessionId=0x14ed296a0fd000a, negotiated timeout=1

 I0727 22:14:45.096891 33479 group.cpp:313] Group process (group(1)@
 127.0.0.1:45546) connected to ZooKeeper

 I0727 22:14:45.096914 33479 group.cpp:787] Syncing group operations: queue
 size (joins, cancels, datas) = (0, 0, 0)

 I0727 22:14:45.096923 33479 group.cpp:385] Trying to create path '/mesos'
 in ZooKeeper

 I0727 22:14:45.099181 33471 detector.cpp:138] Detected a new leader:
 (id='4')

 I0727 22:14:45.099298 33483 group.cpp:656] Trying to get
 '/mesos/info_04' in ZooKeeper

 W0727 22:14:45.100443 33453 detector.cpp:444] Leading master
 master@127.0.0.1:5050 is using a Protobuf binary format when registering
 with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see
 MESOS-2340)

 I0727 22:14:45.100544 33453 detector.cpp:481] A new leading master (UPID=
 master@127.0.0.1:5050) is detected

 I0727 22:14:45.100739 33478 sched.cpp:254] New master detected at
 master@127.0.0.1:5050

 I0727 22:14:45.101104 33478 sched.cpp:264] No credentials provided.
 Attempting to register without authentication

 E0727 22:14:45.101210 33490 socket.hpp:107] Shutdown failed on fd=88:
 Transport endpoint is not connected [107]

 E0727 22:14:45.101380 33490 socket.hpp:107] Shutdown failed on fd=89:
 Transport endpoint is not connected [107]

 E0727 22:14:46.643348 33490 socket.hpp:107] Shutdown failed on fd=88:
 Transport endpoint is not connected [107]

 E0727 22:14:47.111336 33490 socket.hpp:107] Shutdown failed on fd=88:
 Transport endpoint is not connected [107]

 15/07/27 22:14:50 INFO DiskBlockManager: Shutdown hook called

 15/07/27 22:14:50 INFO Utils: path =
 /tmp/spark-3f94442b-7873-463f-91dd-3ee62ed5b263/blockmgr-74a5ed25-025b-4186-b1d8-dc395f287a8f,
 already present as root for deletion.

 15/07/27 22:14:50 INFO Utils: Shutdown hook called

 15/07/27 22:14:50 INFO Utils: Deleting directory
 /tmp/spark-3f94442b-7873-463f-91dd-3ee62ed5b263/httpd-5d2a71e5-1d36-47f7-b122-31f1dd12a0f0

 

Re: Spark Mesos Dispatcher

2015-07-19 Thread Tim Chen
It depends on how you run the 1.3/1.4 versions of Spark: if you're giving it
different Docker images / tarballs of Spark, technically it should work,
since it's just launching a driver for you at the end of the day.

However, I haven't really tried it so let me know if you run into problems
with it.

Tim

On Sun, Jul 19, 2015 at 9:23 PM, Jerry Lam chiling...@gmail.com wrote:

 I only used client mode both 1.3 and 1.4 versions on mesos.
 I skimmed through
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala.
 I would actually backport the Cluster Mode feature. Sorry, I don't have an
 answer for this.


 On Sun, Jul 19, 2015 at 11:16 PM, Jahagirdar, Madhu 
 madhu.jahagir...@philips.com wrote:

  1.3 does not have MesosDisptacher or does not have support for Mesos
 cluster mode , is it still possible to create a Dispatcher using 1.4 and
 run 1.3 using that dispatcher ?
  --
 *From:* Jerry Lam [chiling...@gmail.com]
 *Sent:* Monday, July 20, 2015 8:27 AM
 *To:* Jahagirdar, Madhu
 *Cc:* user; d...@spark.apache.org
 *Subject:* Re: Spark Mesos Dispatcher

   Yes.

 Sent from my iPhone

 On 19 Jul, 2015, at 10:52 pm, Jahagirdar, Madhu 
 madhu.jahagir...@philips.com wrote:

   All,

  Can we run different version of Spark using the same Mesos Dispatcher.
 For example we can run drivers with Spark 1.3 and Spark 1.4 at the same
 time ?

  Regards,
 Madhu Jahagirdar

 --
 The information contained in this message may be confidential and legally
 protected under applicable law. The message is intended solely for the
 addressee(s). If you are not the intended recipient, you are hereby
 notified that any use, forwarding, dissemination, or reproduction of this
 message is strictly prohibited and may be unlawful. If you are not the
 intended recipient, please contact the sender by return e-mail and destroy
 all copies of the original message.





Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-27 Thread Tim Chen
Does YARN provide the token through that env variable you mentioned? Or how
does YARN do this?

Tim

On Fri, Jun 26, 2015 at 3:51 PM, Marcelo Vanzin van...@cloudera.com wrote:

 On Fri, Jun 26, 2015 at 3:44 PM, Dave Ariens dari...@blackberry.com
 wrote:

  Fair. I will look into an alternative with a generated delegation
 token.   However the same issue exists.   How can I have the executor run
 some arbitrary code when it gets a task assignment and before it proceeds
 to process it's resources?


 Hmm, good question. If it doesn't already, Mesos could have its own
 implementation of CoarseGrainedExecutorBackend that provides that
 functionality. The only difference is that you'd run something before the
 executor starts up, not before each task.

 YARN actually doesn't do it that way; YARN provides the tokens to the
 executor before the process starts, so that when you call
 UserGroupInformation.getCurrentUser() the tokens are already there.

 One way of doing that is by writing the tokens to a file and setting the
 KRB5CCNAME env variable when starting the process. You can check the Hadoop
 sources for details. Not sure if there's another way.




 *From: *Marcelo Vanzin
 *Sent: *Friday, June 26, 2015 6:20 PM
 *To: *Dave Ariens
 *Cc: *Tim Chen; Olivier Girardot; user@spark.apache.org
 *Subject: *Re: Accessing Kerberos Secured HDFS Resources from Spark on
 Mesos

   On Fri, Jun 26, 2015 at 3:09 PM, Dave Ariens dari...@blackberry.com
 wrote:

  Would there be any way to have the task instances in the slaves call
 the UGI login with a principal/keytab provided to the driver?


  That would only work with a very small number of executors. If you have
 many login requests in a short period of time with the same principal, the
 KDC will start to deny logins. That's why delegation tokens are used
 instead of explicit logins.

  --
 Marcelo




 --
 Marcelo



Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-26 Thread Tim Chen
Mesos does support running containers as specific users passed to it.

Thanks for chiming in. What else does YARN do with Kerberos besides the
keytab file and user?

Tim

On Fri, Jun 26, 2015 at 1:20 PM, Marcelo Vanzin van...@cloudera.com wrote:

 On Fri, Jun 26, 2015 at 1:13 PM, Tim Chen t...@mesosphere.io wrote:

 So correct me if I'm wrong, it sounds like all you need is a principal user
 name and a keytab file downloaded, right?


 I'm not familiar with Mesos so don't know what kinds of features it has,
 but at the very least it would need to start containers as the requesting
 users (like YARN does when running with Kerberos enabled), to avoid users
 being able to read each other's credentials.

 --
 Marcelo



Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-26 Thread Tim Chen
So correct me if I'm wrong, it sounds like all you need is a principal user
name and a keytab file downloaded, right?

I'm adding support in the Spark framework to download additional files
alongside your executor and driver, and one workaround is to specify a user
principal and keytab file that can be downloaded and then used in your
driver, since you can expect it to be in the current working directory.

I suspect there might be other setup needed, but if you guys are available
we can work together to get something working.
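
To make the idea concrete, the workflow I have in mind looks roughly like this
(the file-distribution behaviour on Mesos is exactly the part still being
added, so treat this as a sketch rather than something that works today; the
class name, principal and paths are placeholders):

  # Ship the keytab alongside the driver so it lands in its working directory:
  ./bin/spark-submit --deploy-mode cluster \
    --master mesos://spark-dispatcher.mesos:7077 \
    --files /secure/admin.keytab \
    --class com.example.KerberosApp my-app.jar admin@EXAMPLE.COM admin.keytab

  # Then, inside the driver, before touching HDFS (as in Dave's gist):
  #   UserGroupInformation.loginUserFromKeytab(principal, "./admin.keytab")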


Tim

On Fri, Jun 26, 2015 at 12:23 PM, Olivier Girardot ssab...@gmail.com
wrote:

 I would pretty much need exactly this kind of feature too

  On Fri, Jun 26, 2015 at 21:17, Dave Ariens dari...@blackberry.com wrote:

  Hi Timothy,



 Because I'm running Spark on Mesos alongside a secured Hadoop cluster, I
 need to ensure that my tasks running on the slaves perform a Kerberos login
 before accessing any HDFS resources.  To login, they just need the name of
 the principal (username) and a keytab file.  Then they just need to invoke
 the following java:



 import org.apache.hadoop.security.UserGroupInformation

 UserGroupInformation.loginUserFromKeytab(adminPrincipal, adminKeytab)



 This is done in the driver in my Gist below, but I don't know how to run
 it within each executor on the slaves as tasks are ran.



 Any help would be appreciated!





 *From:* Timothy Chen [mailto:t...@mesosphere.io]
 *Sent:* Friday, June 26, 2015 12:50 PM
 *To:* Dave Ariens
 *Cc:* user@spark.apache.org
 *Subject:* Re: Accessing Kerberos Secured HDFS Resources from Spark on
 Mesos



 Hi Dave,



 I don't understand Keeberos much but if you know the exact steps that
 needs to happen I can see how we can make that happen with the Spark
 framework.



 Tim


 On Jun 26, 2015, at 8:49 AM, Dave Ariens dari...@blackberry.com wrote:

  I understand that Kerberos support for accessing Hadoop resources in Spark 
 only works when running Spark on YARN.  However, I'd really like to hack 
 something together for Spark on Mesos running alongside a secured Hadoop 
  cluster.  My simplified application (gist:
  https://gist.github.com/ariens/2c44c30e064b1790146a) receives a Kerberos
  principal and keytab when submitted.  The static main method currently
  performs a
  UserGroupInformation.loginUserFromKeytab(userPrincipal, userKeytab) call
  and authenticates to Hadoop.  This works on YARN (curiously, even without
  having to kinit first), but not on Mesos.  Is there a way to have the
  slaves running the tasks perform the same Kerberos login before they
  attempt to access HDFS?



 Putting aside the security of Spark/Mesos and how that keytab would get 
 distributed, I'm just looking for a working POC.



 Is there a way to leverage the Broadcast capability to send a function that 
 performs this?



 https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.broadcast.Broadcast



 Ideally, I'd love for this to not incur much overhead and just simply allow 
 me to work around the absent Kerberos support...



 Thanks,



 Dave




Re: Spark 1.3.1 On Mesos Issues.

2015-06-05 Thread Tim Chen
It seems like there is another thread going on:

http://answers.mapr.com/questions/163353/spark-from-apache-downloads-site-for-mapr.html

I'm not particularly sure why; it seems like the problem is that getting the
current context class loader is returning null in this instance.

Do you have some repro steps or config so we can try this?

Tim

On Fri, Jun 5, 2015 at 3:40 AM, Steve Loughran ste...@hortonworks.com
wrote:


  On 2 Jun 2015, at 00:14, Dean Wampler deanwamp...@gmail.com wrote:

  It would be nice to see the code for MapR FS Java API, but my google foo
 failed me (assuming it's open source)...


  I know that MapRFS is closed source, don't know about the java JAR. Why
 not ask Ted Dunning (cc'd)  nicely to see if he can track down the stack
 trace for you.

   So, shooting in the dark ;) there are a few things I would check, if
 you haven't already:

  1. Could there be 1.2 versions of some Spark jars that get picked up at
 run time (but apparently not in local mode) on one or more nodes? (Side
 question: Does your node experiment fail on all nodes?) Put another way,
 are the classpaths good for all JVM tasks?
 2. Can you use just MapR and Spark 1.3.1 successfully, bypassing Mesos?

  Incidentally, how are you combining Mesos and MapR? Are you running
 Spark in Mesos, but accessing data in MapR-FS?

  Perhaps the MapR shim library doesn't support Spark 1.3.1.

  HTH,

  dean

  Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com/
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Mon, Jun 1, 2015 at 2:49 PM, John Omernik j...@omernik.com wrote:

 All -

  I am facing and odd issue and I am not really sure where to go for
 support at this point.  I am running MapR which complicates things as it
 relates to Mesos, however this HAS worked in the past with no issues so I
 am stumped here.

  So for starters, here is what I am trying to run. This is a simple show
 tables using the Hive Context:

  from pyspark import SparkContext, SparkConf
 from pyspark.sql import SQLContext, Row, HiveContext
 sparkhc = HiveContext(sc)
  test = sparkhc.sql("show tables")
 for r in test.collect():
   print r

   When I run it on 1.3.1 using ./bin/pyspark --master local, this works
 with no issues.

  When I run it using Mesos with all the settings configured (as they had
 worked in the past) I get lost tasks and when I zoom in them, the error
 that is being reported is below.  Basically it's a NullPointerException on
 the com.mapr.fs.ShimLoader.  What's weird to me is is I took each instance
 and compared both together, the class path, everything is exactly the same.
 Yet running in local mode works, and running in mesos fails.  Also of note,
 when the task is scheduled to run on the same node as when I run locally,
 that fails too! (Baffling).

  Ok, for comparison, how I configured Mesos was to download the mapr4
 package from spark.apache.org.  Using the exact same configuration file
 (except for changing the executor tgz from 1.2.0 to 1.3.1) from the 1.2.0.
 When I run this example with the mapr4 for 1.2.0 there is no issue in
 Mesos, everything runs as intended. Using the same package for 1.3.1 then
 it fails.

  (Also of note, 1.2.1 gives a 404 error, 1.2.2 fails, and 1.3.0 fails as
 well).

  So basically, when I used 1.2.0 and followed a set of steps, it worked
 on Mesos, and 1.3.1 fails.  Since this is a current version of Spark, MapR
 supports 1.2.1 only.  (Still working on that).

  I guess I am at a loss right now on why this would be happening, any
 pointers on where I could look or what I could tweak would be greatly
 appreciated. Additionally, if there is something I could specifically draw
 to the attention of MapR on this problem please let me know, I am perplexed
 on the change from 1.2.0 to 1.3.1.

  Thank you,

  John




  Full Error on 1.3.1 on Mesos:
 15/05/19 09:31:26 INFO MemoryStore: MemoryStore started with capacity
 1060.3 MB java.lang.NullPointerException at
 com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:96) at
 com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:232) at
 com.mapr.fs.ShimLoader.load(ShimLoader.java:194) at
 org.apache.hadoop.conf.CoreDefaultProperties.(CoreDefaultProperties.java:60)
 at java.lang.Class.forName0(Native Method) at
 java.lang.Class.forName(Class.java:274) at
 org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1847)
 at
 org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2062)
 at
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2272)
 at
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2224)
 at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2141)
 at org.apache.hadoop.conf.Configuration.set(Configuration.java:992) at
 org.apache.hadoop.conf.Configuration.set(Configuration.java:966) at
 

Re: [Streaming] Configure executor logging on Mesos

2015-05-30 Thread Tim Chen
So it sounds like generic downloadable-URIs support, where Mesos
automatically places the files in your sandbox and you can then refer to
them, can solve this problem.

If so please file a JIRA; this is a pretty simple fix on the Spark side.
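
In the meantime, if the log4j.properties is already in the sandbox (for
example bundled into the tarball referenced by spark.executor.uri, as
mentioned further down this thread), pointing the executor JVM at it should
look roughly like this (the relative path is an assumption about where the
file ends up in the sandbox):

  --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:./log4j.properties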

Tim

On Sat, May 30, 2015 at 7:34 AM, andy petrella andy.petre...@gmail.com
wrote:

 Hello,

 I'm currently exploring DCOS for the spark notebook, and while looking at
 the spark configuration I found something interesting which is actually
 converging to what we've discovered:

 https://github.com/mesosphere/universe/blob/master/repo/packages/S/spark/0/marathon.json

 So the logging is working fine here because the spark package is using the
 spark-class which is able to configure the log4j file. But the interesting
 part comes with the fact that the `uris` parameter is filled in with a
 downloadable path to the log4j file!

 However, it's not possible when creating the Spark context ourselves and
 relying on the Mesos scheduler backend only, unless the spark.executor.uri
 (or another one) can take more than one downloadable path.

 my.2¢

 andy


 On Fri, May 29, 2015 at 5:09 PM Gerard Maas gerard.m...@gmail.com wrote:

 Hi Tim,

 Thanks for the info.   We (Andy Petrella and myself) have been diving a
 bit deeper into this log config:

 The log line I was referring to is this one (sorry, I provided the others
 just for context)

 *Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties*

 That line comes from Logging.scala [1] where a default config is loaded
 if none is found in the classpath upon the startup of the Spark Mesos
 executor in the Mesos sandbox. At that point in time, none of the
 application-specific resources have been shipped yet as the executor JVM is
 just starting up.   To load a custom configuration file we should have it
 already on the sandbox before the executor JVM starts and add it to the
 classpath on the startup command. Is that correct?

 For the classpath customization, it looks like it should be possible to
 pass a -Dlog4j.configuration  property by using the
 'spark.executor.extraClassPath' that will be picked up at [2] and that
 should be added to the command that starts the executor JVM, but the
 resource must be already on the host before we can do that. Therefore we
 also need some means of 'shipping' the log4j.configuration file to the
 allocated executor.

 This all boils down to your statement on the need of shipping extra files
 to the sandbox. Bottom line: It's currently not possible to specify a
 config file for your mesos executor. (ours grows several GB/day).

 The only workaround I found so far is to open up the Spark assembly,
 replace the log4j-default.properties and pack it up again.  That would
 work, although kind of rudimentary as we use the same assembly for many
 jobs.  Probably, accessing the log4j API programmatically should also work
 (I didn't try that yet)

 Should we open a JIRA for this functionality?

 -kr, Gerard.




 [1]
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L128
 [2]
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L77

 On Thu, May 28, 2015 at 7:50 PM, Tim Chen t...@mesosphere.io wrote:


 -- Forwarded message --
 From: Tim Chen t...@mesosphere.io
 Date: Thu, May 28, 2015 at 10:49 AM
 Subject: Re: [Streaming] Configure executor logging on Mesos
 To: Gerard Maas gerard.m...@gmail.com


 Hi Gerard,

 The log line you referred to is not Spark logging but Mesos' own logging,
 which uses glog.

 Our own executor logs should only contain very few lines though.

 Most of the log lines you'll see are from Spark, and they can be controlled
 by specifying a log4j.properties to be downloaded with your Mesos task.
 Alternatively, if you are downloading the Spark executor via spark.executor.uri,
 you can include log4j.properties in that tarball.

 I think we probably need some more configurations for the Spark scheduler to
 pick up extra files to be downloaded into the sandbox.

 Tim





 On Thu, May 28, 2015 at 6:46 AM, Gerard Maas gerard.m...@gmail.com
 wrote:

 Hi,

 I'm trying to control the verbosity of the logs on the Mesos executors
 with no luck so far. The default behaviour is INFO on stderr dump with an
 unbounded growth that gets too big at some point.

 I noticed that when the executor is instantiated, it locates a default
 log configuration in the spark assembly:

 I0528 13:36:22.958067 26890 exec.cpp:206] Executor registered on slave
 20150528-063307-780930314-5050-8152-S5
 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath
 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties

 So, nothing I provide in my job jar files (I also tried
 spark.executor.extraClassPath=log4j.properties) takes effect in the
 executor's configuration.

 How should I configure the log

Fwd: [Streaming] Configure executor logging on Mesos

2015-05-28 Thread Tim Chen
-- Forwarded message --
From: Tim Chen t...@mesosphere.io
Date: Thu, May 28, 2015 at 10:49 AM
Subject: Re: [Streaming] Configure executor logging on Mesos
To: Gerard Maas gerard.m...@gmail.com


Hi Gerard,

The log line you referred to is not Spark logging but Mesos' own logging,
which uses glog.

Our own executor logs should only contain very few lines though.

Most of the log lines you'll see are from Spark, and they can be controlled by
specifying a log4j.properties to be downloaded with your Mesos task.
Alternatively, if you are downloading the Spark executor via spark.executor.uri,
you can include log4j.properties in that tarball.

I think we probably need some more configurations for the Spark scheduler to
pick up extra files to be downloaded into the sandbox.

Tim





On Thu, May 28, 2015 at 6:46 AM, Gerard Maas gerard.m...@gmail.com wrote:

 Hi,

 I'm trying to control the verbosity of the logs on the Mesos executors
 with no luck so far. The default behaviour is INFO on stderr dump with an
 unbounded growth that gets too big at some point.

 I noticed that when the executor is instantiated, it locates a default log
 configuration in the spark assembly:

 I0528 13:36:22.958067 26890 exec.cpp:206] Executor registered on slave
 20150528-063307-780930314-5050-8152-S5
 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath
 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties

 So, nothing I provide in my job jar files (I also tried
 spark.executor.extraClassPath=log4j.properties) takes effect in the
 executor's configuration.

 How should I configure the log on the executors?

 thanks, Gerard.



Re: Mesos Spark Tasks - Lost

2015-05-20 Thread Tim Chen
Can you share your exact spark-submit command line?

And also cluster mode is not released yet (coming in 1.4) and doesn't support
spark-shell, so I think you're just using client mode unless you're using the
latest master.

Tim

On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis panga...@gmail.com
wrote:

 Hello all,

 I am facing a weird issue for the last couple of days running Spark on top
 of Mesos and I need your help. I am running Mesos in a private cluster and
 managed to successfully deploy HDFS, Cassandra, Marathon and Play, but
 Spark is not working for some reason. So far I have tried:
 different java versions (1.6 and 1.7 oracle and openjdk), different
 spark-env configuration, different Spark versions (from 0.8.8 to 1.3.1),
 different HDFS versions (hadoop 5.1 and 4.6), and updating pom dependencies.

 More specifically while local tasks complete fine, in cluster mode all the
 tasks get lost.
 (both using spark-shell and spark-submit)
 From the worker log I see something like this:

 ---
 I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
 'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
 I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
 'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
 Client
 I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
 'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
 '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
 I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
 '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
 into
 '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
 *Error: Could not find or load main class two*

 ---

 And from the Spark Terminal:

 ---
 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
 SparkPi.scala:35
 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
 SparkPi.scala:35
 Exception in thread main org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
 failure: Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
 (executor lost)
 Driver stacktrace: at
 org.apache.spark.scheduler.DAGScheduler.org
 http://org.apache.spark.scheduler.dagscheduler.org/$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)atorg.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 ..
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 ---

 Any help will be greatly appreciated!

 Regards,
 Panagiotis



Re: Spark on Mesos vs Yarn

2015-05-15 Thread Tim Chen
Hi Ankur,

This is a great question as I've heard similar concerns about Spark on
Mesos.

When I started to contribute to Spark on Mesos approximately half a year
ago, the Mesos scheduler and related code hadn't really gotten much attention
from anyone and were pretty much in maintenance mode.

As a Mesos PMC member who is really interested in Spark, I started to
refactor and check out different JIRAs and PRs around the Mesos scheduler,
and after that started to fix various bugs in Spark, add documentation, and
also fix related Mesos issues.

Just recently for 1.4 we've merged in Cluster mode and Docker support, and
there are also pending PRs around framework authentication, multi-role
support, dynamic allocation, more finely tuned coarse-grained mode scheduling
configurations, etc.

And finally I just want to mention that Mesosphere and Typesafe are
collaborating to bring a certified distribution (
https://databricks.com/spark/certification/certified-spark-distribution) of
Spark on Mesos and DCOS, and we will be pouring resources into not just
maintaining Spark on Mesos but driving more features into the Mesos
scheduler, and also into Mesos itself so stateful services can leverage new
APIs and features to make better scheduling decisions and optimizations.

I don't have a solidified roadmap to share yet, but we will be discussing
this and hopefully can share with the community soon.

In summary, Spark on Mesos is not dead or in maintenance mode, and you can
look forward to a lot more changes from us and the community.

Tim

On Thu, May 14, 2015 at 11:30 PM, Ankur Chauhan an...@malloc64.com wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

 This is both a survey type as well as a roadmap query question. It
 seems like of the cluster options to run spark (i.e. via YARN and
 Mesos), YARN seems to be getting a lot more attention and patches when
 compared to Mesos.

 Would it be correct to assume that Spark on Mesos is more or less dead, or
 something like a maintenance-only feature, and YARN is the recommended way
 to go?

 What is the roadmap for spark on mesos? and what is the roadmap for
 spark on yarn. I like mesos so as much as I would like to see it
 thrive I don't think spark community is active (or maybe it just
 appears that way).

 Another more community oriented question: what do most people use to
 run spark in production or more-than-POC products? Why did you make
 that decision?

 There was a similar post form early 2014 where Metei answered that
 mesos and yarn were equally important, but has this changed as spark
 has now reached almost 1.4.0 stage?

 - -- Ankur Chauhan
 -BEGIN PGP SIGNATURE-

 iQEcBAEBAgAGBQJVVZKGAAoJEOSJAMhvLp3L0vEIAI4edLB2rMGk+OTI4WujxX6k
 Ud5NyFUpaQ8WDjOhwcWB9RK5EoM7X3wGzRcGza1HLVnvdSUBG8Ltabt47GsP2lo0
 7H9y2GluUZg/RJXbN0Ehp6moWjAU1W/55POD3t87qeUdydUJVbgDYA/KovNa6i8s
 Z/e8mfvOrFSJyuJi8KW2KcfOmB1i8VZH7b/zZqtfJKNGo/0dac/gez19vVPaXPa4
 WNUN8dHcp0yiZnZ0PUTYNLhI58BXBCSmkEl2Ex7X3NBUGUgJ5HGHn6dpqqNhGvf3
 yPw0B0q93NcExK/E4/I75nn4vh5wKLPLWT8U5btphmc7S6h8gWFMEJRHQCdtaUk=
 =uYXZ
 -END PGP SIGNATURE-

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark on Mesos

2015-05-13 Thread Tim Chen
Hi Stephen,

You probably didn't run the Spark driver/shell as root: the Mesos scheduler
will pick up your local user and try to impersonate the same user and
chown the directory before executing any task.

Running the Spark driver as root should resolve the problem. Disabling
switch_user can also work, as it then won't try to switch user for you.
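
The flag lives on the slave side; disabling it looks roughly like this (the
ZooKeeper URL is a placeholder), which matches what Stephen describes below:

  # Run tasks as the slave's own user instead of the submitting user:
  mesos-slave --master=zk://zk1:2181/mesos --no-switch_user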

Tim

On Wed, May 13, 2015 at 10:50 AM, Stephen Carman scar...@coldlight.com
wrote:

 Sander,

 I eventually solved this problem via the --[no-]switch_user flag, which is
 set to true by default. I set this to false, which would have the user that
 owns the process run the job, otherwise it was my username (scarman)
 running the job, which would fail because obviously my username didn’t
 exist there. When ran as root, it ran totally fine with no problems what so
 ever.

 Hopefully this works for you too,

 Steve
  On May 13, 2015, at 11:45 AM, Sander van Dijk sgvand...@gmail.com
 wrote:
 
  Hey all,
 
  I seem to be experiencing the same thing as Stephen. I run Spark 1.2.1
 with Mesos 0.22.1, with Spark coming from the spark-1.2.1-bin-hadoop2.4.tgz
 prebuilt package, and Mesos installed from the Mesosphere repositories. I
 have been running with Spark standalone successfully for a while and now
 trying to setup Mesos. Mesos is up and running, the UI at port 5050 reports
 all slaves alive. I then run Spark shell with: `spark-shell --master
 mesos://1.1.1.1:5050` (with 1.1.1.1 the master's ip address), which
 starts up fine, with output:
 
  I0513 15:02:45.340287 28804 sched.cpp:448] Framework registered with
 20150512-150459-2618695596-5050-3956-0009 15/05/13 15:02:45 INFO
 mesos.MesosSchedulerBackend: Registered as framework ID
 20150512-150459-2618695596-5050-3956-0009
 
  and the framework shows up in the Mesos UI. Then when trying to run
 something (e.g. 'val rdd = sc.txtFile(path); rdd.count') fails with lost
 executors. In /var/log/mesos-slave.ERROR on the slave instances there are
 entries like:
 
  E0513 14:57:01.198995 13077 slave.cpp:3112] Container
 'eaf33d36-dde5-498a-9ef1-70138810a38c' for executor
 '20150512-145720-2618695596-5050-3082-S10' of framework
 '20150512-150459-2618695596-5050-3956-0009' failed to start: Failed to
 execute mesos-fetcher: Failed to chown work directory
 
  From what I can find, the work directory is in /tmp/mesos, where indeed
 I see a directory structure with executor and framework IDs, with at the
 leaves stdout and stderr files of size 0. Everything there is owned by
 root, but I assume the processes are also run by root, so any chowning in
 there should be possible.
 
  I was thinking maybe it fails to fetch the Spark package executor? I
 uploaded spark-1.2.1-bin-hadoop2.4.tgz to hdfs, SPARK_EXECUTOR_URI is set
 in spark-env.sh, and in the Environment section of the web UI I see this
  picked up in the spark.executor.uri parameter. I checked and the URI is
 reachable by the slaves: an `hdfs dfs -stat $SPARK_EXECUTOR_URI` is
 successful.
 
  Any pointers?
 
  Many thanks,
  Sander
 
  On Fri, May 1, 2015 at 8:35 AM Tim Chen t...@mesosphere.io wrote:
  Hi Stephen,
 
  It looks like Mesos slave was most likely not able to launch some mesos
 helper processes (fetcher probably?).
 
  How did you install Mesos? Did you build from source yourself?
 
  Please install Mesos through a package, or if building from source actually
 run make install and run from the installed binary.
 
  Tim
 
  On Mon, Apr 27, 2015 at 11:11 AM, Stephen Carman scar...@coldlight.com
 wrote:
  So I installed Spark 1.3.1 built with Hadoop 2.6 on each of the slaves; I
 just basically got the pre-built package from the Spark website…
 
  I placed those compiled spark installs on each slave at /opt/spark
 
  My spark properties seem to be getting picked up on my side fine…
 
  Screen Shot 2015-04-27 at 10.30.01 AM.png
  The framework is registered in Mesos, it shows up just fine, it doesn’t
 matter if I turn off the executor uri or not, but I always get the same
 error…
 
  org.apache.spark.SparkException: Job aborted due to stage failure: Task
 6 in stage 0.0 failed 4 times, most recent failure: Lost task 6.3 in stage
 0.0 (TID 23, 10.253.1.117): ExecutorLostFailure (executor
 20150424-104711-1375862026-5050-20113-S1 lost)
  Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
  at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
  at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
  at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
  at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
  at
 org.apache.spark.scheduler.DAGScheduler$$anonfun

Re: Spark on Mesos

2015-05-01 Thread Tim Chen
Hi Stephen,

It looks like Mesos slave was most likely not able to launch some mesos
helper processes (fetcher probably?).

How did you install Mesos? Did you build from source yourself?

Please install Mesos through a package, or if building from source actually
run make install and run from the installed binary.

Tim

On Mon, Apr 27, 2015 at 11:11 AM, Stephen Carman scar...@coldlight.com
wrote:

  So I installed Spark 1.3.1 built with Hadoop 2.6 on each of the slaves; I
 just basically got the pre-built package from the Spark website…

  I placed those compiled spark installs on each slave at /opt/spark

  My spark properties seem to be getting picked up on my side fine…

  The framework is registered in Mesos, it shows up just fine, it doesn’t
 matter if I turn off the executor uri or not, but I always get the same
 error…

  org.apache.spark.SparkException: Job aborted due to stage failure: Task
 6 in stage 0.0 failed 4 times, most recent failure: Lost task 6.3 in stage
 0.0 (TID 23, 10.253.1.117): ExecutorLostFailure (executor
 20150424-104711-1375862026-5050-20113-S1 lost)
 Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at scala.Option.foreach(Option.scala:236)
 at
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
 at
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
 at
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

  These boxes are totally open to one another so they shouldn’t have any
 firewall issues, everything seems to show up in mesos and spark just fine,
 but actually running stuff totally blows up.

  There is nothing in the stderr or stdout, it downloads the package and
 untars it but doesn’t seem to do much after that. Any insights?

  Steve


  On Apr 24, 2015, at 5:50 PM, Yang Lei genia...@gmail.com wrote:

 SPARK_PUBLIC_DNS, SPARK_LOCAL_IP, SPARK_LOCAL_HOST





Re: Spark on Mesos

2015-04-24 Thread Tim Chen
Hi Stephen,

Sometimes it's just something simple that's missing, like a user name
problem or a file dependency, etc.

Can you share what's in the stdout/stderr in your task sandbox directory
(available via Mesos UI, clicking on the task and sandbox)?

It would also be super helpful if you could look at the slave log on the node
that ran one of your failed tasks, find the entries where it reported
TASK_FAILED or TASK_LOST for that task, and share them; they should say why
the Mesos slave couldn't run the task.
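
A rough way to do that (the log location varies by install; /var/log/mesos is
just a common packaged default):

grep -E 'TASK_FAILED|TASK_LOST' /var/log/mesos/mesos-slave.*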

Thanks,

Tim



On Fri, Apr 24, 2015 at 2:15 PM, Stephen Carman scar...@coldlight.com
wrote:

 So I can’t for the life of me get something even simple working for
 Spark on Mesos.

 I installed a 3 master, 3 slave mesos cluster, which is all configured,
 but I can’t for the life of me even get the spark shell to work properly.

 I get errors like this
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 5
 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage
 0.0 (TID 23, 10.253.1.117): ExecutorLostFailure (executor
 20150424-104711-1375862026-5050-20113-S1 lost)
 Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
 at scala.Option.foreach(Option.scala:236)
 at
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
 at
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
 at
 org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

 I tried both mesos 0.21 and 0.22 and they both produce the same error…

  My version of Spark is 1.3.1 with Hadoop 2.6. I just downloaded the
 pre-built package from the site, or is that wrong and do I have to build it myself?

  I have my MESOS_NATIVE_JAVA_LIBRARY, Spark executor URI and Mesos master
 set in my spark-env.sh; to the best of my ability they seem correct.

  Does anyone have any insight into this at all? I’m running this on Red Hat
 7 with 8 CPU cores and 14GB of RAM per slave, so 24 cores total and 42GB of
 RAM total.

 Anyone have any idea at all what is going on here?

 Thanks,
 Steve

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread Tim Chen
Linux OOM throws SIGTERM, but if I remember correctly JVM handles heap
memory limits differently and throws OutOfMemoryError and eventually sends
SIGINT.

Not sure what happened but the worker simply received a SIGTERM signal, so
perhaps the daemon was terminated by someone or a parent process. Just my
guess.

Tim

On Mon, Apr 13, 2015 at 2:28 AM, Guillaume Pitel guillaume.pi...@exensa.com
 wrote:

  Very likely to be this :

 http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=2

  Your worker ran out of memory => maybe you're asking for too much memory
  for the JVM, or something else is running on the worker
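
  A quick way to confirm on the worker host (plain Linux, nothing
  Spark-specific):

  dmesg | egrep -i 'killed process|out of memory'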

 Guillaume

  Any idea what this means, many thanks

  ==
 logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1
 ==
 15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4
 cores, 6.6 GB RAM
 15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
 15/04/13 07:07:22 INFO Worker: Spark home:
 /remote/users//work/tools/spark-1.3.0-bin-hadoop2.4
 15/04/13 07:07:22 INFO Server: jetty-8.y.z-SNAPSHOT
 15/04/13 07:07:22 INFO AbstractConnector: Started
 SelectChannelConnector@0.0.0.0:8081
 15/04/13 07:07:22 INFO Utils: Successfully started service 'WorkerUI' on
 port 8081.
 15/04/13 07:07:22 INFO WorkerWebUI: Started WorkerWebUI at
 http://09:8081
 15/04/13 07:07:22 INFO Worker: Connecting to master
 akka.tcp://sparkMaster@nceuhamnr08:7077/user/Master...
 15/04/13 07:07:22 INFO Worker: Successfully registered with master
 spark://08:7077
 *15/04/13 08:35:07 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM*



 --
[image: eXenSa]
  *Guillaume PITEL, Président*
 +33(0)626 222 431

 eXenSa S.A.S. http://www.exensa.com/
  41, rue Périer - 92120 Montrouge - FRANCE
 Tel +33(0)184 163 677 / Fax +33(0)972 283 705



Re: Spark on Mesos / Executor Memory

2015-04-11 Thread Tim Chen
(Adding spark user list)

Hi Tom,

If I understand correctly, you're saying that you're running into memory
problems because the scheduler is allocating too many CPUs and not enough
memory to accommodate them, right?

In the case of fine grain mode I don't think that's a problem since we have
a fixed amount of CPU and memory per task.
However, in coarse grain you can run into that problem if you're within
the spark.cores.max limit, and memory is a fixed number.

I have a patch out to configure the maximum number of CPUs a coarse grain
executor should use, and it also allows multiple executors in coarse grain mode.
So you could, say, launch multiple executors of at most 4 cores each, each with
spark.executor.memory (+ overhead, etc.) on a slave. (
https://github.com/apache/spark/pull/4027)
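
For illustration, a coarse grain submit along those lines (master address,
class and sizes are placeholders; without the patch above you still get one
executor per slave, capped by spark.cores.max in total):

spark-submit --master mesos://zk://zk1:2181/mesos \
  --conf spark.mesos.coarse=true \
  --conf spark.cores.max=16 \
  --conf spark.executor.memory=4g \
  --class com.example.MyJob my-job.jar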

It also might be interesting to include a cores-to-memory multiplier so
that with a larger number of cores we try to scale the memory by some
factor, but I'm not entirely sure that's intuitive to use or that people
would know what to set it to, as the right value can likely change with
different workloads.

Tim







On Sat, Apr 11, 2015 at 9:51 AM, Tom Arnfeld t...@duedil.com wrote:

 We're running Spark 1.3.0 (with a couple of patches over the top for
 docker related bits).

 I don't think SPARK-4158 is related to what we're seeing; things do run
 fine on the cluster, given a ridiculously large executor memory
 configuration. As for SPARK-3535, although that looks useful I think we're
 seeing something else.

 Put a different way, the amount of memory required at any given time by
 the spark JVM process is directly proportional to the amount of CPU it has,
 because more CPU means more tasks and more tasks means more memory. Even if
 we're using coarse mode, the amount of executor memory should be
 proportionate to the amount of CPUs in the offer.

 On 11 April 2015 at 17:39, Brenden Matthews bren...@diddyinc.com wrote:

 I ran into some issues with it a while ago, and submitted a couple PRs to
 fix it:

 https://github.com/apache/spark/pull/2401
 https://github.com/apache/spark/pull/3024

 Do these look relevant? What version of Spark are you running?

 On Sat, Apr 11, 2015 at 9:33 AM, Tom Arnfeld t...@duedil.com wrote:

 Hey,

 Not sure whether it's best to ask this on the spark mailing list or the
 mesos one, so I'll try here first :-)

 I'm having a bit of trouble with out of memory errors in my spark
 jobs... it seems fairly odd to me that memory resources can only be set at
 the executor level, and not also at the task level. For example, as far as
 I can tell there's only a *spark.executor.memory* config option.

 Surely the memory requirements of a single executor are quite
 dramatically influenced by the number of concurrent tasks running? Given a
 shared cluster, I have no idea what % of an individual slave my executor is
 going to get, so I basically have to set the executor memory to a value
 that's correct when the whole machine is in use...

 Has anyone else running Spark on Mesos come across this, or maybe
 someone could correct my understanding of the config options?

 Thanks!

 Tom.






Re: spark mesos deployment : starting workers based on attributes

2015-04-03 Thread Tim Chen
Hi Ankur,

There isn't a way to do that yet, but it's simple to add.
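
(For context, the attributes themselves are set on the slave side with a
Mesos flag; the master address is a placeholder and the attribute is just the
example from your mail:

mesos-slave --master=zk://zk1:2181/mesos --attributes='tachyon:true'

What's missing today is a Spark-side setting that filters offers on those
attributes.)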

Can you create a JIRA in Spark for this?

Thanks!

Tim

On Fri, Apr 3, 2015 at 1:08 PM, Ankur Chauhan achau...@brightcove.com
wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

 I am trying to figure out if there is a way to tell the mesos
 scheduler in spark to isolate the workers to a set of mesos slaves
 that have a given attribute such as `tachyon:true`.

 Anyone knows if that is possible or how I could achieve such a behavior.

 Thanks!
 - -- Ankur Chauhan
 -BEGIN PGP SIGNATURE-

 iQEcBAEBAgAGBQJVHvMlAAoJEOSJAMhvLp3LaV0H/jtX+KQDyorUESLIKIxFV9KM
 QjyPtVquwuZYcwLqCfQbo62RgE/LeTjjxzifTzMM5D6cf4ULBH1TcS3Is2EdOhSm
 UTMfJyvK06VFvYMLiGjqN4sBG3DFdamQif18qUJoKXX/Z9cUQO9SaSjIezSq2gd8
 0lM3NLEQjsXY5uRJyl9GYDxcFsXPVzt1crXAdrtVsIYAlFmhcrm1n/5+Peix89Oh
 vgK1J7e0ei7Rc4/3BR2xr8f9us+Jfqym/xe+45h1YYZxZWrteCa48NOGixuUJjJe
 zb1MxNrTFZhPrKFT7pz9kCUZXl7DW5hzoQCH07CXZZI3B7kFS+5rjuEIB9qZXPE=
 =cadl
 -END PGP SIGNATURE-

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Mesos coarse mode not working (fine grained does)

2015-02-08 Thread Tim Chen
Hi there,

It looks like launching the executor (or one of the helper processes, like
the fetcher that fetches the URIs) was failing because of the dependency
problem you see. Your mesos-slave shouldn't be able to run either, though;
were you running a 0.20.0 slave and upgraded to 0.21.0? We introduced the
dependencies on libapr and libsvn in Mesos 0.21.0.
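
If it's really just the missing shared libraries, installing them on the
slaves should be enough; package names vary by distro, but on RHEL/CentOS it
would be roughly (an assumption, not verified):

sudo yum install -y apr apr-util subversion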

What's the stdout for the task like?

Tim




On Mon, Feb 9, 2015 at 4:10 AM, Hans van den Bogert hansbog...@gmail.com
wrote:

 I wasn’t thorough, the complete stderr includes:

 g++: /usr/lib64/libaprutil-1.so: No such file or directory
 g++: /usr/lib64/libapr-1.so: No such file or directoryn
 (including that trailing ’n')

 Though I can’t figure out how the process chain goes from the frontend
 Spark application to the Mesos executors, or where this shared-library
 error comes from.

 Hope someone can shed some light,

 Thanks

 On 08 Feb 2015, at 14:15, Hans van den Bogert hansbog...@gmail.com
 wrote:

  Hi,
 
 
  I’m trying to get coarse mode to work under Mesos (0.21.0); I thought
 this would be a trivial change as Mesos was working well in fine-grained
 mode.
 
  However the mesos tasks fail, I can’t pinpoint where things go wrong.
 
  This is a mesos stderr log from a slave:
 
 Fetching URI 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz'
 I0208 12:57:45.415575 25720 fetcher.cpp:126] Downloading '
 http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz' to
 '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
 I0208 12:58:09.146960 25720 fetcher.cpp:64] Extracted resource
 '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
 into
 '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151’
 
  Mesos slaves' stdout are empty.
 
 
  And I can confirm the spark distro is correctly extracted:
 $ ls
 spark-1.2.0-bin-hadoop2.4  spark-1.2.0-bin-hadoop2.4.tgz  stderr
 stdout
 
  The spark-submit log is here:
  http://pastebin.com/ms3uZ2BK
 
  Mesos-master
  http://pastebin.com/QH2Vn1jX
 
  Mesos-slave
  http://pastebin.com/DXFYemix
 
 
  Can somebody point me to logs, etc. to further investigate this? I’m
 feeling kind of blind.
  Furthermore, do the executors on Mesos inherit all configs from the
 Spark application/submit? E.g. I’ve given my executors 20GB of memory
 through a spark-submit "--conf" parameter. Should these settings also be
 present in the spark-1.2.0-bin-hadoop2.4.tgz distribution’s configs?
 
  If, in order to be helped here, I need to present more logs etc, please
 let me know.
 
  Regards,
 
  Hans van den Bogert


 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark (Streaming?) holding on to Mesos resources

2015-01-27 Thread Tim Chen
Hi Gerard,

As others have mentioned, I believe you're hitting MESOS-1688. Can you
upgrade to the latest Mesos release (0.21.1) and let us know if it resolves
your problem?

Thanks,

Tim

On Tue, Jan 27, 2015 at 10:39 AM, Sam Bessalah samkiller@gmail.com
wrote:

 Hi Gerard,
 isn't this the same issue as this?
 https://issues.apache.org/jira/browse/MESOS-1688

 On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas gerard.m...@gmail.com
 wrote:

 Hi,

 We are observing with certain regularity that our Spark jobs, as Mesos
 frameworks, are hoarding resources and not releasing them, resulting in
 resource starvation for all jobs running on the Mesos cluster.

 For example:
 This is a job that has spark.cores.max = 4 and spark.executor.memory=3g

 ID                   Framework     Host                  CPUs  Mem
 …5050-16506-1146497  FooStreaming  dnode-4.hdfs.private  7     13.4 GB
 …5050-16506-1146495  FooStreaming  dnode-0.hdfs.private  1     6.4 GB
 …5050-16506-1146491  FooStreaming  dnode-5.hdfs.private  7     11.9 GB
 …5050-16506-1146449  FooStreaming  dnode-3.hdfs.private  7     4.9 GB
 …5050-16506-1146247  FooStreaming  dnode-1.hdfs.private  0.5   5.9 GB
 …5050-16506-1146226  FooStreaming  dnode-2.hdfs.private  3     7.9 GB
 …5050-16506-1144069  FooStreaming  dnode-3.hdfs.private  1     8.7 GB
 …5050-16506-1133091  FooStreaming  dnode-5.hdfs.private  1     1.7 GB
 …5050-16506-1133090  FooStreaming  dnode-2.hdfs.private  5     5.2 GB
 …5050-16506-1133089  FooStreaming  dnode-1.hdfs.private  6.5   6.3 GB
 …5050-16506-1133088  FooStreaming  dnode-4.hdfs.private  1     251 MB
 …5050-16506-1133087  FooStreaming  dnode-0.hdfs.private  6.4   6.8 GB
 The only way to release the resources is by manually finding the process
 in the cluster and killing it. The jobs are often streaming but also batch
 jobs show this behavior. We have more streaming jobs than batch, so stats
 are biased.
 Any ideas of what's up here? Hopefully some very bad ugly bug that has
 been fixed already and that will urge us to upgrade our infra?

 Mesos 0.20 +  Marathon 0.7.4 + Spark 1.1.0

 -kr, Gerard.





Re: dockerized spark executor on mesos?

2015-01-15 Thread Tim Chen
Just throwing this out here: there is an existing PR to add Docker support to
the Spark framework so it can launch executors with a Docker image.

https://github.com/apache/spark/pull/3074

Hopefully this will be merged sometime.

Tim

On Thu, Jan 15, 2015 at 9:18 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 The AMPLab maintains a bunch of Docker files for Spark here:
 https://github.com/amplab/docker-scripts

 Hasn't been updated since 1.0.0, but might be a good starting point.

 On Wed Jan 14 2015 at 12:14:13 PM Josh J joshjd...@gmail.com wrote:

 We have dockerized Spark Master and worker(s) separately and are using it
 in
 our dev environment.


 Is this setup available on github or dockerhub?

 On Tue, Dec 9, 2014 at 3:50 PM, Venkat Subramanian vsubr...@gmail.com
 wrote:

 We have dockerized Spark Master and worker(s) separately and are using
 it in
 our dev environment. We don't use Mesos though, running it in Standalone
 mode, but adding Mesos should not be that difficult I think.

 Regards

 Venkat



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/dockerized-spark-executor-on-mesos-tp20276p20603.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: Spark Framework handling of Mesos master change

2015-01-12 Thread Tim Chen
Hi Ethan,

How are you specifying the master to Spark?

Recovering from a master failover is already handled by the underlying
Mesos scheduler, but you have to use ZooKeeper instead of directly passing
in the master URIs.
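
For example (placeholder hosts, but this is the documented form):

--master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

rather than --master mesos://host:5050, so the driver can follow the leading
master through ZooKeeper.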

Tim

On Mon, Jan 12, 2015 at 12:44 PM, Ethan Wolf ethan.w...@alum.mit.edu
wrote:

 We are running Spark and Spark Streaming on Mesos (with multiple masters
 for
 HA).
 At launch, our Spark jobs successfully look up the current Mesos master
 from
 zookeeper and spawn tasks.

 However, when the Mesos master changes while the spark job is executing,
 the
 spark driver seems to interact with the old Mesos master, and therefore
 fails to launch any new tasks.
 We are running long running Spark streaming jobs, so we have temporarily
 switched to coarse grained as a work around, but it prevents us from
 running
 in fine grained mode which we would prefer for some job.

 Looking at the code for MesosSchedulerBackend, it has an empty
 implementation of the reregistered (and disconnected) methods, which I
 believe would be called when the master changes:

 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L202


 http://mesos.apache.org/documentation/latest/app-framework-development-guide/

 Are there any plans to implement master reregistration in the Spark
 framework, or does anyone have any suggested workarounds for long running
 jobs to deal with the mesos master changing?  (Or is there something we are
 doing wrong?)

 Thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Framework-handling-of-Mesos-master-change-tp21107.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Data locality running Spark on Mesos

2015-01-08 Thread Tim Chen
How did you run this benchmark, and is there a open version I can try it
with?

And what are your configurations, like spark.locality.wait, etc.?
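
Concretely, the knobs I mean are along these lines (the values shown are just
the documented defaults, in milliseconds):

spark.locality.wait=3000
spark.locality.wait.node=3000
spark.locality.wait.process=3000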

Tim

On Thu, Jan 8, 2015 at 11:44 AM, mvle m...@us.ibm.com wrote:

 Hi,

 I've noticed running Spark apps on Mesos is significantly slower compared
 to
 stand-alone or Spark on YARN.
 I don't think it should be the case, so I am posting the problem here in
 case someone has some explanation
 or can point me to some configuration options I've missed.

 I'm running the LinearRegression benchmark with a dataset of 48.8GB.
 On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM),
 I can finish the workload in about 5min (I don't remember exactly).
 The data is loaded into HDFS spanning the same 10-node cluster.
 There are 6 worker instances per node.

 However, when running the same workload on the same cluster but now with
 Spark on Mesos (coarse-grained mode), the execution time is somewhere around
 15min. Actually, I tried fine-grained mode and giving each Mesos node 6
 VCPUs (to hopefully get 6 executors like the stand-alone test), and I still
 get roughly 15min.

 I've noticed that when Spark is running on Mesos, almost all tasks execute
 with locality NODE_LOCAL (even in Mesos in coarse-grained mode). On
 stand-alone, the locality is mostly PROCESS_LOCAL.

 I think this locality issue might be the reason for the slow down but I
 can't figure out why, especially for coarse-grained mode as the executors
 supposedly do not go away until job completion.

 Any ideas?

 Thanks,
 Mike



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Can spark supports task level resource management?

2015-01-07 Thread Tim Chen
Hi Xuelin,

I can only speak about Mesos mode. There are two modes of management in
Spark's Mesos scheduler, which are fine-grain mode and coarse-grain mode.

In fine grain mode, each spark task launches one or more spark executors
that only live through the life time of the task. So it's comparable to
what you spoke about.

In coarse grain mode it's going to support dynamic allocation of executors,
but that happens at a higher level than tasks.
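
Concretely, the mode is toggled per job (fine grain is the default here; the
cores value is just an example):

--conf spark.mesos.coarse=false                           # fine grain: one Mesos task per Spark task
--conf spark.mesos.coarse=true --conf spark.cores.max=20  # coarse grain: long-lived executors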

As for resource management recommendation, I think it's important to see
what other applications you want to be running besides Spark in the same
cluster and also your use cases, to see what resource management fits your
need.

Tim


On Wed, Jan 7, 2015 at 10:55 PM, Xuelin Cao xuelincao2...@gmail.com wrote:


 Hi,

  Currently, we are building up a mid-scale Spark cluster (100
 nodes) in our company. One thing bothering us is how Spark manages the
 resources (CPU, memory).

  I know there are 3 resource management modes: stand-alone, Mesos, YARN

  In the stand-alone mode, the cluster master simply allocates the
 resource when the application is launched. In this mode, suppose an
 engineer launches a spark-shell, claiming 100 CPU cores and 100G memory,
 but doing nothing. But the cluster master simply allocates the resource to
 this app even if the spark-shell does nothing. This is definitely not what
 we want.

  What we want is, the resource is allocated when the actual task is
 about to run. For example, in the map stage, the app may need 100 cores
 because the RDD has 100 partitions, while in the reduce stage, only 20
 cores is needed because the RDD is shuffled into 20 partitions.

  I'm not very clear about the granularity of the Spark resource
 management. In the stand-alone mode, the resource is allocated when the app
 is launched. What about Mesos and Yarn? Can they support task level
 resource management?

  And, what is the recommended mode for resource management? (Mesos?
 Yarn?)

  Thanks





Re: Can spark supports task level resource management?

2015-01-07 Thread Tim Chen
In coarse grain mode, the spark executors are launched and kept running
while the scheduler is running. So if you have a spark shell launched and
kept open, the executors keep running and won't finish until the shell
is exited.

In fine grain mode, the overhead time mostly comes from downloading the
spark tar (if it's not already deployed in the slaves) and launching the
spark executor. I suggest you try it out and look at the latency to see if
it fits your use case or not.

Tim

On Wed, Jan 7, 2015 at 11:19 PM, Xuelin Cao xuelincao2...@gmail.com wrote:


 Hi,

  Thanks for the information.

  One more thing I want to clarify, when does Mesos or Yarn allocate
 and release the resource? Aka, what is the resource life time?

  For example, in the stand-along mode, the resource is allocated when
 the application is launched, resource released when the application
 finishes.

  Then, it looks like, in the Mesos fine-grain mode, the resource is
 allocated when the task is about to run; and released when the task
 finishes.

  How about Mesos coarse-grain mode and Yarn mode?  Is the resource
 managed on the Job level? Aka, the resource life time equals the job life
 time? Or on the stage level?

  One more question for the Mesos fine-grain mode. How is the overhead
 of resource allocation and release? In MapReduce, a noticeable amount of time
 is spent waiting for resource allocation. What about the Mesos fine-grain mode?



 On Thu, Jan 8, 2015 at 3:07 PM, Tim Chen t...@mesosphere.io wrote:

 Hi Xuelin,

 I can only speak about Mesos mode. There are two modes of management in
 Spark's Mesos scheduler, which are fine-grain mode and coarse-grain mode.

 In fine grain mode, each spark task launches one or more spark executors
 that only live through the life time of the task. So it's comparable to
 what you spoke about.

 In coarse grain mode it's going to support dynamic allocation of
 executors but that's being at a higher level than tasks.

 As for resource management recommendation, I think it's important to see
 what other applications you want to be running besides Spark in the same
 cluster and also your use cases, to see what resource management fits your
 need.

 Tim


 On Wed, Jan 7, 2015 at 10:55 PM, Xuelin Cao xuelincao2...@gmail.com
 wrote:


 Hi,

  Currently, we are building up a mid-scale Spark cluster (100
 nodes) in our company. One thing bothering us is how Spark manages the
 resources (CPU, memory).

  I know there are 3 resource management modes: stand-alone, Mesos,
 YARN

  In the stand-alone mode, the cluster master simply allocates the
 resource when the application is launched. In this mode, suppose an
 engineer launches a spark-shell, claiming 100 CPU cores and 100G memory,
 but doing nothing. But the cluster master simply allocates the resource to
 this app even if the spark-shell does nothing. This is definitely not what
 we want.

  What we want is, the resource is allocated when the actual task is
 about to run. For example, in the map stage, the app may need 100 cores
 because the RDD has 100 partitions, while in the reduce stage, only 20
 cores is needed because the RDD is shuffled into 20 partitions.

  I'm not very clear about the granularity of the Spark resource
 management. In the stand-alone mode, the resource is allocated when the app
 is launched. What about Mesos and Yarn? Can they support task level
 resource management?

  And, what is the recommended mode for resource management? (Mesos?
 Yarn?)

  Thanks







Fwd: Controlling number of executors on Mesos vs YARN

2015-01-05 Thread Tim Chen
Forgot to hit reply-all.

-- Forwarded message --
From: Tim Chen t...@mesosphere.io
Date: Sun, Jan 4, 2015 at 10:46 PM
Subject: Re: Controlling number of executors on Mesos vs YARN
To: mvle m...@us.ibm.com


Hi Mike,

You're correct there is no such setting for Mesos coarse grain mode,
since the assumption is that each node is launched with one container and
Spark is launching multiple tasks in that container.

In fine-grain mode there isn't a setting like that, as it currently will
launch an executor as long as it satisfies the minimum container resource
requirement.

I've created a JIRA earlier about capping the number of executors or better
distributing the # of executors launched on each node. Since the decision of
choosing which node to launch containers on is all on the Spark scheduler side,
it's very easy to modify.

Btw, what's the configuration to set the # of executors on YARN side?
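
(For reference, I believe on the YARN side it's the spark-submit flag, e.g.:

spark-submit --master yarn --num-executors 6 --executor-cores 4 --executor-memory 8g ...

i.e. the spark.executor.instances property.)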

Thanks,

Tim



On Sun, Jan 4, 2015 at 9:37 PM, mvle m...@us.ibm.com wrote:

 I'm trying to compare the performance of Spark running on Mesos vs YARN.
 However, I am having problems being able to configure the Spark workload to
 run in a similar way on Mesos and YARN.

 When running Spark on YARN, you can specify the number of executors per
 node. So if I have a node with 4 CPUs, I can specify 6 executors on that
 node. When running Spark on Mesos, there doesn't seem to be an equivalent
 way to specify this. In Mesos, you can somewhat force this by specifying the
 number of CPU resources to be 6 when running the slave daemon. However, this
 seems to be a static configuration of the Mesos cluster rather than something
 that can be configured in the Spark framework.
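
 (For example, something like this when starting each slave; the master
 address and values are placeholders:

 mesos-slave --master=zk://zk1:2181/mesos --resources='cpus:6;mem:14336'

 which, as noted, is a property of the Mesos cluster rather than of the Spark
 job.)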

 So here is my question:

 For Spark on Mesos, am I correct that there is no way to control the number
 of executors per node (assuming an idle cluster)? For Spark on Mesos
 coarse-grained mode, there is a way to specify max_cores but that is still
 not equivalent to specifying the number of executors per node as when Spark
 is run on YARN.

 If I am correct, then it seems Spark might be at a disadvantage running on
 Mesos compared to YARN (since it lacks the fine tuning ability provided by
 YARN).

 Thanks,
 Mike



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Controlling-number-of-executors-on-Mesos-vs-YARN-tp20966.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Mesos resource allocation

2015-01-05 Thread Tim Chen
Hi Josh,

I see. I haven't heard of folks using a larger JVM heap size than what you
mentioned (30GB), but in your scenario what you're proposing does make sense.

I've created SPARK-5095 and we can continue our discussion about how to
address this.

Tim


On Mon, Jan 5, 2015 at 1:22 AM, Josh Devins j...@soundcloud.com wrote:

 Hey Tim, sorry for the delayed reply, been on vacation for a couple weeks.

 The reason we want to control the number of executors is that running
 executors with JVM heaps over 30GB causes significant garbage
 collection problems. We have observed this through much
 trial-and-error for jobs that are a dozen-or-so stages, running for
 more than ~20m. For example, if we run 8 executors with 60GB heap each
 (for example, we have also other values larger than 30GB), even after
 much tuning of heap parameters (% for RDD cache, etc.) we run into GC
 problems. Effectively GC becomes so high that it takes over all
 compute time from the JVM. If we then halve the heap (30GB) but double
 the number of executors (16), all GC problems are relieved and we get
 to use the full memory resources of the cluster. We talked with some
 engineers from Databricks at Strata in Barcelona recently and received
 the same advice — do not run executors with more than 30GB heaps.
 Since our machines are 64GB machines and we are typically only running
 one or two jobs at a time on the cluster (for now), we can only use
 half the cluster memory with the current configuration options
 available in Mesos.

 Happy to hear your thoughts and actually very curious about how others
 are running Spark on Mesos with large heaps (as a result of large
 memory machines). Perhaps this is a non-issue when we have more
 multi-tenancy in the cluster, but for now, this is not the case.

 Thanks,

 Josh


 On 24 December 2014 at 06:22, Tim Chen t...@mesosphere.io wrote:
 
  Hi Josh,
 
  If you want to cap the amount of memory per executor in Coarse grain
 mode, then yes you only get 240GB of memory as you mentioned. What's the
 reason you don't want to raise the capacity of memory you use per executor?
 
  In coarse grain mode the Spark executor is long-lived and it internally
 will get tasks distributed by Spark's internal coarse-grained scheduler. I
 think the assumption is that it has already allocated the maximum available on
 that slave, so we don't really assume we need another one.

  I think it's worth considering having a configuration for the number of cores
 per executor, especially once Mesos has inverse offers and optimistic
 offers, so we can choose to launch more executors when resources become
 available even in coarse grain mode, and then support giving the executors
 back when higher-priority tasks arrive.
 
  For fine grain mode, the spark executors are started by Mesos executors
 that is configured from Mesos scheduler backend. I believe the RDD is
 cached as long as the Mesos executor is running as the BlockManager is
 created on executor registration.
 
  Let me know if you need any more info.
 
  Tim
 
 
 
  -- Forwarded message --
  From: Josh Devins j...@soundcloud.com
  Date: 22 December 2014 at 17:23
  Subject: Mesos resource allocation
  To: user@spark.apache.org
 
 
  We are experimenting with running Spark on Mesos after running
  successfully in Standalone mode for a few months. With the Standalone
  resource manager (as well as YARN), you have the option to define the
  number of cores, number of executors and memory per executor. In
  Mesos, however, it appears as though you cannot specify the number of
  executors, even in coarse-grained mode. If this is the case, how do
  you define the number of executors to run with?
 
  Here's an example of why this matters (to us). Let's say we have the
  following cluster:
 
  num nodes: 8
  num cores: 256 (32 per node)
  total memory: 512GB (64GB per node)
 
  If I set my job to require 256 cores and per-executor-memory to 30GB,
  then Mesos will schedule a single executor per machine (8 executors
  total) and each executor will get 32 cores to work with. This means
   that we have 8 executors * 30GB each for a total of 240GB of cluster
   memory in use, less than half of what is available. If you want
  actually 16 executors in order to increase the amount of memory in use
  across the cluster, how can you do this with Mesos? It seems that a
  parameter is missing (or I haven't found it yet) which lets me tune
  this for Mesos:
   * number of executors per n-cores OR
   * number of executors total
 
  Furthermore, in fine-grained mode in Mesos, how are the executors
  started/allocated? That is, since Spark tasks map to Mesos tasks, when
  and how are executors started? If they are transient and an executor
  per task is created, does this mean we cannot have cached RDDs?
 
  Thanks for any advice or pointers,
 
  Josh
 
 
 



Fwd: Mesos resource allocation

2014-12-23 Thread Tim Chen
Hi Josh,

If you want to cap the amount of memory per executor in Coarse grain mode,
then yes you only get 240GB of memory as you mentioned. What's the reason
you don't want to raise the capacity of memory you use per executor?

In coarse grain mode the Spark executor is long-lived and it internally
will get tasks distributed by Spark's internal coarse-grained scheduler. I
think the assumption is that it has already allocated the maximum available on
that slave, so we don't really assume we need another one.

I think it's worth considering having a configuration for the number of cores
per executor, especially once Mesos has inverse offers and optimistic
offers, so we can choose to launch more executors when resources become
available even in coarse grain mode, and then support giving the executors
back when higher-priority tasks arrive.

For fine grain mode, the spark executors are started by Mesos executors
that is configured from Mesos scheduler backend. I believe the RDD is
cached as long as the Mesos executor is running as the BlockManager is
created on executor registration.

Let me know if you need any more info.

Tim



 -- Forwarded message --
 From: Josh Devins j...@soundcloud.com
 Date: 22 December 2014 at 17:23
 Subject: Mesos resource allocation
 To: user@spark.apache.org


 We are experimenting with running Spark on Mesos after running
 successfully in Standalone mode for a few months. With the Standalone
 resource manager (as well as YARN), you have the option to define the
 number of cores, number of executors and memory per executor. In
 Mesos, however, it appears as though you cannot specify the number of
 executors, even in coarse-grained mode. If this is the case, how do
 you define the number of executors to run with?

 Here's an example of why this matters (to us). Let's say we have the
 following cluster:

 num nodes: 8
 num cores: 256 (32 per node)
 total memory: 512GB (64GB per node)

 If I set my job to require 256 cores and per-executor-memory to 30GB,
 then Mesos will schedule a single executor per machine (8 executors
 total) and each executor will get 32 cores to work with. This means
 that we have 8 executors * 30GB each for a total of 240GB of cluster
 memory in use, less than half of what is available. If you want
 actually 16 executors in order to increase the amount of memory in use
 across the cluster, how can you do this with Mesos? It seems that a
 parameter is missing (or I haven't found it yet) which lets me tune
 this for Mesos:
  * number of executors per n-cores OR
  * number of executors total

 Furthermore, in fine-grained mode in Mesos, how are the executors
 started/allocated? That is, since Spark tasks map to Mesos tasks, when
 and how are executors started? If they are transient and an executor
 per task is created, does this mean we cannot have cached RDDs?

 Thanks for any advice or pointers,

 Josh