Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-11 Thread Tien Dat
Thanks for your suggestion. I have been checking out Spark-jobserver. Just an off-topic question about this project: does the Apache Spark project have any support for, or connection to, the Spark-jobserver project? I noticed that they do not have a release for the newest version of Spark (e.g., 2.3.1). As you

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-10 Thread Mark Hamstra

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-07 Thread Mark Hamstra
Essentially correct. The latency to start a Spark Job is nowhere close to 2-4 seconds under typical conditions. Creating a new Spark Application every time instead of running multiple Jobs in one Application is not going to lead to acceptable interactive or real-time performance, nor is that an

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Mark Hamstra
The latency to start a Spark Job is nowhere close to 2-4 seconds under typical conditions. You appear to be creating a new Spark Application every time instead of running multiple Jobs in one Application.

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Timothy Chen
I know there have been some community efforts presented at Spark Summits before, mostly around reusing the same Spark context for multiple “jobs”. I don’t think reducing Spark job startup time is a community priority, AFAIK. Tim
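The "reuse one context for many jobs" idea can be sketched without Spark itself. The following minimal Python example assumes a hypothetical `factory` callable standing in for whatever creates the expensive context (in PySpark that would typically be `SparkSession.builder.getOrCreate()`, which is not called here); `SharedContext` and `run_jobs` are illustrative names, not part of Spark or Spark-jobserver. The startup cost is paid once, and every later job receives the same instance.

```python
import threading
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class SharedContext(Generic[T]):
    """Creates an expensive context lazily, exactly once, then hands out the same instance."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory                # hypothetical: e.g. a function that builds a Spark session
        self._lock = threading.Lock()          # guards first creation if jobs arrive concurrently
        self._ctx: Optional[T] = None

    def get(self) -> T:
        with self._lock:
            if self._ctx is None:
                # The startup cost is paid only on the first request.
                self._ctx = self._factory()
            return self._ctx


def run_jobs(shared: SharedContext[T], jobs: List[Callable[[T], R]]) -> List[R]:
    """Runs every job against one long-lived context instead of creating a new one per job."""
    return [job(shared.get()) for job in jobs]
```

In a long-running service, `run_jobs` would be replaced by whatever request handler receives queries; the point is only that the context outlives individual jobs, so per-request latency excludes application startup.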

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Tien Dat
Dear Timothy, It works like a charm now. BTW (don't judge me if I am too greedy :-)), the latency to start a Spark job is around 2-4 seconds, unless there is some clever optimization in Spark that I am not aware of. Do you know if the Spark community is working on reducing this latency? Best

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Timothy Chen
Got it. Then you can have an extracted Spark directory at the same location on each host and not specify SPARK_EXECUTOR_URI. Instead, set spark.mesos.executor.home to that directory. This should effectively do what you want: it avoids fetching and extracting, and just executes the command.
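A small Python sketch of how a submit wrapper might apply this advice. The helper name, the master URL and the `/opt/spark` path are hypothetical; the real pieces are the `spark.mesos.executor.home` property and leaving both `spark.executor.uri` and `SPARK_EXECUTOR_URI` unset, as suggested above. The function only builds the command and environment; actually launching it (for example with `subprocess.run(argv, env=env)`) is left out.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def submit_with_local_spark_home(
    master: str,
    executor_home: str,
    app: str,
    env: Mapping[str, str],
    extra_conf: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build spark-submit argv/env that rely on a Spark directory pre-extracted on every agent."""
    conf = dict(extra_conf or {})
    # Do not ship a Spark package with the executors...
    conf.pop("spark.executor.uri", None)
    # ...and instead point them at the directory that already exists on each host.
    conf["spark.mesos.executor.home"] = executor_home

    # Per the advice above, SPARK_EXECUTOR_URI should not be specified either.
    clean_env = {k: v for k, v in env.items() if k != "SPARK_EXECUTOR_URI"}

    argv = ["spark-submit", "--master", master]
    for key in sorted(conf):  # sorted only to make the command deterministic
        argv += ["--conf", f"{key}={conf[key]}"]
    argv.append(app)
    return argv, clean_env
```

This relies on the Spark directory really being present at the same path on every Mesos agent; provisioning it (by image, config management, etc.) is outside the sketch.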

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Tien Dat
Thank you for your answer. The thing is, I actually pointed to a local binary file, and Mesos copied the binary file to a specific folder in /var/lib/mesos/... and extracted it every time it launched a Spark executor. With the fetcher cache, the copy time is reduced, but the reduction is

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Timothy Chen
If it’s available locally on each host, then don’t specify a remote URL but a local file URI instead. We added a fetcher cache to Mesos a while ago, and I believe there is integration in the Spark framework as well if you look at the documentation. With the fetcher cache enabled, the Mesos agent will cache
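As a rough illustration of this suggestion, the Python sketch below builds the two Spark properties involved: `spark.executor.uri` pointing at a local `file://` URI, and `spark.mesos.fetcherCache.enable`, which asks Spark to mark its URIs as cacheable by the Mesos fetcher cache. The helper name and the tarball path are hypothetical, and any fetcher-cache settings on the Mesos agents themselves are not shown.

```python
from typing import Dict


def fetcher_cache_conf(local_tarball: str) -> Dict[str, str]:
    """Hypothetical helper: Spark conf for a package stored locally on every Mesos agent."""
    if not local_tarball.startswith("/"):
        raise ValueError("expected an absolute path that exists on each agent")
    return {
        # A file:// URI refers to the agent's local filesystem, so nothing is pulled from a remote host.
        "spark.executor.uri": "file://" + local_tarball,
        # Ask Spark to flag its URIs (e.g. spark.executor.uri) for the Mesos fetcher cache.
        "spark.mesos.fetcherCache.enable": "true",
    }
```

As the follow-up in this thread notes, caching shortens the copy step but the archive is still extracted for each executor, which is why the later suggestion of a pre-extracted directory plus spark.mesos.executor.home avoids more of the startup work.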

[SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Tien Dat
Dear all, We are running Spark with Mesos as the master for resource management. In our cluster, there are jobs that require a very short response time (near-real-time applications), usually around 3-5 seconds. In order for Spark to execute with Mesos, one has to specify the