Thanks for your answer, David,

So it is as I thought. When you say there are approaches to solving this
with YARN or Mesos, could you give some idea of what they are? Or, better
yet, is there a site with documentation on this issue? We currently launch
our jobs on YARN, but we still do not know how to schedule them properly to
get the highest utilization out of our cluster.
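
For example, is something like the sketch below what you mean? This is only
a guess on our side, assuming the dynamic allocation feature introduced in
Spark 1.2 on YARN is what you are referring to; the application name and the
executor counts are made up for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical configuration: instead of holding 20 nodes for the whole
    // run, let YARN grow and shrink the executor set between a floor and a
    // ceiling, releasing executors that sit idle.
    val conf = new SparkConf()
      .setAppName("execution-A")                          // made-up name
      .set("spark.dynamicAllocation.enabled", "true")     // release idle executors
      .set("spark.shuffle.service.enabled", "true")       // external shuffle service, required for dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")

    val sc = new SparkContext(conf)

With something like this, a small execution A would only keep the executors
it actually uses, and execution B could be scheduled on the rest. Is that
the kind of approach you had in mind, or is there something else at the
YARN/Mesos level we should look at?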

Best

On Tue, Jan 13, 2015 at 2:12 AM, Buttler, David <buttl...@llnl.gov> wrote:

>  Spark has a built-in cluster manager (the Spark standalone cluster), but
> it is not designed for multi-tenancy, whether that means multiple people
> using the system or multiple tasks sharing resources.  It is a first-in,
> first-out queue of tasks, where a task will block until the previous tasks
> have finished (as you described).  If you want higher utilization of your
> cluster, you should use either YARN or Mesos to schedule the system.  The
> same issues will come up, but they offer a much broader range of
> approaches for solving the problem.
>
>
>
> Dave
>
>
>
> *From:* Luis Guerra [mailto:luispelay...@gmail.com]
> *Sent:* Monday, January 12, 2015 8:36 AM
> *To:* user
> *Subject:* Spark executors resources. Blocking?
>
>
>
> Hello all,
>
>
>
> I have a naive question about how Spark uses the executors in a cluster of
> machines. Imagine a scenario in which I do not know the input size of my
> data for execution A, so I set Spark to use 20 nodes (out of my 25, for
> instance). At the same time, I also launch a second execution B, setting
> Spark to use 10 nodes for it.
>
>
>
> Assuming a huge input size for execution A, implying an execution time of,
> say, 30 minutes (using all of its resources), and a constant execution
> time of 10 minutes for B, the two executions will take 40 minutes in total
> (I assume B cannot be launched until 10 resources are completely free,
> i.e. when A finishes).
>
>
>
> Now, assuming a very small input size for execution A, so that it runs for
> only 5 minutes on just 2 of the 20 planned resources, I would like
> execution B to be launched right away, so that both executions together
> take only 10 minutes (and 12 resources). However, since execution A has
> told Spark to use 20 resources, execution B has to wait until A has
> finished, and the total execution time becomes 15 minutes.
>
>
>
> Is this right? If so, how can I handle this kind of scenario? If I am
> wrong, what would be the correct interpretation?
>
>
>
> Thanks in advance,
>
>
>
> Best
>
