Re: How does spark manage the memory of executor with multiple tasks

Arush Kharbanda Tue, 26 May 2015 02:57:07 -0700

Hi Evo,

A worker is a JVM process, and each executor runs in its own JVM launched by
that worker. Starting with Spark 1.4 you will be able to run multiple
executors on the same worker in standalone mode:


https://issues.apache.org/jira/browse/SPARK-1706
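
As a rough illustration of what running several executors on one worker means for sizing, here is a minimal Python sketch. The helper name and the example numbers are hypothetical and not part of any Spark API. It assumes the inputs correspond to the worker's advertised cores and memory and to the application's `spark.executor.cores` and `spark.executor.memory` settings, and it ignores `spark.cores.max` and any other applications sharing the worker.

```python
# Hypothetical helper, not a Spark API: estimates how many executors of a
# single application could fit on one standalone worker.
def executors_per_worker(worker_cores, worker_mem_mb, executor_cores, executor_mem_mb):
    if executor_cores <= 0 or executor_mem_mb <= 0:
        raise ValueError("executor cores and memory must be positive")
    # An executor needs both its cores and its memory, so the scarcer
    # resource decides how many executors fit.
    by_cores = worker_cores // executor_cores
    by_memory = worker_mem_mb // executor_mem_mb
    return min(by_cores, by_memory)


# Example (assumed numbers): a 16-core, 64 GB worker and executors that
# each ask for 4 cores and 8 GB.
print(executors_per_worker(16, 65536, 4, 8192))
```

With these assumed numbers the worker is limited by cores rather than memory.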

Thanks
Arush

On Tue, May 26, 2015 at 2:54 PM, canan chen <ccn...@gmail.com> wrote:

> I think the concept of a task in Spark should be on the same level as a task
> in MR. Usually in MR, we need to specify the memory for each mapper/reducer
> task. And I believe the executor is not a user-facing concept; it's a Spark
> internal concept. Spark users don't need to know the concept of an
> executor, but they do need to know the concept of a task.
>
> On Tue, May 26, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com>
> wrote:
>
>> This is the first time I have heard that “one can specify the RAM per task” –
>> the RAM is granted per Executor (JVM). On the other hand, each Task operates
>> on ONE RDD Partition – so you can say that this is “the RAM allocated to
>> the Task to process” – but it is still within the boundaries allocated to
>> the Executor (JVM) within which the Task is running. Also, while running,
>> any Task, like any JVM Thread, can request as much additional RAM (e.g. for
>> new Object instances) as is available in the Executor, aka the JVM Heap.
>>
>>
>>
>> *From:* canan chen [mailto:ccn...@gmail.com]
>> *Sent:* Tuesday, May 26, 2015 9:30 AM
>> *To:* Evo Eftimov
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: How does spark manage the memory of executor with
>> multiple tasks
>>
>>
>>
>> Yes, I know that one task corresponds to a JVM thread. This is what
>> confuses me. Usually users want to specify memory at the task level, so how
>> can I do that if a task is a thread and multiple tasks run in the same
>> executor? I don't even know how many threads there will be. Besides
>> that, if one task causes an OOM, it will cause the other tasks in the same
>> executor to fail too. There's no isolation between tasks.
>>
>>
>>
>> On Tue, May 26, 2015 at 4:15 PM, Evo Eftimov <evo.efti...@isecc.com>
>> wrote:
>>
>> An Executor is a JVM instance spawned and running on a Cluster Node
>> (Server machine). A Task is essentially a JVM Thread – you can have as many
>> Threads as you want per JVM. You will also hear about “Executor Slots” –
>> these are essentially the CPU Cores available on the machine and granted
>> for use to the Executor.
>>
>>
>>
>> PS: what creates ongoing confusion here is that the Spark folks have
>> “invented” their own terms to describe the design of what is
>> essentially a Distributed OO Framework facilitating Parallel Programming
>> and Data Management in a Distributed Environment, BUT have not provided
>> a clear dictionary or explanations linking these “inventions” with standard
>> concepts familiar to every Java, Scala etc. developer.
>>
>>
>>
>> *From:* canan chen [mailto:ccn...@gmail.com]
>> *Sent:* Tuesday, May 26, 2015 9:02 AM
>> *To:* user@spark.apache.org
>> *Subject:* How does spark manage the memory of executor with multiple
>> tasks
>>
>>
>>
>> Since Spark can run multiple tasks in one executor, I am curious to
>> know how Spark manages memory across these tasks. Say one executor
>> takes 1GB of memory; if this executor can run 10 tasks simultaneously,
>> then each task can consume 100MB on average. Do I understand this correctly?
>> It doesn't make sense to me that Spark runs multiple tasks in one executor.
>>
>>
>>
>
>
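
On the arithmetic in the original question quoted above (a 1 GB executor running 10 tasks at once), here is a minimal Python sketch. The function names are hypothetical. It assumes the number of concurrent tasks per executor is `spark.executor.cores` divided by `spark.task.cpus` (which defaults to 1), and it treats the whole heap as usable by tasks, so the result is only an optimistic average. It is not a per-task limit, since all tasks share one JVM heap.

```python
# Hypothetical helpers, not Spark APIs.
def task_slots(executor_cores, task_cpus=1):
    # Concurrent tasks one executor can run, assuming each task
    # occupies task_cpus cores.
    if task_cpus <= 0:
        raise ValueError("task_cpus must be positive")
    return executor_cores // task_cpus


def average_heap_per_task_mb(executor_heap_mb, slots):
    # Average share only: nothing stops one task from using more
    # than this, because the heap is shared by all tasks.
    if slots <= 0:
        raise ValueError("executor has no task slots")
    return executor_heap_mb / slots


slots = task_slots(executor_cores=10)           # 10 concurrent tasks
print(slots, average_heap_per_task_mb(1024, slots))
```

So roughly 100 MB per task is a fair average for that example, but only as a rule of thumb, because a single memory-hungry task can still exhaust the shared heap.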


-- 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
