Re: How does spark manage the memory of executor with multiple tasks

2015-05-27 Thread canan chen
Thanks Yong, this is very helpful. I also found ShuffleMemoryManager, which
is used to allocate memory across tasks in one executor.
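
From a first read of it, the policy seems to be: with N tasks active in one
executor, each task can hold at most 1/N of the shuffle memory pool, and a
task blocks until it can get at least 1/(2N). A simplified sketch of that
idea in Scala (my own illustration, not the actual class; the blocking part
is omitted):

    class ShuffleMemoryPoolSketch(maxMemory: Long) {
      private val held = scala.collection.mutable.Map.empty[Long, Long]

      // Ask for numBytes on behalf of a task; returns how much was granted.
      def tryToAcquire(taskId: Long, numBytes: Long): Long = synchronized {
        val cur  = held.getOrElseUpdate(taskId, 0L) // registers the task
        val n    = held.size                        // active tasks, incl. this one
        val cap  = maxMemory / n                    // at most 1/N of the pool each
        val free = maxMemory - held.values.sum
        val grant = math.max(0L, math.min(numBytes, math.min(cap - cur, free)))
        held(taskId) = cur + grant
        grant
      }

      // Called when a task finishes, so its share returns to the pool.
      def releaseAll(taskId: Long): Unit = synchronized { held -= taskId }
    }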

>> These 2 tasks have to share the 2G heap memory. I don't think specifying
the memory per task is a good idea, as tasks run at the thread level, and
memory applies only at the JVM process level.

Yes, it's not a good idea to specify memory per task in Spark, since Spark
can run multiple tasks in one executor. Actually, this is what confuses me.
In Spark, I can only specify total cores + memory per executor for each app.
Based on this, I can calculate how many cores/tasks run per executor and then
estimate how much memory each task can consume on average, as in the sketch
below. I am not sure whether doing it this way is a good idea, because users
think about a job at the task level and consider tasks to be independent and
isolated. But at runtime, the tasks may run in one JVM and share the memory
specified per executor. This is confusing, to me at least.
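
For example, the back-of-envelope estimate I end up doing (illustrative
numbers matching this thread):

    // Ask for 10 cores total and 2g per executor on a 5-worker standalone cluster.
    val totalCores       = 10                                  // spark.cores.max
    val executors        = 5                                   // one per worker
    val executorMemoryGB = 2.0                                 // spark.executor.memory
    val tasksPerExecutor = totalCores / executors              // = 2 concurrent tasks
    val avgTaskMemoryGB  = executorMemoryGB / tasksPerExecutor // ~1 GB on average
    // Only an average: the 2 tasks share one heap, with no hard per-task cap.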




RE: How does spark manage the memory of executor with multiple tasks

2015-05-27 Thread java8964
Like you, lots of people come from the MapReduce world and try to understand
the internals of Spark. I hope the below helps you in some way.

For end users, there is only the concept of a Job: "I want to run a word
count job on this one big file" - that is the job I want to run. How many
stages and tasks this job will generate depends on the file size and the
parallelism you specify in your job.
For word count, it will generate 2 stages, since it contains a shuffle; think
of it the same way as the Mapper and Reducer parts.

If the file is 1280M in HDFS with a 128M block size, the first stage will
generate 10 tasks. If you use the default parallelism in Spark, the 2nd stage
should generate 200 tasks.
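
As a sketch, the job would look something like this (the HDFS paths are made
up; the 200 is passed explicitly here just to make the stage-2 task count
visible):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    // Stage 1: one task per HDFS block -> 10 tasks for a 1280M file.
    val words = sc.textFile("hdfs:///data/big_file.txt")
                  .flatMap(_.split("\\s+"))

    // The shuffle in reduceByKey starts stage 2 -> 200 tasks here.
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _, 200)
    counts.saveAsTextFile("hdfs:///data/word_counts")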
Forget about Executors for now: the above job has 210 tasks to run. In
standalone mode, you need to specify the cores and memory for your job. Let's
assume you have 5 worker nodes with 4 cores + 8G each. Now, if you ask for 10
cores and 2G per executor, and the cluster has enough resources available,
then you will get 1 executor from each worker node, with 2 cores + 2G per
executor, to run your job. In this case, the first 10 tasks of stage one can
start concurrently; after that, every 10 tasks of stage 2 can run
concurrently. You get 5 executors because you have 5 worker nodes. There is a
coming feature to start multiple executors per worker, but we are talking
about the normal case here. In fact, you can start multiple workers on one
physical box, if you have enough resources.
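
In code, that resource request might look roughly like this (a sketch;
spark://master:7077 is a made-up master URL, and the same values can be given
to spark-submit via --total-executor-cores and --executor-memory):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("spark://master:7077")   // standalone mode
      .set("spark.cores.max", "10")       // total cores across the cluster
      .set("spark.executor.memory", "2g") // heap per executor (JVM)
    // With 5 workers this yields 5 executors x (2 cores + 2g) each.
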
In the above case, 2 tasks run concurrently per executor. You control this by
specifying how many cores you want for your job, plus how many workers your
cluster is pre-configured with. These 2 tasks have to share the 2G heap
memory. I don't think specifying the memory per task is a good idea, as tasks
run at the thread level, and memory applies only at the JVM process level.

In MR, every mapper and reducer maps to a Java process, but in Spark, a task
just maps to a thread/core.
In Spark, memory tuning is more of an art, but there are still plenty of
rules to follow. In the above case, you can increase the parallelism to 400;
then you will have 400 tasks in stage 2, so each task comes with less data,
provided the file has enough unique words. Or you can lower the cores from 10
to 5; then each executor will only process one task at a time, but your job
will run slower.

Overall, you want to maximize the parallelism to gain the best speed, but
also make sure the memory is enough for your job at that speed, to avoid OOM.
It is a balance.
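
Continuing the sketch above, raising the parallelism is a one-line change:

    // 400 reduce tasks instead of 200, so each stage-2 task holds less data.
    val counts400 = words.map(w => (w, 1)).reduceByKey(_ + _, 400)
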
Keep in mind:

- The cluster is pre-configured with a number of workers, which caps the
  total cores + max heap memory you can ask for.
- Per application, you specify the total cores you want + the heap memory
  per executor.
- In your application, you can specify the parallelism level, as many
  actions support it. So parallelism is dynamic: from job to job, or even
  from stage to stage.
Yong

Re: How does spark manage the memory of executor with multiple tasks

2015-05-27 Thread canan chen
Can anyone answer my question? I am curious to know: if there are multiple
reducer tasks in one executor, how is memory allocated between these reducer
tasks, since each shuffle will consume a lot of memory?


Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Evo Eftimov
> the link you sent says multiple executors per node

A Worker is just a daemon process launching Executors / JVMs so it can
execute tasks - it does that by cooperating with the master and the driver.

There is a one-to-one mapping between Executor and JVM.


Sent from Samsung Mobile


Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Arush Kharbanda
Hi Evo,

Worker is the JVM and an executor runs on the JVM. After Spark 1.4 you will
be able to run multiple executors on the same JVM/worker:

https://issues.apache.org/jira/browse/SPARK-1706

Thanks
Arush


--

Arush Kharbanda || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread canan chen
I think the concept of a task in Spark is on the same level as a task in MR.
Usually in MR, we need to specify the memory for each mapper/reducer task.
And I believe the executor is not a user-facing concept; it's a
Spark-internal concept. Spark users don't need to know the concept of an
executor, but they do need to know the concept of a task.


RE: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Evo Eftimov
This is the first time I hear that “one can specify the RAM per task” – the
RAM is granted per Executor (JVM). On the other hand, each Task operates on
ONE RDD Partition, so you can say that this is “the RAM allocated to the Task
to process” – but it is still within the boundaries allocated to the Executor
(JVM) within which the Task is running. Also, while running, any Task, like
any JVM Thread, can request as much additional RAM (e.g. for new Object
instances) as there is available in the Executor aka JVM Heap.
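
For reference, in the Spark 1.x memory model the executor heap itself is
carved up by configuration, which further bounds what tasks can use for
caching and shuffling (the fractions below are my understanding of the 1.x
defaults):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "2g")         // total heap per Executor (JVM)
      .set("spark.storage.memoryFraction", "0.6") // cached RDD blocks
      .set("spark.shuffle.memoryFraction", "0.2") // shuffle buffers, shared by tasks
    // The rest of the heap is left for task objects and general JVM overhead.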

 


Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread canan chen
Yes, I know that one task represents a JVM thread. This is what confused me.
Usually users want to specify memory at the task level, so how can I do that
if a task is thread-level and multiple tasks run in the same executor? I
don't even know how many threads there will be. Besides that, if one task
causes an OOM, it will cause the other tasks in the same executor to fail
too. There's no isolation between tasks.


RE: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Evo Eftimov
An Executor is a JVM instance spawned and running on a Cluster Node (Server
machine). A Task is essentially a JVM Thread – you can have as many Threads
as you want per JVM. You will also hear about “Executor Slots” – these are
essentially the CPU Cores available on the machine and granted for use to
the Executor.

Ps: what creates ongoing confusion here is that the Spark folks have
“invented” their own terms to describe the design of what is essentially a
Distributed OO Framework facilitating Parallel Programming and Data
Management in a Distributed Environment, BUT have not provided a clear
dictionary/explanation linking these “inventions” to standard concepts
familiar to every Java, Scala, etc. developer.
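
You can see this for yourself from spark-shell with something like the
snippet below (it assumes the shell's SparkContext sc; in my experience the
threads are named along the lines of "Executor task launch worker-N"):

    // Each element reports the host it ran on and the thread that ran it.
    sc.parallelize(1 to 8, 8)
      .map(i => (java.net.InetAddress.getLocalHost.getHostName,
                 Thread.currentThread().getName, i))
      .collect()
      .foreach(println)
    // Partitions handled by the same Executor show the same host but
    // different worker-thread names - Tasks are Threads, not processes.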

 

From: canan chen [mailto:ccn...@gmail.com] 
Sent: Tuesday, May 26, 2015 9:02 AM
To: user@spark.apache.org
Subject: How does spark manage the memory of executor with multiple tasks

 

Since Spark can run multiple tasks in one executor, I am curious to know how
Spark manages memory across these tasks. Say one executor gets 1GB of memory
and can run 10 tasks simultaneously; then each task can consume 100MB on
average. Do I understand it correctly? Otherwise it doesn't make sense to me
that Spark runs multiple tasks in one executor.