RE: How to increase the number of tasks

2015-06-05 Thread Evo Eftimov
The spark.default.parallelism parameter is the "default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when NOT set by user", whereas Deepak is setting the number of partitions explicitly.
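For illustration, a minimal sketch of that distinction (the application name, partition counts, and example RDD below are hypothetical, not taken from Deepak's job): when a pair-RDD transformation is given no partition count it falls back to spark.default.parallelism, and when a count is passed explicitly the setting is ignored.

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("parallelism-sketch")
          // Consulted only when a transformation is NOT given an explicit partition count.
          .set("spark.default.parallelism", "200")
        val sc = new SparkContext(conf)

        val pairs = sc.parallelize(1 to 1000000).map(n => (n % 100, 1))

        // No count passed: falls back to spark.default.parallelism (200 partitions).
        val byDefault = pairs.reduceByKey(_ + _)

        // Count passed explicitly: 400 partitions, the default is ignored.
        val explicit = pairs.reduceByKey(_ + _, 400)

        println(byDefault.partitions.length)  // 200
        println(explicit.partitions.length)   // 400

        sc.stop()
      }
    }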

 


Re: How to increase the number of tasks

2015-06-05 Thread 李铖
Just multiply the number of CPU cores on the node by 2-4.
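As a rough sketch of that rule of thumb (the 16-core node and the multiplier of 4 below are illustrative assumptions, not values from this thread):

    import org.apache.spark.SparkConf

    // Hypothetical node size; substitute the real core count.
    val coresPerNode = 16
    val parallelism  = coresPerNode * 4   // rule of thumb: 2-4 x cores, here 64

    val conf = new SparkConf()
      .setAppName("set-default-parallelism")
      .set("spark.default.parallelism", parallelism.toString)

The same value could equally be supplied at launch time with --conf spark.default.parallelism=64 on spark-submit.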



Re: How to increase the number of tasks

2015-06-05 Thread ๏̯͡๏
I did not change spark.default.parallelism.
What is the recommended value for it?



-- 
Deepak


Re: How to increase the number of tasks

2015-06-05 Thread 李铖
Did you try changing the value of 'spark.default.parallelism'? Make it a
bigger number.



RE: How to increase the number of tasks

2015-06-05 Thread Evo Eftimov
It may be that your system runs out of resources (i.e., 174 is the ceiling) due to 
the following, illustrated in the sketch after the list:

 

1.   RDD Partition = (Spark) Task

2.   RDD Partition != (Spark) Executor

3.   (Spark) Task != (Spark) Executor

4.   (Spark) Task = JVM Thread

5.   (Spark) Executor = JVM instance 
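To make the mapping above concrete, a small hedged sketch (the executor and core counts are invented for illustration): the partition count fixes how many tasks a stage has, while the number of executors and cores per executor bounds how many of those tasks run at the same time.

    import org.apache.spark.SparkConf

    // Illustrative resource settings only; on YARN these are often passed via
    // spark-submit --num-executors and --executor-cores instead.
    val conf = new SparkConf()
      .setAppName("resource-bound-sketch")
      .set("spark.executor.instances", "10")  // 10 executor JVM instances
      .set("spark.executor.cores", "4")       // 4 concurrent task threads per executor JVM

    // A stage with 174 partitions therefore runs as 174 tasks in total,
    // but at most 10 * 4 = 40 of them execute concurrently.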

 

From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com] 
Sent: Friday, June 5, 2015 10:48 AM
To: user
Subject: How to increase the number of tasks

 

I have a stage that spawns 174 tasks when I run repartition on Avro data.

Tasks read 512/317/316/214/173 MB of data. Even if I increase the number of 
executors or the number of partitions (when calling repartition), the number of 
tasks launched remains fixed at 174.
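For reference, a minimal sketch of the kind of job being described, with the partition counts printed so the effect of repartition can be observed (the plain-text source standing in for the Avro input, the paths, and the 696-partition target are all illustrative assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))

        // Stand-in for the Avro input in the question; the path is hypothetical.
        val records = sc.textFile("hdfs:///path/to/input")
        println(records.partitions.length)      // partitions (= tasks) of the reading stage

        // Explicit target passed to repartition; 696 is an arbitrary example.
        val reshuffled = records.repartition(696)
        println(reshuffled.partitions.length)   // 696 partitions in the post-shuffle stage

        reshuffled.saveAsTextFile("hdfs:///path/to/output")   // hypothetical output path
        sc.stop()
      }
    }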

 

1) I want to speed up this task. How do I do it?

2) Some tasks finish in 20 minutes, some in 15, and some in less than 10. Why is this 
the behavior?

Since this is a repartition stage, it should not depend on the nature of the data.

 

It's taking more than 30 minutes and I want to speed it up by throwing more 
executors at it.

 

Please suggest.

 

Deepak