RE: How to increase the number of tasks
The param is the "Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when NOT set by user", while Deepak is setting the number of partitions explicitly, so changing it should have no effect on his repartition stage.

From: 李铖 [mailto:lidali...@gmail.com]
Sent: Friday, June 5, 2015 11:08 AM
To: ÐΞ€ρ@Ҝ (๏̯͡๏)
Cc: Evo Eftimov; user
Subject: Re: How to increase the number of tasks

> Just multiply the number of CPU cores per node by 2 to 4.
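A minimal Scala sketch of the distinction being drawn here; the input path and the value 400 are illustrative, not taken from Deepak's job:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("parallelism-demo"))
val pairs = sc.textFile("hdfs:///some/input")   // hypothetical input path
  .map(line => (line.take(1), 1))

// No partition count supplied: per the docs quoted above, the shuffle
// falls back to spark.default.parallelism for its partition count.
val defaultParts = pairs.reduceByKey(_ + _)

// Partition count supplied explicitly: spark.default.parallelism is ignored.
val explicitParts = pairs.reduceByKey(_ + _, 400)
val reshuffled    = pairs.repartition(400)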
Re: How to increase the number of tasks
Just multiply the number of CPU cores per node by 2 to 4.

2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏):
> I did not change spark.default.parallelism. What is the recommended value for it?
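A sketch of that rule of thumb in Scala; the cluster numbers are assumptions, not Deepak's actual setup:

// Rule of thumb above: 2-4 partitions per CPU core across the cluster.
val workerNodes  = 10  // assumed cluster size
val coresPerNode = 8   // assumed cores per worker
val factor       = 3   // pick something in the suggested 2-4 range
val parallelism  = workerNodes * coresPerNode * factor  // 240

This would then be the value to give spark.default.parallelism, or to pass explicitly to repartition.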
Re: How to increase the number of tasks
I did not change spark.default.parallelism. What is the recommended value for it?

On Fri, Jun 5, 2015 at 3:31 PM, 李铖 wrote:
> Did you change the value of 'spark.default.parallelism'? Try setting it to a bigger number.

--
Deepak
Re: How to increase the number of tasks
Did you change the value of 'spark.default.parallelism'? Try setting it to a bigger number.

2015-06-05 17:56 GMT+08:00 Evo Eftimov:
> It may be that your system runs out of resources (i.e. 174 is the ceiling).
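For reference, one way to set it, sketched in Scala; the value 400 is arbitrary. Note the property is read when the SparkContext is created, so it must be set up front:

import org.apache.spark.{SparkConf, SparkContext}

// Changing spark.default.parallelism after the context exists has no effect.
val conf = new SparkConf()
  .setAppName("bigger-default-parallelism")
  .set("spark.default.parallelism", "400")  // arbitrary illustrative value
val sc = new SparkContext(conf)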
RE: How to increase the number of tasks
It may be that your system runs out of resources (i.e. 174 is the ceiling) due to the following:

1. RDD Partition = (Spark) Task
2. RDD Partition != (Spark) Executor
3. (Spark) Task != (Spark) Executor
4. (Spark) Task = JVM Thread
5. (Spark) Executor = JVM instance

From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
Sent: Friday, June 5, 2015 10:48 AM
To: user
Subject: How to increase the number of tasks

I have a stage that spawns 174 tasks when I run repartition on Avro data. Tasks read 512, 317, 316, 214, and 173 MB of data. Even if I increase the number of executors, or the number of partitions passed to repartition, the number of tasks launched stays fixed at 174.

1) I want to speed up this stage. How do I do it?

2) A few tasks finish in 20 minutes, a few in 15, and a few in less than 10. Why this behavior? Since this is a repartition stage, it should not depend on the nature of the data.

The stage takes more than 30 minutes and I want to speed it up by throwing more executors at it.

Please suggest.

Deepak
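Reading Evo's numbered list above as a capacity model: each executor is a JVM instance and each task runs as a thread inside one, so the number of tasks that can run at once is roughly executors times cores per executor. A hypothetical Scala illustration (the 29 x 6 split is invented purely to show how a fixed ceiling such as 174 could arise):

// Hypothetical numbers, not Deepak's cluster.
val executorJvms     = 29  // (Spark) Executor = JVM instance
val coresPerExecutor = 6   // one (Spark) Task = one JVM thread per core
val concurrentTasks  = executorJvms * coresPerExecutor  // = 174 task slots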