The number of tasks is very likely not the reason you are getting timeouts.
A few things to look for:

What is actually timing out, and what kind of operation is it?
Reading from / writing to HDFS (NameNode or DataNode),
fetching shuffle data (with or without the External Shuffle Service),
or the driver not being able to talk to an executor?

The simplest thing to do is to increase the network timeouts in the Spark conf.
The other thing to check is whether GC is kicking in -- if so, try increasing
the heap size.
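
If it helps, here is a minimal sketch (my own, with placeholder values) of the
kind of conf changes I mean; the exact numbers depend on your cluster:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.network.timeout", "600s")            // default is 120s
    .set("spark.executor.heartbeatInterval", "60s")  // keep well below the network timeout
    .set("spark.executor.memory", "8g")              // more heap usually means less GC pressure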

thanks & all the best,
rohitk



On Thu, Mar 16, 2017 at 7:23 AM, Yong Zhang <java8...@hotmail.com> wrote:

> Not really sure what root problem you are trying to address.
>
>
> The number of tasks Spark needs to run depends on the number of
> partitions in your job.
>
>
> Let's use a simple word count example: if your Spark job reads 128G of data
> from HDFS (assuming the default block size of 128M), then the mapper stage of
> your job will spawn roughly 1,000 tasks (128G / 128M = 1,024).
>
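> As a hedged illustration (mine, not from your job; the HDFS path is made up),
> the mapper parallelism falls straight out of the input splits:
>
>   val lines = sc.textFile("hdfs:///data/words")   // one partition per 128M HDFS block
>   println(lines.getNumPartitions)                 // ~1,024 partitions for 128G of input
>   val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))
>   val counts = pairs.reduceByKey(_ + _)           // this shuffle starts the reduce stage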
>
> In the reducer stage, by default, Spark will spawn 200 tasks (controlled by
> spark.default.parallelism if you are using the RDD API, or
> spark.sql.shuffle.partitions if you are using DataFrames, assuming you didn't
> specify a partition number in any of your API calls).
>
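> A minimal sketch (mine, the values are placeholders) of where those two knobs
> are set on Spark 1.6:
>
>   // on the SparkConf, before the SparkContext is created
>   conf.set("spark.default.parallelism", "1000")
>   // on the SQLContext, at any point (the default is 200)
>   sqlContext.setConf("spark.sql.shuffle.partitions", "1000")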
>
> In either case, you can change the number of tasks spawned (even in the
> mapper case, though I don't see a reason to under normal circumstances). For
> huge datasets, people often increase the task count in the reduce stage so
> that each task processes a much smaller volume of data, which reduces memory
> pressure and increases performance, as in the sketch below.
>
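> To make that concrete, a sketch of my own (pairs is the (word, 1) RDD from
> the word count above, and df stands in for your DataFrame): both APIs also
> accept an explicit partition count, which is the usual way to raise the
> reducer task count for a big shuffle:
>
>   pairs.reduceByKey(_ + _, 2000)   // RDD API: 2000 reduce tasks for this shuffle
>   df.repartition(2000)             // DataFrame: 2000 partitions going into the next stage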
>
> Staying with the word count example: if you have 2000 unique words in your
> dataset, then your reducer count could be anywhere from 1 to 2000. 1 is the
> worst choice, as a single task would process all 2000 unique words, meaning
> all the data is sent to that one task and it becomes the slowest. But on the
> other hand, 2000 may not be the best either.
>
>
> Let's say 200 is the best number, so you will have 200 reduce tasks
> processing the 2000 unique words. Setting the number of executors and cores
> only controls how many of these tasks can run concurrently. So if your
> cluster has enough cores and memory available, granting your Spark job as
> many cores as possible, up to 200, is obviously the best for this reduce
> stage.
>
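> To tie that back to the numbers in this thread, a hedged example (mine, the
> values are hypothetical) of the knobs that control concurrency rather than
> the total task count:
>
>   conf.set("spark.executor.instances", "50")   // same as --num-executors 50
>   conf.set("spark.executor.cores", "4")        // 50 executors * 4 cores = 200 task slots
>
> With values like these, 200 tasks would run at a time, which would line up
> with seeing 200 active tasks while the total task count stays unchanged.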
>
> You need to be clearer about what problem you are facing when running your
> Spark job here, so we can help. Reducing the number of tasks spawned is
> normally a strange way to go about it.
>
>
> Yong
>
>
> ------------------------------
> *From:* Kevin Peng <kpe...@gmail.com>
> *Sent:* Wednesday, March 15, 2017 1:35 PM
> *To:* mohini kalamkar
> *Cc:* user@spark.apache.org
> *Subject:* Re: Setting Optimal Number of Spark Executor Instances
>
> Mohini,
>
> We set that parameter before we went and played with the number of
> executors and that didn't seem to help at all.
>
> Thanks,
>
> KP
>
> On Tue, Mar 14, 2017 at 3:37 PM, mohini kalamkar <
> mohini.kalam...@gmail.com> wrote:
>
>> Hi,
>>
>> try using this parameter --conf spark.sql.shuffle.partitions=1000
>>
>> Thanks,
>> Mohini
>>
>> On Tue, Mar 14, 2017 at 3:30 PM, kpeng1 <kpe...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am currently on Spark 1.6, and I was doing a SQL join on two tables that
>>> are each over 100 million rows when I noticed that it was spawning 30,000+
>>> tasks (this is the progress meter that we are seeing show up).  We tried
>>> coalesce, repartition, and shuffle partitions to bring the number of tasks
>>> down, because we were getting timeouts due to the number of tasks being
>>> spawned, but those operations did not seem to reduce the number of tasks.
>>> The solution we came up with was actually to set the number of executors to
>>> 50 (--num-executors=50), and it looks like that spawned 200 active tasks,
>>> but the total number of tasks remained the same.  Was wondering if anyone
>>> knows what is going on?  Is there an optimal number of executors?  I was
>>> under the impression that the default dynamic allocation would pick the
>>> optimal number of executors for us and that this situation wouldn't happen.
>>> Is there something I am missing?
>>>
>>>
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Setting-Optimal-Number-of-Spark-Executor-Instances-tp28493.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Mohini Kalamkar
>> M: +1 310 567 9329
>>
>
>
