The error message *FetchFailedException: Failed to connect to
<executor_IP> on port 7337* appears when a task running on one executor
node tries to fetch shuffle data from another node but cannot establish
a connection to the specified port (7337 in this case, the external shuffle
service port you mentioned). In a nutshell, the failure happens during
network I/O among your executors.

Check the following:

- Any network issues or connectivity problems among the nodes that your
  executors are running on.
- Any executor failure that could be causing this error. Check the executor
  logs.
- Concurrency and thread issues: too many concurrent connections or thread
  limits can result in failed connections. *Adjust
  spark.shuffle.io.clientThreads.*
- It might be prudent to do the same for *spark.shuffle.io.serverThreads*.
- Check how stable your environment is. Look for any issues reported in the
  Spark UI.
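
For the first check, a quick way to rule out basic connectivity problems is
to try opening a TCP connection to port 7337 on each node. The sketch below
is only an illustration using the Python standard library; the function
names are my own, and the host list you pass in is whatever your cluster
uses. Note that a successful connection only shows that the port is open,
not that the shuffle service behind it is healthy.

```python
import socket


def can_connect(host: str, port: int = 7337, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def unreachable_hosts(hosts, port: int = 7337, timeout: float = 3.0):
    """Return the subset of `hosts` where the port could not be reached."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Run it from one of the executor nodes and pass in the hostnames or IPs of
the other nodes. Any host it returns is worth checking for firewall rules,
a stopped shuffle service, or a node that has gone away.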

HTH


Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 18 Aug 2023 at 23:30, Nebi Aydin <nayd...@binghamton.edu> wrote:

>
> Hi, sorry for the duplicates. First-time user :)
> I keep getting a FetchFailedException saying port 7337 is closed. That is
> the external shuffle service port.
> I was trying to tune these parameters.
> I have around 1000 executors and 5000 cores.
> I tried setting spark.shuffle.io.serverThreads to 2000. Should I also set
> spark.shuffle.io.clientThreads to 2000?
> Does the shuffle client thread setting allow one executor to fetch from
> the shuffle service on multiple nodes?
>
> Thanks
> On Fri, Aug 18, 2023 at 17:42 Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi,
>>
>> These two threads that you sent seem to be duplicates of each other?
>>
>> Anyhow, I trust that you are familiar with the concept of shuffle in
>> Spark. A Spark shuffle is an expensive operation since it involves the
>> following:
>>
>>    - Disk I/O
>>    - Data serialization and deserialization
>>    - Network I/O
>>
>> Basically these are based on the concept of map/reduce in Spark, and the
>> parameters you posted relate to various aspects of threading and
>> concurrency.
>>
>> HTH
>>
>> On Fri, 18 Aug 2023 at 20:39, Nebi Aydin <nayd...@binghamton.edu.invalid>
>> wrote:
>>
>>>
>>> I want to learn the differences among the thread configurations below.
>>>
>>> spark.shuffle.io.serverThreads
>>> spark.shuffle.io.clientThreads
>>> spark.shuffle.io.threads
>>> spark.rpc.io.serverThreads
>>> spark.rpc.io.clientThreads
>>> spark.rpc.io.threads
>>>
>>> Thanks.
>>>
>>
