[jira] [Resolved] (SPARK-31179) Fast fail the connection while last shuffle connection failed in the last retry IO wait

Thomas Graves (Jira) Thu, 02 Apr 2020 06:22:30 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-31179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thomas Graves resolved SPARK-31179.
-----------------------------------
    Fix Version/s: 3.1.0
         Assignee: feiwang
       Resolution: Fixed

> Fast fail the connection while last shuffle connection failed in the last 
> retry IO wait 
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-31179
>                 URL: https://issues.apache.org/jira/browse/SPARK-31179
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: feiwang
>            Assignee: feiwang
>            Priority: Major
>             Fix For: 3.1.0
>
>
> When reading shuffle data, several fetch requests may be sent to the same 
> shuffle server.
> There is a client pool, and these requests may share the same client.
> When the shuffle server is busy, connection requests to it may time out.
> For example, take two connection requests, rc1 and rc2, with 
> io.numConnectionsPerPeer set to 1 and a connection timeout of 2 minutes.
> 1: rc1 holds the client lock and times out after 2 minutes.
> 2: rc2 holds the client lock and times out after 2 minutes.
> 3: rc1 starts the second retry, holds the lock, and times out after 2 minutes.
> 4: rc2 starts the second retry, holds the lock, and times out after 2 minutes.
> 5: rc1 starts the third retry, holds the lock, and times out after 2 minutes.
> 6: rc2 starts the third retry, holds the lock, and times out after 2 minutes.
> Because the attempts are serialized on the client lock, the six timeouts add 
> up to 12 minutes of waiting, which wastes a lot of time.
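
The timeline in the description can be checked with a small toy model. The
sketch below is not Spark's implementation; it only simulates requests that
share one client lock and compares the total wait with and without a
fast-fail rule. The function name, the 5-second retry wait, and the rule
"fail immediately if the last connection failure is more recent than 95% of
the retry wait" are assumptions made for illustration, not details taken from
this issue.

```python
def time_until_all_fail(num_requests, max_attempts, timeout_s, retry_wait_s, fast_fail):
    """Return the time (seconds) at which the last attempt fails.

    Toy model: every real connection attempt holds the single shared client
    lock for the full timeout, and a failed request waits retry_wait_s before
    trying again. All names and the fast-fail rule are hypothetical.
    """
    # Assumed fast-fail window: slightly shorter than the retry wait, so a
    # request that waited the full retry interval still gets a real attempt.
    window_s = 0.95 * retry_wait_s

    ready_at = [0.0] * num_requests            # when each request wants to try next
    attempts_left = [max_attempts] * num_requests
    lock_free_at = 0.0                         # when the shared client lock is released
    last_failure_at = None                     # time of the last real connection failure
    finished_at = 0.0

    while any(attempts_left):
        # Next request to run: the one that has been ready the longest.
        pending = [r for r in range(num_requests) if attempts_left[r] > 0]
        i = min(pending, key=lambda r: ready_at[r])

        start = max(ready_at[i], lock_free_at)
        recently_failed = (
            last_failure_at is not None and start - last_failure_at < window_s
        )
        if fast_fail and recently_failed:
            # Fast fail: skip the connection attempt, do not hold the lock.
            failed_at = start
        else:
            # Real attempt: hold the lock until the connection times out.
            failed_at = start + timeout_s
            lock_free_at = failed_at
            last_failure_at = failed_at

        attempts_left[i] -= 1
        ready_at[i] = failed_at + retry_wait_s
        finished_at = max(finished_at, failed_at)

    return finished_at


# Two requests, three attempts each, 2-minute timeout, assumed 5 s retry wait.
print(time_until_all_fail(2, 3, 120, 5, fast_fail=False))  # 720.0 (12 minutes)
print(time_until_all_fail(2, 3, 120, 5, fast_fail=True))   # 370.0
```

In this model, the request that was queued behind a timed-out attempt fails
at once instead of holding the lock for another 2 minutes, so only one real
timeout is paid per retry round.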



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

