[jira] [Assigned] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails
[ https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mridul Muralidharan reassigned SPARK-40096:
-------------------------------------------

    Assignee: Wan Kun

> Finalize shuffle merge slow due to connection creation fails
> ------------------------------------------------------------
>
>                 Key: SPARK-40096
>                 URL: https://issues.apache.org/jira/browse/SPARK-40096
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Wan Kun
>            Assignee: Wan Kun
>            Priority: Major
>
> *How to reproduce this issue*
>  * Enable push-based shuffle.
>  * Remove some merger nodes before the finalize RPCs are sent.
>  * The driver tries to connect to those merger shuffle services and send the
>    finalize RPCs one by one; each connection creation times out after
>    SPARK_NETWORK_IO_CONNECTIONCREATIONTIMEOUT_KEY (120s by default).
>
> We can send these RPCs in the *shuffleMergeFinalizeScheduler* thread pool and
> handle the connection creation exception.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails
[ https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40096:
------------------------------------

    Assignee:     (was: Apache Spark)

> Finalize shuffle merge slow due to connection creation fails
> ------------------------------------------------------------
>
>                 Key: SPARK-40096
>                 URL: https://issues.apache.org/jira/browse/SPARK-40096
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Wan Kun
>            Priority: Major
[jira] [Assigned] (SPARK-40096) Finalize shuffle merge slow due to connection creation fails
[ https://issues.apache.org/jira/browse/SPARK-40096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40096:
------------------------------------

    Assignee: Apache Spark

> Finalize shuffle merge slow due to connection creation fails
> ------------------------------------------------------------
>
>                 Key: SPARK-40096
>                 URL: https://issues.apache.org/jira/browse/SPARK-40096
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Wan Kun
>            Assignee: Apache Spark
>            Priority: Major
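
The fix proposed in the SPARK-40096 description (send the finalize RPCs from a thread pool and handle the connection creation exception) can be illustrated with a minimal Java sketch. This is not Spark's actual code: the names FinalizeSketch, FinalizeRpc and finalizeAll are hypothetical, and the overall wait deadline is an assumption added here so that one unreachable merger cannot hold up the others.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FinalizeSketch {

    /** Hypothetical stand-in for "connect to a merger and send the finalize RPC". */
    interface FinalizeRpc {
        String send(String mergerHost) throws Exception;
    }

    /**
     * Sends the finalize RPC to every merger from a thread pool instead of one by one.
     * Returns the responses of the mergers that answered within waitMillis;
     * mergers whose connection failed or that were too slow are skipped.
     */
    static Map<String, String> finalizeAll(List<String> mergers, FinalizeRpc rpc,
                                           int threads, long waitMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all RPCs up front so that connection attempts overlap.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String host : mergers) {
                Callable<String> task = () -> rpc.send(host);
                pending.put(host, pool.submit(task));
            }

            // Collect results against a single shared deadline (an assumption of this sketch).
            Map<String, String> results = new LinkedHashMap<>();
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    results.put(entry.getKey(),
                                entry.getValue().get(remaining, TimeUnit.NANOSECONDS));
                } catch (ExecutionException | TimeoutException e) {
                    // Connection creation failed or the merger was too slow:
                    // give up on this merger and keep the results of the others.
                    entry.getValue().cancel(true);
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With sequential sends, each removed merger would add a full connection creation timeout (120s by default, per the description) to finalization. In the sketch, failed connections overlap, and the total wait is bounded by waitMillis.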