[ 
https://issues.apache.org/jira/browse/SPARK-36772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-36772:
----------------------------------
    Target Version/s: 3.2.0

> FinalizeShuffleMerge fails with an exception due to attempt id not matching
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-36772
>                 URL: https://issues.apache.org/jira/browse/SPARK-36772
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.2.0
>            Reporter: Mridul Muralidharan
>            Priority: Blocker
>
> When the driver asks the external shuffle services (ESS) to finalize the 
> merge, it also passes its [application attempt 
> id|https://github.com/apache/spark/blob/3f09093a21306b0fbcb132d4c9f285e56ac6b43c/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java#L180]
>  so that the ESS can validate that the request comes from the correct attempt.
> This attempt id is read from the TransportConf passed in when creating the 
> [ExternalBlockStoreClient|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkEnv.scala#L352],
>  and the transport conf uses a [cloned 
> copy|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala#L47]
>  of the SparkConf passed to it.
> The application attempt id is set as part of SparkContext 
> [initialization|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L586].
> But this happens after the driver SparkEnv has [already been 
> created|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L460],
> so the cloned conf held by the client never sees the id.
> Hence the attempt id that ExternalBlockStoreClient uses always ends up 
> being -1, which does not match the attempt id at the ESS (which is based on 
> spark.app.attempt.id), so merge finalization always fails 
> ("java.lang.IllegalArgumentException: The attempt id -1 in this 
> FinalizeShuffleMerge message does not match with the current attempt id 1 
> stored in shuffle service for application ...").
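
The ordering problem described above can be reproduced in a small toy model. This is only a sketch: Conf, Client and validate are hypothetical stand-ins written for illustration, not Spark classes or Spark's API. The only details taken from the report are the spark.app.attempt.id key, the -1 default, and the shape of the error message. The "late" client is there for contrast only and is not meant to suggest how the issue is actually fixed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: Conf and Client are hypothetical stand-ins, not Spark classes.
final class AttemptIdOrdering {

    // Stand-in for SparkConf: a mutable key/value store that can be copied.
    static final class Conf {
        private final Map<String, String> settings = new HashMap<>();

        Conf set(String key, String value) {
            settings.put(key, value);
            return this;
        }

        int getInt(String key, int defaultValue) {
            String value = settings.get(key);
            return value == null ? defaultValue : Integer.parseInt(value);
        }

        // Independent copy: later changes to the original are not visible here.
        Conf copy() {
            Conf clone = new Conf();
            clone.settings.putAll(settings);
            return clone;
        }
    }

    // Stand-in for the shuffle client: keeps a snapshot of the conf taken at construction time.
    static final class Client {
        private final Conf snapshot;

        Client(Conf conf) {
            this.snapshot = conf.copy();
        }

        int appAttemptId() {
            return snapshot.getInt("spark.app.attempt.id", -1);
        }
    }

    // Stand-in for the shuffle-service side check of the attempt id.
    static void validate(int requestAttemptId, int storedAttemptId) {
        if (requestAttemptId != storedAttemptId) {
            throw new IllegalArgumentException("The attempt id " + requestAttemptId
                + " in this FinalizeShuffleMerge message does not match with the"
                + " current attempt id " + storedAttemptId);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        Client early = new Client(conf);        // client created first, as in the report
        conf.set("spark.app.attempt.id", "1");  // attempt id set afterwards
        Client late = new Client(conf);         // created after the id is set, for contrast

        System.out.println("early client attempt id: " + early.appAttemptId());
        System.out.println("late client attempt id:  " + late.appAttemptId());

        validate(late.appAttemptId(), 1);       // matches, no exception
        try {
            validate(early.appAttemptId(), 1);  // -1 vs 1, rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the early client keeps reporting -1 however the original conf is updated afterwards, because it only ever reads from its own copy.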



