LantaoJin edited a comment on issue #25971: [SPARK-29298][CORE] Separate block manager heartbeat endpoint from driver endpoint URL: https://github.com/apache/spark/pull/25971#issuecomment-552060448

Thanks for the comment @jiangxb1987

> Please correct me if I'm wrong but I don't see approach to retry when `GetLocations*` requests timeout.

The `GetLocations` event never times out.
https://github.com/apache/spark/blob/e1ea806b3075d279b5f08a29fe4c1ad6d3c4191a/core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala#L85

The `BlockManagerHeartbeat` event can time out, and when it does we treat the executor as lost.
https://github.com/apache/spark/blob/70987d8144f4f2c094f3b82d0c4a98e818366225/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L271

But with a busy block manager, executors that are not actually dead can be mistakenly treated as lost because of this spurious timeout. That is what this PR fixes.

> so other events do not have timeout? or they will retry if timeout?

I wasn't sure before, but I believe so: they do not time out. `BlockManagerHeartbeat` is the only one I see sent with a timeout parameter:

```scala
driverEndpoint.askSync[T](BlockManagerHeartbeat, new RpcTimeout(..))
```

> I won't call an async message Heartbeat.

Sorry, I still keep it sync.
https://github.com/apache/spark/blob/7b8b398633789b65d116ce716d6fb1afcded0427/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L270
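For context, the failure mode described above can be sketched without any Spark internals: a blocking ask with a deadline, where a timeout reply is indistinguishable from a dead executor. All names below (`heartbeatAlive`, `neverReplies`) are illustrative, not Spark's actual API; this is a minimal sketch of the pattern, not the PR's implementation.

```scala
import scala.concurrent.{Await, Future, Promise, TimeoutException}
import scala.concurrent.duration._

// Illustrative sketch of a synchronous heartbeat, mirroring the shape of
// driverEndpoint.askSync[Boolean](BlockManagerHeartbeat(..), new RpcTimeout(..)).
// Returns false ("executor lost") when no reply arrives within the timeout.
def heartbeatAlive(reply: Future[Boolean], timeout: FiniteDuration): Boolean =
  try {
    Await.result(reply, timeout) // like askSync: block until reply or timeout
  } catch {
    // A busy endpoint that cannot answer in time looks exactly like a dead
    // executor from here -- the misclassification the PR addresses by moving
    // heartbeats off the busy driver endpoint.
    case _: TimeoutException => false
  }

// A reply that never arrives simulates an overloaded block manager master.
val neverReplies = Promise[Boolean]().future
println(heartbeatAlive(neverReplies, 100.millis))          // prints "false"
println(heartbeatAlive(Future.successful(true), 100.millis)) // prints "true"
```

The point of the sketch: the caller cannot tell "slow" from "dead", so giving heartbeats a dedicated endpoint (rather than sharing the driver endpoint's queue) reduces spurious timeouts.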