Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7927#discussion_r36556494
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
    @@ -590,10 +590,21 @@ private[spark] class BlockManager(
       private def doGetRemote(blockId: BlockId, asBlockResult: Boolean): 
Option[Any] = {
         require(blockId != null, "BlockId is null")
         val locations = Random.shuffle(master.getLocations(blockId))
    +    var failTimes = 0
         for (loc <- locations) {
           logDebug(s"Getting remote block $blockId from $loc")
    -      val data = blockTransferService.fetchBlockSync(
    -        loc.host, loc.port, loc.executorId, 
blockId.toString).nioByteBuffer()
    +      val data = try {
    +        blockTransferService.fetchBlockSync(
    +          loc.host, loc.port, loc.executorId, 
blockId.toString).nioByteBuffer()
    +      } catch {
    +        case e: IOException if failTimes < locations.size - 1 =>
    +          // Return null when IOException throw, so we can fetch block
    +          // from another location if there still have locations
    +          failTimes += 1
    +          logWarning(s"Try ${failTimes} times getting remote block 
$blockId from $loc failed:", e)
    --- End diff --
    
    Hi @GraceH ,
    sorry I didn't see your comment earlier.  You are right, catching just 
`IOException` doesn't make sense, no matter what the exception we should keep 
trying all locations.
    
    I think we are in agreement that `doGetRemote` should tolerate exceptions, 
as long as it is able to get from one location (and all those exceptions should 
get logged).
    
    The question is, what should it do when it cannot fetch the block from any 
location?  Should it (a) return `None`, and just log all the exceptions or (b) 
should it throw an exception?  In some ways, it seems the answer should be (a), 
since the api already returns an `Option`, and it would seem that was the whole 
purpose.  But on the other hand, the only way the existing code could return 
`None` is if `locations` were empty -- the actual behavior of the current 
version is to throw an exception if the block isn't available.
    
    I'm really torn between both options, I find I can convince myself of 
either one.  Ultimately I am leaning slightly toward (b), just because we can 
get a slightly better exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to