[GitHub] spark pull request: [SPARK-4107] Fix incorrect handling of read() ...

JoshRosen Mon, 27 Oct 2014 23:44:58 -0700

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2969#discussion_r19455153
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/TachyonStore.scala ---
    @@ -105,25 +106,17 @@ private[spark] class TachyonStore(
           return None
         }
         val is = file.getInStream(ReadType.CACHE)
    -    var buffer: ByteBuffer = null
    +    assert (is != null)
         try {
    -      if (is != null) {
    -        val size = file.length
    -        val bs = new Array[Byte](size.asInstanceOf[Int])
    -        val fetchSize = is.read(bs, 0, size.asInstanceOf[Int])
    -        buffer = ByteBuffer.wrap(bs)
    -        if (fetchSize != size) {
    -          logWarning(s"Failed to fetch the block $blockId from Tachyon: Size $size " +
    -            s"is not equal to fetched size $fetchSize")
    -          return None
    -        }
    -      }
    +      val size = file.length
    +      val bs = new Array[Byte](size.asInstanceOf[Int])
    +      ByteStreams.readFully(is, bs)
    +      Some(ByteBuffer.wrap(bs))
         } catch {
    --- End diff --
    
    Ah, gotcha.
    
    From a general API design perspective, I think it's a little weird to have 
methods that swallow OOMs and resurface them as other errors.  As a library 
consumer, how would you feel if, say, Snappy were to swallow OOMs and re-throw 
them as IOException?  A library with that sort of unexpected behavior might 
actually break user applications' ability to recover from OOMs: if we ran out 
of memory due to excessive memory usage in some other part of the app but the 
OOM happened to be caught and swallowed by the library, then the top-level 
uncaught exception handler might never get a chance to respond to the OOM in an 
application-specific way (e.g. attempt to close files, trigger GCs, or clear 
caches).
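
    To make the concern concrete, here is a minimal sketch. All names in it (`readSwallowing`, `readPropagating`, `classify`, `failingAlloc`) are hypothetical and not taken from Spark, Tachyon, or Snappy, and the OOM is simulated by throwing it explicitly rather than by exhausting the heap.

```scala
import java.io.IOException

object SwallowedOomSketch {
  // Hypothetical library call that hides an OutOfMemoryError behind an IOException.
  def readSwallowing(allocate: () => Array[Byte]): Array[Byte] =
    try allocate()
    catch {
      case e: OutOfMemoryError => throw new IOException("read failed", e)
    }

  // The same hypothetical call, but the error propagates to the caller unchanged.
  def readPropagating(allocate: () => Array[Byte]): Array[Byte] = allocate()

  // Stand-in for application-level handling: reports which kind of failure it got to see.
  def classify(body: => Any): String =
    try { body; "ok" }
    catch {
      case _: OutOfMemoryError => "oom-recovery"
      case _: IOException      => "io-error"
    }

  // Simulated allocation failure (assumption: thrown explicitly for illustration).
  val failingAlloc: () => Array[Byte] = () => throw new OutOfMemoryError("simulated")
}
```

    With the swallowing variant, `classify(readSwallowing(failingAlloc))` takes the `"io-error"` branch, so the application's OOM recovery path never runs; with the propagating variant it takes the `"oom-recovery"` branch.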
    
    I'm not against handling OOMs in principle, but I think that it should 
probably be done at higher levels of the stack since that seems easier to 
reason about.
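
    One way to handle this higher in the stack, sketched under assumptions: the task runner installs a per-thread uncaught exception handler and decides there how to react. `runTask` and `lastAction` are hypothetical names, and the recorded strings only stand in for real recovery steps such as clearing caches.

```scala
import java.util.concurrent.atomic.AtomicReference

object TopLevelOomHandling {
  // Records which recovery action the handler chose (a stand-in for real recovery work).
  val lastAction = new AtomicReference[String]("none")

  // Runs the task on its own thread and waits for it to finish.
  // Lower layers do not catch OutOfMemoryError; the handler below sees it directly.
  def runTask(task: () => Unit): Unit = {
    val t = new Thread(new Runnable { def run(): Unit = task() })
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(thread: Thread, e: Throwable): Unit = e match {
        case _: OutOfMemoryError => lastAction.set("clear-caches")
        case _                   => lastAction.set("log-and-exit")
      }
    })
    t.start()
    t.join()
  }
}
```

    Because the decision lives in one place at the top of the stack, it is easier to reason about than catch blocks scattered through library code.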



