Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21440#discussion_r203773818
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -659,6 +659,11 @@ private[spark] class BlockManager(
        * Get block from remote block managers as serialized bytes.
        */
       def getRemoteBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
    +    // TODO if we change this method to return the ManagedBuffer, then getRemoteValues
    +    // could just use the inputStream on the temp file, rather than memory-mapping the file.
    +    // Until then, replication can cause the process to use too much memory and get killed
    +    // by the OS / cluster manager (not a java OOM, since its a memory-mapped file) even though
    +    // we've read the data to disk.
    --- End diff ---
    
    To be honest, I don't have a perfect understanding of this, but my impression is that it is not exactly _lazy_ loading: the OS has a lot of leeway in deciding how much of the mapped file to keep in physical memory, though it should always release that memory under pressure. This is problematic under YARN, where the container's memory use is monitored independently of the OS. The OS thinks it's fine to keep large amounts of the mapped data in physical memory, but the YARN NM looks at the memory use of that specific process tree, decides it is over its configured limits, and kills the container.
    
    At least, I've seen cases of YARN killing containers for exceeding memory limits where I *thought* this was the cause, though I did not directly confirm it.
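    
    As a rough sketch (not the actual BlockManager code; the object and method names below are made up purely for illustration), the difference between the two access patterns is something like this:
    
    ```scala
    import java.io.{File, FileInputStream}
    import java.nio.channels.FileChannel
    import java.nio.file.StandardOpenOption
    
    object RemoteBlockReadSketch {
    
      // Memory-mapping: the whole temp file is mapped into the process's address
      // space. As pages are touched they become resident, and a monitor like the
      // YARN NM charges them to the process tree, even though the JVM heap stays
      // small and the OS could drop the pages under memory pressure.
      // (A single FileChannel.map call is limited to 2 GB, so real code would have
      // to map large blocks in chunks.)
      def readByMemoryMapping(tmpFile: File): Unit = {
        val channel = FileChannel.open(tmpFile.toPath, StandardOpenOption.READ)
        try {
          val mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
          while (mapped.hasRemaining) {
            mapped.get() // touching each page pulls it into resident memory
          }
        } finally {
          channel.close()
        }
      }
    
      // Streaming: only a small fixed-size buffer is resident at any time, so the
      // measured memory stays bounded no matter how large the block is.
      def readByStreaming(tmpFile: File): Unit = {
        val in = new FileInputStream(tmpFile)
        try {
          val buf = new Array[Byte](64 * 1024)
          while (in.read(buf) != -1) {
            // consume the chunk, then reuse the buffer
          }
        } finally {
          in.close()
        }
      }
    }
    ```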


---
