Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22511
  
    > this seems like a big change, will we hit perf regression?
    
    Not vs. 2.3.  It only effects things when stream-to-disk is enabled, and 
when it is enabled, for reading remote cached blocks, this is actually going 
back to the old behavior.  See 
https://github.com/apache/spark/pull/19476/files#diff-2b643ea78c1add0381754b1f47eec132R692
 -- `FileSegmentManagedBuffer.nioByteBuffer()` reads the entire file into a 
regular byte buffer.  I changed it to memory map the file as an attempted 
optimization.
    
    This change will make things slower, but that's vs. other changes that are 
only in 2.4.  There are TODOs in the code for ways to improve this further but 
that should not go into 2.4.
    
    > is this a long-standing bug?
    
    the change here is not fixing a long-standing bug, its just updating new 
changes for 2.4.
    
    however, I'm really wondering why TorrentBroadcast calls dispose on the 
blocks.  For regular buffers, its a no-op, so it hasn't mattered, but I can't 
come up with a reason that you *do* want to dispose those blocks.
    
    Secondly, it seems there is an implicit assumption that you never add 
memory-mapped byte buffers to the MemoryStore.  Maybe that makes sense ... its 
kind of messing with the Memory / Disk management spark has.  But the 
MemoryStore never checks that you don't add a mapped buffer, you'll just get 
weird behavior like this later on.  Seems there should be a check at the very 
least, to avoid this kind of issue.
    
    As neither of those things are new to 2.4, I don't think we should deal w/ 
them here.
    
    The major motivation for memory mapping the file was not for broadcast 
blocks, it was for reading large cached blocks.  But it actually makes more 
sense to change the interfaces in BlockManager to allow us to just get the 
managedBuffer, instead of a ChunkedByteBuffer (thats this TODO: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L728)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to