Github user squito commented on the issue: https://github.com/apache/spark/pull/22511 > this seems like a big change, will we hit perf regression? Not vs. 2.3. It only effects things when stream-to-disk is enabled, and when it is enabled, for reading remote cached blocks, this is actually going back to the old behavior. See https://github.com/apache/spark/pull/19476/files#diff-2b643ea78c1add0381754b1f47eec132R692 -- `FileSegmentManagedBuffer.nioByteBuffer()` reads the entire file into a regular byte buffer. I changed it to memory map the file as an attempted optimization. This change will make things slower, but that's vs. other changes that are only in 2.4. There are TODOs in the code for ways to improve this further but that should not go into 2.4. > is this a long-standing bug? the change here is not fixing a long-standing bug, its just updating new changes for 2.4. however, I'm really wondering why TorrentBroadcast calls dispose on the blocks. For regular buffers, its a no-op, so it hasn't mattered, but I can't come up with a reason that you *do* want to dispose those blocks. Secondly, it seems there is an implicit assumption that you never add memory-mapped byte buffers to the MemoryStore. Maybe that makes sense ... its kind of messing with the Memory / Disk management spark has. But the MemoryStore never checks that you don't add a mapped buffer, you'll just get weird behavior like this later on. Seems there should be a check at the very least, to avoid this kind of issue. As neither of those things are new to 2.4, I don't think we should deal w/ them here. The major motivation for memory mapping the file was not for broadcast blocks, it was for reading large cached blocks. But it actually makes more sense to change the interfaces in BlockManager to allow us to just get the managedBuffer, instead of a ChunkedByteBuffer (thats this TODO: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L728)
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org