GitHub user squito opened a pull request: https://github.com/apache/spark/pull/21440
[SPARK-24307][CORE] Support reading remote cached partitions > 2gb

(1) Netty's ByteBuf cannot hold more than 2GB of data, so to transfer data from a ChunkedByteBuffer over the network we use a custom FileRegion implementation that is backed by the ChunkedByteBuffer.

(2) On the receiving end, we need to expose all the data in a FileSegmentManagedBuffer as a ChunkedByteBuffer. We do that by memory-mapping the entire file in chunks.

Added unit tests; the randomized test was run a couple of hundred times on my laptop. The tests cover the equivalent of SPARK-24107 for the ChunkedByteBufferFileRegion. Also tested on a cluster with remote cache reads > 2GB (both in memory and on disk).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark chunked_bb_file_region

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21440.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21440

----

commit 4373e27c2ec96b77a2311f5c5997ae5ca84bf6c5
Author: Imran Rashid <irashid@...>
Date: 2018-05-23T03:59:40Z

    [SPARK-24307][CORE] Support reading remote cached partitions > 2gb

    (1) Netty's ByteBuf cannot support data > 2gb. So to transfer data from a
    ChunkedByteBuffer over the network, we use a custom version of FileRegion
    which is backed by the ChunkedByteBuffer.

    (2) On the receiving end, we need to expose all the data in a
    FileSegmentManagedBuffer as a ChunkedByteBuffer. We do that by memory
    mapping the entire file in chunks.

    Added unit tests. Also tested on a cluster with remote cache reads > 2gb
    (in memory and on disk).

----
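For readers unfamiliar with point (1): Netty's FileRegion contract lets the sender push data to a channel incrementally, resuming wherever the previous write stopped, which is what makes a > 2GB transfer possible without ever materializing one giant buffer. Spark's actual implementation is the Scala class ChunkedByteBufferFileRegion in the PR; the sketch below is a simplified, pure-JDK Java illustration of the resumable-write idea over a list of chunks. The class name ChunkedTransfer and its shape are hypothetical, not Spark's API, and it omits FileRegion details such as the position argument and reference counting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChunkedTransfer {
    private final ByteBuffer[] chunks; // the data, split into < 2GB pieces
    private long transferred = 0;      // total bytes written so far
    private int currentChunk = 0;      // index of the chunk we are draining

    public ChunkedTransfer(ByteBuffer[] chunks) {
        this.chunks = chunks;
    }

    public long transferred() {
        return transferred;
    }

    // Write as much as the channel will accept, resuming from where the
    // previous call stopped. Returns the number of bytes written this call.
    public long transferTo(WritableByteChannel target) throws IOException {
        long written = 0;
        while (currentChunk < chunks.length) {
            ByteBuffer chunk = chunks[currentChunk];
            if (!chunk.hasRemaining()) {
                currentChunk++; // this chunk is drained, move to the next
                continue;
            }
            int n = target.write(chunk);
            if (n == 0) {
                break; // channel not ready; the caller retries later
            }
            written += n;
        }
        transferred += written;
        return written;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer a = ByteBuffer.wrap(new byte[]{1, 2, 3});
        ByteBuffer b = ByteBuffer.wrap(new byte[]{4, 5});
        ChunkedTransfer region = new ChunkedTransfer(new ByteBuffer[]{a, b});

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        region.transferTo(Channels.newChannel(out));
        System.out.println(region.transferred()); // 5
    }
}
```

Because transferTo only advances chunk positions, a partial write (a slow socket accepting, say, half a chunk) is picked up exactly where it stopped on the next call, so no chunk ever needs to fit in a single ByteBuf.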
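Point (2) exists because a single MappedByteBuffer is capped at Integer.MAX_VALUE bytes (FileChannel.map rejects larger sizes), so a file segment over 2GB must be exposed as several mappings. The following pure-JDK sketch shows the chunked-mapping idea; mapInChunks is a hypothetical helper for illustration, not the method Spark adds, and the demo uses a tiny file with an artificially small chunk size.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class ChunkedMmap {
    // Map [offset, offset + length) of the file as a list of read-only
    // buffers, each at most maxChunk bytes. For real > 2GB segments,
    // maxChunk would be close to Integer.MAX_VALUE.
    public static List<ByteBuffer> mapInChunks(Path file, long offset,
                                               long length, int maxChunk)
            throws IOException {
        List<ByteBuffer> chunks = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = offset;
            long remaining = length;
            while (remaining > 0) {
                long size = Math.min(remaining, maxChunk);
                chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, size));
                pos += size;
                remaining -= size;
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked-mmap", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[]{10, 20, 30, 40, 50, 60, 70});

        // 7 bytes with a 3-byte chunk limit -> mappings of 3 + 3 + 1 bytes.
        List<ByteBuffer> chunks = mapInChunks(tmp, 0, 7, 3);
        System.out.println(chunks.size());            // 3
        System.out.println(chunks.get(2).remaining()); // 1
    }
}
```

Each mapping stays valid after the channel is closed, so the resulting list can be handed off as the chunks of a ChunkedByteBuffer-style container without keeping the file descriptor open.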