DiskStore.getBytes uses memory-mapped files when the requested length exceeds a configured limit. This code path is used during the map-side shuffle in ExternalSorter. I want to know whether it's possible for the length to exceed that limit in the shuffle case. I ask because in Hadoop, each map task is expected to produce only as much data as fits within the task's configured max memory; otherwise it results in an OOM. Is the behavior the same in Spark, or can the data generated by a map task exceed what fits in memory?
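For reference, here is a self-contained sketch of that read path, under the assumption that minMemoryMapBytes is the configured threshold (the helper name getSegment and the threshold value are illustrative, not the actual Spark source):

  import java.nio.ByteBuffer
  import java.nio.channels.FileChannel
  import java.nio.channels.FileChannel.MapMode

  // Assumed threshold; in Spark this is configurable
  // (spark.storage.memoryMapThreshold), 2 MB here for illustration.
  val minMemoryMapBytes: Long = 2L * 1024 * 1024

  // Illustrative stand-in for the DiskStore.getBytes branch in question.
  def getSegment(channel: FileChannel, offset: Long, length: Long): ByteBuffer =
    if (length < minMemoryMapBytes) {
      // Small segment: copy into an on-heap buffer.
      val buf = ByteBuffer.allocate(length.toInt)
      channel.position(offset)
      while (buf.remaining() > 0 && channel.read(buf) != -1) {}
      buf.flip()
      buf
    } else {
      // Large segment: memory-map it instead of allocating on the heap,
      // so the bytes are paged in by the OS rather than held in the JVM heap.
      channel.map(MapMode.READ_ONLY, offset, length)
    }

The reason I highlight the branch is that only the first arm allocates JVM heap memory proportional to length; the mapped arm does not, which is why the answer to the question above matters for heap sizing.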
  if (length < minMemoryMapBytes) {
    val buf = ByteBuffer.allocate(length.toInt)
    ...
  } else {
    Some(channel.map(MapMode.READ_ONLY, offset, length))
  }

-- Kannan