DiskStore.getBytes uses memory-mapped files when the requested length exceeds a configured limit. This code path is used during the map-side shuffle in ExternalSorter. I want to know whether it's possible for the length to exceed that limit in the shuffle case. I ask because in Hadoop, each map task is expected to produce only as much data as fits within the task's configured max memory; otherwise it results in an OOM. Is the behavior the same in Spark, or can the data generated by a map task exceed what fits in memory?
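For reference, here is a self-contained sketch of that read path, under the assumption that minMemoryMapBytes is the configured threshold (the helper name getSegment and the threshold value are illustrative, not the actual Spark source):

  import java.nio.ByteBuffer
  import java.nio.channels.FileChannel
  import java.nio.channels.FileChannel.MapMode

  // Assumed threshold; in Spark this is configurable
  // (spark.storage.memoryMapThreshold), 2 MB here for illustration.
  val minMemoryMapBytes: Long = 2L * 1024 * 1024

  // Illustrative stand-in for the DiskStore.getBytes branch in question.
  def getSegment(channel: FileChannel, offset: Long, length: Long): ByteBuffer =
    if (length < minMemoryMapBytes) {
      // Small segment: copy into an on-heap buffer.
      val buf = ByteBuffer.allocate(length.toInt)
      channel.position(offset)
      while (buf.remaining() > 0 && channel.read(buf) != -1) {}
      buf.flip()
      buf
    } else {
      // Large segment: memory-map it instead of allocating on the heap,
      // so the bytes are paged in by the OS rather than held in the JVM heap.
      channel.map(MapMode.READ_ONLY, offset, length)
    }

The reason I highlight the branch is that only the first arm allocates JVM heap memory proportional to length; the mapped arm does not, which is why the answer to the question above matters for heap sizing.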
  if (length < minMemoryMapBytes) {
    val buf = ByteBuffer.allocate(length.toInt)
    ...
  } else {
    Some(channel.map(MapMode.READ_ONLY, offset, length))
  }

-- Kannan