DiskStore.getBytes uses memory-mapped files if the requested length exceeds a
configured limit. This code path is used during the map-side shuffle in
ExternalSorter. I want to know whether it is possible for the length to exceed
that limit in the shuffle case. The reason I ask is that in Hadoop, each map
task is expected to produce only as much data as fits within the task's
configured maximum memory; otherwise it results in an OOM. Is the behavior the
same in Spark, or can the size of the data generated by a map task exceed what
fits in memory?

  if (length < minMemoryMapBytes) {
    // small segment: read it into a regular heap ByteBuffer
    val buf = ByteBuffer.allocate(length.toInt)
    ....
  } else {
    // large segment: memory-map the file region instead
    Some(channel.map(MapMode.READ_ONLY, offset, length))
  }
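
For reference, I believe minMemoryMapBytes corresponds to the
spark.storage.memoryMapThreshold setting, so something like the following
sketch should control where this cutoff lands (the 2 MB default mentioned in
the comment is my assumption from the 1.x docs, not something I have verified
in this code path):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("shuffle-mmap-threshold-example")
    // raise the memory-map cutoff from the (assumed) 2 MB default to 8 MB,
    // so segments up to 8 MB are read into a heap ByteBuffer instead of
    // being memory-mapped
    .set("spark.storage.memoryMapThreshold", (8 * 1024 * 1024).toString)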

--
Kannan
