Hey, I tracked an OOME on our 0.9 standalone master down to the master building a large byte[] for the map output statuses (serializeMapStatuses), which then gets copied once per executor.
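To make the size arithmetic concrete, here's a rough back-of-envelope sketch (the object name is mine, and the numbers are just the ones from our case; this isn't Spark code, just the estimate):

```scala
object MapStatusSizeEstimate {
  def main(args: Array[String]): Unit = {
    val partitions = 9000L
    // serializeMapStatuses emits roughly one compressed-size byte per
    // (map partition, reduce partition) pair, so the payload grows
    // quadratically with the partition count
    val rawBytes = partitions * partitions           // 81,000,000 ~= 81mb
    val gzippedBytes = 49L * 1024 * 1024             // ~49mb observed after gzip
    val executors = 70L
    // one copy of the serialized byte[] is made per executor that asks for it
    val totalHeap = executors * gzippedBytes
    println(f"raw: ${rawBytes / 1e6}%.0fmb, per-copy: ${gzippedBytes / 1e6}%.0fmb, " +
      f"total: ${totalHeap / 1e9}%.1fgb")
  }
}
```

So even before any extra Netty/NIO copies, per-executor copies alone put us in the multi-gb range.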
In our case, an RDD had 9,000 partitions, so 81m shuffle combinations. With the output status using 1 byte per compressed size, that's ~81mb, which gzipped down to a 49mb byte[]. However, it's sent via an Akka message, so a) 49mb is over the default Akka frame size of 10mb (we'd already upped ours), and b) the byte[] gets copied into a new byte[] for each slave/executor asking for it. Plus a few more copies seem to happen in the Netty/NIO stack, AFAICT. We basically ended up with 70 of these ~50mb byte[]s in RAM, for a total of 3.5gb.

So, a few things:

1) Obviously we should not have an RDD with 9k partitions. I'll have the job author fix that and then we should be fine.

2) That said, since this payload grows quadratically with the partition count and so gets large easily (even if through user error), perhaps a broadcast variable (or something) should be used instead of sending the raw bytes through Akka itself?

IANAE, so perhaps optimizing our degenerate case is not worth it, but I thought I would at least share what we ran into.

Thanks,
Stephen
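P.S. For anyone else who hits the frame-size limit before the OOME: the setting we bumped is spark.akka.frameSize (value in MB, default 10). A hypothetical example of raising it (the 64 is arbitrary; pick something above your serialized status size):

```scala
// assumption: 0.9-era config; spark.akka.frameSize is specified in MB
val conf = new org.apache.spark.SparkConf()
  .set("spark.akka.frameSize", "64")
// or as a JVM system property: -Dspark.akka.frameSize=64
```

Raising it only hides the growth, of course; the quadratic payload is still there.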