Our current hack is to use Broadcast variables when the serialized statuses are above some (configurable) size, and to have the workers pull them directly from the master. This is a workaround, so it would be great if there were a better, more principled solution.
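To make the size-based switch concrete, here is a minimal sketch of the decision only. It is not the actual Spark code: the names `chooseRoute`, `Route` and `BROADCAST_THRESHOLD_BYTES`, and the 10 MB default, are all hypothetical, since the only thing stated above is that the size is configurable.

```java
class StatusPathSketch {
    // Hypothetical default; the real cut-off is whatever the configuration says.
    static final long BROADCAST_THRESHOLD_BYTES = 10L * 1024 * 1024;

    // The two ways a worker could obtain the serialized statuses.
    enum Route { DIRECT_MESSAGE, BROADCAST }

    // Statuses strictly above the threshold go through a broadcast variable;
    // everything else is sent in the reply message as before.
    static Route chooseRoute(long serializedSizeBytes, long thresholdBytes) {
        return serializedSizeBytes > thresholdBytes ? Route.BROADCAST : Route.DIRECT_MESSAGE;
    }

    public static void main(String[] args) {
        long small = 512L * 1024;        // 512 KB of statuses
        long large = 30L * 1024 * 1024;  // 30 MB of statuses
        System.out.println(chooseRoute(small, BROADCAST_THRESHOLD_BYTES)); // DIRECT_MESSAGE
        System.out.println(chooseRoute(large, BROADCAST_THRESHOLD_BYTES)); // BROADCAST
    }
}
```

Small statuses keep the single round trip, and only the large ones pay for the extra broadcast fetch.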
Please note that the responses go to different workers, each requesting the output statuses for a shuffle (after map), so I am not sure whether back-pressure buffers, etc. would help.

Regards,
Mridul

On Mon, Jun 30, 2014 at 11:07 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> Hi,
>
> While sending the map output tracker result, the same serialized byte
> array is sent multiple times, but the akka implementation copies it
> to a private byte array within ByteString for each send.
> Caching a ByteString instead of Array[Byte] did not help, since akka
> does not special-case ByteString: it serializes the
> ByteString and copies the result out to an array before creating a
> ByteString out of it (for Array[Byte], serializing thankfully simply
> returns the same array, so there is only one copy).
>
> Given the need to send immutable data a large number of times, is there
> any way to do it in akka without copying internally in akka?
>
> To see how expensive it is: for 200 nodes with a large number of
> mappers and reducers, the status becomes something like 30 MB for us,
> and pulling this about 200 to 300 times results in OOM due to the
> large number of copies sent out.
>
> Thanks,
> Mridul
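For a rough sense of the numbers in the quoted mail, the sketch below works out the worst case in which every outstanding reply holds its own copy, and contrasts a per-send copy with a read-only view over the same bytes. It uses plain `java.nio.ByteBuffer` as a stand-in and is not akka's `ByteString`; whether akka offers an equivalent no-copy path is exactly the open question. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SharedBytesSketch {
    // Worst case: every outstanding reply holds a private copy at the same time.
    static long bytesHeldWhenCopied(long statusBytes, int pendingReplies) {
        return statusBytes * pendingReplies;
    }

    // A private copy per send, which is the behaviour the mail complains about.
    static byte[] copyForSend(byte[] serialized) {
        return Arrays.copyOf(serialized, serialized.length);
    }

    // A read-only view over the same backing array; no bytes are duplicated.
    static ByteBuffer viewForSend(byte[] serialized) {
        return ByteBuffer.wrap(serialized).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(bytesHeldWhenCopied(30 * mb, 200) / mb + " MB"); // 6000 MB
        System.out.println(bytesHeldWhenCopied(30 * mb, 300) / mb + " MB"); // 9000 MB

        byte[] serialized = {1, 2, 3, 4};
        byte[] copy = copyForSend(serialized);
        ByteBuffer view = viewForSend(serialized);
        serialized[0] = 9; // mutated only to show which one shares storage
        System.out.println(copy[0]);     // 1: the copy owns its own bytes
        System.out.println(view.get(0)); // 9: the view reads the shared bytes
    }
}
```

So 30 MB pulled 200 to 300 times is on the order of 6 to 9 GB if all the copies are alive at once, which fits the OOM we see.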