Misha Dmitriev created SPARK-24801:
--------------------------------------

             Summary: Empty byte[] arrays in 
spark.network.sasl.SaslEncryption$EncryptedMessage can waste a lot of memory
                 Key: SPARK-24801
                 URL: https://issues.apache.org/jira/browse/SPARK-24801
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 2.3.0
            Reporter: Misha Dmitriev


I recently analyzed another YARN NM heap dump with jxray 
([www.jxray.com|http://www.jxray.com/]) and found that 81% of memory is 
wasted by empty (all zeroes) byte[] arrays. Most of these arrays are referenced 
by {{org.apache.spark.network.util.ByteArrayWritableChannel.data}}, and these 
in turn come from 
{{spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel}}. Here is the 
full reference chain that leads to the problematic arrays:
{code:java}
2,597,946K (64.1%): byte[]: 40583 / 100% of empty 2,597,946K (64.1%)

↖org.apache.spark.network.util.ByteArrayWritableChannel.data
↖org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel
↖io.netty.channel.ChannelOutboundBuffer$Entry.msg
↖io.netty.channel.ChannelOutboundBuffer$Entry.{next}
↖io.netty.channel.ChannelOutboundBuffer.flushedEntry
↖io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelUnsafe.outboundBuffer
↖io.netty.channel.socket.nio.NioSocketChannel.unsafe
↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.associatedChannel
↖{java.util.concurrent.ConcurrentHashMap}.values
↖org.apache.spark.network.server.OneForOneStreamManager.streams
↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance{code}
 

Checking the code of {{SaslEncryption$EncryptedMessage}}, I see that 
{{byteChannel}} is always initialized eagerly in the constructor:
{code:java}
this.byteChannel = new ByteArrayWritableChannel(maxOutboundBlockSize);{code}
To stop empty byte[] arrays from flooding memory, I think we should 
initialize {{byteChannel}} lazily, on first use. As far as I can see, it is 
used in only one method, {{private void nextChunk()}}.
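
A minimal sketch of the lazy initialization proposed above, not the actual Spark code. Both classes are hypothetical stand-ins: SketchByteChannel plays the role of ByteArrayWritableChannel, and LazyEncryptedMessageSketch is a heavily simplified EncryptedMessage whose nextChunk() takes the payload as a parameter for illustration. The sketch also assumes nextChunk() is only ever called from a single thread, so no synchronization is shown.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for ByteArrayWritableChannel: owns one fixed-size byte[].
final class SketchByteChannel {
    private final byte[] data;
    private int offset;

    SketchByteChannel(int size) {
        this.data = new byte[size]; // the large allocation we want to defer
    }

    // Copies as many bytes as fit into the backing array; returns the count copied.
    int write(ByteBuffer src) {
        int toTransfer = Math.min(src.remaining(), data.length - offset);
        src.get(data, offset, toTransfer);
        offset += toTransfer;
        return toTransfer;
    }

    void reset() {
        offset = 0;
    }
}

// Hypothetical, simplified version of EncryptedMessage with a lazily created buffer.
final class LazyEncryptedMessageSketch {
    private final int maxOutboundBlockSize;
    private SketchByteChannel byteChannel; // stays null until the first chunk is needed

    LazyEncryptedMessageSketch(int maxOutboundBlockSize) {
        // Only the size is remembered here; no byte[] is allocated yet.
        this.maxOutboundBlockSize = maxOutboundBlockSize;
    }

    boolean isBufferAllocated() {
        return byteChannel != null;
    }

    // Assumption: called from one thread only, so a plain null check suffices.
    int nextChunk(ByteBuffer payload) {
        if (byteChannel == null) {
            byteChannel = new SketchByteChannel(maxOutboundBlockSize);
        }
        byteChannel.reset();
        return byteChannel.write(payload);
    }
}
```

With this shape, messages that are queued but not yet being written hold no buffer at all; the byte[] only appears once the first chunk is actually produced.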

 


