Misha Dmitriev created SPARK-24801:
--------------------------------------

Summary: Empty byte[] arrays in spark.network.sasl.SaslEncryption$EncryptedMessage can waste a lot of memory
Key: SPARK-24801
URL: https://issues.apache.org/jira/browse/SPARK-24801
Project: Spark
Issue Type: Improvement
Components: YARN
Affects Versions: 2.3.0
Reporter: Misha Dmitriev
I recently analyzed another YARN NodeManager (NM) heap dump with jxray ([www.jxray.com|http://www.jxray.com/]) and found that 81% of memory is wasted by empty (all-zero) byte[] arrays. Most of these arrays are referenced by {{org.apache.spark.network.util.ByteArrayWritableChannel.data}}, and these in turn come from {{org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel}}. Here is the full reference chain that leads to the problematic arrays:

{code:java}
2,597,946K (64.1%): byte[]: 40583 / 100% of empty 2,597,946K (64.1%)
 ↖org.apache.spark.network.util.ByteArrayWritableChannel.data
 ↖org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel
 ↖io.netty.channel.ChannelOutboundBuffer$Entry.msg
 ↖io.netty.channel.ChannelOutboundBuffer$Entry.{next}
 ↖io.netty.channel.ChannelOutboundBuffer.flushedEntry
 ↖io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelUnsafe.outboundBuffer
 ↖io.netty.channel.socket.nio.NioSocketChannel.unsafe
 ↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.associatedChannel
 ↖{java.util.concurrent.ConcurrentHashMap}.values
 ↖org.apache.spark.network.server.OneForOneStreamManager.streams
 ↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
 ↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
 ↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance
{code}

This chain alone retains 40,583 arrays totalling 2,597,946K (64.1% of the heap), all of them empty, or about 64K per array.

Checking the code of {{SaslEncryption$EncryptedMessage}}, I see that {{byteChannel}} is always initialized eagerly in the constructor:

{code:java}
this.byteChannel = new ByteArrayWritableChannel(maxOutboundBlockSize);
{code}

So, to stop empty byte[] arrays from flooding the memory, I think we should initialize {{byteChannel}} lazily, upon first use. As far as I can see, it is used in only one method, {{private void nextChunk()}}.
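A minimal sketch of the proposed lazy initialization, under stated assumptions: it does not use Spark's real classes. {{SimpleByteArrayChannel}} and {{LazyEncryptedMessage}} are hypothetical, simplified stand-ins for {{ByteArrayWritableChannel}} and {{SaslEncryption$EncryptedMessage}}, the message source is assumed to be a plain {{ByteBuffer}}, and the encryption step that the real {{nextChunk()}} performs is omitted. The only point it illustrates is that the block-sized buffer is allocated the first time {{nextChunk()}} runs rather than in the constructor.

```java
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical, simplified stand-in for ByteArrayWritableChannel: a fixed-size byte[] sink. */
class SimpleByteArrayChannel implements WritableByteChannel {
  private final byte[] data; // the block-sized array that is all zeroes until written
  private int offset;

  SimpleByteArrayChannel(int size) {
    this.data = new byte[size];
  }

  byte[] getData() { return data; }

  int length() { return offset; }

  void reset() { offset = 0; }

  /** Copies as many bytes as fit into the remaining space of data. */
  @Override
  public int write(ByteBuffer src) {
    int toWrite = Math.min(src.remaining(), data.length - offset);
    src.get(data, offset, toWrite);
    offset += toWrite;
    return toWrite;
  }

  @Override
  public boolean isOpen() { return true; }

  @Override
  public void close() {}
}

/** Hypothetical sketch of an outbound message whose buffer is allocated on first use. */
class LazyEncryptedMessage {
  private final ByteBuffer source;
  private final int maxOutboundBlockSize;
  private SimpleByteArrayChannel byteChannel; // stays null until nextChunk() needs it

  LazyEncryptedMessage(ByteBuffer source, int maxOutboundBlockSize) {
    this.source = source;
    this.maxOutboundBlockSize = maxOutboundBlockSize;
    // No allocation here: a message that is queued but never written holds no buffer.
  }

  boolean isBufferAllocated() { return byteChannel != null; }

  /** Copies the next chunk of plaintext into the buffer, allocating it on the first call. */
  int nextChunk() {
    if (byteChannel == null) {
      byteChannel = new SimpleByteArrayChannel(maxOutboundBlockSize);
    }
    byteChannel.reset();
    byteChannel.write(source);
    // The real nextChunk() would encrypt byteChannel's contents here; omitted in this sketch.
    return byteChannel.length();
  }
}
```

For example, constructing a message over 100 bytes with a 64-byte block size allocates no buffer; the first {{nextChunk()}} call allocates it and returns 64, and the second returns the remaining 36. Because the reference chain shows these messages sitting in Netty's {{ChannelOutboundBuffer}} waiting to be written, deferring the allocation would presumably mean that only messages actually being written pay for a block-sized array.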