I’ve been looking at where untracked memory gets used in Spark, especially offheap memory, and I’ve discovered some things I’d like to share with the community. Most of what I’ve learned is about the way Spark uses Netty -- I’ll go into more detail on that below. I’m also planning on opening a number of JIRAs. I don’t think it makes sense to put them in an Epic or anything, since they’re mostly independent, just somewhat related, so I’m going to put a “memory-analysis” label on some of the new (and existing) JIRAs. I thought it would be useful to share a slightly broader view with the community here.

Aside from memory use by Netty, two other high-level points:

1. It would be really nice if Spark had an “executor plugin” API [https://issues.apache.org/jira/browse/SPARK-24918]. It’s nice to have instrumentation that can live entirely outside the Spark codebase [https://github.com/squito/spark-memory], but with dynamic allocation there is no easy way, with the current APIs, to get something to execute every time an executor starts. (For users with memory issues -- you might be interested in trying out my tool with a patched version of Spark.)

2. Metaspace uses about 200 MB on the driver, and it’s all offheap in Java 8. That’s not a ton, but it’s something I really hadn’t considered before when looking at offheap memory problems, and it’s big enough to matter.

More details about Netty:

Netty maintains its own pools of memory for performance [https://netty.io/wiki/reference-counted-objects.html]. By default, Spark asks Netty to use offheap memory for these pools (configurable with “spark.shuffle.io.preferDirectBufs”). Furthermore, Netty by default ties the size of the pool to the number of IO threads, to minimize thread contention.

While this is meant to save the cost of allocating memory, in practice, with Spark’s configuration, it seems to result in a lot of wasted memory. First, Spark creates multiple independent pools. Depending on the configuration, and on whether it’s the executor or the driver, there are different pools for:

1. RPC Client
2. RPC Server
3. BlockTransfer Client
4. BlockTransfer Server
5. ExternalShuffle Client

Memory use can spike very quickly for an incoming burst of messages. Because the pools grow in 16 MB chunks, the growth isn’t necessarily related to the amount of data received: a burst of tiny messages, processed by 8 IO threads, will lead to a minimum of 8 * 16 MB = 128 MB in the respective pool, even though the pool is almost entirely empty [https://issues.apache.org/jira/browse/SPARK-24920]. Spark limits each of these pools to no more than 8 IO threads, but given that 4 different “services” are active, the total number of IO threads is often 32.

Even when Spark configures Netty to use offheap memory, we still see the pools using a significant amount of onheap memory -- it may just be the message encoder choosing onheap buffers [https://issues.apache.org/jira/browse/SPARK-24938].

Netty exposes the memory use of its pools, which you can *poll* [https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PoolArenaMetric.java].
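For example, here is a minimal sketch of that polling, assuming Netty 4.1; getting a handle on Spark’s internal allocator instances takes reflection or a patched build, so take this as illustration rather than something you can drop in as-is:

    import java.util.List;
    import io.netty.buffer.PoolArenaMetric;
    import io.netty.buffer.PooledByteBufAllocator;

    public class NettyPoolPoller {
      // Point-in-time view of one allocator's direct (offheap) arenas.
      static void logDirectPoolUsage(PooledByteBufAllocator alloc) {
        List<PoolArenaMetric> arenas = alloc.metric().directArenas();
        long activeBytes = 0;
        long activeAllocs = 0;
        for (PoolArenaMetric arena : arenas) {
          activeBytes += arena.numActiveBytes();      // bytes handed out to callers
          activeAllocs += arena.numActiveAllocations();
        }
        // usedDirectMemory() is what Netty has reserved from the OS (the pool
        // size), which can be far larger than the bytes actually in use.
        System.out.println(arenas.size() + " direct arenas, reserved="
            + alloc.metric().usedDirectMemory() + " bytes, active="
            + activeBytes + " bytes in " + activeAllocs + " allocations");
      }
    }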
However, polling is imprecise -- if you see a largely unused pool, is that because it was nearly full and then freed right away, or because it grew much more than it ever needed to? Similarly, there are many more aspects of Netty’s state which are hard to monitor, like:

- The “quantile” of each chunk in the chunk lists [https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PoolArena.java#L55-L60]
- The source of pool usage -- is it inbound data read off the socket, or messages that Spark is encoding?
- Getting notifications when high-water marks are passed
- How many messages are sitting in the ChannelOutboundBuffer waiting to be processed [related: https://issues.apache.org/jira/browse/SPARK-24801]

Note that Spark can hold onto Netty’s buffers well beyond receiving and decoding data, the steps we usually associate with the network layer. For example, when fetching shuffle blocks, after the bytes have been fetched we create a ByteBufInputStream from them and feed that into the shuffle logic; the buffers are only returned to the pool once they have been fully read (there’s a small sketch of this at the end of this mail).

There are some more minor details, but I won’t go into those here, as this is already long enough; I’ll just open specific JIRAs with the “memory-analysis” label. I’m filing a bunch of issues, and I think they’re mostly up for grabs unless they have specific comments. It is worth noting, though, that for a lot of them the code changes are very small -- the real work is trying the changes out with actual workloads and reporting on the results -- so they might be hard for folks without the ability to run experiments on a cluster.

thanks,
Imran
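P.S. The sketch mentioned above -- a simplified, hypothetical version of the shuffle-fetch pattern, not the actual Spark code (assuming Netty 4.1’s ByteBufInputStream, which takes a releaseOnClose flag):

    import java.io.IOException;
    import java.io.InputStream;
    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufInputStream;

    class ShuffleFetchSketch {
      // The fetched block arrives as a pooled, refcounted ByteBuf. With
      // releaseOnClose=true, the buffer goes back to Netty's pool only when
      // the stream is closed -- possibly long after the network layer is done.
      static InputStream openBlock(ByteBuf fetchedBlock) {
        return new ByteBufInputStream(fetchedBlock, true /* releaseOnClose */);
      }

      // Downstream shuffle logic reads at its own pace; until close() runs,
      // the chunk backing this buffer can't be freed by the pool.
      static void consume(InputStream in) throws IOException {
        try (InputStream stream = in) {
          byte[] scratch = new byte[8192];
          while (stream.read(scratch) != -1) {
            // deserialize shuffle records here
          }
        }
      }
    }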