[ https://issues.apache.org/jira/browse/SPARK-24356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494058#comment-16494058 ]
Ruslan Dautkhanov commented on SPARK-24356:
-------------------------------------------

Another improvement we saw for YARN NodeManagers that could decrease GC pressure is to lower io.netty.allocator.maxOrder from the default of 11 down to 8, which shrinks netty's chunk buffers from 16 MiB to 2 MiB. Thanks to [~mi...@cloudera.com] for helping to identify this one too:

{quote}
Netty code responsible for the highly underutilized buffers that we discussed. Long story short, I think I found the variables that control these byte[] arrays referenced by io.netty.buffer.PoolChunk.memory. Check the code of http://netty.io/4.0/xref/io/netty/buffer/PooledByteBufAllocator.html: lines 39-40 look like:

{code:java}
private static final int DEFAULT_PAGE_SIZE;
private static final int DEFAULT_MAX_ORDER; // 8192 << 11 = 16 MiB per chunk
{code}

A little below you can see:

{code:java}
int defaultPageSize = SystemPropertyUtil.getInt("io.netty.allocator.pageSize", 8192);
... // Some validation
DEFAULT_PAGE_SIZE = defaultPageSize;

int defaultMaxOrder = SystemPropertyUtil.getInt("io.netty.allocator.maxOrder", 11);
... // Some validation
DEFAULT_MAX_ORDER = defaultMaxOrder;
{code}

And from the rest of the code in this class, as well as in PoolChunk, PoolChunkList and PoolArena, it is clear that the size of the said buffers is set to pageSize * (2^maxOrder), with the default values as above: 8192 * (2^11) = 16 MiB, which agrees with the buffer size obtained from the jxray report that I previously mentioned.

So, to reduce the amount of memory wasted by these underutilized netty buffers, it's best to run the Yarn NM JVM with "io.netty.allocator.maxOrder" explicitly set to something less than the default value of 11. Decreasing this number by 1 halves the amount of memory consumed by these buffers. I would suggest starting with a value of 9 or 8 - that seems like a reasonable balance between savings and safety.
{quote}

I was surprised to learn that the YARN NM actually uses some Spark code (e.g. org.apache.spark.network.yarn.YarnShuffleService), so this issue may be common to the YARN NM and the Spark shuffle service. However, we did not check whether the underutilized netty buffers also affect the Spark shuffle service - it might be a good idea to open another jira for that. jxray seems to be a great tool for finding issues like these.

> Duplicate strings in File.path managed by FileSegmentManagedBuffer
> ------------------------------------------------------------------
>
>                 Key: SPARK-24356
>                 URL: https://issues.apache.org/jira/browse/SPARK-24356
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.3.0
>            Reporter: Misha Dmitriev
>            Priority: Major
>         Attachments: SPARK-24356.01.patch
>
>
> I recently analyzed a heap dump of a Yarn Node Manager that was suffering from
> high GC pressure due to high object churn. The analysis was done with the jxray
> tool ([www.jxray.com|http://www.jxray.com/]), which checks a heap dump for a
> number of well-known memory issues. One problem that it found in this dump is
> 19.5% of memory wasted due to duplicate strings. Of these duplicates, more
> than half come from {{FileInputStream.path}} and {{File.path}}. All the
> {{FileInputStream}} objects that jxray shows are garbage - it looks like they
> are used for a very short period and then discarded (I guess there is a
> separate question of whether that's a good pattern). But the {{File}} instances
> are traceable to the
> {{org.apache.spark.network.buffer.FileSegmentManagedBuffer.file}} field.
> Here is the full reference chain:
>
> {code:java}
> ↖java.io.File.path
> ↖org.apache.spark.network.buffer.FileSegmentManagedBuffer.file
> ↖{j.u.ArrayList}
> ↖j.u.ArrayList$Itr.this$0
> ↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.buffers
> ↖{java.util.concurrent.ConcurrentHashMap}.values
> ↖org.apache.spark.network.server.OneForOneStreamManager.streams
> ↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
> ↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
> ↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance
> {code}
>
> The values of these {{File.path}}'s and {{FileInputStream.path}}'s look very
> similar, so I think the {{FileInputStream}}s are generated by the
> {{FileSegmentManagedBuffer}} code. Instances of {{File}}, in turn, likely
> come from
> [https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java#L258-L263]
>
> To avoid duplicate strings in {{File.path}}'s in this case, it is suggested
> that in the above code we create a File with a complete, normalized pathname
> that has already been interned. This will prevent the code inside
> {{java.io.File}} from modifying this string, so it will keep the
> interned copy and pass it on to {{FileInputStream}}. Essentially, the current
> line
>
> {code:java}
> return new File(new File(localDir, String.format("%02x", subDirId)), filename);
> {code}
>
> should be replaced with something like
>
> {code:java}
> String pathname = localDir + File.separator + String.format(...) + File.separator + filename;
> pathname = fileSystem.normalize(pathname).intern();
> return new File(pathname);
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
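The chunk-size arithmetic quoted in the comment (chunkSize = pageSize << maxOrder) can be checked with a small standalone sketch. This is plain Java mirroring the defaults quoted from PooledByteBufAllocator, not netty code itself; the class and method names are illustrative:

```java
public class NettyChunkSize {
    // Default page size quoted from io.netty.buffer.PooledByteBufAllocator
    static final int DEFAULT_PAGE_SIZE = 8192;

    // Chunk size is pageSize * 2^maxOrder, i.e. pageSize << maxOrder.
    static long chunkSize(int pageSize, int maxOrder) {
        return (long) pageSize << maxOrder;
    }

    public static void main(String[] args) {
        System.out.println(chunkSize(DEFAULT_PAGE_SIZE, 11)); // 16777216 = 16 MiB (the default)
        System.out.println(chunkSize(DEFAULT_PAGE_SIZE, 9));  // 4194304  = 4 MiB
        System.out.println(chunkSize(DEFAULT_PAGE_SIZE, 8));  // 2097152  = 2 MiB
    }
}
```

In practice the knob is set on the JVM command line, e.g. adding -Dio.netty.allocator.maxOrder=8 to the Yarn NM's JVM options, as the quoted comment suggests.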
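The interning suggestion from the description can be sketched as a self-contained method. Note that java.io's FileSystem class is package-private, so this sketch normalizes by round-tripping through File.getPath() instead of the fileSystem.normalize(...) call shown above; the class and method names here are illustrative, not the actual Spark patch:

```java
import java.io.File;

public class ShufflePaths {
    // Sketch of the suggested fix: build the full pathname up front,
    // normalize it, and intern it before constructing the File. Because
    // java.io.File keeps an already-normalized pathname unchanged, the
    // File (and any FileInputStream opened from it) ends up holding the
    // single interned String instead of a fresh duplicate per buffer.
    static File getFile(File localDir, int subDirId, String filename) {
        String pathname = localDir.getPath() + File.separator
                + String.format("%02x", subDirId)
                + File.separator + filename;
        // new File(...).getPath() applies the platform normalization that
        // the package-private fileSystem.normalize(...) would perform.
        pathname = new File(pathname).getPath().intern();
        return new File(pathname);
    }
}
```

Whether interning is the right trade-off (the intern pool has its own costs) is a separate question; the sketch only illustrates the mechanism proposed in the description.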