[ 
https://issues.apache.org/jira/browse/SPARK-24356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494058#comment-16494058
 ] 

Ruslan Dautkhanov commented on SPARK-24356:
-------------------------------------------

Another improvement we saw for YARN NodeManagers that could decrease GC 
pressure is to lower io.netty.allocator.maxOrder from its default of 11 down 
to 8, which shrinks netty's pooled buffer chunks from 16 MiB to 2 MiB. 

Thanks to [~mi...@cloudera.com] for helping to identify this one too.
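
For anyone who wants to try the same change, here is a minimal sketch of how 
the flag can be passed to the NM JVM. It assumes the NodeManager picks up 
extra JVM options via YARN_NODEMANAGER_OPTS in yarn-env.sh; adjust to however 
your deployment passes JVM flags - only the -D property itself comes from the 
netty code quoted below:

{code:bash}
# yarn-env.sh: pass the netty allocator setting to the NodeManager JVM.
# io.netty.allocator.maxOrder is the system property read by netty's
# PooledByteBufAllocator; 8 gives 8192 << 8 = 2 MiB chunks.
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -Dio.netty.allocator.maxOrder=8"
{code}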

{quote}
Here is the netty code responsible for the highly underutilized buffers that 
we discussed. Long story short, I think I found the variables that control 
these byte[] arrays referenced by io.netty.buffer.PoolChunk.memory. Check the 
code of http://netty.io/4.0/xref/io/netty/buffer/PooledByteBufAllocator.html: 
lines 39-40 look like:

{code:java}
private static final int DEFAULT_PAGE_SIZE;
private static final int DEFAULT_MAX_ORDER; // 8192 << 11 = 16 MiB per chunk
{code}

A little below you can see:

{code:java}
int defaultPageSize = SystemPropertyUtil.getInt("io.netty.allocator.pageSize", 8192);
... // Some validation
DEFAULT_PAGE_SIZE = defaultPageSize;

int defaultMaxOrder = SystemPropertyUtil.getInt("io.netty.allocator.maxOrder", 11);
... // Some validation
DEFAULT_MAX_ORDER = defaultMaxOrder;
{code}

And then, from the rest of the code in this class, as well as in PoolChunk, 
PoolChunkList and PoolArena, it is clear that the size of the said buffers is 
set as pageSize * (2^maxOrder), with the default values as above. 8192 B * 
(2^11) = 16 MiB, which agrees with the buffer size in the jxray report that I 
previously mentioned.

So it looks like the best way to reduce the amount of memory wasted by these 
underutilized netty buffers is to run the YARN NM JVM with 
"io.netty.allocator.maxOrder" explicitly set to something less than the 
default value of 11. Decreasing this number by 1 halves the amount of memory 
consumed by these buffers. I would suggest starting with a value of 9 or 8 - 
that seems like a reasonable balance between savings and safety.

{quote}
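
As a quick sanity check of the arithmetic in the quoted analysis, here is a 
tiny standalone Java sketch. Nothing in it calls netty; it just evaluates the 
pageSize * 2^maxOrder formula for the default 8192-byte page size:

{code:java}
// Chunk size as derived in the quoted analysis: pageSize * 2^maxOrder.
public class NettyChunkSizes {
    public static void main(String[] args) {
        int pageSize = 8192; // netty default (io.netty.allocator.pageSize)
        for (int maxOrder = 8; maxOrder <= 11; maxOrder++) {
            int chunkSize = pageSize << maxOrder; // pageSize * 2^maxOrder
            System.out.printf("maxOrder=%d -> %d bytes (%d MiB)%n",
                    maxOrder, chunkSize, chunkSize / (1024 * 1024));
        }
    }
}
{code}

Each decrement of maxOrder halves the chunk size, which matches the 
16 MiB -> 2 MiB drop when going from the default 11 down to 8.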

I was surprised to learn that YARN NM actually runs some Spark code (e.g. 
org.apache.spark.network.yarn.YarnShuffleService), so this issue could be 
common to both YARN NM and the Spark shuffle service. However, we did not 
check whether the underutilized netty buffers affect the Spark shuffle service 
too - it might be a good idea to open another jira for that. 

jxray seems to be a great tool for finding issues like these.
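
As a side note on the interning suggestion in the issue description quoted 
below, here is a minimal, self-contained sketch of the idea - illustration 
only, not the actual patch. The resolve() helper and the paths are made up, 
and the fileSystem.normalize() call from the suggested fix is omitted, since 
an already-normal path is stored by java.io.File as-is, at least on a typical 
Unix JVM:

{code:java}
import java.io.File;

public class InternDemo {
    // Build the complete pathname up front and intern it, so every File
    // created for the same path shares one canonical String instance.
    static File resolve(String localDir, int subDirId, String filename) {
        String pathname = localDir + File.separator
                + String.format("%02x", subDirId) + File.separator + filename;
        return new File(pathname.intern());
    }

    public static void main(String[] args) {
        File a = resolve("/data/yarn/local", 0x0a, "shuffle_0_0_0.data");
        File b = resolve("/data/yarn/local", 0x0a, "shuffle_0_0_0.data");
        // Both File.path fields point at the same interned String
        // (true on a JVM where File keeps an already-normal path as-is).
        System.out.println(a.getPath() == b.getPath());
    }
}
{code}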

> Duplicate strings in File.path managed by FileSegmentManagedBuffer
> ------------------------------------------------------------------
>
>                 Key: SPARK-24356
>                 URL: https://issues.apache.org/jira/browse/SPARK-24356
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.3.0
>            Reporter: Misha Dmitriev
>            Priority: Major
>         Attachments: SPARK-24356.01.patch
>
>
> I recently analyzed a heap dump of a YARN NodeManager that was suffering from 
> high GC pressure due to high object churn. The analysis was done with the 
> jxray tool ([www.jxray.com|http://www.jxray.com]) that checks a heap dump for 
> a number of well-known memory issues. One problem that it found in this dump 
> is 19.5% of memory wasted due to duplicate strings. Of these duplicates, more 
> than half come from {{FileInputStream.path}} and {{File.path}}. All the 
> {{FileInputStream}} objects that jxray shows are garbage - it looks like they 
> are used for a very short period and then discarded (I guess there is a 
> separate question of whether that's a good pattern). But {{File}} instances 
> are traceable to 
> {{org.apache.spark.network.buffer.FileSegmentManagedBuffer.file}} field. Here 
> is the full reference chain:
>  
> {code:java}
> ↖java.io.File.path
> ↖org.apache.spark.network.buffer.FileSegmentManagedBuffer.file
> ↖{j.u.ArrayList}
> ↖j.u.ArrayList$Itr.this$0
> ↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.buffers
> ↖{java.util.concurrent.ConcurrentHashMap}.values
> ↖org.apache.spark.network.server.OneForOneStreamManager.streams
> ↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
> ↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
> ↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance
> {code}
>  
> Values of these {{File.path}}'s and {{FileInputStream.path}}'s look very 
> similar, so I think {{FileInputStream}}s are generated by the 
> {{FileSegmentManagedBuffer}} code. Instances of {{File}}, in turn, likely 
> come from 
> [https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java#L258-L263]
>  
> To avoid duplicate strings in {{File.path}}'s in this case, it is suggested 
> that in the above code we create a File with a complete, normalized pathname 
> that has already been interned. This will prevent the code inside 
> {{java.io.File}} from modifying the string, so it will keep the interned copy 
> and pass it on to {{FileInputStream}}. Essentially the current line
> {code:java}
> return new File(new File(localDir, String.format("%02x", subDirId)), filename);{code}
> should be replaced with something like
> {code:java}
> String pathname = localDir + File.separator + String.format(...) + File.separator + filename;
> pathname = fileSystem.normalize(pathname).intern();
> return new File(pathname);{code}
>  


