[ https://issues.apache.org/jira/browse/MAPREDUCE-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Payne updated MAPREDUCE-6678:
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.8.0
           Status: Resolved  (was: Patch Available)

Thanks [~nroberts]. I committed these changes to trunk, branch-2, and branch-2.8.

> Allow ShuffleHandler readahead without drop-behind
> --------------------------------------------------
>
>                 Key: MAPREDUCE-6678
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6678
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>             Fix For: 2.8.0
>
>         Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead
> (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the
> ShuffleHandler.
> It would be beneficial if these were separately configurable.
> - Running without readahead can lead to significant seek storms caused by
> large numbers of sendfiles() competing with one another.
> - However, running with drop-behind can also lead to seek storms because
> there are cases where the server can successfully write the shuffle bytes to
> the network, BUT the client doesn't want the bytes right now (MergeManager
> wants to WAIT is an example) so it ignores them and asks for them again a bit
> later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on
> mapreduce.shuffle.readahead.bytes==0, leaving
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.
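
For readers who want to see the split described in the issue, here is a minimal sketch of how the two properties could be read once readahead and drop-behind are decoupled. It is not the committed patch: the class ShuffleCachePolicy, its method names, and the default values are assumptions made for illustration, and a plain Map stands in for the Hadoop Configuration object. Treating a negative readahead value as "disabled" is also an assumption; the issue only mentions the ==0 case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT the actual ShuffleHandler code. It only models how
// the two properties named in the issue are interpreted after the change.
final class ShuffleCachePolicy {
    static final String MANAGE_OS_CACHE = "mapreduce.shuffle.manage.os.cache";
    static final String READAHEAD_BYTES = "mapreduce.shuffle.readahead.bytes";

    // Assumed defaults, used here for illustration only.
    static final boolean ASSUMED_DEFAULT_MANAGE_OS_CACHE = true;
    static final int ASSUMED_DEFAULT_READAHEAD_BYTES = 4 * 1024 * 1024;

    private final Map<String, String> conf;

    ShuffleCachePolicy(Map<String, String> conf) {
        this.conf = conf;
    }

    // Number of bytes to read ahead of the current shuffle offset.
    int readaheadBytes() {
        String value = conf.get(READAHEAD_BYTES);
        return value == null
            ? ASSUMED_DEFAULT_READAHEAD_BYTES
            : Integer.parseInt(value.trim());
    }

    // Readahead (POSIX_FADV_WILLNEED) is driven only by the byte count:
    // 0 turns it off. Negative values are treated as off too (assumption).
    boolean readaheadEnabled() {
        return readaheadBytes() > 0;
    }

    // Drop-behind (POSIX_FADV_DONTNEED) is driven only by manage.os.cache.
    boolean dropBehindEnabled() {
        String value = conf.get(MANAGE_OS_CACHE);
        return value == null
            ? ASSUMED_DEFAULT_MANAGE_OS_CACHE
            : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        // The combination the issue asks for: readahead on, drop-behind off.
        Map<String, String> conf = new HashMap<>();
        conf.put(MANAGE_OS_CACHE, "false");
        conf.put(READAHEAD_BYTES, "4194304");

        ShuffleCachePolicy policy = new ShuffleCachePolicy(conf);
        System.out.println("readahead enabled:   " + policy.readaheadEnabled());
        System.out.println("drop-behind enabled: " + policy.dropBehindEnabled());
    }
}
```

With this reading, setting mapreduce.shuffle.manage.os.cache to false while leaving mapreduce.shuffle.readahead.bytes nonzero keeps readahead active and avoids evicting shuffle data that a reducer may request again, which is the case the issue describes with MergeManager.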