[ https://issues.apache.org/jira/browse/MAPREDUCE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934339#comment-13934339 ]

Chris Nauroth commented on MAPREDUCE-5791:
------------------------------------------

A few comments:
# {{ShuffleHandler}}
## {{SHUFFLE_BUFFER_SIZE}} is potentially confusing, because it reuses {{io.file.buffer.size}}, which is already used elsewhere with a different default value. I recommend making this an MR-specific property, like {{mapreduce.shuffle.transfer.buffer.size}}, and documenting it in mapred-default.xml.
# {{TestFadvisedFileRegion}}
## Minor nit: we put the opening { inline, not on a separate line.
## {{out}}, {{inputFile}}, {{targetFile}}, {{target}} and {{in}} should all be closed inside a finally block to guarantee cleanup. The {{IOUtils#cleanup}} method is helpful for this. For {{fileRegion}}, it looks like we ought to call {{releaseExternalResources}}: http://grepcode.com/file/repository.jboss.org/nexus/content/repositories/releases/org.jboss.netty/netty/3.2.0.CR1/org/jboss/netty/channel/DefaultFileRegion.java.
## {{testCustomShuffleTransferCornerCases}}: This can be removed if you don't intend to add test code in here.

> Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5791
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5791
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>         Attachments: MAPREDUCE-5791.patch
>
>
> The transferTo method in org.apache.hadoop.mapred.FadvisedFileRegion uses
> the transferTo method of a FileChannel to transfer data from a disk to a socket.
> This is slow on Windows, slower than on Linux. The reason is that the
> java.nio transferTo method always issues 32K IO requests.
> On Windows, these 32K transfers are not optimal and we don't get the best
> performance from the underlying IO subsystem.
> In order to achieve better performance when reading from the drives, we need
> to read data in bigger chunks, 512K for example.
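
For illustration only, here is a rough sketch of the two ideas discussed above: reading from the {{FileChannel}} in larger, configurable chunks instead of relying on {{transferTo}}, and closing every stream in a finally block. This is not the code from the attached patch. The class name {{ChunkedTransferSketch}}, its methods, and the {{closeQuietly}} helper are hypothetical; {{closeQuietly}} stands in for {{IOUtils#cleanup}} so the example needs only the JDK, and the 512K default is simply the figure suggested in the description.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

/** Hypothetical helper for illustration; not the code in MAPREDUCE-5791.patch. */
class ChunkedTransferSketch {

  /** Assumed chunk size; the issue description suggests 512K as an example. */
  static final int DEFAULT_BUFFER_SIZE = 512 * 1024;

  /**
   * Copies up to count bytes from source, starting at position, into target,
   * issuing reads of at most bufferSize bytes each.
   * Returns the number of bytes actually copied.
   */
  static long transfer(FileChannel source, long position, long count,
      WritableByteChannel target, int bufferSize) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
    long transferred = 0;
    while (transferred < count) {
      buffer.clear();
      long remaining = count - transferred;
      if (remaining < buffer.capacity()) {
        // Do not read past the requested range.
        buffer.limit((int) remaining);
      }
      // Positional read: does not move the channel's own position.
      int read = source.read(buffer, position + transferred);
      if (read <= 0) {
        break; // end of file reached before count bytes were available
      }
      buffer.flip();
      // Assumes a blocking target; a non-blocking socket would need
      // different handling than this simple loop.
      while (buffer.hasRemaining()) {
        target.write(buffer);
      }
      transferred += read;
    }
    return transferred;
  }

  /** Copies a whole file, closing both streams in a finally block. */
  static long copyFile(File in, File out, int bufferSize) throws IOException {
    FileInputStream input = null;
    FileOutputStream output = null;
    try {
      input = new FileInputStream(in);
      output = new FileOutputStream(out);
      FileChannel source = input.getChannel();
      return transfer(source, 0, source.size(), output.getChannel(), bufferSize);
    } finally {
      // Runs even if the copy throws, so no file handle is leaked.
      closeQuietly(input);
      closeQuietly(output);
    }
  }

  /** JDK-only stand-in for the role IOUtils#cleanup plays in Hadoop code. */
  static void closeQuietly(Closeable c) {
    if (c != null) {
      try {
        c.close();
      } catch (IOException ignored) {
        // Best-effort cleanup: a failure to close must not mask the original error.
      }
    }
  }
}
```

A caller would invoke something like {{ChunkedTransferSketch.copyFile(src, dst, ChunkedTransferSketch.DEFAULT_BUFFER_SIZE)}}. Taking the buffer size as a parameter mirrors the suggestion to expose it as its own MR-specific property rather than reusing {{io.file.buffer.size}}.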