[
https://issues.apache.org/jira/browse/HADOOP-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640158#action_12640158
]
Jothi Padmanabhan commented on HADOOP-4396:
-------------------------------------------
One difference between LocalFileSystem (LFS) and RawLocalFileSystem (RFS) is
that all reads and writes from LFS arrive at the RFS layer in chunks of 512
bytes. I tried to mimic this behavior in IFileInputStream and
IFileOutputStream by reading and writing in 1 KB chunks, and the performance
degradation disappeared. What I did was something like
{code}
// Chunked write: forward the caller's buffer to the underlying stream
// in pieces of at most 1 KB, mimicking the 512-byte chunking that LFS
// performs before data reaches the RFS layer.
public void write(byte[] b, int off, int len) throws IOException {
  int bytesWritten = 0;
  while (bytesWritten < len) {
    int bytesToWrite = Math.min(len - bytesWritten, 1024);
    out.write(b, off + bytesWritten, bytesToWrite);
    bytesWritten += bytesToWrite;
  }
}
{code}
The read path is chunked the same way (a sketch follows below). Any thoughts on why this should work?
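For reference, a minimal sketch of the read counterpart under the same assumptions; the field name {{in}} and the exact IFileInputStream method signature here are illustrative, not the actual patch:
{code}
// Chunked read sketch: pull from the underlying stream in pieces of at
// most 1 KB. InputStream.read may return fewer bytes than requested,
// so loop until len bytes are read or EOF is hit.
public int read(byte[] b, int off, int len) throws IOException {
  int bytesRead = 0;
  while (bytesRead < len) {
    int bytesToRead = Math.min(len - bytesRead, 1024);
    int n = in.read(b, off + bytesRead, bytesToRead);
    if (n < 0) {
      return (bytesRead > 0) ? bytesRead : -1; // EOF
    }
    bytesRead += n;
  }
  return bytesRead;
}
{code}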
> sort on 400 nodes is now slower than in 18
> ------------------------------------------
>
> Key: HADOOP-4396
> URL: https://issues.apache.org/jira/browse/HADOOP-4396
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Jothi Padmanabhan
> Assignee: Jothi Padmanabhan
> Priority: Blocker
> Fix For: 0.19.0
>
>
> Sort on 400 nodes on Hadoop release 18 takes about 29 minutes, but with the
> 19 branch it takes about 32 minutes. This behavior is consistent.