[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551977 ]
Mukund Madhugiri commented on HADOOP-1707:
------------------------------------------

I ran the sort benchmark on 500 nodes. Here is the data:

trunk:
* random writer: 0.405 hrs
* sort: 1.508 hrs
* sort validation: 0.333 hrs

trunk + patch:
* random writer: 0.534 hrs
* sort: 1.808 hrs
* sort validation: 0.408 hrs

During the sort reduce phase I observed some errors, but the sort eventually succeeded:
* java.io.IOException: Could not get block locations. Aborting...
* java.io.IOException: All datanodes are bad. Aborting...

> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer12.patch, clientDiskBuffer14.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html
>
> The DFS client currently uses a staging file on local disk to cache all user writes to a file. When the staging file accumulates one block's worth of data, its contents are flushed to an HDFS datanode. These operations occur sequentially.
> A simple optimization, allowing the user to write to a second staging file while the contents of the first are simultaneously uploaded to HDFS, would improve file-upload performance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
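For illustration, here is a minimal sketch of the double-buffering idea the quoted description outlines, in plain Java. It is not taken from any of the attached patches: the class name DoubleBufferedWriter, the uploadToDatanode() stand-in, and the BLOCK_SIZE constant are all hypothetical, and in-memory byte buffers stand in for the on-disk staging files.

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Illustrative sketch only (not from the patch): the client fills one
 * staging buffer while a background thread uploads the previously filled
 * one, overlapping the two phases that currently run sequentially.
 */
public class DoubleBufferedWriter implements AutoCloseable {
    // Hypothetical block size for the sketch; the HDFS default was 64 MB.
    private static final int BLOCK_SIZE = 4 * 1024 * 1024;

    // Single-slot handoff: at most one full block waits for upload,
    // which is exactly the "two staging files" degree of overlap.
    private final BlockingQueue<byte[]> fullBlocks = new ArrayBlockingQueue<>(1);
    private ByteArrayOutputStream current = new ByteArrayOutputStream(BLOCK_SIZE);
    private final Thread uploader;
    private boolean closed = false;

    public DoubleBufferedWriter() {
        uploader = new Thread(() -> {
            try {
                while (true) {
                    byte[] block = fullBlocks.take();
                    if (block.length == 0) break;   // empty array = poison pill
                    uploadToDatanode(block);        // stand-in for the DFS transfer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "block-uploader");
        uploader.start();
    }

    /** User-facing write; blocks only if the uploader is a full block behind. */
    public synchronized void write(byte[] data, int off, int len) throws InterruptedException {
        current.write(data, off, len);
        // For simplicity, a block may exceed BLOCK_SIZE if one write spans the boundary.
        if (current.size() >= BLOCK_SIZE) {
            fullBlocks.put(current.toByteArray());           // hand off; waits if slot is occupied
            current = new ByteArrayOutputStream(BLOCK_SIZE); // fresh staging buffer
        }
    }

    private static void uploadToDatanode(byte[] block) {
        // Placeholder: a real client would stream this block to a datanode pipeline.
        System.out.println("uploading block of " + block.length + " bytes");
    }

    @Override
    public synchronized void close() throws InterruptedException {
        if (closed) return;
        closed = true;
        if (current.size() > 0) fullBlocks.put(current.toByteArray()); // flush partial block
        fullBlocks.put(new byte[0]); // poison pill stops the uploader
        uploader.join();
    }
}
{code}

The single-slot queue keeps the writer at most one block ahead of the network. A real implementation would also need to surface upload failures back to the writing thread, which is presumably where errors like the "All datanodes are bad" exceptions above come into play.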