[ http://issues.apache.org/jira/browse/HADOOP-816?page=all ]
Devaraj Das updated HADOOP-816:
-------------------------------
Attachment: 816.patch
Added code in Sort.java that looks at the DFS block size of a file in the
input path, infers the map buffer size from that block size, and sets the
result as a config in the JobConf. (This works because the sort benchmark
spawns maps that each process one DFS block's worth of data.)
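
A rough sketch of the inference step, for illustration only. The comment above
does not state the exact rule the patch uses, so this assumes the buffer is
sized to hold one full block, rounded up to whole megabytes. The class and
method names are hypothetical, and java.util.Properties stands in for the
JobConf so the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

// Hypothetical helper; not the code in 816.patch.
final class MapBufferSizing {
    // Config key named in this comment.
    static final String MAP_BUFFER_KEY = "map.buffer.size.mb";
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Assumption: the map buffer should hold one whole DFS block,
    // so round the block size up to the next full megabyte.
    static int bufferSizeMbFor(long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        long mb = (blockSizeBytes + BYTES_PER_MB - 1) / BYTES_PER_MB;
        return (int) Math.min(mb, Integer.MAX_VALUE);
    }

    // Stores the inferred size under the config key (Properties stands in
    // for the JobConf here).
    static void configure(Properties conf, long blockSizeBytes) {
        conf.setProperty(MAP_BUFFER_KEY,
                Integer.toString(bufferSizeMbFor(blockSizeBytes)));
    }
}
```

With a 64 MB block this stores "64" under map.buffer.size.mb; a block one byte
larger rounds up to "65".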
Added a new config variable, map.buffer.size.mb, in MapTask.java to control
the map buffer size (currently that size is controlled by io.sort.mb). I made
it independent because io.sort.mb also affects the sizes of the buffers
created for reading/writing files, and I wanted to keep those buffer sizes
separate from the map buffer size.
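
To illustrate the separation, here is a minimal sketch of reading the two
settings independently. It is not the MapTask.java code from the patch: the
class name, the method names, and the 100 MB fallback values are assumptions
made for the example, and java.util.Properties again stands in for the JobConf.

```java
import java.util.Properties;

// Hypothetical reader; not the code in 816.patch.
final class MapBufferConfig {
    // Assumed fallback sizes for this sketch only.
    static final int ASSUMED_DEFAULT_MB = 100;
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Size of the map output buffer, taken only from map.buffer.size.mb.
    static long mapBufferBytes(Properties conf) {
        return readMb(conf, "map.buffer.size.mb") * BYTES_PER_MB;
    }

    // Size used for the file read/write buffers, taken only from io.sort.mb.
    static long ioSortBytes(Properties conf) {
        return readMb(conf, "io.sort.mb") * BYTES_PER_MB;
    }

    // Parses a megabyte count, falling back to the assumed default if unset.
    private static long readMb(Properties conf, String key) {
        String raw = conf.getProperty(key, Integer.toString(ASSUMED_DEFAULT_MB));
        return Integer.parseInt(raw.trim());
    }
}
```

Because each method reads its own key, changing io.sort.mb leaves the map
buffer size untouched, and vice versa.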
> Allow the sort benchmark to set a buffersize for the map buffer
> ---------------------------------------------------------------
>
> Key: HADOOP-816
> URL: http://issues.apache.org/jira/browse/HADOOP-816
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
> Assigned To: Devaraj Das
> Attachments: 816.patch
>
>
> Discovered that framework merges are the hotspots where most of the time is
> spent in the sort benchmark. With HADOOP-331, the Map phase may merge its
> spills (this merge was not done before HADOOP-331), and each reduce then
> performs one compulsory merge. It would be good to avoid the merge in the
> Map phase where possible.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira