Doug's calculation shows that the total gain can be only 1/3: of the 27s/100MB, 15 are unavoidable, and taking advantage of largely pre-sorted input reduces the sort/shuffle overhead from 12 to about 3, so the best case is 27 -> 18.
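Spelling the arithmetic out (my reading of the per-100MB figures quoted below; the ~3s residual overhead is an estimate):

    unavoidable (a + e + f + g):  1 + 1 + 2 + 11 = 15
    sort/shuffle (b + c + d):     1 + 10 + 1     = 12   -> total 27
    presorted best case:          overhead ~3           -> total 18
    maximum gain:                 (27 - 18) / 27 = 1/3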

Does this model assume that the size of the output of reduce is similar to the size of the input?

An important class of applications (mentioned in this thread before) uses two inputs:

  -- M ("master file"): very large, presorted, and not changing from run to run
  -- D ("details file"): smaller, different from run to run, not necessarily presorted

and the output size is proportional to the size of D.
In this case the gain from "no-sort" may be much higher, as the 13s of "transfer and write" to DFS (f and g) are applied to the smaller amount of data, while the 12s of sort-and-shuffle-related work (b-d) are saved on the larger data.
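To make the master/details pattern concrete, here is a minimal plain-Java sketch (illustrative only, not Hadoop code; it assumes one unique key per line): only the small D side is sorted in memory, while the huge presorted M is streamed once and never re-sorted, so the work proportional to M is a single sequential read.

    import java.io.*;
    import java.util.*;

    public class MasterDetailsJoin {
        // Streams presorted master lines past a small, in-memory sorted list
        // of detail keys, emitting master lines whose key appears in details.
        public static void join(BufferedReader master, List<String> details,
                                Writer out) throws IOException {
            Collections.sort(details);              // cheap: |D| << |M|
            Iterator<String> it = details.iterator();
            String want = it.hasNext() ? it.next() : null;
            String m;
            while (want != null && (m = master.readLine()) != null) {
                int cmp = m.compareTo(want);
                while (cmp > 0) {                   // detail key absent; skip it
                    if (!it.hasNext()) { out.flush(); return; }
                    want = it.next();
                    cmp = m.compareTo(want);
                }
                if (cmp == 0) {                     // match: emit, advance details
                    out.write(m);
                    out.write('\n');
                    want = it.hasNext() ? it.next() : null;
                }                                   // cmp < 0: advance master only
            }
            out.flush();
        }
    }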


On Jan 25, 2007, at 5:21 PM, Doug Cutting (JIRA) wrote:


[ https://issues.apache.org/jira/browse/HADOOP-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467717 ]

Doug Cutting commented on HADOOP-939:
-------------------------------------

I suspect that most of the performance gains to be had by declaring input to be sorted can also be had by using heuristics that also speed things up when input is only nearly sorted. (By "nearly sorted" I mean things like merging a set of updates into a sorted database, e.g., the crawl db update task in Nutch.)
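One such heuristic, sketched below (my illustration, not existing Hadoop code): rather than sorting the whole map-output buffer, find the ascending runs it already contains and merge those. Fully sorted input is a single run and costs nothing extra; nearly sorted input yields only a few runs.

    import java.util.ArrayList;
    import java.util.List;

    public class NaturalRuns {
        // Returns the start index of each maximal non-decreasing run in
        // 'keys'. The runs can then be k-way merged instead of sorting
        // the buffer from scratch.
        static List<Integer> runStarts(String[] keys) {
            List<Integer> starts = new ArrayList<>();
            if (keys.length == 0) return starts;
            starts.add(0);
            for (int i = 1; i < keys.length; i++)
                if (keys[i].compareTo(keys[i - 1]) < 0)
                    starts.add(i);                  // order breaks: new run
            return starts;
        }
    }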

Eric Baldeschwieler proposed a simple model for MapReduce performance. If you assume that disks can read and write at 100MB/s, and that nodes can talk within rack at 100MB/s (Gb/s) and to nodes in another rack at 10MB/s, then a MapReduce requires the following number of seconds per 100MB. (Note that this assumes various sort optimizations that are already in progress, where map outputs are buffered and sorted before they're spilled to the local disk on map nodes, and reduce inputs are buffered and merged before they're spilled to the local disk on the reduce node, so that, in many cases, reduce can proceed without an explicit sort stage but simply by merging a set of already sorted input files from the local disk.)

a.  1 read input data from local drive on map node
[ map ]
b.  1 write batches of sorted output data to temporary file on map node
c. 10 shuffle batches of sorted data to reduce node
d.  1 write batches of sorted data to reduce node
[ reduce ]
e.  1 write one copy of output locally
f.  2 transfer and write one copy to another node on the same rack
g. 11 transfer and write one copy to an off-rack node

So the total is 27s/100MB. Only two of those are really sort-specific, (b) and (d). 15 (a, e, f, and g; more than half) are unavoidable.

The biggest chunk of fat to go after for pre-sorted input is (c). This can be eliminated if maps can be placed near reduces. For example, tasktrackers might report the size of each partition they're generating and the jobtracker might use this to schedule reduces on racks which already have a lot of their input.
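A toy version of that scheduling idea (hypothetical; nothing like this exists in the jobtracker today): given the reported bytes of each partition per rack, greedily place each reduce on the rack that already holds most of its input, turning the 10s off-rack shuffle into a ~1s in-rack one.

    public class ReducePlacement {
        // bytesOnRack[p][r] = bytes of reduce partition p already on rack r,
        // as reported by the tasktrackers generating map output.
        static int[] assignRacks(long[][] bytesOnRack) {
            int[] rackFor = new int[bytesOnRack.length];
            for (int p = 0; p < bytesOnRack.length; p++) {
                int best = 0;
                for (int r = 1; r < bytesOnRack[p].length; r++)
                    if (bytesOnRack[p][r] > bytesOnRack[p][best])
                        best = r;
                rackFor[p] = best;              // run reduce p nearest its data
            }
            return rackFor;
        }
    }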


No-sort optimization
--------------------

                Key: HADOOP-939
                URL: https://issues.apache.org/jira/browse/HADOOP-939
            Project: Hadoop
         Issue Type: New Feature
         Components: mapred
        Environment: all
           Reporter: Doug Judd

There should be a way to tell the mapred framework that the output of the map() phase will already be sorted. The Reduce phase can just merge the intermediate files together without sorting.
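For illustration, the kind of merge the reduce phase would do (a minimal sketch in plain Java, assuming line-per-record text runs; not the actual mapred merge code): a priority queue k-way merges the already-sorted intermediate files, so reduce sees keys in order without a separate sort pass.

    import java.io.*;
    import java.util.*;

    public class NoSortMerge {
        public static void merge(List<BufferedReader> runs, Writer out)
                throws IOException {
            PriorityQueue<Map.Entry<String, BufferedReader>> heap =
                new PriorityQueue<>(Map.Entry.comparingByKey());
            for (BufferedReader r : runs) {     // prime with one line per run
                String line = r.readLine();
                if (line != null) heap.add(new AbstractMap.SimpleEntry<>(line, r));
            }
            while (!heap.isEmpty()) {
                Map.Entry<String, BufferedReader> e = heap.poll();
                out.write(e.getKey());
                out.write('\n');
                String next = e.getValue().readLine();  // refill from same run
                if (next != null)
                    heap.add(new AbstractMap.SimpleEntry<>(next, e.getValue()));
            }
            out.flush();
        }
    }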

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

