[ 
http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12443297 ] 
            
Doug Cutting commented on HADOOP-331:
-------------------------------------

> keep the Key objects so that performance doesn't suck if the user doesn't 
> define raw comparators

Note that merge performance will also suffer when the user doesn't define raw 
comparators, but not as badly as sort performance.

One other advantage to using the raw, binary comparator is that we could then 
share the sort logic with SequenceFile.  It currently has a sort algorithm 
that, without too much work, could be exposed as something like:

public static void sort(int[] pointers, int[] starts, int[] lengths,
    byte[] data, WritableComparator order);

The pointers array indicates which values in starts and lengths should be used.
It is permuted, so that, after sorting, the key at data[starts[pointers[i]]] is
never greater than the key at data[starts[pointers[i+1]]].  To use this we'd
dispense with the KeyByteOffset class altogether and simply keep something like:

byte[] data;  // all buffered keys and values
int[] starts;  // the start of each pair
int[] keyLengths; // the length of each key
// values run from starts[i]+keyLengths[i] through starts[i+1].
List<int[]> partPointers;

Then we could sort each part pointer array and then write it.  Each buffered 
entry would have a fixed 12-byte overhead, so memory accounting could be exact.
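A minimal sketch of what that pointer-permuting sort could look like, assuming a simplified stand-in for WritableComparator's raw compare(byte[], int, int, byte[], int, int) method (the class and comparator names here are hypothetical, and a real implementation would use quicksort rather than insertion sort):

```java
import java.util.Arrays;

public class PointerSort {

    // Hypothetical stand-in for WritableComparator's raw compare method.
    interface RawComparator {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    // Permutes pointers so that, afterwards, the key at
    // data[starts[pointers[i]]] is never greater than the key at
    // data[starts[pointers[i+1]]].  Insertion sort, for clarity only.
    static void sort(int[] pointers, int[] starts, int[] lengths,
                     byte[] data, RawComparator order) {
        for (int i = 1; i < pointers.length; i++) {
            int p = pointers[i];
            int j = i - 1;
            while (j >= 0 && order.compare(
                    data, starts[pointers[j]], lengths[pointers[j]],
                    data, starts[p], lengths[p]) > 0) {
                pointers[j + 1] = pointers[j];
                j--;
            }
            pointers[j + 1] = p;
        }
    }

    public static void main(String[] args) {
        // Three keys packed into one buffer: "pear", "apple", "fig".
        byte[] data = "pearapplefig".getBytes();
        int[] starts = {0, 4, 9};
        int[] lengths = {4, 5, 3};
        int[] pointers = {0, 1, 2};

        // Lexicographic byte-wise comparison, as a raw comparator would do.
        RawComparator lex = (b1, s1, l1, b2, s2, l2) -> {
            int n = Math.min(l1, l2);
            for (int k = 0; k < n; k++) {
                int d = (b1[s1 + k] & 0xff) - (b2[s2 + k] & 0xff);
                if (d != 0) return d;
            }
            return l1 - l2;
        };

        sort(pointers, starts, lengths, data, lex);
        System.out.println(Arrays.toString(pointers)); // prints [1, 2, 0]
    }
}
```

Note that the sort never copies key bytes; only the small pointers array moves, which is what makes the fixed per-entry overhead possible.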

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
>                 Key: HADOOP-331
>                 URL: http://issues.apache.org/jira/browse/HADOOP-331
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.2
>            Reporter: eric baldeschwieler
>         Assigned To: Devaraj Das
>
> The current strategy of writing a file per target map is consuming a lot of 
> unused buffer space (causing out of memory crashes) and puts a lot of burden 
> on the FS (many opens, inodes used, etc).  
> I propose that we write a single file containing all output and also write an 
> index file IDing which byte range in the file goes to each reduce.  This will 
> remove the issue of buffer waste, address scaling issues with number of open 
> files and generally set us up better for scaling.  It will also have 
> advantages with very small inputs, since the buffer cache will reduce the 
> number of seeks needed and the data serving node can open a single file and 
> just keep it open rather than needing to do directory and open ops on every 
> request.
> The only issue I see is that in cases where the task output is substantially 
> larger than its input, we may need to spill multiple times.  In this case, we 
> can do a merge after all spills are complete (or during the final spill).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
