[ http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12443028 ]

Sameer Paranjpye commented on HADOOP-331:
-----------------------------------------

Doug:

> Still, I agree that we can minimize comparisons & swaps by bucketing these 
> pointers, then sorting each 
> buffer prior to dumping the data it points to.

Yes, this is what I meant.
---------------------

I don't think KeyByteOffset needs a partition#, since the partition# is the same as 
the index of the list in List<KeyByteOffset>[NumReduces] in which the key is 
stored.
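
To make that concrete, roughly the structure I'm picturing (PartitionBuckets and 
its methods are made-up names for illustration; the raw-key comparator is assumed 
to compare serialized keys straight out of the output buffer):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PartitionBuckets {

  // Pointer into the serialized output buffer. No partition# field is needed;
  // the bucket a pointer lives in already identifies its partition.
  static class KeyByteOffset {
    final int keyStart;   // offset of the serialized key in the buffer
    final int keyLength;  // length of the serialized key

    KeyByteOffset(int keyStart, int keyLength) {
      this.keyStart = keyStart;
      this.keyLength = keyLength;
    }
  }

  private final List<KeyByteOffset>[] buckets;

  @SuppressWarnings("unchecked")
  PartitionBuckets(int numReduces) {
    buckets = new List[numReduces];
    for (int i = 0; i < numReduces; i++) {
      buckets[i] = new ArrayList<KeyByteOffset>();
    }
  }

  // On collect(): the partition# only picks the bucket, it is never stored.
  void add(int partition, int keyStart, int keyLength) {
    buckets[partition].add(new KeyByteOffset(keyStart, keyLength));
  }

  // At spill time each bucket is sorted independently, so keys are only ever
  // compared against keys in the same partition.
  void sortForSpill(Comparator<KeyByteOffset> rawKeyComparator) {
    for (List<KeyByteOffset> bucket : buckets) {
      Collections.sort(bucket, rawKeyComparator);
    }
  }
}
{code}

Comparisons and swaps only ever happen within one bucket, and the spill can just 
walk the buckets in partition order.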

> Under this scheme, each spill will result in a separate <PartKey,value> 
> sequence file, right? And probably 
> each such sequence file should be accompanied by a <part,position> index 
> file. That way, if there's only 
> one spill, it can be used directly. Note that block compression complicates 
> random access. Perhaps we 
> should add a SequenceFile#Writer.flush() method that, when block-compression 
> is used, resets the 
> compressor and emits a sync, creating a seekable offset.

+1
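
For the <part,position> index, one fixed-size record per partition should be 
enough. A rough sketch of a writer (names and on-disk layout are invented for 
illustration; I've also written a length next to each position, which is 
redundant but makes the reader trivial):

{code}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SpillIndexWriter {

  // Writes one fixed-size record per partition: <partition, start, length>.
  // A reader serving reduce r seeks straight to record r, reads the offsets,
  // and can then serve that byte range of the spill file directly.
  public static void writeIndex(String indexFile, long[] partitionStarts,
                                long spillLength) throws IOException {
    DataOutputStream out = new DataOutputStream(new FileOutputStream(indexFile));
    try {
      for (int part = 0; part < partitionStarts.length; part++) {
        long start = partitionStarts[part];
        long end = (part + 1 < partitionStarts.length)
            ? partitionStarts[part + 1] : spillLength;
        out.writeInt(part);
        out.writeLong(start);
        out.writeLong(end - start);
      }
    } finally {
      out.close();
    }
  }
}
{code}

With block compression, each partition's start would need to be one of the sync 
points emitted by the proposed SequenceFile.Writer.flush(), so that a reader can 
begin decompressing there.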

Another minor optimization would be to include the partition# only in the first 
key of each partition, setting it to 0 in all succeeding keys.
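
Concretely, dumping one partition's sorted records might look something like this 
(PartKey, RawWriter and PartitionSpiller are placeholders, not existing classes):

{code}
import java.io.IOException;

class PartKey {
  int partition;   // meaningful only for the first key of a partition
  byte[] rawKey;   // the serialized map output key

  PartKey(int partition, byte[] rawKey) {
    this.partition = partition;
    this.rawKey = rawKey;
  }
}

interface RawWriter {
  void append(PartKey key, byte[] rawValue) throws IOException;
}

class PartitionSpiller {
  // Only the first record carries the real partition#; the index file
  // (and the ordering) make it redundant for the rest.
  static void spillPartition(RawWriter out, int partition,
                             byte[][] sortedKeys, byte[][] values)
      throws IOException {
    for (int i = 0; i < sortedKeys.length; i++) {
      int p = (i == 0) ? partition : 0;
      out.append(new PartKey(p, sortedKeys[i]), values[i]);
    }
  }
}
{code}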


> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
>                 Key: HADOOP-331
>                 URL: http://issues.apache.org/jira/browse/HADOOP-331
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.2
>            Reporter: eric baldeschwieler
>         Assigned To: Devaraj Das
>
> The current strategy of writing a file per target map is consuming a lot of 
> unused buffer space (causing out of memory crashes) and puts a lot of burden 
> on the FS (many opens, inodes used, etc).  
> I propose that we write a single file containing all output and also write an 
> index file IDing which byte range in the file goes to each reduce.  This will 
> remove the issue of buffer waste, address scaling issues with number of open 
> files and generally set us up better for scaling.  It will also have 
> advantages with very small inputs, since the buffer cache will reduce the 
> number of seeks needed and the data serving node can open a single file and 
> just keep it open rather than needing to do directory and open ops on every 
> request.
> The only issue I see is that in cases where the task output is substantially 
> larger than its input, we may need to spill multiple times.  In this case, we 
> can do a merge after all spills are complete (or during the final spill).
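
For what it's worth, the serving side then becomes an index lookup plus a seek 
into the single file. A sketch (the index record layout is assumed to be the 
<part, position, length> triple from the writer sketch above; none of this is the 
actual implementation):

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class MapOutputServer {

  // Assumed index record layout: <int partition, long start, long length>.
  private static final int RECORD_SIZE = 4 + 8 + 8;

  // Serve the byte range belonging to one reduce from the single output file.
  public static void serveReduce(String dataFile, String indexFile,
                                 int reduce, OutputStream out) throws IOException {
    long start, length;
    RandomAccessFile index = new RandomAccessFile(indexFile, "r");
    try {
      index.seek((long) reduce * RECORD_SIZE + 4);  // skip the partition int
      start = index.readLong();
      length = index.readLong();
    } finally {
      index.close();
    }

    RandomAccessFile data = new RandomAccessFile(dataFile, "r");
    try {
      data.seek(start);
      byte[] buf = new byte[64 * 1024];
      long remaining = length;
      while (remaining > 0) {
        int n = data.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) break;
        out.write(buf, 0, n);
        remaining -= n;
      }
    } finally {
      data.close();
    }
  }
}
{code}

In a real server the data and index files would of course stay open across 
requests, as the description points out.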
