[ http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12443028 ]

Sameer Paranjpye commented on HADOOP-331:
-----------------------------------------
Doug:

> Still, I agree that we can minimize comparisons & swaps by bucketing these
> pointers, then sorting each buffer prior to dumping the data it points to.

Yes, this is what I meant.

---------------------

KeyByteOffset doesn't need a partition#, since the partition# is the same as
the index of the list in List<KeyByteOffset>[NumReduces] in which the key is
stored.

> Under this scheme, each spill will result in a separate <PartKey, value>
> sequence file, right? And probably each such sequence file should be
> accompanied by a <part, position> index file. That way, if there's only one
> spill, it can be used directly. Note that block compression complicates
> random access. Perhaps we should add a SequenceFile#Writer.flush() method
> that, when block compression is used, resets the compressor and emits a
> sync, creating a seekable offset.

+1

Another minor optimization could be to include the partition# only in the
first key of each partition, setting it to 0 in all succeeding keys.

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
>                 Key: HADOOP-331
>                 URL: http://issues.apache.org/jira/browse/HADOOP-331
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.2
>            Reporter: eric baldeschwieler
>         Assigned To: Devaraj Das
>
> The current strategy of writing a file per target map is consuming a lot of
> unused buffer space (causing out-of-memory crashes) and puts a lot of burden
> on the FS (many opens, inodes used, etc.).
> I propose that we write a single file containing all output and also write
> an index file identifying which byte range in the file goes to each reduce.
> This will remove the issue of buffer waste, address scaling issues with the
> number of open files, and generally set us up better for scaling.
> It will also have advantages with very small inputs, since the buffer cache
> will reduce the number of seeks needed, and the data-serving node can open a
> single file and just keep it open rather than doing directory and open ops
> on every request.
> The only issue I see is that in cases where the task output is substantially
> larger than its input, we may need to spill multiple times. In this case, we
> can do a merge after all spills are complete (or during the final spill).
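The scheme under discussion can be sketched in a few lines of Java: bucket map
outputs by reduce partition, sort each bucket, append all buckets to a single
stream, and record a <startOffset, length> index entry per partition. This is a
minimal illustration only, not Hadoop's actual implementation; the class and
method names (SingleSpillSketch, KV, spill, demo) are invented for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SingleSpillSketch {

    // Hypothetical record type. Note there is no partition# field: as the
    // comment observes, the partition is implied by which bucket holds the key.
    static class KV implements Comparable<KV> {
        final String key, value;
        KV(String key, String value) { this.key = key; this.value = value; }
        public int compareTo(KV o) { return key.compareTo(o.key); }
    }

    // Sort each partition's bucket, append every bucket to ONE output stream,
    // and return a <startOffset, length> index entry per reduce partition.
    static long[][] spill(List<KV>[] buckets, DataOutputStream out)
            throws IOException {
        long[][] index = new long[buckets.length][2];
        for (int part = 0; part < buckets.length; part++) {
            Collections.sort(buckets[part]);   // sort keys within the partition
            long start = out.size();
            for (KV kv : buckets[part]) {
                out.writeUTF(kv.key);
                out.writeUTF(kv.value);
            }
            index[part][0] = start;            // byte range served to reduce #part
            index[part][1] = out.size() - start;
        }
        return index;
    }

    // Tiny example: four records hashed into three partitions, one spill.
    static long[][] demo() throws IOException {
        int numReduces = 3;
        @SuppressWarnings("unchecked")
        List<KV>[] buckets = new List[numReduces];
        for (int i = 0; i < numReduces; i++) buckets[i] = new ArrayList<>();
        for (String w : new String[] {"banana", "apple", "cherry", "date"}) {
            int part = (w.hashCode() & Integer.MAX_VALUE) % numReduces;
            buckets[part].add(new KV(w, "1"));
        }
        return spill(buckets, new DataOutputStream(new ByteArrayOutputStream()));
    }

    public static void main(String[] args) throws IOException {
        long[][] index = demo();
        for (int part = 0; part < index.length; part++) {
            System.out.println("partition " + part + ": offset=" + index[part][0]
                    + " length=" + index[part][1]);
        }
    }
}
```

Because the index entries are contiguous byte ranges over one file, a server
can satisfy a request for any reduce's data with a single open and a seek,
which is the scaling win the issue describes.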