[
http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12443043 ]
Sameer Paranjpye commented on HADOOP-331:
-----------------------------------------
Doug:
> But then we cannot use SequenceFile's merger, since keys wouldn't be
> independently comparable, right?
Yes, you're right, they wouldn't. This optimization is probably not worth the
trouble.
-----
To summarize, this is what we appear to have ended up with.
- A couple of key variants
class KeyByteOffset {
WritableComparable key;
int offset;
int length;
}
class PartKey {
int partition;
WritableComparable key;
}
- In RAM data structures
List<KeyByteOffset>[NumReduces] keyLists; // one list of keys per reduce
byte[] valueBuffer; // serialized values are appended to this buffer
- For each <key, value> pair.
Determine partition for the key. Append the key to keyLists[partition], append
the value to valueBuffer.
If #records is greater than #maxRecords or #valueBytes is greater than
#maxValueBytes
OR
We have no more records to process.
SPILL - The spill is a block compressed sequence file of <PartKey, value>
pairs, with a <part, offset> index.
Compressed blocks in this file must not span partitions.
- At the end, if #numSpills > 1, merge spills
> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
> Key: HADOOP-331
> URL: http://issues.apache.org/jira/browse/HADOOP-331
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.3.2
> Reporter: eric baldeschwieler
> Assigned To: Devaraj Das
>
> The current strategy of writing a file per target map is consuming a lot of
> unused buffer space (causing out of memory crashes) and puts a lot of burden
> on the FS (many opens, inodes used, etc).
> I propose that we write a single file containing all output and also write an
> index file IDing which byte range in the file goes to each reduce. This will
> remove the issue of buffer waste, address scaling issues with number of open
> files and generally set us up better for scaling. It will also have
> advantages with very small inputs, since the buffer cache will reduce the
> number of seeks needed and the data serving node can open a single file and
> just keep it open rather than needing to do directory and open ops on every
> request.
> The only issue I see is that in cases where the task output is substantiallyu
> larger than its input, we may need to spill multiple times. In this case, we
> can do a merge after all spills are complete (or during the final spill).
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira