[ http://issues.apache.org/jira/browse/HADOOP-603?page=comments#action_12442654 ]

eric baldeschwieler commented on HADOOP-603:
--------------------------------------------

Yeah. Just keeping an array of offset vectors per partition would be simpler.

Jim: I don't see a problem with your suggestion (HADOOP-603), but I don't think the two projects relate. These files will be broken into partitions. Fitting that into a MapFile might be a bit of a stretch.

> Extend SequenceFile to provide MapFile function by storing index at the end of the file
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-603
>                 URL: http://issues.apache.org/jira/browse/HADOOP-603
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Jim Kellerman
>
> MapFile increases the load on the name node as two files are created to
> provide an index file format. If SequenceFile were extended by storing the
> index at the end of the file, 1/2 of the files currently created for a
> map/reduce operation would be needed, reducing the load on the name node.
> Perhaps this is why Google implemented SSTable files in this manner. (SSTable
> files are functionally identical to Hadoop MapFiles) (see the paper on
> BigTable - section 4 "Building Blocks"
> http://labs.google.com/papers/bigtable.html)
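
For illustration only, the single-file layout proposed in the issue (records first, index at the end) could be sketched roughly as below. This is not Hadoop's SequenceFile or MapFile format. The length-prefixed record encoding, the 8-byte trailer holding the index offset, and the names write_indexed_file, read_index and lookup are all assumptions made up for this sketch.

```python
import io
import struct


def write_indexed_file(stream, records, index_interval=2):
    """Write sorted (key, value) byte pairs, then a sparse index, then a trailer.

    Hypothetical layout: [records][index][8-byte offset of the index].
    """
    index = []  # (key, record offset) for every index_interval-th record
    for i, (key, value) in enumerate(records):
        if i % index_interval == 0:
            index.append((key, stream.tell()))
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)

    index_offset = stream.tell()
    stream.write(struct.pack(">I", len(index)))
    for key, offset in index:
        stream.write(struct.pack(">IQ", len(key), offset))
        stream.write(key)

    # Fixed-size trailer so a reader can find the index from the end of the file.
    stream.write(struct.pack(">Q", index_offset))


def read_index(stream):
    """Return (index_offset, [(key, record_offset), ...]) read from the file tail."""
    stream.seek(-8, io.SEEK_END)
    (index_offset,) = struct.unpack(">Q", stream.read(8))
    stream.seek(index_offset)
    (count,) = struct.unpack(">I", stream.read(4))
    index = []
    for _ in range(count):
        key_len, offset = struct.unpack(">IQ", stream.read(12))
        index.append((stream.read(key_len), offset))
    return index_offset, index


def lookup(stream, wanted):
    """Find the value for `wanted`, scanning forward from the nearest index entry."""
    index_offset, index = read_index(stream)
    start = None
    for key, offset in index:
        if key > wanted:
            break
        start = offset  # last indexed key that is <= wanted
    if start is None:
        return None  # wanted sorts before the first record

    stream.seek(start)
    while stream.tell() < index_offset:
        key_len, value_len = struct.unpack(">II", stream.read(8))
        key = stream.read(key_len)
        value = stream.read(value_len)
        if key == wanted:
            return value
        if key > wanted:
            return None  # passed the spot where it would have been
    return None
```

With this layout one file carries both the data and the index, so a keyed lookup needs a seek to the trailer, a read of the index, and a short forward scan, instead of opening a separate index file.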