[ http://issues.apache.org/jira/browse/HADOOP-522?page=all ]
Doug Cutting updated HADOOP-522:
--------------------------------

    Attachment: block-compress-map-file.patch

Here's a candidate patch. It:

 1. Adds MapFile and SetFile constructors to specify compression type.
 2. Fixes MapFile to work correctly with block compression.
 3. Fixes SequenceFile to permit random accesses.
 4. Cleans up some awkward code in SequenceFile#Reader#readBuffer().
 5. Adds some javadoc clarifying a few API assumptions.
 6. Extends the SetFile unit test to use block compression.

Can someone familiar with SequenceFile please review this? Thanks!

> MapFile should support block compression
> ----------------------------------------
>
>                 Key: HADOOP-522
>                 URL: http://issues.apache.org/jira/browse/HADOOP-522
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Doug Cutting
>         Assigned To: Doug Cutting
>         Attachments: block-compress-map-file.patch
>
>
> MapFile is layered on SequenceFile and permits random access to sorted data
> files (typically reduce output) through a parallel index file. This is used
> widely in Nutch (e.g. at search time for displaying cached pages, incoming
> links, etc). Such sorted data should benefit from block compression, but the
> current MapFile API does not support specification of block compression.
> Also, even if it did, the semantics of SequenceFile methods like seek() and
> getPosition() are changed under block compression so that MapFile may not
> work.
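
To illustrate the issue description above, here is a minimal, self-contained Java sketch of a sorted data file with a parallel index, where the index can only point at block boundaries. This is a toy model and not Hadoop's MapFile or SequenceFile code: the class BlockIndexedFile, its methods, and the recordsPerBlock parameter are all hypothetical names chosen for illustration. The assumption it encodes is that, once records are compressed in groups, a position is only meaningful at the start of a block, so a lookup has to seek to a block and then scan inside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not Hadoop code): sorted records stored in blocks, plus a sparse index. */
class BlockIndexedFile {
    // Hypothetical block size: how many records are grouped (and, in a real file, compressed) together.
    private final int recordsPerBlock;
    // Stand-in for the data file: each inner list represents one block of sorted key/value pairs.
    private final List<List<String[]>> blocks = new ArrayList<>();
    // Stand-in for the index file: first key of each block -> block number.
    private final TreeMap<String, Integer> index = new TreeMap<>();
    private String lastKey = null;

    BlockIndexedFile(int recordsPerBlock) {
        if (recordsPerBlock < 1) {
            throw new IllegalArgumentException("recordsPerBlock must be >= 1");
        }
        this.recordsPerBlock = recordsPerBlock;
    }

    /** Appends a record; keys must arrive in strictly increasing order. */
    void append(String key, String value) {
        if (lastKey != null && key.compareTo(lastKey) <= 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (blocks.isEmpty() || blocks.get(blocks.size() - 1).size() == recordsPerBlock) {
            blocks.add(new ArrayList<>());
            // Index entries are only ever written at a block boundary.
            index.put(key, blocks.size() - 1);
        }
        blocks.get(blocks.size() - 1).add(new String[] {key, value});
        lastKey = key;
    }

    /** Returns the value stored for key, or null if the key is absent. */
    String get(String key) {
        // Find the last block whose first key is <= the requested key.
        Map.Entry<String, Integer> entry = index.floorEntry(key);
        if (entry == null) {
            return null; // key sorts before the first record
        }
        // "Seek" to that block, then scan its records in order.
        for (String[] record : blocks.get(entry.getValue())) {
            if (record[0].equals(key)) {
                return record[1];
            }
        }
        return null;
    }

    int blockCount() {
        return blocks.size();
    }
}
```

With recordsPerBlock set to 2 and keys "a" through "e" appended in order, the sketch holds three blocks, and get("d") first resolves to the block starting at "c" before scanning forward to "d".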