[ https://issues.apache.org/jira/browse/HADOOP-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankur updated HADOOP-1824:
--------------------------

    Attachment: ZipInputFormat.patch

Attaching the patch file.

> want InputFormat for zip files
> ------------------------------
>
>                 Key: HADOOP-1824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1824
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Doug Cutting
>         Attachments: ZipInputFormat.patch
>
>
> HDFS is inefficient with large numbers of small files.  Thus one might pack
> many small files into large, compressed archives.  But, for efficient
> map-reduce operation, it is desirable to be able to split inputs into
> smaller chunks, with one or more small original files per split.  The zip
> format, unlike tar, permits enumeration of files in the archive without
> scanning the entire archive.  Thus a zip InputFormat could efficiently permit
> splitting large archives into splits that contain one or more archived files.
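
To illustrate the idea in the description, here is a minimal sketch of how entries might be enumerated from a zip's central directory and grouped into splits. It is not taken from ZipInputFormat.patch. The class name ZipSplitSketch, the method planSplits, and the targetBytes parameter are hypothetical, and the code uses only java.util.zip on a local file rather than the Hadoop FileSystem or InputFormat APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical illustration only; not the code in the attached patch.
final class ZipSplitSketch {

    /**
     * Groups the names of archived files into splits, starting a new split
     * whenever adding the next entry would exceed targetBytes of compressed data.
     * Every split holds at least one entry.
     */
    public static List<List<String>> planSplits(String zipPath, long targetBytes)
            throws IOException {
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;

        // ZipFile lists entries from the archive's central directory,
        // so no entry data is decompressed while planning the splits.
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directories carry no data to process
                }
                // getCompressedSize() returns -1 when the size is unknown.
                long size = Math.max(entry.getCompressedSize(), 0);
                if (!current.isEmpty() && currentBytes + size > targetBytes) {
                    splits.add(current);
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
                current.add(entry.getName());
                currentBytes += size;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipSplitSketch <archive.zip> <targetBytes>
        for (List<String> split : planSplits(args[0], Long.parseLong(args[1]))) {
            System.out.println(split);
        }
    }
}
```

A real InputFormat would presumably also record each entry's offset within the archive so that a record reader can seek directly to its data. That part is omitted here.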