[jira] [Commented] (MAPREDUCE-210) want InputFormat for zip files

2021-06-02 Thread Sebastien Crocquevieille (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356189#comment-17356189
 ] 

Sebastien Crocquevieille commented on MAPREDUCE-210:


[~indrajeetapache], [~cutting] quick ping here.

Any chance of waking up this issue from its deep slumber?

If the earlier work on this issue is too dusty, there is, as [~harisekhon] 
said, a third-party format here: 
https://github.com/cotdp/com-cotdp-hadoop/tree/master/src/main/java/com/cotdp/hadoop
The associated blog post is here: 
[http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/]

We'd all be terribly grateful :)

> want InputFormat for zip files
> ------------------------------
>
> Key: MAPREDUCE-210
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-210
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Doug Cutting
> Assignee: indrajit
> Priority: Major
> Attachments: ZipInputFormat_fixed.patch
>
>
> HDFS is inefficient with large numbers of small files.  Thus one might pack 
> many small files into large, compressed archives.  But, for efficient 
> map-reduce operation, it is desirable to be able to split inputs into 
> smaller chunks, with one or more small original files per split.  The zip 
> format, unlike tar, permits enumeration of files in the archive without 
> scanning the entire archive.  Thus a zip InputFormat could efficiently permit 
> splitting large archives into splits that contain one or more archived files.
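
The idea in the description above can be sketched roughly as follows. This is 
only an illustration, not the attached patch and not the third-party format: 
the class name ZipSplitSketch and the methods listEntries and planSplits are 
hypothetical, and the size threshold is an arbitrary assumption. It uses 
java.util.zip.ZipFile, which reads entry names and sizes from the archive's 
central directory, and then greedily groups whole entries into splits.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch: names and threshold are assumptions, not Hadoop API.
public class ZipSplitSketch {

    /** Lists entry names and compressed sizes without decompressing any entry data. */
    static Map<String, Long> listEntries(File archive) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(archive)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                if (!entry.isDirectory()) {
                    // getCompressedSize() returns -1 when unknown; treat that as 0 here.
                    sizes.put(entry.getName(), Math.max(entry.getCompressedSize(), 0L));
                }
            }
        }
        return sizes;
    }

    /** Greedily packs whole entries into splits of at most maxSplitBytes (one oversized entry gets its own split). */
    static List<List<String>> planSplits(Map<String, Long> sizes, long maxSplitBytes) {
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            if (!current.isEmpty() && currentBytes + e.getValue() > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e.getKey());
            currentBytes += e.getValue();
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipSplitSketch archive.zip
        Map<String, Long> sizes = listEntries(new File(args[0]));
        // 64 MB is an arbitrary illustrative threshold.
        for (List<String> split : planSplits(sizes, 64L * 1024 * 1024)) {
            System.out.println(split);
        }
    }
}
```

Note that java.util.zip.ZipFile only opens local files, so a real InputFormat 
would presumably need to read the central directory through a seekable HDFS 
stream instead; that part is not shown here.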



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-210) want InputFormat for zip files

2015-03-11 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356797#comment-14356797
 ] 

Hari Sekhon commented on MAPREDUCE-210:
---

There is a third-party zip InputFormat here:

http://cotdp.com/2012/07/hadoop-processing-zip-files-in-mapreduce/

I think it's important for a zip InputFormat to be natively supported. 
Traditional enterprises, where Hadoop is now starting to penetrate, use zip a 
lot. This is especially true of large, Windows-heavy corporations, which don't 
realize the problems they cause by keeping so many things in zip files that 
Hadoop currently can't read.



[jira] [Commented] (MAPREDUCE-210) want InputFormat for zip files

2015-01-06 Thread Teng Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266439#comment-14266439
 ] 

Teng Qiu commented on MAPREDUCE-210:


ping again, in 2015... -_-



[jira] [Commented] (MAPREDUCE-210) want InputFormat for zip files

2014-07-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065216#comment-14065216
 ] 

Allen Wittenauer commented on MAPREDUCE-210:


Ping!





[jira] Commented: (MAPREDUCE-210) want InputFormat for zip files

2009-09-25 Thread Patrick Angeles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759618#action_12759618
 ] 

Patrick Angeles commented on MAPREDUCE-210:
---

Any updates on this issue? What's the current thinking on shell tools + JNI 
versus Ant's unzip code? Anything I can do to contribute? Regards...
