[ https://issues.apache.org/jira/browse/CRUNCH-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815518#comment-15815518 ]
Micah Whitacre commented on CRUNCH-632:
---------------------------------------
I think the change to support compression is actually easy from a reading
perspective. Just detect and use the CompressionCodec to wrap the InputStream.
The part I'm more concerned with is how to calculate the splits. Compressed
text files are typically not splittable, so we could take that approach and
treat each compressed file as a single split.
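
To make the two halves of that concrete, here is a minimal sketch using only
the JDK rather than the Hadoop classes, so it is a stand-in and not the actual
CSVRecordReader change. The class and helper names (CompressedCsvSketch,
isGzip, open, isSplittable) are hypothetical, and detection by ".gz" file
extension is an assumption. In Hadoop, CompressionCodecFactory.getCodec(Path)
and CompressionCodec.createInputStream(InputStream) would likely take the
place of isGzip and open, and a check for SplittableCompressionCodec would
replace the extension test.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; not part of Crunch or Hadoop.
final class CompressedCsvSketch {

    // Assumption: compression is detected from the file extension alone.
    static boolean isGzip(String fileName) {
        return fileName.endsWith(".gz");
    }

    // Reading side: wrap the raw stream in a decompressor when needed.
    static InputStream open(String fileName, InputStream raw) throws IOException {
        return isGzip(fileName) ? new GZIPInputStream(raw) : raw;
    }

    // Split side: a gzip stream cannot be read from an arbitrary offset,
    // so a gzip file is reported as not splittable (one split per file).
    static boolean isSplittable(String fileName) {
        return !isGzip(fileName);
    }

    // Reads the first line of the (possibly compressed) CSV content.
    static String firstLine(String fileName, byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                open(fileName, new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: gzip-compresses a string in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

With this shape the record reader itself would not need to know whether the
input is compressed; only the code that opens the stream and the code that
computes splits would.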
> Add compression support for CSVFileSource
> -----------------------------------------
>
> Key: CRUNCH-632
> URL: https://issues.apache.org/jira/browse/CRUNCH-632
> Project: Crunch
> Issue Type: Improvement
> Reporter: Jim McStanton
> Priority: Minor
>
> Currently CSVFileSource does not support decompressing files before reading
> them, and simply opens the file and starts reading the contents:
> https://github.com/apache/crunch/blob/6280983179e9c690af69c2bf0e296b054122d724/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVRecordReader.java#L127.
>
> This source would more closely match TextFileSource if this support were
> added. The {{LineRecordReader}} supports this behavior
> [here|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.7.1/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java?av=f#87].
>