[ 
https://issues.apache.org/jira/browse/HADOOP-17562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated HADOOP-17562:
--------------------------------------
    Component/s: io

> Provide mechanism for explicitly specifying the compression codec for input 
> files
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-17562
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17562
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> I come to you via SPARK-29280.
> I am looking for the file _input_ equivalents of the following settings:
> {code:java}
> mapreduce.output.fileoutputformat.compress
> mapreduce.map.output.compress
> {code}
> Right now, I understand that Hadoop infers the codec to use when reading a 
> file from the file's extension.
> However, in some cases the files may have an incorrect extension or no 
> extension. SPARK-29280 links to some examples.
> Ideally, you should be able to explicitly specify the codec to use to read 
> those files. I don't believe that's possible today. Instead, the current 
> workaround appears to be to [create a custom codec 
> class|https://stackoverflow.com/a/17152167/877069] and override the 
> getDefaultExtension method to specify the extension to expect.
> Does it make sense to offer an explicit way to select the compression codec 
> for file input, mirroring how things work for file output?
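
To make the request concrete, here is a minimal, self-contained sketch of the lookup order being asked for: an explicitly configured codec wins, and extension-based inference is only the fallback. Everything in it is hypothetical and for illustration only. The class name, the method names, and the suffix table are assumptions, and none of this is Hadoop's actual API or an existing configuration option.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names are Hadoop APIs.
final class CodecSelection {

    // Assumed suffix-to-codec table, standing in for whatever codecs are registered.
    private static final Map<String, String> BY_SUFFIX = new LinkedHashMap<>();
    static {
        BY_SUFFIX.put(".gz", "gzip");
        BY_SUFFIX.put(".bz2", "bzip2");
    }

    // Current behavior as described in the issue: infer the codec from the file name.
    static Optional<String> inferFromExtension(String fileName) {
        for (Map.Entry<String, String> entry : BY_SUFFIX.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    // Requested behavior: an explicit codec (null or empty means "not set")
    // takes precedence over the extension.
    static Optional<String> resolve(String fileName, String explicitCodec) {
        if (explicitCodec != null && !explicitCodec.isEmpty()) {
            return Optional.of(explicitCodec);
        }
        return inferFromExtension(fileName);
    }

    public static void main(String[] args) {
        System.out.println(resolve("events.gz", null));   // inferred from the suffix
        System.out.println(resolve("events", null));      // no suffix, nothing to infer
        System.out.println(resolve("events", "gzip"));    // explicit setting fills the gap
        System.out.println(resolve("events.gz", "bzip2")); // explicit setting overrides a wrong suffix
    }
}
```

With a setting like this, a gzip file with no extension, or with a misleading one, could be read without writing a custom codec class.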



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
