[ https://issues.apache.org/jira/browse/HADOOP-17562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Chammas updated HADOOP-17562:
--------------------------------------
    Component/s: io

> Provide mechanism for explicitly specifying the compression codec for input files
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-17562
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17562
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> I come to you via SPARK-29280.
> I am looking for the file _input_ equivalents of the following settings:
> {code:java}
> mapreduce.output.fileoutputformat.compress
> mapreduce.map.output.compress{code}
> Right now, I understand that Hadoop infers the codec to use when reading a file from the file's extension.
> However, in some cases the files may have the incorrect extension or no extension. There are links to some examples from SPARK-29280.
> Ideally, you should be able to explicitly specify the codec to use to read those files. I don't believe that's possible today. Instead, the current workaround appears to be to [create a custom codec class|https://stackoverflow.com/a/17152167/877069] and override the getDefaultExtension method to specify the extension to expect.
> Does it make sense to offer an explicit way to select the compression codec for file input, mirroring how things work for file output?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
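To illustrate the behavior the issue describes, here is a self-contained sketch (no Hadoop dependency; all class names are hypothetical stand-ins, not Hadoop's actual classes) of how suffix-based codec lookup works, and how the `getDefaultExtension` override workaround redirects mis-named files to an existing codec:

```java
import java.util.List;

public class CodecLookupSketch {
    // Minimal stand-in for Hadoop's CompressionCodec interface.
    interface Codec {
        String getDefaultExtension();
    }

    // Stand-in for a built-in codec keyed on its usual file suffix.
    static class GzipCodec implements Codec {
        public String getDefaultExtension() {
            return ".gz";
        }
    }

    // The workaround: subclass an existing codec and report the extension
    // the files actually have, so lookup routes them to the gzip logic.
    static class ForcedGzipCodec extends GzipCodec {
        @Override
        public String getDefaultExtension() {
            return ".log";
        }
    }

    // Mimics extension-based inference: first registered codec whose
    // suffix matches wins; no match means the file is read uncompressed.
    static Codec lookup(String path, List<Codec> registered) {
        for (Codec c : registered) {
            if (path.endsWith(c.getDefaultExtension())) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Codec> codecs = List.of(new GzipCodec(), new ForcedGzipCodec());
        System.out.println(lookup("data.gz", codecs).getClass().getSimpleName());  // GzipCodec
        System.out.println(lookup("data.log", codecs).getClass().getSimpleName()); // ForcedGzipCodec
        System.out.println(lookup("data.txt", codecs));                            // null
    }
}
```

The sketch shows why the workaround is awkward: the codec choice is still driven by the filename suffix, just with a subclass registered per unexpected extension, rather than by an explicit per-job setting as on the output side.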