Harsh J created PIG-2746:
----------------------------
Summary: Pig doesn't detect all forms of compression extensions properly
Key: PIG-2746
URL: https://issues.apache.org/jira/browse/PIG-2746
Project: Pig
Issue Type: Bug
Reporter: Harsh J
PigStorage contains the following method:
{code}
private void setCompression(Path path, Job job) {
    String location = path.getName();
    if (location.endsWith(".bz2") || location.endsWith(".bz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
    } else if (location.endsWith(".gz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    } else {
        FileOutputFormat.setCompressOutput(job, false);
    }
}
{code}
As a result, compression is enabled automatically only for STORE filenames
ending in '.gz', '.bz', or '.bz2' (for example 'output.gz' or 'output.bz2').
For every other codec (such as LZO), the user has to specify the codec and
enable compression manually.
Ideally, Pig would rely on Hadoop's extension-to-codec detection instead of
maintaining this if/else ladder.
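To illustrate the idea, here is a minimal, dependency-free sketch of a table-driven lookup that could replace the ladder. It is not Pig or Hadoop code: the class CodecLookup and the method codecFor are hypothetical names, and the suffix table is hard-coded only for illustration (the '.lzo' entry assumes the third-party hadoop-lzo library). In Hadoop itself, CompressionCodecFactory.getCodec(Path) performs this kind of suffix match against the configured codecs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for an extension-to-codec registry; not Pig or Hadoop code. */
final class CodecLookup {
    // Filename suffix -> codec class name. Entries are illustrative assumptions;
    // Hadoop derives its table from the configured codecs, not a fixed list.
    private static final Map<String, String> CODECS_BY_SUFFIX = new LinkedHashMap<>();
    static {
        CODECS_BY_SUFFIX.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        // ".bz" is kept so that the behaviour of the existing ladder is preserved.
        CODECS_BY_SUFFIX.put(".bz", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS_BY_SUFFIX.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS_BY_SUFFIX.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        // Assumes the third-party hadoop-lzo library is installed.
        CODECS_BY_SUFFIX.put(".lzo", "com.hadoop.compression.lzo.LzopCodec");
    }

    private CodecLookup() {
    }

    /** Returns the codec class name whose suffix matches fileName, if any. */
    static Optional<String> codecFor(String fileName) {
        for (Map.Entry<String, String> entry : CODECS_BY_SUFFIX.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        // No match: the caller would leave output compression disabled.
        return Optional.empty();
    }
}
```

With a lookup like this, setCompression would enable compression whenever codecFor(path.getName()) returns a value and disable it otherwise, so supporting a new extension means adding a table entry (or, with Hadoop's factory, a configured codec) rather than another else-if branch.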