Harsh J created PIG-2746:
----------------------------

             Summary: Pig doesn't detect all forms of compression extensions properly
                 Key: PIG-2746
                 URL: https://issues.apache.org/jira/browse/PIG-2746
             Project: Pig
          Issue Type: Bug
            Reporter: Harsh J


PigStorage contains the following snippet:

{code}
private void setCompression(Path path, Job job) {
    String location = path.getName();
    if (location.endsWith(".bz2") || location.endsWith(".bz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
    } else if (location.endsWith(".gz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    } else {
        FileOutputFormat.setCompressOutput(job, false);
    }
}
{code}

This means compression is only enabled automatically when the STORE filename 
ends in '.gz', '.bz2' or '.bz' (e.g. 'output.gz' or 'output.bz2'). For every 
other codec (like LZO), one has to specify the codec and enable compression 
manually.

Ideally, Pig would rely on Hadoop's extension-to-codec detection 
(CompressionCodecFactory) instead of maintaining this if/else ladder.
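
To illustrate the idea, here is a minimal, self-contained sketch of a 
table-driven extension lookup that would replace the ladder. It is not the 
actual Pig fix and does not call Hadoop: the class name CodecLookup, the method 
codecFor and the contents of the table are hypothetical. In Hadoop itself this 
kind of lookup is done by CompressionCodecFactory.getCodec(Path), which matches 
the path suffix against the codecs configured for the cluster and returns null 
when nothing matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an extension-to-codec detector.
final class CodecLookup {
    // Illustrative table only. A real detector would build this from the
    // configured codecs rather than hard-coding it. The ".lzo" entry assumes
    // the third-party hadoop-lzo library is installed.
    private static final Map<String, String> CODEC_BY_EXTENSION = new LinkedHashMap<>();
    static {
        CODEC_BY_EXTENSION.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODEC_BY_EXTENSION.put(".bz", "org.apache.hadoop.io.compress.BZip2Codec");
        CODEC_BY_EXTENSION.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODEC_BY_EXTENSION.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        CODEC_BY_EXTENSION.put(".lzo", "com.hadoop.compression.lzo.LzopCodec");
    }

    // Returns the codec class name whose extension matches the end of the
    // file name, or an empty Optional if the output should stay uncompressed.
    static Optional<String> codecFor(String location) {
        for (Map.Entry<String, String> entry : CODEC_BY_EXTENSION.entrySet()) {
            if (location.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"output.gz", "output.lzo", "output"}) {
            System.out.println(name + " -> " + codecFor(name).orElse("(no compression)"));
        }
    }
}
```

With a lookup like this, setCompression would enable compression whenever a 
codec is found and disable it otherwise, so supporting a new extension means 
registering a codec rather than adding another branch.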


        
