[ https://issues.apache.org/jira/browse/PIG-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877984#comment-13877984 ]
Kumar Ravi commented on PIG-3681:
---------------------------------
The following is the stack trace when the OutOfMemoryError is seen:
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:145)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
================================================================================
> NullpointerException while processing files in gzip format
> ----------------------------------------------------------
>
> Key: PIG-3681
> URL: https://issues.apache.org/jira/browse/PIG-3681
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Environment: Linux CentOS 6.4; CDH-4.x and CDH-5.0 (beta)
> Reporter: Kumar Ravi
>
> When Pig processes a large gzip file with text or mixed text and binary
> content, it throws a NullPointerException if the property
> textinputformat.record.delimiter is set to '\n'. This is because Pig
> interprets the specified delimiter as the two-character string "\" followed
> by "n" and not as a newline character.
> If this property is not set, the same file unzips without problems, but the
> file unzipped using Pig differs (per diff) from the file unzipped using the
> gunzip command.
> Steps to recreate:
> 1. create a text file that is ~ 4GB - I concatenated some pig/hadoop stdout
> and syslog files to create this file about 4GB in size.
> 2. compress it on unix command line - Ex. gzip abc
> 3. upload to hdfs (optional)
> 4. run the pig script included below to read/write the file.
> pig --param job_name="gunzip abc" --param inputfile="abc.gz" --param
> outputdir=./test --param outputfile=abc gunzip.pig
>
> Here are the contents of gunzip.pig:
> set job.name '$job_name'
> set textinputformat.record.delimiter "\n";
> gzdata = LOAD '$inputfile' USING PigStorage();
> STORE gzdata INTO '$outputdir/$outputfile' USING PigStorage();
> This will cause the NullPointerException.
> If the second line (set textinputformat.record.delimiter field) is commented
> out, the Exception won't occur but the output is not the same as the one
> produced by gunzip.
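
The delimiter mix-up described in the issue (the two-character string "\" followed by "n" versus a single newline character) can be illustrated with a small sketch. This is a language-neutral illustration written in Python, not Pig's or Hadoop's actual code; the helper name unescape_delimiter and the sample data are hypothetical.

```python
def unescape_delimiter(raw: str) -> str:
    """Hypothetical helper: map common escape sequences to the characters they name."""
    replacements = {"\\n": "\n", "\\t": "\t", "\\r": "\r"}
    for escaped, actual in replacements.items():
        raw = raw.replace(escaped, actual)
    return raw


# What the issue says the delimiter is interpreted as: backslash + 'n'.
literal = "\\n"
# What the user intended: a single newline character.
newline = "\n"

# Made-up sample input with three newline-terminated records.
data = "line one\nline two\nline three\n"

# Splitting on the two-character string finds no match, so the
# whole input comes back as a single record.
records_literal = [r for r in data.split(literal) if r]

# Splitting on the real newline yields one record per line.
records_newline = [r for r in data.split(newline) if r]

print(len(literal), len(newline))
print(len(records_literal), len(records_newline))
```

With the unescaped delimiter the input is treated as one very large record, which fits the symptom of the problem only appearing on large files. Whether this is the actual cause of the exception in Pig is an assumption and would need to be confirmed against the record reader code.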