[jira] [Commented] (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Prashant Kommireddi (Commented) (JIRA) Mon, 21 Nov 2011 12:29:30 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154458#comment-13154458
 ]


Prashant Kommireddi commented on PIG-798:
-----------------------------------------

I developed UDF to return a Tuple of DataByteArray. The error corresponding to 
this return type is
{code}
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine how to convert the bytearray to string.
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:657)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias result

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias result
        at org.apache.pig.PigServer.openIterator(PigServer.java:901)
        at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
        at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:553)
        at org.apache.pig.Main.main(Main.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Couldn't retrieve job.
        at org.apache.pig.PigServer.store(PigServer.java:965)
        at org.apache.pig.PigServer.openIterator(PigServer.java:876)
        ... 12 more
{code}

If I change the return type to be a Tuple of chararrays, I receive an error at 
a later stage when using SUM
{code}
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while 
computing sum in Initial
        at org.apache.pig.builtin.SUM$Initial.exec(SUM.java:113)
        at org.apache.pig.builtin.SUM$Initial.exec(SUM.java:85)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:253)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.pig.data.DataByteArray
        at org.apache.pig.builtin.SUM$Initial.exec(SUM.java:99)
        ... 15 more

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias result

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias result
        at org.apache.pig.PigServer.openIterator(PigServer.java:901)
        at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
        at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:553)
        at org.apache.pig.Main.main(Main.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Couldn't retrieve job.
        at org.apache.pig.PigServer.store(PigServer.java:965)
        at org.apache.pig.PigServer.openIterator(PigServer.java:876)
        ... 12 more
{code}


                
> Schema errors when using PigStorage and none when using BinStorage in 
> FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using 
> PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, 
> url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use 
> PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other 
> Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() 
> where is ok not to specify a schema??? Should it not be consistent across 
> both.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Reply via email to