[
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154458#comment-13154458
]
Prashant Kommireddi commented on PIG-798:
-----------------------------------------
I developed UDF to return a Tuple of DataByteArray. The error corresponding to
this return type is
{code}
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a
bytearray from the UDF. Cannot determine how to convert the bytearray to string.
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:657)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias result
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open
iterator for alias result
at org.apache.pig.PigServer.openIterator(PigServer.java:901)
at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:553)
at org.apache.pig.Main.main(Main.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Couldn't retrieve job.
at org.apache.pig.PigServer.store(PigServer.java:965)
at org.apache.pig.PigServer.openIterator(PigServer.java:876)
... 12 more
{code}
If I change the return type to be a Tuple of chararrays, I receive an error at
a later stage when using SUM
{code}
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while
computing sum in Initial
at org.apache.pig.builtin.SUM$Initial.exec(SUM.java:113)
at org.apache.pig.builtin.SUM$Initial.exec(SUM.java:85)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:253)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to
org.apache.pig.data.DataByteArray
at org.apache.pig.builtin.SUM$Initial.exec(SUM.java:99)
... 15 more
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias result
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open
iterator for alias result
at org.apache.pig.PigServer.openIterator(PigServer.java:901)
at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:553)
at org.apache.pig.Main.main(Main.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Couldn't retrieve job.
at org.apache.pig.PigServer.store(PigServer.java:965)
at org.apache.pig.PigServer.openIterator(PigServer.java:876)
... 12 more
{code}
> Schema errors when using PigStorage and none when using BinStorage in
> FOREACH??
> -------------------------------------------------------------------------------
>
> Key: PIG-798
> URL: https://issues.apache.org/jira/browse/PIG-798
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
> Reporter: Viraj Bhat
> Assignee: Alan Gates
> Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using
> PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray,
> url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use
> PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other
> Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage()
> where is ok not to specify a schema??? Should it not be consistent across
> both.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira