[ https://issues.apache.org/jira/browse/PIG-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988207#action_12988207 ]
Olga Natkovich commented on PIG-1830:
-------------------------------------
The problem is in the PigStorageSchema implementation. The class extends
PigStorage without overriding getNext(). So, while the schema tells Pig that
the data is coming in as chararray, the data is actually created (by
PigStorage) as bytearray.
The owner of the PigStorageSchema function needs to make sure that the data
and schema types match.
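To illustrate what "making the data and schema types match" means, here is a minimal, Pig-free Java sketch. It assumes a loader that first splits a line into raw byte[] fields (what PigStorage effectively hands back) and then converts each field to the Java type implied by the declared Pig type code (55 = chararray, 10 = int, 50 = bytearray). The class SchemaAwareLoaderSketch and its methods are hypothetical stand-ins, not Pig's LoadFunc API; a real fix would presumably do the equivalent conversion inside PigStorageSchema's own getNext() override.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (not Pig code): a schema-aware getNext() must emit
// values whose runtime type matches the declared schema type.
public class SchemaAwareLoaderSketch {

    // Pig type codes as they appear in .pig_schema.
    static final byte INTEGER = 10;
    static final byte BYTEARRAY = 50;
    static final byte CHARARRAY = 55;

    // What a PigStorage-style loader yields: every tab-separated field as raw bytes.
    static List<byte[]> rawFields(String line) {
        List<byte[]> out = new ArrayList<>();
        for (String f : line.split("\t", -1)) {
            out.add(f.getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    // Converts one raw field to the Java type matching its declared Pig type.
    static Object convert(byte[] raw, byte declaredType) {
        String s = new String(raw, StandardCharsets.UTF_8);
        switch (declaredType) {
            case CHARARRAY: return s;                        // chararray -> String
            case INTEGER:   return Integer.valueOf(s.trim()); // int -> Integer
            case BYTEARRAY: return raw;                      // bytearray stays raw
            default:
                throw new IllegalArgumentException("unsupported type code " + declaredType);
        }
    }

    // Hypothetical schema-aware getNext(): split first, then cast each field.
    static List<Object> getNext(String line, byte[] schemaTypes) {
        List<byte[]> raw = rawFields(line);
        if (raw.size() != schemaTypes.length) {
            throw new IllegalArgumentException("field count does not match schema");
        }
        List<Object> tuple = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            tuple.add(convert(raw.get(i), schemaTypes[i]));
        }
        return tuple;
    }

    public static void main(String[] args) {
        byte[] schema = {CHARARRAY, INTEGER}; // name:chararray, val:int
        List<Object> t = getNext("peter\t1", schema);
        // prints "String Integer"
        System.out.println(t.get(0).getClass().getSimpleName() + " "
                + t.get(1).getClass().getSimpleName());
    }
}
```

The point of the design is that the conversion is driven by the same schema the loader advertises, so the two cannot drift apart the way they do when getNext() is inherited unchanged from PigStorage.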
> Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable
> ------------------------------------------------------------------------------------
>
> Key: PIG-1830
> URL: https://issues.apache.org/jira/browse/PIG-1830
> Project: Pig
> Issue Type: Bug
> Reporter: Mitesh Singh Jat
>
> Pig fails when we try to GROUP data loaded via PigStorageSchema.
> {code}
> Events = LOAD 'input/PigStorageSchema' USING
> org.apache.pig.piggybank.storage.PigStorageSchema();
> Sessions = GROUP Events BY name;
> DUMP Sessions;
> {code}
> Schema file '''input/PigStorageSchema/.pig_schema'''
> {code}
> {"fields":[
>   {"name":"name","type":55,"schema":null,"description":"autogenerated from Pig Field Schema"},
>   {"name":"val","type":10,"schema":null,"description":"autogenerated from Pig Field Schema"}
> ],"version":0,"sortKeys":[],"sortKeyOrders":[]}
> {code}
> Header file '''input/PigStorageSchema/.pig_header'''
> {code}
> name val
> {code}
> Sample input file '''input/PigStorageSchema/pss.in'''
> {code}
> peter 1
> samir 3
> michael 4
> peter 2
> peter 4
> samir 1
> {code}
> Running the above Pig script produces the following error:
> {code}
> 2010-12-15 08:07:58,367 WARN org.apache.hadoop.mapred.Child: Error running child
> java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableBytesWritable
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:898)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:600)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:674)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:242)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:236)
> {code}
> Changing the "type" of "name" from 55 (chararray) to 50 (bytearray) made the
> GROUP BY work.
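Why the reporter's workaround (declaring "name" as type 50 instead of 55) avoids the error can be shown with a simplified Java model of the check seen in the stack trace: the map output key class is fixed from the declared GROUP key type, and every key the mapper emits must have exactly that class. This is a hedged sketch, not Pig or Hadoop code; KeyClassCheckSketch and collect() are hypothetical, and wrapper class names are plain strings standing in for NullableText and NullableBytesWritable.

```java
import java.util.Map;

// Hypothetical model (not Hadoop code) of the key-class check that raises
// "Type mismatch in key from map" in MapTask$MapOutputBuffer.collect.
public class KeyClassCheckSketch {

    static final byte BYTEARRAY = 50;
    static final byte CHARARRAY = 55;

    // Declared Pig key type -> key wrapper class name (only the two types in this report).
    static final Map<Byte, String> KEY_WRAPPER = Map.of(
            BYTEARRAY, "NullableBytesWritable",
            CHARARRAY, "NullableText");

    // Passes silently when the produced key type matches the declared one.
    static void collect(byte declaredKeyType, byte actualKeyType) {
        String expected = KEY_WRAPPER.get(declaredKeyType);
        String received = KEY_WRAPPER.get(actualKeyType);
        if (expected == null || received == null) {
            throw new IllegalArgumentException("type not modelled in this sketch");
        }
        if (!expected.equals(received)) {
            throw new IllegalStateException(
                "Type mismatch in key from map: expected " + expected + ", received " + received);
        }
    }

    public static void main(String[] args) {
        // As reported: schema declares chararray, but PigStorage produced bytearray.
        try {
            collect(CHARARRAY, BYTEARRAY);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Workaround from the report: declare the key as bytearray, matching the data.
        collect(BYTEARRAY, BYTEARRAY);
        System.out.println("ok");
    }
}
```

In this model the workaround succeeds only because it lowers the schema to match the data; the fix suggested in the comment goes the other way and raises the data to match the schema.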
--
This message is automatically generated by JIRA.