[ https://issues.apache.org/jira/browse/PIG-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mitesh Singh Jat updated PIG-1830:
----------------------------------
    Description:
Pig fails when we try to GROUP data loaded via PigStorageSchema.

{code}
Events = LOAD 'input/PigStorageSchema' USING org.apache.pig.piggybank.storage.PigStorageSchema();
Sessions = GROUP Events BY name;
DUMP Sessions;
{code}

Schema file '''input/PigStorageSchema/.pig_schema'''
{code}
{"fields":[{"name":"name","type":55,"schema":null,"description":"autogenerated from Pig Field Schema"},{"name":"val","type":10,"schema":null,"description":"autogenerated from Pig Field Schema"}],"version":0,"sortKeys":[],"sortKeyOrders":[]}
{code}

Sample input file '''input/PigStorageSchema/pss.in'''
{code}
peter 1
samir 3
michael 4
peter 2
peter 4
samir 1
{code}

Running the above Pig script produces the following error in the map task:
{code}
2010-12-15 08:07:58,367 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableBytesWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:898)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:600)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:242)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
    at org.apache.hadoop.mapred.Child.main(Child.java:236)
{code}

After changing the "type" of "name" in the schema file from 55 (chararray) to 50 (bytearray), the GROUP BY worked.

> Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-1830
>                 URL: https://issues.apache.org/jira/browse/PIG-1830
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mitesh Singh Jat
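For anyone who wants to reproduce the workaround from the description, the manual edit to '''.pig_schema''' can be sketched as a small script. This is only an illustration of the reporter's edit, not a fix for the underlying bug. The helper name retype_field is hypothetical, the type codes 55 and 50 are taken from the report as-is, and the schema text is the one quoted in the issue.

```python
import json

# Type codes exactly as quoted in the report (not looked up elsewhere).
CHARARRAY = 55
BYTEARRAY = 50


def retype_field(schema_json, field_name, old_type, new_type):
    """Return schema_json with the named field's type code swapped.

    Hypothetical helper: it only rewrites the JSON text and does not
    touch Pig or the data files.
    """
    schema = json.loads(schema_json)
    for field in schema["fields"]:
        # Change only the matching field, and only if it still has the old type.
        if field["name"] == field_name and field["type"] == old_type:
            field["type"] = new_type
    # Compact separators keep the single-line layout of the original file.
    return json.dumps(schema, separators=(",", ":"))


# Schema content as quoted in the issue description.
schema_text = (
    '{"fields":[{"name":"name","type":55,"schema":null,'
    '"description":"autogenerated from Pig Field Schema"},'
    '{"name":"val","type":10,"schema":null,'
    '"description":"autogenerated from Pig Field Schema"}],'
    '"version":0,"sortKeys":[],"sortKeyOrders":[]}'
)

patched = retype_field(schema_text, "name", CHARARRAY, BYTEARRAY)
print(patched)
```

To apply it to a real directory, the patched text would be written back over input/PigStorageSchema/.pig_schema before rerunning the script. Note that this makes "name" a bytearray for every later statement, so it hides the mismatch rather than resolving it.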