[ https://issues.apache.org/jira/browse/PIG-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-537:
-------------------------------

    Attachment: PIG-537.patch

> Failure in Hadoop map collect stage due to type mismatch in the keys used in cogroup
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-537
>                 URL: https://issues.apache.org/jira/browse/PIG-537
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: explain_aliasC.log, mygrades.txt, mymarks.txt, PIG-537.patch
>
>
> Consider the following Pig query, which demonstrates several problems during
> Logical Plan creation and the subsequent execution of the M/R job. The query
> does two cogroups: the first, between A and B, is used to generate the alias
> ABtemptable. The second cogroups A with ABtemptable on marks, which was read
> in as an int.
> ==================================================================================
> {code}
> A = load 'mymarks.txt' as (marks:int, username:chararray);
> B = load 'mygrades.txt' as (username:chararray,grade:chararray);
> ABtemp = cogroup A by username, B by username;
> ABtemptable = foreach ABtemp generate
>     group as username,
>     flatten(A.marks) as newmarks;
> --describe ABtemptable;
> C = cogroup A by marks, ABtemptable by newmarks;
> --describe C;
> explain C;
> dump C;
> {code}
> ==================================================================================
> The schemas that Pig reports for ABtemptable and C:
> ==================================================================================
> {code}describe ABtemptable;{code}
> ABtemptable: {username: chararray,newmarks: int}
> {code}describe C;{code}
> C: {group: int,A: {username: chararray,marks: int},ABtemptable: {username: chararray,newmarks: int}}
> ==================================================================================
> If you run the above query you get the following error:
> ==================================================================================
> 2008-11-18 03:57:14,372 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200810152105_0156_m_000000java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableIntWritable
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:97)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:82)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==================================================================================
> Looking at the {code}explain C;{code} output, you see that newmarks has become a chararray (surprising!!)
> ==================================================================================
> ---CoGroup viraj-Tue Nov 18 03:49:42 UTC 2008-25 Schema: {group: Unknown,{username: bytearray,marks: int},ABtemptable: {username: chararray,newmarks: chararray}} Type: bag
> Project viraj-Tue Nov 18 03:49:42 UTC 2008-23 Projections: [1] Overloaded: false FieldSchema: marks: int Type: int
> Input: SplitOutput[null] viraj-Tue Nov 18 03:49:42 UTC 2008-29
> Project viraj-Tue Nov 18 03:49:42 UTC 2008-24 Projections: [1] Overloaded: false FieldSchema: newmarks: chararray Type: chararray
> Input: ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22
> ---ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22 Schema: {username: chararray,newmarks: chararray} Type: bag
> ==================================================================================
> In summary, this script demonstrates the following problems:
> 1) Logical Plan creation gives newmarks the wrong type (chararray in the
> explain output, although describe reports int).
> 2) Cogrouping on fields of different types, which leaves the group key type
> unknown, is not caught during the compile phase.
> Additionally, I am attaching to this jira the explain output of alias C and
> the test files needed to run the script.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
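
Problem 2 in the summary asks for cogroup key types to be checked at compile time. The following is a minimal Python sketch of that kind of check, not Pig's actual code and not the contents of PIG-537.patch. The names Field and cogroup_key_type are hypothetical and exist only for this illustration; the field types are the ones shown in the describe and explain output of the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: a name plus a type name."""
    name: str
    type: str


def cogroup_key_type(inputs):
    """Return the key type shared by all cogroup inputs.

    `inputs` is a list of (alias, key_field) pairs. Raises TypeError when
    the key types differ, instead of leaving the group type unknown.
    """
    types = {field.type for _, field in inputs}
    if len(types) != 1:
        detail = ", ".join(
            f"{alias}.{field.name}: {field.type}" for alias, field in inputs
        )
        raise TypeError(f"cogroup keys have different types ({detail})")
    return types.pop()


# Key types as reported by describe: both keys are int.
declared = [("A", Field("marks", "int")),
            ("ABtemptable", Field("newmarks", "int"))]

# Key types as they appear in the explain output: newmarks became chararray.
planned = [("A", Field("marks", "int")),
           ("ABtemptable", Field("newmarks", "chararray"))]

print(cogroup_key_type(declared))  # prints "int"

try:
    cogroup_key_type(planned)
except TypeError as err:
    # The mismatch is reported before any map task would run.
    print("rejected at compile time:", err)
```

With a check of this kind the mismatch would surface as a plan-time error naming both keys, rather than as the IOException raised in the map collect stage.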