[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759789#action_12759789 ]

Raghu Angadi commented on PIG-949:
----------------------------------

I just committed this. Thanks Yan for the fix and Jing for the test!

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: linux
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>             Fix For: 0.5.0
>         Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch
>
> Hi,
> The storage hint specification plays an important part in whether the output table is readable or not.
> Say we have the map 'map'.
> One can split the map into a column group using [map#{k1}, map#{k2}...];
> however, the remaining map fields will automatically be added to the default group.
> If the user tries to create a new column group for the remaining fields as follows:
> [map#{k1}, map#{k2}, ..][map] (i.e. create a separate column group),
> the table writer will create the table.
> However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat,
> the reader has a problem reading the map.
> We get the following stack trace:
> 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
> java.io.IOException: getValue() failed: null
>         at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
>         at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
>         at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Alok

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758535#action_12758535 ]

Hadoop QA commented on PIG-949:
-------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12420313/Pig_949.patch
  against trunk revision 817739.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 2 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/console

This message is automatically generated.

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: linux
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>             Fix For: 0.4.0, 0.5.0
>         Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758387#action_12758387 ]

Yan Zhou commented on PIG-949:
------------------------------

Test case added.

Thanks,
Yan

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: linux
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>         Attachments: Pig_949.patch, Pig_949.patch
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758328#action_12758328 ]

Raghu Angadi commented on PIG-949:
----------------------------------

Yan, please include the test case in the patch. Also, I would suggest a regular name for the test case file, something like 'TestMapAcrossMultipleCGs.java' or something shorter. Inside the file you could mention the JIRA number in a comment.

Raghu.

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: linux
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>         Attachments: Pig_949.patch
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758074#action_12758074 ]

Yan Zhou commented on PIG-949:
------------------------------

The test case is contrib/zebra/src/test/org/apache/hadoop/zebra/io/TestJira949.java

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: linux
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>         Attachments: Pig_949.patch
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757996#action_12757996 ]

Hadoop QA commented on PIG-949:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12420202/Pig_949.patch
  against trunk revision 816832.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/40/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/40/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/40/console

This message is automatically generated.

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: linux
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>         Attachments: Pig_949.patch
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755094#action_12755094 ]

Yan Zhou commented on PIG-949:
------------------------------

The problem is caused by not adding "ColumnMappingEntry"s from the key-split specs in storage info to an explicitly specified MAP item in storage info, thus causing missing CGs as needed by the key-split specs. Everything falls apart thereafter.

Will create a patch for R1 patch release soon.

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>         Environment: linux
>            Reporter: Alok Singh
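Yan's diagnosis above can be pictured with a toy routing table. This is only a sketch of one plausible reading of the comment, not Zebra's code: the class `KeySplitRoutingSketch`, its methods, and the `attachKeySplits` flag are all hypothetical names invented for illustration. It models a single map column whose keys are routed to column groups (CGs), and shows how a read for a split-out key lands in the wrong CG when the key-split entries are not attached to the explicitly declared map item.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: names are hypothetical and do not come from Zebra.
final class KeySplitRoutingSketch {
    // key -> index of the column group (CG) that stores that key
    private final Map<String, Integer> keyToGroup = new HashMap<>();
    // CG holding the keys that were not split out, or -1 if none was declared
    private int remainderGroup = -1;

    // Registers a key-split spec such as m1#{a} for the given CG.
    void addKeySplit(String key, int group) {
        keyToGroup.put(key, group);
    }

    // Registers an explicit whole-map item such as [m1].
    // attachKeySplits=false imitates the reported symptom: the key-split
    // entries are lost instead of being kept alongside the explicit item.
    void addExplicitMapGroup(int group, boolean attachKeySplits) {
        remainderGroup = group;
        if (!attachKeySplits) {
            keyToGroup.clear();
        }
    }

    // Returns the CG to read for a key, preferring a key-split entry.
    int groupFor(String key) {
        Integer group = keyToGroup.get(key);
        return group != null ? group : remainderGroup;
    }
}
```

With `attachKeySplits` set to `true`, a lookup for key `a` resolves to the CG registered by its key split; with `false`, it falls through to the remainder CG, where that key was never written, which is consistent with the null value seen in the report.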
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754353#action_12754353 ]

Jing Huang commented on PIG-949:
--------------------------------

Thanks Alok. I am able to reproduce the problem. I was using only the I/O layer (not the Pig loader) to test the map split. This is what I did:

    final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
    final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
    ...create table and insert data ..
    load: String projection = new String("m1#{a}");

I only got null returned.

Without the storage hint [m1], everything works fine, i.e.

    final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}]";
    ...create table and insert data ..
    load: String projection = new String("m1#{a}");

I am able to get the value of m1#{a}.

The Zebra team is working on the fix.

> Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
>
>                 Key: PIG-949
>                 URL: https://issues.apache.org/jira/browse/PIG-949
>             Project: Pig
>          Issue Type: Bug
>         Environment: linux
>            Reporter: Alok Singh
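To make the two storage hints in Jing's reproduction easier to compare, the following sketch splits a hint string into its column groups and lists which map keys each group claims. It is a minimal illustration that assumes only the syntax visible in this thread (groups in square brackets separated by `;`, columns separated by `,`, keys written as `column#{k1|k2}`). `StorageHintSketch` is a hypothetical helper and is not Zebra's own parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Zebra: a toy reader for storage hint strings.
final class StorageHintSketch {
    // Returns one map per column group. Each map goes from column name to the
    // keys requested via column#{k1|k2}; an empty key list means the column
    // was named without a key split.
    static List<Map<String, List<String>>> parse(String hint) {
        List<Map<String, List<String>>> groups = new ArrayList<>();
        for (String rawGroup : hint.split(";")) {
            String group = rawGroup.trim();
            if (group.isEmpty()) {
                continue;
            }
            if (!group.startsWith("[") || !group.endsWith("]")) {
                throw new IllegalArgumentException("Malformed column group: " + group);
            }
            Map<String, List<String>> columns = new LinkedHashMap<>();
            String inner = group.substring(1, group.length() - 1);
            for (String rawItem : inner.split(",")) {
                String item = rawItem.trim();
                if (item.isEmpty()) {
                    continue;
                }
                List<String> keys = new ArrayList<>();
                String column = item;
                int split = item.indexOf("#{");
                if (split >= 0) {
                    if (!item.endsWith("}")) {
                        throw new IllegalArgumentException("Malformed key split: " + item);
                    }
                    // Keys inside the braces are separated by '|'.
                    String keyList = item.substring(split + 2, item.length() - 1);
                    for (String key : keyList.split("\\|")) {
                        keys.add(key.trim());
                    }
                    column = item.substring(0, split);
                }
                columns.computeIfAbsent(column, c -> new ArrayList<>()).addAll(keys);
            }
            groups.add(columns);
        }
        return groups;
    }
}
```

Calling `StorageHintSketch.parse` on the failing hint yields four groups, with `m1` appearing in the first (key `a`), the third (key `b`), and the fourth (no keys, i.e. the explicit `[m1]` group). The working hint yields only the first three, which isolates the explicit `[m1]` group as the difference between the two cases.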