[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759789#action_12759789
 ] 

Raghu Angadi commented on PIG-949:
--

I just committed this. Thanks Yan for the fix and Jing for the test!

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.5.0
>
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch
>
>
> Hi,
> The storage hint specification plays an important part in whether the output
> table is readable or not.
> Say we have the map 'map'.
> One can split the map into a column group using [map#{k1}, map#{k2}...];
> the remaining map fields will then automatically be added to the default
> group.
> If the user tries to create a new column group for the remaining fields as follows:
> [map#{k1}, map#{k2}, ..][map], i.e. creating a separate column group,
> the table writer will create the table.
> However, if one tries to load the created table via Pig or via MapReduce
> using TableInputFormat,
> the reader has problems reading the map.
> We get the following stack trace:
> 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
> java.io.IOException: getValue() failed: null
> at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
> at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
> at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Alok
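
The key-to-column-group behaviour described in the report can be pictured as a small routing rule. This is a hypothetical Java 16+ model (it uses records), not Zebra's API: `MapSplitSketch`, `ColumnSpec` and `groupForKey` are illustrative names. It assumes an explicit key split always wins, keys that are not split out go to a column group listing the whole map if one exists, and otherwise they go to the implicit default group (returned here as -1).

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the storage-hint semantics described in PIG-949.
// Not Zebra's API: ColumnSpec and groupForKey are illustrative names only.
final class MapSplitSketch {

    /** A column-group entry: a whole column (empty keys) or selected keys of a map column. */
    record ColumnSpec(String column, Set<String> keys) {
        boolean isWholeColumn() { return keys.isEmpty(); }
    }

    /**
     * Returns the index of the column group that should hold `key` of map `column`:
     * an explicit key split wins, otherwise the group listing the whole map,
     * otherwise -1 for the implicit default group.
     */
    static int groupForKey(List<List<ColumnSpec>> groups, String column, String key) {
        int wholeColumnGroup = -1;
        for (int i = 0; i < groups.size(); i++) {
            for (ColumnSpec spec : groups.get(i)) {
                if (!spec.column().equals(column)) continue;
                if (spec.keys().contains(key)) return i;          // explicit key split
                if (spec.isWholeColumn()) wholeColumnGroup = i;   // remember [map] group
            }
        }
        return wholeColumnGroup;
    }

    public static void main(String[] args) {
        // [map#{k1}, map#{k2}][map]: k1 and k2 are split out, all other keys go to group 1.
        List<List<ColumnSpec>> withWholeMap = List.of(
            List.of(new ColumnSpec("map", Set.of("k1")), new ColumnSpec("map", Set.of("k2"))),
            List.of(new ColumnSpec("map", Set.of())));
        System.out.println(groupForKey(withWholeMap, "map", "k1"));
        System.out.println(groupForKey(withWholeMap, "map", "k3"));

        // [map#{k1}, map#{k2}] only: the remaining keys go to the default group.
        List<List<ColumnSpec>> splitOnly = List.of(withWholeMap.get(0));
        System.out.println(groupForKey(splitOnly, "map", "k3"));
    }
}
```

Under this model the hint in the report is perfectly coherent, which is why a writer accepting it while the reader fails points at an inconsistency in how the layout is recorded rather than at the hint itself.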

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758535#action_12758535
 ] 

Hadoop QA commented on PIG-949:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420313/Pig_949.patch
  against trunk revision 817739.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/console

This message is automatically generated.

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.4.0, 0.5.0
>
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758387#action_12758387
 ] 

Yan Zhou commented on PIG-949:
--

Test case added.

Thanks,

Yan



> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch, Pig_949.patch



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758328#action_12758328
 ] 

Raghu Angadi commented on PIG-949:
--

Yan, please include the test case in the patch. 

Also, I would suggest a more conventional name for the test case file, something like
'TestMapAcrossMultipleCGs.java' or something shorter. Inside the file you could
mention the JIRA number in a comment.

Raghu.

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-21 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758074#action_12758074
 ] 

Yan Zhou commented on PIG-949:
--

The test case is 

contrib/zebra/src/test/org/apache/hadoop/zebra/io/TestJira949.java

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757996#action_12757996
 ] 

Hadoop QA commented on PIG-949:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420202/Pig_949.patch
  against trunk revision 816832.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/40/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/40/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/40/console


> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-14 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755094#action_12755094
 ] 

Yan Zhou commented on PIG-949:
--

The problem is caused by not adding the "ColumnMappingEntry"s from the key-split
specs in the storage info to an explicitly specified MAP item in the storage info,
so the column groups (CGs) required by the key-split specs are missing. Everything
falls apart from there. I will create a patch for the R1 patch release soon.
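
One way to picture this explanation is the toy model below. It is a sketch under stated assumptions, not the patch: `KeySplitMappingSketch`, `MapLayout`, `build` and `groupFor` are hypothetical names, and `applyFix = false` imitates the reported bug by discarding the key-split entries whenever the map column is also listed as a whole item. The CG numbers in `main` are taken from Jing's reproduction hint, where `m1#{a}` is CG 0, `m1#{b}` is CG 2 and `[m1]` is CG 3.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the PIG-949 root cause; all names here are hypothetical.
final class KeySplitMappingSketch {

    /** Layout of one map column: explicit key -> CG index, plus the CG holding other keys. */
    static final class MapLayout {
        final Map<String, Integer> keyToGroup = new HashMap<>();
        int remainderGroup = -1;   // -1 stands for the default CG
    }

    /**
     * keySplits: key -> CG index from specs such as [m1#{a}].
     * wholeColumnGroup: CG index of an explicit [m1] item, or -1 if absent.
     * applyFix: false mimics the bug (key splits dropped when [m1] is explicit).
     */
    static MapLayout build(Map<String, Integer> keySplits, int wholeColumnGroup, boolean applyFix) {
        MapLayout layout = new MapLayout();
        layout.remainderGroup = wholeColumnGroup;
        if (applyFix || wholeColumnGroup < 0) {
            layout.keyToGroup.putAll(keySplits);
        }
        return layout;
    }

    /** CG a reader would consult for `key` under this layout. */
    static int groupFor(MapLayout layout, String key) {
        return layout.keyToGroup.getOrDefault(key, layout.remainderGroup);
    }

    public static void main(String[] args) {
        // m1 in "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]": a -> CG 0, b -> CG 2, [m1] = CG 3.
        Map<String, Integer> m1Splits = Map.of("a", 0, "b", 2);
        MapLayout buggy = build(m1Splits, 3, false);
        MapLayout fixed = build(m1Splits, 3, true);
        System.out.println("buggy lookup of m1#{a}: CG " + groupFor(buggy, "a"));
        System.out.println("fixed lookup of m1#{a}: CG " + groupFor(fixed, "a"));
    }
}
```

In the buggy construction the lookup for `a` falls through to the `[m1]` group instead of the group named by `[m1#{a}]`, which is consistent with, though not proof of, the null value Jing observed.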

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
> Environment: linux
>Reporter: Alok Singh



[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-11 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754353#action_12754353
 ] 

Jing Huang commented on PIG-949:


Thanks Alok.
I am able to reproduce the problem.
I was using only the I/O layer (not the Pig loader) to test the map split.
This is what I did:
  final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
  final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
...create table and insert data ..

load:  String projection = new String("m1#{a}");

I got only null returned.

Without the storage hint [m1], i.e.
 final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}]";
 ...create table and insert data ..
load:  String projection = new String("m1#{a}");
everything works fine: I am able to get the value of m1#{a}.
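
To see which column group each entry of the failing hint lands in, the string can be split mechanically. This is a hypothetical Java 16+ parser, not Zebra's: `StorageHintParserSketch`, `Entry` and `parse` are illustrative names, and it assumes well-formed input in which `;` separates column groups, `,` separates entries inside `[...]`, and `|` separates map keys inside `#{...}`, following the strings quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical parser for storage-hint strings like those in the reproduction.
// Not Zebra's parser; assumes well-formed "[...]" groups separated by ';'.
final class StorageHintParserSketch {

    /** Column name plus selected map keys (an empty set means the whole column). */
    record Entry(String column, Set<String> keys) {}

    static List<List<Entry>> parse(String hint) {
        List<List<Entry>> groups = new ArrayList<>();
        for (String rawGroup : hint.split(";")) {
            String group = rawGroup.trim();
            if (group.isEmpty()) continue;
            // Strip the enclosing "[" and "]".
            String body = group.substring(1, group.length() - 1);
            List<Entry> entries = new ArrayList<>();
            for (String rawEntry : body.split(",")) {
                String entry = rawEntry.trim();
                int open = entry.indexOf("#{");
                if (open < 0) {
                    entries.add(new Entry(entry, Set.of()));            // whole column, e.g. [m1]
                } else {
                    String keys = entry.substring(open + 2, entry.length() - 1);
                    entries.add(new Entry(entry.substring(0, open),   // e.g. m2#{x|y}
                        new LinkedHashSet<>(List.of(keys.split("\\|")))));
                }
            }
            groups.add(entries);
        }
        return groups;
    }

    public static void main(String[] args) {
        String storage = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
        List<List<Entry>> groups = parse(storage);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("CG" + i + ": " + groups.get(i));
        }
    }
}
```

Running `main` lists four groups, with the whole-column `[m1]` item as CG 3; per Yan's explanation in this thread, that explicitly specified map item is the one that lacked the key-split entries for `m1`.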

Zebra team is working on the fix.



> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
> Environment: linux
>Reporter: Alok Singh