[ https://issues.apache.org/jira/browse/PIG-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499345#comment-13499345 ]
Koji Noguchi commented on PIG-3051:
-----------------------------------

bq. I couldn't reproduce this with short pig code (due to ColumnPruning somehow not happening when shortened),

I learned that ColumnPruning does not kick in unless there is a column or map to prune inside the load (even though ColumnPruning does more than just prune at the load).

By adding one extra line to force ColumnPruning, I was able to reproduce this issue. The first example hits an IndexOutOfBoundsException and the second one produces an incorrect result.

{noformat}
% cat test/pig-3051-1.pig
A = load 'a.txt' using PigStorage() as (a1:chararray, a2:chararray, a3:chararray, a4:chararray);
B = foreach A generate a2,a3,a4; --to force columnprune algo to cover
G = order B by a4;
U1 = limit G 3;
U2 = foreach U1 generate a4;
store G into 'g' using PigStorage();
store U2 into 'u2' using PigStorage();

% cat a.txt
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3

% pig -x local test/pig-3051-1.pig
... fails with
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
{noformat}

Now, with 2 extra columns added, the job finishes but the result is incorrect.

{noformat}
% cat test/pig-3051-2.pig
A = load 'b.txt' using PigStorage() as (a1:chararray, a2:chararray, a3:chararray, a4:chararray, a5:chararray, a6:chararray);
B = foreach A generate a2,a3,a4,a5,a6; --to force columnprune algo to cover
G = order B by a4;
U1 = limit G 4;
U2 = foreach U1 generate a4,a5,a6;
store G into 'g' using PigStorage();
store U2 into 'u2' using PigStorage();

% cat b.txt
1 2 3 4 5 6
2 3 4 5 6 1
3 4 5 6 1 2
4 5 6 1 2 3
5 6 1 2 3 4
6 1 2 3 4 5

% pig -x local test/pig-3051-2.pig
... success

% cat u2/part-r-00000
5 6 1
6 1 2
1 2 3
2 3 4
{noformat}

And last, taking out store G (to take out LOSplit) produces the correct output.
{noformat}
% cat test/pig-3051-3.pig
A = load 'b.txt' using PigStorage() as (a1:chararray, a2:chararray, a3:chararray, a4:chararray, a5:chararray, a6:chararray);
B = foreach A generate a2,a3,a4,a5,a6; --to force columnprune algo to cover
G = order B by a4;
U1 = limit G 4;
U2 = foreach U1 generate a4,a5,a6;
--store G into 'g' using PigStorage();
store U2 into 'u2' using PigStorage();

% pig -x local test/pig-3051-3.pig
... Success.

% cat u2/part-r-00000
1 2 3
2 3 4
3 4 5
4 5 6
%
{noformat}

I also tested the patch (pig-3051-v1-withouttest.txt), and it does fix the incorrect-result case.

> java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
> --------------------------------------------------------------------------------
>
>                 Key: PIG-3051
>                 URL: https://issues.apache.org/jira/browse/PIG-3051
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>         Attachments: pig-3051-v1-withouttest.txt
>
>
> Had a user hitting
> "Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1" error
> when he had multiple stores and limit in his code.
> I couldn't reproduce this with short pig code (due to ColumnPruning somehow
> not happening when shortened), but here's a snippet.
> {noformat}
> ...
> G3 = FOREACH G2 GENERATE sortCol, FLATTEN(group) as label, (long)COUNT(G1) as cnt;
> G4 = ORDER G3 BY cnt DESC PARALLEL 25;
> ONEROW = LIMIT G4 1;
> U1 = FOREACH ONEROW GENERATE 3 as sortcol, 'somelabel' as label, cnt;
> store U1 into 'u1' using PigStorage();
> store G4 into 'g4' using PigStorage();
> {noformat}
> With '-t ColumnMapKeyPrune', job didn't hit the error.
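
For anyone checking a fix against the b.txt scripts in the comment, here is a minimal sketch in plain Python (not Pig) of what U2 should contain. It assumes the load/order/limit/project steps behave as written in pig-3051-2.pig and that a4 sorts as a string, since the fields are declared chararray. The names `ROWS`, `expected_u2` and `leading_three` are made up for this illustration. The second helper only records an observation about the data, not a diagnosis: the incorrect u2 rows happen to equal the first three columns of the sorted B (a2,a3,a4) rather than a4,a5,a6.

```python
# Rows of b.txt as listed in the comment (columns a1..a6).
ROWS = [
    ["1", "2", "3", "4", "5", "6"],
    ["2", "3", "4", "5", "6", "1"],
    ["3", "4", "5", "6", "1", "2"],
    ["4", "5", "6", "1", "2", "3"],
    ["5", "6", "1", "2", "3", "4"],
    ["6", "1", "2", "3", "4", "5"],
]


def sorted_b(rows):
    """B = foreach A generate a2..a6; G = order B by a4."""
    b = [r[1:] for r in rows]               # drop a1
    return sorted(b, key=lambda r: r[2])    # a4 is index 2 within B


def expected_u2(rows, limit=4):
    """U1 = limit G 4; U2 = foreach U1 generate a4,a5,a6."""
    return [r[2:5] for r in sorted_b(rows)[:limit]]


def leading_three(rows, limit=4):
    """First three columns of the limited rows (a2,a3,a4), for comparison."""
    return [r[0:3] for r in sorted_b(rows)[:limit]]


if __name__ == "__main__":
    for row in expected_u2(ROWS):
        print(" ".join(row))
```

Comparing the printed rows with `u2/part-r-00000` gives a quick pass/fail for the incorrect-result case; the IndexOutOfBoundsException case with a.txt still has to be checked by running Pig itself.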