[ https://issues.apache.org/jira/browse/PIG-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498166#comment-13498166 ]
Koji Noguchi commented on PIG-3051:
-----------------------------------
Sorry, I pasted the wrong plan for "after ColumnMapKeyPrune". Here's the plan after
pruning.
{noformat}
U1: (Name: LOStore Schema: sortCol#1871:int,label#1872:chararray,cnt#1870:long)ColumnPrune:InputUids=[1870, 1871, 1872]ColumnPrune:OutputUids=[1870, 1871, 1872]
|
|---U1: (Name: LOForEach Schema: sortCol#1871:int,label#1872:chararray,cnt#1870:long)ColumnPrune:InputUids=[1870]ColumnPrune:OutputUids=[1870, 1871, 1872]
    |   |
    |   (Name: LOGenerate[false,false,false] Schema: sortCol#1871:int,label#1872:chararray,cnt#1870:long)ColumnPrune:InputUids=[1870]ColumnPrune:OutputUids=[1870, 1871, 1872]
    |   |   |
    |   |   (Name: Constant Type: int Uid: 1871)
    |   |   |
    |   |   (Name: Constant Type: chararray Uid: 1872)
    |   |   |
    |   |   cnt:(Name: Project Type: long Uid: 1870 Input: 0 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: cnt#1870:long)
    |
    |---(Name: LOSort Schema: cnt#1870:long)ColumnPrune:InputUids=[1865, 1870]ColumnPrune:OutputUids=[1870]
        |   |   *****HERE*****
        |   cnt:(Name: Project Type: long Uid: 1865 Input: 0 Column: ***2***)
        |
        |---G4: (Name: LOSplitOutput Schema: cnt#1870:long)ColumnPrune:InputUids=[1865]ColumnPrune:OutputUids=[1870]
            |   |
            |   (Name: Constant Type: boolean Uid: 1867)
            |
            |---(Name: LOForEach Schema: cnt#1865:long)
                |   |
                |   (Name: LOGenerate[false] Schema: cnt#1865:long)
                |   |   |
                |   |   cnt:(Name: Project Type: long Uid: 1865 Input: 0 Column: (*))
                |   |
                |   |---(Name: LOInnerLoad[2] Schema: cnt#1865:long)
                |
                |---G4: (Name: LOSplit Schema: sortCol#1864:int,label#1857:chararray,cnt#1865:long)ColumnPrune:InputUids=[1864, 1865, 1857]ColumnPrune:OutputUids=[1864, 1865, 1857]
                    |
                    |---G4: (Name: LOSort Schema: sortCol#1864:int,label#1857:chararray,cnt#1865:long)ColumnPrune:InputUids=[1864, 1865, 1857]ColumnPrune:OutputUids=[1864, 1865, 1857]
                        |   |
                        |   cnt:(Name: Project Type: long Uid: 1865 Input: 0 Column: 2)
                        |
                        |---G3: (Name: LOForEach Schema: sortCol#1864:int,label#1857:chararray,cnt#1865:long)ColumnPrune:InputUids=[1857, 1862]ColumnPrune:OutputUids=[1864, 1865, 1857]
{noformat}
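So I believe the new LOSort introduced by LimitOptimizer has a sort projection
that still points into the original LOSort's schema (Uid: 1865, Column: 2).
After column pruning, its input has only the single column cnt#1870, but the
column index is not updated, so the projection goes out of bounds.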
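To make the suspected failure mode concrete, here is a minimal Java sketch. It assumes a schema can be modeled as a list of column names and a sort key as a positional column index (like `Column: 2` in the plan above). `StaleSortKeyDemo`, `prune` and `remap` are hypothetical names, not Pig classes, and the sketch ignores uids entirely, so it only illustrates the stale index, not the actual fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the failure; none of these names are Pig classes.
 * A sort key is stored as a positional column index into its input schema.
 */
public final class StaleSortKeyDemo {

    /** Hypothetical column pruning: keep only the columns at the given positions. */
    static List<String> prune(List<String> schema, int[] kept) {
        List<String> out = new ArrayList<>();
        for (int idx : kept) {
            out.add(schema.get(idx));
        }
        return out;
    }

    /** Map an index in the original schema to its position after pruning, or -1 if it was dropped. */
    static int remap(int oldIndex, int[] kept) {
        for (int newIndex = 0; newIndex < kept.length; newIndex++) {
            if (kept[newIndex] == oldIndex) {
                return newIndex;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> g4 = Arrays.asList("sortCol", "label", "cnt");
        int sortKey = 2; // copied from the original ORDER G3 BY cnt (Column: 2)

        int[] kept = {2}; // the LIMIT branch only needs cnt
        List<String> pruned = prune(g4, kept);

        try {
            pruned.get(sortKey); // stale index: only one column is left
            System.out.println("unexpected: stale index resolved");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("stale index fails: " + e.getMessage());
        }

        // Updating the index together with the pruning avoids the failure.
        int fixed = remap(sortKey, kept);
        System.out.println("remapped index " + fixed + " -> " + pruned.get(fixed));
    }
}
```

Running `main` first prints the exception for the stale lookup (the exact message wording depends on the JDK version) and then `remapped index 0 -> cnt`.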
> java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
> --------------------------------------------------------------------------------
>
> Key: PIG-3051
> URL: https://issues.apache.org/jira/browse/PIG-3051
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0, 0.11
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
>
> A user hit a
> "Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1" error
> when his script had multiple STOREs and a LIMIT.
> I couldn't reproduce this with a shorter Pig script (column pruning somehow
> does not happen when the script is shortened), but here's a snippet.
> {noformat}
> ...
> G3 = FOREACH G2 GENERATE sortCol, FLATTEN(group) as label, (long)COUNT(G1) as cnt;
> G4 = ORDER G3 BY cnt DESC PARALLEL 25;
> ONEROW = LIMIT G4 1;
> U1 = FOREACH ONEROW GENERATE 3 as sortcol, 'somelabel' as label, cnt;
> store U1 into 'u1' using PigStorage();
> store G4 into 'g4' using PigStorage();
> {noformat}
> With '-t ColumnMapKeyPrune' (which disables the ColumnMapKeyPrune optimizer rule), the job didn't hit the error.