[ 
https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771390#action_12771390
 ] 

Ankur commented on PIG-1060:
----------------------------

Here's a sample script to illustrate the issue. Note that the sample data
isn't very important here, since both optimization and execution fail.
=== test.pig ===

data = LOAD 'dummy' as (name:chararray, freq:int);

filter1 = FILTER data BY freq < 5;
group1 = GROUP filter1 BY name;
proj1 = FOREACH group1 GENERATE FLATTEN(group), 'string1', SUM(filter1.freq);

filter2 = FILTER data BY freq > 5;
group2 = GROUP filter2 BY name;
proj2 = FOREACH group2 GENERATE FLATTEN(group), 'string2', SUM(filter2.freq);

filter3 = FILTER filter2 BY freq < 10;
group3 = GROUP filter3 BY name;
proj3 = FOREACH group3 GENERATE FLATTEN(group), 'string3', SUM(filter3.freq);

filter4 = FILTER filter3 BY freq > 7;
group4 = GROUP filter4 BY name;
proj4 = FOREACH group4 GENERATE FLATTEN(group), 'string4', SUM(filter4.freq);

M1 = LIMIT proj1 10;
M2 = LIMIT proj2 10;
M3 = LIMIT proj3 10;
M4 = LIMIT proj4 10;

U = UNION M1, M2, M3, M4;

STORE U INTO 'res' USING PigStorage();

The dot output can be dumped via the command "explain -dot -script test.pig;"
to visualize the scenario.
A surprising observation is that even with MultiQuery turned off using -M, the
MultiQuery optimizer still seems to run and fails the script.
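
For readers who want to see the nesting without generating the dot output, here
is a rough Python sketch that recovers the split structure from the alias
dependencies in test.pig. It is only an illustration: it assumes a split occurs
wherever one alias is read by more than one later statement, and it does not
reflect how Pig builds or optimizes its plans internally. The helper names
(split_points, split_depth) are made up for this example, and the final UNION
is left out because it has several inputs.

```python
# Alias dependencies copied from test.pig: statement alias -> its input alias.
# Illustration only; this is not how Pig represents its plan internally.
inputs = {
    "filter1": "data", "group1": "filter1", "proj1": "group1",
    "filter2": "data", "group2": "filter2", "proj2": "group2",
    "filter3": "filter2", "group3": "filter3", "proj3": "group3",
    "filter4": "filter3", "group4": "filter4", "proj4": "group4",
    "M1": "proj1", "M2": "proj2", "M3": "proj3", "M4": "proj4",
}


def split_points(inputs):
    """Return {alias: sorted consumers} for aliases read by more than one statement."""
    consumers = {}
    for consumer, source in inputs.items():
        consumers.setdefault(source, []).append(consumer)
    return {alias: sorted(c) for alias, c in consumers.items() if len(c) > 1}


def split_depth(alias, inputs, splits):
    """Count the split points on the path from the LOAD down to and including alias."""
    depth = 0
    while alias is not None:
        if alias in splits:
            depth += 1
        alias = inputs.get(alias)  # None once we step past the LOAD alias
    return depth


splits = split_points(inputs)
for alias, readers in splits.items():
    print(alias, "->", readers, "nesting level", split_depth(alias, inputs, splits))
```

Under that assumption the script has three nested split points: data feeds
filter1 and filter2, filter2 feeds group2 and filter3, and filter3 feeds group3
and filter4. This appears to be the "multi-level splits" case described in the
issue.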




> MultiQuery optimization throws error for multi-level splits
> -----------------------------------------------------------
>
>                 Key: PIG-1060
>                 URL: https://issues.apache.org/jira/browse/PIG-1060
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Ankur
>
> Consider the following scenario :-
> 1. Multi-level splits in the map plan.
> 2. Each split branch further progressing across a local-global rearrange.
> 3. Output of each of these finally merged via a UNION.
> MultiQuery optimizer throws the following error in such a case:
> "ERROR 2146: Internal Error. Inconsistency in key index found during 
> optimization."
