[ https://issues.apache.org/jira/browse/PIG-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2721:
------------------------------

    Attachment: pig-2721-trunk-notestyet.patch

Here is the logical plan produced when running with -t ColumnMapKeyPrune:

{noformat}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
C: (Name: LOStore Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
|
|---B: (Name: LOForEach Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
    |   |
    |   (Name: LOGenerate[false,true] Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
    |   |   |
    |   |   id:(Name: Project Type: chararray Uid: 11 Input: 0 Column: (*))
    |   |   |
    |   |   mybag:(Name: Project Type: bag Uid: 12 Input: 1 Column: (*))   <==*HERE1*
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: id#11:chararray)
    |   |
    |   |---mybag: (Name: LOInnerLoad[1] Schema: bttype#14:chararray,cat#15:long)
    |
    |---C: (Name: LOSort Schema: id#11:chararray,bttype#14:chararray,cat#15:long) <==*HERE2*
        |   |
        |   id:(Name: Project Type: chararray Uid: 11 Input: 0 Column: 0)
        |
        |---A: (Name: LOForEach Schema: id#11:chararray,mybag#12:bag{#18:tuple(bttype#14:chararray,cat#15:long)})
            |   |
            |   (Name: LOGenerate[false,false] Schema: id#11:chararray,mybag#12:bag{#18:tuple(bttype#14:chararray,cat#15:long)})
            |   |   |
            |   |   (Name: Cast Type: chararray Uid: 11)
            |   |   |
            |   |   |---id:(Name: Project Type: bytearray Uid: 11 Input: 0 Column: (*))
            |   |   |
            |   |   (Name: Cast Type: bag Uid: 12)
            |   |   |
            |   |   |---mybag:(Name: Project Type: bytearray Uid: 12 Input: 1 Column: (*))
            |   |
            |   |---(Name: LOInnerLoad[0] Schema: id#11:bytearray)
            |   |
            |   |---(Name: LOInnerLoad[1] Schema: mybag#12:bytearray)
            |
            |---A: (Name: LOLoad Schema: id#11:bytearray,mybag#12:bytearray)RequiredFields:null

{noformat}


Tracing the ColumnPrune* code, the use of 'Uid:12' gets lost at the first 
LOGenerate (*HERE1*) when its projection refers to LOSort (*HERE2*) and checks 
the schema.

Looking further, the LOSort schema was not being updated by SchemaPatcher when 
PushDownForEachFlatten swapped the ForEach and the Sort.  This was because 
PushDownForEachFlatten.reportChange() did not pass along the changes it made.
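
To illustrate the failure mode described above, here is a small toy model in Python. It is not Pig's code: the Op class, patch_schemas() and sort_schema_after_swap() are hypothetical names made up for this sketch, and the only assumption taken from the analysis is that cached schemas are refreshed solely for operators that a transformation reports as changed.

```python
class Op:
    """Toy plan operator with a cached output schema (hypothetical, not Pig's API)."""

    def __init__(self, name, derive):
        self.name = name
        self.derive = derive   # function: upstream schema -> output schema
        self.schema = None     # cached schema, filled in by patch_schemas()


def flatten_mybag(upstream):
    """Replace the bag column 'mybag' with its flattened fields."""
    out = []
    for field in upstream:
        out.extend(("bttype", "cat") if field == "mybag" else (field,))
    return tuple(out)


def patch_schemas(chain, changed):
    """Walk the chain from source to sink and recompute a cached schema
    only if it is missing or the operator was reported as changed."""
    upstream = None
    for op in chain:
        if op.schema is None or op in changed:
            op.schema = op.derive(upstream)
        upstream = op.schema


def sort_schema_after_swap(report_changes):
    load = Op("A", lambda _upstream: ("id", "mybag"))
    flatten = Op("B", flatten_mybag)
    sort = Op("C", lambda upstream: upstream)

    # Original order: load -> flatten -> sort.
    patch_schemas([load, flatten, sort], changed=[])

    # The optimizer swaps flatten and sort: load -> sort -> flatten.
    changed = [sort, flatten] if report_changes else []
    patch_schemas([load, sort, flatten], changed)
    return sort.schema


if __name__ == "__main__":
    print("changes not reported:", sort_schema_after_swap(False))
    print("changes reported:    ", sort_schema_after_swap(True))
```

In this sketch, when the swap is not reported the Sort keeps the stale flattened schema (id, bttype, cat), so nothing downstream of it sees mybag any more; when the swap is reported the Sort is re-derived as (id, mybag), which mirrors the before/after difference in the real plan.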

I believe the attached patch fixes this issue.
I confirmed that the logical plan now changes to:
{noformat}
$ diff /tmp/before /tmp/after
18c18
<     |---C: (Name: LOSort Schema: id#11:chararray,bttype#14:chararray,cat#15:long)
---
>     |---C: (Name: LOSort Schema: id#11:chararray,mybag#12:bag{#18:tuple(bttype#14:chararray,cat#15:long)})
{noformat}

and it produces the correct output.
                
> Wrong output generated while loading bags as input
> --------------------------------------------------
>
>                 Key: PIG-2721
>                 URL: https://issues.apache.org/jira/browse/PIG-2721
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.2
>            Reporter: Vivek Padmanabhan
>         Attachments: pig-2721-trunk-notestyet.patch
>
>
> {code}
> A = LOAD '/user/pvivek/sample' as (id:chararray,mybag:bag{tuple(bttype:chararray,cat:long)});
> B = foreach A generate id,FLATTEN(mybag) AS (bttype, cat);
> C = order B by id;
> dump C;
> {code}
> The above code generates wrong results when executed with Pig 0.10 and Pig 0.9.
> Below is the sample input:
> {code}
> ...LKGaHqg--  {(aa,806743)}
> ..0MI1Y37w--  {(aa,498970)}
> ..0bnlpJrw--  {(aa,806740)}
> ..0p0IIhbA--  {(aa,498971),(se,498995)}
> ..1VkGqvXA--  {(aa,805219)}
> {code}
> I think the Pig optimizers are causing this issue. From the logs I can see 
> that $1 is pruned for the relation A:
> [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for A: $1
> One workaround is to disable the rule with -t ColumnMapKeyPrune.
