[ 
https://issues.apache.org/jira/browse/PIG-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke resolved PIG-4857.
----------------------------
    Resolution: Fixed

already fixed in PIG-4876

> Last record is missing in STREAM operator
> -----------------------------------------
>
>                 Key: PIG-4857
>                 URL: https://issues.apache.org/jira/browse/PIG-4857
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Xianda Ke
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4857.patch
>
>
> This bug is similar to PIG-4842.
> Scenario:
> {code}
> cat input.txt
> 1
> 1
> 2
> {code}
> Pig script:
> {code}
> REGISTER myudfs.jar;
> A = LOAD 'input.txt' USING myudfs.DummyCollectableLoader() AS (id); 
> B = GROUP A by $0 USING 'collected';    -- (1, {(1),(1)}), (2,{(2)})
> C = STREAM B THROUGH ` awk '{
>      print $0;
> }'`;
> DUMP C;
> {code}
> Expected Result:
> {code}
> (1,{(1),(1)})
> (2,{(2)})
> {code}
> Actual Result:
> {code}
> (1,{(1),(1)})
> {code}
> The last record is missing...
> Root Cause:
> When the flag endOfAllInput was set as true by the predecessor,  the 
> predecessor buffers the last record which is the input of Stream.   Then 
> POStream find endOfAllInput is true, in fact, the last input is not consumed 
> yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to