Incorrect data generated by diff of SUM
---------------------------------------
Key: PIG-1525
URL: https://issues.apache.org/jira/browse/PIG-1525
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.8.0
Given the following data:
input1:
{code}
id9 0
{code}
input2:
{code}
id8 1
id9 1
{code}
the Pig script
{code}
A = LOAD 'input1' AS (id:chararray, val:long);
B = LOAD 'input2' AS (id:chararray, val:long);
C = COGROUP A BY id, B BY id;
D = FOREACH C GENERATE group, SUM(B.val), SUM(A.val), (SUM(A.val) - SUM(B.val));
dump D;
{code}
generates incorrect data. The difference for id9 is reported as -2L, although SUM(A.val) - SUM(B.val) = 0 - 1 = -1:
{code}
(id8,1L,,)
(id9,1L,0L,-2L)
{code}
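For reference, the intended result can be modeled outside Pig. The following is a minimal Python sketch, not Pig code, and it rests on two assumptions: that SUM yields null for an empty bag, and that arithmetic with a null operand yields null. The helper names null_safe_sum and expected_rows are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def null_safe_sum(values):
    """Model of SUM (assumption): None for an empty bag, else the sum of non-null values."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def expected_rows(input1, input2):
    """Model of COGROUP A BY id, B BY id followed by the FOREACH from the script."""
    # Each key maps to a pair of bags: (values from A, values from B).
    groups = defaultdict(lambda: ([], []))
    for key, val in input1:
        groups[key][0].append(val)
    for key, val in input2:
        groups[key][1].append(val)

    rows = []
    for key in sorted(groups):
        a_vals, b_vals = groups[key]
        sum_a = null_safe_sum(a_vals)
        sum_b = null_safe_sum(b_vals)
        # Assumption: subtraction with a null operand produces null.
        diff = None if sum_a is None or sum_b is None else sum_a - sum_b
        # Same field order as the script: group, SUM(B.val), SUM(A.val), difference.
        rows.append((key, sum_b, sum_a, diff))
    return rows


input1 = [("id9", 0)]
input2 = [("id8", 1), ("id9", 1)]
rows = expected_rows(input1, input2)
for row in rows:
    print(row)
```

Under these assumptions the id8 row matches what Pig printed, while the id9 row ends in -1 rather than -2, so only the computed difference is affected.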
The workaround is to compute the sums in one FOREACH statement and the difference in a second one, replacing the original FOREACH statement with:
{code}
D = FOREACH C GENERATE group, SUM(B.val) as b, SUM(A.val) as a;
E = FOREACH D GENERATE $0, b, a, (a-b);
{code}