[ https://issues.apache.org/jira/browse/PIG-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014254#comment-14014254 ]
Koji Noguchi commented on PIG-3975:
-----------------------------------
With the script from the description, the job DAG looked like this:
{noformat}
Job DAG:
job_1399356417814_189120 -> job_1399356417814_189121,job_1399356417814_189122,
job_1399356417814_189121 -> job_1399356417814_189123,
job_1399356417814_189123
job_1399356417814_189122
{noformat}
Looking at the plan, both job_1399356417814_189122 and job_1399356417814_189123
read the output of job_1399356417814_189121 (the tmp-690789368 path passed to
ReadScalars), yet the DAG records that dependency only for
job_1399356417814_189123. job_1399356417814_189122 is missing it.
{noformat}
==============================================================
job_1399356417814_189120
pig.inputs ================
hdfs:/aaa.bbb.ccc:8020/user/knoguchi/input1:org.apache.pig.builtin.PigStorage
pig.mapPlan=================
B: Local Rearrange[tuple]{int}(false) - scope-8
| |
| Project[int][0] - scope-9
|
|---A: New For Each(false)[bag] - scope-5
| |
| Cast[int] - scope-3
| |
| |---Project[bytearray][0] - scope-2
pig.reducePlan=================
Empty Plan!
pig.reduce.stores=================
[(Name: B: Store(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-113052755:org.apache.pig.impl.io.TFileStorage) - scope-10
==============================================================
job_1399356417814_189121
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-113052755:org.apache.pig.impl.io.TFileStorage
pig.mapPlan=================
Empty Plan!
pig.map.stores=================
[(Name: Store(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-690789368:org.apache.pig.impl.io.InterStorage) - scope-25 Operator Key: scope-25)]
pig.reducePlan=================
null
pig.reduce.stores=================
[]
==============================================================
job_1399356417814_189122
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/user/knoguchi/input3:org.apache.pig.builtin.PigStorage
pig.mapPlan=================
F: New For Each(false)[bag] - scope-20
| |
| POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-19
| |
| |---Constant(0) - scope-17
| |
| |---Constant(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-690789368) - scope-18
pig.map.stores=================
[(Name: F: Store(/tmp/deletemeF:org.apache.pig.builtin.PigStorage) - scope-21 Operator Key: scope-21)]
pig.reducePlan=================
null
pig.reduce.stores=================
[]
==============================================================
job_1399356417814_189123
pig.inputs ================
hdfs://aaa.bbb.ccc:8020/user/knoguchi/input2.txt:org.apache.pig.builtin.PigStorage
pig.mapPlan=================
D: New For Each(false)[bag] - scope-14
| |
| POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-13
| |
| |---Constant(0) - scope-11
| |
| |---Constant(hdfs://aaa.bbb.ccc:8020/tmp/temp-912971374/tmp-690789368) - scope-12
pig.map.stores=================
[(Name: D: Store(/tmp/deletemeD:org.apache.pig.builtin.PigStorage) - scope-15 Operator Key: scope-15)]
pig.reducePlan=================
null
pig.reduce.stores=================
[]
{noformat}
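To make the mismatch easier to see, here is a rough Python sketch that derives the edges one would expect from the file paths in the plan above and compares them with the printed Job DAG. It is only an illustration, not Pig code: the dictionary layout, the shortened job and path names, and the helper expected_edges are assumptions made up for this sketch, and treating the ReadScalars constant as a plain "read" is a simplification.

```python
# Hypothetical model of the plan dump: which paths each job reads and writes.
# Job ids and temp paths are shortened to their last component for readability.
TMP_B = "tmp-113052755"       # output of job 189120, input of job 189121
TMP_SCALAR = "tmp-690789368"  # output of job 189121, read via ReadScalars

jobs = {
    "189120": {"reads": {"input1"}, "writes": {TMP_B}},
    "189121": {"reads": {TMP_B}, "writes": {TMP_SCALAR}},
    "189122": {"reads": {"input3", TMP_SCALAR}, "writes": {"/tmp/deletemeF"}},
    "189123": {"reads": {"input2.txt", TMP_SCALAR}, "writes": {"/tmp/deletemeD"}},
}

# Edges exactly as printed in the "Job DAG" section (producer, consumer).
reported = {
    ("189120", "189121"),
    ("189120", "189122"),
    ("189121", "189123"),
}


def expected_edges(jobs):
    """Return (producer, consumer) pairs where the consumer reads a path
    that the producer writes."""
    edges = set()
    for producer, p in jobs.items():
        for consumer, c in jobs.items():
            if producer != consumer and p["writes"] & c["reads"]:
                edges.add((producer, consumer))
    return edges


expected = expected_edges(jobs)
missing = expected - reported      # dependencies the DAG should have but lacks
unexplained = reported - expected  # DAG edges with no matching file flow here

print("missing:", sorted(missing))
print("unexplained:", sorted(unexplained))
```

Under this simplified model, the only missing edge is 189121 -> 189122, which matches the observation above. The reported edge 189120 -> 189122 has no corresponding file flow in the dump.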
> Multiple Scalar reference calls leading to missing records
> ----------------------------------------------------------
>
> Key: PIG-3975
> URL: https://issues.apache.org/jira/browse/PIG-3975
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.1, 0.9.2, 0.10.1, 0.11.1, 0.12.2
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Critical
>
> We noticed that multiple Pig runs with the same input were producing different
> outputs.
> A simplified version of the script looked like this.
> {noformat}
> A = load 'input1' as (a1:int);
> B = group A by a1 parallel 200;
> C = load 'input2' as (c1:int);
> D = foreach C generate B.$0;
> store D into '/tmp/deletemeD';
> E = load 'input3' as (c1:int);
> F = foreach E generate B.$0;
> store F into '/tmp/deletemeF';
> {noformat}