[
https://issues.apache.org/jira/browse/PIG-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4269:
----------------------------------
Attachment: PIG-4269_Jekins.png
PIG-4269.patch
After applying PIG-4269.patch, all TestAccumulator unit tests pass except
testAccumWithSort, testAccumWithDistinct and testAccumAfterNestedOp (see
PIG-4269_Jekins.png).
testAccumWithSort, testAccumWithDistinct and testAccumAfterNestedOp fail for
the following reason, shown here for TestAccumulator#testAccumAfterNestedOp.
Pig script:
{code}
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B {
    o = order A by id;
    generate org.apache.pig.test.utils.AccumulatorBagCount(o);
};
{code}
Plan in Spark mode:
{code}
C: Store(hdfs://localhost:52502/tmp/temp827450292/tmp-1280786869:org.apache.pig.impl.io.InterStorage) - scope-17
|
|---C: New For Each(false)[bag] - scope-16
| |
| POUserFunc(org.apache.pig.test.utils.AccumulatorBagCount)[int] - scope-12
| |
| |---RelationToExpressionProject[bag][*] - scope-11
| |
| |---o: POSort[bag]() - scope-15
| | |
| | Project[int][0] - scope-14
| |
| |---Project[bag][1] - scope-13
|
|---B: Package(Packager)[tuple]{int} - scope-8
|
|---B: Global Rearrange[tuple] - scope-7
B: Local Rearrange[tuple]{int}(false) - scope-9
| |
| Project[int][0] - scope-10
|
|---A: New For Each(false,false)[bag] - scope-6
| |
| Cast[int] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Project[bytearray][1] - scope-4
|
|---A: Load(hdfs://localhost:52502/user/root/AccumulatorInput1.txt:org.apache.pig.builtin.PigStorage) - scope-0
{code}
Plan in MR mode:
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-18
Map Plan
B: Local Rearrange[tuple]{int}(false) - scope-9
| |
| Project[int][0] - scope-10
|
|---A: New For Each(false,false)[bag] - scope-6
| |
| Cast[int] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Project[bytearray][1] - scope-4
|
|---A: Load(hdfs://localhost:40299/user/root/AccumulatorInput1.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
Reduce Plan
C: Store(hdfs://localhost:40299/tmp/temp-493016342/tmp-1209478651:org.apache.pig.impl.io.InterStorage) - scope-17
|
|---C: New For Each(false)[bag] - scope-16
| |
| POUserFunc(org.apache.pig.test.utils.AccumulatorBagCount)[int] - scope-12
| |
| |---RelationToExpressionProject[bag][*] - scope-11
| |
| |---Project[bag][1] - scope-13
|
|---B: Package(Packager)[tuple]{int} - scope-8--------
Global sort: false
----------------
{code}
In Spark mode, the Pig script fails because the Spark plan contains a POSort
operator, whereas the MR plan does not. When POSort is generated, the UDF
throws the exception "Caught error from UDF:
org.apache.pig.test.utils.AccumulatorBagCount [exec() should not be called.]"
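To make the failure mode concrete, here is a minimal, self-contained sketch of a counting UDF that only supports accumulative mode. It is a simplified model, not the real Pig classes: SketchAccumulator, BagCountSketch and AccumulatorModeSketch are hypothetical names, and the assumption is that the Spark plan with POSort ends up calling exec() while the MR plan feeds the UDF through accumulate().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an accumulator contract; not the org.apache.pig API.
interface SketchAccumulator<T> {
    void accumulate(List<String> batch) throws IOException;
    T getValue();
    void cleanup();
}

// Simplified model of a counting test UDF that only works in accumulative mode.
final class BagCountSketch implements SketchAccumulator<Integer> {
    private int count = 0;

    @Override
    public void accumulate(List<String> batch) {
        count += batch.size();   // add the size of each partial bag
    }

    @Override
    public Integer getValue() {
        return count;
    }

    @Override
    public void cleanup() {
        count = 0;               // reset state before the next group
    }

    // Regular entry point: fails on purpose so that a test notices when
    // the engine did not choose accumulative mode.
    public Integer exec(List<String> bag) throws IOException {
        throw new IOException("exec() should not be called.");
    }
}

final class AccumulatorModeSketch {
    // accumulative == true models a plan without POSort in front of the UDF;
    // false models (by assumption) the plan where the UDF is reached via exec().
    static int evaluate(boolean accumulative, List<List<String>> batches) throws IOException {
        BagCountSketch udf = new BagCountSketch();
        if (accumulative) {
            for (List<String> batch : batches) {
                udf.accumulate(batch);
            }
            int result = udf.getValue();
            udf.cleanup();
            return result;
        }
        List<String> wholeBag = new ArrayList<>();
        for (List<String> batch : batches) {
            wholeBag.addAll(batch);
        }
        return udf.exec(wholeBag);
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("apple", "orange"),
                Arrays.asList("pear"));
        try {
            System.out.println(evaluate(true, batches));
            evaluate(false, batches);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, the accumulative path returns the total bag size, while the non-accumulative path surfaces the same "exec() should not be called." message seen in the test failure.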
> Enable unit test "TestAccumulator" for spark
> --------------------------------------------
>
> Key: PIG-4269
> URL: https://issues.apache.org/jira/browse/PIG-4269
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4269.patch, PIG-4269_Jekins.png,
> TEST-org.apache.pig.test.TestAccumulator.txt
>
>
> error log is attached