[ 
https://issues.apache.org/jira/browse/PIG-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591176#comment-14591176
 ] 

liyunzhang_intel commented on PIG-4607:
---------------------------------------

Here is an example that explains why the TestRank1 and TestRank3 unit tests fail:

rank01RowNumber.pig:
{code}
A = LOAD './rank01RowNumber.txt' AS (f1:chararray,f2:int,f3:chararray);
C = rank A;
store C into './rank01RowNumber.out';
{code}

cat rank01RowNumber.txt:
{code}
A       1       N
B       2       N
C       3       M
D       4       P
E       4       Q
E       4       Q
F       8       Q
F       7       Q
F       8       T
F       8       Q
G       10      V
{code}
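
For reference, RANK without a BY clause is expected to prepend a sequential row number, starting at 1, to each tuple. Below is a minimal Python sketch of that expected result for the input above. The helper name rank_row_number is hypothetical (it is not Pig API), and the sketch assumes the input order is preserved:

```python
# Rows of rank01RowNumber.txt as (f1, f2, f3) tuples, in file order.
ROWS = [
    ("A", 1, "N"), ("B", 2, "N"), ("C", 3, "M"), ("D", 4, "P"),
    ("E", 4, "Q"), ("E", 4, "Q"), ("F", 8, "Q"), ("F", 7, "Q"),
    ("F", 8, "T"), ("F", 8, "Q"), ("G", 10, "V"),
]


def rank_row_number(rows):
    """Prepend a 1-based row number to each tuple (hypothetical helper)."""
    return [(i,) + row for i, row in enumerate(rows, start=1)]


for ranked_row in rank_row_number(ROWS):
    print(ranked_row)
```

Duplicate rows such as the two (E,4,Q) lines still get distinct numbers, because no sort field is involved.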

The physical plan is:
{code}
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
C: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-13
|
|---C: PORank[tuple] - scope-12
    |
    |---C: POCounter[tuple] - scope-11
        |
        |---A: New For Each(false,false,false)[bag] - scope-10
            |   |
            |   Cast[chararray] - scope-2
            |   |
            |   |---Project[bytearray][0] - scope-1
            |   |
            |   Cast[int] - scope-5
            |   |
            |   |---Project[bytearray][1] - scope-4
            |   |
            |   Cast[chararray] - scope-8
            |   |
            |   |---Project[bytearray][2] - scope-7
            |
            |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/rank01RowNumber.txt:org.apache.pig.builtin.PigStorage) - scope-0
{code}

The Spark plan is:
{code}
scope-14
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-14
C: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-13
|
|---A: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[chararray] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/rank01RowNumber.txt:org.apache.pig.builtin.PigStorage) - scope-0
{code}

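The root cause is that the "POCounter" and "PORank" operators, which are present in the physical plan, are missing from the Spark plan: there the Store sits directly on top of the For Each.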
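
To make clear what is lost when these two operators are dropped, here is a rough Python sketch of the two-step row numbering they perform, as I understand it: the POCounter step numbers tuples locally within each task and records per-task counts, and the PORank step adds the number of tuples in all earlier tasks as an offset. The function names, the split of the input into two partitions and the tuple layout are assumptions for illustration only, not the actual Pig implementation:

```python
from itertools import accumulate


def counter_step(partitions):
    """Assumed POCounter behaviour: tag each tuple with (task_id, local_index)
    and record how many tuples each task saw."""
    tagged, counts = [], []
    for task_id, part in enumerate(partitions):
        tagged.append([(task_id, i, row) for i, row in enumerate(part, start=1)])
        counts.append(len(part))
    return tagged, counts


def rank_step(tagged, counts):
    """Assumed PORank behaviour: global rank = local index + number of
    tuples counted in all earlier tasks."""
    offsets = [0] + list(accumulate(counts))[:-1]
    ranked = []
    for part in tagged:
        for task_id, local_index, row in part:
            ranked.append((offsets[task_id] + local_index,) + row)
    return ranked


# Hypothetical split of the first five input rows across two tasks.
partitions = [
    [("A", 1, "N"), ("B", 2, "N"), ("C", 3, "M")],
    [("D", 4, "P"), ("E", 4, "Q")],
]
tagged, counts = counter_step(partitions)
ranked = rank_step(tagged, counts)
for row in ranked:
    print(row)
```

Without both steps in the plan, the tuples reach the Store with no rank column at all, which matches the Spark plan shown above.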

> Enable "TestRank1","TestRank3" unit tests in spark mode
> -------------------------------------------------------
>
>                 Key: PIG-4607
>                 URL: https://issues.apache.org/jira/browse/PIG-4607
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: kexianda
>             Fix For: spark-branch
>
>
> In https://builds.apache.org/job/Pig-spark/216/#showFailuresLink, the following
> TestRank1 and TestRank3 unit tests fail:
> org.apache.pig.test.TestRank1.testRank02RowNumber
> org.apache.pig.test.TestRank1.testRank01RowNumber
> org.apache.pig.test.TestRank3.testRankWithSplitInMap
> org.apache.pig.test.TestRank3.testRankWithSplitInReduce
> org.apache.pig.test.TestRank3.testRankCascade



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
