[ 
https://issues.apache.org/jira/browse/PIG-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516163#comment-14516163
 ] 

liyunzhang_intel commented on PIG-4522:
---------------------------------------

After applying PIG-4522.patch, the Spark plan for the Pig script in the JIRA description is:
{code}
#-----------------------------------------------------#
#The Spark node relations are:
#-----------------------------------------------------#
scope-17->scope-18 
scope-18
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-17
A: New For Each(false,false,false)[bag] - scope-10
|   |
|   Cast[int] - scope-2
|   |
|   |---Project[bytearray][0] - scope-1
|   |
|   Cast[int] - scope-5
|   |
|   |---Project[bytearray][1] - scope-4
|   |
|   Cast[int] - scope-8
|   |
|   |---Project[bytearray][2] - scope-7
|
|---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------

Spark node scope-18
x: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-16
|
|---x: Filter[bag] - scope-12
    |   |
    |   Less Than[boolean] - scope-15
    |   |
    |   |---Project[int][0] - scope-13
    |   |
    |   |---Constant(7) - scope-14--------

#-----------------------------------------------------#
#The Spark node relations are:
#-----------------------------------------------------#
scope-36->scope-37 
scope-37
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-36
A: New For Each(false,false,false)[bag] - scope-29
|   |
|   Cast[int] - scope-21
|   |
|   |---Project[bytearray][0] - scope-20
|   |
|   Cast[int] - scope-24
|   |
|   |---Project[bytearray][1] - scope-23
|   |
|   Cast[int] - scope-27
|   |
|   |---Project[bytearray][2] - scope-26
|
|---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-19--------

Spark node scope-37
y: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-35
|
|---y: Filter[bag] - scope-31
    |   |
    |   Equal To[boolean] - scope-34
    |   |
    |   |---Project[int][1] - scope-32
    |   |
    |   |---Constant(5) - scope-33--------
#-----------------------------------------------------#
#The Spark node relations are:
#-----------------------------------------------------#
scope-59->scope-60 
scope-60
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-59
A: New For Each(false,false,false)[bag] - scope-48
|   |
|   Cast[int] - scope-40
|   |
|   |---Project[bytearray][0] - scope-39
|   |
|   Cast[int] - scope-43
|   |
|   |---Project[bytearray][1] - scope-42
|   |
|   Cast[int] - scope-46
|   |
|   |---Project[bytearray][2] - scope-45
|
|---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-38--------

Spark node scope-60
z: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-58
|
|---z: Filter[bag] - scope-50
    |   |
    |   Or[boolean] - scope-57
    |   |
    |   |---Less Than[boolean] - scope-53
    |   |   |
    |   |   |---Project[int][2] - scope-51
    |   |   |
    |   |   |---Constant(6) - scope-52
    |   |
    |   |---Greater Than[boolean] - scope-56
    |       |
    |       |---Project[int][2] - scope-54
    |       |
    |       |---Constant(6) - scope-55--------

{code}
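
One way to confirm from the explain output that the intermediate store/load pair is gone is to scan the dump for operators that use InterStorage. The following is only a minimal sketch, not part of the patch: the function name `find_intermediate_io` and the regular expression are assumptions made for illustration, and it expects each `Store(...)` or `Load(...)` operator to sit on a single line, as in the dumps in this thread.

```python
import re

# Matches Store(...) / Load(...) operators whose storage class is InterStorage.
# Assumption: the operator text is not wrapped in the middle of the parentheses.
INTER = re.compile(
    r"(Store|Load)\(([^)\s]+):org\.apache\.pig\.impl\.io\.InterStorage\)"
)

def find_intermediate_io(plan_text):
    """Return (operator, path) pairs for every InterStorage Store/Load found."""
    return [(m.group(1), m.group(2)) for m in INTER.finditer(plan_text)]

tmp = "hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839"

# Fragment of the plan from the JIRA description (before the patch).
before = (
    "Store(" + tmp + ":org.apache.pig.impl.io.InterStorage)\n"
    " - scope-18\n"
    "|---Load(" + tmp + ":org.apache.pig.impl.io.InterStorage)\n"
    " - scope-19--------\n"
)

# Fragment of the plan after applying the patch.
after = "x: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-16\n"

print(find_intermediate_io(before))
print(find_intermediate_io(after))
```

An empty result for the patched plan means no InterStorage store or load is left between the Spark nodes.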


> Remove unnecessary store and load when POSplit is encountered
> -------------------------------------------------------------
>
>                 Key: PIG-4522
>                 URL: https://issues.apache.org/jira/browse/PIG-4522
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4522.patch
>
>
> pig script:
> {code}
> A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
> split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
> store x into './testSplit_x.out';
> store y into './testSplit_y.out';
> store z into './testSplit_z.out';
> explain x; 
> explain y;
> explain z;
> {code}
> spark plan:
> {code}
> #The Spark node relations are:
> #-----------------------------------------------------#
> scope-17->scope-20 
> scope-20
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-17
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-18
> |
> |---A: New For Each(false,false,false)[bag] - scope-10
>     |   |
>     |   Cast[int] - scope-2
>     |   |
>     |   |---Project[bytearray][0] - scope-1
>     |   |
>     |   Cast[int] - scope-5
>     |   |
>     |   |---Project[bytearray][1] - scope-4
>     |   |
>     |   Cast[int] - scope-8
>     |   |
>     |   |---Project[bytearray][2] - scope-7
>     |
>     |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
> Spark node scope-20
> x: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-16
> |
> |---x: Filter[bag] - scope-12
>     |   |
>     |   Less Than[boolean] - scope-15
>     |   |
>     |   |---Project[int][0] - scope-13
>     |   |
>     |   |---Constant(7) - scope-14
>     |
>     |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-19--------
> #-----------------------------------------------------#
> #The Spark node relations are:
> #-----------------------------------------------------#
> scope-38->scope-41 
> scope-41
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-38
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-39
> |
> |---A: New For Each(false,false,false)[bag] - scope-31
>     |   |
>     |   Cast[int] - scope-23
>     |   |
>     |   |---Project[bytearray][0] - scope-22
>     |   |
>     |   Cast[int] - scope-26
>     |   |
>     |   |---Project[bytearray][1] - scope-25
>     |   |
>     |   Cast[int] - scope-29
>     |   |
>     |   |---Project[bytearray][2] - scope-28
>     |
>     |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-21--------
> Spark node scope-41
> y: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-37
> |
> |---y: Filter[bag] - scope-33
>     |   |
>     |   Equal To[boolean] - scope-36
>     |   |
>     |   |---Project[int][1] - scope-34
>     |   |
>     |   |---Constant(5) - scope-35
>     |
>     |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-40--------
> #-----------------------------------------------------#
> #The Spark node relations are:
> #-----------------------------------------------------#
> scope-63->scope-66 
> scope-66
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-63
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-64
> |
> |---A: New For Each(false,false,false)[bag] - scope-52
>     |   |
>     |   Cast[int] - scope-44
>     |   |
>     |   |---Project[bytearray][0] - scope-43
>     |   |
>     |   Cast[int] - scope-47
>     |   |
>     |   |---Project[bytearray][1] - scope-46
>     |   |
>     |   Cast[int] - scope-50
>     |   |
>     |   |---Project[bytearray][2] - scope-49
>     |
>     |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-42--------
> Spark node scope-66
> z: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-62
> |
> |---z: Filter[bag] - scope-54
>     |   |
>     |   Or[boolean] - scope-61
>     |   |
>     |   |---Less Than[boolean] - scope-57
>     |   |   |
>     |   |   |---Project[int][2] - scope-55
>     |   |   |
>     |   |   |---Constant(6) - scope-56
>     |   |
>     |   |---Greater Than[boolean] - scope-60
>     |       |
>     |       |---Project[int][2] - scope-58
>     |       |
>     |       |---Constant(6) - scope-59
>     |
>     |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-65--------
> {code}
> Scope-18 (Store) and Scope-19 (Load) are not necessary and should be removed.
> Scope-39 (Store) and Scope-40 (Load) are not necessary and should be removed.
> Scope-64 (Store) and Scope-65 (Load) are not necessary and should be removed.
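
The idea of dropping such a store/load pair can be illustrated on a toy plan. This is only a sketch under stated assumptions and not how PIG-4522.patch is implemented: the `Op` class, the `remove_temp_store_load` function and the `"/tmp/temp-"` path check are all hypothetical names and rules invented for this example. Every load that reads a temporary file is rewired to the operator that fed the matching store, and both temporary operators are dropped.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    """Hypothetical stand-in for a physical operator in the plan."""
    name: str                     # scope id, e.g. "scope-18"
    kind: str                     # "Load", "Store", "ForEach", "Filter"
    path: str = ""                # file location for Load/Store operators
    inputs: list = field(default_factory=list)

def remove_temp_store_load(ops, is_temp):
    """Drop temp Store/Load pairs and connect consumers to the real producer."""
    # temp path -> operator that fed the Store writing that path
    producer = {
        op.path: op.inputs[0]
        for op in ops
        if op.kind == "Store" and is_temp(op.path)
    }
    kept = []
    for op in ops:
        if op.kind in ("Store", "Load") and op.path in producer:
            continue  # the temporary store or the load that reads it back
        # Replace any temp Load input with the operator behind the temp Store.
        op.inputs = [
            producer[i.path] if i.kind == "Load" and i.path in producer else i
            for i in op.inputs
        ]
        kept.append(op)
    return kept

tmp = "hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839"
load = Op("scope-0", "Load", "testSplit.txt")
foreach = Op("scope-10", "ForEach", inputs=[load])
tmp_store = Op("scope-18", "Store", tmp, [foreach])
tmp_load = Op("scope-19", "Load", tmp)
filt = Op("scope-12", "Filter", inputs=[tmp_load])
store = Op("scope-16", "Store", "fakefile", [filt])

plan = [load, foreach, tmp_store, tmp_load, filt, store]
result = remove_temp_store_load(plan, lambda p: "/tmp/temp-" in p)
print([op.name for op in result])
```

In this toy plan the filter for `x` ends up reading directly from the foreach, which mirrors the shape of the plan without scope-18 and scope-19.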



