[ 
https://issues.apache.org/jira/browse/PIG-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110019#comment-15110019
 ] 

liyunzhang_intel commented on PIG-4783:
---------------------------------------

In PIG-4783.patch:
*Why call SparkCompiler#connectSoftLink in SparkLauncher?*

While refactoring the code, I found that TestScalarAliasesLocal fails. The script under test is
scalar.pig:
{code}
A = load './scalar_input.txt';
scalar = load './scalar.txt';
B = foreach A generate 5 /scalar.$1;
store B into './scalar.out';
{code}

SparkPlan:
{code}
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------

Spark node scope-25
B: Store(hdfs://zly1.sh.intel.com:8020/user/root/scalar.out:org.apache.pig.builtin.PigStorage) - scope-23
|
|---B: New For Each(false)[bag] - scope-22
    |   |
    |   Divide[int] - scope-21
    |   |
    |   |---Constant(5) - scope-16
    |   |
    |   |---Cast[int] - scope-20
    |       |
    |       |---POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[bytearray] - scope-19
    |           |
    |           |---Constant(1) - scope-17
    |           |
    |           |---Constant(hdfs://zly1.sh.intel.com:8020/tmp/temp-500166940/tmp-528164685) - scope-18
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/scalar_input.txt:org.apache.pig.builtin.PigStorage) - scope-13--------

Spark node scope-24
scalar: Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-500166940/tmp-528164685:org.apache.pig.impl.io.InterStorage) - scope-15
|
|---scalar: Load(hdfs://zly1.sh.intel.com:8020/user/root/scalar.txt:org.apache.pig.builtin.PigStorage) - scope-14--------
{code}

Spark node scope-24 must be executed before Spark node scope-25, because
scope-25 reads the scalar file that the Store (scope-15) in scope-24 writes.
The two nodes share no operator in the plan; the only link between them is
that temporary file. SparkCompiler#connectSoftLink builds a dependency between
the two SparkOperators in this kind of case.
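
The ordering constraint described above can be illustrated with a small standalone sketch. This is not the code in PIG-4783.patch and does not use Pig's plan classes: the class SoftLinkSketch and its methods are hypothetical names, and the plan is reduced to node names plus "soft link" edges. It only shows how recording a soft link from scope-24 to scope-25 lets a topological sort place the scalar-producing node first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a plan of Spark nodes; NOT Pig's real plan classes.
final class SoftLinkSketch {
    // node -> nodes that must run after it
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    // node -> number of predecessors it waits for
    private final Map<String, Integer> pending = new LinkedHashMap<>();

    void addNode(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        pending.putIfAbsent(name, 0);
    }

    // Record that 'to' depends on a side effect of 'from',
    // for example a temporary file that 'from' stores and 'to' reads.
    void connectSoftLink(String from, String to) {
        addNode(from);
        addNode(to);
        successors.get(from).add(to);
        pending.merge(to, 1, Integer::sum);
    }

    // Kahn's algorithm: repeatedly emit a node with no unfinished predecessor.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(pending);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String next : successors.get(node)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("cycle in plan");
        }
        return order;
    }

    // Mirrors the plan above: scope-25 is registered first, as in the dump,
    // but the soft link still forces scope-24 to run before it.
    static List<String> demoOrder() {
        SoftLinkSketch plan = new SoftLinkSketch();
        plan.addNode("scope-25");
        plan.addNode("scope-24");
        plan.connectSoftLink("scope-24", "scope-25");
        return plan.executionOrder();
    }

    public static void main(String[] args) {
        System.out.println(demoOrder()); // prints [scope-24, scope-25]
    }
}
```

Without the connectSoftLink call, the sketch would return the nodes in registration order, scope-25 first, which corresponds to the failure seen in TestScalarAliasesLocal.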


> Refactor SparkLauncher for spark engine
> ---------------------------------------
>
>                 Key: PIG-4783
>                 URL: https://issues.apache.org/jira/browse/PIG-4783
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4783.patch
>
>
> Currently, SparkLauncher is too big. We can move the functions that
> execute the Spark plan and collect job statistics into another class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
