[
https://issues.apache.org/jira/browse/PIG-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110019#comment-15110019
]
liyunzhang_intel commented on PIG-4783:
---------------------------------------
In PIG-4783.patch:
*Why is SparkCompiler#connectSoftLink called in SparkLauncher?*
While refactoring the code, I found that TestScalarAliasesLocal fails.
scalar.pig:
{code}
A = load './scalar_input.txt';
scalar = load './scalar.txt';
B = foreach A generate 5 /scalar.$1;
store B into './scalar.out';
{code}
SparkPlan:
{code}
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-25
B: Store(hdfs://zly1.sh.intel.com:8020/user/root/scalar.out:org.apache.pig.builtin.PigStorage) - scope-23
|
|---B: New For Each(false)[bag] - scope-22
    |   |
    |   Divide[int] - scope-21
    |   |
    |   |---Constant(5) - scope-16
    |   |
    |   |---Cast[int] - scope-20
    |       |
    |       |---POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[bytearray] - scope-19
    |           |
    |           |---Constant(1) - scope-17
    |           |
    |           |---Constant(hdfs://zly1.sh.intel.com:8020/tmp/temp-500166940/tmp-528164685) - scope-18
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/scalar_input.txt:org.apache.pig.builtin.PigStorage) - scope-13--------

Spark node scope-24
scalar: Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-500166940/tmp-528164685:org.apache.pig.impl.io.InterStorage) - scope-15
|
|---scalar: Load(hdfs://zly1.sh.intel.com:8020/user/root/scalar.txt:org.apache.pig.builtin.PigStorage) - scope-14--------
{code}
Spark node scope-24 must be executed before Spark node scope-25, because
scope-25 reads the scalar file that scope-24 stores (scope-15).
SparkCompiler#connectSoftLink builds the dependency between the two
SparkOperators in this kind of case.
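
To illustrate the ordering requirement, here is a minimal, self-contained
sketch of how a soft link between two nodes can force one to run before the
other. It is not the code in the patch: the class SoftLinkOrderSketch and its
methods addNode, addSoftLink and executionOrder are hypothetical names, and
the nodes are plain strings rather than SparkOperators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; this is not Pig's SparkCompiler or its plan classes. */
final class SoftLinkOrderSketch {
    // For each node, the nodes that must wait for it.
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();
    // For each node, how many nodes it still has to wait for.
    private final Map<String, Integer> pendingDeps = new LinkedHashMap<>();

    void addNode(String node) {
        successors.putIfAbsent(node, new LinkedHashSet<>());
        pendingDeps.putIfAbsent(node, 0);
    }

    /** Records that 'to' consumes something 'from' produces, so 'from' runs first. */
    void addSoftLink(String from, String to) {
        addNode(from);
        addNode(to);
        if (successors.get(from).add(to)) {
            pendingDeps.merge(to, 1, Integer::sum);
        }
    }

    /** Returns the nodes in an order that respects every soft link (Kahn's algorithm). */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(pendingDeps);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String next : successors.get(node)) {
                // A dependent becomes runnable once all its producers have run.
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("soft links form a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        SoftLinkOrderSketch plan = new SoftLinkOrderSketch();
        plan.addNode("scope-25"); // added first, as in the plan dump
        plan.addNode("scope-24");
        // scope-25 reads the scalar file that scope-24 stores.
        plan.addSoftLink("scope-24", "scope-25");
        System.out.println(plan.executionOrder()); // prints [scope-24, scope-25]
    }
}
```

Without the addSoftLink call, the sketch would return the nodes in insertion
order, so scope-25 would run before the scalar file it needs exists.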
> Refactor SparkLauncher for spark engine
> ---------------------------------------
>
> Key: PIG-4783
> URL: https://issues.apache.org/jira/browse/PIG-4783
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4783.patch
>
>
> Currently, SparkLauncher contains too much code. We can move the functions
> that execute the Spark plan and collect job statistics into another class.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)