[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909007#action_12909007
 ] 

Daniel Dai commented on PIG-1605:
---------------------------------

Changes are reasonably small. Here is a summary:
1. Add the following methods to the plan (both old and new):
{code}
public void createSoftLink(E from, E to)
public List<E> getSoftLinkPredecessors(E op)
public List<E> getSoftLinkSuccessors(E op)
{code}
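
As a rough illustration of what these three methods could look like, here is a minimal sketch. It is not Pig's actual plan code: the class name SoftLinkPlanSketch and the two map fields are hypothetical, and it assumes soft links are kept in maps separate from the regular edges so that existing getSuccessors/getPredecessors callers never see them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a plan class; not Pig's actual OperatorPlan.
class SoftLinkPlanSketch<E> {
    // Assumption: soft links live apart from the regular edges, so code that
    // only asks for regular successors/predecessors is unaffected.
    private final Map<E, List<E>> softFromEdges = new HashMap<>();
    private final Map<E, List<E>> softToEdges = new HashMap<>();

    /** Records that 'to' depends on something produced by 'from'. */
    public void createSoftLink(E from, E to) {
        softFromEdges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        softToEdges.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    /** Operators that 'op' softly depends on; empty if there are none. */
    public List<E> getSoftLinkPredecessors(E op) {
        return Collections.unmodifiableList(
                softToEdges.getOrDefault(op, Collections.emptyList()));
    }

    /** Operators that softly depend on 'op'; empty if there are none. */
    public List<E> getSoftLinkSuccessors(E op) {
        return Collections.unmodifiableList(
                softFromEdges.getOrDefault(op, Collections.emptyList()));
    }
}
```

Returning an empty list rather than null for an operator without soft links is a design choice of this sketch, made so that callers can pass the result to addAll without a null check.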

2. All walkers need to change. When a walker gets predecessors/successors, it 
needs to get both the regular and the soft link ones. The changes are 
straightforward, e.g. from:
{code}
Collection<O> newSuccessors = mPlan.getSuccessors(suc);
{code}
to:
{code}
Collection<O> newSuccessors = mPlan.getSuccessors(suc);
newSuccessors.addAll(mPlan.getSoftLinkSuccessors(suc));
{code}
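
One thing to watch in the walkers: if getSuccessors hands back the plan's internal list, or null for an operator without successors (both are assumptions about the plan implementation, not something stated here), calling addAll on the returned collection would modify the plan or fail. A defensive variant could copy into a fresh list first. The sketch below uses hypothetical names (LinkedPlanView, WalkerLinks, allSuccessors) purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal view of a plan: only the two lookups a walker needs.
interface LinkedPlanView<O> {
    List<O> getSuccessors(O op);          // regular links; assumed to possibly return null
    List<O> getSoftLinkSuccessors(O op);  // soft links
}

final class WalkerLinks {
    private WalkerLinks() {}

    /** Returns regular successors followed by soft link successors, in a new list. */
    static <O> List<O> allSuccessors(LinkedPlanView<O> plan, O op) {
        List<O> result = new ArrayList<>();
        List<O> regular = plan.getSuccessors(op);
        if (regular != null) {
            result.addAll(regular);
        }
        List<O> soft = plan.getSoftLinkSuccessors(op);
        if (soft != null) {
            for (O s : soft) {
                // Skip operators already reachable through a regular link.
                if (!result.contains(s)) {
                    result.add(s);
                }
            }
        }
        return result;
    }
}
```

A walker would then call WalkerLinks.allSuccessors(plan, suc) wherever it currently calls getSuccessors, and the plan's own lists are never touched.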

3. Change plan utility functions, such as replace, replaceAndAddSucessors, 
replaceAndAddPredecessors, etc.
In the new logical plan, there is no change, since we only have minimal utility 
functions. In the old logical plan, those utility functions would need some 
changes to become aware of soft links, but if we decide not to support the old 
logical plan going forward, no change is needed; we only need to note that 
those utility functions do not handle soft links.

4. Change scalar to use soft links
This includes creating the soft link and maintaining it across transformations 
(migrating to the new plan, translating to the physical plan). 

5. Change store-load to use soft links
This is an optional step. Currently we use a regular link; conceptually we 
should use a soft link. It is OK if we don't do this for now.

Also note that in most cases there are no soft links and the plan behaves just 
as before, so this change should be safe enough.

> Adding soft link to plan to solve input file dependency
> -------------------------------------------------------
>
>                 Key: PIG-1605
>                 URL: https://issues.apache.org/jira/browse/PIG-1605
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>
> In the scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and the soft link is only visible to the 
> walkers. By doing this, we can make sure we visit the LOStore which generates 
> the scalar first, and then the LOForEach which uses the scalar. All other 
> parts of the logical plan do not know about the soft link. The benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes the 
> logical plan cleaner.
> 2. Conceptually, a scalar dependency is different. A regular link represents 
> data flow in the pipeline. In scalar, the dependency means an operator depends 
> on a file generated by another operator. It is a different type of data 
> dependency.
> 3. Soft links can solve other dependency problems in the future. If we 
> introduce another UDF that depends on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft links, we can use scalars that come from different sources in the 
> same statement, which in my mind is not a rare use case (e.g.: D = foreach C 
> generate c0/A.total, c1/B.count; ).
> Currently, there are two cases where we can use soft links:
> 1. scalar dependency, where the ReadScalar UDF will use a file generated by a 
> LOStore
> 2. store-load dependency, where we load a file which is generated by a 
> store in the same script. This happens in the multi-store case. Currently we 
> solve it with a regular link. It is better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
