[ https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-1605.
-----------------------------
Hadoop Flags: [Reviewed]
Resolution: Fixed
The release audit warning is due to jdiff; no new files were added. Patch
committed to both trunk and the 0.8 branch.
> Adding soft link to plan to solve input file dependency
> -------------------------------------------------------
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1605-1.patch, PIG-1605-2.patch
>
>
> In the scalar implementation, we need to deal with implicit dependencies.
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve
> the problem by adding an LOScalar operator. Here is a different approach: we
> add a soft link to the plan, and the soft link is visible only to the
> walkers. This ensures that we visit the LOStore that generates the scalar
> first, and then the LOForEach that uses the scalar. No other part of the
> logical plan knows that the soft link exists. The benefits are:
> 1. The logical plan does not need to deal with LOScalar, which keeps the
> logical plan cleaner.
> 2. Conceptually, a scalar dependency is different. A regular link represents
> data flow in the pipeline. For a scalar, the dependency means that an operator
> depends on a file generated by another operator. It is a different type of
> data dependency.
> 3. Soft links can solve other dependency problems in the future. If we
> introduce another UDF that depends on a file generated by another operator, we
> can use the same mechanism.
> 4. With soft links, we can use scalars that come from different sources in the
> same statement, which in my mind is not a rare use case (e.g. D = foreach C
> generate c0/A.total, c1/B.count;).
> Currently, there are two cases where we can use a soft link:
> 1. Scalar dependency, where the ReadScalar UDF uses a file generated by an
> LOStore.
> 2. Store-load dependency, where we load a file that is generated by a
> store in the same script. This happens in the multi-store case. Currently we
> handle it with a regular link; a soft link would be better.
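
To illustrate the idea described in the issue, here is a minimal sketch of a plan that keeps soft links apart from regular links. It is not Pig's actual code: the class SoftLinkPlanSketch, its methods (connect, addSoftLink, getSuccessors, dependencyOrder) and the operator names are all hypothetical, and operators are plain strings. The assumption is that only the walker's ordering consults soft links, while the ordinary successor lookup ignores them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Pig's OperatorPlan: operators are just names.
public class SoftLinkPlanSketch {
    // Regular links: data flow in the pipeline.
    private final Map<String, List<String>> regular = new LinkedHashMap<>();
    // Soft links: "must be visited before" edges, seen only by the walker.
    private final Map<String, List<String>> soft = new LinkedHashMap<>();

    private void add(String op) {
        regular.putIfAbsent(op, new ArrayList<>());
        soft.putIfAbsent(op, new ArrayList<>());
    }

    public void connect(String from, String to) {
        add(from);
        add(to);
        regular.get(from).add(to);
    }

    public void addSoftLink(String from, String to) {
        add(from);
        add(to);
        soft.get(from).add(to);
    }

    // What the rest of the plan sees: regular links only.
    public List<String> getSuccessors(String op) {
        return Collections.unmodifiableList(regular.get(op));
    }

    // What a dependency-order walker sees: regular and soft links together
    // (Kahn's topological sort over both edge sets).
    public List<String> dependencyOrder() {
        Map<String, Integer> indegree = new LinkedHashMap<>();
        for (String op : regular.keySet()) indegree.put(op, 0);
        for (String op : regular.keySet()) {
            for (String s : regular.get(op)) indegree.merge(s, 1, Integer::sum);
            for (String s : soft.get(op)) indegree.merge(s, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.remove();
            order.add(op);
            List<String> next = new ArrayList<>(regular.get(op));
            next.addAll(soft.get(op));
            for (String s : next) {
                if (indegree.merge(s, -1, Integer::sum) == 0) ready.add(s);
            }
        }
        if (order.size() != regular.size()) {
            throw new IllegalStateException("plan contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        SoftLinkPlanSketch plan = new SoftLinkPlanSketch();
        plan.connect("loadC", "foreachD");        // D = foreach C generate ...
        plan.connect("foreachD", "storeD");
        plan.connect("loadA", "totalA");          // computes the scalar
        plan.connect("totalA", "storeTotal");     // store that writes the scalar file
        plan.addSoftLink("storeTotal", "foreachD"); // foreachD reads that file
        System.out.println(plan.dependencyOrder());
        System.out.println(plan.getSuccessors("storeTotal"));
    }
}
```

In this sketch the walker places storeTotal ahead of foreachD even though no data flows between them, while getSuccessors("storeTotal") stays empty, so code that only follows regular links never notices the extra dependency.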