[ https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-1605.
-----------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

The release audit warning is due to jdiff. No new files were added.

Patch committed to both trunk and the 0.8 branch.

> Adding soft link to plan to solve input file dependency
> -------------------------------------------------------
>
>                 Key: PIG-1605
>                 URL: https://issues.apache.org/jira/browse/PIG-1605
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1605-1.patch, PIG-1605-2.patch
>
>
> In the scalar implementation, we need to deal with implicit dependencies.
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve
> the problem by adding an LOScalar operator. Here is a different approach. We
> will add a soft link to the plan, and the soft link is visible only to the
> walkers. This way we can make sure we first visit the LOStore that generates
> the scalar, and then the LOForEach that uses it. No other part of the
> logical plan knows that the soft link exists. The benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes the
>    logical plan cleaner.
> 2. Conceptually, a scalar dependency is different. A regular link represents
>    a data flow in the pipeline. For a scalar, the dependency means that one
>    operator depends on a file generated by another operator. It is a
>    different type of data dependency.
> 3. Soft links can solve other dependency problems in the future. If we
>    introduce another UDF that depends on a file generated by another
>    operator, we can use this mechanism to solve it.
> 4. With soft links, we can use scalars that come from different sources in
>    the same statement, which in my mind is not a rare use case (e.g.
>    D = foreach C generate c0/A.total, c1/B.count;).
> Currently, there are two cases where we can use a soft link:
> 1. Scalar dependency, where the ReadScalar UDF uses a file generated by an
>    LOStore.
> 2. Store-load dependency, where we load a file that is generated by a store
>    in the same script. This happens in the multi-store case. Currently we
>    solve it with a regular link; it is better to use a soft link.
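The walker-ordering idea in the description can be illustrated with a small Java model. This is a minimal, hypothetical sketch, not Pig's code: the class `Plan` and the methods `connect`, `createSoftLink`, `getPredecessors` and `dependencyOrder` are illustrative names, and operators are plain strings standing in for LOLoad/LOStore/LOForEach nodes. Regular links are what `getPredecessors` returns to the rest of the plan, while soft links are kept in a separate map that only the walker consults, so the two stores that write the scalars for `A.total` and `B.count` are visited before the foreach that reads them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SoftLinkPlanSketch {

    /** Hypothetical plan: operator names stand in for LOLoad/LOStore/LOForEach nodes. */
    static final class Plan {
        // Regular links: data flowing through the pipeline (operator -> its predecessors).
        private final Map<String, List<String>> preds = new LinkedHashMap<>();
        // Soft links: operator reads a file written by the predecessor; only the walker looks here.
        private final Map<String, List<String>> softPreds = new LinkedHashMap<>();

        void add(String op) {
            preds.putIfAbsent(op, new ArrayList<>());
            softPreds.putIfAbsent(op, new ArrayList<>());
        }

        void connect(String from, String to) {
            preds.get(to).add(from);
        }

        void createSoftLink(String from, String to) {
            softPreds.get(to).add(from);
        }

        /** What the rest of the plan sees: soft links are deliberately excluded. */
        List<String> getPredecessors(String op) {
            return Collections.unmodifiableList(preds.get(op));
        }

        /** Walker order: each operator after all regular and soft predecessors. Assumes a DAG. */
        List<String> dependencyOrder() {
            List<String> order = new ArrayList<>();
            Set<String> visited = new HashSet<>();
            for (String op : preds.keySet()) {
                visit(op, visited, order);
            }
            return order;
        }

        private void visit(String op, Set<String> visited, List<String> order) {
            if (!visited.add(op)) {
                return; // already scheduled (this sketch does no cycle detection)
            }
            for (String p : preds.get(op)) {
                visit(p, visited, order);
            }
            for (String p : softPreds.get(op)) {
                visit(p, visited, order);
            }
            order.add(op);
        }
    }

    /** Models: D = foreach C generate c0/A.total, c1/B.count; */
    static Plan buildExample() {
        Plan plan = new Plan();
        // The consumer is added first, so insertion order alone would give a wrong schedule.
        for (String op : List.of("loadC", "foreachD", "loadA", "storeA", "loadB", "storeB")) {
            plan.add(op);
        }
        plan.connect("loadC", "foreachD");         // regular data flow
        plan.connect("loadA", "storeA");           // A's scalar is written to a file
        plan.connect("loadB", "storeB");           // B's scalar is written to a file
        plan.createSoftLink("storeA", "foreachD"); // foreachD reads the file holding A.total
        plan.createSoftLink("storeB", "foreachD"); // foreachD reads the file holding B.count
        return plan;
    }

    public static void main(String[] args) {
        Plan plan = buildExample();
        System.out.println(plan.dependencyOrder());          // walker view, soft links honored
        System.out.println(plan.getPredecessors("foreachD")); // plan view, soft links hidden
    }
}
```

Running `main` prints `[loadC, loadA, storeA, loadB, storeB, foreachD]` followed by `[loadC]`: the walker schedules both stores ahead of the foreach, yet ordinary plan code still sees only the data-flow predecessor. Under the same assumptions, the store-load case would be one more `createSoftLink(store, load)` call from the store to the load that reads its output.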