[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Attachment: PIG-1605-1.patch

 Adding soft link to plan to solve input file dependency
 ---

 Key: PIG-1605
 URL: https://issues.apache.org/jira/browse/PIG-1605
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1605-1.patch


 In the scalar implementation, we need to deal with implicit dependencies. 
 [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
 the problem by adding an LOScalar operator. Here is a different approach. We 
 will add soft links to the plan; a soft link is visible only to the 
 walkers. This ensures that we visit the LOStore that generates the scalar 
 first, and then the LOForEach that uses it. No other part of the logical 
 plan knows that the soft link exists. The benefits are:
 1. The logical plan does not need to deal with LOScalar, which makes the 
 logical plan cleaner.
 2. Conceptually, a scalar dependency is different. A regular link represents 
 data flow in the pipeline. A scalar dependency means that an operator 
 depends on a file generated by another operator. It is a different type of 
 data dependency.
 3. Soft links can solve other dependency problems in the future. If we 
 introduce another UDF that depends on a file generated by another operator, 
 we can use this mechanism to solve it.
 4. With soft links, we can use scalars that come from different sources in 
 the same statement, which in my mind is not a rare use case 
 (e.g., D = foreach C generate c0/A.total, c1/B.count;).
 Currently, there are two cases where we can use a soft link:
 1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
 LOStore.
 2. Store-load dependency, where we load a file that is generated by a store 
 in the same script. This happens in the multi-store case. Currently we 
 solve it with a regular link. It is better to use a soft link.
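
 To make the idea concrete, here is a minimal sketch in Python rather than 
 Pig's Java. It is not the patch itself: the class, method, and operator 
 names (Plan, create_soft_link, store_a, foreach_d, ...) are hypothetical. 
 It only illustrates that soft links are hidden from the ordinary successor 
 lookup and are honored by a dependency-order walker.

```python
class Plan:
    """Toy operator plan (hypothetical, not Pig's API).

    Regular links carry data flow; soft links only constrain visit order.
    """

    def __init__(self):
        self.ops = []          # operators in insertion order
        self.links = {}        # regular successors: op -> [op, ...]
        self.soft_links = {}   # soft successors, meant for walkers only

    def add(self, op):
        self.ops.append(op)

    def connect(self, src, dst):
        self.links.setdefault(src, []).append(dst)

    def create_soft_link(self, src, dst):
        self.soft_links.setdefault(src, []).append(dst)

    def successors(self, op):
        """What the rest of the plan sees: regular links only."""
        return list(self.links.get(op, []))


def dependency_order(plan):
    """Walker: topological order over regular AND soft links (Kahn's algorithm)."""
    indegree = {op: 0 for op in plan.ops}
    for table in (plan.links, plan.soft_links):
        for dsts in table.values():
            for dst in dsts:
                indegree[dst] += 1
    ready = [op for op in plan.ops if indegree[op] == 0]
    order = []
    while ready:
        op = ready.pop(0)
        order.append(op)
        for dst in plan.links.get(op, []) + plan.soft_links.get(op, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(plan.ops):
        raise ValueError("cycle in plan")
    return order


# Hypothetical plan: foreach_d reads a scalar from the file written by store_a.
plan = Plan()
for name in ["load_c", "foreach_d", "store_d", "load_a", "store_a"]:
    plan.add(name)
plan.connect("load_c", "foreach_d")
plan.connect("foreach_d", "store_d")
plan.connect("load_a", "store_a")
plan.create_soft_link("store_a", "foreach_d")  # file dependency, not data flow

order = dependency_order(plan)
print(order)
```

 In this sketch the walker places store_a before foreach_d, while 
 plan.successors("store_a") stays empty, so code that only follows regular 
 links never sees the extra dependency.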

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Attachment: PIG-1605-2.patch

PIG-1605-2.patch fixes the findbugs warnings.

test-patch result:
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 455 release audit warnings (more than the trunk's current 453 warnings).




[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
the problem by adding an LOScalar operator. Here is a different approach. We 
will add soft links to the plan; a soft link is visible only to the walkers. 
This ensures that we visit the LOStore that generates the scalar first, and 
then the LOForEach that uses it. No other part of the logical plan knows that 
the soft link exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes the 
logical plan cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline. A scalar dependency means that an operator depends 
on a file generated by another operator. It is a different type of data 
dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use this mechanism to solve it.
4. With soft links, we can use scalars that come from different sources in the 
same statement, which in my mind is not a rare use case 
(e.g., D = foreach C generate c0/A.total, c1/B.count;).

Currently, there are two cases where we can use a soft link:
1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
LOStore.
2. Store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link. It is better to use a soft link.
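
Both cases come down to one operator writing a file that another operator 
reads. A rough sketch of how such links could be derived, in Python and under 
my own assumptions (the Op record, its reads/writes fields, and the paths are 
hypothetical and are not taken from the patch):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operator record: files it reads and files it writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def derive_soft_links(ops):
    """Return (producer, consumer) pairs for each file one op writes and another reads."""
    writers = {}
    for op in ops:
        for path in op.writes:
            writers.setdefault(path, []).append(op.name)
    links = []
    for op in ops:
        for path in sorted(op.reads):  # sorted for a deterministic result
            for producer in writers.get(path, []):
                if producer != op.name:
                    links.append((producer, op.name))
    return links


ops = [
    Op("store_A", writes=frozenset({"tmp/A"})),
    Op("store_B", writes=frozenset({"tmp/B"})),
    # Scalar case: one foreach uses scalars from two different sources.
    Op("foreach_D", reads=frozenset({"tmp/A", "tmp/B"})),
    # Store-load case: a load reads a file stored earlier in the same script.
    Op("load_E", reads=frozenset({"tmp/A"})),
]
soft_links = derive_soft_links(ops)
print(soft_links)
```

Here foreach_D ends up with one soft link per scalar source, which is the 
situation described in benefit 4, and load_E gets a soft link from the store 
that produces its input.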

  was:
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
the problem by adding an LOScalar operator. Here is a different approach. We 
will add soft links to the plan; a soft link is visible only to the walkers. 
This ensures that we visit the LOStore that generates the scalar first, and 
then the LOForEach that uses it. No other part of the logical plan knows that 
the soft link exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes the 
logical plan cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline. A scalar dependency means that an operator depends 
on a file generated by another operator. It is a different type of data 
dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use this mechanism to solve it.

Currently, there are two cases where we can use a soft link:
1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
LOStore.
2. Store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link. It is better to use a soft link.



[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
the problem by adding an LOScalar operator. Here is a different approach. We 
will add soft links to the plan; a soft link is visible only to the walkers. 
This ensures that we visit the LOStore that generates the scalar first, and 
then the LOForEach that uses it. No other part of the logical plan knows that 
the soft link exists. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes the 
logical plan cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline. A scalar dependency means that an operator depends 
on a file generated by another operator. It is a different type of data 
dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use this mechanism to solve it.

Currently, there are two cases where we can use a soft link:
1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
LOStore.
2. Store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link. It is better to use a soft link.

  was:
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
the problem by adding an LOScalar operator. Here is a different approach. We 
will add soft links to the plan; a soft link is visible only to the walkers. 
No other part of the logical plan knows that the soft link exists. The 
benefits are:

1. The logical plan does not need to deal with LOScalar, which makes the 
logical plan cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents 
data flow in the pipeline. A scalar dependency means that an operator depends 
on a file generated by another operator. It is a different type of data 
dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use this mechanism to solve it.

Currently, there are two cases where we can use a soft link:
1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
LOStore.
2. Store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link. It is better to use a soft link.

