[jira] Updated: (PIG-1562) Fix the version for the dependent packages for the maven

2010-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1562:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and the 0.8 branch. Thanks, Niraj!

> Fix the version for the dependent packages for the maven 
> -
>
> Key: PIG-1562
> URL: https://issues.apache.org/jira/browse/PIG-1562
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1562_1.patch, PIG-1562_2.patch, PIG_1562_0.patch
>
>
> We need to fix the set version so that the version is properly set for the 
> dependent packages in the maven repository.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908886#action_12908886
 ] 

Daniel Dai commented on PIG-1608:
-

Pig should include pig-default.properties in pig.jar, but not pig.properties, 
just as Hadoop does with core-default.xml and core-site.xml.
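The layering Daniel describes can be pictured as a two-level lookup: bundled defaults ship inside the jar, and site-specific settings outside it override them. The sketch below is purely illustrative Python (Pig itself is Java, and this is not Pig's actual loading code; the property names are just examples):

```python
# Illustrative sketch of default/site property layering in the style of
# hadoop's core-default.xml / core-site.xml: bundled defaults ship inside
# the jar, site-specific overrides live in the user's conf directory.

def load_properties(text):
    """Parse simple 'key=value' lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def resolve(bundled_defaults, site_overrides):
    """Site-specific settings win over bundled defaults."""
    merged = dict(load_properties(bundled_defaults))
    merged.update(load_properties(site_overrides))
    return merged

# pig-default.properties would be read from pig.jar itself:
defaults = "pig.cachedbag.memusage=0.2\npig.spill.size.threshold=5000000"
# pig.properties comes from the conf directory, not the jar:
site = "pig.cachedbag.memusage=0.1"

conf = resolve(defaults, site)
```

With this split, shipping only the defaults file in the jar keeps upgrades from silently clobbering a user's site settings.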

> pig should always include pig-default.properties and pig.properties in the 
> pig.jar
> --
>
> Key: PIG-1608
> URL: https://issues.apache.org/jira/browse/PIG-1608
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: niraj rai
>
> pig should always include pig-default.properties and pig.properties as a part 
> of the pig.jar file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
problem by adding a LOScalar operator. Here is a different approach: we add a 
soft link to the plan, and the soft link is visible only to the walkers. By 
doing this, we can make sure we visit the LOStore which generates the scalar 
first, and then the LOForEach which uses the scalar. All other parts of the 
logical plan are unaware of the soft link. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes the 
logical plan cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents a 
data flow in the pipeline. For a scalar, the dependency means an operator 
depends on a file generated by another operator; it is a different type of 
data dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use the same mechanism to solve it.
4. With soft links, we can use scalars coming from different sources in the 
same statement, which to my mind is not a rare use case (eg: D = foreach C 
generate c0/A.total, c1/B.count;).

Currently, there are two cases where we can use a soft link:
1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
LOStore.
2. Store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link; it is better to use a soft link.

  was:
In scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
the problem by adding a LOScalar operator. Here is a different approach. We 
will add a soft link to the plan, and soft link is only visible to the walkers. 
By doing this, we can make sure we visit LOStore which generate scalar first, 
and then LOForEach which use the scalar. All other part of the logical plan 
does not know the existence of the soft link. The benefits are:

1. Logical plan do not need to deal with LOScalar, this makes logical plan 
cleaner
2. Conceptually scalar dependency is different. Regular link represent a data 
flow in pipeline. In scalar, the dependency means an operator depends on a file 
generated by the other operator. It's different type of data dependency.
3. Soft link can solve other dependency problem in the future. If we introduce 
another UDF dependent on a file generated by another operator, we can use this 
mechanism to solve it. 

Currently, there are two cases we can use soft link:
1. scalar dependency, where ReadScalar UDF will use a file generate by a LOStore
2. store-load dependency, where we will load a file which is generated by a 
store in the same script. This happens in multi-store case. Currently we solve 
it by regular link. It is better to use a soft link.


> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and soft link is only visible to the 
> walkers. By doing this, we can make sure we visit LOStore which generate 
> scalar first, and then LOForEach which use the scalar. All other part of the 
> logical plan does not know the existence of the soft link. The benefits are:
> 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
> cleaner
> 2. Conceptually scalar dependency is different. Regular link represent a data 
> flow in pipeline. In scalar, the dependency means an operator depends on a 
> file generated by the other operator. It's different type of data dependency.
> 3. Soft link can solve other dependency problem in the future. If we 
> introduce another UDF dependent on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft link, we can use scalar come from different sources in the same 
> statement, which in my mind is not a rare use case. (eg: D = foreach C 
> generate c0/A.total, c1/B.count;)
> Currently, there are two cases we can use soft link:
> 1. scalar dependency, where ReadScalar UDF will use a file generate by a 
> LOStore
> 2. store-load dependency, where

[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Description: 
In the scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve the 
problem by adding a LOScalar operator. Here is a different approach: we add a 
soft link to the plan, and the soft link is visible only to the walkers. By 
doing this, we can make sure we visit the LOStore which generates the scalar 
first, and then the LOForEach which uses the scalar. All other parts of the 
logical plan are unaware of the soft link. The benefits are:

1. The logical plan does not need to deal with LOScalar, which makes the 
logical plan cleaner.
2. Conceptually, a scalar dependency is different. A regular link represents a 
data flow in the pipeline. For a scalar, the dependency means an operator 
depends on a file generated by another operator; it is a different type of 
data dependency.
3. Soft links can solve other dependency problems in the future. If we 
introduce another UDF that depends on a file generated by another operator, we 
can use the same mechanism to solve it.
4. With soft links, we can use scalars coming from different sources in the 
same statement, which to my mind is not a rare use case (eg: D = foreach C 
generate c0/A.total, c1/B.count; ).

Currently, there are two cases where we can use a soft link:
1. Scalar dependency, where the ReadScalar UDF uses a file generated by an 
LOStore.
2. Store-load dependency, where we load a file that is generated by a store in 
the same script. This happens in the multi-store case. Currently we solve it 
with a regular link; it is better to use a soft link.

  was:
In scalar implementation, we need to deal with implicit dependencies. 
[PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
the problem by adding a LOScalar operator. Here is a different approach. We 
will add a soft link to the plan, and soft link is only visible to the walkers. 
By doing this, we can make sure we visit LOStore which generate scalar first, 
and then LOForEach which use the scalar. All other part of the logical plan 
does not know the existence of the soft link. The benefits are:

1. Logical plan do not need to deal with LOScalar, this makes logical plan 
cleaner
2. Conceptually scalar dependency is different. Regular link represent a data 
flow in pipeline. In scalar, the dependency means an operator depends on a file 
generated by the other operator. It's different type of data dependency.
3. Soft link can solve other dependency problem in the future. If we introduce 
another UDF dependent on a file generated by another operator, we can use this 
mechanism to solve it. 
4. With soft link, we can use scalar come from different sources in the same 
statement, which in my mind is not a rare use case. (eg: D = foreach C generate 
c0/A.total, c1/B.count;)

Currently, there are two cases we can use soft link:
1. scalar dependency, where ReadScalar UDF will use a file generate by a LOStore
2. store-load dependency, where we will load a file which is generated by a 
store in the same script. This happens in multi-store case. Currently we solve 
it by regular link. It is better to use a soft link.


> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and soft link is only visible to the 
> walkers. By doing this, we can make sure we visit LOStore which generate 
> scalar first, and then LOForEach which use the scalar. All other part of the 
> logical plan does not know the existence of the soft link. The benefits are:
> 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
> cleaner
> 2. Conceptually scalar dependency is different. Regular link represent a data 
> flow in pipeline. In scalar, the dependency means an operator depends on a 
> file generated by the other operator. It's different type of data dependency.
> 3. Soft link can solve other dependency problem in the future. If we 
> introduce another UDF dependent on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft link, we can use scalar come from different sources in the same 
> statement, which in my mind is not a rare use case. (eg: D = foreach C 
> generate c0/A.total, c1

[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908900#action_12908900
 ] 

Thejas M Nair commented on PIG-1605:


bq. 4. With soft link, we can use scalar come from different sources in the 
same statement, which in my mind is not a rare use case. (eg: D = foreach C 
generate c0/A.total, c1/B.count; )
This works with LOScalar as well. For the above example, there will be two 
LOScalar operators preceding the LogicalOperator for D. It will look like:

{code}
        A           B
        |           |
        v           v
C -> LOScalar -> LOScalar -> D
{code}


> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and soft link is only visible to the 
> walkers. By doing this, we can make sure we visit LOStore which generate 
> scalar first, and then LOForEach which use the scalar. All other part of the 
> logical plan does not know the existence of the soft link. The benefits are:
> 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
> cleaner
> 2. Conceptually scalar dependency is different. Regular link represent a data 
> flow in pipeline. In scalar, the dependency means an operator depends on a 
> file generated by the other operator. It's different type of data dependency.
> 3. Soft link can solve other dependency problem in the future. If we 
> introduce another UDF dependent on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft link, we can use scalar come from different sources in the same 
> statement, which in my mind is not a rare use case. (eg: D = foreach C 
> generate c0/A.total, c1/B.count; )
> Currently, there are two cases we can use soft link:
> 1. scalar dependency, where ReadScalar UDF will use a file generate by a 
> LOStore
> 2. store-load dependency, where we will load a file which is generated by a 
> store in the same script. This happens in multi-store case. Currently we 
> solve it by regular link. It is better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908906#action_12908906
 ] 

Thejas M Nair commented on PIG-1605:


I think the first three benefits mentioned here are good reasons to go with 
this approach instead of LOScalar.


> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and soft link is only visible to the 
> walkers. By doing this, we can make sure we visit LOStore which generate 
> scalar first, and then LOForEach which use the scalar. All other part of the 
> logical plan does not know the existence of the soft link. The benefits are:
> 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
> cleaner
> 2. Conceptually scalar dependency is different. Regular link represent a data 
> flow in pipeline. In scalar, the dependency means an operator depends on a 
> file generated by the other operator. It's different type of data dependency.
> 3. Soft link can solve other dependency problem in the future. If we 
> introduce another UDF dependent on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft link, we can use scalar come from different sources in the same 
> statement, which in my mind is not a rare use case. (eg: D = foreach C 
> generate c0/A.total, c1/B.count; )
> Currently, there are two cases we can use soft link:
> 1. scalar dependency, where ReadScalar UDF will use a file generate by a 
> LOStore
> 2. store-load dependency, where we will load a file which is generated by a 
> store in the same script. This happens in multi-store case. Currently we 
> solve it by regular link. It is better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-366:
---

Assignee: Robert Gibbon  (was: Daniel Dai)

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908926#action_12908926
 ] 

Yan Zhou commented on PIG-366:
--

Robert, first, thanks for your effort to pick up this feature.

You mentioned in your 09/08 comment that you "stripped back" a lot of 
functionality and focused on the script editor. I'm wondering if it is 
possible to add your fixes/improvements on top of Shubham's patch. 
Specifically, I'm interested in the example generator use in PigPen, which 
seems to be absent from your patches. FYI, I'm currently working on improving 
and enhancing the example generator left by Shubham about two years ago.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908931#action_12908931
 ] 

Alan Gates commented on PIG-366:


Robert,

This looks great.  A couple of questions.  

# The most reasonable place to put this would be under contrib, since it's 
really a standalone tool for use with Pig.  Does that seem reasonable?
# As far as I know you're the only person working on this at the moment.  Do 
you see yourself continuing to work on it for a while?  If so, then I think 
this is a great contribution and we're happy to support your work on it.  If 
it's a one-off thing, I'm less inclined to check it in, as it will rot again 
(as it did from 0.2 on) and not be useful for users.

And a question to the rest of the Pig community.  Any volunteers out there to 
take this for a drive around the block and see how it does?  Feedback from 
power Eclipse users would be particularly valuable.

A few nitpicks about the patch itself:

# We use 4 spaces rather than tabs in Pig code, so the files will have to be 
reformatted to match that.
# One file (MessageRunner.java) is missing the Apache license header.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908937#action_12908937
 ] 

Alan Gates commented on PIG-1605:
-

How in depth are the changes to the graphing package and the walkers to handle 
this different type of edge in the graph?
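One way to picture what the walker change might involve: the plan keeps soft links in a separate edge set, the public plan API ignores them, and only the walker merges both sets when computing visit order. The following is a toy Python sketch under that assumption (Pig's actual graph and walker classes will differ; the operator names mirror the scalar example in the description):

```python
# Toy sketch (not Pig's plan package): soft links live beside regular links,
# are invisible to normal plan traversal, but do constrain the walker's order.
from collections import defaultdict

class Plan:
    def __init__(self):
        self.edges = defaultdict(list)       # regular (pipeline) links
        self.soft_edges = defaultdict(list)  # e.g. LOStore -> LOForEach
        self.nodes = set()

    def connect(self, src, dst):
        self.nodes |= {src, dst}
        self.edges[src].append(dst)

    def create_soft_link(self, src, dst):
        self.nodes |= {src, dst}
        self.soft_edges[src].append(dst)

    def successors(self, node):
        # Public view of the plan: soft links are invisible here.
        return list(self.edges[node])

def dependency_order(plan):
    """Walker view: topological order over regular *and* soft links."""
    indeg = {n: 0 for n in plan.nodes}
    combined = defaultdict(list)
    for edge_set in (plan.edges, plan.soft_edges):
        for src, dsts in edge_set.items():
            for dst in dsts:
                combined[src].append(dst)
                indeg[dst] += 1
    ready = sorted(n for n in plan.nodes if indeg[n] == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for dst in combined[node]:
            indeg[dst] -= 1
            if indeg[dst] == 0:
                ready.append(dst)
    return order

plan = Plan()
plan.connect("LOLoad(A)", "LOStore(A.total)")
plan.connect("LOLoad(C)", "LOForEach(D)")
# The scalar dependency: D's foreach reads the file that A's store produces.
plan.create_soft_link("LOStore(A.total)", "LOForEach(D)")

order = dependency_order(plan)
```

In this sketch the graphing change is small (one extra edge map plus a merged view in the walker), which is roughly the question being asked about the real implementation.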

> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] is trying to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and soft link is only visible to the 
> walkers. By doing this, we can make sure we visit LOStore which generate 
> scalar first, and then LOForEach which use the scalar. All other part of the 
> logical plan does not know the existence of the soft link. The benefits are:
> 1. Logical plan do not need to deal with LOScalar, this makes logical plan 
> cleaner
> 2. Conceptually scalar dependency is different. Regular link represent a data 
> flow in pipeline. In scalar, the dependency means an operator depends on a 
> file generated by the other operator. It's different type of data dependency.
> 3. Soft link can solve other dependency problem in the future. If we 
> introduce another UDF dependent on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft link, we can use scalar come from different sources in the same 
> statement, which in my mind is not a rare use case. (eg: D = foreach C 
> generate c0/A.total, c1/B.count; )
> Currently, there are two cases we can use soft link:
> 1. scalar dependency, where ReadScalar UDF will use a file generate by a 
> LOStore
> 2. store-load dependency, where we will load a file which is generated by a 
> store in the same script. This happens in multi-store case. Currently we 
> solve it by regular link. It is better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1589) add test cases for mapreduce operator which use distributed cache

2010-09-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1589:
---

Status: Patch Available  (was: Open)

> add test cases for mapreduce operator which use distributed cache
> -
>
> Key: PIG-1589
> URL: https://issues.apache.org/jira/browse/PIG-1589
> Project: Pig
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1589.1.patch, TestWordCount.jar
>
>
> '-files filename' can be specified in the parameters for mapreduce operator 
> to send files to distributed cache. Need to add test cases for that.
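What `-files` buys the job, conceptually, is that each listed file is shipped and exposed in the task's working directory under its base name, so job code can open it by that short name. A minimal Python mimic of that localization step (illustrative only; the real mechanism is Hadoop's distributed cache, not a copy like this):

```python
# Illustrative mimic of distributed-cache localization for a '-files a,b,c'
# parameter: each listed file becomes available in the task working
# directory under its base name.
import os
import shutil
import tempfile

def localize(files_param, task_dir):
    """Copy every file named in a comma-separated '-files' value into task_dir."""
    for path in files_param.split(","):
        shutil.copy(path, os.path.join(task_dir, os.path.basename(path)))

src_dir = tempfile.mkdtemp()
task_dir = tempfile.mkdtemp()
stopwords = os.path.join(src_dir, "stopwords.txt")
with open(stopwords, "w") as f:
    f.write("the\nand\n")

localize(stopwords, task_dir)
# Task code can now open the file by base name, as if it were local:
with open(os.path.join(task_dir, "stopwords.txt")) as f:
    words = f.read().split()
```

A test case for the mapreduce operator would assert exactly this visibility: the shipped file readable by base name from inside the job.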

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1589) add test cases for mapreduce operator which use distributed cache

2010-09-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1589:
---

Attachment: TestWordCount.jar
PIG-1589.1.patch

Attachment TestWordCount.jar should go to 
test/org/apache/pig/test/data/TestWordCount.jar
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 11 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


> add test cases for mapreduce operator which use distributed cache
> -
>
> Key: PIG-1589
> URL: https://issues.apache.org/jira/browse/PIG-1589
> Project: Pig
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1589.1.patch, TestWordCount.jar
>
>
> '-files filename' can be specified in the parameters for mapreduce operator 
> to send files to distributed cache. Need to add test cases for that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Robert Gibbon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908948#action_12908948
 ] 

Robert Gibbon commented on PIG-366:
---

Yan Zhou: I plan to reintroduce all the original features incrementally. Are 
you working on the example generator as part of the Pig server backend? I can 
hook the plugin in to a backend API or let you take over that part of the 
plugin. Let me know what is best for you.

Alan: 

contrib seems a sensible place. I would like to integrate it with the Apache 
build chain, via ivy. What are your thoughts on that?
I'm happy to run with this for as long as it is needed and useful.
It would be remiss of me not to mention that I built this on Eclipse Helios. It 
needs a bit of work to get it hooked up to Galileo, which I also plan to do.

The minors: noted - always happy to comply with coding standards. Also, it 
needs to be reformatted as a .patch, if I'm not mistaken? I'll re-release ASAP.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908962#action_12908962
 ] 

Yan Zhou commented on PIG-366:
--

Yes. But the original patch by Shubham had hooked the plugin to the example 
generator interface, unless you have found something funky in that patch. I 
have no intention of changing the interface.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1609) 'union onschema' should give a more useful error message when schema of one of the relations has null column name

2010-09-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1609:
---

Attachment: PIG-1609.1.patch

Pasting result of test patch for PIG-1609.1.patch
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


> 'union onschema' should give a more useful error message when schema of one 
> of the relations has null column name
> -
>
> Key: PIG-1609
> URL: https://issues.apache.org/jira/browse/PIG-1609
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1609.1.patch
>
>
> A better error message needs to be given in this case -
> {code}
> grunt> l = load '/tmp/empty.bag' as (i : int);
> grunt> f = foreach l generate i+1;
> grunt> describe f;
> f: {int}
> grunt> u = union onschema l , f;
> 2010-09-10 18:08:13,000 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Error merging
> schemas for union operator
> Details at logfile: /Users/tejas/pig_nmr_syn/trunk/pig_1284167020897.log
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908971#action_12908971
 ] 

Yan Zhou commented on PIG-366:
--

One more clarification: by design, the example generator does not submit any 
jobs to hadoop; it just runs at the client as a local application.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1609) 'union onschema' should give a more useful error message when schema of one of the relations has null column name

2010-09-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1609:
---

Status: Patch Available  (was: Open)

> 'union onschema' should give a more useful error message when schema of one 
> of the relations has null column name
> -
>
> Key: PIG-1609
> URL: https://issues.apache.org/jira/browse/PIG-1609
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1609.1.patch
>
>
> A better error message needs to be given in this case -
> {code}
> grunt> l = load '/tmp/empty.bag' as (i : int);
> grunt> f = foreach l generate i+1;
> grunt> describe f;
> f: {int}
> grunt> u = union onschema l , f;
> 2010-09-10 18:08:13,000 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Error merging
> schemas for union operator
> Details at logfile: /Users/tejas/pig_nmr_syn/trunk/pig_1284167020897.log
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908984#action_12908984
 ] 

Olga Natkovich commented on PIG-366:


I think it used to use "true local mode" in pig. However, we no longer support 
this, and the new version needs to be connected to the current local mode in 
pig, which is basically hadoop's local mode.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-13 Thread Robert Gibbon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908993#action_12908993
 ] 

Robert Gibbon commented on PIG-366:
---

OK, I will implement a classloader to avoid hardwiring the plugin to a 
specific release. I'll do the same for the parser feature. I made a new diff 
to fix the formatting, but it is probably less useful than a tarball. I'll 
upload it in the morning (I'm on dialup).

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
> Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909008#action_12909008
 ] 

Daniel Dai commented on PIG-1605:
-

Yes, Thejas is right. The first 3 are the main reasons for the change.

> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In the scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
> the problem by adding a LOScalar operator. Here is a different approach: we 
> add a soft link to the plan, and the soft link is visible only to the 
> walkers. By doing this, we can make sure we visit the LOStore which 
> generates the scalar first, and then the LOForEach which uses the scalar. 
> No other part of the logical plan knows that the soft link exists. The 
> benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes the 
> logical plan cleaner.
> 2. Conceptually, a scalar dependency is different: a regular link 
> represents data flow in the pipeline, while a scalar link means an operator 
> depends on a file generated by another operator. It is a different type of 
> data dependency.
> 3. Soft links can solve other dependency problems in the future. If we 
> introduce another UDF that depends on a file generated by another operator, 
> we can use the same mechanism.
> 4. With soft links, we can use scalars coming from different sources in the 
> same statement, which in my mind is not a rare use case (eg: D = foreach C 
> generate c0/A.total, c1/B.count;).
> Currently, there are two cases where we can use a soft link:
> 1. scalar dependency, where the ReadScalar UDF uses a file generated by an 
> LOStore
> 2. store-load dependency, where we load a file that is generated by a store 
> in the same script. This happens in the multi-store case. Currently we 
> solve it with a regular link; it would be better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909007#action_12909007
 ] 

Daniel Dai commented on PIG-1605:
-

Changes are reasonably small. Here is a summary:
1. Add the following methods to the plan (both old and new):
{code}
public void createSoftLink(E from, E to)
public List getSoftLinkPredecessors(E op)
public List getSoftLinkSuccessors(E op)
{code}

2. All walkers need to change: when a walker gets predecessors/successors, it 
needs to get both the regular-link and the soft-link ones. The changes are 
straightforward, e.g.
from:
{code}
Collection newSuccessors = mPlan.getSuccessors(suc);
{code}
to:
{code}
Collection newSuccessors = mPlan.getSuccessors(suc);
newSuccessors.addAll(mPlan.getSoftLinkSuccessors(suc));
{code}

3. Change plan utility functions, such as replace, replaceAndAddSucessors, 
replaceAndAddPredecessors, etc.
In the new logical plan there is no change, since we only have minimal utility 
functions. In the old logical plan there should be some changes to make those 
utility functions aware of soft links; but if we decide not to support the old 
logical plan going forward, no change is needed, we only need to note that 
those utility functions do not deal with soft links.

4. Change scalar to use soft links.
This includes creating the soft link and maintaining it during transformations 
(migrating to the new plan, translating to the physical plan).

5. Change store-load to use soft links.
This is an optional step. Currently we use a regular link; conceptually we 
should use a soft link. It is OK if we don't do this for now.

Also note that in most cases there is no soft link and the plan behaves just 
as before, so this change should be safe enough.
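The two-edge-set idea in the summary above can be sketched as follows. This is 
a toy model, not Pig's actual plan classes: {{SketchPlan}}, its String-named 
operators, and {{walkerSuccessors}} are illustrative names, and the real plan 
is generic over operator types rather than using Strings.

```java
import java.util.*;

// Toy model of the proposal: the plan keeps regular edges and soft edges in
// separate structures. Ordinary plan code sees only regular edges; a walker
// merges both views, exactly as in the "from:"/"to:" diff above.
class SketchPlan {
    private final Map<String, List<String>> successors = new HashMap<>();
    private final Map<String, List<String>> softSuccessors = new HashMap<>();

    void connect(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    void createSoftLink(String from, String to) {
        softSuccessors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    List<String> getSuccessors(String op) {
        return successors.getOrDefault(op, Collections.emptyList());
    }

    List<String> getSoftLinkSuccessors(String op) {
        return softSuccessors.getOrDefault(op, Collections.emptyList());
    }

    // What a walker would do: treat soft links like regular links.
    List<String> walkerSuccessors(String op) {
        List<String> all = new ArrayList<>(getSuccessors(op));
        all.addAll(getSoftLinkSuccessors(op));
        return all;
    }
}

public class SoftLinkSketch {
    public static void main(String[] args) {
        SketchPlan plan = new SketchPlan();
        plan.connect("LOLoad", "LOStore");           // regular pipeline edge
        plan.createSoftLink("LOStore", "LOForEach"); // scalar file dependency
        // The regular view does not see the soft edge, so the rest of the
        // plan machinery is unaffected; only the walker visits LOStore
        // before the LOForEach that consumes the scalar file.
        if (!plan.getSuccessors("LOStore").isEmpty())
            throw new AssertionError("regular view must not see soft links");
        if (!plan.walkerSuccessors("LOStore").equals(Arrays.asList("LOForEach")))
            throw new AssertionError("walker view must include soft links");
        System.out.println("walker sees: " + plan.walkerSuccessors("LOStore"));
    }
}
```

This also illustrates why the change is low-risk: when no soft link was ever 
created, the merged walker view is identical to the regular view.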

> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> In the scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
> the problem by adding a LOScalar operator. Here is a different approach: we 
> add a soft link to the plan, and the soft link is visible only to the 
> walkers. By doing this, we can make sure we visit the LOStore which 
> generates the scalar first, and then the LOForEach which uses the scalar. 
> No other part of the logical plan knows that the soft link exists. The 
> benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes the 
> logical plan cleaner.
> 2. Conceptually, a scalar dependency is different: a regular link 
> represents data flow in the pipeline, while a scalar link means an operator 
> depends on a file generated by another operator. It is a different type of 
> data dependency.
> 3. Soft links can solve other dependency problems in the future. If we 
> introduce another UDF that depends on a file generated by another operator, 
> we can use the same mechanism.
> 4. With soft links, we can use scalars coming from different sources in the 
> same statement, which in my mind is not a rare use case (eg: D = foreach C 
> generate c0/A.total, c1/B.count;).
> Currently, there are two cases where we can use a soft link:
> 1. scalar dependency, where the ReadScalar UDF uses a file generated by an 
> LOStore
> 2. store-load dependency, where we load a file that is generated by a store 
> in the same script. This happens in the multi-store case. Currently we 
> solve it with a regular link; it would be better to use a soft link.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-542) pig gets confused about schema, when joining a table that has a known schema with one that doesn't

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-542.


Resolution: Cannot Reproduce

I tested this against version 0.7 and it works fine.

> pig gets confused about schema, when joining a table that has a known schema 
> with one that doesn't
> --
>
> Key: PIG-542
> URL: https://issues.apache.org/jira/browse/PIG-542
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: types branch, running in local mode
>Reporter: Christopher Olston
> Fix For: 0.9.0
>
>
> query:
> A = load '/data/A' using myLoadFunc('...');
> A1 = foreach (group A by ($8)) generate group, COUNT($1);
> B = load '/data/B';
> J = join A1 by $0, B by $0;
> J1 = foreach J generate $0, $1, $3;<- crashes on attempt to parse 
> this line.
> problem:
> It knows the schema of A1 but not of B -- but it seems to think B has only
> one field.
> error message (on parsing J1=... line):
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Out of
> bound access. Trying to access non-existent column: 3. Schema {ID10::group:
> bytearray,long,bytearray} has 3 column(s).
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.DollarVar(QueryParser.ja
> va:5764)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.ja
> va:5713)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser
> .java:4018)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.ja
> va:3915)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.jav
> a:3869)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(Query
> Parser.java:3778)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser
> .java:3704)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.ja
> va:3670)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(Qu
> eryParser.java:3596)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemLis
> t(QueryParser.java:3519)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryP
> arser.java:3463)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.
> java:2939)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParse
> r.java:2342)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.jav
> a:979)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:75
> 5)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:5
> 50)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder
> .java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
> ... 16 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-542) pig gets confused about schema, when joining a table that has a known schema with one that doesn't

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-542:
---

Fix Version/s: 0.7.0
   (was: 0.9.0)

> pig gets confused about schema, when joining a table that has a known schema 
> with one that doesn't
> --
>
> Key: PIG-542
> URL: https://issues.apache.org/jira/browse/PIG-542
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: types branch, running in local mode
>Reporter: Christopher Olston
> Fix For: 0.7.0
>
>
> query:
> A = load '/data/A' using myLoadFunc('...');
> A1 = foreach (group A by ($8)) generate group, COUNT($1);
> B = load '/data/B';
> J = join A1 by $0, B by $0;
> J1 = foreach J generate $0, $1, $3;<- crashes on attempt to parse 
> this line.
> problem:
> It knows the schema of A1 but not of B -- but it seems to think B has only
> one field.
> error message (on parsing J1=... line):
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Out of
> bound access. Trying to access non-existent column: 3. Schema {ID10::group:
> bytearray,long,bytearray} has 3 column(s).
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.DollarVar(QueryParser.ja
> va:5764)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.ja
> va:5713)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser
> .java:4018)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.ja
> va:3915)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.jav
> a:3869)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(Query
> Parser.java:3778)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser
> .java:3704)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.ja
> va:3670)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(Qu
> eryParser.java:3596)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemLis
> t(QueryParser.java:3519)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryP
> arser.java:3463)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.
> java:2939)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParse
> r.java:2342)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.jav
> a:979)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:75
> 5)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:5
> 50)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder
> .java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
> ... 16 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-542) pig gets confused about schema, when joining a table that has a known schema with one that doesn't

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates closed PIG-542.
--


> pig gets confused about schema, when joining a table that has a known schema 
> with one that doesn't
> --
>
> Key: PIG-542
> URL: https://issues.apache.org/jira/browse/PIG-542
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: types branch, running in local mode
>Reporter: Christopher Olston
> Fix For: 0.7.0
>
>
> query:
> A = load '/data/A' using myLoadFunc('...');
> A1 = foreach (group A by ($8)) generate group, COUNT($1);
> B = load '/data/B';
> J = join A1 by $0, B by $0;
> J1 = foreach J generate $0, $1, $3;<- crashes on attempt to parse 
> this line.
> problem:
> It knows the schema of A1 but not of B -- but it seems to think B has only
> one field.
> error message (on parsing J1=... line):
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Out of
> bound access. Trying to access non-existent column: 3. Schema {ID10::group:
> bytearray,long,bytearray} has 3 column(s).
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.DollarVar(QueryParser.ja
> va:5764)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.ja
> va:5713)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser
> .java:4018)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.ja
> va:3915)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.jav
> a:3869)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(Query
> Parser.java:3778)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser
> .java:3704)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.ja
> va:3670)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(Qu
> eryParser.java:3596)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemLis
> t(QueryParser.java:3519)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryP
> arser.java:3463)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.
> java:2939)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParse
> r.java:2342)
> at
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.jav
> a:979)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:75
> 5)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:5
> 50)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder
> .java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
> ... 16 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-696) Fatal error produced when malformed scalar types within complex type is converted to given type

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-696:
--

Assignee: Alan Gates

> Fatal error produced when malformed scalar types within complex type is 
> converted to given type
> ---
>
> Key: PIG-696
> URL: https://issues.apache.org/jira/browse/PIG-696
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Instead of a fatal error, the failed conversions should result in null 
> values.
> Example -
> grunt > cat cbag3.dat
> {(asdf)}
> {(2344)}
> {(2344}
> {(323423423423434)}
> {(323423423423434L)}
> {(asdff)}
> grunt> A = load 'cbag3.dat' as (f1:bag{t:tuple(i:int)});  B = foreach A 
> generate flatten(f1);  C = foreach B generate $0 + 1; dump C;
> 2009-03-03 14:25:19,604 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2009-03-03 14:25:44,628 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Map reduce job failed
> 2009-03-03 14:25:44,642 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2043: Unexpected error during execution.
> Details at logfile: /d1/tejas/pig_1236118410343.log
> tail  /d1/tejas/pig_1236118410343.log
>   Caused by: java.lang.ClassCastException
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:110)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:260)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:198)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:217)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> The 'conversion' of scalar types within complex types happens in the 
> physical operators, not in the loaders. The expressions (such as Add in the 
> example) attempt to cast the input to the given type, and a 
> ClassCastException is thrown when the conversion fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-694) Schema merge should take into account bags with tuples and bags with schemas

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-694:
--

Assignee: Alan Gates  (was: Santhosh Srinivasan)

> Schema merge should take into account bags with tuples and bags with schemas
> 
>
> Key: PIG-694
> URL: https://issues.apache.org/jira/browse/PIG-694
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> The merge method in Schema does not treat bags with schemas and bags with 
> tuples as equivalent. This will bring closure to PIG-448 and PIG-577.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-698) Simple join fails on records not loaded with schema

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-698.


Fix Version/s: 0.7.0
   (was: 0.9.0)
   Resolution: Fixed

Tested this against version 0.7 and it works fine.

> Simple join fails on records not loaded with schema
> ---
>
> Key: PIG-698
> URL: https://issues.apache.org/jira/browse/PIG-698
> Project: Pig
>  Issue Type: Bug
>  Components: impl
> Environment: Yahoo! clusters.
>Reporter: Peter Arthur Ciccolo
> Fix For: 0.7.0
>
>
> Joins can fail with an out-of-bounds access to fields that are not referenced 
> in the script when records without schema (including all variable-length 
> records) are involved.
> Example by Ben Reed:
> i1:
> 1   c   D   E
> 1   a   B
> i2:
> 0
> 0   Q
> 1   x   z
> 1   a   b   c
> i1 = load 'i1';
> i2 = load 'i2';
> j = join i1 by $0, i2 by $0;
> dump j

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-698) Simple join fails on records not loaded with schema

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates closed PIG-698.
--


> Simple join fails on records not loaded with schema
> ---
>
> Key: PIG-698
> URL: https://issues.apache.org/jira/browse/PIG-698
> Project: Pig
>  Issue Type: Bug
>  Components: impl
> Environment: Yahoo! clusters.
>Reporter: Peter Arthur Ciccolo
> Fix For: 0.7.0
>
>
> Joins can fail with an out-of-bounds access to fields that are not referenced 
> in the script when records without schema (including all variable-length 
> records) are involved.
> Example by Ben Reed:
> i1:
> 1   c   D   E
> 1   a   B
> i2:
> 0
> 0   Q
> 1   x   z
> 1   a   b   c
> i1 = load 'i1';
> i2 = load 'i2';
> j = join i1 by $0, i2 by $0;
> dump j

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-723) Pig generates incorrect schema for generated bags after FOREACH.

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-723:
--

Assignee: Alan Gates

> Pig generates incorrect schema for generated bags after FOREACH.
> 
>
> Key: PIG-723
> URL: https://issues.apache.org/jira/browse/PIG-723
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.1.0
> Environment: Linux
> $pig --version
> Apache Pig version 0.1.0-dev (r750430)
> compiled Mar 07 2009, 09:20:13
>Reporter: Dhruv M
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, 
> rhs:chararray, r:float, p:float, c:float);
> grunt> rf_grouped = GROUP rf_src BY rhs;  
> 
> grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, 
> r) as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
> grunt> describe lhs_grouped;
> lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}
> I think it should be:
> lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: 
> float}
> Because of this, we are not able to perform a UNION of the two sets: union 
> on incompatible schemas causes a complete loss of schema information, making 
> further processing impossible.
> This is what we want to UNION with:
> grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, 
> a:int);
> grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as 
> lhs, -10F as p, -10F as c;
> grunt> describe aa;
> aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}
> If there is something wrong with what I am trying to do, please let me know.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-730) problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema.

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-730:
--

Assignee: Alan Gates

> problem combining schema from a union of several LOAD expressions, with a 
> nested bag inside the schema.
> ---
>
> Key: PIG-730
> URL: https://issues.apache.org/jira/browse/PIG-730
> Project: Pig
>  Issue Type: Bug
> Environment: pig local mode
>Reporter: Christopher Olston
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> grunt> a = load 'foo' using BinStorage as 
> (url:chararray,outlinks:{t:(target:chararray,text:chararray)});
> grunt> b = union (load 'foo' using BinStorage as 
> (url:chararray,outlinks:{t:(target:chararray,text:chararray)})), (load 'bar' 
> using BinStorage as 
> (url:chararray,outlinks:{t:(target:chararray,text:chararray)}));
> grunt> c = foreach a generate flatten(outlinks.target);
> grunt> d = foreach b generate flatten(outlinks.target);
> ---> Would expect both C and D to work, but only C works. D gives the error 
> shown below.
> ---> Turns out using outlinks.t.target (instead of outlinks.target) works for 
> D but not for C.
> ---> I don't care which one, but the same syntax should work for both!
> 2009-03-24 13:15:05,376 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Invalid alias: target in {t: (target: 
> chararray,text: chararray)}
> Details at logfile: /echo/olston/data/pig_1237925683748.log
> grunt> quit
> $ cat pig_1237925683748.log 
> ERROR 1000: Error during parsing. Invalid alias: target in {t: (target: 
> chararray,text: chararray)}
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing. Invalid alias: target in {t: (target: chararray,text: chararray)}
> at org.apache.pig.PigServer.parseQuery(PigServer.java:317)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:276)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Main.main(Main.java:321)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: target in {t: (target: chararray,text: chararray)}
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:6042)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5898)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BracketedSimpleProj(QueryParser.java:5423)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4100)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3967)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3920)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3829)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3755)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3721)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3617)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3557)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3514)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2985)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2395)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1028)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:804)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:595)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:310)
> ... 6 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-767) Schema reported from DESCRIBE and actual schema of inner bags are different.

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-767:
--

Assignee: Alan Gates

> Schema reported from DESCRIBE and actual schema of inner bags are different.
> 
>
> Key: PIG-767
> URL: https://issues.apache.org/jira/browse/PIG-767
> Project: Pig
>  Issue Type: Bug
>Reporter: George Mavromatis
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> The following script:
> urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, 
> pg:bytearray);
> -- describe and dump are in-sync
> DESCRIBE urlContents;
> DUMP urlContents;
> urlContentsG = GROUP urlContents BY url;
> DESCRIBE urlContentsG;
> urlContentsF = FOREACH urlContentsG GENERATE group,urlContents.pg;
> DESCRIBE urlContentsF;
> DUMP urlContentsF;
> Prints for the DESCRIBE commands:
> urlContents: {url: chararray,pg: chararray}
> urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
> urlContentsF: {group: chararray,pg: {pg: chararray}}
> The reported schemas for urlContentsG and urlContentsF are wrong. They are 
> also against the section "Schemas for Complex Data Types" in 
> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_Schemas.
> As expected, actual data observed from DUMP urlContentsG and DUMP 
> urlContentsF do contain the tuple inside the inner bags.
> The correct schema for urlContentsG is:  {group: chararray,urlContents: 
> {t1:(url: chararray,pg: chararray)}}
> This may sound like a technicality, but it isn't. For instance, a UDF that 
> assumes an inner bag of {chararray} will not work with {(chararray)}. 
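
A defensive rewrite (untested sketch, reusing the aliases from the script above) is to flatten the inner bag explicitly, so downstream operators see plain fields rather than depending on whether the bag elements are bare values or one-field tuples:

{code}
-- untested sketch: FLATTEN unwraps the inner bag's tuples, sidestepping the
-- {chararray} vs {(chararray)} ambiguity described above
urlContentsFlat = FOREACH urlContentsF GENERATE group, FLATTEN(pg);
DESCRIBE urlContentsFlat;
{code}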




[jira] Created: (PIG-1610) 'union onschema' does not handle some cases involving 'namespaced' variable names

2010-09-13 Thread Thejas M Nair (JIRA)
'union onschema' does not handle some cases involving 'namespaced' variable names
-

 Key: PIG-1610
 URL: https://issues.apache.org/jira/browse/PIG-1610
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


case 1:

grunt> describe f;  
f: {l1::a: bytearray,l1::b: bytearray}
grunt> describe l1;
l1: {a: bytearray,b: bytearray}
grunt> dump f;
(1,11)
(2,22)
(3,33)

grunt> dump l1;
(1,11)
(2,22)
(3,33)

grunt> u = union onschema f, l1;
grunt> describe u;
u: {l1::a: bytearray,l1::b: bytearray}

-- the dump u gives incorrect results
grunt> dump u; 
(,)
(,)
(,)
(1,11)
(2,22)
(3,33)



case 2:
grunt> u = union onschema l1, f;
grunt> describe u;
2010-09-13 15:11:13,877 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1108: Duplicate schema alias: l1::a
Details at logfile: /Users/tejas/pig_unions_err2/trunk/pig_1284410413970.log
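
A possible workaround for case 2 (untested sketch, using the aliases from the cases above) is to project and rename f's columns before the union, so both inputs carry one set of plain column names and the duplicate l1::a alias never arises:

{code}
-- untested workaround sketch: drop the l1:: namespace prefix via rename
f2 = foreach f generate l1::a as a, l1::b as b;
u = union onschema f2, l1;
describe u;
{code}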






[jira] Resolved: (PIG-768) Schema of a relation reported by DESCRIBE and allowed operations on the relation are not compatible

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-768.


Resolution: Not A Problem

This is the way Pig is supposed to work.  If the loader or the user does not 
tell it what type a column is, it assumes that it is bytearray.  If the 
script later acts as if it is a certain type (for example, by applying the map 
dereference operator), then Pig assumes it is really of that type and casts it.

You are right that the loader would do better to return it as a bytearray and 
then cast it later when Pig asks it to.  However, since casts of a type to the 
same type work, what the loader does works out.
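
The behavior described here can be made explicit in the reporter's script (untested sketch; the loader and types are the ones stated in the issue) by declaring the schema in the AS clause rather than relying on the implicit bytearray-to-map cast:

{code}
-- untested sketch: explicit types remove the DESCRIBE/usage contradiction
urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader()
         AS (s: map[], pg: bag{t1: (contents: bytearray)}, wm: bytearray);
urlList2 = FOREACH urlMap GENERATE s#'Url', pg;
{code}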

> Schema of a relation reported by DESCRIBE and allowed operations on the 
> relation are not compatible
> ---
>
> Key: PIG-768
> URL: https://issues.apache.org/jira/browse/PIG-768
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: George Mavromatis
> Fix For: 0.9.0
>
>
> The DESCRIBE command in the following script prints:
> {s: bytearray, pg: bytearray, wm: bytearray}
> However, the script later treats the s field of urlMap as a map instead of a 
> bytearray, as shown in s#'Url'.
> Pig does not complain about this contradiction, and at execution time the s 
> field is treated as a map, although it was reported as bytearray at parse time.
> Pig should either not report s as a bytearray or exit with a parsing error.
> Note that all above operations happen before the query executes at the 
> cluster.
> register WebDataProcessing.jar; 
> register opencrawl.jar; 
> urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader() AS (s, pg, wm);
> DESCRIBE urlMap;
> -- in fact the loader in the WebDataProcessing.jar populates s and pg as 
> s:map[], pg:bag{t1:(contents:bytearray)}
> -- and defines that in determineSchema() but pig describe ignores it!
> urlMap2 = LIMIT urlMap 20;
> urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;
> DESCRIBE urlList2;
> STORE urlList2 INTO 'output2' USING BinStorage();




[jira] Closed: (PIG-768) Schema of a relation reported by DESCRIBE and allowed operations on the relation are not compatible

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates closed PIG-768.
--


> Schema of a relation reported by DESCRIBE and allowed operations on the 
> relation are not compatible
> ---
>
> Key: PIG-768
> URL: https://issues.apache.org/jira/browse/PIG-768
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: George Mavromatis
> Fix For: 0.9.0
>
>
> The DESCRIBE command in the following script prints:
> {s: bytearray, pg: bytearray, wm: bytearray}
> However, the script later treats the s field of urlMap as a map instead of a 
> bytearray, as shown in s#'Url'.
> Pig does not complain about this contradiction, and at execution time the s 
> field is treated as a map, although it was reported as bytearray at parse time.
> Pig should either not report s as a bytearray or exit with a parsing error.
> Note that all above operations happen before the query executes at the 
> cluster.
> register WebDataProcessing.jar; 
> register opencrawl.jar; 
> urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader() AS (s, pg, wm);
> DESCRIBE urlMap;
> -- in fact the loader in the WebDataProcessing.jar populates s and pg as 
> s:map[], pg:bag{t1:(contents:bytearray)}
> -- and defines that in determineSchema() but pig describe ignores it!
> urlMap2 = LIMIT urlMap 20;
> urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;
> DESCRIBE urlList2;
> STORE urlList2 INTO 'output2' USING BinStorage();




[jira] Updated: (PIG-1610) 'union onschema' does not handle some cases involving 'namespaced' column names in schema

2010-09-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1610:
---

Summary: 'union onschema' does not handle some cases involving 'namespaced' 
column names in schema  (was: 'union onschema' does not handle some cases involving 
'namespaced' variable names)

> 'union onschema' does not handle some cases involving 'namespaced' column names 
> in schema
> -
>
> Key: PIG-1610
> URL: https://issues.apache.org/jira/browse/PIG-1610
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> case 1:
> grunt> describe f;  
> f: {l1::a: bytearray,l1::b: bytearray}
> grunt> describe l1;
> l1: {a: bytearray,b: bytearray}
> grunt> dump f;
> (1,11)
> (2,22)
> (3,33)
> grunt> dump l1;
> (1,11)
> (2,22)
> (3,33)
> grunt> u = union onschema f, l1;
> grunt> describe u;
> u: {l1::a: bytearray,l1::b: bytearray}
> -- the dump u gives incorrect results
> grunt> dump u; 
> (,)
> (,)
> (,)
> (1,11)
> (2,22)
> (3,33)
> case 2:
> grunt> u = union onschema l1, f;
> grunt> describe u;
> 2010-09-13 15:11:13,877 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1108: Duplicate schema alias: l1::a
> Details at logfile: /Users/tejas/pig_unions_err2/trunk/pig_1284410413970.log




[jira] Assigned: (PIG-206) Right granularity for a pig script

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-206:
--

Assignee: Richard Ding

> Right granularity for a pig script
> --
>
> Key: PIG-206
> URL: https://issues.apache.org/jira/browse/PIG-206
> Project: Pig
>  Issue Type: Wish
>Reporter: Mathieu Poumeyrol
>Assignee: Richard Ding
> Fix For: 0.9.0
>
>
> I'd like to understand what people have in mind when they picture pig 
> scripts...




[jira] Updated: (PIG-1542) log level not propogated to MR task loggers

2010-09-13 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1542:
---

Attachment: PIG-1542.patch

> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.




[jira] Assigned: (PIG-217) Syntax Errors

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-217:
--

Assignee: Xuefu Zhang

> Syntax Errors
> -
>
> Key: PIG-217
> URL: https://issues.apache.org/jira/browse/PIG-217
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Amir Youssefi
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> This is a sub-task for Syntax Errors and use cases for it. 
> Having PARALLEL in wrong places is confusing for many users. I just saw 
> somebody putting it after STORE. Adding it to FILTER is very common as well.




[jira] Assigned: (PIG-239) illustrate followed by dump gives a runtime exception

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-239:
--

Assignee: Yan Zhou  (was: Shubham Chopra)

> illustrate followed by dump gives a runtime exception
> -
>
> Key: PIG-239
> URL: https://issues.apache.org/jira/browse/PIG-239
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Pradeep Kamath
>Assignee: Yan Zhou
> Fix For: 0.9.0
>
>
> Here is a session which outlines the issue:
> grunt> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, 
> age,gpa);
> grunt> b = filter a by name lt 'b';
> grunt> c = foreach b generate TOKENIZE(name);
> grunt> illustrate c;
> -
> | a | name  | age   | gpa   |
> -
> |   | tom xylophone | 69| 0.04  |
> |   | alice ovid| 75| 3.89  |
> -
> --
> | b | name   | age   | gpa   |
> --
> |   | alice ovid | 75| 3.89  |
> --
> -
> | c | (token )  |
> -
> |   | {(alice), (ovid)} |
> -
> grunt> dump c;
> 2008-05-15 14:35:54,476 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> java.lang.RuntimeException: java.io.IOException: Serialization error: 
> org.apache.pig.impl.util.
> LineageTracer
> at 
> org.apache.pig.backend.hadoop.executionengine.POMapreduce.copy(POMapreduce.java:242)
> at 
> org.apache.pig.backend.hadoop.executionengine.MapreducePlanCompiler.compile(MapreducePlanCompiler.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:209)
> at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:410)
> at org.apache.pig.PigServer.openIterator(PigServer.java:332)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:265)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:162)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:73)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> at org.apache.pig.Main.main(Main.java:270)
> Caused by: java.io.IOException: Serialization error: 
> org.apache.pig.impl.util.LineageTracer
> at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
> at 
> org.apache.pig.impl.util.ObjectSerializer.serialize(ObjectSerializer.java:44)
> at 
> org.apache.pig.backend.hadoop.executionengine.POMapreduce.copy(POMapreduce.java:233)
> ... 10 more
> Caused by: java.io.NotSerializableException: 
> org.apache.pig.impl.util.LineageTracer
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1081)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302)
> at java.util.ArrayList.writeObject(ArrayList.java:569)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at 
> java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:917)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1339)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302)
> at java.util.ArrayList.writeObject(ArrayList.ja

[jira] Commented: (PIG-217) Syntax Errors

2010-09-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909025#action_12909025
 ] 

Alan Gates commented on PIG-217:


Since part of the current language specification is that parallel is allowed 
after any operator, we cannot simply remove it.  We could add warnings to tell 
users when parallel isn't doing anything for them, and deprecate its usage for 
those operators.

But even then I'm not sure how useful this is.  Some operators will sometimes 
make use of parallel and sometimes not, depending on implementation (e.g. join 
would use it for hash join, but not for merge join).  So users will always have 
to know when parallel is and isn't helping them.
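
For illustration (untested sketch with hypothetical aliases), PARALLEL only takes effect on operators that run a reduce phase:

{code}
-- untested sketch: GROUP runs in the reduce phase, so PARALLEL is honored;
-- FILTER is map-only, so a PARALLEL clause there would be accepted but ignored
B = GROUP A BY name PARALLEL 10;
C = FILTER A BY age > 20;
{code}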

> Syntax Errors
> -
>
> Key: PIG-217
> URL: https://issues.apache.org/jira/browse/PIG-217
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Amir Youssefi
> Fix For: 0.9.0
>
>
> This is a sub-task for Syntax Errors and use cases for it. 
> Having PARALLEL in wrong places is confusing for many users. I just saw 
> somebody putting it after STORE. Adding it to FILTER is very common as well.




[jira] Assigned: (PIG-394) Syntax for ?: requires parens in FOREACH

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-394:
--

Assignee: Xuefu Zhang

> Syntax for ?: requires parens in FOREACH
> 
>
> Key: PIG-394
> URL: https://issues.apache.org/jira/browse/PIG-394
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.1.0
>Reporter: Ted Dunning
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> This fails
> clean = FOREACH log {
> ev = eventType eq '/rate/video'?'none':eventType;
> GENERATE ev as event, 1 as cnt;
> }
> but this works
> clean = FOREACH log {
> ev = (eventType eq '/rate/video'?'none':eventType);
> GENERATE ev as event, 1 as cnt;
> }
> The requirement for parens is bogus.  Also, this fails with very misleading 
> messages:
> clean = FOREACH log {
> ev = (eventType eq '/rate/video')?'none':eventType;
> GENERATE ev as event, 1 as cnt;
> }
> I think that the parser needs to be completely revamped to avoid this sort of 
> strangeness.




[jira] Assigned: (PIG-579) Adding newlines to format foreach statement with constants causes parse errors

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-579:
--

Assignee: Xuefu Zhang

> Adding newlines to format foreach statement with constants causes parse errors
> --
>
> Key: PIG-579
> URL: https://issues.apache.org/jira/browse/PIG-579
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: David Ciemiewicz
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The following code example fails with parse errors on step D:
> {code}
> A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
> B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, 
> contributions: float);
> C = COGROUP A BY name, B BY name;
> D = FOREACH C GENERATE
> group,
> flatten((not IsEmpty(A) ? A : (bag{tuple(chararray, int, 
> float)}){(null, null, null)})),
> flatten((not IsEmpty(B) ? B : (bag{tuple(chararray, int, chararray, 
> float)}){(null,null,null, null)}));
> dump D;
> {code}
> I get the parse error:
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: 
> Encountered "not IsEmpty ( A ) ? A : ( bag { tuple ( chararray , int , float 
> ) } ;" at line 9, column 18.
> Was expecting one of:
> "(" ...
> "-" ...
> "tuple" ...
> "bag" ...
> "map" ...
> "int" ...
> "long" ...
> ...
> However, if I simply remove the new lines from statement D and make it:
> {code}
> D = FOREACH C GENERATE group, flatten((not IsEmpty(A) ? A : 
> (bag{tuple(chararray, int, float)}){(null, null, null)})), flatten((not 
> IsEmpty(B) ? B : (bag{tuple(chararray, int, chararray, 
> float)}){(null,null,null, null)}));
> {code}




[jira] Commented: (PIG-1542) log level not propogated to MR task loggers

2010-09-13 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909027#action_12909027
 ] 

niraj rai commented on PIG-1542:


I have also changed the logic of setting the right log level in case 
pig.logfile.level is passed from the command line.


> log level not propogated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.




[jira] Assigned: (PIG-502) Limit and Illustrate do not work together

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-502:
--

Assignee: Yan Zhou

> Limit and Illustrate do not work together
> -
>
> Key: PIG-502
> URL: https://issues.apache.org/jira/browse/PIG-502
> Project: Pig
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.2.0
> Environment: Hadoop 18
>Reporter: Viraj Bhat
>Assignee: Yan Zhou
> Fix For: 0.9.0
>
>
> Suppose a user wants to do an illustrate command after limiting his data to a 
> certain number of records, it does not seem to work..
> --
> {code}
> MYDATA = load 'testfilelarge.txt' as (f1, f2, f3, f4, f5);
> MYDATA  = limit MYDATA 10;
> describe MYDATA;
> illustrate MYDATA;
> {code}
> --
> Running this script produces the following output and error
> --
> MYDATA: {f1: bytearray,f2: bytearray,f3: bytearray,f4: bytearray,f5: 
> bytearray}
> 2008-10-18 02:14:26,900 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop fil
> e system at: hdfs://localhost:9000
> 2008-10-18 02:14:27,013 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce
>  job tracker at: localhost:9001
> java.lang.RuntimeException: Unrecognized logical operator.
> at 
> org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(EquivalenceClasses.java:60)
> at 
> org.apache.pig.pen.DerivedDataVisitor.evaluateOperator(DerivedDataVisitor.java:368)
> at 
> org.apache.pig.pen.DerivedDataVisitor.visit(DerivedDataVisitor.java:273)
> at org.apache.pig.impl.logicalLayer.LOLimit.visit(LOLimit.java:71)
> at org.apache.pig.impl.logicalLayer.LOLimit.visit(LOLimit.java:10)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:98)
> at 
> org.apache.pig.pen.LineageTrimmingVisitor.(LineageTrimmingVisitor.java:90)
> at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:106)
> at org.apache.pig.PigServer.getExamples(PigServer.java:630)
> at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:279)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:183)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> --
> If I remove the illustrate and replace it with "dump MYDATA;"  it works..
> --
> {code}
> MYDATA = load 'testfilelarge.txt' as (f1, f2, f3, f4, f5);
> MYDATA  = limit MYDATA 10;
> describe MYDATA;
> -- illustrate MYDATA;
> dump MYDATA;
> {code}
> --




[jira] Assigned: (PIG-596) Anonymous tuples in bags create ParseExceptions

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-596:
--

Assignee: Xuefu Zhang

> Anonymous tuples in bags create ParseExceptions
> ---
>
> Key: PIG-596
> URL: https://issues.apache.org/jira/browse/PIG-596
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: David Ciemiewicz
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> {code}
> One = load 'one.txt' using PigStorage() as ( one: int );
> LabelledTupleInBag = foreach One generate { ( 1, 2 ) } as mybag { tuplelabel: 
> tuple ( a, b ) };
> AnonymousTupleInBag = foreach One generate { ( 2, 3 ) } as mybag { tuple ( a, 
> b ) }; -- Anonymous tuple creates bug
> Tuples = union LabelledTupleInBag, AnonymousTupleInBag;
> dump Tuples;
> {code}
> java.io.IOException: Encountered "{ tuple" at line 6, column 66.
> Was expecting one of:
> "parallel" ...
> ";" ...
> "," ...
> ":" ...
> "(" ...
> "{"  ...
> "{" "}" ...
> "[" ...
> 
> at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: 
> Encountered "{ tuple" at line 6, column 66.
> Why can't there be an anonymous tuple at the top level of a bag?




[jira] Assigned: (PIG-618) Bad error message when period rather than comma appears as separator in UDF parameter list

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-618:
--

Assignee: Xuefu Zhang

> Bad error message when period rather than comma appears as separator in UDF 
> parameter list 
> ---
>
> Key: PIG-618
> URL: https://issues.apache.org/jira/browse/PIG-618
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Viraj Bhat
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> Pig script generates the following compile-time error as it contains a period 
> between 0.8 and 0.9 in the MYUDF parameter list. The "Invalid alias MYUDF" 
> message should be changed to something that is more meaningful for the user 
> to trace.
> {code}
> register 'MYUDF.jar';
> A = load 'mydata.txt' using PigStorage() as (
> col1:   int,
> col2:   chararray,
> col3:   long,
> col4:   int
> );
> B =  group A by (
> col1,
> col2
> );
> C = foreach B generate
> group,
> MYUDF(A.col3, 0.0, 0.8. 0.9) as stat: (min, max);
> describe C;
> {code}
> 
> java.io.IOException: Invalid alias: MYUDF in {group: (col1: int,col2: 
> chararray),A: {col1: int,col2: chararray,col
> 3: long,col4: int}}
> at org.apache.pig.PigServer.parseQuery(PigServer.java:301)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:266)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: MYUDF in {group: (col1: int,col2
> : chararray),A: {col1: int,col2: chararray,col3: long,col4: int}}
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:6005)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5863)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4049)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3946)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3900)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3809)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3735)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3701)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3627)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3550)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3494)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2969)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2384)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1019)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:795)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:590)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
> 




[jira] Assigned: (PIG-709) Handling of NULL in Pig builtin functions needs to be reviewed

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-709:
--

Assignee: Alan Gates

> Handling of NULL in Pig builtin functions needs to be reviewed
> --
>
> Key: PIG-709
> URL: https://issues.apache.org/jira/browse/PIG-709
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Santhosh Srinivasan
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Pig builtin functions do not handle NULL consistently. Some examples are the 
> combiner versus non-combiner for AVG. All the builtins need a review of cases 
> where NULL is handled.




[jira] Assigned: (PIG-731) Passing semicolon as a parameter in UDF causes parser error

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-731:
--

Assignee: Xuefu Zhang

> Passing semicolon as a parameter in UDF causes parser error 
> 
>
> Key: PIG-731
> URL: https://issues.apache.org/jira/browse/PIG-731
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: CONCATSEP.jar, semicolonerr.pig
>
>
> A Pig script which uses a UDF loads 3 chararray columns and then 
> concatenates columns 2 and 3 using a semicolon.
> {code}
> register CONCATSEP.jar;
> A = LOAD 'someinput/*' USING PigStorage(';') as 
> (col1:chararray,col2:chararray,col3:chararray);
> B = FOREACH A GENERATE col1, string.CONCATSEP(';',col2,col3) as newcol;
> STORE B INTO 'someoutput' USING PigStorage(';');
> {code}
> The following script causes an error during the parsing stage due to the 
> semicolon present in the UDF.
> =
> 2009-03-24 15:50:56,454 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Lexical error at line 3, column 49.  Encountered: 
>  after : "\';"
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1237935055635.log
> =
> There is no workaround, except to hardcode the semicolon in the UDF.
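The failure mode can be illustrated outside Pig: a naive statement splitter cuts the quoted `';'` argument apart, while a quote-aware scan keeps it intact. The sketch below is illustrative Python, not Pig's actual QueryParser logic, and `split_statements` is a hypothetical helper:

```python
# Illustrative sketch (not Pig's parser): splitting a script on ';' while
# ignoring semicolons that appear inside '...' string literals.
def split_statements(script):
    """Split a script on ';' outside quoted literals; return stripped statements."""
    statements, current, in_quote = [], [], False
    i = 0
    while i < len(script):
        ch = script[i]
        if ch == "\\" and i + 1 < len(script):   # keep escaped chars verbatim
            current.append(script[i:i + 2])
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    if "".join(current).strip():
        statements.append("".join(current).strip())
    return statements

stmt = "B = FOREACH A GENERATE string.CONCATSEP(';',col2,col3);"
print(len(stmt.split(";")), len(split_statements(stmt)))  # naive vs quote-aware
```

A naive `str.split(';')` produces three fragments here; the quote-aware scan yields the single intended statement.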

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-679) error message suppressed due to class cast exception

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-679:
--

Assignee: Alan Gates

> error message suppressed due to class cast exception
> 
>
> Key: PIG-679
> URL: https://issues.apache.org/jira/browse/PIG-679
> Project: Pig
>  Issue Type: Bug
>Reporter: Christopher Olston
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> weekinclude 14:30:44 ~/workspace/Pig $ cat pig_1234564011522.log 
> ERROR 2999: Unexpected internal error. 
> org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to 
> java.lang.Error
> java.lang.ClassCastException: 
> org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to 
> java.lang.Error
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1096)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:802)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:595)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:303)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:269)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:441)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
> at org.apache.pig.Main.main(Main.java:296)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-534) Illustrate can't handle Map's or NULLs

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-534:
--

Assignee: Yan Zhou

> Illustrate can't handle Map's or NULLs
> --
>
> Key: PIG-534
> URL: https://issues.apache.org/jira/browse/PIG-534
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Reporter: Ian Holsman
>Assignee: Yan Zhou
> Fix For: 0.9.0
>
> Attachments: Illustrate.patch
>
>
> when I 'illustrate' a record that contains a map or has a NULL, it crashes 
> with an NPE.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-798:
--

Assignee: Alan Gates

> Schema errors when using PigStorage and none when using BinStorage in 
> FOREACH??
> ---
>
> Key: PIG-798
> URL: https://issues.apache.org/jira/browse/PIG-798
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
> Fix For: 0.9.0
>
> Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using 
> PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, 
> url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> ===
> (Amy)
> (Fred)
> ===
> The above code works properly and returns the right results. If I use 
> PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> ===
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other 
> Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> ===
> So why should the semantics of BinStorage() be different from PigStorage(), 
> where it is ok not to specify a schema? Should they not be consistent across 
> both?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-838) Parser does not handle ctrl-m ('\u000d') as argument to PigStorage

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-838:
--

Assignee: Xuefu Zhang

> Parser does not handle ctrl-m ('\u000d') as argument to PigStorage
> --
>
> Key: PIG-838
> URL: https://issues.apache.org/jira/browse/PIG-838
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> A script which has 
> a = load 'input' using PigStorage('\u000d');
>  
> produces the following error:
> 2009-06-05 14:47:49,241 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Lexical error at line 1, column 47.  Encountered: 
> "\r" (13), after : "\'"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-630) provide indication that pig script only partially succeeded

2010-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-630.
--

 Assignee: Olga Natkovich
Fix Version/s: 0.8.0
   Resolution: Fixed

This jira has been fixed with MultiQuery optimization and Pig Stats.

> provide indication that pig script only partially succeeded
> ---
>
> Key: PIG-630
> URL: https://issues.apache.org/jira/browse/PIG-630
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.8.0
>
>
> Currently, if you have multiple queries (stores/dumps) within the same pig 
> script, the script returns the result of the last one, which does not provide 
> sufficient information to the users. We need to provide the user with the 
> following information:
> - a return code that indicates the script only partially succeeded
> - an indication of which parts have succeeded

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-900) ORDER BY syntax wrt parentheses is somewhat different than GROUP BY and FILTER BY

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-900:
--

Assignee: Xuefu Zhang

> ORDER BY syntax wrt parentheses is somewhat different than GROUP BY and 
> FILTER BY
> -
>
> Key: PIG-900
> URL: https://issues.apache.org/jira/browse/PIG-900
> Project: Pig
>  Issue Type: Bug
>Reporter: David Ciemiewicz
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> With GROUP BY, you must put parentheses around the aliases in the BY clause:
> {code}
> B = group A by ( a, b, c );
> {code}
> With FILTER BY, you can optionally put parentheses around the aliases in the 
> BY clause:
> {code}
> B = filter A by ( a is not null and b is not null and c is not null );
> {code}
> However, with ORDER BY, if you put parentheses around the BY clause, you get 
> a syntax error:
> {code}
>  A = order A by ( a, b, c );
> {code}
> Produces the error:
> {code}
> 2009-08-03 18:26:29,544 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Encountered " "," ", "" at line 3, column 
> 19.
> Was expecting:
> ")" ...
> {code}
> This is an annoyance really.
> Here's my full code example ...
> {code}
> A = load 'data.txt' using PigStorage as (a: chararray, b: chararray, c: 
> chararray );
> A = order A by ( a, b, c );
> dump A;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-902) Allow schema matching for UDF with variable length arguments

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-902:
--

Assignee: Daniel Dai

> Allow schema matching for UDF with variable length arguments
> 
>
> Key: PIG-902
> URL: https://issues.apache.org/jira/browse/PIG-902
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
>
> Pig picks the right version of a UDF using a similarity measurement. This 
> mechanism picks the UDF with the right input schema to use. However, some UDFs 
> take a variable number of inputs, and currently there is no way to declare 
> such an input schema in a UDF; the similarity measurement does not match 
> against a variable number of inputs. We can still write variable-input UDFs, 
> but we cannot rely on schema matching to pick the right UDF version and do the 
> automatic data type conversion.
> Eg:
> If we have:
> Integer udf1(Integer, ..);
> Integer udf1(String, ..);
> Currently we cannot do this:
> a: {chararray, chararray}
> b = foreach a generate udf1(a.$0, a.$1);  // Pig cannot pick the udf(String, 
> ..) automatically, currently, this statement fails
> Eg:
> If we have:
> Integer udf2(Integer, ..);
> Currently, this script fail
> a: {chararray, chararray}
> b = foreach a generate udf1(a.$0, a.$1);  // Currently, Pig cannot convert 
> a.$0 into Integer automatically
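A schema-similarity dispatch with a variable-length tail could look like the following Python sketch. The names `score` and `pick` are hypothetical, and this is an assumed simplification of the idea, not Pig's actual matching code:

```python
# Hypothetical sketch of schema-based UDF overload resolution where the
# declared signature may end in "..." (any number of further args).
def score(declared, actual):
    """Similarity of actual arg types against a declared signature; -1 if the
    arity cannot match at all."""
    varargs = bool(declared) and declared[-1] == "..."
    fixed = declared[:-1] if varargs else declared
    if len(actual) < len(fixed) or (len(actual) > len(fixed) and not varargs):
        return -1
    return sum(1 for d, a in zip(fixed, actual) if d == a)

def pick(overloads, actual):
    """Return the overload name with the best similarity score, if any match."""
    best = max(overloads, key=lambda sig: score(overloads[sig], actual))
    return best if score(overloads[best], actual) >= 0 else None

overloads = {
    "udf1_int": ["int", "..."],        # Integer udf1(Integer, ..)
    "udf1_str": ["chararray", "..."],  # Integer udf1(String, ..)
}
print(pick(overloads, ["chararray", "chararray"]))  # -> udf1_str
```

With such a rule, the `udf1(a.$0, a.$1)` call on `{chararray, chararray}` would resolve to the String-first overload instead of failing.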

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-900) ORDER BY syntax wrt parentheses is somewhat different than GROUP BY and FILTER BY

2010-09-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909031#action_12909031
 ] 

Alan Gates commented on PIG-900:


Filter is different, because you are providing one boolean condition.  But I 
agree that order by, group by, and join, in all of which you are specifying a 
set of keys, should take the same syntax.  We cannot force a change in one 
version, but we can adopt one as standard and warn that the other is 
deprecated.  I propose we remove the parentheses from group and join to match 
order, as that matches SQL and is less verbose.

> ORDER BY syntax wrt parentheses is somewhat different than GROUP BY and 
> FILTER BY
> -
>
> Key: PIG-900
> URL: https://issues.apache.org/jira/browse/PIG-900
> Project: Pig
>  Issue Type: Bug
>Reporter: David Ciemiewicz
> Fix For: 0.9.0
>
>
> With GROUP BY, you must put parentheses around the aliases in the BY clause:
> {code}
> B = group A by ( a, b, c );
> {code}
> With FILTER BY, you can optionally put parentheses around the aliases in the 
> BY clause:
> {code}
> B = filter A by ( a is not null and b is not null and c is not null );
> {code}
> However, with ORDER BY, if you put parentheses around the BY clause, you get 
> a syntax error:
> {code}
>  A = order A by ( a, b, c );
> {code}
> Produces the error:
> {code}
> 2009-08-03 18:26:29,544 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Encountered " "," ", "" at line 3, column 
> 19.
> Was expecting:
> ")" ...
> {code}
> This is an annoyance really.
> Here's my full code example ...
> {code}
> A = load 'data.txt' using PigStorage as (a: chararray, b: chararray, c: 
> chararray );
> A = order A by ( a, b, c );
> dump A;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-903) ILLUSTRATE fails on 'Distinct' operator

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-903:
--

Assignee: Yan Zhou

> ILLUSTRATE fails on 'Distinct' operator
> ---
>
> Key: PIG-903
> URL: https://issues.apache.org/jira/browse/PIG-903
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
>Assignee: Yan Zhou
> Fix For: 0.9.0
>
>
> Using the latest Pig from trunk (0.3+) in mapreduce mode, running through the 
> tutorial script script1-hadoop.pig works fine.
> However, executing the following illustrate command throws an exception:
> illustrate ngramed2
> Pig Stack Trace
> ---
> ERROR 2999: Unexpected internal error. Unrecognized logical operator.
> java.lang.RuntimeException: Unrecognized logical operator.
> at 
> org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(EquivalenceClasses.java:60)
> at 
> org.apache.pig.pen.DerivedDataVisitor.evaluateOperator(DerivedDataVisitor.java:368)
> at 
> org.apache.pig.pen.DerivedDataVisitor.visit(DerivedDataVisitor.java:226)
> at 
> org.apache.pig.impl.logicalLayer.LODistinct.visit(LODistinct.java:104)
> at 
> org.apache.pig.impl.logicalLayer.LODistinct.visit(LODistinct.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:98)
> at 
> org.apache.pig.pen.LineageTrimmingVisitor.(LineageTrimmingVisitor.java:90)
> at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:106)
> at org.apache.pig.PigServer.getExamples(PigServer.java:724)
> at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:541)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:195)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> at org.apache.pig.Main.main(Main.java:361)
> 
> This works:
> illustrate ngramed1;
> Although it does throw a few NPEs :
> java.lang.NullPointerException
>   at 
> org.apache.pig.pen.util.DisplayExamples.ShortenField(DisplayExamples.java:205)
>   at 
> org.apache.pig.pen.util.DisplayExamples.MakeArray(DisplayExamples.java:190)
>   at 
> org.apache.pig.pen.util.DisplayExamples.PrintTabular(DisplayExamples.java:86)
> [...]
> (illustrate also doesn't work on bzipped input, but that's a separate issue)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-998) revisit frontend logic and pig-latin semantics

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-998:
--

Assignee: Alan Gates

> revisit frontend logic and pig-latin semantics
> --
>
> Key: PIG-998
> URL: https://issues.apache.org/jira/browse/PIG-998
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> This jira has been created to keep track of issues with current frontend 
> logic and pig-latin semantics.
> One example is the handling of type information for map values. At the time 
> of query plan generation Pig does not know the type of map values and assumes 
> it is bytearray. This leads to problems when the loader returns map values of 
> other types.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-969) Default constructor of UDF gets called for UDF with parameterised constructor , if the udf has a getArgToFuncMapping function defined

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-969:
--

Assignee: Daniel Dai

> Default constructor of UDF gets called for UDF with parameterised constructor 
> , if the udf has a getArgToFuncMapping function defined
> -
>
> Key: PIG-969
> URL: https://issues.apache.org/jira/browse/PIG-969
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
>
> This issue is discussed in  
> http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg00524.html . I am 
> able to reproduce the issue. While it is easy to fix the udf, it can take a 
> lot of time to figure out the problem (until they find this email 
> conversation!).
> The root cause is that when getArgToFuncMapping is defined in the udf, the 
> FuncSpec returned by the method replaces the one set by the define statement. 
> The constructor arguments get lost. We can handle this in the following ways:
> 1. Preserve the constructor arguments, and use them with the class name of 
> the matching FuncSpec from getArgToFuncMapping.
> 2. Give an error if constructor parameters are given for a udf which has 
> FuncSpecs returned from getArgToFuncMapping.
> The problem with approach 1 is that we are letting the user define the 
> FuncSpec, so the user could have defined a FuncSpec with a constructor 
> (though they don't have a valid reason to do so). It is also possible that 
> the constructor of the different class that matched might not support the 
> same constructor parameters. The use of this function outside builtin udfs is 
> also probably not common.
> With option 2, we are telling the user that this is not a supported use case, 
> and the user can easily change the udf to fix the issue, or use the udf which 
> would have matched the given parameters (and which is unlikely to have the 
> getArgToFuncMapping method defined).
> I am proposing that we go with option 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-999) sorting on map-value fails if map-value is not of bytearray type

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-999:
--

Assignee: Alan Gates

> sorting on map-value fails if map-value is not of bytearray type
> 
>
> Key: PIG-999
> URL: https://issues.apache.org/jira/browse/PIG-999
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> When the query execution plan is created, Pig assumes the type to be 
> bytearray because there is no schema information associated with map fields.
> But at run time, the loader might return the actual type. This results in a 
> ClassCastException.
> This points to the larger issue of the way Pig handles types for map values, 
> and should be fixed in the context of revisiting the frontend logic and 
> pig-latin semantics.
> This is related to PIG-880 . The patch in PIG-880 changed PigStorage to 
> always return bytearray for map values to work around this, but other loaders 
> like BinStorage can return the actual type causing this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1065:
---

Assignee: Alan Gates

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions
> 1) Why do we allow union if the schemas do not match?
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or that they can get unexpected results?
> Viraj
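The backend failure is easier to see with a rough analogy: after the union, the first column mixes runtime types, so any operator that must order or partition on it fails. The Python sketch below is an assumed simplification, not Pig internals:

```python
# Rough analogy: after a UNION of (chararray, chararray) with a one-column
# relation carrying int keys, downstream ordering sees keys of mixed runtime
# types -- much like Python refuses to order incompatible types.
f1 = [("1", "2"), ("2", "3")]   # two-column relation, string keys
f2 = [(1,), (2,)]               # one-column relation, int keys
u0 = f1 + f2                    # "union" with no schema reconciliation

try:
    sorted(u0, key=lambda t: t[0])   # ORDER u0 BY $0
    outcome = "sorted"
except TypeError:
    # analogous to the NullableBytesWritable vs NullableText cast error above
    outcome = "type mismatch"
print(outcome)  # -> type mismatch
```

Local mode never hits this path because it never partitions by key, which is why the two modes diverge.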

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1152) bincond operator throws parser error

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1152:
---

Assignee: Xuefu Zhang

> bincond operator throws parser error
> 
>
> Key: PIG-1152
> URL: https://issues.apache.org/jira/browse/PIG-1152
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The bincond operator throws a parser error when the true branch contains a 
> constant bag with one tuple containing a single int field with a negative 
> value. 
> Here is the script to reproduce the issue
> A = load 'A' as (s: chararray, x: int, y: int);
> B = group A by s;
> C = foreach B generate group, flatten(((COUNT(A) < 1L) ? {(-1)} : A.x));
> dump C;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1066) ILLUSTRATE called after DESCRIBE results in "Grunt: ERROR 2999: Unexpected internal error. null"

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1066:
---

Assignee: Yan Zhou

> ILLUSTRATE called after DESCRIBE results in "Grunt: ERROR 2999: Unexpected 
> internal error. null"
> 
>
> Key: PIG-1066
> URL: https://issues.apache.org/jira/browse/PIG-1066
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Bogdan Dorohonceanu
>Assignee: Yan Zhou
> Fix For: 0.9.0
>
>
> -- load the QID_CT_QP20 data
> x = LOAD '$FS_TD/$QID_IN_FILES' USING PigStorage('\t') AS 
> (unstem_qid:chararray, jid_score_pairs:chararray);
> DESCRIBE x;
> --ILLUSTRATE x;
> -- load the ID_RQ data
> y0 = LOAD '$FS_USER/$ID_RQ_IN_FILE' USING PigStorage('\t') AS (sid:chararray, 
> query:chararray);
> -- force parallelization
> -- y1 = ORDER y0 BY sid PARALLEL $NUM;
> -- compute unstem_qid
> DEFINE f `text_streamer_query j3_unicode.dat prop.dat normal.txt TAB TAB 
> 1:yes:UNSTEM_ID:%llx` INPUT(stdin USING PigStorage('\t')) OUTPUT(stdout 
> USING PigStorage('\t')) SHIP('$USER/text_streamer_query', 
> '$USER/j3_unicode.dat', '$USER/prop.dat', '$USER/normal.txt');
> y = STREAM y0 THROUGH f AS (sid:chararray, query:chararray, 
> unstem_qid:chararray);
> DESCRIBE y;
> --ILLUSTRATE y;
> rmf /user/vega/zoom/y_debug
> STORE y INTO '/user/vega/zoom/y_debug' USING PigStorage('\t');
> 2009-10-30 13:36:48,437 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: 
> hdfs://dd-9c32d03:8887/,/teoma/dd-9c34d04/middleware/hadoop.test.data/dfs/name
> 09/10/30 13:36:48 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: 
> hdfs://dd-9c32d03:8887/,/teoma/dd-9c34d04/middleware/hadoop.test.data/dfs/name
> 2009-10-30 13:36:48,495 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: dd-9c32d04:8889
> 09/10/30 13:36:48 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: dd-9c32d04:8889
> 2009-10-30 13:36:49,242 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2999: Unexpected internal error. null
> 09/10/30 13:36:49 ERROR grunt.Grunt: ERROR 2999: Unexpected internal error. 
> null
> Details at logfile: /disk1/vega/zoom/pig_1256909801304.log

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1112) FLATTEN eliminates the alias

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1112:
---

Assignee: Alan Gates  (was: Daniel Dai)

> FLATTEN eliminates the alias
> 
>
> Key: PIG-1112
> URL: https://issues.apache.org/jira/browse/PIG-1112
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> If the schema for a field of type 'bag' is only partially defined, then 
> FLATTEN() incorrectly eliminates the field and throws an error. 
> Consider the following example:-
> A = LOAD 'sample' using PigStorage() as (first:chararray, second:chararray, 
> ladder:bag{});  
> B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second; 
>   
> C = GROUP B by (first,third);
> This throws the error
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
> Invalid alias: third in {first: chararray,second: chararray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1188) Padding nulls to the input tuple according to input schema

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1188:
---

Assignee: Alan Gates  (was: Richard Ding)

> Padding nulls to the input tuple according to input schema
> --
>
> Key: PIG-1188
> URL: https://issues.apache.org/jira/browse/PIG-1188
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Currently, the number of fields in the input tuple is determined by the data. 
> When we have a schema, we should generate the input tuple according to the 
> schema, padding nulls if necessary. Here is one example:
> Pig script:
> {code}
> a = load '1.txt' as (a0, a1);
> dump a;
> {code}
> Input file:
> {code}
> 1   2
> 1   2   3
> 1
> {code}
> Current result:
> {code}
> (1,2)
> (1,2,3)
> (1)
> {code}
> Desired result:
> {code}
> (1,2)
> (1,2)
> (1, null)
> {code}
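The desired padding behavior can be sketched in a few lines of Python (a hypothetical `conform` helper, not Pig's implementation):

```python
# Minimal sketch: conform each input row to the declared schema arity,
# padding missing fields with None (null) and dropping extras.
def conform(row, schema_len):
    """Pad with None or truncate so the tuple matches the schema arity."""
    return tuple(row[:schema_len]) + (None,) * max(0, schema_len - len(row))

rows = [("1", "2"), ("1", "2", "3"), ("1",)]
print([conform(r, 2) for r in rows])
# -> [('1', '2'), ('1', '2'), ('1', None)]
```

Applied to the `1.txt` example with schema `(a0, a1)`, this yields exactly the desired result shown above.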

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1277) Pig should give error message when cogroup on tuple keys of different inner type

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1277:
---

Assignee: Alan Gates

> Pig should give error message when cogroup on tuple keys of different inner 
> type
> 
>
> Key: PIG-1277
> URL: https://issues.apache.org/jira/browse/PIG-1277
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> When we cogroup on a tuple, if the inner types of the tuple do not match, we 
> treat them as different keys. This is confusing. It is desirable to give an 
> error/warning when this happens.
> Here is one example:
> UDF:
> {code}
> public class MapGenerate extends EvalFunc<Map> {
> @Override
> public Map exec(Tuple input) throws IOException {
> // TODO Auto-generated method stub
> Map m = new HashMap();
> m.put("key", new Integer(input.size()));
> return m;
> }
> 
> @Override
> public Schema outputSchema(Schema input) {
> return new Schema(new Schema.FieldSchema(null, DataType.MAP));
> }
> }
> {code}
> Pig script: 
> {code}
> a = load '1.txt' as (a0);
> b = foreach a generate a0, MapGenerate(*) as m:map[];
> c = foreach b generate a0, m#'key' as key;
> d = load '2.txt' as (c0, c1);
> e = cogroup c by (a0, key), d by (c0, c1);
> dump e;
> {code}
> 1.txt
> {code}
> 1
> {code}
> 2.txt
> {code}
> 1 1
> {code}
> User expected result (which is not right):
> {code}
> ((1,1),{(1,1)},{(1,1)})
> {code}
> Real result:
> {code}
> ((1,1),{(1,1)},{})
> ((1,1),{},{(1,1)})
> {code}
> We should give the user a message that we cannot merge the keys due to the 
> type mismatch.
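The "Real result" above can be reproduced with a plain-Java sketch (illustrative only, not Pig internals): grouping collects rows under the raw key objects, and an Integer key from one relation never compares equal to a String key from the other, so two separate groups appear.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not Pig internals: grouping hashes the raw key objects,
// so an Integer 1 from one relation and a String "1" from the other land in
// two separate groups, just like the two rows in the "Real result" above.
public class KeyMismatch {
    public static void main(String[] args) {
        Map<Object, List<String>> groups = new HashMap<Object, List<String>>();
        Object keyFromC = Integer.valueOf(1); // m#'key' produces an int
        Object keyFromD = "1";                // the other relation's key (illustrative type)

        for (Object key : new Object[] { keyFromC, keyFromD }) {
            if (!groups.containsKey(key)) {
                groups.put(key, new ArrayList<String>());
            }
        }
        groups.get(keyFromC).add("row from c");
        groups.get(keyFromD).add("row from d");
        System.out.println(groups.size()); // prints 2: the keys never merge
    }
}
```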

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1281) Detect org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple type of errors at Compile Type during creation of logical plan

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1281:
---

Assignee: Alan Gates

> Detect org.apache.pig.data.DataByteArray cannot be cast to 
> org.apache.pig.data.Tuple type of errors at Compile Type during creation of 
> logical plan
> ---
>
> Key: PIG-1281
> URL: https://issues.apache.org/jira/browse/PIG-1281
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> This is more of an enhancement request: we could detect simple errors at 
> compile time, during creation of the logical plan, rather than in the backend.
> I created a script which contains an error that gets detected in the backend 
> as a cast error, when in fact we can detect it in the front end (group is a 
> single element, so the group.$0 projection operation will not work).
> {code}
> inputdata = LOAD '/user/viraj/mymapdata' AS (col1, col2, col3, col4);
> projdata = FILTER inputdata BY (col1 is not null);
> groupprojdata = GROUP projdata BY col1;
> cleandata = FOREACH groupprojdata {
>  bagproj = projdata.col1;
>  dist_bags = DISTINCT bagproj;
>  GENERATE group.$0 as newcol1, COUNT(dist_bags) as newcol2;
> };
> cleandata1 = GROUP cleandata by newcol2;
> cleandata2 = FOREACH cleandata1 { GENERATE group.$0 as finalcol1, COUNT(cleandata.newcol1) as finalcol2; };
> ordereddata = ORDER cleandata2 by finalcol2;
> store ordereddata into 'finalresult' using PigStorage();
> {code}
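The kind of front-end check being requested can be sketched as follows. These are hypothetical types (`ProjectionCheck`, `FieldType`), not Pig's actual logical-plan classes; the idea is simply to reject a positional projection such as group.$0 when the projected field's schema type is not a tuple, instead of failing later in the backend with a cast error.

```java
// Hypothetical sketch of a compile-time projection check -- not Pig's actual
// logical-plan classes. Reject group.$0 when 'group' is not a tuple-typed
// field, rather than letting the backend throw a ClassCastException.
public class ProjectionCheck {
    enum FieldType { BYTEARRAY, TUPLE, BAG }

    static void checkProjection(String alias, FieldType type) {
        if (type != FieldType.TUPLE) {
            throw new IllegalStateException("Cannot project $0 from '" + alias
                    + "': field is " + type + ", not a tuple");
        }
    }

    public static void main(String[] args) {
        // After GROUP ... BY col1, 'group' is a single (non-tuple) field,
        // so group.$0 should be rejected at plan-construction time.
        try {
            checkProjection("group", FieldType.BYTEARRAY);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```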

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1387) Syntactical Sugar for PIG-1385

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1387:
---

Assignee: Xuefu Zhang

> Syntactical Sugar for PIG-1385
> --
>
> Key: PIG-1387
> URL: https://issues.apache.org/jira/browse/PIG-1387
> Project: Pig
>  Issue Type: Wish
>  Components: grunt
>Affects Versions: 0.6.0
>Reporter: hc busy
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> From this conversation: extend PIG-1385 so that, instead of calling a UDF, 
> built-in behavior is used when the (), {}, [] groupings are encountered.
> > > What about making them part of the language using symbols?
> > >
> > > instead of
> > >
> > > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
> > >
> > > have language support
> > >
> > > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
> > >
> > > or even:
> > >
> > > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
> > >
> > >
> > > Is there reason not to do the second or third other than being more
> > > complicated?
> > >
> > > Certainly I'd volunteer to put the top implementation in to the util
> > > package and submit them for builtin's, but the latter syntactic candies
> > > seems more natural..
> > >
> > >
> > >
> > > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates  wrote:
> > >
> > >> The grouping package in piggybank is left over from back when Pig
> > allowed
> > >> users to define grouping functions (0.1).  Functions like these should
> > go in
> > >> evaluation.util.
> > >>
> > >> However, I'd consider putting these in builtin (in main Pig) instead.
> > >>  These are things everyone asks for and they seem like a reasonable
> > addition
> > >> to the core engine.  This will be more of a burden to write (as we'll
> > hold
> > >> them to a higher standard) but of more use to people as well.
> > >>
> > >> Alan.
> > >>
> > >>
> > >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
> > >>
> > >>  Some times I wonder... I mean, somebody went to the trouble of making a
> > >>> path
> > >>> called
> > >>>
> > >>> org.apache.pig.piggybank.grouping
> > >>>
> > >>> (where it seems like this code belong), but didn't check in any java
> > code
> > >>> into that package.
> > >>>
> > >>>
> > >>> Any comment about where to put this kind of utility classes?
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S  wrote:
> > >>>
> > >>>  2010/4/19 hc busy 
> > 
> >   That's just the way it is right now, you can't make bags or tuples
> > > directly... Maybe we should have some UDF's in piggybank for these:
> > >
> > > toBag()
> > > toTuple(); --which is kinda like exec(Tuple in){return in;}
> > > TupleToBag(); --some times you need it this way for some reason.
> > >
> > >
> > >  Ok. I place my current code here, may be later I make a patch (if
> > such
> >  implementation is acceptable of course).
> > 
> >  import org.apache.pig.EvalFunc;
> >  import org.apache.pig.data.BagFactory;
> >  import org.apache.pig.data.DataBag;
> >  import org.apache.pig.data.Tuple;
> >  import org.apache.pig.data.TupleFactory;
> > 
> >  import java.io.IOException;
> > 
> >  /**
> >  * Converts a sequence of fields to a bag of tuples with the specified
> >  * count of fields per tuple.
> >  * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
> >  * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
> >  *
> >  * @author astepachev
> >  */
> >  public class ToBag extends EvalFunc<DataBag> {
> >   public BagFactory bagFactory;
> >   public TupleFactory tupleFactory;
> > 
> >   public ToBag() {
> >   bagFactory = BagFactory.getInstance();
> >   tupleFactory = TupleFactory.getInstance();
> >   }
> > 
> >   @Override
> >   public DataBag exec(Tuple input) throws IOException {
> >   if (input == null)
> >   return null;
> >   final DataBag bag = bagFactory.newDefaultBag();
> >   final Integer counter = (Integer) input.get(0);
> >   if (counter == null)
> >   return null;
> >   Tuple tuple = tupleFactory.newTuple();
> >   for (int i = 0; i < input.size() - 1; i++) {
> >   if (i % counter == 0) {
> >   tuple = tupleFactory.newTuple();
> >   bag.add(tuple);
> >   }
> >   tuple.append(input.get(i + 1));
> >   }
> >   return bag;
> >   }
> >  }
> > 
> >  import org.apache.pig.ExecType;
> >  import org.apache.pig.PigServer;
> >  import org.junit.Before;
> >  import org.junit.Test;
> > 
> >  import java.io.IOException;
> >  import java.net.URISyntaxException;
> >  import java.net.URL;
> > 
> >  import static org.junit.Assert.asser
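The chunking semantics of the ToBag UDF quoted above can be demonstrated without any Pig dependencies. In this standalone sketch, plain lists stand in for Pig tuples and bags: the first field is the chunk size and the remaining fields are grouped into tuples of that size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch of the ToBag chunking semantics, with plain lists standing
// in for Pig tuples and bags: group the fields into sublists of size 'count'.
public class ToBagDemo {
    static List<List<String>> chunk(int count, List<String> fields) {
        List<List<String>> bag = new ArrayList<List<String>>();
        List<String> tuple = null;
        for (int i = 0; i < fields.size(); i++) {
            if (i % count == 0) {
                // start a new "tuple" every 'count' fields
                tuple = new ArrayList<String>();
                bag.add(tuple);
            }
            tuple.add(fields.get(i));
        }
        return bag;
    }

    public static void main(String[] args) {
        // count=2, fields f1..f4 -> { (f1,f2), (f3,f4) }, as in the javadoc above
        System.out.println(chunk(2, Arrays.asList("f1", "f2", "f3", "f4")));
    }
}
```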

[jira] Assigned: (PIG-621) Casts swallow exceptions when there are issues with conversion of bytes to Pig types

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-621:
--

Assignee: Alan Gates

> Casts swallow exceptions when there are issues with conversion of bytes to 
> Pig types
> 
>
> Key: PIG-621
> URL: https://issues.apache.org/jira/browse/PIG-621
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> In the current implementation of casts, exceptions thrown while converting 
> bytes to Pig types are swallowed. Pig needs to either return NULL or rethrow 
> the exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1462) No informative error message on parse problem

2010-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1462:
---

Assignee: Xuefu Zhang

> No informative error message on parse problem
> -
>
> Key: PIG-1462
> URL: https://issues.apache.org/jira/browse/PIG-1462
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ankur
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> Consider the following script
> in = load 'data' using PigStorage() as (m:map[]);
> tags = foreach in generate m#'k1' as (tagtuple: tuple(chararray));
> dump tags;
> This throws the following error message, which does not really say that this 
> is a bad declaration:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing. Encountered "" at line 2, column 38.
> Was expecting one of:
> 
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>   at org.apache.pig.Main.main(Main.java:391)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1589) add test cases for mapreduce operator which use distributed cache

2010-09-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909042#action_12909042
 ] 

Thejas M Nair commented on PIG-1589:


Unit tests have passed; the patch is ready for review.
Summary of changes:
WordCount.java - The Hadoop WordCount example, modified to use a stopwords file. 
The stopwords file is sent using the distributed cache.
TestNativeMapReduce.java - Modified the test case to use the distributed cache, and 
changed the class name for the UDF in other tests.


> add test cases for mapreduce operator which use distributed cache
> -
>
> Key: PIG-1589
> URL: https://issues.apache.org/jira/browse/PIG-1589
> Project: Pig
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1589.1.patch, TestWordCount.jar
>
>
> '-files filename' can be specified in the parameters for mapreduce operator 
> to send files to distributed cache. Need to add test cases for that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1589) add test cases for mapreduce operator which use distributed cache

2010-09-13 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909061#action_12909061
 ] 

Richard Ding commented on PIG-1589:
---

+1

> add test cases for mapreduce operator which use distributed cache
> -
>
> Key: PIG-1589
> URL: https://issues.apache.org/jira/browse/PIG-1589
> Project: Pig
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1589.1.patch, TestWordCount.jar
>
>
> '-files filename' can be specified in the parameters for mapreduce operator 
> to send files to distributed cache. Need to add test cases for that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1542) log level not propagated to MR task loggers

2010-09-13 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1542:
---

Attachment: PIG-1542_1.patch

Unset the Hadoop debug messages so that only the Pig debug messages are printed.

> log level not propagated to MR task loggers
> ---
>
> Key: PIG-1542
> URL: https://issues.apache.org/jira/browse/PIG-1542
> Project: Pig
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1542.patch, PIG-1542_1.patch
>
>
> Specifying "-d DEBUG" does not affect the logging of the MR tasks.
> This was fixed earlier in PIG-882.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-239) illustrate followed by dump gives a runtime exception

2010-09-13 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou resolved PIG-239.
--

Fix Version/s: 0.8.0
   (was: 0.9.0)
   Resolution: Cannot Reproduce

Cannot reproduce using 0.8.

> illustrate followed by dump gives a runtime exception
> -
>
> Key: PIG-239
> URL: https://issues.apache.org/jira/browse/PIG-239
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Pradeep Kamath
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
>
> Here is a session which outlines the issue:
> grunt> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, 
> age,gpa);
> grunt> b = filter a by name lt 'b';
> grunt> c = foreach b generate TOKENIZE(name);
> grunt> illustrate c;
> ----------------------------------
> | a | name          | age | gpa  |
> ----------------------------------
> |   | tom xylophone | 69  | 0.04 |
> |   | alice ovid    | 75  | 3.89 |
> ----------------------------------
> -------------------------------
> | b | name       | age | gpa  |
> -------------------------------
> |   | alice ovid | 75  | 3.89 |
> -------------------------------
> -------------------------
> | c | (token )          |
> -------------------------
> |   | {(alice), (ovid)} |
> -------------------------
> grunt> dump c;
> 2008-05-15 14:35:54,476 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> java.lang.RuntimeException: java.io.IOException: Serialization error: 
> org.apache.pig.impl.util.
> LineageTracer
> at 
> org.apache.pig.backend.hadoop.executionengine.POMapreduce.copy(POMapreduce.java:242)
> at 
> org.apache.pig.backend.hadoop.executionengine.MapreducePlanCompiler.compile(MapreducePlanCompiler.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:209)
> at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:410)
> at org.apache.pig.PigServer.openIterator(PigServer.java:332)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:265)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:162)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:73)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> at org.apache.pig.Main.main(Main.java:270)
> Caused by: java.io.IOException: Serialization error: 
> org.apache.pig.impl.util.LineageTracer
> at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
> at 
> org.apache.pig.impl.util.ObjectSerializer.serialize(ObjectSerializer.java:44)
> at 
> org.apache.pig.backend.hadoop.executionengine.POMapreduce.copy(POMapreduce.java:233)
> ... 10 more
> Caused by: java.io.NotSerializableException: 
> org.apache.pig.impl.util.LineageTracer
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1081)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302)
> at java.util.ArrayList.writeObject(ArrayList.java:569)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at 
> java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:917)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1339)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputS