[jira] Resolved: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1605.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Release audit warning is due to jdiff. No new file added. Patch committed to 
both trunk and 0.8 branch.

> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1605-1.patch, PIG-1605-2.patch
>
>
> In the scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and the soft link is only visible to the 
> walkers. By doing this, we can make sure we visit the LOStore which generates 
> the scalar first, and then the LOForEach which uses the scalar. All other 
> parts of the logical plan do not know about the soft link. The benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes the 
> logical plan cleaner.
> 2. Conceptually, a scalar dependency is different. A regular link represents 
> a data flow in the pipeline. In a scalar, the dependency means an operator 
> depends on a file generated by another operator. It is a different type of 
> data dependency.
> 3. Soft links can solve other dependency problems in the future. If we 
> introduce another UDF that depends on a file generated by another operator, 
> we can use this mechanism to solve it. 
> 4. With soft links, we can use scalars that come from different sources in 
> the same statement, which in my mind is not a rare use case (e.g. D = foreach 
> C generate c0/A.total, c1/B.count;).
> Currently, there are two cases where we can use a soft link:
> 1. Scalar dependency, where the ReadScalar UDF uses a file generated by a 
> LOStore.
> 2. Store-load dependency, where we load a file which is generated by a 
> store in the same script. This happens in the multi-store case. Currently we 
> solve it with a regular link. It is better to use a soft link.
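
The walker-only visibility described in the issue can be illustrated with a minimal Python sketch. This is not Pig's implementation: the `Plan` class, its method names, and the operator names are all hypothetical. The only idea taken from the description is that soft links influence the visiting order while staying invisible to ordinary successor queries.

```python
from collections import defaultdict, deque


class Plan:
    """Toy operator plan with regular (data-flow) and soft (file-dependency) links."""

    def __init__(self):
        self.nodes = []
        self.regular = defaultdict(list)  # predecessor -> successors (data flow)
        self.soft = defaultdict(list)     # predecessor -> successors (ordering only)

    def add(self, name):
        self.nodes.append(name)

    def connect(self, src, dst):
        self.regular[src].append(dst)

    def create_soft_link(self, src, dst):
        # Hypothetical name; records an ordering constraint, not a data flow.
        self.soft[src].append(dst)

    def successors(self, node):
        # What the rest of the plan sees: regular links only.
        return list(self.regular[node])

    def walk_order(self):
        # What a walker sees: regular and soft links together (Kahn's algorithm).
        indegree = {n: 0 for n in self.nodes}
        for links in (self.regular, self.soft):
            for src in list(links):
                for dst in links[src]:
                    indegree[dst] += 1
        ready = deque(n for n in self.nodes if indegree[n] == 0)
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for dst in self.regular[node] + self.soft[node]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        return order


plan = Plan()
for op in ["load_c", "foreach_d", "load_a", "store_a"]:
    plan.add(op)
plan.connect("load_c", "foreach_d")
plan.connect("load_a", "store_a")
# foreach_d reads the file that store_a writes, so it must be visited later.
plan.create_soft_link("store_a", "foreach_d")

order = plan.walk_order()
```

In this sketch `walk_order()` places `store_a` before `foreach_d`, while `successors("store_a")` stays empty because no data flows between the two operators.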

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Attachment: PIG-1605-2.patch

PIG-1605-2.patch fixes the findbugs warnings.

test-patch result:
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 455 release 
audit warnings (more than the trunk's current 453 warnings).





[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1635:
--

Status: Patch Available  (was: Open)

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplification the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1635.patch
>
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding the ordering change, an example is 
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even without any possible simplification, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> Although the ordering change in this case, and probably in most other 
> cases, makes no difference, some users might care about the ordering for 
> two reasons: stateful UDFs may be used as operands of AND or OR, and the 
> ordering may be intended by the application designer to maximize the 
> chances of short-circuiting the composite boolean evaluation. 
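
The requested behaviour can be sketched in a few lines of Python. This is only an illustration, not the Pig simplifier: the tuple-based expression format and the `simplify` function are hypothetical, and constant comparisons such as `1 == 1` are assumed to be already folded into `("const", True)`.

```python
def simplify(expr):
    """Fold boolean constants under AND/OR, keeping other operands in their original order."""
    op = expr[0]
    if op not in ("and", "or"):
        return expr
    left, right = simplify(expr[1]), simplify(expr[2])
    # For AND, True is the identity and False absorbs; for OR it is the reverse.
    identity = ("const", op == "and")
    absorber = ("const", op == "or")
    if left == absorber or right == absorber:
        return absorber
    if left == identity:
        return right
    if right == identity:
        return left
    # No constant involved: rebuild the node without swapping the operands.
    return (op, left, right)


f1_gt_1 = ("pred", "f1 > 1")
and_case = simplify(("and", f1_gt_1, ("const", True)))   # (f1 > 1) AND (1 == 1)
or_case = simplify(("or", f1_gt_1, ("const", False)))    # (f1 > 1) OR (1 == 0)
ordered = simplify(("and", ("pred", "f1 is not null"), ("pred", "f2 is not null")))
```

Both `and_case` and `or_case` reduce to the bare `f1 > 1` predicate, and `ordered` keeps `f1` before `f2`. Note that returning the absorbing constant discards the other operand, which is only safe when that operand has no side effects; this is the same stateful-UDF concern the issue raises.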




[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1635:
--

Attachment: PIG-1635.patch





[jira] Commented: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913339#action_12913339
 ] 

Thejas M Nair commented on PIG-1636:


+1 

> Scalar fail if the scalar variable is generated by limit
> 
>
> Key: PIG-1636
> URL: https://issues.apache.org/jira/browse/PIG-1636
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1636-1.patch
>
>
> The following script fails:
> {code}
> a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
> b = group a all;
> c = foreach b generate SUM(a.age) as total;
> c1= limit c 1;
> d = foreach a generate name, age/(double)c1.total as d_sum;
> store d into '111';
> {code}
> The problem is that d holds a reference to c1. In the optimizer, we push the 
> limit before the foreach, but d still references the limit, so we get the 
> wrong schema for the scalar.




[jira] Commented: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913335#action_12913335
 ] 

Thejas M Nair commented on PIG-1605:


Looks good. +1
Possible optimizations (can be done in the future):
1. If the column-pruning rule removes the relation-as-scalar column, then the 
soft link can be removed.
2. The split-filter rule will be disabled if the filter expression contains a 
relation-as-scalar. If we detect filter expressions that contain a 
relation-as-scalar and update the soft links accordingly, we don't need to 
disable this rule.


> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1605-1.patch




[jira] Updated: (PIG-1598) Pig gobbles up error messages - Part 2

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1598:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch looks good. Committed to both trunk and 0.8 branch.

> Pig gobbles up error messages - Part 2
> --
>
> Key: PIG-1598
> URL: https://issues.apache.org/jira/browse/PIG-1598
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ashutosh Chauhan
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: PIG-1598_0.patch
>
>
> Another case of PIG-1531 .




[jira] Resolved: (PIG-772) Semantics of Filter statement inside ForEach should support filtering on aliases used in the Group statement preceding it

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-772.


Resolution: Invalid

The error message here is bad, but this is an error.  You are trying to 
secretly do a join in the filter line by referencing two relations (N and A).  
Pig does not allow a filter operator to have multiple inputs.


> Semantics of Filter statement inside ForEach should support filtering on 
> aliases used in the Group statement preceding it
> -
>
> Key: PIG-772
> URL: https://issues.apache.org/jira/browse/PIG-772
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: half.txt
>
>
> I have  a Pig script which tries to display all bags which are greater than 
> the average value in the group.
> Input: half.txt
> ===
> A   1
> A   2
> A   3
> B   1
> B   3
> 
> {code}
> A = LOAD 'half.txt' AS (key:CHARARRAY, val:INT);
> B = GROUP A BY key;
> C = FOREACH B {
>N = AVG(A.val);
>HALF = FILTER A by val >= N;
> GENERATE
>FLATTEN(GROUP),
>HALF;
> };
> dump C;
> {code}
> 
> Expected Output:
> 
> (A,{(A,2),(A,3)})
> (B,{(B,3)})
> 
> Presently the semantics of the Filter statement inside the FOREACH does not 
> support these types of operations.
> Error when running the above script.
> =
> ERROR 1000: Error during parsing. Invalid alias: A in {key: chararray,val: int}
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Invalid alias: A in {key: chararray,val: int}
>     at org.apache.pig.PigServer.parseQuery(PigServer.java:320)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:279)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
>     at org.apache.pig.Main.main(Main.java:364)
> =
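
For reference, the result the reporter expected can be reproduced outside Pig with a short Python sketch. It only shows the intended computation (keep, per group, the tuples whose value is at least the group average); the function name is hypothetical and nothing here implies that Pig accepts the script above.

```python
from collections import defaultdict

# Contents of half.txt as (key, val) tuples.
rows = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 3)]


def at_least_group_average(rows):
    # GROUP A BY key
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append((key, val))
    result = []
    for key in sorted(groups):
        bag = groups[key]
        avg = sum(val for _, val in bag) / len(bag)  # N = AVG(A.val)
        half = [t for t in bag if t[1] >= avg]       # HALF = FILTER A by val >= N
        result.append((key, half))
    return result


result = at_least_group_average(rows)
```

With the sample input, `result` holds `("A", [("A", 2), ("A", 3)])` and `("B", [("B", 3)])`, matching the expected output listed in the report.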




[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1639:


Assignee: Xuefu Zhang  (was: Daniel Dai)

> New logical plan: PushUpFilter should not optimize if filter condition 
> contains UDF
> ---
>
> Key: PIG-1639
> URL: https://issues.apache.org/jira/browse/PIG-1639
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Xuefu Zhang
> Fix For: 0.8.0
>
>
> The following script fails:
> {code}
> a = load 'file' AS (f1, f2, f3);
> b = group a by f1;
> c = filter b by COUNT(a) > 1;
> dump c;
> {code}




[jira] Created: (PIG-1640) bin/pig does not run in local mode due to classes missing from classpath

2010-09-21 Thread Olga Natkovich (JIRA)
bin/pig does not run in local mode due to classes missing from classpath


 Key: PIG-1640
 URL: https://issues.apache.org/jira/browse/PIG-1640
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
 Fix For: 0.8.0


This issue was reported by one of Yahoo users. I have not verified the problem. 
Here is the report

"when do bin/pig -x local, the shell doesn't come up.  It complained about 
jline not being found.  Here is a patch to bin/pig:

+for f in $PIG_HOME/build/ivy/lib/Pig/*.jar; do
+CLASSPATH=${CLASSPATH}:$f;
+done
+"





[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1639:


Description: 
The following script fails:
{code}
a = load 'file' AS (f1, f2, f3);
b = group a by f1;
c = filter b by COUNT(a) > 1;
dump c;
{code}

> New logical plan: PushUpFilter should not optimize if filter condition 
> contains UDF
> ---
>
> Key: PIG-1639
> URL: https://issues.apache.org/jira/browse/PIG-1639
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> The following script fails:
> {code}
> a = load 'file' AS (f1, f2, f3);
> b = group a by f1;
> c = filter b by COUNT(a) > 1;
> dump c;
> {code}




[jira] Created: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF

2010-09-21 Thread Daniel Dai (JIRA)
New logical plan: PushUpFilter should not optimize if filter condition contains 
UDF
---

 Key: PIG-1639
 URL: https://issues.apache.org/jira/browse/PIG-1639
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0







[jira] Updated: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1636:


Attachment: PIG-1636-1.patch

This patch depends on PIG-1605.





[jira] Updated: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-21 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1638:
---

Status: Patch Available  (was: Open)

> sh output gets mixed up with the grunt prompt
> -
>
> Key: PIG-1638
> URL: https://issues.apache.org/jira/browse/PIG-1638
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.8.0
>Reporter: niraj rai
>Assignee: niraj rai
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1638_0.patch
>
>
> Many times, the grunt prompt gets mixed up with the sh output, e.g.:
> grunt> sh ls
> 000
> autocomplete
> bin
> build
> build.xml
> grunt> CHANGES.txt
> conf
> contrib
> In the above case,  grunt> is mixed up with the output.




[jira] Updated: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-21 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1638:
---

Attachment: PIG-1638_0.patch





[jira] Created: (PIG-1638) sh output gets mixed up with the grunt prompt

2010-09-21 Thread niraj rai (JIRA)
sh output gets mixed up with the grunt prompt
-

 Key: PIG-1638
 URL: https://issues.apache.org/jira/browse/PIG-1638
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.8.0
Reporter: niraj rai
Assignee: niraj rai
Priority: Minor
 Fix For: 0.8.0


Many times, the grunt prompt gets mixed up with the sh output, e.g.:
grunt> sh ls
000
autocomplete
bin
build
build.xml
grunt> CHANGES.txt
conf
contrib

In the above case,  grunt> is mixed up with the output.





[jira] Assigned: (PIG-419) Combiner optimizations extended to nested foreach statements as well

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-419:
--

Assignee: Thejas M Nair

> Combiner optimizations extended to nested foreach statements as well
> 
>
> Key: PIG-419
> URL: https://issues.apache.org/jira/browse/PIG-419
> Project: Pig
>  Issue Type: Improvement
>Reporter: Anand Murugappan
>Assignee: Thejas M Nair
>
> While Pig 2.0 seems to have optimized foreach statements by using the 
> combiner more aggressively, nested foreach statements lack this 
> functionality. Given that several of our projects use nested foreach 
> statements, we would like to see the optimizations extended to those cases as 
> well. 




[jira] Updated: (PIG-453) Scope resolution operators in flattened schemas need to be fixed

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-453:
---

 Assignee: Alan Gates  (was: Santhosh Srinivasan)
Fix Version/s: 0.9.0

> Scope resolution operators in flattened schemas need to be fixed
> 
>
> Key: PIG-453
> URL: https://issues.apache.org/jira/browse/PIG-453
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> Currently, the scope resolution operator :: is stored as part of the field 
> schema alias. As a result, users may get confused by queries like:
> {code}
> a = load 'st10k' as (name, age, gpa);
> b = group a by name;
> c = foreach b generate flatten(a);
> d = filter c by name != 'fred';
> e = group d by name;
> f = foreach e generate flatten(d);
> g = foreach f generate name;
> {code}
> With PIG-451, the schema for f will have a column with aliases a::name and 
> d::a::name. The use of d::a::name is particularly confusing.
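
How the nested prefix arises can be seen in a deliberately simplified Python model. It assumes, as the issue describes, that the `::` prefix is stored inside the alias string itself; the `flatten_schema` helper is hypothetical, and the model ignores the fact that a column may carry more than one alias.

```python
def flatten_schema(relation, fields):
    """Prefix every field alias with the name of the relation being flattened."""
    return [f"{relation}::{field}" for field in fields]


a = ["name", "age", "gpa"]      # a = load 'st10k' as (name, age, gpa);
c = flatten_schema("a", a)      # c = foreach b generate flatten(a);
d = c                           # d = filter c ...; filtering keeps the schema
f = flatten_schema("d", d)      # f = foreach e generate flatten(d);
```

Under this model the first column of `f` ends up named `d::a::name`, which is the alias the issue calls particularly confusing.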




[jira] Updated: (PIG-438) Handle realiasing of existing Alias (A=B;)

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-438:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> Handle realiasing of existing Alias (A=B;) 
> ---
>
> Key: PIG-438
> URL: https://issues.apache.org/jira/browse/PIG-438
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> We do not handle re-aliasing of an existing alias - this should be handled 
> correctly.
> The following script should work:
> {code}
> a = load 'studenttab10k';
> b = filter a by $1 > '25';
> c = b;
> -- use b
> d = cogroup b by $0, a by $0;
> e = foreach d generate flatten(b), flatten(a);
> dump e
> -- use c
> f = cogroup c by $0, a by $0;
> g = foreach f generate flatten(c), flatten(a);
> dump g;
> {code}




[jira] Assigned: (PIG-479) PERFORMANCE: more extensive use of the combiner

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-479:
--

Assignee: Thejas M Nair

> PERFORMANCE: more extensive use of the combiner
> --
>
> Key: PIG-479
> URL: https://issues.apache.org/jira/browse/PIG-479
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
>Assignee: Thejas M Nair
>
>  On types branch, the combiner is used anytime a foreach includes only simple 
> projections and/or algebraic functions.  It would also be useful to invoke 
> the combiner in cases where algebraic and non-algebraic operations are mixed, 
> or where expression evaluation is included in the foreach.
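
As background on why algebraic functions suit the combiner, here is a minimal Python sketch. It assumes the usual meaning of "algebraic": the result can be assembled from partial results computed on subsets of the input. The three function names are hypothetical and the code is not taken from Pig.

```python
def avg_initial(value):
    # Map side: turn one input value into a partial (sum, count).
    return (value, 1)


def avg_intermediate(partials):
    # Combiner: merge partial results; it may run any number of times.
    return (sum(s for s, _ in partials), sum(c for _, c in partials))


def avg_final(partials):
    # Reduce side: merge the remaining partials and produce the average.
    total, count = avg_intermediate(partials)
    return total / count


values = [1, 2, 3, 4]
# Pretend the input was processed as two separate splits.
split_1 = avg_intermediate([avg_initial(v) for v in values[:3]])
split_2 = avg_intermediate([avg_initial(v) for v in values[3:]])
combined = avg_final([split_1, split_2])
```

The combined result equals the average computed over all values at once. A non-algebraic operation offers no such decomposition, which is why mixing the two kinds in one foreach is the harder case this issue asks for.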




[jira] Updated: (PIG-496) project of bags from complex data causes failures

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-496:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> project of bags from complex data causes failures
> -
>
> Key: PIG-496
> URL: https://issues.apache.org/jira/browse/PIG-496
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> A = load 'complex data' as (x: bag{});
> B = foreach A generate x.($1, $2);
> produces stack trace:
> 2008-10-14 15:11:07,639 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (reduce) task_200809241441_9923_r_00
> java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:183)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:215)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:166)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:252)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:222)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> Pradeep suspects that the problem is in 
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java, 
> line 374.




[jira] Updated: (PIG-516) order by

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-516:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0

> order by 
> -
>
> Key: PIG-516
> URL: https://issues.apache.org/jira/browse/PIG-516
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.2.0
>Reporter: Christopher Olston
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> I want to do ORDER A BY f($0). Should be allowed. (Workaround of adding a 
> column, sorting, then removing column is yucky and in fact impossible if I 
> don't know the schema.)
> Important use case: ORDER A BY Random(), to do random sorting.
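
The workaround mentioned in the request (add a column, sort, remove the column) looks roughly like this in Python. It is only a sketch of the pattern with a hypothetical `order_by` helper, not a proposal for Pig syntax.

```python
import random


def order_by(rows, f):
    """Sort rows by f applied to the first column, i.e. ORDER rows BY f($0)."""
    decorated = [(f(row[0]), row) for row in rows]   # add the computed column
    decorated.sort(key=lambda pair: pair[0])         # sort on it (stable sort)
    return [row for _, row in decorated]             # remove the column again


rows = [("carol", 3), ("alice", 1), ("bob", 2)]
by_name_length = order_by(rows, len)
shuffled = order_by(rows, lambda _: random.random())  # the ORDER BY Random() case
```

Because the helper never inspects anything other than the sort key, it needs no knowledge of the row layout beyond the first column. In Pig the reporter has no such escape hatch, since adding and removing a column requires knowing the schema.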




[jira] Updated: (PIG-666) Bug in Schema comparison for equality

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-666:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0

> Bug in Schema comparison for equality
> -
>
> Key: PIG-666
> URL: https://issues.apache.org/jira/browse/PIG-666
> Project: Pig
>  Issue Type: Bug
>  Components: build
> Environment: i686 i386 GNU/Linux
>Reporter: Araceli Henley
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> Santhosh is currently improving Error Handling, I ran these tests against : 
> pig_phase_3.jar
> This is a bug in the schema comparison for equality
>  # valid use of MAX with Bag as value
> TEST: AggregateFunc_23.pig
>  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE A.Fint, MAX( ( BAG{tuple(int)}) A.Fbag.age ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_23.out' USING 
> PigStorage();
>  # valid use of SUM with int with valid cast
> TEST:  AggregateFunc_231.pig
>  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE A.Fint, SUM( ( BAG{ tuple(double)} ) A.Fbag.age ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_231.out' USING 
> PigStorage();
>  # valid use of SUM with cast for field in bag
> TEST: AggregateFunc_26.pig
> A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE SUM ( (BAG{tuple(int)}) A.Fbag.age ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_26.out' USING 
> PigStorage();
> # valid use of MIN with cast for field in bag
> TEST:  AggregateFunc_27.pig
>  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE MIN ( (BAG{tuple(int)}) A.Fbag.age ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_27.out' USING 
> PigStorage();
>  # valid use of AVG with Long as value
> TEST: AggregateFunc_46.pig
>  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE A.Fint, AVG( ( BAG{tuple(double)} ) A.Fint ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_47.out' USING 
> PigStorage();




[jira] Updated: (PIG-667) Error in projection implementation or in typechecking when casting a member of Bag

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-667:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> Error in projection implementation or in typechecking  when casting a member 
> of Bag
> ---
>
> Key: PIG-667
> URL: https://issues.apache.org/jira/browse/PIG-667
> Project: Pig
>  Issue Type: Bug
> Environment: i686 i386 GNU/Linux
>Reporter: Araceli Henley
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> As one of its members, a bag contains "age" of type "int". When this value is 
> used as an argument to DIFF and cast as an int for the comparison, the 
> following error is thrown:
> 09/02/11 14:20:46 INFO mapReduceLayer.MapReduceLauncher: 50% complete
> 09/02/11 14:21:31 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job 
> failed
> 09/02/11 14:21:31 ERROR mapReduceLayer.MapReduceLauncher: Number of failed 
> jobs: 1
> 09/02/11 14:21:31 ERROR mapReduceLayer.MapReduceLauncher: Job failed!
> error message for task: map
> error message for task: reduce
> 09/02/11 14:21:31 ERROR grunt.Grunt: ERROR 1072: Out of bounds access: 
> Request for field number 1 exceeds tuple size of 1
> Steps to reproduce 
>  # valid use of DIFF with valid cast for bag field
> TEST ErrorHandling.AggregateFunc_601
>  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE DIFF ( ( BAG{tuple(int)} ) A.Fbag.age, A.Fint );
>  STORE X INTO 
> '/user/pig/tests/results/araceli.1234390832/AggregateFunc_601.out' USING 
> PigStorage();
>  # invalid use of DIFF with valid cast for bag field, DIFF contains one 
> argument instead of two
> TEST ErrorHandling.AggregateFunc_60
>  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE DIFF ( ( BAG{tuple(int)} ) A.Fbag.age ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_60.out' USING 
> PigStorage();




[jira] Resolved: (PIG-669) Bug in Schema comparison for casting

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-669.


Resolution: Not A Problem

This is correct behavior.  SUM does not accept two arguments.

> Bug in Schema comparison for casting
> 
>
> Key: PIG-669
> URL: https://issues.apache.org/jira/browse/PIG-669
> Project: Pig
>  Issue Type: Bug
> Environment: i686 i386 GNU/Linux
>Reporter: Araceli Henley
>
> This is a bug in the Schema comparison for casting. This is a valid use of a 
> cast in SUM; the first and second arguments are each cast to a Bag with an int.
>  ERROR 1045: Could not infer the matching function for 
> org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an 
> explicit cast.
> TEST: AggregateFunc_61 
> A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE SUM ( ( BAG{tuple(int)} ) A.Fbag.age, ( BAG{tuple(int)} 
> ) A.Fbag.age); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234465985/AggregateFunc_61.out' USING 
> PigStorage();
> Suggest you also try:
> X =FOREACH B GENERATE SUM ( ( BAG{tuple(int)} ) A.Fbag.age, A.Fint ); 




[jira] Updated: (PIG-670) DIFF contains an invalid expression - possible parser error

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-670:
---

 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> DIFF contains an invalid expression - possible parser error
> ---
>
> Key: PIG-670
> URL: https://issues.apache.org/jira/browse/PIG-670
> Project: Pig
>  Issue Type: Bug
> Environment:  i686 i386 GNU/Linux
>Reporter: Araceli Henley
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.9.0
>
>
> Requires further investigation.
> This test takes in an invalid expression as the first argument in the DIFF 
> function and results in the following error:
> ERROR 1000: Error during parsing. Invalid alias: DIFF
> Why is the parser interpreting DIFF as an alias? 
> TEST: AggregateFunc_131
> A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
> Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
> Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
> name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
> B =GROUP A ALL; 
> X =FOREACH B GENERATE DIFF( A.Fint + A.Fint + ); 
> STORE X INTO 
> '/user/pig/tests/results/araceli.1234381533/AggregateFunc_131.out' USING 
> PigStorage();




[jira] Resolved: (PIG-931) Samples Syntax Error in Pig UDF Manual

2010-09-21 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-931.
-

Resolution: Fixed

Fixed as part of Pig 0.8.0 beta-1 (see PIG-1600 for patch).

> Samples Syntax Error in Pig UDF Manual
> --
>
> Key: PIG-931
> URL: https://issues.apache.org/jira/browse/PIG-931
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.2.0, 0.3.0
> Environment: Windows XP, firefox 3.5.2
>Reporter: Yiwei Chen
>Assignee: Corinne Chandel
>Priority: Trivial
> Fix For: 0.8.0
>
>
> All samples with 'extends EvalFunc' have syntax errors in 
> http://hadoop.apache.org/pig/docs/r0.3.0/udf.html .
> There shouldn't be parentheses; they are angle brackets.
> For example in "How to Write a Simple Eval Function" section:
>   public class UPPER extends EvalFunc (String)
> should be 
>   public class UPPER extends EvalFunc<String>




[jira] Updated: (PIG-678) "as" support for group-by

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-678:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> "as" support for group-by
> -
>
> Key: PIG-678
> URL: https://issues.apache.org/jira/browse/PIG-678
> Project: Pig
>  Issue Type: Improvement
>Reporter: Christopher Olston
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> I should be able to use "as" with GROUP the same way I use it with LOAD, i.e. 
> rename the entire schema. This is especially important b/c the system 
> automatically assigns schema names for the output of group that many people 
> find unintuitive.
> e.g. this should work:
> grouped = GROUP data BY url AS (url, history);




[jira] Updated: (PIG-719) store into 'filename'; should be valid syntax, but does not work

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-719:
---

 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0

> store  into 'filename'; should be valid syntax, but does not work
> ---
>
> Key: PIG-719
> URL: https://issues.apache.org/jira/browse/PIG-719
> Project: Pig
>  Issue Type: Bug
> Environment: pig local model (although I think it's a parsing 
> problem, not an execution problem)
>Reporter: Christopher Olston
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.9.0
>
>
> This pig script should work:
> STORE (LOAD 'inputfile') INTO 'outputfile';
> but it does not.




[jira] Updated: (PIG-749) No attempt to check if 'flatten(group) as' has the same cardinality as 'group alias by'

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-749:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0

> No attempt to check if 'flatten(group) as' has the same cardinality as 'group 
> alias by'
> ---
>
> Key: PIG-749
> URL: https://issues.apache.org/jira/browse/PIG-749
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> A Pig script that groups by 3 columns and flattens the group as 4 columns 
> works, when in principle it should not and should perhaps fail with a front-end error.
> {code}
> A = load 'groupcardinalitycheck.txt' using PigStorage() as (col1:chararray, 
> col2:chararray, col3:int, col4:chararray);
> B = group A by (col1, col2, col3);
> C = foreach B generate
>flatten(group) as (col1, col2, col3, col4),
>SIZE(A) as frequency;
> dump C;
> {code}
> ==
> Data
> ==
> hello   CC  1   there
> hello   YSO 2   out
> ouch    CC  2   hey
> ==
> Result of the preceding script
> ==
> (ouch,CC,2,1L)
> (hello,CC,1,1L)
> (hello,YSO,2,1L)
> ==
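
The check this report asks for amounts to comparing two counts before the job runs. The Java below is a minimal sketch of that comparison, assuming the front end can see both the group keys and the aliases in the AS clause; FlattenArityCheck and check are hypothetical names and do not correspond to any class in Pig.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: rejects an AS clause whose alias
// count differs from the number of fields produced by flatten(group).
class FlattenArityCheck {
    static void check(List<String> groupKeys, List<String> asAliases) {
        if (groupKeys.size() != asAliases.size()) {
            throw new IllegalArgumentException(
                "flatten(group) yields " + groupKeys.size()
                + " fields but the AS clause names " + asAliases.size());
        }
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("col1", "col2", "col3");
        // Same cardinality: accepted.
        check(keys, Arrays.asList("col1", "col2", "col3"));
        // The script from the report: 3 group keys, 4 aliases.
        try {
            check(keys, Arrays.asList("col1", "col2", "col3", "col4"));
        } catch (IllegalArgumentException e) {
            System.out.println("front-end error: " + e.getMessage());
        }
    }
}
```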




[jira] Assigned: (PIG-750) Use combiner when a mix of algebraic and non-algebraic functions are used

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-750:
--

Assignee: Thejas M Nair

Our performance tests have shown that having combiner and non-combiner 
functions in the same MR job actually severely slows things down.  We suspect 
that this is because you have to pass the bags for the non-combiner functions 
through the combiner and you pay for the multiple (de)serialization passes.

However, the other things noted in this bug, such as the need to use the 
combiner when algebraic UDFs are involved in simple expressions, are valid and 
are along the lines of issues Thejas is working on for the combiner.  So I'm 
assigning the issue to him.

> Use combiner when a mix of algebraic and non-algebraic functions are used
> -
>
> Key: PIG-750
> URL: https://issues.apache.org/jira/browse/PIG-750
> Project: Pig
>  Issue Type: Improvement
>Reporter: Amir Youssefi
>Assignee: Thejas M Nair
>Priority: Minor
>
> Currently Pig uses combiner when all a,b, c,... are algebraic (e.g. SUM, AVG 
> etc.) in foreach:
> foreach X generate a,b,c,... 
>  It's a performance improvement if it uses combiner when a mix of algebraic 
> and non-algebraic functions are used as well.
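
For readers unfamiliar with why only algebraic functions can run in the combiner: such a function can be split into an initial step, a mergeable intermediate step, and a final step, so each map task ships a small partial result instead of a whole bag. The Java below is a self-contained sketch of that decomposition for AVG. The class and method names are hypothetical and this is not Pig's own implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of an algebraic AVG, not Pig's implementation.
class AlgebraicAvgSketch {
    // Partial result: small, and mergeable in any order.
    static final class Partial {
        final long sum;
        final long count;
        Partial(long sum, long count) { this.sum = sum; this.count = count; }
    }

    // Initial step (map side): reduce one chunk of raw values to a partial.
    static Partial initial(List<Integer> values) {
        long sum = 0;
        for (int v : values) sum += v;
        return new Partial(sum, values.size());
    }

    // Intermediate step (combiner): merge partials into one partial.
    static Partial combine(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Final step (reduce side): turn the merged partial into the answer.
    static double finish(Partial p) {
        return p.count == 0 ? Double.NaN : (double) p.sum / p.count;
    }

    public static void main(String[] args) {
        Partial a = initial(Arrays.asList(1, 2, 3)); // values seen by one map task
        Partial b = initial(Arrays.asList(4, 5));    // values seen by another
        System.out.println(finish(combine(Arrays.asList(a, b)))); // prints 3.0
    }
}
```

A non-algebraic function has no such partial form, so its input bag has to travel through the combiner intact, which is consistent with the (de)serialization cost described in the comment above.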




[jira] Resolved: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2010-09-21 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy resolved PIG-916.
---

Fix Version/s: 0.8.0
   Resolution: Duplicate

Fixed in PIG-1205

> Change the pig hbase interface to get more than one row at a time when 
> scanning
> ---
>
> Key: PIG-916
> URL: https://issues.apache.org/jira/browse/PIG-916
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alex Newman
>Assignee: Dmitriy V. Ryaboy
>Priority: Trivial
> Fix For: 0.8.0
>
>
> It should be significantly faster to get numerous rows at the same time 
> rather than one row at a time for large table extraction processes.




[jira] Updated: (PIG-772) Semantics of Filter statement inside ForEach should support filtering on aliases used in the Group statement preceding it

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-772:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0

> Semantics of Filter statement inside ForEach should support filtering on 
> aliases used in the Group statement preceding it
> -
>
> Key: PIG-772
> URL: https://issues.apache.org/jira/browse/PIG-772
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: half.txt
>
>
> I have  a Pig script which tries to display all bags which are greater than 
> the average value in the group.
> Input: half.txt
> ===
> A   1
> A   2
> A   3
> B   1
> B   3
> 
> {code}
> A = LOAD 'half.txt' AS (key:CHARARRAY, val:INT);
> B = GROUP A BY key;
> C = FOREACH B {
>N = AVG(A.val);
>HALF = FILTER A by val >= N;
> GENERATE
>FLATTEN(GROUP),
>HALF;
> };
> dump C;
> {code}
> 
> Expected Output:
> 
> (A,{(A,2),(A,3)})
> (B,{(B,3)})
> 
> Presently the semantics of the Filter statement inside the FOREACH does not 
> support these types of operations.
> Error when running the above script.
> =
> ERROR 1000: Error during parsing. Invalid alias: A in {key: chararray,val: 
> int}
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing. Invalid alias: A in {key: chararray,val: int}
> at org.apache.pig.PigServer.parseQuery(PigServer.java:320)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:279)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:364)
> =
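
To make the requested semantics concrete, the Java below computes what the script is expected to return: for each group, the values that are greater than or equal to that group's average. It is only a sketch of the intended result, not of how Pig would execute the nested FILTER; AboveAverageSketch is a hypothetical name and the input mirrors half.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the expected result, not Pig code.
class AboveAverageSketch {
    // For each key, keep the values that are >= the average of that key's values.
    static Map<String, List<Integer>> aboveAverage(Map<String, List<Integer>> groups) {
        Map<String, List<Integer>> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            double sum = 0;
            for (int v : e.getValue()) sum += v;
            double avg = sum / e.getValue().size();   // N = AVG(A.val)
            List<Integer> kept = new ArrayList<>();
            for (int v : e.getValue()) {
                if (v >= avg) kept.add(v);            // HALF = FILTER A BY val >= N
            }
            out.put(e.getKey(), kept);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        groups.put("A", Arrays.asList(1, 2, 3));
        groups.put("B", Arrays.asList(1, 3));
        System.out.println(aboveAverage(groups)); // prints {A=[2, 3], B=[3]}
    }
}
```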




[jira] Resolved: (PIG-827) Redesign graph operations in OperatorPlan

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-827.


Resolution: Fixed

The new optimizer and plan structure introduced in 0.7 cover this.

> Redesign graph operations in OperatorPlan
> -
>
> Key: PIG-827
> URL: https://issues.apache.org/jira/browse/PIG-827
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Santhosh Srinivasan
>
> The graph operations swap, insertBetween, pushBefore, etc. have to be 
> re-implemented in a layered fashion. The layering will facilitate the re-use 
> of operations. In addition, use of operator.rewire in the aforementioned 
> operations requires transaction like ability due to various pre-conditions. 
> Often, the result of one of the operations leaves the graph in an 
> inconsistent state for the rewire operation. Clear layering and assignment of 
> the ability to rewire will remove these inconsistencies. For now, use of 
> rewire has resulted in a slightly less maintainable code along with the 
> necessity to use rewire with discretion.




[jira] Updated: (PIG-828) Problem accessing a tuple within a bag

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-828:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0

> Problem accessing a tuple within a bag
> --
>
> Key: PIG-828
> URL: https://issues.apache.org/jira/browse/PIG-828
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Alan Gates
> Fix For: 0.9.0
>
> Attachments: studenttab5, tupleacc.pig
>
>
> The Pig script below creates a tuple which contains 3 columns, 2 of which are 
> chararrays and the third of which is a bag of constant chararray. The script 
> later projects the tuple within a bag.
> {code}
> a = load 'studenttab5' as (name, age, gpa);
> b = foreach a generate ('viraj', {('sms')}, 'pig') as 
> document:(id,singlebag:{singleTuple:(single)}, article);
> describe b;
> c = foreach b generate document.singlebag;
> dump c;
> {code}
> When we run this script we get a run-time error in the Map phase.
> 
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast 
> to org.apache.pig.data.DataBag
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:402)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:183)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:400)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:183)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:245)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:236)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> 




[jira] Resolved: (PIG-836) Allow setting of end-of-record delimiter in PigStorage

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-836.


Resolution: Won't Fix

PigStorage now depends on TextInputFormat to parse lines.  TextInputFormat does 
not allow the user to specify the end of line indicator.  If it does at some 
point in the future then Pig can make use of that.  We are not going to rewrite 
TextInputFormat for ourselves just to get this feature.

> Allow setting of end-of-record delimiter in PigStorage
> --
>
> Key: PIG-836
> URL: https://issues.apache.org/jira/browse/PIG-836
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: George Mavromatis
>Assignee: Benjamin Reed
>
> PigStorage allows overriding the default field delimiter ('\t'), but does not 
> allow overriding the record delimiter ('\n').
> It is a valid use case that fields contain new lines, e.g. because they are 
> contents of a document/web page. It is possible for the user to create a 
> custom load/store UDF to achieve that, but that is extra work for the user, 
> many users will have to do it, and that UDF would be an exact code 
> duplicate of PigStorage except for the delimiter.
> Thus, PigStorage() should allow configuring both field and record separators.
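
The parsing a custom loader would need at its core is small. The Java below is a sketch under stated assumptions: the whole input fits in one string, the record separator is a character that never occurs inside a field (the example uses \u0001), and RecordSplitSketch and parse are hypothetical names. It is not PigStorage code and ignores input splits entirely, which is the hard part that TextInputFormat handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not PigStorage: split on a configurable record
// delimiter first, then on the field delimiter.
class RecordSplitSketch {
    static List<String[]> parse(String data, String recordDelim, String fieldDelim) {
        List<String[]> records = new ArrayList<>();
        // Pattern.quote makes the delimiters literal rather than regex syntax.
        for (String rec : data.split(Pattern.quote(recordDelim))) {
            if (rec.isEmpty()) continue;
            // Limit -1 keeps trailing empty fields.
            records.add(rec.split(Pattern.quote(fieldDelim), -1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records; the second field of the first record contains a newline.
        String data = "url1\tline one\nline two\u0001url2\tbody\u0001";
        for (String[] fields : parse(data, "\u0001", "\t")) {
            System.out.println(fields.length + " fields, first = " + fields[0]);
        }
    }
}
```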




[jira] Commented: (PIG-1076) Make PigOutputCommitter conform with new FileOututCommitter in hadoop trunk

2010-09-21 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913242#action_12913242
 ] 

Pradeep Kamath commented on PIG-1076:
-

The patch will need new hadoop sources which have not yet been released on 
apache. Until then the patch can be used against hadoop trunk, but since the pig 
build picks up released hadoop this would not be seamless.

> Make PigOutputCommitter conform with new FileOututCommitter in hadoop trunk
> ---
>
> Key: PIG-1076
> URL: https://issues.apache.org/jira/browse/PIG-1076
> Project: Pig
>  Issue Type: Improvement
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1076.patch
>
>





[jira] Updated: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-847:
---

Fix Version/s: 0.9.0

> Setting twoLevelAccessRequired field in a bag schema should not be required 
> to access fields in the tuples of the bag
> -
>
> Key: PIG-847
> URL: https://issues.apache.org/jira/browse/PIG-847
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Currently Pig interprets the result type of a relation as a bag. However the 
> schema of the relation directly contains the schema describing the fields in 
> the tuples for the relation. However when a udf wants to return a bag or if 
> there is a bag in input data or if the user creates a bag constant, the 
> schema of the bag has one field schema which is that of the tuple. The 
> Tuple's schema has the types of the fields. To be able to access the fields 
> from the bag directly in such a case by using something like 
> . or ., the schema of the bag should 
> have the twoLevelAccess set to true so that pig's type system can 
> traverse the tuple schema and get to the field in question. This is confusing 
> - we should try and see if we can avoid needing this extra flag. A possible 
> solution is to treat bags the same way - whether they represent relations or 
> real bags. Another way is to introduce a special "relation" datatype for the 
> result type of a relation and bag type would be used only for true bags. In 
> this case, we would always need bag schema to have a tuple schema which would 
> describe the fields. 




[jira] Assigned: (PIG-871) Improve distribution of keys in reduce phase

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-871:
--

Assignee: Thejas M Nair

> Improve distribution of keys in reduce phase
> 
>
> Key: PIG-871
> URL: https://issues.apache.org/jira/browse/PIG-871
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Ankur
>Assignee: Thejas M Nair
>
> The default hashing scheme used to distribute keys in reduce phase sometimes 
> results in an uneven distribution of keys resulting in 5 - 10 % of reducers 
> being overloaded with data. This bottleneck makes the PIG jobs really slow 
> and gives users a bad impression.
> While there is no bulletproof solution to the problem in general, the 
> hashing can certainly be improved for better distribution. The proposal here 
> is to evaluate and incorporate other hashing schemes that give high avalanche 
> and more even distribution. We can start by evaluating MurmurHash which is 
> Apache 2.0 licensed and freely available here - 
> http://www.getopt.org/murmur/MurmurHash.java
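
The effect described above can be reproduced in a few lines. The Java below is a sketch, not Pig or Hadoop code: plainPartition follows the common "clear the sign bit, then take the remainder" scheme, and mix is the 32-bit finalizer from the MurmurHash3 family, used here as a stand-in for a high-avalanche hash. It is not the MurmurHash.java implementation linked in the issue, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of key skew and bit mixing.
class PartitionSketch {
    // Plain scheme: non-negative hash modulo the number of reducers.
    static int plainPartition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // 32-bit finalizer from MurmurHash3: spreads input bits across the word.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Same partitioning, applied to the mixed hash.
    static int mixedPartition(int hash, int numReducers) {
        return plainPartition(mix(hash), numReducers);
    }

    public static void main(String[] args) {
        int reducers = 10;
        Set<Integer> plain = new HashSet<>();
        Set<Integer> mixed = new HashSet<>();
        // Keys that are all multiples of the reducer count: worst case for
        // the plain scheme, since every key has remainder 0.
        for (int k = 1; k <= 100; k++) {
            plain.add(plainPartition(k * 10, reducers));
            mixed.add(mixedPartition(k * 10, reducers));
        }
        System.out.println("plain scheme uses " + plain.size() + " reducer(s)");
        System.out.println("mixed scheme uses " + mixed.size() + " reducer(s)");
    }
}
```

Whether a better hash helps in practice depends on whether the skew comes from key patterns like this one or from a few genuinely heavy keys; mixing only addresses the former.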




[jira] Commented: (PIG-904) Conversion from double to chararray for udf input arguments does not occur

2010-09-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913227#action_12913227
 ] 

Alan Gates commented on PIG-904:


I don't understand what the issue is here.  CONCAT does not take doubles.  The 
script above tries to pass it a double, and Pig properly says you can't do 
that.  Is the issue that an implicit cast isn't inserted here?  I don't think 
Pig currently does implicit casts to match possible UDF signatures.

> Conversion from double to chararray for udf input arguments does not occur
> --
>
> Key: PIG-904
> URL: https://issues.apache.org/jira/browse/PIG-904
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Script showing the problem:
> {noformat}
>  "a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, 
> gpa:double); b = foreach a generate CONCAT(gpa, 'dummy'); dump b;"
> Error shown:
> 2009-08-03 17:04:27,573 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1045: Could not infer the matching function for org.apache.pig.builtin.CONCAT 
> as multiple or none of them fit. Please use an explicit cast.
> {noformat}
> The error goes away if gpa is cast to chararray.




[jira] Updated: (PIG-904) Conversion from double to chararray for udf input arguments does not occur

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-904:
---

 Assignee: Alan Gates
Fix Version/s: 0.9.0

> Conversion from double to chararray for udf input arguments does not occur
> --
>
> Key: PIG-904
> URL: https://issues.apache.org/jira/browse/PIG-904
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Script showing the problem:
> {noformat}
>  "a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, 
> gpa:double); b = foreach a generate CONCAT(gpa, 'dummy'); dump b;"
> Error shown:
> 2009-08-03 17:04:27,573 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1045: Could not infer the matching function for org.apache.pig.builtin.CONCAT 
> as multiple or none of them fit. Please use an explicit cast.
> {noformat}
> The error goes away if gpa is cast to chararray.




[jira] Updated: (PIG-946) Combiner optimizer does not optimize when limit follow group, foreach

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-946:
---

 Assignee: Thejas M Nair
Fix Version/s: 0.9.0

> Combiner optimizer does not optimize when limit follow group, foreach
> -
>
> Key: PIG-946
> URL: https://issues.apache.org/jira/browse/PIG-946
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-946-codechange-draft.patch
>
>
> The following script is combinable but is not optimized:
> a = load '/user/pig/tests/data/singlefile/studenttab10k';
> b = group a by $1;
> c = foreach b generate group, AVG(a.$2);
> d = limit c 10;
> dump d;




[jira] Updated: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-916:
---

Assignee: Dmitriy V. Ryaboy

Dmitriy, isn't this fixed by your recent changes to HBaseStorage?

> Change the pig hbase interface to get more than one row at a time when 
> scanning
> ---
>
> Key: PIG-916
> URL: https://issues.apache.org/jira/browse/PIG-916
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alex Newman
>Assignee: Dmitriy V. Ryaboy
>Priority: Trivial
>
> It should be significantly faster to get numerous rows at the same time 
> rather than one row at a time for large table extraction processes.




[jira] Updated: (PIG-1016) Allow map to take non-bytearray value types.

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1016:


 Assignee: Alan Gates
Fix Version/s: 0.9.0

> Allow map to take non-bytearray value types.
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
>Assignee: Alan Gates
> Fix For: 0.9.0
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all of the 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows complex objects to be read in as map values, 
> as documented. I've done simple verification by loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Commented: (PIG-1076) Make PigOutputCommitter conform with new FileOututCommitter in hadoop trunk

2010-09-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913205#action_12913205
 ] 

Alan Gates commented on PIG-1076:
-

Why did this get abandoned?

> Make PigOutputCommitter conform with new FileOututCommitter in hadoop trunk
> ---
>
> Key: PIG-1076
> URL: https://issues.apache.org/jira/browse/PIG-1076
> Project: Pig
>  Issue Type: Improvement
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1076.patch
>
>





[jira] Updated: (PIG-1222) cast ends up with NULL value

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1222:


 Assignee: Alan Gates
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> cast ends up with NULL value
> 
>
> Key: PIG-1222
> URL: https://issues.apache.org/jira/browse/PIG-1222
> Project: Pig
>  Issue Type: Bug
>Reporter: Ying He
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.9.0
>
>
> I want to generate data with bags, so I did the following.
> Take a simple text file b.txt:
> 100  apple
> 200  orange
> 300  pear
> 400  apple
> then run query:
> a = load 'b.txt' as (id, f);
> b = group a by id;
> store b into 'g' using BinStorage();
> Then run another query to load the data generated in the previous step:
> a = load 'g/part*' using BinStorage() as (id, d:bag{t:(v, s)});
> b = foreach a generate (double)id, flatten(d);
> dump b;
> then I got the following result:
> (,100,apple)
> (,100,apple)
> (,200,orange)
> (,200,apple)
> (,300,strawberry)
> (,300,pear)
> (,400,pear)
> The value for id is gone. Without the cast, the result is correct.




[jira] Updated: (PIG-1244) parameter syntax in scripts, add support for ${VAR} (in addition to current $VAR)

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1244:


 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0
 Priority: Minor  (was: Major)

> parameter syntax in scripts, add support for ${VAR} (in addition to current 
> $VAR)
> -
>
> Key: PIG-1244
> URL: https://issues.apache.org/jira/browse/PIG-1244
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
> Environment: all
>Reporter: Alejandro Abdelnur
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.9.0
>
>
> Currently the parameter syntax in Pig scripts is $VAR.
> This complicates scripts because parameter-literal concatenation is not 
> supported. For example:
> An occurrence of '$OUT_tmp' in a script resolves to a parameter 'OUT_tmp'; it 
> would be desirable for this to resolve to the concatenation of $OUT and _tmp.
> This can be solved by supporting the parameter syntax ${VAR}, so the Pig parser 
> can identify the end of the parameter name.
> Adding support for the ${VAR} syntax in addition to $VAR would maintain 
> backwards compatibility. Replacing $VAR with ${VAR} would break backwards 
> compatibility.
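
The ambiguity described in this issue can be illustrated with a minimal Python
sketch. This is not Pig's parameter substitution code: the `substitute`
function, its regular expression, and the choice to leave unknown names
untouched are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for parameter substitution; not Pig's implementation.
# Matches either ${NAME} (captured in group 1) or $NAME (captured in group 2).
PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(text, params):
    """Replace $NAME and ${NAME} with values from params.

    Unknown names are left as-is (an assumption of this sketch).
    """
    def repl(match):
        name = match.group(1) or match.group(2)
        return params.get(name, match.group(0))

    return PARAM.sub(repl, text)


params = {"OUT": "/data/out"}

# Without braces the name greedily extends to 'OUT_tmp', which is not defined.
bare = substitute("store a into '$OUT_tmp';", params)

# With braces the end of the name is explicit, so '_tmp' is a literal suffix.
braced = substitute("store a into '${OUT}_tmp';", params)
```

Accepting both forms in one pattern is what keeps existing `$VAR` scripts
working while allowing `${VAR}` where a literal suffix follows.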




[jira] Updated: (PIG-1297) algebraic interface of udf does not get used if the foreach with udf projects column within group

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1297:


 Assignee: Thejas M Nair
Fix Version/s: 0.9.0

> algebraic interface of udf does not get used if the foreach with udf projects 
> column within group
> -
>
> Key: PIG-1297
> URL: https://issues.apache.org/jira/browse/PIG-1297
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> grunt> l = load 'file' as (a,b,c);
> grunt> g = group l by (a,b);
> grunt> f = foreach g generate SUM(l.c), group.a;
> grunt> explain f;
> ...
> ...
> #--
> # Map Reduce Plan
> #--
> MapReduce node 1-752
> Map Plan
> Local Rearrange[tuple]{tuple}(false) - 1-742
> |   |
> |   Project[bytearray][0] - 1-743
> |   |
> |   Project[bytearray][1] - 1-744
> |
> |---Load(file:///Users/tejas/pig/trunk/file:org.apache.pig.builtin.PigStorage) - 1-739
> Reduce Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-751
> |
> |---New For Each(false,false)[bag] - 1-750
> |   |
> |   POUserFunc(org.apache.pig.builtin.SUM)[double] - 1-747
> |   |
> |   |---Project[bag][2] - 1-746
> |   |
> |   |---Project[bag][1] - 1-745
> |   |
> |   Project[bytearray][0] - 1-749
> |   |
> |   |---Project[tuple][0] - 1-748
> |
> |---Package[tuple]{tuple} - 1-741
> Global sort: false
> 




[jira] Created: (PIG-1637) Combiner not use because optimizor inserts a foreach between group and algebric function

2010-09-21 Thread Daniel Dai (JIRA)
Combiner not use because optimizor inserts a foreach between group and algebric 
function


 Key: PIG-1637
 URL: https://issues.apache.org/jira/browse/PIG-1637
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script does not use the combiner after the new optimization change.

{code}
A = load ':INPATH:/pigmix/page_views' using 
org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, 
estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, 
(double)estimated_revenue as estimated_revenue;
C = group B all; 
D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into ':OUTPATH:';
{code}

This is because, after the group, the optimizer detects that the group key is 
not used afterward and adds a foreach statement after C. This is what the 
script looks like after optimization:
{code}
A = load ':INPATH:/pigmix/page_views' using 
org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, 
estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, 
(double)estimated_revenue as estimated_revenue;
C = group B all; 
C1 = foreach C generate B;
D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into ':OUTPATH:';
{code}

That cancels the combiner optimization for D. 

The way to solve the issue is to merge the C1 we inserted with D. Currently, we 
do not merge these two foreach statements. The reason is that one output of the 
first foreach (B) is referenced twice in D, and the current rule assumes that 
after the merge we would need to calculate B twice in D. In fact, C1 only does a 
projection, with no calculation of B, so merging C1 and D will not result in 
calculating B twice. C1 and D should therefore be merged.
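
The merge condition argued for here can be sketched in Python. This is only an
illustration under stated assumptions: the `can_merge` function and the
"project"/"compute" tags are hypothetical and do not correspond to Pig's
optimizer classes.

```python
def can_merge(first_outputs, second_refs):
    """Decide whether two adjacent foreach statements may be merged.

    first_outputs: maps each output name of the first foreach to a
        (kind, expression) pair, where kind is "project" or "compute".
    second_refs: list of names the second foreach references, with repeats.

    Merging is refused only when a computed output would be evaluated more
    than once after the merge; a pure projection costs nothing to repeat.
    """
    for name, (kind, _expr) in first_outputs.items():
        times_used = second_refs.count(name)
        if kind == "compute" and times_used > 1:
            return False
    return True


# C1 = foreach C generate B;   (projection only)
c1 = {"B": ("project", "B")}
# D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
d_refs = ["B", "B"]
```

With this rule, C1 and D from the script above are mergeable even though B is
referenced twice, because B is only projected.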




[jira] Created: (PIG-1636) Scalar fail if the scalar variable is generated by limit

2010-09-21 Thread Daniel Dai (JIRA)
Scalar fail if the scalar variable is generated by limit


 Key: PIG-1636
 URL: https://issues.apache.org/jira/browse/PIG-1636
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script fails:
{code}
a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
b = group a all;
c = foreach b generate SUM(a.age) as total;
c1= limit c 1;
d = foreach a generate name, age/(double)c1.total as d_sum;
store d into '111';
{code}

The problem is that d contains a reference to c1. In the optimizer, we push the 
limit before the foreach, but d still refers to the limit, so we get the wrong 
schema for the scalar.




[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc

2010-09-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913154#action_12913154
 ] 

Alan Gates commented on PIG-1337:
-

The problem with allowing load and store functions access to the config file is 
that the config file they see is not the config file that goes to Hadoop.  This 
is not all Pig's fault (see comments above on this).  The other problem is that 
multiple instances of the same load and store function may be operating in a 
given script, so there are namespace issues to resolve.

The proposal for Hadoop 0.22 is that rather than providing access to the config 
file at all Hadoop will serialize objects such as InputFormat and OutputFormat 
and pass those to the backend.  It will make sense for Pig to follow suit and 
serialize all UDFs on the front end.  This will remove the need for the  
UDFContext black magic that we do at the moment and should allow all UDFs to 
easily transfer information from front end to backend.

So, hopefully this can get resolved when Pig migrates to Hadoop 0.22, whenever 
that is.

> Need a way to pass distributed cache configuration information to hadoop 
> backend in Pig's LoadFunc
> --
>
> Key: PIG-1337
> URL: https://issues.apache.org/jira/browse/PIG-1337
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Chao Wang
>
> The Zebra storage layer needs to use the distributed cache to reduce name node 
> load during job runs.
> To do this, Zebra needs to set up distributed cache related configuration 
> information in TableLoader (which extends Pig's LoadFunc).
> It is doing this within getSchema(conf). The problem is that the conf object 
> here is not the one that is being serialized to the map/reduce backend. As 
> such, the distributed cache is not set up properly.
> To work around this problem, Pig's LoadFunc needs to provide a way to set up 
> distributed cache information in a conf object that is the one used by the 
> map/reduce backend.




[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1371:


Fix Version/s: 0.9.0

> Pig should handle deep casting of complex types 
> 
>
> Key: PIG-1371
> URL: https://issues.apache.org/jira/browse/PIG-1371
> Project: Pig
>  Issue Type: Bug
>Reporter: Pradeep Kamath
>Assignee: Alan Gates
> Fix For: 0.9.0
>
> Attachments: PIG-1371-partial.patch
>
>
> Consider input data in BinStorage format which has a field of bag type - 
> bg:{t:(i:int)}. If the schema specified in the load statement gives the type 
> of this field as bg:{t:(c:chararray)}, the current behavior is that Pig treats 
> the field as having the type specified in the load statement 
> (bg:{t:(c:chararray)}), but no deep cast from bag of int (the real data) to 
> bag of chararray (the user specified schema) is made.
> There are two issues currently:
> 1) The TypeCastInserter only compares the byte 'type' of the loader-presented 
> schema and the user-specified schema to decide whether to introduce a cast. 
> In the above case, since both schemas have the type "bag", no cast is 
> inserted. This check has to be extended to consider the full FieldSchema 
> (with inner subschema) in order to decide whether a cast is needed.
> 2) POCast should be changed to handle casting a complex type to the type 
> specified in the user-supplied FieldSchema. There is one issue to be 
> considered here - if the user specified the cast type to be 
> bg:{t:(i:int, j:int)} and the real data had only one field, what should the 
> result of the cast be:
>  * A bag with two fields - the int field and a null? - In this approach pig 
> is assuming the lone field in the data is the first field which might be 
> incorrect if it in fact is the second field.
>  * A null bag to indicate that the bag is of unknown value - this is the one 
> I personally prefer
>  * The cast throws an IncompatibleCastException
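
The difference between the shallow check in issue 1 and the proposed deep
check can be sketched in Python. The `FieldSchema` class below is a simplified
stand-in invented for this sketch, not Pig's `FieldSchema`, and the two
functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSchema:
    """Simplified stand-in for a schema node: a type name plus inner fields."""
    type: str
    inner: List["FieldSchema"] = field(default_factory=list)


def needs_cast_shallow(loader, user):
    """Compare only the top-level type, as the description says happens today."""
    return loader.type != user.type


def needs_cast_deep(loader, user):
    """Compare the full schema, including inner subschemas, recursively."""
    if loader.type != user.type:
        return True
    if len(loader.inner) != len(user.inner):
        return True
    return any(needs_cast_deep(l, u) for l, u in zip(loader.inner, user.inner))


# bg:{t:(i:int)} as presented by the loader
bag_of_int = FieldSchema("bag", [FieldSchema("tuple", [FieldSchema("int")])])
# bg:{t:(c:chararray)} as declared by the user
bag_of_chararray = FieldSchema(
    "bag", [FieldSchema("tuple", [FieldSchema("chararray")])]
)
```

The shallow check sees "bag" on both sides and inserts no cast; the deep check
reaches the int/chararray mismatch inside the tuple. What the cast should
produce when field counts differ is the open question listed in issue 2, and
this sketch only reports that a cast is needed.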




[jira] Updated: (PIG-1339) International characters in column names not supported

2010-09-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1339:


Fix Version/s: 0.9.0

We should see if the new parser makes this easier and if so fix it. 

> International characters in column names not supported
> --
>
> Key: PIG-1339
> URL: https://issues.apache.org/jira/browse/PIG-1339
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0, 0.7.0, 0.8.0
>Reporter: Viraj Bhat
> Fix For: 0.9.0
>
>
> There is a particular use-case in which someone specifies a column name to be 
> in International characters.
> {code}
> inputdata = load '/user/viraj/inputdata.txt' using PigStorage() as (あいうえお);
> describe inputdata;
> dump inputdata;
> {code}
> ==
> Pig Stack Trace
> ---
> ERROR 1000: Error during parsing. Lexical error at line 1, column 64.  Encountered: "\u3042" (12354), after : ""
> org.apache.pig.impl.logicalLayer.parser.TokenMgrError: Lexical error at line 1, column 64.  Encountered: "\u3042" (12354), after : ""
>     at org.apache.pig.impl.logicalLayer.parser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1791)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_scan_token(QueryParser.java:8959)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_51(QueryParser.java:7462)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_120(QueryParser.java:7769)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_106(QueryParser.java:7787)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_63(QueryParser.java:8609)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_32(QueryParser.java:8621)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3_4(QueryParser.java:8354)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_2_4(QueryParser.java:6903)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1249)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
>     at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>     at org.apache.pig.Main.main(Main.java:391)
> ==
> Thanks Viraj




[jira] Updated: (PIG-1339) International characters in column names not supported

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1339:


Assignee: Xuefu Zhang

> International characters in column names not supported
> --
>
> Key: PIG-1339
> URL: https://issues.apache.org/jira/browse/PIG-1339
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0, 0.7.0, 0.8.0
>Reporter: Viraj Bhat
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>




[jira] Resolved: (PIG-1412) Make Pig OwlLoader work with remote HDFS in secure mode

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-1412.
-

Resolution: Won't Fix

Owl is dead, thus there is no need to fix OwlLoader.

> Make Pig OwlLoader work with remote HDFS in secure mode
> ---
>
> Key: PIG-1412
> URL: https://issues.apache.org/jira/browse/PIG-1412
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Daniel Dai
>
> PIG-1403 does not address the case in which a LoadFunc does not expose the 
> HDFS URL to Pig. One major use case is OwlLoader. We need to change OwlLoader 
> to add the remote namenode to JobConf.




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1429:


Fix Version/s: 0.9.0

> Add Boolean Data Type to Pig
> 
>
> Key: PIG-1429
> URL: https://issues.apache.org/jira/browse/PIG-1429
> Project: Pig
>  Issue Type: New Feature
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Russell Jurney
> Fix For: 0.9.0
>
> Attachments: working_boolean.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Pig needs a Boolean data type.  PIG-1097 is dependent on doing this.  
> I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
> plus unit tests to make this work?  




[jira] Updated: (PIG-1479) Embed Pig in scripting languages

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1479:


 Assignee: Richard Ding
Fix Version/s: 0.9.0

> Embed Pig in scripting languages
> 
>
> Key: PIG-1479
> URL: https://issues.apache.org/jira/browse/PIG-1479
> Project: Pig
>  Issue Type: New Feature
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek-test.tar, 
> pig-greek-test.tar, pig-greek.tgz
>
>
> It should be possible to embed Pig calls in a scripting language and make 
> functions defined in the same script available as UDFs.
> This is a spin off of https://issues.apache.org/jira/browse/PIG-928 which 
> lets users define UDFs in scripting languages.




[jira] Updated: (PIG-1491) Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to POLocalRearrange

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1491:


Fix Version/s: 0.9.0

> Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to 
> POLocalRearrange
> 
>
> Key: PIG-1491
> URL: https://issues.apache.org/jira/browse/PIG-1491
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Scott Carey
> Fix For: 0.9.0
>
>
> I have a failure that occurs during planning while using DISTINCT in a nested 
> FOREACH. 
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad cannot be cast to org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizer.visitMROp(SecondaryKeyOptimizer.java:352)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:218)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:40)
>     at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)




[jira] Updated: (PIG-1545) Secondary alias gives problem, when it has alias in the group by statement.

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1545:


 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0

This is a parser issue.  If you do the same operation in the generate it works.

> Secondary alias gives problem, when it has alias in the group by statement.
> ---
>
> Key: PIG-1545
> URL: https://issues.apache.org/jira/browse/PIG-1545
> Project: Pig
>  Issue Type: Bug
>Reporter: niraj rai
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> When I run the following script, I get the error: Could not open iterator for 
> C.
> A = LOAD '/tmp' as (a:int, b:chararray, c:int);
> B = GROUP A BY (a, b);
> C = FOREACH B { bg = A.(b,c); GENERATE group, bg; } ;




[jira] Updated: (PIG-1605) Adding soft link to plan to solve input file dependency

2010-09-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1605:


Attachment: PIG-1605-1.patch

> Adding soft link to plan to solve input file dependency
> ---
>
> Key: PIG-1605
> URL: https://issues.apache.org/jira/browse/PIG-1605
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1605-1.patch
>
>
> In the scalar implementation, we need to deal with implicit dependencies. 
> [PIG-1603|https://issues.apache.org/jira/browse/PIG-1603] tries to solve 
> the problem by adding a LOScalar operator. Here is a different approach. We 
> will add a soft link to the plan, and the soft link is only visible to the 
> walkers. By doing this, we can make sure we visit the LOStore which generates 
> the scalar first, and then the LOForEach which uses the scalar. No other part 
> of the logical plan knows of the existence of the soft link. The benefits are:
> 1. The logical plan does not need to deal with LOScalar, which makes the 
> logical plan cleaner.
> 2. Conceptually, a scalar dependency is different. A regular link represents a 
> data flow in the pipeline. In the scalar case, the dependency means an operator 
> depends on a file generated by another operator. It is a different type of 
> data dependency.
> 3. Soft links can solve other dependency problems in the future. If we 
> introduce another UDF dependent on a file generated by another operator, we 
> can use this mechanism to solve it. 
> 4. With soft links, we can use scalars that come from different sources in the 
> same statement, which in my mind is not a rare use case. (eg: D = foreach C 
> generate c0/A.total, c1/B.count; )
> Currently, there are two cases where we can use a soft link:
> 1. Scalar dependency, where the ReadScalar UDF will use a file generated by a 
> LOStore.
> 2. Store-load dependency, where we load a file which is generated by a 
> store in the same script. This happens in the multi-store case. Currently we 
> solve it with a regular link. It is better to use a soft link.
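
The soft-link idea can be sketched in Python as a plan that keeps two kinds of
edges. This is a minimal illustration, not the code in the attached patch: the
`Plan` class, its method names, and the operator labels are all hypothetical.

```python
from collections import defaultdict, deque


class Plan:
    """Toy plan with regular (data-flow) links and walker-only soft links."""

    def __init__(self):
        self.ops = []
        self.hard = defaultdict(list)  # regular links: data flows src -> dst
        self.soft = defaultdict(list)  # soft links: dst must be visited after src

    def add(self, op):
        self.ops.append(op)

    def connect(self, src, dst):
        self.hard[src].append(dst)

    def add_soft_link(self, src, dst):
        self.soft[src].append(dst)

    def successors(self, op):
        """What the rest of the plan sees: regular links only."""
        return list(self.hard[op])

    def walk_order(self):
        """Dependency order used by walkers: honors regular and soft links."""
        indegree = {op: 0 for op in self.ops}
        for edges in (self.hard, self.soft):
            for dsts in edges.values():
                for dst in dsts:
                    indegree[dst] += 1
        ready = deque(op for op in self.ops if indegree[op] == 0)
        order = []
        while ready:
            op = ready.popleft()
            order.append(op)
            for dst in self.hard[op] + self.soft[op]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        return order


plan = Plan()
for op in ["load_a", "foreach_d", "group_b", "foreach_c", "store_c"]:
    plan.add(op)
plan.connect("load_a", "foreach_d")
plan.connect("load_a", "group_b")
plan.connect("group_b", "foreach_c")
plan.connect("foreach_c", "store_c")
# foreach_d reads the scalar file written by store_c: an implicit dependency.
plan.add_soft_link("store_c", "foreach_d")
order = plan.walk_order()
```

Here the walker reaches `store_c` before `foreach_d`, while `successors`
still reports no data-flow output for `store_c`, which mirrors the claim that
only walkers see the soft link.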




[jira] Updated: (PIG-1554) PERF: create accumulative bag in RelationToExpressionProject

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1554:


 Assignee: Thejas M Nair
Fix Version/s: 0.9.0

> PERF: create accumulative bag in RelationToExpressionProject
> 
>
> Key: PIG-1554
> URL: https://issues.apache.org/jira/browse/PIG-1554
> Project: Pig
>  Issue Type: Improvement
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> In nested-foreach, RelationToExpressionProject creates a DefaultDataBag out 
> of the results of PODistinct and POSort. If the results of the plan are 
> going to be consumed by an operation that supports the accumulative interface, 
> such as COUNT, the results can be linked to a new accumulative bag.




[jira] Updated: (PIG-1576) Difference in Semantics between Load statement in Pig and HDFS client on Command line

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1576:


Fix Version/s: 0.9.0

> Difference in Semantics between Load statement in Pig and HDFS client on 
> Command line
> -
>
> Key: PIG-1576
> URL: https://issues.apache.org/jira/browse/PIG-1576
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Viraj Bhat
> Fix For: 0.9.0
>
>
> Here is my directory structure on HDFS which I want to access using Pig. 
> This is a sample, but in real use case I have more than 100 of these 
> directories.
> {code}
> $ hadoop fs -ls /user/viraj/recursive/
> Found 3 items
> drwxr-xr-x   - viraj supergroup  0 2010-08-26 11:25 /user/viraj/recursive/20080615
> drwxr-xr-x   - viraj supergroup  0 2010-08-26 11:25 /user/viraj/recursive/20080616
> drwxr-xr-x   - viraj supergroup  0 2010-08-26 11:25 /user/viraj/recursive/20080617
> {code}
> From the command line I can access them in a variety of ways:
> {code}
> $ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
> -rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> $ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
> -rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r--   1 viraj supergroup   5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> {code}
> I have written a Pig script, but none of the following load statements 
> work:
> {code}
> --A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> I get the following error in Pig 0.8:
> {noformat}
> 2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2010-08-27 16:34:27,711 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics: 
> HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
> 0.20.2  0.8.0-SNAPSHOT  viraj   2010-08-27 16:34:24 2010-08-27 16:34:27 LIMIT
> Failed!
> Failed Jobs:
> JobId   Alias   Feature Message Outputs
> N/A A,AL    Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
>     ... 7 more
> hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
> {noformat}
> The following works:
> {code}
> A = load '/user/viraj/recursive/{200806}{15,16,17}/' using 
> PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> Why is there an inconsistency between HDFS client and Pig?
> Viraj
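
A note on the likely cause, which is not confirmed anywhere in this thread: bash expands {20080615..20080617} itself before hadoop fs ever sees the path, while a path inside a Pig load statement goes to Hadoop's glob matcher unchanged. That matcher understands {a,b,c} alternation, which is why the {15,16,17} form works. Below is a minimal Python sketch of a pre-processing step that rewrites numeric ranges into that alternation form. The function name expand_numeric_ranges is hypothetical and is not part of Pig or Hadoop.

```python
import re

# Matches a bash-style numeric range such as {20080615..20080617}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_ranges(pattern):
    """Rewrite each {start..end} range as an explicit {a,b,c} alternation."""

    def _expand(match):
        start, end = match.group(1), match.group(2)
        # Zero-pad only when the start value was written with a leading zero.
        width = len(start) if start.startswith("0") else 0
        step = 1 if int(start) <= int(end) else -1
        values = range(int(start), int(end) + step, step)
        return "{" + ",".join(str(v).zfill(width) for v in values) + "}"

    return _RANGE.sub(_expand, pattern)


if __name__ == "__main__":
    print(expand_numeric_ranges("/user/viraj/recursive/{20080615..20080617}/"))
    print(expand_numeric_ranges("/user/viraj/recursive/{200806}{15..17}/"))
```

Groups without a ".." range, such as {200806}, are left untouched, so the second call yields the same pattern that the report says works.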

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1581) Parser fails to recognize semicolons in quoted strings

2010-09-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1581:


 Assignee: Xuefu Zhang
Fix Version/s: 0.9.0

> Parser fails to recognize semicolons in quoted strings
> --
>
> Key: PIG-1581
> URL: https://issues.apache.org/jira/browse/PIG-1581
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.7.0
> Environment: CentOS 5.5
>Reporter: Christopher Hackman
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.9.0
>
>
> Within some contexts, the parser fails to treat semicolons correctly, and 
> sees them as an EOL.
> Given an input file:
> /test1.txt (in the hdfs)
> 1;a
> 2;b
> 3;c
> 4;d
> 5;e
> And the following Pig script:
> REGISTER /tmp/piggybank.jar ;
> DEFINE REGEXEXTRACTALL 
> org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
> lines = LOAD '/test1.txt' AS (line:chararray);
> delimited = FOREACH lines GENERATE FLATTEN (
> REGEXEXTRACTALL(line, '^(\\d+);(\\w+)$')
> ) AS (
> digit:int,
> word:chararray
> );
> DUMP delimited;
> I receive the following error:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
> Lexical error at line 5, column 40.  Encountered:  after : "\'^(d+);"
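
To illustrate the behavior the report asks for, here is a minimal Python sketch of a statement splitter that treats a semicolon as a terminator only when it is outside a single-quoted string. This is not Pig's parser or the eventual fix. The function name split_statements is hypothetical, and the sketch ignores comments and double-quoted strings.

```python
from typing import List


def split_statements(script: str) -> List[str]:
    """Split on ';' only when it occurs outside a single-quoted string."""
    statements = []
    current = []
    in_quote = False
    i = 0
    while i < len(script):
        ch = script[i]
        if in_quote and ch == "\\" and i + 1 < len(script):
            # Keep an escaped character (such as \' or \\) verbatim.
            current.append(script[i:i + 2])
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    script = r"a = LOAD '/test1.txt'; b = FOREACH a GENERATE F(line, '^(\\d+);(\\w+)$');"
    for statement in split_statements(script):
        print(statement)
```

With this approach the semicolon inside the regular expression stays part of the second statement instead of ending it early.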




[jira] Commented: (PIG-1633) Using an alias withing Nested Foreach causes indeterminate behaviour

2010-09-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913120#action_12913120
 ] 

Alan Gates commented on PIG-1633:
-

This is a design decision we made when implementing nested foreach.  Each 
expression in the generate list has its own pipeline.  The advantage was that 
it was easy to implement.  The disadvantage is that certain operators (like 
your random function) are invoked multiple times.  This hurts performance, 
and for nondeterministic functions it also produces strange results.  We 
could not think of any use cases where users would have nondeterministic 
functions, so we did not worry about that too much.  If you have a real use 
case we would be interested.
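
The effect can be mimicked outside Pig. The Python sketch below is only an analogy and is not how Pig builds its pipelines. per_expression calls the function once for every output column that refers to it, while evaluate_once materializes the value first, as the two-step rewrite in the report does. All names here, including the randomint stand-in for the reporter's RANDOMINT UDF, are hypothetical.

```python
import random


def randomint(limit):
    """Hypothetical stand-in for RANDOMINT: an integer in [0, limit)."""
    return random.randrange(limit)


def per_expression(rows, fn):
    """Each output column re-runs fn, mimicking one pipeline per expression."""
    return [(row, fn(), 1 if fn() == 3 else 0) for row in rows]


def evaluate_once(rows, fn):
    """fn runs once per row and both output columns reuse that value."""
    out = []
    for row in rows:
        r = fn()
        out.append((row, r, 1 if r == 3 else 0))
    return out


if __name__ == "__main__":
    rows = list("fghijklm")
    print(per_expression(rows, lambda: randomint(4)))  # columns may disagree
    print(evaluate_once(rows, lambda: randomint(4)))   # columns always agree
```

In the first variant a row such as (j,3,0) can appear because the second and third columns come from two separate calls.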

> Using an alias withing Nested Foreach causes indeterminate behaviour
> 
>
> Key: PIG-1633
> URL: https://issues.apache.org/jira/browse/PIG-1633
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0, 0.5.0, 0.6.0, 0.7.0
>Reporter: Viraj Bhat
>
> I have created a RANDOMINT function which generates random numbers between 0 
> and a specified value (exclusive). For example, RANDOMINT(4) gives random 
> numbers between 0 and 3 (inclusive).
> {code}
> $hadoop fs -cat rand.dat
> f
> g
> h
> i
> j
> k
> l
> m
> {code}
> The pig script is as follows:
> {code}
> register math.jar;
> A = load 'rand.dat' using PigStorage() as (data);
> B = foreach A {
> r = math.RANDOMINT(4);
> generate
> data,
> r as random,
> ((r == 3)?1:0) as quarter;
> };
> dump B;
> {code}
> The results are as follows:
> {code}
> (f,0,0)
> (g,3,0)
> (h,0,0)
> (i,2,0)
> (j,3,0)
> (k,2,0)
> (l,0,1)
> (m,1,0)
> {code}
> Note that (j,3,0) is produced because r, defined in the foreach block, is 
> used twice in the generate clause, and each use generates a different value.
> Rewriting the script as below solves the issue. The M/R jobs from both 
> scripts are the same, so the nested form is just a matter of convenience. 
> {code}
> A = load 'rand.dat' using PigStorage() as (data);
> B = foreach A generate
> data,
> math.RANDOMINT(4) as r;
> C = foreach B generate
> data,
> r,
> ((r == 3)?1:0) as quarter;
> dump C;
> {code}
> Is this issue related to PIG-747?
> Viraj




[jira] Commented: (PIG-1634) Multiple names for the "group" field

2010-09-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913115#action_12913115
 ] 

Alan Gates commented on PIG-1634:
-

In Pig's semantics c.group, c.foo, and c.bar are all separate columns, and only 
the first one is $0.  Because the bags from the cogroup contain all columns in 
the row (not just non-key columns) foo is in a and bar in b.  

Changing something like this would be a radical shift of Pig semantics.
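
For readers less familiar with the shape of a cogroup result, here is a rough Python model of it. It is an analogy only, and the cogroup function and its dict-based rows are hypothetical. Each output row holds the key first, followed by one bag per input, and every bag keeps all columns of its rows, the key column included.

```python
from collections import defaultdict


def cogroup(a, a_key, b, b_key):
    """Return rows shaped like (group, bag_of_a_rows, bag_of_b_rows)."""
    bags = defaultdict(lambda: ([], []))
    for row in a:
        bags[row[a_key]][0].append(row)
    for row in b:
        bags[row[b_key]][1].append(row)
    return [(key, bag_a, bag_b) for key, (bag_a, bag_b) in sorted(bags.items())]


if __name__ == "__main__":
    pages = [{"url": "x.example", "pagerank": 0.5}]
    visits = [{"user_id": 1, "url": "x.example"},
              {"user_id": 2, "url": "x.example"}]
    for group, page_bag, visit_bag in cogroup(pages, "url", visits, "url"):
        # The key is position 0; "url" still exists inside each bag.
        print(group, page_bag, visit_bag)
```

In the reporter's script the grouping column is therefore referenced as group in the final GENERATE, while url remains available inside the pages and visits bags.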

> Multiple names for the "group" field
> 
>
> Key: PIG-1634
> URL: https://issues.apache.org/jira/browse/PIG-1634
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0
>Reporter: Viraj Bhat
>
> I am hoping that in Pig if I type 
> {quote} "c = cogroup a by foo, b by bar", the fields c.group, c.foo and c.bar 
> should all map to c.$0 {quote} 
> This would improve the readability of the Pig script.
> Here's a real use case:
> {code}
> ---
> pages = LOAD 'pages.dat'  AS (url, pagerank);
> visits = LOAD 'user_log.dat'  AS (user_id, url);
> page_visits = COGROUP pages BY url, visits BY url;
> frequent_visits = FILTER page_visits BY COUNT(visits) >= 2;
> answer = FOREACH frequent_visits  GENERATE url, FLATTEN(pages.pagerank);
> ---
> {code}
> (The important part is the final GENERATE statement, which references the 
> field "url", which was the grouping field in the earlier COGROUP.) To get it 
> to work I have to write it in a less intuitive way.
> Maybe with the new parser changes in Pig 0.9 it would be easier to specify 
> that.
> Viraj




[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplificaion the ordering of operands of AND and OR may get changed

2010-09-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1635:


Fix Version/s: 0.8.0

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplificaion the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.8.0
>
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding ordering change, an example is that 
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even without possible simplification, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change in this case, and probably in most other cases, makes no 
> difference. Still, some users might care about the ordering for two reasons: 
> stateful UDFs may be used as operands of AND or OR, and the application 
> designer may have chosen the ordering to maximize the chances of 
> short-circuiting the composite boolean evaluation. 




[jira] Commented: (PIG-1531) Pig gobbles up error messages

2010-09-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913048#action_12913048
 ] 

Ashutosh Chauhan commented on PIG-1531:
---

Oh Hudson, oh well...

Ran the full suite of 400 minutes of unit tests; all passed. Patch is ready for 
review.

> Pig gobbles up error messages
> -
>
> Key: PIG-1531
> URL: https://issues.apache.org/jira/browse/PIG-1531
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: pig-1531_3.patch, pig-1531_4.patch, PIG_1531.patch, 
> PIG_1531_2.patch
>
>
> Consider the following. I have my own Storer implementing StoreFunc and I am 
> throwing FrontEndException (and other Exceptions derived from PigException) 
> in its various methods. I expect those error messages to be shown in error 
> scenarios. Instead Pig gobbles up my error messages and shows its own generic 
> error message like: 
> {code}
> 010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2116: Unexpected error. Could not validate the output specification for: 
> default.partitoned
> Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
> {code}
> Instead I expect it to display my error messages which it stores away in that 
> log file.
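
As an illustration of the requested behavior, and not of what the attached patches do, the Python sketch below walks an exception chain and reports the innermost message instead of the generic outer one. Python's __cause__ plays the role that Throwable.getCause() plays in Java. The function name root_cause_message and the storer message in the example are hypothetical.

```python
def root_cause_message(exc):
    """Follow __cause__/__context__ links and return the innermost message."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        inner = exc.__cause__ or exc.__context__
        if inner is None:
            break
        exc = inner
    return str(exc)


if __name__ == "__main__":
    try:
        try:
            # Stand-in for a message thrown by a user-written storer.
            raise ValueError("output table is not writable")
        except ValueError as user_error:
            raise RuntimeError("ERROR 2116: Unexpected error.") from user_error
    except RuntimeError as outer:
        print(root_cause_message(outer))
```

The seen set guards against a cyclic chain, so the loop always terminates.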




[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplificaion the ordering of operands of AND and OR may get changed

2010-09-21 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913036#action_12913036
 ] 

Yan Zhou commented on PIG-1635:
---

This is regarding a new feature (PIG-1399) added for 0.8.

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplificaion the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding ordering change, an example is that 
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even without possible simplification, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change in this case, and probably in most other cases, makes no 
> difference. Still, some users might care about the ordering for two reasons: 
> stateful UDFs may be used as operands of AND or OR, and the application 
> designer may have chosen the ordering to maximize the chances of 
> short-circuiting the composite boolean evaluation. 




[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplificaion the ordering of operands of AND and OR may get changed

2010-09-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1635:
--

Affects Version/s: 0.8.0

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplificaion the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding ordering change, an example is that 
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even without possible simplification, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change in this case, and probably in most other cases, makes no 
> difference. Still, some users might care about the ordering for two reasons: 
> stateful UDFs may be used as operands of AND or OR, and the application 
> designer may have chosen the ordering to maximize the chances of 
> short-circuiting the composite boolean evaluation. 




[jira] Created: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplificaion the ordering of operands of AND and OR may get changed

2010-09-21 Thread Yan Zhou (JIRA)
Logical simplifier does not simplify away constants under AND and OR; after 
simplificaion the ordering of operands of AND and OR may get changed


 Key: PIG-1635
 URL: https://issues.apache.org/jira/browse/PIG-1635
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor


b = FILTER a by (( f1 > 1) AND (1 == 1))

or 

b = FILTER a by ((f1 > 1) OR ( 1==0))

should be simplified to

b = FILTER a by f1 > 1;

Regarding ordering change, an example is that 

b = filter a by ((f1 is not null) AND (f2 is not null));

Even without possible simplification, the expression is changed to

b = filter a by ((f2 is not null) AND (f1 is not null));

The ordering change in this case, and probably in most other cases, makes no 
difference. Still, some users might care about the ordering for two reasons: 
stateful UDFs may be used as operands of AND or OR, and the application 
designer may have chosen the ordering to maximize the chances of 
short-circuiting the composite boolean evaluation. 
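
The two requirements, dropping neutral constants and keeping operand order, can be sketched in a few lines of Python. This is not Pig's logical simplifier. The tuple-based expression encoding and the function name simplify are assumptions made for illustration, and constant comparisons such as (1 == 1) are assumed to have been folded to True or False beforehand.

```python
def simplify(expr):
    """Drop neutral constant operands of AND/OR while keeping operand order."""
    if not isinstance(expr, tuple):
        return expr  # leaf predicate (str) or folded constant (bool)
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    # True is neutral for AND, False is neutral for OR.
    neutral = (op == "and")
    if left is neutral:
        return right
    if right is neutral:
        return left
    # Nothing to drop: rebuild with left and right in their original order.
    return (op, left, right)


if __name__ == "__main__":
    print(simplify(("and", "f1 > 1", True)))
    print(simplify(("or", "f1 > 1", False)))
    print(simplify(("and", "f1 is not null", "f2 is not null")))
```

The sketch deliberately leaves absorbing constants alone (x AND false, x OR true), since removing x there would skip its evaluation, which matters when x is a stateful UDF.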




[jira] Commented: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'

2010-09-21 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913029#action_12913029
 ] 

Yan Zhou commented on PIG-1628:
---

+1. Patch looks good.

> log this message at debug level : 'Pig Internal storage in use'
> ---
>
> Key: PIG-1628
> URL: https://issues.apache.org/jira/browse/PIG-1628
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1628.1.patch
>
>
> The temporary storage functions log at the INFO level. This should change 
> to the debug level, because these messages reduce the visibility of more 
> useful INFO messages. The messages include 'Pig Internal storage in use' from 
> InterStorage and 'TFile storage in use' from TFileStorage.




[jira] Updated: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'

2010-09-21 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1628:
---

Attachment: PIG-1628.1.patch

Patch passes unit tests and test-patch. Ready for review.


> log this message at debug level : 'Pig Internal storage in use'
> ---
>
> Key: PIG-1628
> URL: https://issues.apache.org/jira/browse/PIG-1628
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1628.1.patch
>
>
> The temporary storage functions log at the INFO level. This should change 
> to the debug level, because these messages reduce the visibility of more 
> useful INFO messages. The messages include 'Pig Internal storage in use' from 
> InterStorage and 'TFile storage in use' from TFileStorage.




[jira] Updated: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'

2010-09-21 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1628:
---

Status: Patch Available  (was: Open)

> log this message at debug level : 'Pig Internal storage in use'
> ---
>
> Key: PIG-1628
> URL: https://issues.apache.org/jira/browse/PIG-1628
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1628.1.patch
>
>
> The temporary storage functions log at the INFO level. This should change 
> to the debug level, because these messages reduce the visibility of more 
> useful INFO messages. The messages include 'Pig Internal storage in use' from 
> InterStorage and 'TFile storage in use' from TFileStorage.
