[jira] [Commented] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case

2011-05-12 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032803#comment-13032803
 ] 

Viraj Bhat commented on PIG-2069:
-

Daniel, has this been an issue since the MultiQuery optimization was introduced 
(0.6, 0.7), or is it specific to Pig 0.8 and 0.9?

> LoadFunc jar does not ship to backend in MultiQuery case
> 
>
> Key: PIG-2069
> URL: https://issues.apache.org/jira/browse/PIG-2069
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
> Fix For: 0.9.0
>
>
> Pig is able to automatically figure out the jar containing the LoadFunc and 
> ship it to the backend. However, it fails to do so for the following script:
> {code}
> A = load '1.txt' using SomeLoadFunc();
> B = filter A by $0==0;
> C = filter A by $1==1;
> D = join B by $0, C by $0;
> dump D;
> {code}
> The reason is that this query is a multiquery (A is reused and thus creates an 
> implicit split). When we merge the multiquery into one job, we didn't merge the 
> UDF lists properly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2067:


Attachment: PIG-2067-1-0.8.patch

PIG-2067-1-0.8.patch is for the 0.8 branch.

> FilterLogicExpressionSimplifier removed some branches in some cases
> ---
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
> Attachments: PIG-2067-1-0.8.patch, PIG-2067-1.patch
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



[jira] [Created] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case

2011-05-12 Thread Daniel Dai (JIRA)
LoadFunc jar does not ship to backend in MultiQuery case


 Key: PIG-2069
 URL: https://issues.apache.org/jira/browse/PIG-2069
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
 Fix For: 0.9.0


Pig is able to automatically figure out the jar containing the LoadFunc and 
ship it to the backend. However, it fails to do so for the following script:
{code}
A = load '1.txt' using SomeLoadFunc();
B = filter A by $0==0;
C = filter A by $1==1;
D = join B by $0, C by $0;
dump D;
{code}

The reason is that this query is a multiquery (A is reused and thus creates an 
implicit split). When we merge the multiquery into one job, we didn't merge the 
UDF lists properly.
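A minimal Java sketch of the idea behind the fix, assuming each job in the compiled plan records the class names of the UDFs (including LoadFuncs) whose jars must be shipped. The `Job` class and `mergeInto` method are hypothetical and are not Pig's actual MultiQuery optimizer code; the sketch only shows that a job formed by merging several plans has to carry the union of their UDF lists, otherwise a LoadFunc recorded on just one of them is lost and its jar never reaches the backend.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UdfListMerge {

    /** Hypothetical stand-in for one job in the compiled MapReduce plan. */
    static class Job {
        // Class names of UDFs (including LoadFuncs) whose jars must be shipped.
        final Set<String> udfs = new LinkedHashSet<>();

        Job(String... udfClassNames) {
            udfs.addAll(Arrays.asList(udfClassNames));
        }
    }

    /**
     * Merges the source jobs into the target job. Copying the UDF lists is the
     * step that must not be skipped: without it, a LoadFunc known only to a
     * merged source job disappears from the combined job.
     */
    static void mergeInto(Job target, List<Job> sources) {
        for (Job source : sources) {
            // (merging of the operator plans themselves is omitted here)
            target.udfs.addAll(source.udfs);
        }
    }

    public static void main(String[] args) {
        Job target = new Job();                 // hypothetical job that receives the merge
        Job merged = new Job("SomeLoadFunc");   // hypothetical job whose plan loads A
        mergeInto(target, Arrays.asList(merged));
        System.out.println(target.udfs);        // prints [SomeLoadFunc]
    }
}
```

A `LinkedHashSet` is used so that a UDF referenced by several merged jobs is listed once, while the original order is preserved.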



[jira] [Commented] (PIG-2055) inconsistentcy behavior in parser generated during build

2011-05-12 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032783#comment-13032783
 ] 

Koji Noguchi commented on PIG-2055:
---

I hit this as well on my MacBook. It drove me crazy.
Using antlr-3.3 (instead of 3.2) seems to have fixed it for me.

> inconsistentcy behavior in parser generated during build 
> -
>
> Key: PIG-2055
> URL: https://issues.apache.org/jira/browse/PIG-2055
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>
> On certain builds, I see that Pig fails to support this syntax:
> {code}
> grunt> l = load 'x' using PigStorage(':');   
> 2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched input '(' expecting SEMI_COLON
> Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
> {code}
> I seem to be the only one who has seen this behavior, and I have seen it on 
> occasion when I build on a Mac. It could be a problem with the interaction 
> between ANTLR and the Apple JVM.



[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032774#comment-13032774
 ] 

Daniel Dai commented on PIG-2067:
-

This issue happens when:
1. We have an AND in the filter plan
2. Both branches of the AND call the same UDF, but the inputs to the UDF differ

LogicExpressionSimplifier erroneously believes the two branches are the same 
and merges them.
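To make the failure mode concrete, here is a hypothetical Java model of it. The `Expr` class, `equal`, and `simplifyAnd` are illustrative only and are not Pig's LogicExpressionSimplifier code. The sketch assumes the simplifier rewrites `x AND x` to `x` whenever the two branches compare equal; if the comparison treats two calls to the same UDF as equal without looking at their arguments, `COUNT(B)>0 AND COUNT(A)>0` collapses to a single branch.

```java
import java.util.Arrays;
import java.util.List;

class AndSimplifier {

    /** Hypothetical expression node: an operator or function name plus its inputs. */
    static final class Expr {
        final String op;        // e.g. "GreaterThan", "COUNT", "Project:B", "Const:0"
        final boolean isUdf;    // true for user-defined function calls such as COUNT
        final List<Expr> inputs;

        Expr(String op, boolean isUdf, Expr... inputs) {
            this.op = op;
            this.isUdf = isUdf;
            this.inputs = Arrays.asList(inputs);
        }
    }

    static Expr udf(String name, Expr... inputs) { return new Expr(name, true, inputs); }
    static Expr node(String op, Expr... inputs) { return new Expr(op, false, inputs); }

    /**
     * Structural equality. With checkUdfInputs == false, two calls to the same
     * UDF compare equal regardless of their arguments, which models the bug.
     */
    static boolean equal(Expr a, Expr b, boolean checkUdfInputs) {
        if (!a.op.equals(b.op)) return false;
        if (a.isUdf && !checkUdfInputs) return true;   // COUNT(B) "equals" COUNT(A)
        if (a.inputs.size() != b.inputs.size()) return false;
        for (int i = 0; i < a.inputs.size(); i++) {
            if (!equal(a.inputs.get(i), b.inputs.get(i), checkUdfInputs)) return false;
        }
        return true;
    }

    /** Rewrites (x AND x) to x, but only if both branches compare equal. */
    static Expr simplifyAnd(Expr left, Expr right, boolean checkUdfInputs) {
        if (equal(left, right, checkUdfInputs)) {
            return left;                               // the right branch is dropped
        }
        return node("And", left, right);
    }

    public static void main(String[] args) {
        Expr left = node("GreaterThan", udf("COUNT", node("Project:B")), node("Const:0"));
        Expr right = node("GreaterThan", udf("COUNT", node("Project:A")), node("Const:0"));
        System.out.println(simplifyAnd(left, right, false).op); // prints GreaterThan
        System.out.println(simplifyAnd(left, right, true).op);  // prints And
    }
}
```

The fix direction the sketch suggests is simply that equality must recurse into UDF arguments like any other operator.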

> FilterLogicExpressionSimplifier removed some branches in some cases
> ---
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
> Attachments: PIG-2067-1.patch
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



[jira] [Updated] (PIG-2044) Patten match bug in org.apache.pig.newplan.optimizer.Rule

2011-05-12 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-2044:
--

Attachment: PIG-2044-00.patch

Removed the 'break' statement, which made the for-loop meaningless. Added one 
test.
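A hypothetical Java illustration of why such a `break` makes a for-loop meaningless, assuming it was effectively unconditional and the loop walks over candidate operators that may match a pattern branch (the "multiple branch matching" mentioned in the issue). `matchAll` and the operator names are illustrative and are not the actual code of `org.apache.pig.newplan.optimizer.Rule`: with the break, only the first candidate is ever inspected, so a match in any later branch is silently missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BranchMatch {

    /**
     * Collects every candidate operator that matches the pattern node.
     * With buggyBreak == true the loop stops after the first candidate,
     * whether or not it matched, so the body runs at most once.
     */
    static List<String> matchAll(List<String> candidates, String pattern, boolean buggyBreak) {
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.equals(pattern)) {
                matches.add(candidate);
            }
            if (buggyBreak) {
                break;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical successors of an operator in the plan being matched.
        List<String> successors = Arrays.asList("LOFilter", "LOForEach");
        System.out.println(matchAll(successors, "LOForEach", true));  // prints []
        System.out.println(matchAll(successors, "LOForEach", false)); // prints [LOForEach]
    }
}
```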

> Patten match bug in org.apache.pig.newplan.optimizer.Rule
> -
>
> Key: PIG-2044
> URL: https://issues.apache.org/jira/browse/PIG-2044
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Koji Noguchi
> Fix For: 0.10
>
> Attachments: PIG-2044-00.patch
>
>
> Koji found a bug in org.apache.pig.newplan.optimizer.Rule. The 
> "break" on line 179 seems to be wrong. This multiple-branch matching is not 
> used in Pig today, but could be a problem in the future. 



[jira] [Updated] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2067:


Attachment: PIG-2067-1.patch

> FilterLogicExpressionSimplifier removed some branches in some cases
> ---
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
> Attachments: PIG-2067-1.patch
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



Build failed in Jenkins: Pig-trunk-commit #808

2011-05-12 Thread Apache Jenkins Server
See 

Changes:

[thejas] PIG-1938: support project-range as udf argument

--
[...truncated 5562 lines...]
  [javadoc] Loading source files for package org.apache.pig.backend.datastorage...
  [javadoc] Loading source files for package org.apache.pig.backend.executionengine...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.datastorage...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.regex...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.util...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.util...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.hbase...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.streaming...
  [javadoc] Loading source files for package org.apache.pig.builtin...
  [javadoc] Loading source files for package org.apache.pig.classification...
  [javadoc] Loading source files for package org.apache.pig.data...
  [javadoc] Loading source files for package org.apache.pig.impl...
  [javadoc] Loading source files for package org.apache.pig.impl.builtin...
  [javadoc] Loading source files for package org.apache.pig.impl.io...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer.schema...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer.validators...
  [javadoc] Loading source files for package org.apache.pig.impl.plan...
  [javadoc] Loading source files for package org.apache.pig.impl.plan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package org.apache.pig.tools.streams...
  [javadoc] Loading source files for package org.apache.pig.tools.timer...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_23
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...

javadoc-jar:
  [jar] Building jar: 


[jira] [Resolved] (PIG-2063) Regression: an invalid query regarding union onschema becoming valid

2011-05-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-2063.


Resolution: Invalid

The behavior of union-onschema has changed as a result of PIG-1536. I have 
created PIG-2068 to update the documentation.


> Regression: an invalid query regarding union onschema becoming valid
> 
>
> Key: PIG-2063
> URL: https://issues.apache.org/jira/browse/PIG-2063
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> The following query fails in 0.8:
> A = load 'x' as (x:long, y:chararray);
> B = load 'y' as (x:long, y:float);
> C = union onschema A, B;
> grunt> C = union onschema A, B;
> 2011-05-12 09:01:47,031 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1031: Incompatible types for merging schemas. Field schema: y: chararray 
> Other field schema: y: float
> However, in 0.9 validation doesn't catch the error. It seems float is cast to 
> chararray automatically.
> grunt> describe C;
> C: {x: long,y: chararray}



[jira] [Created] (PIG-2068) documentation of union onschema restrictions need to to be updated

2011-05-12 Thread Thejas M Nair (JIRA)
documentation of union onschema restrictions need to to be updated
--

 Key: PIG-2068
 URL: https://issues.apache.org/jira/browse/PIG-2068
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Corinne Chandel
 Fix For: 0.9.0


The following requirement, mentioned under the union-onschema section, is no 
longer applicable:
{verbatim}
The data type for columns with same name in different input schemas should be 
compatible:

Numeric types are compatible, and if column having same name in different 
input schemas have different numeric types, an implicit conversion will happen.
Bytearray type is considered compatible with all other types, a cast will 
be added to convert to other type.
Bags or tuples having different inner schema are considered incompatible.
{verbatim}
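For reference, the 0.8-era restriction quoted above can be sketched as a type-merge function. This is a hypothetical Java model, not Pig's schema-merging code: the type names follow Pig's (a subset only), but `Kind`, `FieldType`, and `merge` are illustrative, and the direction of the implicit numeric conversion (widening to the larger type) is an assumption, since the documentation only says "an implicit conversion will happen". Under this rule the chararray/float pair from PIG-2063 is rejected, as it was in 0.8.

```java
import java.util.Locale;
import java.util.Objects;

class UnionOnSchemaCheck {

    // Declaration order matters: INT < LONG < FLOAT < DOUBLE is used as "width".
    enum Kind { INT, LONG, FLOAT, DOUBLE, BYTEARRAY, CHARARRAY, TUPLE, BAG }

    /** A column type; inner holds the inner schema of a tuple or bag, else null. */
    static final class FieldType {
        final Kind kind;
        final String inner;

        FieldType(Kind kind, String inner) {
            this.kind = kind;
            this.inner = inner;
        }

        static FieldType of(Kind kind) {
            return new FieldType(kind, null);
        }

        @Override
        public String toString() {
            String name = kind.name().toLowerCase(Locale.ROOT);
            return inner == null ? name : name + inner;
        }
    }

    static boolean isNumeric(Kind k) {
        return k == Kind.INT || k == Kind.LONG || k == Kind.FLOAT || k == Kind.DOUBLE;
    }

    /** Merges the types of two same-named columns, or throws if they are incompatible. */
    static FieldType merge(FieldType a, FieldType b) {
        if (isNumeric(a.kind) && isNumeric(b.kind)) {
            // Implicit conversion; assumed here to widen to the larger numeric type.
            return a.kind.compareTo(b.kind) >= 0 ? a : b;
        }
        // bytearray is compatible with everything: a cast to the other type is added.
        if (a.kind == Kind.BYTEARRAY) return b;
        if (b.kind == Kind.BYTEARRAY) return a;
        // Everything else, including bags and tuples, must match exactly (inner schema too).
        if (a.kind == b.kind && Objects.equals(a.inner, b.inner)) return a;
        throw new IllegalArgumentException(
                "Incompatible types for merging schemas: " + a + " vs " + b);
    }

    public static void main(String[] args) {
        System.out.println(merge(FieldType.of(Kind.INT), FieldType.of(Kind.DOUBLE)));          // prints double
        System.out.println(merge(FieldType.of(Kind.BYTEARRAY), FieldType.of(Kind.CHARARRAY))); // prints chararray
        try {
            merge(FieldType.of(Kind.CHARARRAY), FieldType.of(Kind.FLOAT)); // the PIG-2063 case
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In 0.9, after PIG-1536, this check no longer applies, which is why the documentation needs updating.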



[jira] [Commented] (PIG-2068) documentation of union onschema restrictions need to to be updated

2011-05-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032766#comment-13032766
 ] 

Thejas M Nair commented on PIG-2068:


I mean that in 0.9, union onschema no longer has such a restriction. This is a 
result of the changes in PIG-1536.


> documentation of union onschema restrictions need to to be updated
> --
>
> Key: PIG-2068
> URL: https://issues.apache.org/jira/browse/PIG-2068
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
>
> The following requirement, mentioned under the union-onschema section, is no 
> longer applicable:
> {code}
> The data type for columns with same name in different input schemas should be 
> compatible:
> Numeric types are compatible, and if column having same name in different 
> input schemas have different numeric types, an implicit conversion will 
> happen.
> Bytearray type is considered compatible with all other types, a cast will 
> be added to convert to other type.
> Bags or tuples having different inner schema are considered incompatible.
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2068) documentation of union onschema restrictions need to to be updated

2011-05-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2068:
---

Description: 
The following requirement, mentioned under the union-onschema section, is no 
longer applicable:
{code}
The data type for columns with same name in different input schemas should be 
compatible:

Numeric types are compatible, and if column having same name in different 
input schemas have different numeric types, an implicit conversion will happen.
Bytearray type is considered compatible with all other types, a cast will 
be added to convert to other type.
Bags or tuples having different inner schema are considered incompatible.
{code}

  was:
The following requirement, mentioned under the union-onschema section, is no 
longer applicable:
{verbatim}
The data type for columns with same name in different input schemas should be 
compatible:

Numeric types are compatible, and if column having same name in different 
input schemas have different numeric types, an implicit conversion will happen.
Bytearray type is considered compatible with all other types, a cast will 
be added to convert to other type.
Bags or tuples having different inner schema are considered incompatible.
{verbatim}


> documentation of union onschema restrictions need to to be updated
> --
>
> Key: PIG-2068
> URL: https://issues.apache.org/jira/browse/PIG-2068
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Corinne Chandel
> Fix For: 0.9.0
>
>
> The following requirement, mentioned under the union-onschema section, is no 
> longer applicable:
> {code}
> The data type for columns with same name in different input schemas should be 
> compatible:
> Numeric types are compatible, and if column having same name in different 
> input schemas have different numeric types, an implicit conversion will 
> happen.
> Bytearray type is considered compatible with all other types, a cast will 
> be added to convert to other type.
> Bags or tuples having different inner schema are considered incompatible.
> {code}



[jira] [Updated] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2067:


Description: 
The following script produces a wrong result:
{code}
A = load 'a.dat' as (cookie);
B = load 'b.dat' as (cookie);
C = cogroup A by cookie, B by cookie;
E = filter C by COUNT(B)>0 AND COUNT(A)>0;
explain E;
{code}

a.dat:
1   1
2   2
3   3
4   4
5   5
6   6
7   7

b.dat:
3   3
4   4
5   5
6   6
7   7
8   8

Expected output:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})

We get:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})
(8,{},{(8)})


  was:
The following script produces a wrong result:
{code}
A = load 'a.dat' as (cookie);
B = load 'b.dat' as (cookie);
C = cogroup A by cookie, B by cookie;
E = filter C by COUNT(B)>0 AND group>0;
explain E;
{code}

a.dat:
1   1
2   2
3   3
4   4
5   5
6   6
7   7

b.dat:
3   3
4   4
5   5
6   6
7   7
8   8

Expected output:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})

We get:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})
(8,{},{(8)})


Summary: FilterLogicExpressionSimplifier removed some branches in some 
cases  (was: FilterLogicExpressionSimplifier mess up uid in some cases)

> FilterLogicExpressionSimplifier removed some branches in some cases
> ---
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032739#comment-13032739
 ] 

Daniel Dai commented on PIG-2067:
-

Actually, it erroneously removes one branch:
{code}
#---
# New Logical Plan:
#---
E: (Name: LOStore Schema: group#30:bytearray,A#31:bag{#242:tuple(cookie#10:bytearray)},B#33:bag{#243:tuple(cookie#11:bytearray)})
|
|---E: (Name: LOFilter Schema: group#30:bytearray,A#31:bag{#242:tuple(cookie#10:bytearray)},B#33:bag{#243:tuple(cookie#11:bytearray)})
|   |
|   (Name: GreaterThan Type: boolean Uid: 38)
|   |
|   |---(Name: UserFunc(org.apache.pig.builtin.COUNT) Type: long Uid: 35)
|   |   |
|   |   |---B:(Name: Project Type: bag Uid: 33 Input: 0 Column: 2)
|   |
|   |---(Name: Cast Type: long Uid: 36)
|   |
|   |---(Name: Constant Type: int Uid: 36)
|
|---C: (Name: LOCogroup Schema: group#30:bytearray,A#31:bag{#242:tuple(cookie#10:bytearray)},B#33:bag{#243:tuple(cookie#11:bytearray)})
|   |
|   cookie:(Name: Project Type: bytearray Uid: 10 Input: 0 Column: 0)
|   |
|   cookie:(Name: Project Type: bytearray Uid: 11 Input: 1 Column: 0)
|
|---A: (Name: LOLoad Schema: cookie#10:bytearray)RequiredFields:null
|
|---B: (Name: LOLoad Schema: cookie#11:bytearray)RequiredFields:null

{code}

> FilterLogicExpressionSimplifier removed some branches in some cases
> ---
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier mess up uid in some cases

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032723#comment-13032723
 ] 

Daniel Dai commented on PIG-2067:
-

The formatting got messed up.
Here is the logical plan:
{code}
E: (Name: LOStore Schema: group#29:bytearray,A#30:bag{#240:tuple(cookie#10:bytearray)},B#32:bag{#241:tuple(cookie#11:bytearray)})
|
|---E: (Name: LOFilter Schema: group#29:bytearray,A#30:bag{#240:tuple(cookie#10:bytearray)},B#32:bag{#241:tuple(cookie#11:bytearray)})
|   |
|   (Name: And Type: boolean Uid: 41)
|   |
|   |---(Name: GreaterThan Type: boolean Uid: 37)
|   |   |
|   |   |---(Name: UserFunc(org.apache.pig.builtin.COUNT) Type: long Uid: 34)
|   |   |   |
|   |   |   |---B:(Name: Project Type: bag Uid: 32 Input: 0 Column: 2)
|   |   |
|   |   |---(Name: Cast Type: long Uid: 35)
|   |   |
|   |   |---(Name: Constant Type: int Uid: 35)
|   |
|   |---(Name: GreaterThan Type: boolean Uid: 40)
|   |
|   |---(Name: Cast Type: int Uid: 29)
|   |   |
|   |   |---group:(Name: Project Type: bytearray Uid: 29 Input: 0 Column: 0)
|   |
|   |---(Name: Constant Type: int Uid: 39)
|
|---C: (Name: LOCogroup Schema: group#29:bytearray,A#30:bag{#240:tuple(cookie#10:bytearray)},B#32:bag{#241:tuple(cookie#11:bytearray)})
|   |
|   cookie:(Name: Project Type: bytearray Uid: 10 Input: 0 Column: 0)
|   |
|   cookie:(Name: Project Type: bytearray Uid: 11 Input: 1 Column: 0)
|
|---A: (Name: LOLoad Schema: cookie#10:bytearray)RequiredFields:null
|
|---B: (Name: LOLoad Schema: cookie#11:bytearray)RequiredFields:null

{code}

One branch of GreaterThan is on group rather than A.

> FilterLogicExpressionSimplifier mess up uid in some cases
> -
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND group>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier mess up uid in some cases

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032722#comment-13032722
 ] 

Daniel Dai commented on PIG-2067:
-

The logical plan after LogicExpressionSimplifier is wrong:

E: (Name: LOStore Schema: group#29:bytearray,A#30:bag{#240:tuple(cookie#10:bytearray)},B#32:bag{#241:tuple(cookie#11:bytearray)})
|
|---E: (Name: LOFilter Schema: group#29:bytearray,A#30:bag{#240:tuple(cookie#10:bytearray)},B#32:bag{#241:tuple(cookie#11:bytearray)})
|   |
|   (Name: And Type: boolean Uid: 41)
|   |
|   |---(Name: GreaterThan Type: boolean Uid: 37)
|   |   |
|   |   |---(Name: UserFunc(org.apache.pig.builtin.COUNT) Type: long Uid: 34)
|   |   |   |
|   |   |   |---B:(Name: Project Type: bag Uid: 32 Input: 0 Column: 2)
|   |   |
|   |   |---(Name: Cast Type: long Uid: 35)
|   |   |
|   |   |---(Name: Constant Type: int Uid: 35)
|   |
|   |---(Name: GreaterThan Type: boolean Uid: 40)
|   |
|   |---(Name: Cast Type: int Uid: 29)
|   |   |
|   |   |---group:(Name: Project Type: bytearray Uid: 29 Input: 0 Column: 0)
|   |
|   |---(Name: Constant Type: int Uid: 39)
|
|---C: (Name: LOCogroup Schema: group#29:bytearray,A#30:bag{#240:tuple(cookie#10:bytearray)},B#32:bag{#241:tuple(cookie#11:bytearray)})
|   |
|   cookie:(Name: Project Type: bytearray Uid: 10 Input: 0 Column: 0)
|   |
|   cookie:(Name: Project Type: bytearray Uid: 11 Input: 1 Column: 0)
|
|---A: (Name: LOLoad Schema: cookie#10:bytearray)RequiredFields:null
|
|---B: (Name: LOLoad Schema: cookie#11:bytearray)RequiredFields:null

One branch of GreaterThan is on group rather than A.

> FilterLogicExpressionSimplifier mess up uid in some cases
> -
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
>
> The following script produces a wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND group>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})



[jira] [Created] (PIG-2067) FilterLogicExpressionSimplifier mess up uid in some cases

2011-05-12 Thread Daniel Dai (JIRA)
FilterLogicExpressionSimplifier mess up uid in some cases
-

 Key: PIG-2067
 URL: https://issues.apache.org/jira/browse/PIG-2067
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.1, 0.9.0


The following script produces a wrong result:
{code}
A = load 'a.dat' as (cookie);
B = load 'b.dat' as (cookie);
C = cogroup A by cookie, B by cookie;
E = filter C by COUNT(B)>0 AND group>0;
explain E;
{code}

a.dat:
1   1
2   2
3   3
4   4
5   5
6   6
7   7

b.dat:
3   3
4   4
5   5
6   6
7   7
8   8

Expected output:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})

We get:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})
(8,{},{(8)})




[jira] [Commented] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032711#comment-13032711
 ] 

Daniel Dai commented on PIG-2059:
-

+1

> PIG doesn't validate incomplete query in batch mode even if -c option is given
> --
>
> Key: PIG-2059
> URL: https://issues.apache.org/jira/browse/PIG-2059
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2059-2.patch, PIG-2059.patch
>
>
> Given the following script in a file, Pig doesn't report any error, even if 
> the -c option is given:
> A = load 'x' as (u, v);
> B = foreach A generate $3;
> It's questionable whether the query should be validated in batch mode, as it 
> doesn't contain any store/dump statement. However, if the -c option is given, 
> validation should nevertheless be performed.
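The error that `-c` should surface here is essentially a bounds check. `A` declares two fields (`u`, `v`), so `$3` does not exist. Below is a minimal Python sketch of that check. It makes simplifying assumptions: it does a regex scan of a single GENERATE list, and `check_positional_refs` is a hypothetical helper, not Pig's validator.

```python
import re

# Matches Pig positional references such as $0 or $3 (simplified, hypothetical scan).
POSITIONAL_REF = re.compile(r"\$(\d+)")


def check_positional_refs(schema_fields, expression):
    """Return one message per positional reference ($n) that falls outside the schema.

    schema_fields: field names from the LOAD ... AS clause, e.g. ["u", "v"].
    expression: the text of a GENERATE list, e.g. "$3".
    """
    errors = []
    for match in POSITIONAL_REF.finditer(expression):
        index = int(match.group(1))
        if index >= len(schema_fields):
            errors.append(
                f"positional reference ${index} is out of range "
                f"for a schema with {len(schema_fields)} field(s)"
            )
    return errors


# A = load 'x' as (u, v);  B = foreach A generate $3;
print(check_positional_refs(["u", "v"], "$3"))
```

A real validator would work on the parsed plan rather than on text. The point is only that this check needs the schema alone, not a store/dump statement.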



[jira] [Updated] (PIG-2059) PIG doesn't validate incomplete query in batch mode even if -c option is given

2011-05-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-2059:
-

Attachment: PIG-2059-2.patch

The previous patch had some undesirable side effects. This patch also includes 
fixes for some test cases.

> PIG doesn't validate incomplete query in batch mode even if -c option is given
> --
>
> Key: PIG-2059
> URL: https://issues.apache.org/jira/browse/PIG-2059
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2059-2.patch, PIG-2059.patch
>
>
> Given the following script in a file, Pig doesn't report any error, even if 
> the -c option is given:
> A = load 'x' as (u, v);
> B = foreach A generate $3;
> It's questionable whether the query should be validated in batch mode, as it 
> doesn't contain any store/dump statement. However, if the -c option is given, 
> validation should nevertheless be performed.



[jira] [Resolved] (PIG-1938) support project-range as udf argument

2011-05-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1938.


Resolution: Fixed

Patch committed to trunk and 0.9 branch.


> support project-range as udf argument
> -
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1938.1.patch, PIG-1938.2.patch
>
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases 
> where '*' (project-star) is supported, except as udf argument. 
> To be consistent with usage of project-star, project-range should be 
> supported as udf argument as well.
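As a reading aid, a project-range such as `b..c` or `$1..$3` expands to the contiguous, inclusive run of columns between its endpoints. An omitted endpoint (`..c`, `b..`) means the first or last column. Once the range is accepted as a UDF argument, `UDF(b..c)` should presumably behave like `UDF(b, c)`. The Python model below sketches that expansion, assuming both name and positional endpoints. `expand_project_range` is a hypothetical helper, not Pig code.

```python
from typing import List, Optional, Union

Field = Union[str, int]  # a field name, or a positional index ($1 -> 1)


def _resolve(schema: List[str], ref: Field) -> int:
    """Turn a field name or positional index into a column index."""
    if isinstance(ref, int):
        if not 0 <= ref < len(schema):
            raise IndexError(f"${ref} is out of range for schema {schema}")
        return ref
    return schema.index(ref)  # raises ValueError for an unknown name


def expand_project_range(schema: List[str], start: Optional[Field],
                         end: Optional[Field]) -> List[str]:
    """Expand start..end (inclusive) into the projected field names.

    A missing start (..c) means the first column; a missing end (b..) the last.
    """
    lo = 0 if start is None else _resolve(schema, start)
    hi = len(schema) - 1 if end is None else _resolve(schema, end)
    if lo > hi:
        raise ValueError("range start must not come after range end")
    return schema[lo:hi + 1]


schema = ["a", "b", "c", "d"]
print(expand_project_range(schema, "b", "c"))   # ['b', 'c']
print(expand_project_range(schema, None, "b"))  # ['a', 'b']
print(expand_project_range(schema, 2, None))    # ['c', 'd']
```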



[jira] [Commented] (PIG-1938) support project-range as udf argument

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032675#comment-13032675
 ] 

Daniel Dai commented on PIG-1938:
-

+1

> support project-range as udf argument
> -
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1938.1.patch, PIG-1938.2.patch
>
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases 
> where '*' (project-star) is supported, except as udf argument. 
> To be consistent with usage of project-star, project-range should be 
> supported as udf argument as well.



Build failed in Jenkins: Pig-trunk-commit #807

2011-05-12 Thread Apache Jenkins Server
See 

Changes:

[gates] PIG-2048: Add zookeeper to pig jar

--
[...truncated 5557 lines...]
  [javadoc] Loading source files for package 
org.apache.pig.backend.datastorage...
  [javadoc] Loading source files for package 
org.apache.pig.backend.executionengine...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.datastorage...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.regex...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.hbase...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.streaming...
  [javadoc] Loading source files for package org.apache.pig.builtin...
  [javadoc] Loading source files for package org.apache.pig.classification...
  [javadoc] Loading source files for package org.apache.pig.data...
  [javadoc] Loading source files for package org.apache.pig.impl...
  [javadoc] Loading source files for package org.apache.pig.impl.builtin...
  [javadoc] Loading source files for package org.apache.pig.impl.io...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer...
  [javadoc] Loading source files for package 
org.apache.pig.impl.logicalLayer.schema...
  [javadoc] Loading source files for package 
org.apache.pig.impl.logicalLayer.validators...
  [javadoc] Loading source files for package org.apache.pig.impl.plan...
  [javadoc] Loading source files for package 
org.apache.pig.impl.plan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package org.apache.pig.tools.streams...
  [javadoc] Loading source files for package org.apache.pig.tools.timer...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_23
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...

javadoc-jar:
  [jar] Building jar: 


[jira] [Updated] (PIG-1938) support project-range as udf argument

2011-05-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1938:
---

Attachment: PIG-1938.2.patch

PIG-1938.2.patch  - Patch with additional test cases for statements other than 
foreach.


> support project-range as udf argument
> -
>
> Key: PIG-1938
> URL: https://issues.apache.org/jira/browse/PIG-1938
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1938.1.patch, PIG-1938.2.patch
>
>
> With changes in PIG-1693, project-range ('..') is supported in all use cases 
> where '*' (project-star) is supported, except as udf argument. 
> To be consistent with usage of project-star, project-range should be 
> supported as udf argument as well.



[jira] [Assigned] (PIG-2048) Add zookeeper to pig jar

2011-05-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-2048:
---

Assignee: Greg Bowyer

> Add zookeeper to pig jar
> 
>
> Key: PIG-2048
> URL: https://issues.apache.org/jira/browse/PIG-2048
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.9.0
>Reporter: Greg Bowyer
>Assignee: Greg Bowyer
>  Labels: hbase, zookeeper
> Fix For: 0.10
>
> Attachments: 
> 0001-Added-zookeeper-to-the-pig-phat-jar-to-allow-HBaseSt.patch
>
>
> Currently the pig jar does not bundle zookeeper in the same fashion as hbase, 
> guava, etc. This means that it cannot run HBaseStorage against recent 
> versions of HBase. Attached is a patch that fixes this.



[jira] [Resolved] (PIG-2048) Add zookeeper to pig jar

2011-05-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-2048.
-

   Resolution: Fixed
Fix Version/s: 0.10

Patch checked in.  Thanks Greg.

> Add zookeeper to pig jar
> 
>
> Key: PIG-2048
> URL: https://issues.apache.org/jira/browse/PIG-2048
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.9.0
>Reporter: Greg Bowyer
>Assignee: Greg Bowyer
>  Labels: hbase, zookeeper
> Fix For: 0.10
>
> Attachments: 
> 0001-Added-zookeeper-to-the-pig-phat-jar-to-allow-HBaseSt.patch
>
>
> Currently the pig jar does not bundle zookeeper in the same fashion as hbase, 
> guava, etc. This means that it cannot run HBaseStorage against recent 
> versions of HBase. Attached is a patch that fixes this.



Build failed in Jenkins: Pig-trunk-commit #806

2011-05-12 Thread Apache Jenkins Server
See 

Changes:

[rding] PIG-2056: Jython error messages should show script name

--
[...truncated 5559 lines...]
  [javadoc] Loading source files for package 
org.apache.pig.backend.datastorage...
  [javadoc] Loading source files for package 
org.apache.pig.backend.executionengine...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.datastorage...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.regex...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.executionengine.util...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.hbase...
  [javadoc] Loading source files for package 
org.apache.pig.backend.hadoop.streaming...
  [javadoc] Loading source files for package org.apache.pig.builtin...
  [javadoc] Loading source files for package org.apache.pig.classification...
  [javadoc] Loading source files for package org.apache.pig.data...
  [javadoc] Loading source files for package org.apache.pig.impl...
  [javadoc] Loading source files for package org.apache.pig.impl.builtin...
  [javadoc] Loading source files for package org.apache.pig.impl.io...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer...
  [javadoc] Loading source files for package 
org.apache.pig.impl.logicalLayer.schema...
  [javadoc] Loading source files for package 
org.apache.pig.impl.logicalLayer.validators...
  [javadoc] Loading source files for package org.apache.pig.impl.plan...
  [javadoc] Loading source files for package 
org.apache.pig.impl.plan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package 
org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package org.apache.pig.tools.streams...
  [javadoc] Loading source files for package org.apache.pig.tools.timer...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_23
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...

javadoc-jar:
  [jar] Building jar: 


[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032594#comment-13032594
 ] 

Alan Gates commented on PIG-1824:
-

Woody, 

This patch now conflicts with the changes that were checked in as part of 
PIG-2056.  I don't understand how to resolve the conflicts.  You could upload a 
new patch or just tell me how to do the resolution so I can continue testing.

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 
> 1824d.patch
>
>
> Currently, Jython UDF scripts don't support the Jython import statement, as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let the user explicitly specify the 
> module to ship?
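On the first question, one way to locate module files automatically is to parse the script for `import` statements. The interpreter can then be asked where each module would be loaded from. The sketch below does this with CPython 3's `ast` and `importlib.util.find_spec`. The helper names are hypothetical, and Pig's embedded Jython has its own module-lookup machinery, so this only illustrates the approach.

```python
import ast
import importlib.util


def imported_modules(script_source):
    """Collect top-level module names imported by a UDF script (parse only, no execution)."""
    names = set()
    for node in ast.walk(ast.parse(script_source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def module_origins(script_source):
    """Map each imported module to where the interpreter would load it from.

    spec.origin is a file path for file-backed modules, or a marker such as
    "built-in" or "frozen" (None for namespace packages); a missing spec means
    the module was not found. Only real file paths would be shipping candidates.
    """
    origins = {}
    for name in sorted(imported_modules(script_source)):
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


UDF_SCRIPT = '''
import re
@outputSchema("word:chararray")
def resplit(content, regex, index):
    return re.compile(regex).split(content)[index]
'''
print(module_origins(UDF_SCRIPT))
```

Static parsing misses modules imported dynamically, for example via `__import__`. That is one argument for offering an explicit ship clause alongside any automatic discovery.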



[jira] [Updated] (PIG-2064) Link for old releases on site is stale

2011-05-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2064:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  I also did 'svn up' on people.apache.org to update the 
website.

> Link for old releases on site is stale
> --
>
> Key: PIG-2064
> URL: https://issues.apache.org/jira/browse/PIG-2064
> Project: Pig
>  Issue Type: Bug
>  Components: site
>Affects Versions: site
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: site
>
> Attachments: PIG-2064.patch
>
>
> Hadoop cleaned up all of the old release artifacts for former subprojects.  
> Our site still points to Hadoop for 0.7 and previous releases (since we were 
> a subproject then).  We need to update the link to point to the archives 
> where the releases still are accessible.



[jira] [Created] (PIG-2066) Accumulators should be able to early-terminate

2011-05-12 Thread Dmitriy V. Ryaboy (JIRA)
Accumulators should be able to early-terminate
--

 Key: PIG-2066
 URL: https://issues.apache.org/jira/browse/PIG-2066
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy


Accumulators are currently forced to process the whole bag; getValue() is 
called at the very end.

Early termination would be a handy feature (for IsEmpty, for example).

We can add this as a new interface extending Accumulator.
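Pig's Java `Accumulator` interface exposes `accumulate`, `getValue`, and `cleanup`. The Python model below sketches what an extending interface could look like. It adds a hypothetical `is_finished()` hook that the caller checks after each batch, so it can stop feeding tuples once the result can no longer change. `TerminatingAccumulator`, `IsEmptyAcc`, and `run` are illustrative names, not Pig API. Each batch is modeled as a plain list of tuples rather than a Pig bag.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List, Tuple


class Accumulator(ABC):
    """Python stand-in for Pig's Java Accumulator (accumulate / getValue / cleanup)."""

    @abstractmethod
    def accumulate(self, batch: List[Tuple]) -> None: ...

    @abstractmethod
    def get_value(self) -> Any: ...

    @abstractmethod
    def cleanup(self) -> None: ...


class TerminatingAccumulator(Accumulator):
    """Hypothetical extension: the UDF reports that more input cannot change its result."""

    @abstractmethod
    def is_finished(self) -> bool: ...


class IsEmptyAcc(TerminatingAccumulator):
    """IsEmpty as an early-terminating accumulator: one tuple is enough to answer False."""

    def __init__(self) -> None:
        self.seen_any = False

    def accumulate(self, batch):
        if batch:
            self.seen_any = True

    def get_value(self):
        return not self.seen_any

    def cleanup(self):
        self.seen_any = False

    def is_finished(self):
        return self.seen_any


def run(acc: Accumulator, batches: Iterable[List[Tuple]]) -> Tuple[Any, int]:
    """Feed batches until input runs out or the accumulator says it is finished.

    Returns the result and the number of batches actually consumed.
    """
    consumed = 0
    for batch in batches:
        acc.accumulate(batch)
        consumed += 1
        if isinstance(acc, TerminatingAccumulator) and acc.is_finished():
            break
    result = acc.get_value()
    acc.cleanup()
    return result, consumed


print(run(IsEmptyAcc(), [[(1,)], [(2,)], [(3,)]]))  # (False, 1)
```

Because the hook lives in a sub-interface, existing accumulators keep working unchanged. Only UDFs that opt in are checked for early exit.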



[jira] [Created] (PIG-2065) IsEmpty should be Accumulative

2011-05-12 Thread Dmitriy V. Ryaboy (JIRA)
IsEmpty should be Accumulative
--

 Key: PIG-2065
 URL: https://issues.apache.org/jira/browse/PIG-2065
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy


IsEmpty is trivial to implement as an accumulator. We should do that.



[jira] [Updated] (PIG-2064) Link for old releases on site is stale

2011-05-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2064:


Status: Patch Available  (was: Open)

> Link for old releases on site is stale
> --
>
> Key: PIG-2064
> URL: https://issues.apache.org/jira/browse/PIG-2064
> Project: Pig
>  Issue Type: Bug
>  Components: site
>Affects Versions: site
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: site
>
> Attachments: PIG-2064.patch
>
>
> Hadoop cleaned up all of the old release artifacts for former subprojects.  
> Our site still points to Hadoop for 0.7 and previous releases (since we were 
> a subproject then).  We need to update the link to point to the archives 
> where the releases still are accessible.



[jira] [Updated] (PIG-2064) Link for old releases on site is stale

2011-05-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2064:


Attachment: PIG-2064.patch

> Link for old releases on site is stale
> --
>
> Key: PIG-2064
> URL: https://issues.apache.org/jira/browse/PIG-2064
> Project: Pig
>  Issue Type: Bug
>  Components: site
>Affects Versions: site
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: site
>
> Attachments: PIG-2064.patch
>
>
> Hadoop cleaned up all of the old release artifacts for former subprojects.  
> Our site still points to Hadoop for 0.7 and previous releases (since we were 
> a subproject then).  We need to update the link to point to the archives 
> where the releases still are accessible.



Re: Set difference in Pig

2011-05-12 Thread Dmitriy Ryaboy
Use an outer join and filter by null instead of using a cogroup (you
don't need to realize the bags just to flatten them back out).

On Thu, May 12, 2011 at 9:14 AM,   wrote:
> I saw this somewhere. 'Anti-join' doesn't seem very descriptive to me, but 
> that is what it was called.
>
>
> Anti-join (set difference) idiom in pig:
> A = load 'input1' as (x, y);
> B = load 'input2' as (u, v);
> C = cogroup A by x, B by u;
> D = filter C by IsEmpty(B);
> E = foreach D generate flatten(A);
>
>
> William F Dowling
> Sr Technical Specialist, Software Engineering
> Thomson Reuters
> 0 +1 215 823 3853
>
>
> -Original Message-
> From: Deepak Singh [mailto:sin...@yahoo-inc.com]
> Sent: Wednesday, May 11, 2011 9:43 PM
> To: u...@pig.apache.org; dev@pig.apache.org
> Subject: Set difference in Pig
>
> Hi,
>   Can we do set difference in pig ?
>
>  The set difference is defined by:
>  A - B = {x : x is an element of A and x is not an element of B}
>
>
> Thanks
> Deepak
>
>
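In Pig Latin, Dmitriy's suggestion would look roughly like `C = join A by x left outer, B by u; D = filter C by B::u is null; E = foreach D generate A::x, A::y;`. The Python model below sketches the same idea: a left outer join pads unmatched left rows with a null, and keeping only the padded rows leaves exactly the set difference on the join key. No bags are built along the way. The function names are hypothetical.

```python
from collections import defaultdict


def left_outer_join(left, right, left_key, right_key):
    """Yield (l, r) pairs; left rows with no match are paired with None."""
    index = defaultdict(list)
    for r in right:
        index[right_key(r)].append(r)
    for l in left:
        matches = index.get(left_key(l))  # .get does not insert into the defaultdict
        if matches:
            for r in matches:
                yield l, r
        else:
            yield l, None


def anti_join(left, right, left_key, right_key):
    """Set difference on the join key: keep left rows whose right side is null."""
    return [l for l, r in left_outer_join(left, right, left_key, right_key) if r is None]


A = [(1, "a"), (2, "b"), (3, "c")]
B = [(2, "x"), (4, "y")]
print(anti_join(A, B, lambda t: t[0], lambda t: t[0]))  # [(1, 'a'), (3, 'c')]
```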


[jira] [Created] (PIG-2064) Link for old releases on site is stale

2011-05-12 Thread Alan Gates (JIRA)
Link for old releases on site is stale
--

 Key: PIG-2064
 URL: https://issues.apache.org/jira/browse/PIG-2064
 Project: Pig
  Issue Type: Bug
  Components: site
Affects Versions: site
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: site


Hadoop cleaned up all of the old release artifacts for former subprojects.  Our 
site still points to Hadoop for 0.7 and previous releases (since we were a 
subproject then).  We need to update the link to point to the archives where 
the releases still are accessible.



Re: Pig 0.7 download mirror sites not working

2011-05-12 Thread Alan Gates
Hadoop has removed the release artifacts of its former subprojects  
(including Pig) from the mirrors.  You can still find the release in  
Apache's archive:  http://archive.apache.org/dist/hadoop/pig/pig-0.7.0/


Alan.

On May 12, 2011, at 9:16 AM, Subhramanian, Deepak wrote:


The mirror sites I am trying to use are in the link below.

http://www.apache.org/dyn/closer.cgi/hadoop/pig



On 12 May 2011 17:14, Subhramanian, Deepak <
deepak.subhraman...@newsint.co.uk> wrote:

I am trying to download Pig 0.7 from the mirror sites, but none of the
mirror sites are working. Any suggestions?

Thanks , Deepak





--
Deepak Subhramanian
Data & Analytics
News International, Digital Technology
Email: deepak.subhraman...@newsint.co.uk 





[jira] [Resolved] (PIG-2056) Jython error messages should show script name

2011-05-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2056.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

> Jython error messages should show script name
> -
>
> Key: PIG-2056
> URL: https://issues.apache.org/jira/browse/PIG-2056
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
>   File "", line 12, in 
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
>   File "test.py", line 12, in 
> {code}
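For context, in CPython the name shown in a traceback's `File "..."` entry is whatever filename was passed when the source was compiled. Code compiled from a string under a placeholder name reports that placeholder. The minimal CPython sketch below shows this. Pig runs these scripts through embedded Jython, whose API differs, so this only illustrates the mechanism. `failing_frame` is a hypothetical helper.

```python
import traceback

SCRIPT = "x = 1\ny = x / 0\n"


def failing_frame(source, filename):
    """Run `source` compiled under `filename`; return (filename, line) of the raising frame."""
    code = compile(source, filename, "exec")  # this filename is what tracebacks report
    try:
        exec(code, {})
    except Exception as exc:
        frame = traceback.extract_tb(exc.__traceback__)[-1]
        return frame.filename, frame.lineno
    return None


print(failing_frame(SCRIPT, "<string>"))  # ('<string>', 2)
print(failing_frame(SCRIPT, "test.py"))   # ('test.py', 2)
```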



[jira] [Commented] (PIG-2063) Regression: an invalid query regarding union onschema becoming valid

2011-05-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032522#comment-13032522
 ] 

Xuefu Zhang commented on PIG-2063:
--

The same problem occurs for cases where a column is a bag or tuple with 
different columns. Refer to 
TestUnionOnSchema.testUnionOnSchemaIncompatibleTypes() for examples.

> Regression: an invalid query regarding union onschema becoming valid
> 
>
> Key: PIG-2063
> URL: https://issues.apache.org/jira/browse/PIG-2063
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> The following query fails in 0.8:
> A = load 'x' as (x:long, y:chararray);
> B = load 'y' as (x:long, y:float);
> C = union onschema A, B;
> grunt> C = union onschema A, B;
> 2011-05-12 09:01:47,031 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1031: Incompatible types for merging schemas. Field schema: y: chararray 
> Other field schema: y: float
> However, in 0.9 validation doesn't catch the error. It seems float is cast to 
> chararray automatically.
> grunt> describe C;
> C: {x: long,y: chararray}
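The 0.8 behavior can be modeled as a per-field type merge that fails on incompatible types instead of silently casting. The Python sketch below uses a simplified, assumed rule set: identical types merge, numeric types widen along int < long < float < double, and anything else is an error. Fields present on only one side are kept, as UNION ONSCHEMA does. It does not model inner bag/tuple schemas, and it is not Pig's exact rule set. The names are hypothetical.

```python
# Assumed widening order for numeric types, narrowest to widest.
NUMERIC_ORDER = ["int", "long", "float", "double"]


class SchemaMergeError(Exception):
    pass


def merge_field_types(name, left, right):
    """Merge two field types under the simplified rules described above."""
    if left == right:
        return left
    if left in NUMERIC_ORDER and right in NUMERIC_ORDER:
        return max(left, right, key=NUMERIC_ORDER.index)
    raise SchemaMergeError(
        f"Incompatible types for merging schemas. Field schema: {name}: {left} "
        f"Other field schema: {name}: {right}"
    )


def merge_onschema(left_schema, right_schema):
    """Merge two {field: type} schemas by name; fields unique to one side are kept."""
    merged = dict(left_schema)
    for name, ftype in right_schema.items():
        merged[name] = merge_field_types(name, merged[name], ftype) if name in merged else ftype
    return merged


A = {"x": "long", "y": "chararray"}
B = {"x": "long", "y": "float"}
try:
    merge_onschema(A, B)
except SchemaMergeError as err:
    print(err)
```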



[jira] [Assigned] (PIG-2062) Script silently ended

2011-05-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-2062:
---

Assignee: Xuefu Zhang

> Script silently ended
> -
>
> Key: PIG-2062
> URL: https://issues.apache.org/jira/browse/PIG-2062
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The following script ended silently without execution.
> {code}
> a = load '1.txt' as (a0, a1);
> b = load '2.txt' as (b0, b1);
> all = join a by a0, b by b0;
> store all into '';
> {code}
> If the alias "all" is changed, the script runs. We need to throw an exception 
> saying that "all" is a keyword.



Re: Build failed in Jenkins: Pig-trunk-commit #805

2011-05-12 Thread Dmitriy Ryaboy
As https://builds.apache.org/hudson/job/Pig-trunk-commit/805/changes
shows, no changes to ivy or build were made. This failure is due to
"simpledeploy" not working due to "wagon" not being found.
I haven't heard of either of the quoted things ever before so I assume
it's not something I did...

D

On Thu, May 12, 2011 at 3:04 AM, Apache Jenkins Server
 wrote:
> See 
>
> Changes:
>
> [dvryaboy] PIG-2014: SAMPLE should not be pushed up
>

[jira] [Created] (PIG-2063) Regression: an invalid query regarding union onschema becoming valid

2011-05-12 Thread Xuefu Zhang (JIRA)
Regression: an invalid query regarding union onschema becoming valid


 Key: PIG-2063
 URL: https://issues.apache.org/jira/browse/PIG-2063
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Thejas M Nair
 Fix For: 0.9.0


The following query fails in 0.8:

A = load 'x' as (x:long, y:chararray);
B = load 'y' as (x:long, y:float);
C = union onschema A, B;

grunt> C = union onschema A, B;
2011-05-12 09:01:47,031 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatible types for merging schemas. Field schema: y: chararray Other field schema: y: float

However, in 0.9, validation doesn't catch the error: the float field appears
to be cast to chararray automatically.

grunt> describe C;
C: {x: long,y: chararray}
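The 0.8 behaviour amounts to a per-alias type check while the two schemas are merged. The Java sketch below is hypothetical and does not use Pig's `Schema` or `LogicalSchema` classes: the class `OnSchemaMerge`, its `FieldType` enum, and the rule that two numeric types widen to the wider one are assumptions made for illustration. It shows the check that 0.8 performs and 0.9 apparently skips: a chararray/float mismatch on alias `y` is rejected rather than silently cast.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a UNION ONSCHEMA type check; it does not use Pig's
// Schema or LogicalSchema classes.
class OnSchemaMerge {

    // Illustrative subset of Pig field types; numeric types are listed from
    // narrowest to widest (assumption used for widening below).
    enum FieldType { INT, LONG, FLOAT, DOUBLE, CHARARRAY }

    static boolean isNumeric(FieldType t) {
        return t != FieldType.CHARARRAY;
    }

    // Identical types merge trivially; two numeric types widen to the wider
    // one (assumed rule); any other pair, e.g. chararray vs float, is rejected
    // instead of being silently cast.
    static FieldType mergeType(String alias, FieldType a, FieldType b) {
        if (a == b) {
            return a;
        }
        if (isNumeric(a) && isNumeric(b)) {
            return a.ordinal() > b.ordinal() ? a : b;
        }
        throw new IllegalArgumentException(
                "Incompatible types for merging schemas. Field schema: " + alias + ": "
                + a.name().toLowerCase(Locale.ROOT) + " Other field schema: " + alias + ": "
                + b.name().toLowerCase(Locale.ROOT));
    }

    // Merge two schemas by field alias, keeping first-seen field order.
    static Map<String, FieldType> unionOnSchema(Map<String, FieldType> left,
                                                Map<String, FieldType> right) {
        Map<String, FieldType> merged = new LinkedHashMap<>(left);
        for (Map.Entry<String, FieldType> e : right.entrySet()) {
            FieldType existing = merged.get(e.getKey());
            merged.put(e.getKey(), existing == null
                    ? e.getValue()
                    : mergeType(e.getKey(), existing, e.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // A: (x:long, y:chararray) and B: (x:long, y:float), as in the report
        Map<String, FieldType> a = new LinkedHashMap<>();
        a.put("x", FieldType.LONG);
        a.put("y", FieldType.CHARARRAY);
        Map<String, FieldType> b = new LinkedHashMap<>();
        b.put("x", FieldType.LONG);
        b.put("y", FieldType.FLOAT);
        try {
            System.out.println("merged: " + unionOnSchema(a, b));
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Rejecting the mismatch, rather than picking chararray as a common type, keeps an accidental type change in one input from being hidden inside the union's output schema.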


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Pig-trunk-commit #805

2011-05-12 Thread Apache Jenkins Server
See 

Changes:

[dvryaboy] PIG-2014: SAMPLE should not be pushed up

--
[...truncated 5565 lines...]
  [javadoc] Loading source files for package org.apache.pig.backend.datastorage...
  [javadoc] Loading source files for package org.apache.pig.backend.executionengine...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.datastorage...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.regex...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.physicalLayer.util...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.executionengine.util...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.hbase...
  [javadoc] Loading source files for package org.apache.pig.backend.hadoop.streaming...
  [javadoc] Loading source files for package org.apache.pig.builtin...
  [javadoc] Loading source files for package org.apache.pig.classification...
  [javadoc] Loading source files for package org.apache.pig.data...
  [javadoc] Loading source files for package org.apache.pig.impl...
  [javadoc] Loading source files for package org.apache.pig.impl.builtin...
  [javadoc] Loading source files for package org.apache.pig.impl.io...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer.schema...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer.validators...
  [javadoc] Loading source files for package org.apache.pig.impl.plan...
  [javadoc] Loading source files for package org.apache.pig.impl.plan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package org.apache.pig.tools.streams...
  [javadoc] Loading source files for package org.apache.pig.tools.timer...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_23
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...

javadoc-jar:
  [jar] Building jar: 


[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-12 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2014:
---

   Resolution: Fixed
Fix Version/s: 0.10
   Status: Resolved  (was: Patch Available)

Committed to 0.9 and trunk.

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10
>Reporter: Jacob Perkins
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0, 0.10
>
> Attachments: PIG-2014.2.patch, PIG-2014.3.patch, PIG-2014.4.patch, 
> PIG-2014.5.patch, PIG-2014.patch
>
>
> Consider the following code:
> {code:none}
> tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, weight:double);
> grouped   = GROUP tfidf_all BY doc_id;
> vectors   = FOREACH grouped GENERATE group AS doc_id, tfidf_all.(token, weight) AS vector;
> DUMP vectors;
> {code}
> This, of course, runs just fine. In a real example, tfidf_all contains
> 1,428,280 records. The number of reduce output records should be exactly the
> number of documents, which turns out to be 18,863 in this case. All well and
> good.
> The strangeness comes when you add a SAMPLE command:
> {code:none}
> sampled = SAMPLE vectors 0.0012;
> DUMP sampled;
> {code}
> Running this results in 1,513 reduce output records. The number of reduce
> output records should be much closer to 22 or 23 (0.0012 * 18,863 ≈ 22.6).
> Evidently, Pig rewrites SAMPLE into a filter and then pushes that filter in
> front of the group. It shouldn't push that filter up, since the UDF it relies
> on is non-deterministic.
> Quick fix: adding "-t PushUpFilter" to the command line when invoking pig
> prevents this.
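A back-of-the-envelope model shows why the pushed-up filter inflates the output so much. If each group is sampled (the intended semantics), about p * G groups survive. If each record is sampled before grouping, a group of n records appears in the output whenever at least one of its records survives, which happens with probability 1 - (1 - p)^n. The Java sketch below is illustrative only: the class and method names are hypothetical, and it assumes every document has the same number of records (1,428,280 / 18,863 ≈ 75.7), which is not true of the real data.

```java
import java.util.Random;

// Hypothetical illustration of PIG-2014: sampling after GROUP (intended)
// versus sampling pushed above GROUP (what the optimizer produced).
class SamplePushUp {

    // Expected surviving groups when each *group* is kept with probability p.
    static double expectedSampleAfterGroup(long groups, double p) {
        return groups * p;
    }

    // Expected surviving groups when each *record* is kept with probability p
    // and grouping happens afterwards. Simplifying assumption: every group
    // has exactly records / groups records.
    static double expectedSampleBeforeGroup(long records, long groups, double p) {
        double perGroup = (double) records / groups;
        return groups * (1.0 - Math.pow(1.0 - p, perGroup));
    }

    // Monte Carlo version of the pushed-up plan with fixed-size groups:
    // a group survives as soon as one of its records passes the filter.
    static int simulateBeforeGroup(int groups, int recordsPerGroup, double p, long seed) {
        Random rnd = new Random(seed);
        int surviving = 0;
        for (int g = 0; g < groups; g++) {
            for (int r = 0; r < recordsPerGroup; r++) {
                if (rnd.nextDouble() < p) {
                    surviving++;
                    break;
                }
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        long records = 1_428_280L;
        long docs = 18_863L;
        double p = 0.0012;
        System.out.printf("sample after group : %.1f groups expected%n",
                expectedSampleAfterGroup(docs, p));
        System.out.printf("sample before group: %.1f groups expected%n",
                expectedSampleBeforeGroup(records, docs, p));
        System.out.println("simulated before group (76 records/group): "
                + simulateBeforeGroup((int) docs, 76, p, 42L));
    }
}
```

Under the equal-size simplification, the pushed-up plan is expected to keep roughly 1,600 groups versus about 22.6 for the intended plan, the same order of magnitude as the 1,513 reported. Because 1 - (1 - p)^n is concave in n, uneven document sizes with the same average lower that expectation, which is consistent with the observed count being somewhat smaller.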



[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-12 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2014:
---

Attachment: PIG-2014.5.patch

An unrelated change snuck into the previously attached file; this is what I am 
actually committing.

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10
>Reporter: Jacob Perkins
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.2.patch, PIG-2014.3.patch, PIG-2014.4.patch, 
> PIG-2014.5.patch, PIG-2014.patch


[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-12 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2014:
---

Attachment: PIG-2014.4.patch

Attaching combined final version, complete with the renamed method.

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10
>Reporter: Jacob Perkins
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.2.patch, PIG-2014.3.patch, PIG-2014.4.patch, 
> PIG-2014.patch


[jira] [Updated] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2014:


Attachment: PIG-2014.3.patch

PIG-2014.3.patch addresses PushDownForeachFlatten.

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10
>Reporter: Jacob Perkins
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.2.patch, PIG-2014.3.patch, PIG-2014.patch


[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032290#comment-13032290
 ] 

Daniel Dai commented on PIG-2014:
-

Also rename filterHasNonDeterministicUdf to planHasNonDeterministicUdf, so that
I can reuse it in PushDownForeachFlatten.
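Conceptually, a check like planHasNonDeterministicUdf only needs to walk an expression tree and report whether any UDF in it is non-deterministic, so that a rule such as PushUpFilter (or PushDownForeachFlatten) can refuse to move that operator. The Java sketch below is hypothetical: the method name is taken from the comment above, but the `Expr` node type, its `deterministic` flag, the `canPushFilterUp` guard, and modelling SAMPLE as a `RANDOM() <= 0.0012` filter are assumptions for illustration, not Pig's actual classes or rewrite.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical expression-tree model; Pig's real logical-plan classes are not used.
class NonDeterminismCheck {

    // Minimal node: either a UDF call or an ordinary operator/leaf.
    static class Expr {
        final String name;
        final boolean isUdf;
        final boolean deterministic; // only meaningful when isUdf is true
        final List<Expr> children;

        Expr(String name, boolean isUdf, boolean deterministic, Expr... children) {
            this.name = name;
            this.isUdf = isUdf;
            this.deterministic = deterministic;
            this.children = Arrays.asList(children);
        }

        static Expr op(String name, Expr... children) {
            return new Expr(name, false, true, children);
        }

        static Expr udf(String name, boolean deterministic, Expr... args) {
            return new Expr(name, true, deterministic, args);
        }
    }

    // Depth-first walk: true if any UDF anywhere in the tree is non-deterministic.
    static boolean planHasNonDeterministicUdf(Expr root) {
        if (root.isUdf && !root.deterministic) {
            return true;
        }
        for (Expr child : root.children) {
            if (planHasNonDeterministicUdf(child)) {
                return true;
            }
        }
        return false;
    }

    // Guard a push-up style rule would consult before moving a filter.
    static boolean canPushFilterUp(Expr filterCondition) {
        return !planHasNonDeterministicUdf(filterCondition);
    }

    public static void main(String[] args) {
        // SAMPLE vectors 0.0012, modelled (assumption) as FILTER ... BY RANDOM() <= 0.0012
        Expr sample = Expr.op("<=", Expr.udf("RANDOM", false), Expr.op("0.0012"));
        // An ordinary filter such as $0 == 0
        Expr plain = Expr.op("==", Expr.op("$0"), Expr.op("0"));
        System.out.println("push SAMPLE filter up? " + canPushFilterUp(sample));
        System.out.println("push $0 == 0 filter up? " + canPushFilterUp(plain));
    }
}
```

Checking the whole tree, not just the top-level operator, matters because a non-deterministic call can be nested inside an otherwise deterministic expression.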

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10
>Reporter: Jacob Perkins
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.2.patch, PIG-2014.patch


[jira] [Commented] (PIG-2014) SAMPLE shouldn't be pushed up

2011-05-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032287#comment-13032287
 ] 

Daniel Dai commented on PIG-2014:
-

+1 for PIG-2014.2.patch.

I will submit another patch for PushDownForeachFlatten.

> SAMPLE shouldn't be pushed up
> -
>
> Key: PIG-2014
> URL: https://issues.apache.org/jira/browse/PIG-2014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.10
>Reporter: Jacob Perkins
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.9.0
>
> Attachments: PIG-2014.2.patch, PIG-2014.patch