Build failed in Hudson: Pig-trunk #693

2010-03-02 Thread Apache Hudson Server
See 

Changes:

[pradeepkth] PIG-1265: Change LoadMetadata and StoreMetadata to use Job instead 
of Configuration and add a cleanupOnFailure method to StoreFuncInterface 
(pradeepkth)
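
For reference, a rough sketch of the API shape this change describes, reconstructed 
only from the summary above and not from the committed diff; the exact names, return 
types, and signatures in trunk may differ:

{code}
// Rough sketch reconstructed from the change summary above, not the committed
// diff; exact signatures and return types in trunk may differ.
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;

interface StoreFuncInterfaceSketch {
    // New hook: lets a store function clean up partial output after a failed job.
    void cleanupOnFailure(String location, Job job) throws IOException;
}

interface LoadMetadataSketch {
    // Metadata methods now receive the Job rather than a bare Configuration,
    // so implementations can see job-level settings as well.
    Object getSchema(String location, Job job) throws IOException;
    Object getStatistics(String location, Job job) throws IOException;
}
{code}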

--
[...truncated 1437 lines...]
A contrib/piggybank/java
A contrib/piggybank/java/lib
A contrib/piggybank/java/src
A contrib/piggybank/java/src/test
A contrib/piggybank/java/src/test/java
A contrib/piggybank/java/src/test/java/org
A contrib/piggybank/java/src/test/java/org/apache
A contrib/piggybank/java/src/test/java/org/apache/pig
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/filtering
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestHelper.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestSequenceFileLoader.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCommonLogLoader.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestRegExLoader.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMyRegExLoader.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCombinedLogLoader.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/decode
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/decode/TestDecode.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/string
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/string/TestLookupInFiles.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/string/TestRegex.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/string/TestHashFNV.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/TestMathUDF.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/TestStat.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/datetime
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/datetime/TestDiffDate.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/apachelogparser
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/apachelogparser/TestSearchEngineExtractor.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/apachelogparser/TestDateExtractor.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/apachelogparser/TestHostExtractor.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/apachelogparser/TestSearchTermExtractor.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/TestTop.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/util/TestSearchQuery.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/TestEvalString.java
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/comparison
A contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/grouping
A contrib/piggybank/java/src/main
A contrib/piggybank/java/src/main/java
A contrib/piggybank/java/src/main/java/org
A contrib/piggybank/java/src/main/java/org/apache
A contrib/piggybank/java/src/main/java/org/apache/pig
A contrib/piggybank/java/src/main/java/org/apache/pig/piggybank
A contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/filtering
A contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage
A contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java
A contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/SequenceFileLoader.java
A contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/RegExLoader.java
A 

[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-03-02 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840130#action_12840130
 ] 

Gerrit Jansen van Vuuren commented on PIG-1248:
---

Hi,

I think returning a bag from split makes more sense given that, when you split 
a string, you normally expect a type that lets you iterate through the split 
values one at a time.
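
As a rough illustration of that point, a bag-returning split UDF could look like the 
sketch below. This is hypothetical and not the code in the attached PIG_1248.diff; 
the class name SplitToBag and the (source, regex) argument order are assumptions.

{code}
// Hypothetical sketch only -- not the implementation attached to PIG-1248.
// Takes (source, regex) and returns a bag of single-field tuples, one per token.
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SplitToBag extends EvalFunc<DataBag> {
    private static final TupleFactory tupleFactory = TupleFactory.getInstance();
    private static final BagFactory bagFactory = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null;
        }
        String source = (String) input.get(0);
        String regex = (String) input.get(1);
        if (source == null || regex == null) {
            return null;
        }
        DataBag output = bagFactory.newDefaultBag();
        for (String token : source.split(regex)) {
            // Each token becomes a one-field tuple, so callers can FLATTEN
            // the bag or iterate over it one value at a time.
            output.add(tupleFactory.newTuple(token));
        }
        return output;
    }
}
{code}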





> [piggybank] useful String functions
> ---
>
> Key: PIG-1248
> URL: https://issues.apache.org/jira/browse/PIG-1248
> Project: Pig
>  Issue Type: New Feature
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: PIG_1248.diff, PIG_1248.diff, PIG_1248.diff
>
>
> Pig ships with very few evalFuncs for working with strings. This jira is for 
> adding a few more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-259) allow store to overwrite existing directory

2010-03-02 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840178#action_12840178
 ] 

Jeff Zhang commented on PIG-259:


I also think it makes sense to have the StoreFuncInterface contain the 
overwrite capability. For users it is easier to understand the interface and 
easier to maintain.
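
A rough sketch of what such a hook might look like on the store-function side. The 
interface name and both method signatures here are hypothetical and are not taken 
from the attached Pig_259 patches:

{code}
// Hypothetical sketch only -- the actual Pig_259 patches may define this
// differently. It shows one way an overwrite hook could be expressed.
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;

public interface OverwritableStoreFunc {
    /**
     * Whether the store function wants Pig to remove an existing output
     * location before the job runs.
     */
    boolean shouldOverwrite();

    /**
     * Called by Pig to delete the existing output at the given location
     * when shouldOverwrite() returns true.
     */
    void cleanupOutput(String location, Job job) throws IOException;
}
{code}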

> allow store to overwrite existing directory
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Affects Versions: 0.8.0
>Reporter: Olga Natkovich
>Assignee: Jeff Zhang
> Fix For: 0.8.0
>
> Attachments: Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, 
> Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1269:
-

Attachment: zebra.0302

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)
[Zebra] Restrict schema definition for collection
-

 Key: PIG-1269
 URL: https://issues.apache.org/jira/browse/PIG-1269
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.7.0
 Attachments: zebra.0302

Currently the Zebra grammar for a collection field's schema definition allows many 
forms of definition. To reduce complexity, remove ambiguity, and, more importantly, 
make the metadata more representative of the actual data instances, the grammar 
rules need to be changed: only a record type is allowed (and required) in a 
collection definition. Thus, fieldName:collection(record(c1:int, c2:string)) is 
legal, while fieldName:collection(c1:int, c2:string), 
fieldName:collection(f:record(c1:int, c2:string)), fieldName:collection(c1:int), 
and fieldName:collection(int) are illegal.

This will have some impact on existing Zebra M/R programs and Pig scripts that use 
Zebra. Schemas acceptable in a previous release may now become illegal because of 
this change. This should be clearly documented.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1269:
-

Status: Patch Available  (was: Open)

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1262) Additional findbugs and javac warnings

2010-03-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840263#action_12840263
 ] 

Olga Natkovich commented on PIG-1262:
-

+1

> Additional findbugs and javac warnings
> --
>
> Key: PIG-1262
> URL: https://issues.apache.org/jira/browse/PIG-1262
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1262-1.patch
>
>
> Over time, we have introduced some new findbugs and javac warnings. We will 
> fix them in this Jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1269:
-

Status: Open  (was: Patch Available)

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1238) Dump does not respect the schema

2010-03-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1238:
---

Assignee: Daniel Dai

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Daniel Dai
>
> For complex data types and certain sequences of operations, dump produces 
> results with a non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-03-02 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840279#action_12840279
 ] 

Bill Graham commented on PIG-1248:
--

How exactly would split differ from the TOKENIZE function if split returned a 
bag? TOKENIZE returns an unordered bag of words. Having a function that returns 
an ordered tuple of words is very useful IMO. I had to write my own version of 
a tokenize UDF to do this. 

> [piggybank] useful String functions
> ---
>
> Key: PIG-1248
> URL: https://issues.apache.org/jira/browse/PIG-1248
> Project: Pig
>  Issue Type: New Feature
>Reporter: Dmitriy V. Ryaboy
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.7.0
>
> Attachments: PIG_1248.diff, PIG_1248.diff, PIG_1248.diff
>
>
> Pig ships with very few evalFuncs for working with strings. This jira is for 
> adding a few more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1270) Push limit into loader

2010-03-02 Thread Daniel Dai (JIRA)
Push limit into loader
--

 Key: PIG-1270
 URL: https://issues.apache.org/jira/browse/PIG-1270
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai


We can optimize the limit operation by stopping early in PigRecordReader. In 
general, we need a way to communicate between PigRecordReader and the execution 
pipeline. POLimit could instruct PigRecordReader that enough records have 
already been read and that it should stop feeding more data.
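
One way to picture the kind of communication being proposed, as a purely hypothetical 
sketch: the class names LimitSignal, LimitOperatorSketch, and LimitAwareReaderSketch 
below are illustrative only and are not Pig APIs; a limit operator flips a shared flag 
once it has produced enough records, and the reader stops feeding input.

{code}
// Purely illustrative sketch of the idea described above; none of these
// classes exist in Pig.
import java.util.concurrent.atomic.AtomicBoolean;

class LimitSignal {
    private final AtomicBoolean reachedLimit = new AtomicBoolean(false);

    void markReached() { reachedLimit.set(true); }

    boolean isReached() { return reachedLimit.get(); }
}

class LimitOperatorSketch {
    private final LimitSignal signal;
    private final long limit;
    private long emitted;

    LimitOperatorSketch(LimitSignal signal, long limit) {
        this.signal = signal;
        this.limit = limit;
    }

    /** Returns true while another record should still be emitted. */
    boolean emit() {
        if (++emitted >= limit) {
            signal.markReached();   // tell the reader it can stop early
        }
        return emitted <= limit;
    }
}

class LimitAwareReaderSketch {
    private final LimitSignal signal;

    LimitAwareReaderSketch(LimitSignal signal) { this.signal = signal; }

    /** Mirrors a record-reader loop condition: stop once the limit is hit. */
    boolean hasMoreInput(boolean underlyingHasMore) {
        return underlyingHasMore && !signal.isReached();
    }
}
{code}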

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1263) Script producing varying number of records when COGROUPing value of map data type with and without types

2010-03-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1263:
---

Assignee: Daniel Dai

> Script producing varying number of records when COGROUPing value of map data 
> type with and without types
> 
>
> Key: PIG-1263
> URL: https://issues.apache.org/jira/browse/PIG-1263
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
>
> I have a Pig script which I am experimenting with. (Albeit it is not 
> optimized and can be written in a variety of ways.) I get different record 
> counts depending on where load/store pairs are placed in the script.
> Case 1: Returns 424329 records
> Case 2: Returns 5859 records
> Case 3: Returns 5859 records
> Case 4: Returns 5578 records
> I am wondering what the correct result is.
> Here are the scripts.
> Case 1: 
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, m#'id1' as id1, m#'id2' as id2, m#'id3' as id3, 
> m#'id4' as id4, m#'id5' as id5, m#'id6' as id6, m#'id7' as id7, m#'id8' as 
> id8, m#'id9' as id9, m#'id10' as id10, m#'id11' as id11, m#'id12' as id12;
> I = GROUP H BY (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11, 
> id12);
> J = Foreach I generate group.id1 as id1, group.id2 as id2, group.id3 as id3, 
> group.id4 as id4,group.id5 as id5, group.id6 as id6, group.id7 as id7, 
> group.id8 as id8, group.id9 as id9, group.id10 as id10, group.id11 as id11, 
> group.id12 as id12;
> --load previous days data
> K = LOAD '/user/viraj/data/20100202' USING PigStorage('\u0001') as (id1, id2, 
> id3, id4, id5, id6, id7, id8, id9, id10, id11, id12);
> L = COGROUP  K by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11, 
> id12) OUTER,
>  J by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11, 
> id12) OUTER;
> M = filter L by IsEmpty(K);
> store M into 'cogroupNoTypes' using PigStorage();
> {code}
> Case 2:  Storing and loading intermediate results in J 
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, m#'id1' as id1, m#'id2' as id2, m#'id3' as id3, 
> m#'id4' as id4, m#'id5' as id5, m#'id6' as id6, m#'id7' as id7, m#'id8' as 
> id8, m#'id9' as id9, m#'id10' as id10, m#'id11' as id11, m#'id12' as id12;
> I = GROUP H BY (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11, 
> id12);
> J = Foreach I generate group.id1 as id1, group.id2 as id2, group.id3 as id3, 
> group.id4 as id4,group.id5 as id5, group.id6 as id6, group.id7 as id7, 
> group.id8 as id8, group.id9 as id9, group.id10 as id10, group.id11 as id11, 
> group.id12 as id12;
> --store intermediate data to HDFS and re-read
> store J into 'output/20100203/J' using PigStorage('\u0001');
> --load previous days data
> K = LOAD '/user/viraj/data/20100202' USING PigStorage('\u0001') as (id1, id2, 
> id3, id4, id5, id6, id7, id8, id9, id10, id11, id12);
> --read J into K1
> K1 = LOAD 'output/20100203/J' using PigStorage('\u0001') as (id1, id2, id3, 
> id4, id5, id6, id7, id8, id9, id10, id11, id12);
> L = COGROUP  K by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11, 
> id12) OUTER,
>  K1 by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11, 
> id12) OUTER;
> M = filter L by IsEmpty(K);
> store M into 'cogroupNoTypesIntStore' using PigStorage();
> {code}
> Case 3: Type information specified but no intermediate store of J
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, (long)m#'id1' as id1, (long)m#'id2' as id2, 
> (long)m#'id3' as id3, (long)m#'id4' as id4, (long)m#'id5' as id5, 
> 

[jira] Created: (PIG-1271) Provide a more flexible data format to load complex field (bag/tuple/map) in PigStorage

2010-03-02 Thread Daniel Dai (JIRA)
Provide a more flexible data format to load complex field (bag/tuple/map) in 
PigStorage
---

 Key: PIG-1271
 URL: https://issues.apache.org/jira/browse/PIG-1271
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai


With [PIG-613|https://issues.apache.org/jira/browse/PIG-613], we are able to 
load text files containing complex data types (map/bag/tuple) according to a 
schema. However, the format of complex data fields is very strict. Users have to 
use pre-determined special characters to mark the beginning and end of each 
field, and those special characters cannot be used in the content. The goals 
of this issue are:

1. Provide a way for users to escape special characters (see the sketch below)
2. Make it easy for users to customize Utf8StorageConverter when they have 
their own data format
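
As a rough illustration of goal 1, here is a tiny helper that backslash-escapes the 
delimiter characters used around tuples, bags, and maps before a value is written. 
The class name, the exact delimiter set, and the backslash convention are assumptions 
for illustration, not what an eventual patch would necessarily do:

{code}
// Hypothetical illustration of goal 1 above; the delimiter set and the
// backslash-escaping convention are assumptions, not an actual Pig patch.
public final class ComplexFieldEscaper {
    // Delimiters Pig's text format uses around tuples (), bags {}, maps [],
    // plus the field separator ',' and the map key/value marker '#'.
    private static final String SPECIAL = "(){}[],#\\";

    private ComplexFieldEscaper() {}

    public static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');   // prefix each special character
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "a,b" stays one map value instead of being split on the comma.
        System.out.println(escape("a,b"));   // prints a\,b
    }
}
{code}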



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1269:
-

Attachment: zebra.0302

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302, zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1269:
-

Attachment: (was: zebra.0302)

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1269:
-

Status: Patch Available  (was: Open)

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-02 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840339#action_12840339
 ] 

Viraj Bhat commented on PIG-1252:
-

A modified version of the script works. Does this have to do with the nested 
foreach?

{code}
loadData = load '/user/viraj/zebradata' using 
org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, 
col7');

prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2, 
(chararray) col3, (chararray) ((col4 is not null and col4 != '') ? col4 : 
((col5 is not null) ? col5 : '')) as splitcond, (chararray) (col6 == 'c' ? 1 : 
IS_VALID ('200', '0', '0', 'input.txt')) as validRec;

SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''), 
falseDataTmp IF (validRec == '1' AND splitcond == '');

grpData = GROUP trueDataTmp BY splitcond;

finalData = FOREACH grpData GENERATE FLATTEN ( MYUDF (orderedData, 60, 1800, 
'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
 
dump finalData;
{code}

> Diamond splitter does not generate correct results when using Multi-query 
> optimization
> --
>
> Key: PIG-1252
> URL: https://issues.apache.org/jira/browse/PIG-1252
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.7.0
>
>
> I have a script which uses split but somehow does not use one of the split 
> branches. The skeleton of the script is as follows:
> {code}
> loadData = load '/user/viraj/zebradata' using 
> org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, 
> col7');
> prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2, 
> (chararray) col3, (chararray) ((col4 is not null and col4 != '') ? col4 : 
> ((col5 is not null) ? col5 : '')) as splitcond, (chararray) (col6 == 'c' ? 1 
> : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;
> SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''), 
> falseDataTmp IF (validRec == '1' AND splitcond == '');
> grpData = GROUP trueDataTmp BY splitcond;
> finalData = FOREACH grpData {
>orderedData = ORDER trueDataTmp BY col1,col2;
>GENERATE FLATTEN ( MYUDF (orderedData, 60, 
> 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
>   }
> dump finalData;
> {code}
> You can see that "falseDataTmp" is untouched.
> When I run this script with the no-multi-query (-M) option I get the right result. 
> This could be the result of complex BinConds in the POLoad. We can get rid 
> of this error by using FILTER instead of SPLIT.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Viraj Bhat (JIRA)
Column pruner causes wrong results
--

 Key: PIG-1272
 URL: https://issues.apache.org/jira/browse/PIG-1272
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.7.0


For a simple script, the column pruner optimization removes certain columns from 
the original relation, which produces wrong results.

Input file "kv" contains the following columns (tab separated)
{code}
a   1
a   2
a   3
b   4
c   5
c   6
b   7
d   8
{code}

Now running this script in Pig 0.6 produces

{code}
kv = load 'kv' as (k,v);
keys= foreach kv generate k;
keys = distinct keys; 
keys = limit keys 2;
rejoin = join keys by k, kv by k;
dump rejoin;
{code}

(a,a)
(a,a)
(a,a)
(b,b)
(b,b)


Running this in Pig 0.5, which has no column pruner, results in:
(a,a,1)
(a,a,2)
(a,a,3)
(b,b,4)
(b,b,7)

When we disable the "ColumnPruner" optimization, it gives the right results.

Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1272:
---

Assignee: Daniel Dai

> Column pruner causes wrong results
> --
>
> Key: PIG-1272
> URL: https://issues.apache.org/jira/browse/PIG-1272
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
>
> For a simple script, the column pruner optimization removes certain columns 
> from the original relation, which produces wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a   1
> a   2
> a   3
> b   4
> c   5
> c   6
> b   7
> d   8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys; 
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in Pig 0.5, which has no column pruner, results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization, it gives the right results.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840389#action_12840389
 ] 

Viraj Bhat commented on PIG-1272:
-

Now with Pig 0.7 or trunk we have the following error:

2010-03-02 23:35:09,349 FATAL org.apache.hadoop.mapred.Child: Error running 
child : java.lang.NoSuchFieldError: sJobConf
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POJoinPackage.getNext(POJoinPackage.java:110)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:380)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:363)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:240)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:409)
at org.apache.hadoop.mapred.Child.main(Child.java:159)

Viraj

> Column pruner causes wrong results
> --
>
> Key: PIG-1272
> URL: https://issues.apache.org/jira/browse/PIG-1272
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
>
> For a simple script, the column pruner optimization removes certain columns 
> from the original relation, which produces wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a   1
> a   2
> a   3
> b   4
> c   5
> c   6
> b   7
> d   8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys; 
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in Pig 0.5, which has no column pruner, results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization, it gives the right results.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1269) [Zebra] Restrict schema definition for collection

2010-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840407#action_12840407
 ] 

Hadoop QA commented on PIG-1269:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437638/zebra.0302
  against trunk revision 917827.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 63 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/219/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/219/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/219/console

This message is automatically generated.

> [Zebra] Restrict schema definition for collection
> -
>
> Key: PIG-1269
> URL: https://issues.apache.org/jira/browse/PIG-1269
> Project: Pig
>  Issue Type: Bug
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0
>
> Attachments: zebra.0302
>
>
> Currently the Zebra grammar for a collection field's schema definition allows 
> many forms of definition. To reduce complexity, remove ambiguity, and, more 
> importantly, make the metadata more representative of the actual data 
> instances, the grammar rules need to be changed: only a record type is 
> allowed (and required) in a collection definition. Thus, 
> fieldName:collection(record(c1:int, c2:string)) is legal, while 
> fieldName:collection(c1:int, c2:string), 
> fieldName:collection(f:record(c1:int, c2:string)), 
> fieldName:collection(c1:int), and fieldName:collection(int) are illegal.
> This will have some impact on existing Zebra M/R programs and Pig scripts that 
> use Zebra. Schemas acceptable in a previous release may now become illegal 
> because of this change. This should be clearly documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Attachment: PIG-1272-1.patch

> Column pruner causes wrong results
> --
>
> Key: PIG-1272
> URL: https://issues.apache.org/jira/browse/PIG-1272
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1272-1.patch
>
>
> For a simple script, the column pruner optimization removes certain columns 
> from the original relation, which produces wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a   1
> a   2
> a   3
> b   4
> c   5
> c   6
> b   7
> d   8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys; 
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in Pig 0.5, which has no column pruner, results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization, it gives the right results.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Status: Patch Available  (was: Open)

> Column pruner causes wrong results
> --
>
> Key: PIG-1272
> URL: https://issues.apache.org/jira/browse/PIG-1272
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1272-1.patch
>
>
> For a simple script, the column pruner optimization removes certain columns 
> from the original relation, which produces wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a   1
> a   2
> a   3
> b   4
> c   5
> c   6
> b   7
> d   8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys; 
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in Pig 0.5, which has no column pruner, results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization, it gives the right results.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1273) Skewed join throws error

2010-03-02 Thread Ankur (JIRA)
Skewed join throws error 
-

 Key: PIG-1273
 URL: https://issues.apache.org/jira/browse/PIG-1273
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur


When the sampled relation is too small or empty, the skewed join fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1273) Skewed join throws error

2010-03-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840482#action_12840482
 ] 

Ankur commented on PIG-1273:


Here is a simple script to reproduce it

a = load 'test.dat' using PigStorage() as (nums:chararray);
b = load 'join.dat' using PigStorage('\u0001') as 
(number:chararray,text:chararray);
c = filter a by nums == '7';
d = join c by nums LEFT OUTER, b by number USING 'skewed';
dump d;

= test.dat =
1
2
3
4
5

= join.dat =
1^Aone
2^Atwo
3^Athree

where ^A means the Control-A character used as a separator.

> Skewed join throws error 
> -
>
> Key: PIG-1273
> URL: https://issues.apache.org/jira/browse/PIG-1273
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> When the sampled relation is too small or empty, the skewed join fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1273) Skewed join throws error

2010-03-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840483#action_12840483
 ] 

Ankur commented on PIG-1273:


Complete stack trace of the error thrown by the 3rd M/R job in the pipeline:

java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:448)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 6 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Empty 
samples file
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.configure(SkewedPartitioner.java:128)
... 11 more
Caused by: java.lang.RuntimeException: Empty samples file
at 
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.loadPartitionFile(MapRedUtil.java:128)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.configure(SkewedPartitioner.java:125)
... 11 more


> Skewed join throws error 
> -
>
> Key: PIG-1273
> URL: https://issues.apache.org/jira/browse/PIG-1273
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> When the sampled relation is too small or empty, the skewed join fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840496#action_12840496
 ] 

Hadoop QA commented on PIG-1272:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437666/PIG-1272-1.patch
  against trunk revision 917827.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/220/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/220/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/220/console

This message is automatically generated.

> Column pruner causes wrong results
> --
>
> Key: PIG-1272
> URL: https://issues.apache.org/jira/browse/PIG-1272
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1272-1.patch
>
>
> For a simple script, the column pruner optimization removes certain columns 
> from the original relation, which produces wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a   1
> a   2
> a   3
> b   4
> c   5
> c   6
> b   7
> d   8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys; 
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in Pig 0.5, which has no column pruner, results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization, it gives the right results.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1274) Column pruning throws Null pointer exception

2010-03-02 Thread Ankur (JIRA)
Column pruning throws Null pointer exception


 Key: PIG-1274
 URL: https://issues.apache.org/jira/browse/PIG-1274
 Project: Pig
  Issue Type: Bug
Reporter: Ankur


When the data has missing values for certain columns in a relation participating 
in a join, column pruning throws a null pointer exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1274) Column pruning throws Null pointer exception

2010-03-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840514#action_12840514
 ] 

Ankur commented on PIG-1274:


Here is a script to reproduce the error

== pig script ==

R1 = load 'data1' as (a:chararray, b:chararray, c:chararray, d:chararray);
R2 = load 'data2' as (x:chararray, y:chararray, z:chararray);
joined = join R1 by c, R2 by z;
projected = FOREACH joined generate c, d;
dump projected;

== data 1 ==
a   b
== data 2 ==
a   b   c
a   t   d
a   x   e

The exception log is


ERROR 1002: Unable to store alias projected

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias projected
at org.apache.pig.PigServer.openIterator(PigServer.java:482)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:552)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
...
...
Caused by: java.lang.NullPointerException
at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:149)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:234)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:615)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at 
org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup.accumulateData(POCogroup.java:177)
at 
org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup.getNext(POCogroup.java:96)

> Column pruning throws Null pointer exception
> 
>
> Key: PIG-1274
> URL: https://issues.apache.org/jira/browse/PIG-1274
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>
> When the data has missing values for certain columns in a relation 
> participating in a join, column pruning throws a null pointer exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.