Build failed in Hudson: Pig-trunk #584

2009-10-12 Thread Apache Hudson Server
See 

Changes:

[rangadi] PIG-986. Second commit. forgot to svn-add new files in the previous 
commit.

[rangadi] PIG-986. Column groups can have explicit names specified in the storage 
hint. (Yan Zhou via rangadi)

--
[...truncated 139249 lines...]
[junit] 09/10/12 12:41:49 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=create  
src=/tmp/hadoop-hudson/mapred/system/job_20091012124116715_0002/job.xml 
dst=null perm=hudson:supergroup:rw-r--r--
[junit] 09/10/12 12:41:49 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=setPermission   
src=/tmp/hadoop-hudson/mapred/system/job_20091012124116715_0002/job.xml 
dst=null perm=hudson:supergroup:rw-r--r--
[junit] 09/10/12 12:41:49 INFO hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_20091012124116715_0002/job.xml. 
blk_64099637709780241_1015
[junit] 09/10/12 12:41:49 INFO datanode.DataNode: Receiving block 
blk_64099637709780241_1015 src: /127.0.0.1:54132 dest: /127.0.0.1:55754
[junit] 09/10/12 12:41:49 INFO datanode.DataNode: Receiving block 
blk_64099637709780241_1015 src: /127.0.0.1:43241 dest: /127.0.0.1:47521
[junit] 09/10/12 12:41:49 INFO datanode.DataNode: Receiving block 
blk_64099637709780241_1015 src: /127.0.0.1:44670 dest: /127.0.0.1:48402
[junit] 09/10/12 12:41:49 INFO DataNode.clienttrace: src: /127.0.0.1:44670, 
dest: /127.0.0.1:48402, bytes: 48253, op: HDFS_WRITE, cliID: 
DFSClient_-928526854, srvID: DS-1295567057-127.0.1.1-48402-1255351276055, 
blockid: blk_64099637709780241_1015
[junit] 09/10/12 12:41:49 INFO datanode.DataNode: PacketResponder 0 for 
block blk_64099637709780241_1015 terminating
[junit] 09/10/12 12:41:49 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48402 is added to 
blk_64099637709780241_1015 size 48253
[junit] 09/10/12 12:41:49 INFO DataNode.clienttrace: src: /127.0.0.1:43241, 
dest: /127.0.0.1:47521, bytes: 48253, op: HDFS_WRITE, cliID: 
DFSClient_-928526854, srvID: DS-1256327855-127.0.1.1-47521-1255351276624, 
blockid: blk_64099637709780241_1015
[junit] 09/10/12 12:41:49 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:47521 is added to 
blk_64099637709780241_1015 size 48253
[junit] 09/10/12 12:41:49 INFO datanode.DataNode: PacketResponder 1 for 
block blk_64099637709780241_1015 terminating
[junit] 09/10/12 12:41:49 INFO DataNode.clienttrace: src: /127.0.0.1:54132, 
dest: /127.0.0.1:55754, bytes: 48253, op: HDFS_WRITE, cliID: 
DFSClient_-928526854, srvID: DS-108433573-127.0.1.1-55754-1255351274841, 
blockid: blk_64099637709780241_1015
[junit] 09/10/12 12:41:49 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:55754 is added to 
blk_64099637709780241_1015 size 48253
[junit] 09/10/12 12:41:49 INFO datanode.DataNode: PacketResponder 2 for 
block blk_64099637709780241_1015 terminating
[junit] 09/10/12 12:41:49 INFO hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/tmp/hadoop-hudson/mapred/system/job_20091012124116715_0002/job.xml is closed 
by DFSClient_-928526854
[junit] 09/10/12 12:41:49 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20091012124116715_0002/job.xml 
dst=null perm=null
[junit] 09/10/12 12:41:49 INFO DataNode.clienttrace: src: /127.0.0.1:47521, 
dest: /127.0.0.1:43243, bytes: 48633, op: HDFS_READ, cliID: 
DFSClient_-928526854, srvID: DS-1256327855-127.0.1.1-47521-1255351276624, 
blockid: blk_64099637709780241_1015
[junit] 09/10/12 12:41:49 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20091012124116715_0002/job.jar 
dst=null perm=null
[junit] 09/10/12 12:41:49 INFO DataNode.clienttrace: src: /127.0.0.1:55754, 
dest: /127.0.0.1:54136, bytes: 2481312, op: HDFS_READ, cliID: 
DFSClient_-928526854, srvID: DS-108433573-127.0.1.1-55754-1255351274841, 
blockid: blk_8897466993134543700_1013
[junit] 09/10/12 12:41:49 INFO mapred.JobTracker: Initializing 
job_20091012124116715_0002
[junit] 09/10/12 12:41:49 INFO mapred.JobInProgress: Initializing 
job_20091012124116715_0002
[junit] 09/10/12 12:41:49 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=create  
src=/tmp/temp1896202947/tmp-1253282919/_logs/history/localhost_1255351276736_job_20091012124116715_0002_hudson_Job841759538738982609.jar
dst=null perm=hudson:supergroup:rw-r--r--
[junit] 09/10/12 12:41:49 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=create  
src=/tmp/temp1896202947/tmp-1253282919/_logs/history/localhost_1255351276736_job_20091012124116715_0002_conf.xml
dst=null perm=hudson:supergroup:rw-r-

[jira] Commented: (PIG-994) Provide 'append'/'update' keyword to allow appending/updating to different dataset once the feature is available in Hadoop

2009-10-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764735#action_12764735
 ] 

Alan Gates commented on PIG-994:


I don't follow where you're going here.

How is "append b, c" different from "union b, c"?

I don't understand what would happen in the update example given above either.

> Provide 'append'/'update' keyword to allow appending/updating to different 
> dataset once the feature is available in Hadoop
> -
>
> Key: PIG-994
> URL: https://issues.apache.org/jira/browse/PIG-994
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
> Environment: Grid clusters
>Reporter: Rekha
>Priority: Minor
>
> Provide 'append'/'update' keywords to allow appending/updating to a different 
> dataset in Pig 0.5.0, now that Hadoop 0.20 has the append feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-992) [zebra] Separate Schema-related files into a "Schema" package

2009-10-12 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-992:
-

Attachment: SchemaPackageChange.patch

Mostly addresses the review comments by Hong. The most significant changes are 
the removal of JavaCC-generated sources from the repository, the unification of 
the ParseException class generated by both the storage parser and the schema 
parser, and hiding the internal members of the Schema.ColumnSchema class.

> [zebra] Separate Schema-related files into a "Schema" package
> -
>
> Key: PIG-992
> URL: https://issues.apache.org/jira/browse/PIG-992
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: SchemaPackageChange.patch, SchemaPackageChange.patch, 
> SchemaPackageChange.patch
>
>
> The hope is to facilitate future sharing of the Schema code between 
> different modules and/or products. 




[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-12 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-993:
-

Attachment: (was: zebra_drop_cg.patch)

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped when creating a table, through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This JIRA provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 




[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-12 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-993:
-

Attachment: zebra_drop_cg.patch

This new version of the patch is needed because of the upstream changes in PIG-992. 
Although this JIRA is functionally independent of PIG-992, the patches must 
still be applied in sequence.

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped when creating a table, through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This JIRA provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 




[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-12 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-993:
-

Attachment: zebra-drop-cg.patch

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped when creating a table, through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This JIRA provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 




[jira] Updated: (PIG-944) Zebra schema is taken from Pig through TableStorer's constructor

2009-10-12 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-944:
-

Attachment: SchemaConversion.patch

This new version of the patch is needed because it depends on an upstream 
change made in PIG-992.

> Zebra schema is taken from Pig through TableStorer's constructor
> --
>
> Key: PIG-944
> URL: https://issues.apache.org/jira/browse/PIG-944
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SchemaConversion.patch, SchemaConversion.patch
>
>
> It should be taken from StoreConfig in the TableOutputFormat.checkOutputSpecs 
> method, because this information is dynamic in Pig's execution engine and 
> should not be passed as a static argument to the constructor.




[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-12 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764755#action_12764755
 ] 

Pradeep Kamath commented on PIG-1014:
-

This JIRA is to track whether it is possible to automatically convert a 
COUNT(relation) in the script to COUNT_STAR(relation) in the plan, so that the 
nullness of the fields in the records is not considered when returning the 
count. For example, if a relation (A) has two fields and there is the following 
script snippet:
{noformat}
B = group A by $0;
C = foreach B generate group, COUNT(A);
{noformat}
This is equivalent to a COUNT(*) after grouping on the first column in SQL. Per 
SQL semantics, COUNT(*) counts all records for the group without regard to the 
nullness of the individual fields. In Pig, this behavior is provided by the 
COUNT_STAR built-in. However, Pig's COUNT built-in is meant for counting a bag 
with a single column (for example, COUNT(A.$0) above). So the implementation 
of COUNT checks whether the first field of each tuple in the bag is null and 
counts only non-null values. In the above script, if the first column in the 
bag is null for any record, that record is not counted, which differs from the 
expected result of COUNT(*) in SQL. So if the compilation phase in Pig can 
detect that COUNT is being applied to a whole relation (rather than an 
individual column), it can replace COUNT with COUNT_STAR and achieve the 
desired result.
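The difference between the two built-ins, as described in this comment, can be modelled with a minimal Java sketch. It is not Pig's COUNT or COUNT_STAR source; the class `CountSemantics` and its methods are hypothetical, and a bag is represented simply as a list of `Object[]` tuples. `countFirstFieldNonNull` follows the COUNT behaviour described above (a tuple is counted only if its first field is non-null), while `countStar` counts every tuple, like COUNT_STAR or SQL's COUNT(*).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two counting semantics discussed in PIG-1014;
// not Pig's built-in implementation.
public class CountSemantics {

    // COUNT as described above: a tuple counts only if its first field is non-null.
    static long countFirstFieldNonNull(List<Object[]> bag) {
        long n = 0;
        for (Object[] tuple : bag) {
            if (tuple.length > 0 && tuple[0] != null) {
                n++;
            }
        }
        return n;
    }

    // COUNT_STAR / SQL COUNT(*): every tuple counts, regardless of nulls.
    static long countStar(List<Object[]> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        // A bag of two-field tuples; the second tuple has a null first field.
        List<Object[]> bag = Arrays.asList(
                new Object[] {"x", 1},
                new Object[] {null, 2},
                new Object[] {"y", 3});
        System.out.println("COUNT-like:      " + countFirstFieldNonNull(bag));
        System.out.println("COUNT_STAR-like: " + countStar(bag));
    }
}
```

For the three-tuple bag in `main`, the COUNT-like method returns 2 and the COUNT_STAR-like method returns 3, which is the discrepancy the proposed rewrite of COUNT(relation) is meant to remove.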

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>





[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-12 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764771#action_12764771
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

Is Pig trying to guess the user's intent? What if the user wanted to count 
without nulls?

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>





[jira] Commented: (PIG-958) Splitting output data on key field

2009-10-12 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764773#action_12764773
 ] 

Pradeep Kamath commented on PIG-958:


+1, the changes look good!
For the test, I noticed that you were using the mapreduce-mode PigServer object 
even in local mode. I made some changes but was unable to run the tests due to 
a configuration issue in setting up the test run, which I did not explore 
further. Nevertheless, here is what I changed:
{noformat}
  private void testMultiStorage(PigServer pigServer, Mode mode,
      String... queries) throws IOException {
    // Use the cluster PigServer only in cluster mode; otherwise use the
    // local-mode PigServer.
    PigServer ps = (mode == Mode.cluster) ? pigServer : pigServerLocal;
    ps.setBatchOn();
    for (String query : queries) {
      ps.registerQuery(query);
    }
    ps.executeBatch();
    verifyResults(mode);
  }
{noformat}

Check if making the above changes solves the issue you are seeing.

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch
>
>
> Pig users often face the need to split the output records into a bunch of 
> files and directories depending on the type of record. Pig's SPLIT operator 
> is useful when record types are few and known in advance. In cases where the 
> type is not directly known but is derived dynamically from the value of a key 
> field in the output tuple, a custom store function is a better solution.
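The idea of routing records by a key field can be sketched in a few lines of Java. This is not the patch attached to this issue: the class `KeySplitter`, its `dirFor` naming scheme (one subdirectory per distinct key value), and the use of in-memory lists in place of real file writers are all assumptions for illustration. Unlike SPLIT, the set of outputs is discovered at run time from the data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of splitting output on a key field; not the PIG-958 code.
public class KeySplitter {
    private final String baseDir;
    private final int keyIndex;
    // Output "directory" -> records written there (stands in for real writers).
    private final Map<String, List<String[]>> outputs = new LinkedHashMap<>();

    KeySplitter(String baseDir, int keyIndex) {
        this.baseDir = baseDir;
        this.keyIndex = keyIndex;
    }

    // Assumed naming scheme: baseDir/<value of the key field>.
    String dirFor(String[] record) {
        return baseDir + "/" + record[keyIndex];
    }

    // Routes a record to its output, creating the output on first use.
    void put(String[] record) {
        outputs.computeIfAbsent(dirFor(record), d -> new ArrayList<>()).add(record);
    }

    Map<String, List<String[]>> outputs() {
        return outputs;
    }

    public static void main(String[] args) {
        KeySplitter splitter = new KeySplitter("/out", 0);
        splitter.put(new String[] {"click", "u1"});
        splitter.put(new String[] {"view", "u2"});
        splitter.put(new String[] {"click", "u3"});
        splitter.outputs().forEach((dir, recs) ->
                System.out.println(dir + " -> " + recs.size() + " record(s)"));
    }
}
```

A real store function would open one writer per output lazily in the same way, so the number of distinct key values bounds the number of open files.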




[jira] Updated: (PIG-1015) [piggybank] DateExtractor should take into account timezones

2009-10-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1015:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. Thanks, Dmitriy!

> [piggybank] DateExtractor should take into account timezones
> 
>
> Key: PIG-1015
> URL: https://issues.apache.org/jira/browse/PIG-1015
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
> Fix For: 0.6.0
>
> Attachments: date_extractor.patch
>
>
> The current implementation defaults to the local timezone when parsing 
> strings, thereby producing inconsistent results depending on the settings of 
> the computer the program is executing on (this is causing unit test 
> failures). We should set the timezone to a consistent default, and allow 
> users to override this default.
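The approach described above (a consistent default timezone that users can override) can be illustrated with `java.text.SimpleDateFormat` and `setTimeZone`. This is a sketch, not the committed DateExtractor change: the class name `TzDateParser` and the choice of GMT as the default are assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not the piggybank DateExtractor itself.
public class TzDateParser {
    // Assumed default zone; the committed patch may use a different one.
    static final String DEFAULT_TZ = "GMT";

    private final SimpleDateFormat format;

    TzDateParser(String pattern) {
        this(pattern, DEFAULT_TZ);
    }

    // Lets callers override the default zone.
    TzDateParser(String pattern, String timeZoneId) {
        // Locale.ROOT avoids locale-specific calendars changing the result.
        format = new SimpleDateFormat(pattern, Locale.ROOT);
        // Parse in an explicit zone instead of the JVM's local default.
        format.setTimeZone(TimeZone.getTimeZone(timeZoneId));
    }

    // SimpleDateFormat is not thread-safe, so one instance per thread is safest.
    Date parse(String s) {
        try {
            return format.parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + s, e);
        }
    }

    public static void main(String[] args) {
        String s = "2009-10-12 12:00";
        Date inGmt = new TzDateParser("yyyy-MM-dd HH:mm").parse(s);
        Date inGmtMinus8 = new TzDateParser("yyyy-MM-dd HH:mm", "GMT-08:00").parse(s);
        // Same wall-clock string, different zones: the instants are 8 hours apart.
        long hours = (inGmtMinus8.getTime() - inGmt.getTime()) / 3_600_000L;
        System.out.println(hours + " hours apart");
    }
}
```

Because the zone is set explicitly, the same input string parses to the same instant on every machine, which is what makes unit tests deterministic.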




[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-12 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764780#action_12764780
 ] 

Dmitriy V. Ryaboy commented on PIG-1014:


Santhosh, if the user wanted a "count without nulls in the first field", then 
she should use COUNT(A.$0). I think Pradeep's suggestion causes the least 
surprise to the end user (at least, that's the behaviour I would have expected 
had I not seen this ticket). 

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>





[jira] Created: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)
Reading in map data seems broken


 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy


Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
0.4.0 because of a misconfiguration in the parser, whereas almost all the 
documentation states that a map value can be of any type.

I've attached a patch that allows complex objects to be read in as map values, 
as documented. I've done a simple verification by loading maps with tuple/map 
values and writing them back out using LOAD and STORE. All seems to work fine.
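The essence of the fix is that a map value is parsed with the general datum rule rather than a string-only rule. A minimal recursive-descent sketch of that idea follows. It assumes the text form `[key#value,...]` for maps and `(a,b,...)` for tuples, handles only maps, tuples, and bare strings (no bags, escaping, or typed atoms), and is a hypothetical illustration, not the JavaCC grammar in TextDataParser.jjt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified parser: handles maps, tuples and bare strings only.
public class MiniDatumParser {
    private final String in;
    private int pos;

    private MiniDatumParser(String in) {
        this.in = in;
    }

    static Object parse(String s) {
        MiniDatumParser p = new MiniDatumParser(s);
        Object d = p.datum();
        if (p.pos != s.length()) {
            throw new IllegalArgumentException("Trailing input at " + p.pos);
        }
        return d;
    }

    // datum := map | tuple | string
    private Object datum() {
        if (peek() == '[') return map();
        if (peek() == '(') return tuple();
        return string();
    }

    // map := '[' (string '#' datum (',' string '#' datum)*)? ']'
    // The value goes through datum() rather than string(), so it may be a
    // tuple or a nested map; this mirrors the StringDatum() -> Datum() change.
    private Map<String, Object> map() {
        expect('[');
        Map<String, Object> m = new LinkedHashMap<>();
        if (peek() != ']') {
            do {
                String key = string();
                expect('#');
                m.put(key, datum());
            } while (accept(','));
        }
        expect(']');
        return m;
    }

    // tuple := '(' (datum (',' datum)*)? ')'
    private List<Object> tuple() {
        expect('(');
        List<Object> t = new ArrayList<>();
        if (peek() != ')') {
            do {
                t.add(datum());
            } while (accept(','));
        }
        expect(')');
        return t;
    }

    // string := run of characters up to the next delimiter
    private String string() {
        int start = pos;
        while (pos < in.length() && "[]()#,".indexOf(in.charAt(pos)) < 0) {
            pos++;
        }
        return in.substring(start, pos);
    }

    private char peek() {
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private boolean accept(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        }
    }

    public static void main(String[] args) {
        // A map whose values are a tuple and a nested map.
        System.out.println(parse("[a#(1,2),b#[c#d]]"));
    }
}
```

With a string-only value rule, the `(1,2)` and `[c#d]` values above would be rejected; routing values through `datum()` is what lets them nest.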







[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Status: Patch Available  (was: Open)

% diff org/apache/pig/data/parser/TextDataParser.jjt 
org/apache/pig/data/parser/newTextDataParser.jjt
145c145
<   String value = null;
---
>   Object value = null;
149c149
<   (key = StringDatum() "#" value = StringDatum())
---
>   (key = StringDatum() "#" value = Datum())
151c151
<   keyValues.put(key, new DataByteArray(value.getBytes("UTF-8")));
---
>   keyValues.put(key, value);


> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all the 
> documentation states that a map value can be of any type.
> I've attached a patch that allows complex objects to be read in as map values, 
> as documented. I've done a simple verification by loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: map_to_any_value.patch

A patch for org/apache/pig/data/parser/TextDataParser.jjt

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all the 
> documentation states that a map value can be of any type.
> I've attached a patch that allows complex objects to be read in as map values, 
> as documented. I've done a simple verification by loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Created: (PIG-1017) Converts strings to text in Pig

2009-10-12 Thread Sriranjan Manjunath (JIRA)
Converts strings to text in Pig
---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath


Strings in Java are stored as UTF-16, which takes 2 bytes per char. Text 
(org.apache.hadoop.io.Text) stores the data as UTF-8 and could significantly 
reduce memory use, particularly for mostly-ASCII data.
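The potential saving can be estimated by comparing a String's UTF-16 character payload with the length of its UTF-8 encoding, which is what Text stores. This sketch uses only the JDK (no Hadoop dependency), ignores object headers and other per-object overhead, and the class name `EncodingSize` is hypothetical. It also shows the caveat that UTF-8 is not always smaller: many CJK characters take 3 bytes in UTF-8 versus 2 in UTF-16.

```java
import java.nio.charset.StandardCharsets;

// Compares the raw character payload of a String (UTF-16, 2 bytes per char)
// with its UTF-8 encoding. Per-object overhead is ignored.
public class EncodingSize {
    // UTF-16 payload size. JDK 9+ "compact strings" may store Latin-1-only
    // strings in 1 byte per char, so this reflects older JVMs.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Size of the same text encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String japanese = "\u65e5\u672c"; // two CJK characters
        System.out.println("ascii: UTF-16=" + utf16Bytes(ascii) + ", UTF-8=" + utf8Bytes(ascii));
        System.out.println("japanese: UTF-16=" + utf16Bytes(japanese) + ", UTF-8=" + utf8Bytes(japanese));
    }
}
```

For "hello" the UTF-8 form is half the size (5 bytes versus 10), while for the two CJK characters it is larger (6 bytes versus 4), so the benefit depends on the data.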




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764789#action_12764789
 ] 

Hadoop QA commented on PIG-1016:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12421892/map_to_any_value.patch
  against trunk revision 824446.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/72/console

This message is automatically generated.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all the 
> documentation states that a map value can be of any type.
> I've attached a patch that allows complex objects to be read in as map values, 
> as documented. I've done a simple verification by loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-12 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764792#action_12764792
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

If the user wants to count all records regardless of nulls, then the user 
should use COUNT_STAR. One of the philosophies of Pig has been to allow users 
to do exactly what they want. Here, we would be violating that philosophy and, 
in addition, second-guessing the user's intention.

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>





[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Open  (was: Patch Available)

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in the reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, when the input data has the following properties:
>1. The records with the same key are grouped together (for example, the data 
> is sorted by key).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only, thus removing 
> the overhead of the disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed 'using "mapside"' clause will add a map-side group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change, it will emit the key and a bag of the records it has 
> buffered. It will assume that all records for a given key are collected 
> together, and thus there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure that the data emitted by the loader satisfies properties 
> (1) and (2) above.
> It will be the responsibility of the user (or the loader) to guarantee 
> properties (1) and (2) before invoking the mapside hint for the group by 
> clause; the Pig runtime cannot check the input data for violations.
> For group by clauses with the mapside hint, Pig Latin will support only 
> grouping by columns (including *), not grouping by expressions or group all. 
>   
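The buffering behaviour proposed above can be sketched as a streaming operator over an iterator of records that are already clustered by key (properties 1 and 2). The class `MapSideGroup` and its `group` method are hypothetical and ignore Pig's physical-operator interfaces. As in the proposal, the sketch cannot detect input that violates the clustering assumption; it would simply emit the same key more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of a map-side group; not Pig's implementation.
public class MapSideGroup {

    // Groups records that arrive clustered by key, emitting (key, bag) on each key change.
    static <K, R> void group(Iterator<R> input, Function<R, K> keyOf,
                             BiConsumer<K, List<R>> emit) {
        K currentKey = null;
        List<R> buffer = new ArrayList<>();
        while (input.hasNext()) {
            R rec = input.next();
            K key = keyOf.apply(rec);
            // Key changed: the previous key's records are complete, so emit them.
            if (!buffer.isEmpty() && !Objects.equals(key, currentKey)) {
                emit.accept(currentKey, buffer);
                buffer = new ArrayList<>();
            }
            currentKey = key;
            buffer.add(rec);
        }
        // Flush the last key at end of input.
        if (!buffer.isEmpty()) {
            emit.accept(currentKey, buffer);
        }
    }

    public static void main(String[] args) {
        List<String[]> sorted = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"a", "2"},
                new String[] {"b", "3"});
        group(sorted.iterator(), r -> r[0],
                (k, bag) -> System.out.println(k + " -> " + bag.size() + " record(s)"));
    }
}
```

Only the current key's records are held in memory at any time, which is why no reduce phase (and no buffering across keys) is needed when the input really is clustered.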




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Attachment: PIG-984_1.patch

This patch fixes the QA errors reported above.

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in the reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, when the input data has the following properties:
>1. The records with the same key are grouped together (for example, the data 
> is sorted by key).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only, thus removing 
> the overhead of the disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed 'using "mapside"' clause will add a map-side group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change, it will emit the key and a bag of the records it has 
> buffered. It will assume that all records for a given key are collected 
> together, and thus there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure that the data emitted by the loader satisfies properties 
> (1) and (2) above.
> It will be the responsibility of the user (or the loader) to guarantee 
> properties (1) and (2) before invoking the mapside hint for the group by 
> clause; the Pig runtime cannot check the input data for violations.
> For group by clauses with the mapside hint, Pig Latin will support only 
> grouping by columns (including *), not grouping by expressions or group all. 
>   




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Patch Available  (was: Open)

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in the reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, when the input data has the following properties:
>1. The records with the same key are grouped together (for example, the data 
> is sorted by key).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only, thus removing 
> the overhead of the disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed using "mapside" clause on group will be implemented by a map-side 
> group operator that collects all records for a given key into a buffer. When it 
> sees a key change, it will emit the key and the bag of records it has buffered. 
> It will assume that all records for a given key are collected together, and 
> thus there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee 
> properties (1) and (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime cannot detect such errors in the input data.
> For group by clauses with the mapside hint, Pig Latin will only support grouping 
> by columns (including *), not by expressions, and not group all. 
>   
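The description notes that the Pig runtime cannot check properties (1) and (2). A loader author who wants to test property (1) on a single input split could use something like the sketch below. The class and method names are hypothetical and are not part of Pig or Zebra. The check remembers every key whose run has ended, so its memory grows with the number of distinct keys. It says nothing about property (2), which spans mapper inputs.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper: checks that equal keys appear in one contiguous run. */
final class KeyContiguity {
    private KeyContiguity() {}

    static <K> boolean isContiguous(Iterable<K> keys) {
        Set<K> finished = new HashSet<>(); // keys whose run has already ended
        K current = null;
        boolean started = false;
        for (K key : keys) {
            if (started && Objects.equals(key, current)) {
                continue; // still inside the current run
            }
            if (started) {
                finished.add(current); // the previous run just ended
            }
            if (finished.contains(key)) {
                return false; // key reappears after its run ended: property (1) violated
            }
            current = key;
            started = true;
        }
        return true;
    }
}
```

Sorted input always passes this check. Input that is grouped but not sorted, such as `b, b, a`, also passes, because property (1) only requires equal keys to be adjacent.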




[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-12 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764846#action_12764846
 ] 

Santhosh Srinivasan commented on PIG-984:
-

Very quick comment: the parser has a log.info call that should be converted to 
log.debug.

Index: src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt
===


+[ ("\"collected\"" { 
+log.info("Using mapside");





[jira] Commented: (PIG-990) Provide a way to pin LogicalOperator Options

2009-10-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764852#action_12764852
 ] 

Alan Gates commented on PIG-990:


Looks good.

One comment: rather than referring to the default join as "regular", I think we 
should refer to it as "hash" or "symmetric hash", since these names accurately 
describe how it works. That way users can specify the type of join they want, 
and if for whatever reason we switch the default, they'll still get what they 
want.

> Provide a way to pin LogicalOperator Options
> 
>
> Key: PIG-990
> URL: https://issues.apache.org/jira/browse/PIG-990
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Dmitriy V. Ryaboy
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: pinned_options.patch
>
>
> This is a proactive patch, setting up the groundwork for adding an optimizer.
> Some of the LogicalOperators have options. For example, LOJoin has a variety 
> of join types (regular, fr, skewed, merge), which can be set by the user or 
> chosen by a hypothetical optimizer.  If a user selects a join type, Pig 
> philosophy guides us to always respect the user's choice and not explore 
> alternatives.  Therefore, we need a way to "pin" options.  
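A minimal sketch of what pinning could look like follows. It assumes an option holder that records whether its value came from the user. `PinnableOption`, `setByUser`, and `trySetByOptimizer` are hypothetical names. The attached pinned_options.patch is not reproduced here and may be structured differently. The enum lists the four join types named in the description, with "fr" spelled out as FRAGMENT_REPLICATE.

```java
/** Join types named in the issue description; REGULAR is the current default. */
enum JoinType { REGULAR, FRAGMENT_REPLICATE, SKEWED, MERGE }

/** Hypothetical holder for an operator option that the user can pin. */
final class PinnableOption<T> {
    private T value;
    private boolean pinned;

    PinnableOption(T defaultValue) {
        this.value = defaultValue; // an unpinned default the optimizer may replace
    }

    /** A user-specified choice is recorded and pinned. */
    void setByUser(T choice) {
        value = choice;
        pinned = true;
    }

    /** The optimizer may change the value only if the user did not pin it. */
    boolean trySetByOptimizer(T choice) {
        if (pinned) {
            return false; // respect the user's choice; do not explore alternatives
        }
        value = choice;
        return true;
    }

    T get() { return value; }

    boolean isPinned() { return pinned; }
}
```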




[jira] Updated: (PIG-992) [zebra] Separate Schema-related files into a "Schema" package

2009-10-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-992:
---

Status: Open  (was: Patch Available)

Canceling old patches.

> [zebra] Separate Schema-related files into a "Schema" package
> -
>
> Key: PIG-992
> URL: https://issues.apache.org/jira/browse/PIG-992
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: SchemaPackageChange.patch, SchemaPackageChange.patch, 
> SchemaPackageChange.patch
>
>
> The hope is to facilitate future sharing of the Schema code between 
> different modules and/or products. 




[jira] Updated: (PIG-992) [zebra] Separate Schema-related files into a "Schema" package

2009-10-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-992:
---

Status: Patch Available  (was: Open)

Resubmitting the patch so that Hudson will rerun it.




[jira] Commented: (PIG-990) Provide a way to pin LogicalOperator Options

2009-10-12 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764860#action_12764860
 ] 

Dmitriy V. Ryaboy commented on PIG-990:
---

Thanks for reviewing.

I only called it "regular" because that's what it was called in the enum that 
already existed inside LOJoin.

Maybe a better option would be to call the current default join 'hash', and 
also provide a 'default' key that, for now, will translate to hash, but can 
translate to something else (and stay pinned) if the defaults change. I'll add 
that to the next iteration of the patch (which will also contain whatever 
keyword is decided on for the map-side groups).
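The suggestion can be sketched this way: a `default` keyword is resolved once, at parse time, and the result is pinned, while an absent hint leaves the choice to the optimizer. All names here are hypothetical: `JoinStrategy`, `JoinHint`, `parse`, and the keyword strings. They do not reflect Pig's actual parser or the keywords eventually chosen.

```java
import java.util.Locale;

/** Hypothetical join strategies; HASH is the proposed name for today's default join. */
enum JoinStrategy { HASH, REPLICATED, SKEWED, MERGE }

/** Hypothetical result of parsing a join hint: the strategy plus whether it is pinned. */
final class JoinHint {
    /** What "default" maps to in this (hypothetical) release; may change later. */
    static final JoinStrategy CURRENT_DEFAULT = JoinStrategy.HASH;

    final JoinStrategy strategy;
    final boolean pinned;

    private JoinHint(JoinStrategy strategy, boolean pinned) {
        this.strategy = strategy;
        this.pinned = pinned;
    }

    /** keyword == null means no hint was given, so the optimizer may choose. */
    static JoinHint parse(String keyword) {
        if (keyword == null) {
            return new JoinHint(CURRENT_DEFAULT, false);
        }
        if (keyword.equalsIgnoreCase("default")) {
            // Resolve "default" now and pin the result.
            return new JoinHint(CURRENT_DEFAULT, true);
        }
        // Any other keyword must name a strategy; valueOf throws
        // IllegalArgumentException for unknown names.
        return new JoinHint(JoinStrategy.valueOf(keyword.toUpperCase(Locale.ROOT)), true);
    }
}
```

Under this design, a script that says `default` gets the strategy that `CURRENT_DEFAULT` names in the running version. The optimizer still cannot replace that strategy.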




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1016:


Status: Open  (was: Patch Available)

Canceling the patch as Hudson was not able to successfully apply it.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be of any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-10-12 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-944:
-

Status: Open  (was: Patch Available)

Canceling the patch.

> Zebra schema is taken from Pig through TableStorer's construct
> --
>
> Key: PIG-944
> URL: https://issues.apache.org/jira/browse/PIG-944
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SchemaConversion.patch, SchemaConversion.patch
>
>
> The schema should come from StoreConfig in the TableOutputFormat.checkOutputSpecs 
> method, because this information is dynamic in Pig's execution engine; it should 
> not be passed as a static argument to the constructor.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: map_to_any_value.patch)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy



[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: trunk_map_to_any_value.patch

Including a patch generated with svn diff.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: trunk_map_to_any_value.patch



[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Open  (was: Patch Available)

> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch
>
>
> Multi-query optimization fails to merge two branches when one is the result of 
> a GROUP ... ALL and the other is the result of a GROUP ... BY field1, where field1 
> is of type long. Here is a script that fails with multi-query optimization enabled:
> data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
> A = GROUP data ALL;
> B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
> C = FOREACH B GENERATE (sum1/sum2) AS rate; 
> STORE C INTO 'result1';
> D = GROUP data BY a; 
> E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
> STORE E into 'result2';
>  
> Here is the exception from the logs:
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)




[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Attachment: PIG-976.patch

Another corner case:

{code}
A = LOAD 'input';
B = GROUP A BY $0;
C = FOREACH B GENERATE COUNT(A), group, group;
{code}

The multi-query optimizer needs to track the positions of the group key in the 
output tuple of the above foreach statement.

This patch takes care of this case as well.
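In the corner case above, the group key appears twice in the FOREACH output, so one position per key does not cover it. The sketch below records every position at which `group` is projected. It is a hypothetical helper that works on GENERATE expressions rendered as strings, not on Pig's logical or physical plan. It only matches a bare `group` expression. An aliased projection such as `group AS a` would need plan-level handling that is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds where the group key is projected in a GENERATE list. */
final class GroupKeyPositions {
    private GroupKeyPositions() {}

    /** Each element is one GENERATE expression as text, e.g. "COUNT(A)" or "group". */
    static List<Integer> of(List<String> generateExprs) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < generateExprs.size(); i++) {
            if (generateExprs.get(i).trim().equals("group")) {
                positions.add(i); // record every occurrence, not just the first
            }
        }
        return positions;
    }
}
```

For `C = FOREACH B GENERATE COUNT(A), group, group;` the helper reports positions 1 and 2.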


> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch



[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Patch Available  (was: Open)




[jira] Updated: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.

2009-10-12 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-996:
--

Status: Open  (was: Patch Available)

This patch should be applied only after the PIG-944 patch has been applied. 
Canceling for now; will resubmit.

> [zebra] Zebra build script does not have findbugs and clover targets.
> -
>
> Key: PIG-996
> URL: https://issues.apache.org/jira/browse/PIG-996
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: patch_build
>
>
> Zebra build script does not have findbugs and clover targets, causing the Hudson 
> build process to fail on Zebra.
> This jira is to fix this by adding these two targets.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1016:


Status: Patch Available  (was: Open)




[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764902#action_12764902
 ] 

Hadoop QA commented on PIG-984:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421908/PIG-984_1.patch
  against trunk revision 824446.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/73/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/73/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/73/console

This message is automatically generated.




[jira] Commented: (PIG-992) [zebra] Separate Schema-related files into a "Schema" package

2009-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764903#action_12764903
 ] 

Hadoop QA commented on PIG-992:
---

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12421879/SchemaPackageChange.patch
  against trunk revision 824446.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 183 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/console

This message is automatically generated.

> [zebra] Separate Schema-related files into a "Schema" package
> -
>
> Key: PIG-992
> URL: https://issues.apache.org/jira/browse/PIG-992
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: SchemaPackageChange.patch, SchemaPackageChange.patch, 
> SchemaPackageChange.patch
>
>
> The hope is to facilitate future sharing of the Schema code between 
> different modules and/or products. 




[jira] Created: (PIG-1018) FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter

2009-10-12 Thread Olga Natkovich (JIRA)
FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower 
case letter
---

 Key: PIG-1018
 URL: https://issues.apache.org/jira/browse/PIG-1018
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich


Nm  The field name org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.LogToPhyMap doesn't start with a lower case letter
Nm  The method name org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.CreateTuple(Object[]) doesn't start with a lower case letter
Nm  The class name org.apache.pig.backend.hadoop.executionengine.physicalLayer.util.operatorHelper doesn't start with an upper case letter
Nm  Class org.apache.pig.impl.util.WrappedIOException is not derived from an Exception, even though it is named as such
Nm  The method name org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(LogicalOperator, Map) doesn't start with a lower case letter
Nm  The field name org.apache.pig.pen.util.DisplayExamples.Result doesn't start with a lower case letter
Nm  The method name org.apache.pig.pen.util.DisplayExamples.PrintSimple(LogicalOperator, Map) doesn't start with a lower case letter
Nm  The method name org.apache.pig.pen.util.DisplayExamples.PrintTabular(LogicalPlan, Map) doesn't start with a lower case letter
Nm  The method name org.apache.pig.tools.parameters.TokenMgrError.LexicalError(boolean, int, int, int, String, char) doesn't start with a lower case letter




[jira] Created: (PIG-1019) FINDBUGS: add exclude file

2009-10-12 Thread Olga Natkovich (JIRA)
FINDBUGS: add exclude file
--

 Key: PIG-1019
 URL: https://issues.apache.org/jira/browse/PIG-1019
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich







[jira] Updated: (PIG-1019) FINDBUGS: add exclude file

2009-10-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1019:


Attachment: PIG-1019.patch

> FINDBUGS: add exclude file
> --
>
> Key: PIG-1019
> URL: https://issues.apache.org/jira/browse/PIG-1019
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1019.patch
>
>





[jira] Updated: (PIG-1019) FINDBUGS: add exclude file

2009-10-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1019:


Status: Patch Available  (was: Open)

> FINDBUGS: add exclude file
> --
>
> Key: PIG-1019
> URL: https://issues.apache.org/jira/browse/PIG-1019
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1019.patch
>
>





[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764922#action_12764922
 ] 

Hadoop QA commented on PIG-1016:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12421920/trunk_map_to_any_value.patch
  against trunk revision 824446.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/19/console


> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: trunk_map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple for its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all of the 
> documentation states that a map value can be of any type.
> I've attached a patch that allows complex objects to be read in as map values, as 
> documented. I've done simple verification of loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: PIG-1016.patch

rename

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: trunk_map_to_any_value.patch)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>




Hudson build is back to normal: Pig-trunk #585

2009-10-12 Thread Apache Hudson Server
See 




[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764935#action_12764935
 ] 

Hadoop QA commented on PIG-976:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421921/PIG-976.patch
  against trunk revision 824446.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/74/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/74/console


> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch
>
>
> Multi-query optimization fails to merge two branches when one is the result of 
> GROUP ALL and the other is the result of GROUP BY field1, where field1 is of 
> type long. Here is the script that fails with multi-query on:
> data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
> A = GROUP data ALL;
> B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
> C = FOREACH B GENERATE (sum1/sum2) AS rate; 
> STORE C INTO 'result1';
> D = GROUP data BY a; 
> E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
> STORE E into 'result2';
>  
> Here is the exception from the logs:
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Status: Open  (was: Patch Available)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: PIG-1016.patch)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
>




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: PIG-1016.patch

This patch was generated with svn diff and includes a unit test.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: PIG-1016.patch)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
>




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-12 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: PIG-1016.patch

Unit test plus patch. This time the unit test actually passes.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>




[jira] Commented: (PIG-1019) FINDBUGS: add exclude file

2009-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764945#action_12764945
 ] 

Hadoop QA commented on PIG-1019:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421933/PIG-1019.patch
  against trunk revision 824446.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 292 release audit warnings 
(more than the trunk's current 291 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/20/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/20/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/20/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/20/console


> FINDBUGS: add exclude file
> --
>
> Key: PIG-1019
> URL: https://issues.apache.org/jira/browse/PIG-1019
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1019.patch
>
>





[jira] Created: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-12 Thread Daniel Dai (JIRA)
Include an ant target to build pig.jar without hadoop libraries
---

 Key: PIG-1020
 URL: https://issues.apache.org/jira/browse/PIG-1020
 Project: Pig
  Issue Type: New Feature
  Components: build
Affects Versions: 0.4.0
Reporter: Daniel Dai
Priority: Minor
 Fix For: 0.6.0


Provide an ant target to build pig.jar without the Hadoop-related libraries. 
Users will provide the external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Assignee: Daniel Dai
  Status: Patch Available  (was: Open)

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1020-1.patch
>
>




[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Attachment: PIG-1020-1.patch

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1020-1.patch
>
>
