[jira] Updated: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-644:
---

Attachment: PIG-644-1.patch

Add a SchemaAliasValidator to do this check.

 Duplicate column names in foreach do not throw parser error
 ---

 Key: PIG-644
 URL: https://issues.apache.org/jira/browse/PIG-644
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: blah.txt, PIG-644-1.patch


 Consider the following Pig script where we generate column names b and b in 
 the FOREACH
 {code}
 DATA = LOAD 'blah.txt' as (a:long, b:long);
 RESULT = FOREACH DATA GENERATE a, b, (b>20?b:0) as b;
 DESCRIBE RESULT;
 dump RESULT;
 {code}
 Pig runs the script successfully and does not complain of the duplicate 
 column names.  I do not know if the new error handling framework will handle 
 these kinds of cases. 
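
 The check asked for here can be sketched in a few lines. This is only an
 illustration of duplicate-alias detection, not the code in PIG-644-1.patch:
 the class name DuplicateAliasCheck and its methods are hypothetical, and the
 schema is modelled as a plain list of alias strings.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not the SchemaAliasValidator from the patch.
class DuplicateAliasCheck {

    // Returns every alias that appears more than once, in first-seen order.
    // Null entries stand for unnamed columns and are ignored.
    static Set<String> findDuplicates(List<String> aliases) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String alias : aliases) {
            if (alias == null) {
                continue;
            }
            if (!seen.add(alias)) {
                duplicates.add(alias);
            }
        }
        return duplicates;
    }

    // Rejects a schema that repeats an alias.
    static void validate(List<String> aliases) {
        Set<String> duplicates = findDuplicates(aliases);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("Duplicate column names: " + duplicates);
        }
    }

    public static void main(String[] args) {
        // Aliases produced by: FOREACH DATA GENERATE a, b, (b>20?b:0) as b;
        List<String> aliases = Arrays.asList("a", "b", "b");
        System.out.println(findDuplicates(aliases)); // prints [b]
    }
}
```

 Calling validate() on the aliases of RESULT above would throw, which is the
 behaviour the reporter expected from the parser.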

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-644:
--

Assignee: Daniel Dai




[jira] Updated: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-644:
---

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.2.0)
   0.4.0
   Status: Patch Available  (was: Open)




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766444#action_12766444
 ] 

Daniel Dai commented on PIG-1016:
-

I think the problem is that in the current TextDataParser a map is defined as 
String#String, and a string excludes special characters such as "(", ")" and 
",", so hc busy has no way to put a tuple in the value field of the map. The 
approach hc busy took looks valid to me.

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch


 Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
 0.4.0 because of a misconfiguration in the parser, whereas almost all of the 
 documentation states that the value of a map can be any type.
 I've attached a patch that allows us to read in complex objects as values, as 
 documented. I've done simple verification of loading in maps with tuple/map 
 values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766451#action_12766451
 ] 

Hadoop QA commented on PIG-1020:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422263/PIG-1020-3.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/86/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/86/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/86/console

This message is automatically generated.

 Include an ant target to build pig.jar without hadoop libraries
 ---

 Key: PIG-1020
 URL: https://issues.apache.org/jira/browse/PIG-1020
 Project: Pig
  Issue Type: New Feature
  Components: build
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.5.0, 0.6.0

 Attachments: PIG-1020-1.patch, PIG-1020-2.patch, PIG-1020-3.patch


 Provide an ant target to build pig.jar without any Hadoop-related libraries. 
 Users will provide external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-993:
---

Status: Open  (was: Patch Available)

 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.6.0

 Attachments: DropColumnGroupExample.java, 
 TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
 zebra-drop-cg.patch, zebra-drop-cg.patch


 A Zebra table is stored as multiple sub tables each containing a set of 
 columns called column group (CG). The user specifies how these columns are 
 grouped while creating a table through the _storage hint_.
 For some of the large tables, it might be necessary for users to remove a set 
 of columns and retain the rest. This jira provides a way for users to delete 
 an entire column group. 
 The following comments will have more details on API and the semantics. 




[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-16 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766613#action_12766613
 ] 

Olga Natkovich commented on PIG-1020:
-

+1




[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Open  (was: Patch Available)

 Multi-query optimization throws ClassCastException
 --

 Key: PIG-976
 URL: https://issues.apache.org/jira/browse/PIG-976
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
 Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
 PIG-976.patch, PIG-976.patch


 Multi-query optimization fails to merge two branches when one is the result 
 of Group By ALL and the other is the result of Group By field1, where field1 
 is of type long. Here is the script that fails with multi-query on.
 data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
 A = GROUP data ALL;
 B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
 C = FOREACH B GENERATE (sum1/sum2) AS rate; 
 STORE C INTO 'result1';
 D = GROUP data BY a; 
 E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
 STORE E into 'result2';
  
 Here is the exception from the logs
 java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Open  (was: Patch Available)

 PERFORMANCE: Implement a map-side group operator to speed up processing of 
 ordered data 
 

 Key: PIG-984
 URL: https://issues.apache.org/jira/browse/PIG-984
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch


 The general group by operation in Pig needs both mappers and reducers (the 
 aggregation is done in reducers). This incurs disk writes/reads  between 
 mappers and reducers.
 However, in cases where the input data has the following properties:
1. The records with the same key are grouped together (for example, because 
 the data is sorted by key).
2. The records with the same key are in the same mapper input.
 the group by operation can be performed in the mappers alone, which removes 
 the overhead of disk writes/reads.
 Alan proposed adding a hint to the group by clause like this one:
 {code}
 A = load 'input' using SomeLoader(...);
 B = group A by $0 using mapside;
 C = foreach B generate ...
 {code}
 The proposed addition of using mapside to group will be a mapside group 
 operator that collects all records for a given key into a buffer. When it 
 sees a key change it will emit the key and bag for records it had buffered. 
 It will assume that all records for a given key are collected together and 
 thus there is no need to buffer across keys. 
 It is expected that SomeLoader will be implemented by data systems such as 
 Zebra to ensure the data emitted by the loader satisfies the above properties 
 (1) and (2).
 It will be the responsibility of the user (or the loader) to guarantee 
 properties (1) and (2) before invoking the mapside hint for the group by 
 clause. The Pig runtime can't check for errors in the input data.
 For group by clauses with the mapside hint, Pig Latin will only support group 
 by columns (including *), not group by expressions or group all. 
   
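
 The buffering behaviour described above can be sketched as follows. This is
 a minimal illustration under stated assumptions, not Pig's operator: the
 class MapSideGroupSketch is hypothetical, records are modelled as String
 arrays, and the key is assumed to be the first field (as in "group A by $0").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the key-change buffering; not Pig's actual operator.
class MapSideGroupSketch {

    // One output group: the key plus the bag of records that shared it.
    static final class Group {
        final String key;
        final List<String[]> bag;

        Group(String key, List<String[]> bag) {
            this.key = key;
            this.bag = bag;
        }
    }

    // Groups records by their first field, assuming equal keys are adjacent.
    static List<Group> group(List<String[]> records) {
        List<Group> out = new ArrayList<>();
        String currentKey = null;
        List<String[]> buffer = null;
        for (String[] record : records) {
            String key = record[0];
            if (buffer == null || !Objects.equals(key, currentKey)) {
                // Key change: emit what was buffered and start a new bag.
                if (buffer != null) {
                    out.add(new Group(currentKey, buffer));
                }
                currentKey = key;
                buffer = new ArrayList<>();
            }
            buffer.add(record);
        }
        if (buffer != null) {
            out.add(new Group(currentKey, buffer)); // flush the last key
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> sorted = Arrays.asList(
            new String[] {"k1", "x"}, new String[] {"k1", "y"}, new String[] {"k2", "z"});
        for (Group g : group(sorted)) {
            System.out.println(g.key + " -> " + g.bag.size() + " record(s)");
        }
    }
}
```

 Because only one key is buffered at a time, input that violates property (1)
 silently produces several groups for the same key, which is why the
 guarantee has to come from the user or the loader.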




[jira] Updated: (PIG-1011) FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define serialVersionUID

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1011:


Attachment: PIG-1011.patch

 FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define 
 serialVersionUID
 ---

 Key: PIG-1011
 URL: https://issues.apache.org/jira/browse/PIG-1011
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
 Attachments: PIG-1011.patch


 SnVI  
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct
  is Serializable; consider declaring a serialVersionUID
 SnVI  
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORead
  is Serializable; consider declaring a serialVersionUID
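
 For reference, the usual way to silence this warning is to declare the field
 explicitly. The sketch below uses a hypothetical class SketchOperator as a
 stand-in for PODistinct and PORead; the value 1L is an arbitrary choice and
 is not taken from PIG-1011.patch.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical operator class standing in for PODistinct / PORead.
class SketchOperator implements Serializable {
    // Declared explicitly so the serialized form does not depend on a
    // compiler-derived default that can change when the class is recompiled.
    private static final long serialVersionUID = 1L;

    final String alias;

    SketchOperator(String alias) {
        this.alias = alias;
    }
}

class SerialVersionDemo {
    // Reads back the id the serialization runtime will use for the class.
    static long declaredUid() {
        return ObjectStreamClass.lookup(SketchOperator.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(declaredUid()); // prints 1
    }
}
```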




[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-10-16 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-6.patch

Dmitriy - by default when the application does not set an OutputCommitter, 
hadoop uses FileOutputCommitter. So currently (in trunk code) since pig does 
not set an OutputCommitter, hadoop would be using FileOutputCommitter. Hence I 
derived from FileOutputCommitter so that the current cleanup continues to 
happen and we do the extra commit needed by Zebra.

The new load-store redesign already has an allFinished() method in storeFunc 
which is the same as this commit except it does not have the Configuration - I 
have modified it to have the Configuration parameter.

It turns out zebra needs the job configuration in order to open the right side 
file during merge join. Hence I am introducing an initialize(Configuration 
conf) method into the IndexableLoadFunc interface in the attached patch so that 
the pig runtime can call it allowing zebra to store this configuration for use 
in opening the right side file later.

 Enable merge join in pig to work with loaders and store functions which can 
 internally index sorted data 
 -

 Key: PIG-953
 URL: https://issues.apache.org/jira/browse/PIG-953
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
 PIG-953-5.patch, PIG-953-6.patch, PIG-953.patch


 Currently merge join implementation in pig includes construction of an index 
 on sorted data and use of that index to seek into the right input to 
 efficiently perform the join operation. Some loaders (notably the zebra 
 loader) internally implement an index on sorted data and can perform this 
 seek efficiently using their index. So the use of the index needs to be 
 abstracted in such a way that when the loader supports indexing, pig uses it 
 (indirectly through the loader) and does not construct an index. 




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Attachment: PIG-984_1.patch

Fix the compile errors.

 PERFORMANCE: Implement a map-side group operator to speed up processing of 
 ordered data 
 

 Key: PIG-984
 URL: https://issues.apache.org/jira/browse/PIG-984
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch, 
 PIG-984_1.patch





[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-10-16 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-7.patch

I missed allowing an IOException to be thrown in commit() in 
CommittableStoreFunc and initialize() in IndexableLoadFunc in my previous patch 
- attaching new version with just that change.

 Enable merge join in pig to work with loaders and store functions which can 
 internally index sorted data 
 -

 Key: PIG-953
 URL: https://issues.apache.org/jira/browse/PIG-953
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
 PIG-953-5.patch, PIG-953-6.patch, PIG-953-7.patch, PIG-953.patch





[jira] Created: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)
Should be able to set job priority through Pig Latin


 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor


Currently users can set the job name through Pig Latin by saying

set job.name 'my job name'

The ability to set the priority would also be nice, and the patch should be 
small.  The goal is to be able to say

set job.priority 'high'

and throw a JobCreationException in the JobControlCompiler if the priority is 
not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
very_low, low, normal, high, very_high.   Case insensitivity makes this a 
little nicer.
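
The validation described above might look roughly like this. It is a sketch
under assumptions, not the eventual patch: the Priority enum is a local
stand-in for o.a.h.mapred.JobPriority so the example is self-contained, and an
IllegalArgumentException stands in for the JobCreationException mentioned
above.

```java
import java.util.Locale;

// Hypothetical sketch; names are illustrative only.
class JobPrioritySketch {

    // Local stand-in mirroring the values of o.a.h.mapred.JobPriority.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    // Parses the value given to "set job.priority", ignoring case.
    static Priority parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("job.priority must not be null");
        }
        try {
            return Priority.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Invalid job.priority '" + value
                    + "'; expected one of very_low, low, normal, high, very_high", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("High")); // prints HIGH
    }
}
```

Upper-casing with Locale.ROOT keeps the case-insensitive match independent of
the JVM's default locale.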




[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1277#action_1277
 ] 

Hadoop QA commented on PIG-790:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422298/PIG-790-1.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/89/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/89/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/89/console

This message is automatically generated.

 Error message should indicate in which line number in the Pig script the 
 error occurred (debugging BinCond)
 --

 Key: PIG-790
 URL: https://issues.apache.org/jira/browse/PIG-790
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.6.0

 Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
 pig_1240972895275.log


 I have a simple Pig script which loads integer data and does a Bincond, where 
 it compares, (col1 eq ''). There is an error message that is generated in 
 this case, but it does not specify the line number in the script. 
 {code}
 MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, col2:int);
 MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
                                          ((col1 neq '') ? col1 - col2 : 16) as time_diff;
 dump MYDATA_PROJECT;
 {code}
 ==
 2009-04-29 02:33:07,182 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
 2009-04-29 02:33:08,584 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new graph.
 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039: Incompatible types in EqualTo Operator left hand side:int right hand side:chararray
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
 ==
 It would be good if the error message has a line number and a copy of the 
 line in the script which is causing the problem.
 Attaching data, script and log file. 




[jira] Updated: (PIG-1013) FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1013:


Status: Patch Available  (was: Open)

 FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array
 

 Key: PIG-1013
 URL: https://issues.apache.org/jira/browse/PIG-1013
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
 Attachments: PIG-1013.patch


 DMI   Invocation of toString on stackTraceLines in 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(String[], int)
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(byte[])
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToDouble(byte[])
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToFloat(byte[])
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToInteger(byte[])
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToLong(byte[])
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToMap(byte[])
 DMI   Invocation of toString on b in 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToTuple(byte[])
 DMI   Invocation of toString on args in 
 org.apache.pig.impl.PigContext.instantiateFuncFromSpec(FuncSpec)
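
 As background on why FindBugs flags these calls: toString() on an array
 returns a type tag and identity hash rather than the contents. The sketch
 below shows the two standard replacements. The class ArrayToStringSketch and
 the choice of UTF-8 are assumptions for illustration and are not taken from
 PIG-1013.patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; illustrates the pattern, not the actual Pig fix.
class ArrayToStringSketch {

    // Decodes the bytes instead of calling b.toString().
    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Renders the elements instead of calling lines.toString().
    static String describe(String[] lines) {
        return Arrays.toString(lines);
    }

    public static void main(String[] args) {
        byte[] b = "42".getBytes(StandardCharsets.UTF_8);
        // The flagged pattern: yields something like "[B@1b6d3586", not "42".
        System.out.println(b.toString().startsWith("[B@")); // prints true
        System.out.println(decode(b)); // prints 42
        System.out.println(describe(new String[] {"at A", "at B"})); // prints [at A, at B]
    }
}
```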




[jira] Updated: (PIG-1013) FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1013:


Attachment: PIG-1013.patch




[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Weil updated PIG-1025:


Attachment: PIG-1025.patch

 Should be able to set job priority through Pig Latin
 

 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1025.patch





[jira] Created: (PIG-1026) [zebra] map split returns null

2009-10-16 Thread Jing Huang (JIRA)
[zebra] map split returns null
--

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0


Here is the test scenario:
 final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
 // final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
 final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";

projection: String projection2 = new String("m1#{b}, m2#{x|z}");
The user got a null pointer exception when reading m1#{b}.

Yan, please refer to the test class:
TestNonDefaultWholeMapSplit.java 




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread hc busy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766710#action_12766710
 ] 

hc busy commented on PIG-1016:
--

OK, since my last comment I've verified that, in trunk, the patch in this 
ticket did not introduce an error. The skewed join (correct or not) returns 
the same number of rows whether the data is read in from a nested data 
structure or from a tuple.

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch


 Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
 0.4.0 because of a misconfiguration in the parser, whereas almost all 
 documentation states that the value of a map can be any type.
 I've attached a patch that allows us to read in complex objects as values, as 
 documented. I've done simple verification of loading maps with tuple/map 
 values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2009-10-16 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-1017:
-

Status: Patch Available  (was: Open)

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per char. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.
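As a rough illustration of the size difference described above, the sketch below compares the encoded length of the same text in UTF-8 and UTF-16. It uses only the JDK (not org.apache.hadoop.io.Text), and the class and method names are hypothetical. It also shows that the saving depends on the data: ASCII-heavy text halves in size, while many CJK characters take 3 bytes in UTF-8 against 2 in UTF-16.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use Hadoop's Text type.
class EncodingSizeDemo {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE avoids the 2-byte byte-order mark that UTF_16 would prepend.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello world";        // 11 ASCII characters
        String cjk = "\u65e5\u672c\u8a9e";   // 3 CJK characters
        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii)); // 11 vs 22
        System.out.println(utf8Bytes(cjk) + " vs " + utf16Bytes(cjk));     // 9 vs 6
    }
}
```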




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2009-10-16 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-1017:
-

Attachment: stotext.patch

The patch will fail MRCompiler and LogToPhyTransalator unit tests since we need 
to replace the golden files. The rest should pass.

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per char. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.




[jira] Assigned: (PIG-1017) Converts strings to text in Pig

2009-10-16 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath reassigned PIG-1017:


Assignee: Sriranjan Manjunath

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per char. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.




[jira] Resolved: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-993.


Resolution: Fixed

Patch checked in.

 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.6.0

 Attachments: DropColumnGroupExample.java, 
 TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
 zebra-drop-cg.patch, zebra-drop-cg.patch


 A Zebra table is stored as multiple sub tables each containing a set of 
 columns called column group (CG). The user specifies how these columns are 
 grouped while creating a table through the _storage hint_.
 For some of the large tables, it might be necessary for users to remove a set 
 of columns and retain the rest. This jira provides a way for users to delete 
 an entire column group. 
 The following comments will have more details on API and the semantics. 




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766746#action_12766746
 ] 

Alan Gates commented on PIG-928:


I ran some quick and sloppy performance tests on this.  I ran it using both BSF 
and direct bindings to groovy.  I also ran it using the builtin TOKENIZE 
function in Pig.  I had it read 5000 lines of text.  The groovy (or TOKENIZE) 
functions handle splitting the line, then we do a standard group/count to count 
the words.  I got the following results:

Groovy using BSF:  55.070 seconds
Groovy direct bindings:  58.560 seconds
TOKENIZE:  2.554 seconds

So a 30x slow down using this.  That's pretty painful.  I know string 
translation between languages can be bad.  I don't know how much of this is 
inter-language bindings and how much is groovy.  When I get a chance I'll try 
this in Python and see if I get similar numbers.
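For reference, the ratios implied by the three timings reported above can be worked out directly. The small sketch below (class and method names are hypothetical) divides each Groovy timing by the TOKENIZE timing; with these figures the slowdown comes to roughly 21.6x for BSF and 22.9x for direct bindings, somewhat lower than the round figure used in the thread but still more than an order of magnitude.

```java
import java.util.Locale;

// Hypothetical helper for checking the reported timings.
class SlowdownDemo {
    // Ratio of a measured time to the baseline time.
    static double slowdown(double seconds, double baselineSeconds) {
        return seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        double tokenize = 2.554; // builtin TOKENIZE, seconds
        double bsf = 55.070;     // Groovy using BSF, seconds
        double direct = 58.560;  // Groovy direct bindings, seconds
        System.out.println(String.format(Locale.ROOT, "BSF:    %.1fx", slowdown(bsf, tokenize)));    // 21.6x
        System.out.println(String.format(Locale.ROOT, "direct: %.1fx", slowdown(direct, tokenize))); // 22.9x
    }
}
```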

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766750#action_12766750
 ] 

Ashutosh Chauhan commented on PIG-928:
--

30x is indeed too slow. But between BSF and direct bindings, I would have 
expected direct bindings to be more performant, since BSF adds an extra layer 
of translation. Doesn't it?

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766757#action_12766757
 ] 

Alan Gates commented on PIG-928:


I expected to see the direct bindings to be faster as well, but the tests 
didn't show that.  In the code contributed by Kishore the type translation was 
done the same regardless of the bindings used.  Perhaps there would be a more 
efficient way to do the type translation for direct bindings.  

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766763#action_12766763
 ] 

Ashutosh Chauhan commented on PIG-928:
--

One good lesson from this test, though, is that BSF is not slower than direct 
bindings (this needs additional verification). So this feature could be 
implemented with a lot less code and complexity using BSF, as opposed to using 
different direct bindings for different languages.  On the other hand, the only 
useful language BSF currently supports is Ruby. I'm not sure how many people 
using Pig will also be interested in Groovy, JavaScript, etc. (other languages 
supported by BSF).

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766769#action_12766769
 ] 

Alan Gates commented on PIG-928:


jython was the one I was assuming people would want.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766771#action_12766771
 ] 

Ashutosh Chauhan commented on PIG-1025:
---

Useful feature. Patch looks straightforward. Your test case only checks 
whether the statement parses correctly; I would suggest also testing whether 
the priority is actually set in the jobconf.
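A test along the lines suggested here would assert on the resulting configuration rather than only on successful parsing. The sketch below is an assumption-laden illustration, not Pig or Hadoop code: java.util.Properties stands in for the jobconf, the key name "mapred.job.priority" is assumed for illustration, and applyPriority is a hypothetical placeholder for whatever code handles the set statement.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: Properties stands in for Hadoop's JobConf.
class PriorityConfigCheck {
    // Assumed key name, used here only for illustration.
    static final String PRIORITY_KEY = "mapred.job.priority";

    // Placeholder for the code that handles "set job.priority '...'".
    static void applyPriority(Properties conf, String value) {
        conf.setProperty(PRIORITY_KEY, value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        applyPriority(conf, "high");
        // Check the effect on the configuration, not just that parsing succeeded.
        if (!"HIGH".equals(conf.getProperty(PRIORITY_KEY))) {
            throw new AssertionError("priority was not stored in the configuration");
        }
        System.out.println("priority stored: " + conf.getProperty(PRIORITY_KEY));
    }
}
```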

 Should be able to set job priority through Pig Latin
 

 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1025.patch


 Currently users can set the job name through Pig Latin by saying
 set job.name 'my job name'
 The ability to set the priority would also be nice, and the patch should be 
 small.  The goal is to be able to say
 set job.priority 'high'
 and throw a JobCreationException in the JobControlCompiler if the priority is 
 not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
 very_low, low, normal, high, very_high.   Case insensitivity makes this a 
 little nicer.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766774#action_12766774
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Right, I overlooked it. I think Ruby and Python are the two most widely used 
scripting languages and both are supported by BSF. So, comparing BSF with 
direct bindings:
1) Performance: initial tests show them almost equal.
2) Support for multiple languages.
3) Ease of implementation.
To me, BSF seems to be the way to go for this, at least for the first cut. 
Implementing this feature using BSF will allow us to expose it to users 
quickly, and if many people are using it and finding one particular language to 
be slow, then we can explore language bindings for that particular language. 
Thoughts?

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Attachments: package.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766787#action_12766787
 ] 

Hadoop QA commented on PIG-1016:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422303/PIG-1016.patch
  against trunk revision 826047.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/console

This message is automatically generated.

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch


 Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
 0.4.0 because of a misconfiguration in the parser, whereas almost all 
 documentation states that the value of a map can be any type.
 I've attached a patch that allows us to read in complex objects as values, as 
 documented. I've done simple verification of loading maps with tuple/map 
 values and writing them back out using LOAD and STORE. All seems to work fine.




Build failed in Hudson: Pig-trunk #592

2009-10-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/592/changes

Changes:

[gates] PIG-993 Ability to drop a column group in a table.

[gates] PIG-858: Order By followed by replicated join fails while compiling 
MR-plan from physical plan.

[daijy] PIG-1020: Include an ant target to build pig.jar without hadoop 
libraries

--
[...truncated 2547 lines...]

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: 
org.apache.pig#Pig;2009-10-17_00-27-30
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 59ms :: artifacts dl 4ms
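---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|     buildJar     |   4   |   0   |   0   |   0   ||   4   |   0   |
---------------------------------------------------------------------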
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/4ms)

buildJar:
 [echo] svnString 826142
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/pig-2009-10-17_00-27-30.jar
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk

jarWithOutSvn:

findbugs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] The following classes needed for analysis were missing:
 [findbugs]   com.jcraft.jsch.SocketFactory
 [findbugs]   com.jcraft.jsch.Logger
 [findbugs]   jline.Completor
 [findbugs]   com.jcraft.jsch.Session
 [findbugs]   com.jcraft.jsch.HostKeyRepository
 [findbugs]   com.jcraft.jsch.JSch
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   jline.ConsoleReaderInputStream
 [findbugs]   com.jcraft.jsch.HostKey
 [findbugs]   jline.ConsoleReader
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   jline.History
 [findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
 [findbugs]   com.jcraft.jsch.JSchException
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs] Warnings generated: 392
 [findbugs] Missing classes: 16
 [findbugs] Calculating exit code...
 [findbugs] Setting 'missing class' flag (2)
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 3
 [findbugs] Java Result: 3
 [findbugs] Classes needed for analysis were missing
 [findbugs] Output saved to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 [xslt] Processing 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.html
 [xslt] Loading stylesheet 
/homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 49 seconds
+ mv build/pig-2009-10-17_00-27-30.tar.gz 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/test/findbugs 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/docs/api 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src-gen
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src/docs/build
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/test/org/apache/pig/test/utils/dotGraph/parser

BUILD SUCCESSFUL
Total time: 0 seconds
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant 
-Dtest.junit.output.format=xml -Dtest.output=yes 
-Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true 
-Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test 
generate-clover-reports
Buildfile: build.xml

clover.setup:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db
[clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732)
[clover-setup] Loaded from: 
/homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar
[clover-setup] Clover: Open Source License registered to Apache Software 
Foundation.
[clover-setup] Clover is enabled with 

[jira] Updated: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-10-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-944:
-

Assignee: Ying He  (was: Yan Zhou)
  Status: Patch Available  (was: Open)

 Zebra schema is taken from Pig through TableStorer's construct
 --

 Key: PIG-944
 URL: https://issues.apache.org/jira/browse/PIG-944
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Assignee: Ying He
 Fix For: 0.6.0

 Attachments: SchemaConversion.patch, SchemaConversion.patch


 The schema should be taken from StoreConfig in the 
 TableOutputFormat.checkOutputSpecs method, because the information is dynamic 
 in Pig's execution engine and should not be passed as a static argument to 
 the constructor.




[jira] Commented: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766838#action_12766838
 ] 

Kevin Weil commented on PIG-1025:
-

I very much agree that the test case is weak.  I followed the model for the 
rest of the grunt tests, which are similarly weak :) 

 Should be able to set job priority through Pig Latin
 

 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1025.patch


 Currently users can set the job name through Pig Latin by saying
 set job.name 'my job name'
 The ability to set the priority would also be nice, and the patch should be 
 small.  The goal is to be able to say
 set job.priority 'high'
 and throw a JobCreationException in the JobControlCompiler if the priority is 
 not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
 very_low, low, normal, high, very_high.   Case insensitivity makes this a 
 little nicer.
