Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-27 Thread Mathias Herberts

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/
---

(Updated June 28, 2012, 6:25 a.m.)


Review request for pig, Julien Le Dem and Jonathan Coveney.


Changes
---

Added support for non-static methods (with associated unit test). Modified 
DataBagIterator to throw RuntimeException when it encounters an exception in 
pigToGroovy.


Description
---

Adds support for Groovy UDFs in Pig.


This addresses bug PIG-2763.
https://issues.apache.org/jira/browse/PIG-2763


Diffs (updated)
---

  /trunk/ivy.xml 1353307 
  /trunk/ivy/libraries.properties 1353307 
  /trunk/src/org/apache/pig/scripting/ScriptEngine.java 1354285 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorAccumulate.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorCleanup.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorGetValue.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicFinal.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicInitial.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicIntermed.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAlgebraicEvalFunc.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFuncObject.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/OutputSchemaFunction.java 
PRE-CREATION 
  /trunk/test/org/apache/pig/test/TestUDFGroovy.java PRE-CREATION 
  /trunk/test/unit-tests 1353307 

Diff: https://reviews.apache.org/r/5591/diff/


Testing
---


Thanks,

Mathias Herberts



Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-27 Thread Mathias Herberts


> On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
> > /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java, line 195
> > 
> >
> > It seems weird to allow Groovy static methods as UDFs. I suppose there 
> > is no harm in it, but given that in Pig all UDFs imply that they are 
> > instantiated, it is a potentially strong departure from how people 
> > typically think about UDFs.
> 
> Mathias Herberts wrote:
> As stated earlier, a Groovy class should really be seen as a container 
> for multiple UDFs, not as containing a single one.
> 
> Non-static methods are needed for Accumulator UDFs; all other UDFs 
> maintain no state, hence the use of static methods. I guess non-static methods 
> could be supported too.
> 
> Julien Le Dem wrote:
> For stateless methods that don't need initialization, static methods are 
> easier. We should allow both.

I added support for both in PIG-2763-3.patch.
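
As a rough illustration of what "allowing both" can mean for a dispatcher, here is a minimal Java sketch. It is not the code from PIG-2763-3.patch: `SampleUdfs` and `UdfInvoker` are hypothetical names, and the only assumption is that the container class is instantiated when, and only when, the chosen method is non-static.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical container class: several UDF-style methods live side by side.
class SampleUdfs {
    private int calls = 0;

    // Stateless, so a static method is the simplest form.
    public static int square(int x) {
        return x * x;
    }

    // Keeps per-instance state, so it has to be an instance method.
    public int countCalls() {
        return ++calls;
    }
}

// Hypothetical dispatcher that accepts both static and instance methods.
class UdfInvoker {
    private final Method method;
    private final Object target; // stays null for static methods

    UdfInvoker(Class<?> container, String name, Class<?>... paramTypes) {
        Method m;
        Object t;
        try {
            m = container.getMethod(name, paramTypes);
            // Only instantiate the container when the method needs an instance.
            t = Modifier.isStatic(m.getModifiers())
                    ? null
                    : container.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot bind " + name, e);
        }
        this.method = m;
        this.target = t;
    }

    Object call(Object... args) {
        try {
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call failed: " + method.getName(), e);
        }
    }
}
```

With this shape, `new UdfInvoker(SampleUdfs.class, "square", int.class)` never creates an instance, while an invoker bound to `countCalls` keeps one instance for its whole lifetime.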


> On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
> > /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java, line 271
> > 
> >
> > I don't know if it should throw away the exception like this.
> 
> Mathias Herberts wrote:
> What would you recommend? Throwing RuntimeException?
> 
> Julien Le Dem wrote:
> Yes, and chain the cause.

Done in PIG-2763-3.patch
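
For readers unfamiliar with the pattern being asked for, the following Java sketch shows an iterator that rethrows a checked failure as a RuntimeException with the cause chained. It is an assumption-laden illustration, not the DataBagIterator from the patch: `CheckedConverter` and `ConvertingIterator` are hypothetical names standing in for the pigToGroovy conversion and its caller.

```java
import java.util.Iterator;

// Hypothetical stand-in for a conversion step that may fail with a checked exception.
interface CheckedConverter<A, B> {
    B convert(A value) throws Exception;
}

// Iterator.next() cannot declare checked exceptions, so a failure is
// rethrown unchecked with the original exception kept as the cause.
class ConvertingIterator<A, B> implements Iterator<B> {
    private final Iterator<A> source;
    private final CheckedConverter<A, B> converter;

    ConvertingIterator(Iterator<A> source, CheckedConverter<A, B> converter) {
        this.source = source;
        this.converter = converter;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public B next() {
        A raw = source.next();
        try {
            return converter.convert(raw);
        } catch (Exception e) {
            // Chaining the cause keeps the original stack trace available.
            throw new RuntimeException("conversion failed for " + raw, e);
        }
    }
}
```

A caller that catches the RuntimeException can still reach the original failure through `getCause()`.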


- Mathias


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/#review8628
---





[jira] [Created] (PIG-2776) Extending merge join to work with left outer joins

2012-06-27 Thread Aneesh Sharma (JIRA)
Aneesh Sharma created PIG-2776:
--

 Summary: Extending merge join to work with left outer joins
 Key: PIG-2776
 URL: https://issues.apache.org/jira/browse/PIG-2776
 Project: Pig
  Issue Type: Improvement
Reporter: Aneesh Sharma


The current merge join implementation only allows for an inner join, while the 
idea seems to apply equally well to a left outer join.
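
To make the idea concrete, here is a minimal Java sketch of a left outer merge join over two inputs that are already sorted by key. It is only an in-memory illustration under that assumption; `Row` and `LeftOuterMergeJoin` are hypothetical names and nothing here reflects Pig's actual merge join operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type: an integer join key plus a payload.
class Row {
    final int key;
    final String value;

    Row(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

class LeftOuterMergeJoin {
    // Both inputs must be sorted by key in ascending order.
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int r = 0;
        for (Row l : left) {
            // Skip right rows whose key is smaller than the current left key.
            while (r < right.size() && right.get(r).key < l.key) {
                r++;
            }
            // Scan the matching run without moving r, so that the next
            // left row with the same key sees the same right rows.
            boolean matched = false;
            for (int scan = r; scan < right.size() && right.get(scan).key == l.key; scan++) {
                out.add(l.key + ":" + l.value + "," + right.get(scan).value);
                matched = true;
            }
            // The only difference from the inner join: keep unmatched left rows.
            if (!matched) {
                out.add(l.key + ":" + l.value + ",null");
            }
        }
        return out;
    }
}
```

The inner join is the same loop without the final `if`, which is why the extension looks small at this level of abstraction.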

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-27 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402790#comment-13402790
 ] 

Jie Li commented on PIG-2661:
-

Sure, will post some numbers tomorrow.

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Jie Li
>Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





[jira] [Updated] (PIG-483) PERFORMANCE: different strategies for large and small order bys

2012-06-27 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-483:
---

Attachment: PIG-483.0.patch

Attached a patch that introduces SkipJob. The output of an order-by on a small 
dataset would look like:

Job Stats (time in seconds):
JobId           Alias   Feature     Outputs
job_local_0001  a       MAP_ONLY
job_local_0002  b       ORDER_BY    file:/tmp/temp-107984693/tmp2050404975,
skipped_job     b       SAMPLER

Input(s):
Successfully read records from: "file:///Users/JieLi/git/pig-git/1.txt"

Output(s):
Successfully stored records in: "file:/tmp/temp-107984693/tmp2050404975"

Job DAG:
job_local_0001  ->  skipped_job,
skipped_job ->  job_local_0002,
job_local_0002

> PERFORMANCE: different strategies for large and small order bys
> ---
>
> Key: PIG-483
> URL: https://issues.apache.org/jira/browse/PIG-483
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
>  Labels: gsoc2011, performance
> Attachments: PIG-483.0.patch
>
>
> Currently pig always does a multi-pass order by where it first determines a 
> distribution for the keys and then orders in a second pass.  This avoids the 
> necessity of having a single reducer.  However, in cases where the data is 
> small enough to fit into a single reducer, this is inefficient.  For small 
> data sets it would be good to realize the small size of the set and do the 
> order by in a single pass with a single reducer.
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
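
The trade-off described in the issue can be sketched in Java as follows. This is a hypothetical illustration, not the SkipJob patch: the size threshold, the enum and the helper names are all assumptions, and the quantile logic only mimics the idea of "determine a distribution for the keys, then order in a second pass".

```java
// Hypothetical names for the two strategies discussed in the issue.
enum OrderByStrategy { SINGLE_REDUCER, SAMPLE_THEN_RANGE_PARTITION }

class OrderByPlanner {
    // Small inputs fit in one reducer, so the sampling pass can be skipped.
    static OrderByStrategy choose(long inputBytes, long bytesPerReducer) {
        return inputBytes <= bytesPerReducer
                ? OrderByStrategy.SINGLE_REDUCER
                : OrderByStrategy.SAMPLE_THEN_RANGE_PARTITION;
    }

    // First pass of the large case: pick reducers-1 quantile boundaries
    // from a sorted sample of the keys.
    static int[] boundaries(int[] sortedSample, int reducers) {
        int[] result = new int[reducers - 1];
        for (int i = 1; i < reducers; i++) {
            result[i - 1] = sortedSample[i * sortedSample.length / reducers];
        }
        return result;
    }

    // Second pass: route each key to the reducer that owns its key range.
    static int partitionFor(int key, int[] boundaries) {
        int p = 0;
        while (p < boundaries.length && key >= boundaries[p]) {
            p++;
        }
        return p;
    }
}
```

Because each reducer owns a contiguous key range, concatenating the reducer outputs in partition order yields a total order without a single-reducer bottleneck.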





[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-27 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated PIG-2766:


Attachment: PIG-2766_6.patch

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch, 
> PIG-2766_4.patch, PIG-2766_5.patch, PIG-2766_6.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), the user needs to 
> register a bunch of jars and set a couple of configuration parameters. For a 
> novice user, it is non-trivial to find all the relevant jars and config 
> params. We should have better integration between Pig and HCat by 
> pre-configuring Pig to load all these jars and configs.





Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-27 Thread Julien Le Dem


> On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
> > /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java, line 195
> > 
> >
> > It seems weird to allow Groovy static methods as UDFs. I suppose there 
> > is no harm in it, but given that in Pig all UDFs imply that they are 
> > instantiated, it is a potentially strong departure from how people 
> > typically think about UDFs.
> 
> Mathias Herberts wrote:
> As stated earlier, a Groovy class should really be seen as a container 
> for multiple UDFs, not as containing a single one.
> 
> Non-static methods are needed for Accumulator UDFs; all other UDFs 
> maintain no state, hence the use of static methods. I guess non-static methods 
> could be supported too.

For stateless methods that don't need initialization, static methods are 
easier. We should allow both.


> On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
> > /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java, line 271
> > 
> >
> > I don't know if it should throw away the exception like this.
> 
> Mathias Herberts wrote:
> What would you recommend? Throwing RuntimeException?

Yes, and chain the cause.


- Julien


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/#review8628
---





[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-27 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated PIG-2766:


Attachment: PIG-2766_5.patch

Added handling for PIG_OPTS route for adding jars.

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch, 
> PIG-2766_4.patch, PIG-2766_5.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), the user needs to 
> register a bunch of jars and set a couple of configuration parameters. For a 
> novice user, it is non-trivial to find all the relevant jars and config 
> params. We should have better integration between Pig and HCat by 
> pre-configuring Pig to load all these jars and configs.





[jira] [Resolved] (PIG-2750) add artifacts to the ivy.xml for other jars Pig generates

2012-06-27 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2750.


   Resolution: Fixed
Fix Version/s: 0.11
 Assignee: Julien Le Dem

> add artifacts to the ivy.xml for other jars Pig generates
> -
>
> Key: PIG-2750
> URL: https://issues.apache.org/jira/browse/PIG-2750
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2750.patch
>
>
> the following artifacts are generated and should be declared to allow using 
> ivy's features
> 
> 
> 
> 
> 





[jira] [Commented] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402722#comment-13402722
 ] 

Thejas M Nair commented on PIG-2774:


If the left-side relation's tuples for a value of the join key are serialized 
to disk, then for every value of the join key in the right relation, it will 
hit the disk. That will perform very poorly.
Looks like what we need is something like a merge-skew join, i.e., similar to 
skew join: sample the left side, and partition the splits for map tasks based 
on the sampled information.

> Fix merge join to work with many duplicate left keys
> 
>
> Key: PIG-2774
> URL: https://issues.apache.org/jira/browse/PIG-2774
> Project: Pig
>  Issue Type: Bug
>Reporter: Aneesh Sharma
>
> A merge join can throw an OOM error if the number of duplicate left tuples is 
> large, as it accumulates all of them in memory. There are two possible ways 
> around this problem:
> 1. Serialize the accumulated tuples to disk if they exceed a certain size.
> 2. Spit out join output periodically, and re-seek on the right hand side 
> index.





[jira] [Resolved] (PIG-2775) Register jar does not goes to classpath in some cases

2012-06-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2775.
-

  Resolution: Fixed
Assignee: Daniel Dai
Hadoop Flags: Reviewed

Patch committed to 0.9/0.10/trunk. Thanks Dmitriy for reviewing!

> Register jar does not goes to classpath in some cases
> -
>
> Key: PIG-2775
> URL: https://issues.apache.org/jira/browse/PIG-2775
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2, 0.10.0, 0.11
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2775-1.patch
>
>
> In PIG-2532, we fixed this issue on the load side, but we still have the 
> issue on the store side; see 
> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201206.mbox/%3CCAB2zpW9MY6t-NMdOJ-%2B0ezt0NJOFrxsJ7kc%3DR4WJh%2Bn-9bDW2g%40mail.gmail.com%3E





[jira] [Commented] (PIG-2775) Register jar does not goes to classpath in some cases

2012-06-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402713#comment-13402713
 ] 

Dmitriy V. Ryaboy commented on PIG-2775:


+1

> Register jar does not goes to classpath in some cases
> -
>
> Key: PIG-2775
> URL: https://issues.apache.org/jira/browse/PIG-2775
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2, 0.10.0, 0.11
>Reporter: Daniel Dai
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2775-1.patch
>
>
> In PIG-2532, we fixed this issue on the load side, but we still have the 
> issue on the store side; see 
> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201206.mbox/%3CCAB2zpW9MY6t-NMdOJ-%2B0ezt0NJOFrxsJ7kc%3DR4WJh%2Bn-9bDW2g%40mail.gmail.com%3E





[jira] [Commented] (PIG-2697) pretty print schema

2012-06-27 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402709#comment-13402709
 ] 

Jonathan Coveney commented on PIG-2697:
---

Fix is in. Thanks Julien.

> pretty print schema
> ---
>
> Key: PIG-2697
> URL: https://issues.apache.org/jira/browse/PIG-2697
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.11
>
> Attachments: PIG-2697-fix-0.patch, PIG-2697.patch, PIG-2697.patch
>
>
> Currently 'describe' dumps the schema in one line. If you have a long or 
> complicated schema, it is pretty much impossible to figure out how the schema 
> looks or what the fields are.
> Will provide an example below.
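
As a rough idea of what pretty printing a one-line schema string could look like, here is a minimal Java sketch. It is not the PIG-2697 patch: `SchemaPrettyPrinter` is a hypothetical helper that works on the text of the schema only, breaking lines at brackets and commas, and the sample schema string in the usage note is made up.

```java
class SchemaPrettyPrinter {
    private static final String INDENT = "  ";

    // Re-indents a one-line schema string: one field per line,
    // nested tuples and bags indented one level deeper.
    static String pretty(String schema) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean lineStart = true;
        for (char c : schema.toCharArray()) {
            if (c == '(' || c == '{') {
                out.append(c);
                depth++;
                newline(out, depth);
                lineStart = true;
            } else if (c == ')' || c == '}') {
                depth--;
                newline(out, depth);
                out.append(c);
                lineStart = false;
            } else if (c == ',') {
                out.append(c);
                newline(out, depth);
                lineStart = true;
            } else if (c == ' ' && lineStart) {
                // Drop spaces that would otherwise lead a fresh line.
            } else {
                out.append(c);
                lineStart = false;
            }
        }
        return out.toString();
    }

    private static void newline(StringBuilder out, int depth) {
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append(INDENT);
        }
    }
}
```

For the made-up input `a: {x: int,y: (z: long)}` this puts `x: int,` and `y: (` on their own lines at one level of indentation and `z: long` one level deeper.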





[jira] [Commented] (PIG-2750) add artifacts to the ivy.xml for other jars Pig generates

2012-06-27 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402705#comment-13402705
 ] 

Jonathan Coveney commented on PIG-2750:
---

+1

> add artifacts to the ivy.xml for other jars Pig generates
> -
>
> Key: PIG-2750
> URL: https://issues.apache.org/jira/browse/PIG-2750
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Attachments: PIG-2750.patch
>
>
> the following artifacts are generated and should be declared to allow using 
> ivy's features
> 
> 
> 
> 
> 





[jira] [Updated] (PIG-2775) Register jar does not goes to classpath in some cases

2012-06-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2775:


Attachment: PIG-2775-1.patch

> Register jar does not goes to classpath in some cases
> -
>
> Key: PIG-2775
> URL: https://issues.apache.org/jira/browse/PIG-2775
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.2, 0.10.0, 0.11
>Reporter: Daniel Dai
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2775-1.patch
>
>
> In PIG-2532, we fixed this issue on the load side, but we still have the 
> issue on the store side; see 
> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201206.mbox/%3CCAB2zpW9MY6t-NMdOJ-%2B0ezt0NJOFrxsJ7kc%3DR4WJh%2Bn-9bDW2g%40mail.gmail.com%3E





[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402696#comment-13402696
 ] 

Dmitriy V. Ryaboy commented on PIG-2661:


I am easily convinced by numbers :)

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Jie Li
>Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





[jira] [Created] (PIG-2774) Fix merge join to work with many duplicate left keys

2012-06-27 Thread Aneesh Sharma (JIRA)
Aneesh Sharma created PIG-2774:
--

 Summary: Fix merge join to work with many duplicate left keys
 Key: PIG-2774
 URL: https://issues.apache.org/jira/browse/PIG-2774
 Project: Pig
  Issue Type: Bug
Reporter: Aneesh Sharma


A merge join can throw an OOM error if the number of duplicate left tuples is 
large, as it accumulates all of them in memory. There are two possible ways 
around this problem:
1. Serialize the accumulated tuples to disk if they exceed a certain size.
2. Spit out join output periodically, and re-seek on the right hand side index.
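
The second option can be sketched in Java as below. This is an in-memory illustration under stated assumptions, not Pig's merge join: `KeyedTuple` and `BoundedMergeJoin` are hypothetical names, both inputs are assumed sorted by key, and a remembered list position stands in for the seek on the right-hand side index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tuple type: an integer join key plus a payload.
class KeyedTuple {
    final int key;
    final String value;

    KeyedTuple(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

class BoundedMergeJoin {
    // Inner merge join that never buffers more than maxBuffered left tuples.
    static List<String> join(List<KeyedTuple> left, List<KeyedTuple> right, int maxBuffered) {
        if (maxBuffered < 1) {
            throw new IllegalArgumentException("maxBuffered must be at least 1");
        }
        List<String> out = new ArrayList<>();
        List<KeyedTuple> buffer = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.size() && r < right.size()) {
            int lk = left.get(l).key;
            int rk = right.get(r).key;
            if (lk < rk) {
                l++;
            } else if (lk > rk) {
                r++;
            } else {
                int runStart = r; // where the right-side run for this key begins
                while (l < left.size() && left.get(l).key == lk) {
                    // Fill the buffer with the next batch of duplicate left tuples.
                    buffer.clear();
                    while (l < left.size() && left.get(l).key == lk && buffer.size() < maxBuffered) {
                        buffer.add(left.get(l++));
                    }
                    // Re-seek to the start of the right run and emit this batch.
                    for (r = runStart; r < right.size() && right.get(r).key == lk; r++) {
                        for (KeyedTuple lt : buffer) {
                            out.add(lt.value + "-" + right.get(r).value);
                        }
                    }
                }
            }
        }
        return out;
    }
}
```

The price of the bounded buffer is that the right-side run is read once per batch instead of once per key, so a smaller `maxBuffered` trades memory for extra reads.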





[jira] [Created] (PIG-2775) Register jar does not goes to classpath in some cases

2012-06-27 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-2775:
---

 Summary: Register jar does not goes to classpath in some cases
 Key: PIG-2775
 URL: https://issues.apache.org/jira/browse/PIG-2775
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0, 0.9.2, 0.11
Reporter: Daniel Dai
 Fix For: 0.9.3, 0.11, 0.10.1


In PIG-2532, we fixed this issue on the load side, but we still have the 
issue on the store side; see 
http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201206.mbox/%3CCAB2zpW9MY6t-NMdOJ-%2B0ezt0NJOFrxsJ7kc%3DR4WJh%2Bn-9bDW2g%40mail.gmail.com%3E





[jira] [Created] (PIG-2773) Doing a merge join requires setting pig.noSplitCombination=true which should always be set internally for a merge join

2012-06-27 Thread Aneesh Sharma (JIRA)
Aneesh Sharma created PIG-2773:
--

 Summary: Doing a merge join requires setting 
pig.noSplitCombination=true which should always be set internally for a merge 
join
 Key: PIG-2773
 URL: https://issues.apache.org/jira/browse/PIG-2773
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Aneesh Sharma


The merge join requires setting pig.noSplitCombination=true, as otherwise the 
tuples get out of order. This should be done internally.
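
A minimal Java sketch of "done internally" might look like this. Only the property name comes from the issue; `MergeJoinConfig`, its `configure` hook and the `planHasMergeJoin` flag are hypothetical and do not correspond to any existing Pig class.

```java
import java.util.Properties;

class MergeJoinConfig {
    static final String NO_SPLIT_COMBINATION = "pig.noSplitCombination";

    // Hypothetical hook, assumed to run while the job for a plan is configured.
    static void configure(Properties props, boolean planHasMergeJoin) {
        if (planHasMergeJoin) {
            // Combining splits would break the sort order the merge join relies on.
            props.setProperty(NO_SPLIT_COMBINATION, "true");
        }
    }
}
```

Plans without a merge join keep whatever value the user has set, so split combination stays available everywhere else.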





[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-27 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402688#comment-13402688
 ] 

Jie Li commented on PIG-2661:
-

In this case, it'll read up to 100 bags, which can be a lot of data?

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Jie Li
>Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





[jira] [Commented] (PIG-2697) pretty print schema

2012-06-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402678#comment-13402678
 ] 

Julien Le Dem commented on PIG-2697:


+1

> pretty print schema
> ---
>
> Key: PIG-2697
> URL: https://issues.apache.org/jira/browse/PIG-2697
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.11
>
> Attachments: PIG-2697-fix-0.patch, PIG-2697.patch, PIG-2697.patch
>
>
> Currently 'describe' dumps the schema in one line. If you have a long or 
> complicated schema, it is pretty much impossible to figure out how the schema 
> looks or what the fields are.
> Will provide an example below.





[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402676#comment-13402676
 ] 

Dmitriy V. Ryaboy commented on PIG-2661:


Sampler is *fast* though. Like, really fast. It doesn't read most of the data 
it's sampling.

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Jie Li
>Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





[jira] [Updated] (PIG-2697) pretty print schema

2012-06-27 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2697:
--

Attachment: PIG-2697-fix-0.patch

Good call Julien. Not sure how this happened as I ran the tests, but *shrug*. 
This fixes it.

> pretty print schema
> ---
>
> Key: PIG-2697
> URL: https://issues.apache.org/jira/browse/PIG-2697
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.11
>
> Attachments: PIG-2697-fix-0.patch, PIG-2697.patch, PIG-2697.patch
>
>
> Currently 'describe' dumps the schema in one line. If you have a long or 
> complicated schema, it is pretty much impossible to figure out how the schema 
> looks or what the fields are.
> Will provide an example below.





[jira] [Updated] (PIG-2750) add artifacts to the ivy.xml for other jars Pig generates

2012-06-27 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2750:
---

Patch Info: Patch Available

> add artifacts to the ivy.xml for other jars Pig generates
> -
>
> Key: PIG-2750
> URL: https://issues.apache.org/jira/browse/PIG-2750
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Attachments: PIG-2750.patch
>
>
> the following artifacts are generated and should be declared to allow using 
> ivy's features
> 
> 
> 
> 
> 





[jira] [Updated] (PIG-2750) add artifacts to the ivy.xml for other jars Pig generates

2012-06-27 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2750:
---

Attachment: PIG-2750.patch

PIG-2750.patch adds the artifacts to the ivy file

> add artifacts to the ivy.xml for other jars Pig generates
> -
>
> Key: PIG-2750
> URL: https://issues.apache.org/jira/browse/PIG-2750
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Attachments: PIG-2750.patch
>
>
> the following artifacts are generated and should be declared to allow using 
> ivy's features
> 
> 
> 
> 
> 





[jira] [Created] (PIG-2772) Convert the skew join to the normal join for small dataset

2012-06-27 Thread Jie Li (JIRA)
Jie Li created PIG-2772:
---

 Summary: Convert the skew join to the normal join for small dataset
 Key: PIG-2772
 URL: https://issues.apache.org/jira/browse/PIG-2772
 Project: Pig
  Issue Type: Bug
Reporter: Jie Li


Similar to PIG-483, we want to avoid the unnecessary sampling job for small 
datasets.

Is there an easy way to do the conversion at runtime?





[jira] [Commented] (PIG-483) PERFORMANCE: different strategies for large and small order bys

2012-06-27 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402661#comment-13402661
 ] 

Jie Li commented on PIG-483:


For the skew join, if the partition table turns out to be small, then we can 
convert it to a normal join, which doesn't need the sampler either. I will open 
another JIRA for that.

> PERFORMANCE: different strategies for large and small order bys
> ---
>
> Key: PIG-483
> URL: https://issues.apache.org/jira/browse/PIG-483
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
>  Labels: gsoc2011, performance
>
> Currently pig always does a multi-pass order by where it first determines a 
> distribution for the keys and then orders in a second pass.  This avoids the 
> necessity of having a single reducer.  However, in cases where the data is 
> small enough to fit into a single reducer, this is inefficient.  For small 
> data sets it would be good to realize the small size of the set and do the 
> order by in a single pass with a single reducer.
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
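
The two strategies in the description above can be sketched roughly as 
follows. This is Python, not Pig's Java, and only an illustration: the name 
order_by, the small_threshold parameter, and the use of a plain record count 
as the size test are assumptions made for this example, not Pig's actual 
sampler or partitioner.

```python
import bisect
import random


def order_by(records, num_reducers, small_threshold, sample_size=100, seed=0):
    """Sort records, picking a strategy based on input size (hypothetical sketch)."""
    if len(records) <= small_threshold:
        # Small input: one pass, one "reducer" sorts everything.
        return sorted(records)

    # Pass 1: sample the keys to estimate their distribution.
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    boundaries = [sample[i * len(sample) // num_reducers]
                  for i in range(1, num_reducers)]

    # Pass 2: range-partition on the boundaries, sort each partition,
    # and concatenate the partitions in order.
    partitions = [[] for _ in range(num_reducers)]
    for record in records:
        partitions[bisect.bisect_right(boundaries, record)].append(record)

    ordered = []
    for partition in partitions:
        ordered.extend(sorted(partition))
    return ordered
```

Both branches return the same ordering. The difference is that the small-input 
branch skips the sampling pass entirely, which is the saving the issue asks 
for.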





[jira] [Commented] (PIG-2697) pretty print schema

2012-06-27 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402650#comment-13402650
 ] 

Jonathan Coveney commented on PIG-2697:
---

Julien,

I was not able to reproduce this. Can you try to give me specifics on how to 
reproduce?

> pretty print schema
> ---
>
> Key: PIG-2697
> URL: https://issues.apache.org/jira/browse/PIG-2697
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.11
>
> Attachments: PIG-2697.patch, PIG-2697.patch
>
>
> Currently 'describe' dumps the schema in one line. If you have a long or 
> complicated schema, it is pretty much impossible to figure out how the schema 
> looks or what the fields are.
> Will provide an example below.





[jira] [Commented] (PIG-2681) TestDriverPig.countStores() does not correctly count the number of stores for pig scripts using variables for the alias

2012-06-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402621#comment-13402621
 ] 

Daniel Dai commented on PIG-2681:
-

The fix only works if the macro is invoked once. Maybe a better and more 
elegant solution is needed.

> TestDriverPig.countStores() does not correctly count the number of stores for 
> pig scripts using variables for the alias
> ---
>
> Key: PIG-2681
> URL: https://issues.apache.org/jira/browse/PIG-2681
> Project: Pig
>  Issue Type: Test
>  Components: e2e harness
>Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0
>Reporter: Araceli Henley
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2681.patch
>
>
> For Pig macros where the out parameter is referenced in a store statement, 
> TestDriverPig.countStores() does not correctly count the number of stores.
> For example, the store will not be counted in:
> define myMacro(in1,in2) returns A {
>  A  = load '$in1' using PigStorage('$delimeter') as (intnum1000: int,id: 
> int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
> float,doublenum: double);
>store $A into '$out';
> }
>  countStores() matches with:
>  $count += $q[$i] =~ /store\s+[a-zA-Z][a-zA-Z0-9_]*\s+into/i;
> Since the alias starts with the special character "$", the regex does not 
> match it, the store is not counted, and the test fails.
> The regex needs to change to:
>$count += $q[$i] =~ /store\s+(\$)?[a-zA-Z][a-zA-Z0-9_]*\s+into/i;
> I'll submit a patch shortly.
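
The regex change proposed above can be checked with a small sketch. It uses 
Python's re module rather than the Perl harness, and count_stores is a 
hypothetical helper written for this example, not the real TestDriverPig code. 
The sample script is a shortened variant of the macro in the description.

```python
import re

# The two patterns from the issue description, translated from Perl to Python.
OLD_PATTERN = re.compile(r"store\s+[a-zA-Z][a-zA-Z0-9_]*\s+into", re.IGNORECASE)
NEW_PATTERN = re.compile(r"store\s+(\$)?[a-zA-Z][a-zA-Z0-9_]*\s+into", re.IGNORECASE)


def count_stores(lines, pattern):
    """Count the lines that contain a store statement (one per matching line)."""
    return sum(1 for line in lines if pattern.search(line))


SCRIPT = [
    "define myMacro(in1,in2) returns A {",
    "    A = load '$in1' using PigStorage('$delimeter') as (intnum: int);",
    "    store $A into '$out';",
    "}",
    "B = myMacro('x', 'y');",
    "C = myMacro('p', 'q');",
]

print(count_stores(SCRIPT, OLD_PATTERN))
print(count_stores(SCRIPT, NEW_PATTERN))
```

The old pattern finds no store and the new one finds one. The count stays at 
one even though this script invokes the macro twice, which may be what the 
comment about the fix only working for a single invocation refers to; that 
reading is an assumption.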





[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-27 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402619#comment-13402619
 ] 

Jie Li commented on PIG-2661:
-

{code}
a = load 'in' as (g:{});
b = foreach a generate flatten(g);
c = order b by $1;
dump c;
{code}

For this query, if we merge the flatten into the sample, then the sampler will 
read 100 bags and flatten them into a possibly unbounded number of records, all 
of which will flow through the single reducer of the sampling job.
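
To make the concern concrete, here is a rough sketch in Python. The bag size 
of 10,000 is an invented number, and flatten is a stand-in for FLATTEN applied 
to a bag, not the real operator.

```python
SAMPLED_BAGS = 100        # number of bags the sampler reads, per the comment above
RECORDS_PER_BAG = 10_000  # assumed for illustration; real bags have no fixed bound


def flatten(bags):
    """Yield every record of every bag, like FLATTEN on a bag column."""
    for bag in bags:
        yield from bag


def records_reaching_reducer(num_bags, records_per_bag):
    """Count the records the single sampling reducer would receive."""
    bags = (range(records_per_bag) for _ in range(num_bags))
    return sum(1 for _ in flatten(bags))


print(records_reaching_reducer(SAMPLED_BAGS, RECORDS_PER_BAG))
```

The sample stays at 100 inputs, but the reducer sees 100 times the bag size, 
so the cost of the sampling job scales with bag size rather than with the 
sample size.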

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Jie Li
>Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





Build failed in Jenkins: Pig-trunk #1267

2012-06-27 Thread Apache Jenkins Server
See 

Changes:

[julien] PIG-2748: Change the names of the jar produced in the build folder to 
match maven conventions (julien)

[daijy] PIG-2746: Pig doesn't detect all forms of compression extensions 
properly

--
[...truncated 3532 lines...]
 [exec] Fetching plugins descriptor: 
http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] Getting: http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] To: 

 [exec] local file date : Tue Feb 01 02:18:42 UTC 2011
 [exec] ..
 [exec] last modified = Fri Jun 10 08:37:02 UTC 2011
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/plugins.xml.
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/whiteboard-plugins.xml.
 [exec] 
 [exec] init-plugins:
 [exec] Created dir: 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] 
 [exec]   --
 [exec]   Installing plugin: org.apache.forrest.plugin.output.pdf
 [exec]   --
 [exec]
 [exec] 
 [exec] check-plugin:
 [exec] org.apache.forrest.plugin.output.pdf is available in the build dir. 
Trying to update it...
 [exec] 
 [exec] init-props:
 [exec] 
 [exec] echo-settings-condition:
 [exec] 
 [exec] echo-settings:
 [exec] 
 [exec] init-proxy:
 [exec] 
 [exec] fetch-plugins-descriptors:
 [exec] 
 [exec] fetch-plugin:
 [exec] Trying to find the description of 
org.apache.forrest.plugin.output.pdf in the different descriptor files
 [exec] Using the descriptor file 

 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginlist2fetch.xsl
 [exec] 
 [exec] fetch-local-unversioned-plugin:
 [exec] 
 [exec] get-local:
 [exec] Trying to locally get org.apache.forrest.plugin.output.pdf
 [exec] Looking in local /home/jenkins/tools/forrest/latest/plugins
 [exec] Found !
 [exec] 
 [exec] init-build-compiler:
 [exec] 
 [exec] echo-init:
 [exec] 
 [exec] init:
 [exec] 
 [exec] compile:
 [exec] 
 [exec] jar:
 [exec] 
 [exec] local-deploy:
 [exec] Locally deploying org.apache.forrest.plugin.output.pdf
 [exec] 
 [exec] build:
 [exec] Plugin org.apache.forrest.plugin.output.pdf deployed ! Ready to 
configure
 [exec] 
 [exec] fetch-remote-unversioned-plugin-version-forrest:
 [exec] 
 [exec] fetch-remote-unversioned-plugin-unversion-forrest:
 [exec] 
 [exec] has-been-downloaded:
 [exec] 
 [exec] downloaded-message:
 [exec] 
 [exec] uptodate-message:
 [exec] 
 [exec] not-found-message:
 [exec] Fetch-plugin Ok, installing !
 [exec] 
 [exec] unpack-plugin:
 [exec] 
 [exec] install-plugin:
 [exec] 
 [exec] configure-plugin:
 [exec] 
 [exec] configure-output-plugin:
 [exec] Mounting output plugin: org.apache.forrest.plugin.output.pdf
 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginMountSnippet.xsl
 [exec] Moving 1 file to 

 [exec] 
 [exec] configure-plugin-locationmap:
 [exec] Mounting plugin locationmap for org.apache.forrest.plugin.output.pdf
 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginLmMountSnippet.xsl
 [exec] Moving 1 file to 


[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-27 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated PIG-2766:


Attachment: PIG-2766_4.patch

Removed hive conf from the additional jars list. Added it only to the 
classpath. 

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch, 
> PIG-2766_4.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), users need to 
> register a bunch of jars and set a couple of configuration parameters. For a 
> novice user, it is non-trivial to find all the relevant jars and config 
> params. We should have better integration between Pig & HCat by 
> pre-configuring Pig to load all these jars and configs.





[jira] [Resolved] (PIG-2748) Change the names of the jar produced in the build folder to match maven conventions

2012-06-27 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2748.


   Resolution: Fixed
Fix Version/s: 0.11

> Change the names of the jar produced in the build folder to match maven 
> conventions
> ---
>
> Key: PIG-2748
> URL: https://issues.apache.org/jira/browse/PIG-2748
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2748.patch
>
>
> {noformat}
> pig-{version}-core.jar becomes pig-{version}.jar
> pig-{version}.jar becomes pig-{version}-withdependencies.jar
> - 
> +  value="${build.dir}/${final.name}-withdependencies.jar" />
> -  value="${build.dir}/${final.name}-core.jar" />
> +  />
> {noformat}





Re: Can we set up a pig-git reviewboard, similar to hcatalog-git?

2012-06-27 Thread Jonathan Coveney
Including HCat guys

2012/6/25 Dmitriy Ryaboy 

> +1. Hcat people, how did you guys do that?
> Infra ticket?
>
> D
>
> On Thu, Jun 21, 2012 at 11:37 AM, Jonathan Coveney 
> wrote:
> > Currently our pig reviewboard is configured for svn diffs... I believe it
> > is possible to set up another reviewboard base that would accept diffs
> > from git. How was this done for HCat?
>


[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402553#comment-13402553
 ] 

Dmitriy V. Ryaboy commented on PIG-2661:


I am not sure we should disable this because of concerns about how expensive 
sampling is. Do you have examples that show this being worthwhile? 

> Pig uses an extra job for loading data in Pigmix L9
> ---
>
> Key: PIG-2661
> URL: https://issues.apache.org/jira/browse/PIG-2661
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Jie Li
>Assignee: Jie Li
> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch
>
>
> See 
> https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155





[jira] [Commented] (PIG-2748) Change the names of the jar produced in the build folder to match maven conventions

2012-06-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402551#comment-13402551
 ] 

Daniel Dai commented on PIG-2748:
-

+1

> Change the names of the jar produced in the build folder to match maven 
> conventions
> ---
>
> Key: PIG-2748
> URL: https://issues.apache.org/jira/browse/PIG-2748
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2748.patch
>
>
> {noformat}
> pig-{version}-core.jar becomes pig-{version}.jar
> pig-{version}.jar becomes pig-{version}-withdependencies.jar
> - 
> +  value="${build.dir}/${final.name}-withdependencies.jar" />
> -  value="${build.dir}/${final.name}-core.jar" />
> +  />
> {noformat}





[jira] [Commented] (PIG-2766) Pig-HCat Usability

2012-06-27 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402542#comment-13402542
 ] 

Vikram Dixit K commented on PIG-2766:
-

Fixed 1 and 2.
3 was just to keep it consistent with the rest of the file.

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), users need to 
> register a bunch of jars and set a couple of configuration parameters. For a 
> novice user, it is non-trivial to find all the relevant jars and config 
> params. We should have better integration between Pig & HCat by 
> pre-configuring Pig to load all these jars and configs.





[jira] [Updated] (PIG-2766) Pig-HCat Usability

2012-06-27 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated PIG-2766:


Attachment: PIG-2766_3.patch

Updated with comments incorporated.

> Pig-HCat Usability
> --
>
> Key: PIG-2766
> URL: https://issues.apache.org/jira/browse/PIG-2766
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.10.0
>
> Attachments: PIG-2766.patch, PIG-2766_2.patch, PIG-2766_3.patch
>
>
> Currently, to use HCat from Pig (via HCatLoader/HCatStorer), users need to 
> register a bunch of jars and set a couple of configuration parameters. For a 
> novice user, it is non-trivial to find all the relevant jars and config 
> params. We should have better integration between Pig & HCat by 
> pre-configuring Pig to load all these jars and configs.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402517#comment-13402517
 ] 

Russell Jurney commented on PIG-1314:
-

This sounds good to me.

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Commented] (PIG-2748) Change the names of the jar produced in the build folder to match maven conventions

2012-06-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402474#comment-13402474
 ] 

Julien Le Dem commented on PIG-2748:


mvn-install works correctly.
mvn-deploy fails with 401 as it should because I'm not authorized to deploy.
Daniel, can you take a quick look ? I'd like to check this in.

> Change the names of the jar produced in the build folder to match maven 
> conventions
> ---
>
> Key: PIG-2748
> URL: https://issues.apache.org/jira/browse/PIG-2748
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2748.patch
>
>
> {noformat}
> pig-{version}-core.jar becomes pig-{version}.jar
> pig-{version}.jar becomes pig-{version}-withdependencies.jar
> - 
> +  value="${build.dir}/${final.name}-withdependencies.jar" />
> -  value="${build.dir}/${final.name}-core.jar" />
> +  />
> {noformat}





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402467#comment-13402467
 ] 

Thejas M Nair commented on PIG-1314:


bq. Or we temporarily set aside the performance issue right now, and move 
forward to make timezone serialization work by simply serializing the timezone 
id string.
We can add features later, but dropping features later won't be good. In my 
opinion, support for long timezone names is not going to be needed by most 
people. I think we can support them only for creating a DateTime field, but say 
that Pig will not preserve the long name. Pig will only retain the hour+minute 
offset (no seconds or milliseconds!). The hour+minute offset form is portable 
and more likely to be supported by other serialization formats. 
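
As a rough illustration of the hour+minute offset form, here is a sketch in 
Python (not Pig's Java, and not the format in the attached patches). The 
encode and decode helpers and the (epoch milliseconds, offset minutes) pair 
are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode(dt):
    """Encode an aware datetime as (UTC epoch milliseconds, offset in minutes)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    epoch_ms = (dt - EPOCH) // timedelta(milliseconds=1)
    offset_minutes = offset // timedelta(minutes=1)
    return epoch_ms, offset_minutes


def decode(epoch_ms, offset_minutes):
    """Rebuild the datetime with a fixed-offset zone; any long zone name is lost."""
    zone = timezone(timedelta(minutes=offset_minutes))
    return (EPOCH + timedelta(milliseconds=epoch_ms)).astimezone(zone)


original = datetime(2012, 6, 27, 12, 30,
                    tzinfo=timezone(timedelta(hours=5, minutes=30)))
encoded = encode(original)
restored = decode(*encoded)
```

The round trip preserves the instant and the offset, and the stored form is 
two integers, which is the portability argument made above. A long zone name 
cannot be recovered from the offset alone, which matches the proposal that the 
long name is not preserved.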


> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-06-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2769:


Fix Version/s: (was: 0.10.0)
   0.11

It takes me 5 min to compile the logical plan.

> a simple logic causes very long compiling time on pig 0.10.0
> 
>
> Key: PIG-2769
> URL: https://issues.apache.org/jira/browse/PIG-2769
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
>Reporter: Dan Li
> Fix For: 0.11
>
> Attachments: case1.tar
>
>
> We found the following simple logic will cause very long compiling time for 
> pig 0.10.0, while using pig 0.8.1, everything is fine.
> A = load 'A.txt' using PigStorage()  AS (m: int);
> B = FOREACH A {
> days_str = (chararray)
> (m == 1 ? 31: 
> (m == 2 ? 28: 
> (m == 3 ? 31: 
> (m == 4 ? 30: 
> (m == 5 ? 31: 
> (m == 6 ? 30: 
> (m == 7 ? 31: 
> (m == 8 ? 31: 
> (m == 9 ? 30: 
> (m == 10 ? 31: 
> (m == 11 ? 30:31)))))))))));
> GENERATE
>days_str as days_str;
> }   
> store B into 'B';
> and here's a simple input file example: A.txt
> 1
> 2
> 3
> The pig version we used in the test
> Apache Pig version 0.10.0-SNAPSHOT (rexported)





[jira] [Updated] (PIG-2746) Pig doesn't detect all forms of compression extensions properly

2012-06-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2746:


   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Sounds good. 

Patch committed to trunk.

Thanks Harsh!

> Pig doesn't detect all forms of compression extensions properly
> ---
>
> Key: PIG-2746
> URL: https://issues.apache.org/jira/browse/PIG-2746
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Harsh J
>Assignee: Harsh J
> Fix For: 0.11
>
> Attachments: PIG-2746.patch, PIG-2746.patch, PIG-2746.patch
>
>
> The PigStorage has the following snippet.
> {code}
> private void setCompression(Path path, Job job) {
>     String location = path.getName();
>     if (location.endsWith(".bz2") || location.endsWith(".bz")) {
>         FileOutputFormat.setCompressOutput(job, true);
>         FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
>     } else if (location.endsWith(".gz")) {
>         FileOutputFormat.setCompressOutput(job, true);
>         FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>     } else {
>         FileOutputFormat.setCompressOutput(job, false);
>     }
> }
> {code}
> This limits it to only work with STORE filenames provided as 'output.gz' or 
> 'output.bz2' and for the rest (like LZO) one has to specify codecs and 
> manually enable compression.
> Ideally Pig can rely on Hadoop's extension-to-codec detector instead of 
> having this ladder.





[jira] [Commented] (PIG-2771) Extend LoadPushDown to push limit info to the LoadFunc

2012-06-27 Thread Mathias Herberts (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402439#comment-13402439
 ] 

Mathias Herberts commented on PIG-2771:
---

Sorry, I had missed it, I'm not using 0.10 yet due to PIG-2760.

> Extend LoadPushDown to push limit info to the LoadFunc
> --
>
> Key: PIG-2771
> URL: https://issues.apache.org/jira/browse/PIG-2771
> Project: Pig
>  Issue Type: Improvement
>Reporter: Mathias Herberts
>Priority: Minor
>
> It is not uncommon to use LIMIT clauses just after a LOAD, especially during 
> the development phase of new scripts.
> The current behaviour is to do the LIMIT in the map phase just after the 
> LOAD. This means that the output of each Mapper has indeed N records if a 
> 'LIMIT x N' was used, but the LoadFunc has read all the records in its splits.
> A nice optimization would be to push to the LoadFunc the fact that only the 
> first N records are needed. This way the LOAD would terminate as soon as each 
> Mapper has produced N records, which can speed things up quite a bit when the 
> input is large.
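
The difference between the two behaviours described above can be sketched in 
Python. CountingSplit, load_then_limit and load_with_pushed_limit are 
hypothetical names for this example only; they do not correspond to the 
LoadFunc or LoadPushDown interfaces.

```python
from itertools import islice


class CountingSplit:
    """Hypothetical stand-in for one input split; counts the records read."""

    def __init__(self, records):
        self._records = records
        self.records_read = 0

    def __iter__(self):
        for record in self._records:
            self.records_read += 1
            yield record


def load_then_limit(split, n):
    """Behaviour described above: read the whole split, then keep the first n."""
    loaded = list(split)
    return loaded[:n]


def load_with_pushed_limit(split, n):
    """Proposed behaviour: the loader knows n and stops after n records."""
    return list(islice(split, n))


full = CountingSplit(range(1000))
pushed = CountingSplit(range(1000))
first = load_then_limit(full, 10)
second = load_with_pushed_limit(pushed, 10)
print(full.records_read, pushed.records_read)
```

Both calls return the same ten records, but only the second stops reading 
early, which is where the speed-up on large inputs would come from.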





[jira] [Resolved] (PIG-2771) Extend LoadPushDown to push limit info to the LoadFunc

2012-06-27 Thread Mathias Herberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathias Herberts resolved PIG-2771.
---

Resolution: Duplicate

> Extend LoadPushDown to push limit info to the LoadFunc
> --
>
> Key: PIG-2771
> URL: https://issues.apache.org/jira/browse/PIG-2771
> Project: Pig
>  Issue Type: Improvement
>Reporter: Mathias Herberts
>Priority: Minor
>
> It is not uncommon to use LIMIT clauses just after a LOAD, especially during 
> the development phase of new scripts.
> The current behaviour is to do the LIMIT in the map phase just after the 
> LOAD. This means that the output of each Mapper has indeed N records if a 
> 'LIMIT x N' was used, but the LoadFunc has read all the records in its splits.
> A nice optimization would be to push to the LoadFunc the fact that only the 
> first N records are needed. This way the LOAD would terminate as soon as each 
> Mapper has produced N records, which can speed things up quite a bit when the 
> input is large.





[jira] [Commented] (PIG-2697) pretty print schema

2012-06-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402430#comment-13402430
 ] 

Julien Le Dem commented on PIG-2697:


It seems org.apache.pig.pigunit.PigTest does not compile anymore.
{noformat}
[javac] /Users/julien/svn/pig/trunk-PIG-2748/test/org/apache/pig/pigunit/PigTest.java:254: stringifySchema(java.lang.StringBuilder,org.apache.pig.impl.logicalLayer.schema.Schema,byte,int) in org.apache.pig.impl.logicalLayer.schema.Schema cannot be applied to (java.lang.StringBuilder,org.apache.pig.impl.logicalLayer.schema.Schema,byte)
[javac] Schema.stringifySchema(sb, pig.dumpSchema(aliasInput), DataType.TUPLE) ;
{noformat}
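
The error above says the call site passes three arguments while the method now expects four. One common way to keep old callers compiling is to retain the old signature as a thin overload that delegates to the new one. The sketch below shows that pattern on a stand-in class. Everything in it (the class name, the constants, the meaning of the indent argument, the -1 sentinel) is hypothetical and for illustration only; it is not Pig's Schema class and not necessarily how the patch was fixed.

```java
import java.util.List;

/** Hypothetical stand-in for a schema printer; NOT Pig's actual Schema class. */
public class SchemaPrinterSketch {
    // Illustrative type tags (assumed values, used only inside this sketch).
    public static final byte TUPLE = 110;
    public static final byte BAG = 120;

    /** Old three-argument entry point, kept so existing callers still compile. */
    public static void stringify(StringBuilder sb, List<String> fields, byte type) {
        // Delegate with a sentinel meaning "single line" (an assumed convention).
        stringify(sb, fields, type, -1);
    }

    /** New four-argument form: indent >= 0 puts each field on its own line. */
    public static void stringify(StringBuilder sb, List<String> fields, byte type, int indent) {
        boolean pretty = indent >= 0;
        sb.append(type == BAG ? '{' : '(');
        for (int i = 0; i < fields.size(); i++) {
            if (pretty) {
                sb.append('\n');
                // One level of four spaces per indent step, plus one for the fields.
                for (int j = 0; j <= indent; j++) {
                    sb.append("    ");
                }
            }
            sb.append(fields.get(i));
            if (i < fields.size() - 1) {
                sb.append(pretty ? "," : ", ");
            }
        }
        if (pretty) {
            sb.append('\n');
        }
        sb.append(type == BAG ? '}' : ')');
    }
}
```

With the delegating overload in place, a three-argument call keeps producing the one-line form, and only callers that opt in pass the extra argument.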

> pretty print schema
> ---
>
> Key: PIG-2697
> URL: https://issues.apache.org/jira/browse/PIG-2697
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.11
>
> Attachments: PIG-2697.patch, PIG-2697.patch
>
>
> Currently 'describe' dumps the schema in one line. If you have a long or 
> complicated schema, it is pretty much impossible to figure out how the schema 
> looks or what the fields are.
> I will provide an example below.





[jira] [Commented] (PIG-2771) Extend LoadPushDown to push limit info to the LoadFunc

2012-06-27 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402366#comment-13402366
 ] 

Dmitriy V. Ryaboy commented on PIG-2771:


Mathias, I think that's been done for 0.10+.

See PIG-1270.

> Extend LoadPushDown to push limit info to the LoadFunc
> --
>
> Key: PIG-2771
> URL: https://issues.apache.org/jira/browse/PIG-2771
> Project: Pig
>  Issue Type: Improvement
>Reporter: Mathias Herberts
>Priority: Minor
>
> It is not uncommon to use LIMIT clauses just after a LOAD, especially during 
> the development phase of new scripts.
> The current behaviour is to apply the LIMIT in the map phase just after the 
> LOAD. This means that the output of each Mapper does indeed have N records if a 
> 'LIMIT x N' was used, but the LoadFunc has still read all the records in its splits.
> A nice optimization would be to push to the LoadFunc the fact that only the 
> first N records are needed. This way the LOAD would terminate as soon as each 
> Mapper has produced N records, which can speed things up quite a bit when the 
> input is large.





Build failed in Jenkins: Pig-trunk #1266

2012-06-27 Thread Apache Jenkins Server
See 

Changes:

[julien] PIG-2770: Allow easy inclusion of custom build targets (julien)

[jcoveney] PIG-2697: pretty print schema via pig.pretty.print.schema (rangadi 
via jcoveney)

--
[...truncated 3528 lines...]
 [exec] Fetching plugins descriptor: 
http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] Getting: http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] To: 

 [exec] local file date : Tue Feb 01 02:18:42 UTC 2011
 [exec] ..
 [exec] last modified = Fri Jun 10 08:37:02 UTC 2011
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/plugins.xml.
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/whiteboard-plugins.xml.
 [exec] 
 [exec] init-plugins:
 [exec] Created dir: 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] Copying 1 file to 

 [exec] 
 [exec]   --
 [exec]   Installing plugin: org.apache.forrest.plugin.output.pdf
 [exec]   --
 [exec]
 [exec] 
 [exec] check-plugin:
 [exec] org.apache.forrest.plugin.output.pdf is available in the build dir. 
Trying to update it...
 [exec] 
 [exec] init-props:
 [exec] 
 [exec] echo-settings-condition:
 [exec] 
 [exec] echo-settings:
 [exec] 
 [exec] init-proxy:
 [exec] 
 [exec] fetch-plugins-descriptors:
 [exec] 
 [exec] fetch-plugin:
 [exec] Trying to find the description of 
org.apache.forrest.plugin.output.pdf in the different descriptor files
 [exec] Using the descriptor file 

 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginlist2fetch.xsl
 [exec] 
 [exec] fetch-local-unversioned-plugin:
 [exec] 
 [exec] get-local:
 [exec] Trying to locally get org.apache.forrest.plugin.output.pdf
 [exec] Looking in local /home/jenkins/tools/forrest/latest/plugins
 [exec] Found !
 [exec] 
 [exec] init-build-compiler:
 [exec] 
 [exec] echo-init:
 [exec] 
 [exec] init:
 [exec] 
 [exec] compile:
 [exec] 
 [exec] jar:
 [exec] 
 [exec] local-deploy:
 [exec] Locally deploying org.apache.forrest.plugin.output.pdf
 [exec] 
 [exec] build:
 [exec] Plugin org.apache.forrest.plugin.output.pdf deployed ! Ready to 
configure
 [exec] 
 [exec] fetch-remote-unversioned-plugin-version-forrest:
 [exec] 
 [exec] fetch-remote-unversioned-plugin-unversion-forrest:
 [exec] 
 [exec] has-been-downloaded:
 [exec] 
 [exec] downloaded-message:
 [exec] 
 [exec] uptodate-message:
 [exec] 
 [exec] not-found-message:
 [exec] Fetch-plugin Ok, installing !
 [exec] 
 [exec] unpack-plugin:
 [exec] 
 [exec] install-plugin:
 [exec] 
 [exec] configure-plugin:
 [exec] 
 [exec] configure-output-plugin:
 [exec] Mounting output plugin: org.apache.forrest.plugin.output.pdf
 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginMountSnippet.xsl
 [exec] Moving 1 file to 

 [exec] 
 [exec] configure-plugin-locationmap:
 [exec] Mounting plugin locationmap for org.apache.forrest.plugin.output.pdf
 [exec] Processing 

 to 

 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginLmMountSnippet.xsl
 [exec] Moving 1 file to 

 [exec] 
 [exec

[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402061#comment-13402061
 ] 

Zhijie Shen commented on PIG-1314:
--

{quote}
Yes, it will be lossy, but the part that is important for date calculations is 
preserved. The ISO spec only has offset for timezone. I don't think we have to 
allow datetime field to be used for storing location information. Does JodaTime 
preserve the location string ?
{quote}

Yes, I think so. If I get a DateTimeZone object by calling 
DateTimeZone.forID("asia/singapore"), the returned DateTimeZone object doesn't 
change to "+08:00", but keeps "asia/singapore". We'd better preserve it because 
when users want to output the time in a customized format that has "z" in 
the pattern string, the exact timezone can be output.

{quote}
But won't jodatime support a timezone outside this list if the user specifies 
a date using the UTC offset format?
{quote}

Yes, DateTimeZone.forID() also accepts a UTC offset string as input, such as 
"+08:00", though it is not in the list. However, the offset can be any value in 
the range [-23:59:59.999, +23:59:59.999], and the minimal granularity is the 
millisecond.

We would then need a combined lookup table that maps canonical timezone ids 
and UTC offsets to their concise representation. Do you have any suggestions 
here? Or should we temporarily set aside the performance issue for now, and 
move forward to make timezone serialization work by simply serializing the 
timezone id string?
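
To make the combined lookup idea concrete, here is a minimal sketch, assuming a small fixed table of canonical ids shared by writer and reader. The class name, the table contents and the "I"/"O"/"S" prefixes are all hypothetical; none of this is Pig or Joda-Time code. It only handles +HH:mm / -HH:mm offsets, so a real version would still need a rule for offsets with second or millisecond precision. Anything it does not recognise falls back to the raw id string, which is the simple option mentioned above.

```java
import java.util.Arrays;

/** Hypothetical encoder for time zone identifiers; an illustration only. */
public class ZoneCodecSketch {
    // Assumed: a fixed table of canonical ids, sorted so binarySearch works.
    private static final String[] KNOWN_IDS = {
        "America/New_York", "Asia/Singapore", "Europe/Paris", "UTC"
    };

    /** "I<index>" for a known id, "O<millis>" for a +HH:mm offset, else "S<raw id>". */
    public static String encode(String id) {
        int idx = Arrays.binarySearch(KNOWN_IDS, id);
        if (idx >= 0) {
            return "I" + idx;
        }
        if (id.matches("[+-]\\d{2}:\\d{2}")) {
            int hours = Integer.parseInt(id.substring(1, 3));
            int minutes = Integer.parseInt(id.substring(4, 6));
            int millis = (hours * 60 + minutes) * 60_000;
            return "O" + (id.charAt(0) == '-' ? -millis : millis);
        }
        // Fallback: keep the id string unchanged so nothing is lost.
        return "S" + id;
    }

    /** Inverse of encode for the three forms produced above. */
    public static String decode(String encoded) {
        String rest = encoded.substring(1);
        switch (encoded.charAt(0)) {
            case 'I':
                return KNOWN_IDS[Integer.parseInt(rest)];
            case 'O':
                int millis = Integer.parseInt(rest);
                int totalMinutes = Math.abs(millis) / 60_000;
                char sign = millis < 0 ? '-' : '+';
                return String.format("%c%02d:%02d", sign, totalMinutes / 60, totalMinutes % 60);
            default:
                return rest;
        }
    }
}
```

The fallback branch means the table can start small and grow later without breaking ids that were written before an entry was added, as long as existing indexes never change.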

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





[jira] [Created] (PIG-2771) Extend LoadPushDown to push limit info to the LoadFunc

2012-06-27 Thread Mathias Herberts (JIRA)
Mathias Herberts created PIG-2771:
-

 Summary: Extend LoadPushDown to push limit info to the LoadFunc
 Key: PIG-2771
 URL: https://issues.apache.org/jira/browse/PIG-2771
 Project: Pig
  Issue Type: Improvement
Reporter: Mathias Herberts
Priority: Minor


It is not uncommon to use LIMIT clauses just after a LOAD, especially during 
the development phase of new scripts.

The current behaviour is to apply the LIMIT in the map phase just after the LOAD. 
This means that the output of each Mapper does indeed have N records if a 'LIMIT x N' 
was used, but the LoadFunc has still read all the records in its splits.

A nice optimization would be to push to the LoadFunc the fact that only the 
first N records are needed. This way the LOAD would terminate as soon as each 
Mapper has produced N records, which can speed things up quite a bit when the 
input is large.
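
The effect described above can be sketched with a plain iterator wrapper that stops pulling from its source once N records have been returned. This is only an illustration of the idea, assuming Java 8 or later; the class name is hypothetical and it does not use or represent Pig's LoadFunc or LoadPushDown interfaces.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical reader wrapper that honours a pushed-down limit; NOT Pig's API. */
public class LimitedReaderSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private long remaining;

    public LimitedReaderSketch(Iterator<T> source, long limit) {
        this.source = source;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        // Check the limit first so the source is not touched once N records were returned.
        return remaining > 0 && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return source.next();
    }
}
```

Wrapping a source of 1000 records with a limit of 3 reads exactly 3 records from the source; the remaining 997 are never requested, which is where the saving on large inputs would come from.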
