Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread Prasanth Jayachandran
Congrats Aniket!

Thanks
Prasanth Jayachandran

On Jan 15, 2014, at 10:30 AM, Bill Graham  wrote:

> Woo! Congrats Aniket!
> 
> 
> On Tue, Jan 14, 2014 at 8:47 PM, Olga Natkovich wrote:
> 
>> Congrats, Aniket!
>> 
>> 
>> 
>> On Tuesday, January 14, 2014 8:32 PM, Tongjie Chen 
>> wrote:
>> 
>> Congrats Aniket!
>> 
>> 
>> 
>> On Tue, Jan 14, 2014 at 8:12 PM, Cheolsoo Park 
>> wrote:
>> 
>>> Congrats Aniket!
>>> 
>>> 
>>> On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho wrote:
>>> 
>>>> Congratulations Aniket, good work!
>>>> 
>>>> Jarcec
>>>> 
>>>> On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
>>>>> It's my pleasure to announce that Aniket Mokashi became the newest
>>>>> addition to the Pig PMC.
>>>>> Aniket has been actively contributing to Pig for years.
>>>>> Please join me in congratulating Aniket!
>>>>> 
>>>>> Julien
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Note that I'm no longer using my Yahoo! email address. Please email me
> at billgra...@gmail.com  going forward.*




Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread Bill Graham
Woo! Congrats Aniket!


On Tue, Jan 14, 2014 at 8:47 PM, Olga Natkovich wrote:

> Congrats, Aniket!
>
>
>
> On Tuesday, January 14, 2014 8:32 PM, Tongjie Chen 
> wrote:
>
> Congrats Aniket!
>
>
>
> On Tue, Jan 14, 2014 at 8:12 PM, Cheolsoo Park 
> wrote:
>
> > Congrats Aniket!
> >
> >
> > On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho wrote:
> >
> > > Congratulations Aniket, good work!
> > >
> > > Jarcec
> > >
> > > On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
> > > > It's my pleasure to announce that Aniket Mokashi became the newest
> > > > addition to the Pig PMC.
> > > > Aniket has been actively contributing to Pig for years.
> > > > Please join me in congratulating Aniket!
> > > >
> > > > Julien
> > > >
> > >
> >
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me
at billgra...@gmail.com  going forward.*


Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread Olga Natkovich
Congrats, Aniket!



On Tuesday, January 14, 2014 8:32 PM, Tongjie Chen  
wrote:
 
Congrats Aniket!



On Tue, Jan 14, 2014 at 8:12 PM, Cheolsoo Park  wrote:

> Congrats Aniket!
>
>
> On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho wrote:
>
> > Congratulations Aniket, good work!
> >
> > Jarcec
> >
> > On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
> > > It's my pleasure to announce that Aniket Mokashi became the newest
> > > addition to the Pig PMC.
> > > Aniket has been actively contributing to Pig for years.
> > > Please join me in congratulating Aniket!
> > >
> > > Julien
> > >
> >
>

Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread Tongjie Chen
Congrats Aniket!


On Tue, Jan 14, 2014 at 8:12 PM, Cheolsoo Park  wrote:

> Congrats Aniket!
>
>
> On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho wrote:
>
> > Congratulations Aniket, good work!
> >
> > Jarcec
> >
> > On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
> > > It's my pleasure to announce that Aniket Mokashi became the newest
> > > addition to the Pig PMC.
> > > Aniket has been actively contributing to Pig for years.
> > > Please join me in congratulating Aniket!
> > >
> > > Julien
> > >
> >
>


Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread Cheolsoo Park
Congrats Aniket!


On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho wrote:

> Congratulations Aniket, good work!
>
> Jarcec
>
> On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
> > It's my pleasure to announce that Aniket Mokashi became the newest
> > addition to the Pig PMC.
> > Aniket has been actively contributing to Pig for years.
> > Please join me in congratulating Aniket!
> >
> > Julien
> >
>


Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread Jarek Jarcec Cecho
Congratulations Aniket, good work!

Jarcec

On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
> It's my pleasure to announce that Aniket Mokashi became the newest addition 
> to the Pig PMC.
> Aniket has been actively contributing to Pig for years.
> Please join me in congratulating Aniket!
> 
> Julien
> 




Welcome to the new Pig PMC member Aniket Mokashi

2014-01-14 Thread JULIEN LE DEM
It's my pleasure to announce that Aniket Mokashi became the newest addition to 
the Pig PMC.
Aniket has been actively contributing to Pig for years.
Please join me in congratulating Aniket!

Julien



[jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN

2014-01-14 Thread Hiten Java (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiten Java updated PIG-3668:


Priority: Major  (was: Trivial)

> COR built-in function when atleast one of the coefficient values is NaN
> ---
>
> Key: PIG-3668
> URL: https://issues.apache.org/jira/browse/PIG-3668
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.12.0
> Reporter: Hiten Java
> Attachments: COR.diff
>
>
> When passing multiple column keys for Correlation analysis, if coefficient 
> value of one of the combinations is NaN, then the value for all other 
> combinations is not computed.
> Pearson Co-efficient value is NaN if all values for a given column are the 
> same.
> Example:
> A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
> B = group A all;
> c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) 
> A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, 
> (bag{tuple(double)}) A.col_4));
> If the value of pearson coefficient for col_1 and col_2 is NaN, then value of 
> co-efficients for all combinations is NaN
> This is happening because of 'return null' statement in catch block on lines 
> 157 and 235 in file org.apache.pig.builtin.COR.java
> If the catch block is removed, then the correlation analysis would continue 
> for the remaining columns. (ApachePig 0.12.0)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN

2014-01-14 Thread Hiten Java (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiten Java updated PIG-3668:


Affects Version/s: (was: 0.11.1)
   (was: 0.11)
   Status: Patch Available  (was: Open)

> COR built-in function when atleast one of the coefficient values is NaN
> ---
>
> Key: PIG-3668
> URL: https://issues.apache.org/jira/browse/PIG-3668
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.12.0
> Reporter: Hiten Java
> Priority: Trivial
> Attachments: COR.diff
>
>
> When passing multiple column keys for Correlation analysis, if coefficient 
> value of one of the combinations is NaN, then the value for all other 
> combinations is not computed.
> Pearson Co-efficient value is NaN if all values for a given column are the 
> same.
> Example:
> A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
> B = group A all;
> c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) 
> A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, 
> (bag{tuple(double)}) A.col_4));
> If the value of pearson coefficient for col_1 and col_2 is NaN, then value of 
> co-efficients for all combinations is NaN
> This is happening because of 'return null' statement in catch block on lines 
> 157 and 235 in file org.apache.pig.builtin.COR.java
> If the catch block is removed, then the correlation analysis would continue 
> for the remaining columns. (ApachePig 0.12.0)





[jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN

2014-01-14 Thread Hiten Java (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiten Java updated PIG-3668:


Attachment: COR.diff

Patch file for .12 version.

> COR built-in function when atleast one of the coefficient values is NaN
> ---
>
> Key: PIG-3668
> URL: https://issues.apache.org/jira/browse/PIG-3668
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.11, 0.12.0, 0.11.1
> Reporter: Hiten Java
> Priority: Trivial
> Attachments: COR.diff
>
>
> When passing multiple column keys for Correlation analysis, if coefficient 
> value of one of the combinations is NaN, then the value for all other 
> combinations is not computed.
> Pearson Co-efficient value is NaN if all values for a given column are the 
> same.
> Example:
> A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
> B = group A all;
> c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) 
> A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, 
> (bag{tuple(double)}) A.col_4));
> If the value of pearson coefficient for col_1 and col_2 is NaN, then value of 
> co-efficients for all combinations is NaN
> This is happening because of 'return null' statement in catch block on lines 
> 157 and 235 in file org.apache.pig.builtin.COR.java
> If the catch block is removed, then the correlation analysis would continue 
> for the remaining columns. (ApachePig 0.12.0)





[jira] [Created] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN

2014-01-14 Thread Hiten Java (JIRA)
Hiten Java created PIG-3668:
---

Summary: COR built-in function when atleast one of the coefficient values is NaN
Key: PIG-3668
URL: https://issues.apache.org/jira/browse/PIG-3668
Project: Pig
Issue Type: Bug
Components: internal-udfs
Affects Versions: 0.11.1, 0.12.0, 0.11
Reporter: Hiten Java
Priority: Trivial


When passing multiple column keys for Correlation analysis, if coefficient 
value of one of the combinations is NaN, then the value for all other 
combinations is not computed.

Pearson Co-efficient value is NaN if all values for a given column are the same.

Example:
A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
B = group A all;
c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) 
A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, 
(bag{tuple(double)}) A.col_4));

If the value of pearson coefficient for col_1 and col_2 is NaN, then value of 
co-efficients for all combinations is NaN

This is happening because of 'return null' statement in catch block on lines 
157 and 235 in file org.apache.pig.builtin.COR.java
If the catch block is removed, then the correlation analysis would continue for 
the remaining columns. (ApachePig 0.12.0)
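
To illustrate the behaviour described above, here is a minimal Python sketch.
It is not the code in org.apache.pig.builtin.COR: the function names pearson
and pairwise_correlations and the sample columns are made up for illustration.
It assumes the desired behaviour is that a NaN coefficient for one pair of
columns is recorded and the remaining pairs are still computed.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Return the Pearson coefficient of two equal-length sequences.

    Returns NaN when the coefficient is undefined, for example when
    every value in one of the columns is the same (zero variance).
    """
    n = len(xs)
    if n == 0:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # Python raises on 0.0 / 0.0, so return NaN explicitly.
        return float("nan")
    return cov / denom


def pairwise_correlations(columns):
    """Compute the coefficient for every pair of columns.

    A NaN result for one pair is stored and the loop moves on,
    instead of abandoning the whole computation.
    """
    results = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        results[(name_a, name_b)] = pearson(a, b)
    return results


# Hypothetical sample data: col_2 is constant, so its pairs are NaN.
columns = {
    "col_1": [1.0, 2.0, 3.0, 4.0],
    "col_2": [5.0, 5.0, 5.0, 5.0],
    "col_3": [2.0, 4.0, 6.0, 8.0],
}
result = pairwise_correlations(columns)
```

In this sketch the pairs involving col_2 come out as NaN, while the
coefficient for col_1 and col_3 is still computed.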





[jira] [Commented] (PIG-3557) Implement optimizations for LIMIT

2014-01-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871484#comment-13871484
 ] 

Daniel Dai commented on PIG-3557:
-

Yes, in the case of a root vertex (a vertex that contains a load), the 
parallelism is determined by the InputFormat, not by requestedParallelism, and 
it cannot be determined at compile time. We will need a second, limit-only 
vertex in this case. For a non-root vertex, however, we can use 
requestedParallelism as a criterion to determine whether or not we need a 
follow-up vertex for the limit.
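
The rule in this comment can be sketched roughly as follows. This is only an
illustration in Python, not Pig or Tez code: the Vertex class and the function
name are hypothetical, and treating a requested parallelism of exactly 1 as
"no follow-up vertex needed" is an assumption taken from optimization 1 in the
issue description.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    """Hypothetical stand-in for a compiled vertex."""
    is_root: bool               # True if the vertex contains a load
    requested_parallelism: int  # value known at compile time


def needs_followup_limit_vertex(vertex: Vertex) -> bool:
    """Decide whether a second, limit-only vertex is required."""
    if vertex.is_root:
        # Runtime parallelism comes from the InputFormat and is unknown
        # at compile time, so the requested value cannot be trusted.
        return True
    # Assumption: a single task can apply the limit by itself.
    return vertex.requested_parallelism != 1
```

Under these assumptions a root vertex always gets a follow-up limit vertex,
and a non-root vertex gets one unless its requested parallelism is 1.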

> Implement optimizations for LIMIT
> -
>
> Key: PIG-3557
> URL: https://issues.apache.org/jira/browse/PIG-3557
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Alex Bain
> Assignee: Alex Bain
>
> Implement optimizations for LIMIT when other parts of Pig-on-Tez are more 
> mature. Some of the optimizations mentioned by Daniel include:
> 1. If the previous stage using 1 reduce, no need to add one more vertex
> 2. If the limitplan is null (ie, not the "limited order by" case), we might 
> not need a shuffle edge, a pass through edge should be enough if possible
> 3. Similar to PIG-1270, we can push limit to InputHandler
> 4. We also need to think through the "limited order by" case once "order by" 
> is implemented





[jira] [Updated] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-14 Thread Suhas Satish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Satish updated PIG-3667:
--

Attachment: PIG-3667.patch

> build.xml jar-all target does not include jython*.jar in lib/ directory 
> 
>
> Key: PIG-3667
> URL: https://issues.apache.org/jira/browse/PIG-3667
> Project: Pig
> Issue Type: Bug
> Components: build
> Affects Versions: 0.12.0
> Reporter: Suhas Satish
> Assignee: Suhas Satish
> Labels: build
> Attachments: PIG-3667.patch
>
>
> The Pig package does not include the jython jar within the lib/ directory 
> with the jar-all ant target, but includes it with the "ant package" target. 
> It should be included in both targets because the build/ directory, where ivy 
> puts all the dependency jars while building (under build/ivy/lib/Pig), is 
> often excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log => 
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
>     at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
>     at org.apache.pig.PigServer.registerCode(PigServer.java:501)
>     at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:538)
>     at org.apache.pig.Main.main(Main.java:157)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     ... 14 more
> Fix: Including jython*.jar within the lib/ directory gets rid of this issue 
> and the UDF can be loaded- 
> grunt>  register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myfuncs.helloworld





[jira] [Updated] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-14 Thread Suhas Satish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Satish updated PIG-3667:
--

   Labels: build  (was: )
Affects Version/s: (was: 0.11.1)
 Hadoop Flags: Reviewed
   Status: Patch Available  (was: Open)

> build.xml jar-all target does not include jython*.jar in lib/ directory 
> 
>
> Key: PIG-3667
> URL: https://issues.apache.org/jira/browse/PIG-3667
> Project: Pig
> Issue Type: Bug
> Components: build
> Affects Versions: 0.12.0
> Reporter: Suhas Satish
> Assignee: Suhas Satish
> Labels: build
>
> The Pig package does not include the jython jar within the lib/ directory 
> with the jar-all ant target, but includes it with the "ant package" target. 
> It should be included in both targets because the build/ directory, where ivy 
> puts all the dependency jars while building (under build/ivy/lib/Pig), is 
> often excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log => 
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
>     at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
>     at org.apache.pig.PigServer.registerCode(PigServer.java:501)
>     at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:538)
>     at org.apache.pig.Main.main(Main.java:157)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     ... 14 more
> Fix: Including jython*.jar within the lib/ directory gets rid of this issue 
> and the UDF can be loaded- 
> grunt>  register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myfuncs.helloworld





[jira] Subscription: PIG patch available

2014-01-14 Thread jira
Issue Subscription
Filter: PIG patch available (10 issues)

Subscriber: pigdaily

Key         Summary
PIG-3654    Add class cache to PigContext
            https://issues.apache.org/jira/browse/PIG-3654
PIG-3644    Implement skewed join in Tez
            https://issues.apache.org/jira/browse/PIG-3644
PIG-3642    Direct HDFS access for small jobs (fetch)
            https://issues.apache.org/jira/browse/PIG-3642
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3615    Update the way that JsonLoader/JsonStorage deal with BigDecimal
            https://issues.apache.org/jira/browse/PIG-3615
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3573    Provide StoreFunc and LoadFunc for Accumulo
            https://issues.apache.org/jira/browse/PIG-3573
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3347    Store invocation brings side effect
            https://issues.apache.org/jira/browse/PIG-3347

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


Pig User Group Meetup at LinkedIn on Fri Mar 14

2014-01-14 Thread Rohini Palaniswamy
Please join us for the Pig User Group Meetup this quarter at LinkedIn on
Fri Mar 14. We have some interesting talks lined up on the recent
developments in Pig.

RSVP at http://www.meetup.com/PigUser/events/160604192/

Tentative lineup for this meetup:
Pig on Tez
Pig on Storm
Intel Graph Builder
Pig Pen (MR for Clojure)
Accumulo Storage

  Video recording of the meetup talks will be posted after the meeting for
those not able to attend.

  Thanks Mark Wagner and Alex Bain for hosting it at LinkedIn.

Regards,
Rohini


Re: Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo

2014-01-14 Thread Josh Elser

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16533/
---

(Updated Jan. 15, 2014, 12:44 a.m.)


Review request for pig.


Changes
---

This should address currently open issues. Reworked all of the column 
specification to match HBaseStorage more closely with a few differences.

* Accumulo allows any number of colfams for a table which allows for different 
table designs. As such, I introduced the notion of "*" which consumes all 
columns in a row as a map. If the user enters no columns (empty string), this 
is also the default behavior. "literal" or "literal:literal" create a 
DataByteArray in the tuple, and "liter*", "literal:" and "literal:*" all create 
a map in the tuple. 

* Removed string-ification serialization in AccumuloBinaryConverter.

* Even more unit tests.
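
As a rough illustration of the column-specification rules in the first bullet
above, here is a small Python sketch. It is not the AccumuloStorage
implementation: the function name column_spec_kind and the returned labels are
hypothetical, and handling of any form not listed in the bullet (for example a
wildcard in the qualifier only) is an assumption.

```python
def column_spec_kind(spec: str) -> str:
    """Classify one column specification as 'map' or 'bytearray'.

    'map' stands for a map in the tuple, 'bytearray' for a single
    DataByteArray value.
    """
    if spec == "" or spec == "*":
        # No columns given, or "*": consume all columns in the row as a map.
        return "map"
    if spec.endswith("*") or spec.endswith(":"):
        # "liter*", "literal:" and "literal:*" all produce a map.
        return "map"
    # "literal" or "literal:literal" produce a single value.
    return "bytearray"
```

For example, column_spec_kind("literal:*") gives "map" and
column_spec_kind("literal:literal") gives "bytearray".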


Bugs: PIG-3573
https://issues.apache.org/jira/browse/PIG-3573


Repository: pig-git


Description
---

Provides basic StoreFunc and LoadFunc implementations. Based off of code that 
was in an Accumulo contrib project.


Diffs (updated)
-

  ivy.xml 180eb2c 
  ivy/libraries.properties 14abdf8 
  src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/accumulo/AccumuloBinaryConverter.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageOptions.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/accumulo/Column.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/accumulo/FixedByteArrayOutputStream.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION 
  test/excluded-tests-23 aaf6bd1 
  test/org/apache/pig/backend/hadoop/accumulo/TestAbstractAccumuloStorage.java PRE-CREATION 
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloBinaryConverter.java PRE-CREATION 
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloColumns.java PRE-CREATION 
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloPigCluster.java PRE-CREATION 
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloStorage.java PRE-CREATION 
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloStorageConfiguration.java PRE-CREATION 
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloStorageOptions.java PRE-CREATION 

Diff: https://reviews.apache.org/r/16533/diff/


Testing
---

Local tests reading, writing and JOIN'ing Accumulo tables. Tested against 
Hadoop-1.0.4 and 2.2.0, with Accumulo 1.5.0


Thanks,

Josh Elser



[jira] [Created] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-14 Thread Suhas Satish (JIRA)
Suhas Satish created PIG-3667:
-

Summary: build.xml jar-all target does not include jython*.jar in lib/ directory
Key: PIG-3667
URL: https://issues.apache.org/jira/browse/PIG-3667
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.11.1, 0.12.0
Reporter: Suhas Satish
Assignee: Suhas Satish


The Pig package does not include the jython jar within the lib/ directory with 
the jar-all ant target, but includes it with the "ant package" target. It 
should be included in both targets because the build/ directory, where ivy puts 
all the dependency jars while building (under build/ivy/lib/Pig), is often 
excluded from packaging.

To reproduce:
ant jar-all 
rm -rf build/ 
bin/pig
grunt> register '/tmp/test.py' using jython as myfunction;

If done prior to installing jython, here's the error one gets:
2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/core/PyObject
Details at logfile: pig_*.log

Within the pig_*.log => 



Pig Stack Trace
---
ERROR 2998: Unhandled internal error. org/python/core/PyObject

java.lang.NoClassDefFoundError: org/python/core/PyObject
    at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
    at org.apache.pig.PigServer.registerCode(PigServer.java:501)
    at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:538)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 14 more


Fix: Including jython*.jar within the lib/ directory gets rid of this issue and 
the UDF can be loaded- 
grunt>  register '/tmp/test.py' using jython as myfuncs;

2013-12-27 18:37:02,402 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4887743829482443898
2013-12-27 18:37:03,448 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
2013-12-27 18:37:03,724 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myfuncs.helloworld
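
A quick way to check whether a packaged lib/ directory contains a jython jar
before starting grunt could look like the Python sketch below. It is only an
illustration and not part of the attached patch: the helper name
find_jython_jars and the jar file name used in the demo are made up, and the
temporary directory merely stands in for a real lib/ directory.

```python
import glob
import os
import tempfile


def find_jython_jars(lib_dir):
    """Return the sorted paths of jython*.jar files directly under lib_dir."""
    return sorted(glob.glob(os.path.join(lib_dir, "jython*.jar")))


# Demo against a temporary directory standing in for lib/.
with tempfile.TemporaryDirectory() as lib_dir:
    # Nothing packaged yet: the list is empty.
    missing = find_jython_jars(lib_dir)
    # Create an empty placeholder file with a hypothetical jar name.
    open(os.path.join(lib_dir, "jython-example.jar"), "w").close()
    found = [os.path.basename(p) for p in find_jython_jars(lib_dir)]
```

An empty result for the real lib/ directory would suggest that registering a
jython UDF is likely to fail with the NoClassDefFoundError shown above.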






[jira] [Updated] (PIG-3664) Piggy Bank XPath UDF can't be called

2014-01-14 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated PIG-3664:
-

Attachment: PIG-3664.1.patch

Attached a new patch that implements the getArgToFuncMapping method.

> Piggy Bank XPath UDF can't be called
> 
>
> Key: PIG-3664
> URL: https://issues.apache.org/jira/browse/PIG-3664
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Nezih Yigitbasi
> Assignee: Nezih Yigitbasi
> Priority: Blocker
> Attachments: PIG-3664.1.patch, PIG-3664.patch
>
>
> When I try to call the XPath UDF to process a very simple XML with Pig 0.12, 
> I get the following problem:
> 2014-01-13 16:14:19,530 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.piggybank.evaluation.xml.XPath as multiple or none of them fit. Please use an explicit cast.
> I guess the XPath UDF overrides getArgToFuncMapping() in an incorrect way. 
> A fix is attached.





[jira] [Commented] (PIG-3664) Piggy Bank XPath UDF can't be called

2014-01-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871380#comment-13871380
 ] 

Daniel Dai commented on PIG-3664:
-

getArgToFuncMapping is still better because we can catch schema mismatches in 
the frontend. I would prefer to fix it rather than get rid of it.

> Piggy Bank XPath UDF can't be called
> 
>
> Key: PIG-3664
> URL: https://issues.apache.org/jira/browse/PIG-3664
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Nezih Yigitbasi
> Assignee: Nezih Yigitbasi
> Priority: Blocker
> Attachments: PIG-3664.patch
>
>
> When I try to call the XPath UDF to process a very simple XML with Pig 0.12, 
> I get the following problem:
> 2014-01-13 16:14:19,530 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.piggybank.evaluation.xml.XPath as multiple or none of them fit. Please use an explicit cast.
> I guess the XPath UDF overrides getArgToFuncMapping() in an incorrect way. 
> A fix is attached.





[jira] [Commented] (PIG-3557) Implement optimizations for LIMIT

2014-01-14 Thread Alex Bain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871376#comment-13871376
 ] 

Alex Bain commented on PIG-3557:


1. You can check requestedParallelism for the tezOperator. This should be 
doable.

This doesn't sound quite right to me. Let's say you are doing:
a = LOAD '/data/myLargeDataSet';
b = LIMIT a 100;
...
where myLargeDataSet contains lots of block-sized files. Then, in that case, 
the Tez vertex for the POLoad has a requestedParallelism of 1, but the actual 
runtime parallelism will be equal to the number of files. In this case, the 
optimization (putting the limit only in the plan for the previous vertex, which 
in this case, is the vertex for the load) and not having a second vertex fails. 
Basically, we can't depend on requestedParallelism = 1 to actually be the 
parallelism at runtime.

[Just to note, the LimitOptimizer would actually push the limit up to the Input 
Handler, but just to keep this example simple, let's ignore that for now]

> Implement optimizations for LIMIT
> -
>
> Key: PIG-3557
> URL: https://issues.apache.org/jira/browse/PIG-3557
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Alex Bain
>Assignee: Alex Bain
>
> Implement optimizations for LIMIT when other parts of Pig-on-Tez are more 
> mature. Some of the optimizations mentioned by Daniel include:
> 1. If the previous stage uses 1 reducer, there is no need to add one more vertex
> 2. If the limitplan is null (i.e., not the "limited order by" case), we might 
> not need a shuffle edge; a pass-through edge should be enough if possible
> 3. Similar to PIG-1270, we can push limit to InputHandler
> 4. We also need to think through the "limited order by" case once "order by" 
> is implemented





[jira] [Updated] (PIG-3666) Fix store after load

2014-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3666:


Hadoop Flags: Reviewed

> Fix store after load
> 
>
> Key: PIG-3666
> URL: https://issues.apache.org/jira/browse/PIG-3666
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3666-1.patch
>
>
> Several e2e tests fail and share the following pattern:
> .
> store into 'afile';
> a = load 'afile';
> ..
> Stack:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
> ... 16 more
> The plan needs to be split into two DAGs, since the second DAG expects HDFS 
> input produced by the first DAG.
> Examples of such e2e test failures are: Casts_[1-6]





[jira] [Resolved] (PIG-3666) Fix store after load

2014-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3666.
-

Resolution: Fixed

Patch committed to Tez branch. Thanks Rohini for review!

> Fix store after load
> 
>
> Key: PIG-3666
> URL: https://issues.apache.org/jira/browse/PIG-3666
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3666-1.patch
>
>
> Several e2e tests fail and share the following pattern:
> .
> store into 'afile';
> a = load 'afile';
> ..
> Stack:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
> ... 16 more
> The plan needs to be split into two DAGs, since the second DAG expects HDFS 
> input produced by the first DAG.
> Examples of such e2e test failures are: Casts_[1-6]





[jira] [Resolved] (PIG-3665) TEZ-41 break pig-tez

2014-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3665.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

Patch committed to tez-branch. 

FYI, without this patch, a newer version of Tez will produce an empty result.

> TEZ-41 break pig-tez
> 
>
> Key: PIG-3665
> URL: https://issues.apache.org/jira/browse/PIG-3665
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3665-1.patch
>
>
> TEZ-41 introduces a backward-incompatible change, and Pig needs to change 
> accordingly. Please update the Tez code once the change is checked into Pig.





[jira] [Commented] (PIG-3666) Fix store after load

2014-01-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871327#comment-13871327
 ] 

Rohini Palaniswamy commented on PIG-3666:
-

+1

> Fix store after load
> 
>
> Key: PIG-3666
> URL: https://issues.apache.org/jira/browse/PIG-3666
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3666-1.patch
>
>
> Several e2e tests fail and share the following pattern:
> .
> store into 'afile';
> a = load 'afile';
> ..
> Stack:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
> ... 16 more
> The plan needs to be split into two DAGs, since the second DAG expects HDFS 
> input produced by the first DAG.
> Examples of such e2e test failures are: Casts_[1-6]





[jira] [Commented] (PIG-3665) TEZ-41 break pig-tez

2014-01-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871316#comment-13871316
 ] 

Rohini Palaniswamy commented on PIG-3665:
-

+1

> TEZ-41 break pig-tez
> 
>
> Key: PIG-3665
> URL: https://issues.apache.org/jira/browse/PIG-3665
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3665-1.patch
>
>
> TEZ-41 introduces a backward-incompatible change, and Pig needs to change 
> accordingly. Please update the Tez code once the change is checked into Pig.





[jira] [Created] (PIG-3666) Fix store after load

2014-01-14 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-3666:
---

 Summary: Fix store after load
 Key: PIG-3666
 URL: https://issues.apache.org/jira/browse/PIG-3666
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
 Attachments: PIG-3666-1.patch

Several e2e tests fail and share the following pattern:

.
store into 'afile';
a = load 'afile';
..

Stack:
Caused by: java.lang.NullPointerException
at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
... 16 more


The plan needs to be split into two DAGs, since the second DAG expects HDFS 
input produced by the first DAG.

Examples of such e2e test failures are: Casts_[1-6]
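
As a rough illustration of the splitting idea, the following sketch walks a 
linear list of statements and starts a new segment whenever a load reads a 
path that was stored earlier in the same segment. This is an assumption-laden 
simplification and not the code in the attached patch: the Stmt class and the 
split method are hypothetical, and a real Pig plan is an operator graph rather 
than a flat list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; it does not reflect TezCompiler or the attached patch.
final class StoreLoadSplitSketch {

    // One script statement: kind is "load", "store" or "other".
    static final class Stmt {
        final String kind;
        final String path; // null when kind is "other"

        Stmt(String kind, String path) {
            this.kind = kind;
            this.path = path;
        }
    }

    // Splits the script into segments so that no segment loads a path
    // that the same segment also stores. Each segment would become one DAG.
    static List<List<Stmt>> split(List<Stmt> script) {
        List<List<Stmt>> segments = new ArrayList<>();
        List<Stmt> current = new ArrayList<>();
        Set<String> storedHere = new HashSet<>();
        for (Stmt s : script) {
            if ("load".equals(s.kind) && storedHere.contains(s.path)) {
                // The load depends on output of the current segment:
                // close it and start a new one.
                segments.add(current);
                current = new ArrayList<>();
                storedHere.clear();
            }
            if ("store".equals(s.kind)) {
                storedHere.add(s.path);
            }
            current.add(s);
        }
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Stmt> script = new ArrayList<>();
        script.add(new Stmt("load", "input"));
        script.add(new Stmt("store", "afile"));
        script.add(new Stmt("load", "afile"));
        script.add(new Stmt("store", "output"));
        System.out.println("segments: " + split(script).size());
    }
}
```

For the store-then-load pattern above, the sketch yields two segments, which 
corresponds to running the second DAG only after the first has written 
'afile'.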





[jira] [Updated] (PIG-3666) Fix store after load

2014-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3666:


Attachment: PIG-3666-1.patch

> Fix store after load
> 
>
> Key: PIG-3666
> URL: https://issues.apache.org/jira/browse/PIG-3666
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3666-1.patch
>
>
> Several e2e tests fail and share the following pattern:
> .
> store into 'afile';
> a = load 'afile';
> ..
> Stack:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
> ... 16 more
> The plan needs to be split into two DAGs, since the second DAG expects HDFS 
> input produced by the first DAG.
> Examples of such e2e test failures are: Casts_[1-6]





[jira] [Assigned] (PIG-3666) Fix store after load

2014-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-3666:
---

Assignee: Daniel Dai

> Fix store after load
> 
>
> Key: PIG-3666
> URL: https://issues.apache.org/jira/browse/PIG-3666
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3666-1.patch
>
>
> Several e2e tests fail and share the following pattern:
> .
> store into 'afile';
> a = load 'afile';
> ..
> Stack:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
> ... 16 more
> The plan needs to be split into two DAGs, since the second DAG expects HDFS 
> input produced by the first DAG.
> Examples of such e2e test failures are: Casts_[1-6]





[jira] [Updated] (PIG-3665) TEZ-41 break pig-tez

2014-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3665:


Attachment: PIG-3665-1.patch

> TEZ-41 break pig-tez
> 
>
> Key: PIG-3665
> URL: https://issues.apache.org/jira/browse/PIG-3665
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3665-1.patch
>
>
> TEZ-41 introduces a backward-incompatible change, and Pig needs to change 
> accordingly. Please update the Tez code once the change is checked into Pig.





[jira] [Created] (PIG-3665) TEZ-41 break pig-tez

2014-01-14 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-3665:
---

 Summary: TEZ-41 break pig-tez
 Key: PIG-3665
 URL: https://issues.apache.org/jira/browse/PIG-3665
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: tez-branch


TEZ-41 introduces a backward-incompatible change, and Pig needs to change 
accordingly. Please update the Tez code once the change is checked into Pig.





[jira] [Commented] (PIG-3557) Implement optimizations for LIMIT

2014-01-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871183#comment-13871183
 ] 

Daniel Dai commented on PIG-3557:
-

1. You can check requestedParallelism for the tezOperator. This should be 
doable.

2. We can do a non-sorted scatter-gather, but this depends on TEZ-661, so we 
cannot proceed now.

4. We could use a combiner and duplicate POLimit in the combiner. Otherwise, 
the plan looks good.
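
To make the combiner idea in point 4 concrete, here is a minimal sketch, 
assuming the "limited order by" case means sorting and then keeping the first 
k rows. It is not Pig code: the class and method names are hypothetical, and 
integers stand in for tuples. Each partition applies the same sort-and-limit 
step before the shuffle, so at most k rows per partition reach the final 
stage, which applies the step once more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; integers stand in for Pig tuples.
final class LimitedOrderBySketch {

    // Sorts the rows and keeps the first k; used both as combiner and final step.
    static List<Integer> topK(List<Integer> rows, int k) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Applies the limit once per partition (combiner) and once after the merge.
    static List<Integer> limitedOrderBy(List<List<Integer>> partitions, int k) {
        List<Integer> shuffled = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            shuffled.addAll(topK(partition, k)); // at most k rows leave each partition
        }
        return topK(shuffled, k);
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(9, 1, 5),
                Arrays.asList(4, 8, 2),
                Arrays.asList(7, 3, 6));
        System.out.println(limitedOrderBy(partitions, 2));
    }
}
```

The result is the same as sorting all rows and taking the first k, because any 
row in the global top k is also in the top k of its own partition.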

> Implement optimizations for LIMIT
> -
>
> Key: PIG-3557
> URL: https://issues.apache.org/jira/browse/PIG-3557
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Alex Bain
>Assignee: Alex Bain
>
> Implement optimizations for LIMIT when other parts of Pig-on-Tez are more 
> mature. Some of the optimizations mentioned by Daniel include:
> 1. If the previous stage uses 1 reducer, there is no need to add one more vertex
> 2. If the limitplan is null (i.e., not the "limited order by" case), we might 
> not need a shuffle edge; a pass-through edge should be enough if possible
> 3. Similar to PIG-1270, we can push limit to InputHandler
> 4. We also need to think through the "limited order by" case once "order by" 
> is implemented





[jira] [Updated] (PIG-3644) Implement skewed join in Tez

2014-01-14 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3644:
---

Status: Patch Available  (was: Open)

> Implement skewed join in Tez
> 
>
> Key: PIG-3644
> URL: https://issues.apache.org/jira/browse/PIG-3644
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3644-1.patch
>
>
> Skewed join in Tez can be implemented similarly to order-by (PIG-3634).





[jira] [Updated] (PIG-3644) Implement skewed join in Tez

2014-01-14 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3644:
---

Attachment: PIG-3644-1.patch

Attaching the first patch. The RB link is:
https://reviews.apache.org/r/16860/

> Implement skewed join in Tez
> 
>
> Key: PIG-3644
> URL: https://issues.apache.org/jira/browse/PIG-3644
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3644-1.patch
>
>
> Skewed join in Tez can be implemented similarly to order-by (PIG-3634).





Review Request 16860: PIG-3644: Implement skewed join in Tez

2014-01-14 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16860/
---

Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini 
Palaniswamy.


Bugs: PIG-3644
https://issues.apache.org/jira/browse/PIG-3644


Repository: pig-git


Description
---

Skewed join in Tez is implemented in 5 vertices:
Vertex 1) Sample/load skewed table => broadcast sampling input to vertex 2 and 
shuffle entire input to vertex 3.
Vertex 2) Sampling aggregation vertex => build distribution map and broadcast 
it to vertex 3 and 4.
Vertex 3) POLocalRearrangeTez for skewed table => partition skewed table using 
SkewedPartitioner and shuffle it to vertex 5.
Vertex 4) POPartitionRearrangeTez for streaming table => shuffle streaming 
table to vertex 5.
Vertex 5) Join inputs from vertex 3 and 4.
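
The partitioning done in vertices 3 and 4 can be sketched roughly as follows. 
This is a simplified illustration under stated assumptions, not the code in 
this patch: the class, its methods, and the shape of the distribution map 
(key -> {first reducer, reducer count}) are hypothetical. The assumption is 
that rows of a skewed key from the skewed table are spread round-robin over 
the reducers assigned to that key, while matching rows from the streaming 
table are copied to every one of those reducers so that each can join locally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; it does not use SkewedPartitioner or POPartitionRearrange.
final class SkewedJoinPartitionSketch {
    // Assumed distribution map: key -> {firstReducer, reducerCount}.
    private final Map<String, int[]> distribution;
    private final Map<String, Integer> rowsSeen = new HashMap<>();
    private final int totalReducers;

    SkewedJoinPartitionSketch(Map<String, int[]> distribution, int totalReducers) {
        this.distribution = distribution;
        this.totalReducers = totalReducers;
    }

    // Skewed table: round-robin over the reducers assigned to a skewed key.
    int partitionSkewedSide(String key) {
        int[] range = distribution.get(key);
        if (range == null) {
            return Math.floorMod(key.hashCode(), totalReducers);
        }
        int offset = rowsSeen.merge(key, 1, Integer::sum) - 1;
        return (range[0] + offset % range[1]) % totalReducers;
    }

    // Streaming table: copy the row to every reducer assigned to the key.
    List<Integer> partitionsStreamingSide(String key) {
        List<Integer> targets = new ArrayList<>();
        int[] range = distribution.get(key);
        if (range == null) {
            targets.add(Math.floorMod(key.hashCode(), totalReducers));
            return targets;
        }
        for (int i = 0; i < range[1]; i++) {
            targets.add((range[0] + i) % totalReducers);
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, int[]> dist = new HashMap<>();
        dist.put("hot", new int[] {1, 3}); // "hot" is spread over reducers 1, 2, 3
        SkewedJoinPartitionSketch p = new SkewedJoinPartitionSketch(dist, 4);
        for (int i = 0; i < 4; i++) {
            System.out.println("skewed row of 'hot' -> reducer " + p.partitionSkewedSide("hot"));
        }
        System.out.println("streaming row of 'hot' -> reducers " + p.partitionsStreamingSide("hot"));
    }
}
```

Keys that are absent from the distribution map fall back to plain hash 
partitioning on both sides, so they still meet on a single reducer.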

New classes for Tez:
- POPoissonSample) Sampling operator for skewed join.
- POPartitionRearrangeTez) Sub-class of POPartitionRearrange for Tez.
- SkewedPartitionerTez) Sub-class of SkewedPartitioner for Tez.

Note that there are a couple of places I can refactor. For example:
- POPoissonSample and PoissonSampleLoader
- POPartitionRearrangeTez and POLocalRearrangeTez

I will do it in follow-up jiras.


Diffs
-

  src/org/apache/pig/PigConfiguration.java ccf3635 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/SkewedPartitioner.java 4790abe 
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPoissonSample.java e69de29 
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POReservoirSample.java bcb339c 
  src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 585509d 
  src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java e69de29 
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java e9d8e64 
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java e22c319 
  src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java e69de29 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java d35e87d 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 83e5d2c 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 93e522f 
  src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java 7bcc79e 
  src/org/apache/pig/impl/builtin/PartitionSkewedKeys.java 7ce0e82 
  src/org/apache/pig/impl/builtin/PoissonSampleLoader.java 5ce5b9e 
  test/e2e/pig/tests/tez.conf ac254e5 

Diff: https://reviews.apache.org/r/16860/diff/


Testing
---

- Added e2e test cases for inner and outer skewed joins.
- unit tests pass.
- e2e tests pass.


Thanks,

Cheolsoo Park