Re: Welcome to the new Pig PMC member Aniket Mokashi
Congrats Aniket!

Thanks
Prasanth Jayachandran

On Jan 15, 2014, at 10:30 AM, Bill Graham wrote:
> Woo! Congrats Aniket!
Re: Welcome to the new Pig PMC member Aniket Mokashi
Woo! Congrats Aniket!

On Tue, Jan 14, 2014 at 8:47 PM, Olga Natkovich wrote:
> Congrats, Aniket!

--
*Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
Re: Welcome to the new Pig PMC member Aniket Mokashi
Congrats, Aniket!

On Tuesday, January 14, 2014 8:32 PM, Tongjie Chen wrote:
> Congrats Aniket!
Re: Welcome to the new Pig PMC member Aniket Mokashi
Congrats Aniket!

On Tue, Jan 14, 2014 at 8:12 PM, Cheolsoo Park wrote:
> Congrats Aniket!
Re: Welcome to the new Pig PMC member Aniket Mokashi
Congrats Aniket!

On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho wrote:
> Congratulations Aniket, good work!
>
> Jarcec
Re: Welcome to the new Pig PMC member Aniket Mokashi
Congratulations Aniket, good work!

Jarcec

On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
> It's my pleasure to announce that Aniket Mokashi became the newest addition to the Pig PMC.
> Aniket has been actively contributing to Pig for years.
> Please join me in congratulating Aniket!
>
> Julien
Welcome to the new Pig PMC member Aniket Mokashi
It's my pleasure to announce that Aniket Mokashi became the newest addition to the Pig PMC. Aniket has been actively contributing to Pig for years. Please join me in congratulating Aniket! Julien
[jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN
[ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiten Java updated PIG-3668:
---------------------------
    Priority: Major  (was: Trivial)

> COR built-in function when atleast one of the coefficient values is NaN
> -----------------------------------------------------------------------
>
>                 Key: PIG-3668
>                 URL: https://issues.apache.org/jira/browse/PIG-3668
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0
>            Reporter: Hiten Java
>         Attachments: COR.diff

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN
[ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiten Java updated PIG-3668:
---------------------------
    Affects Version/s:     (was: 0.11.1)
                           (was: 0.11)
               Status: Patch Available  (was: Open)

> COR built-in function when atleast one of the coefficient values is NaN
> -----------------------------------------------------------------------
>
>                 Key: PIG-3668
>                 URL: https://issues.apache.org/jira/browse/PIG-3668
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0
>            Reporter: Hiten Java
>            Priority: Trivial
>         Attachments: COR.diff
[jira] [Updated] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN
[ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiten Java updated PIG-3668:
---------------------------
    Attachment: COR.diff

Patch file for .12 version.

> COR built-in function when atleast one of the coefficient values is NaN
> -----------------------------------------------------------------------
>
>                 Key: PIG-3668
>                 URL: https://issues.apache.org/jira/browse/PIG-3668
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.11, 0.12.0, 0.11.1
>            Reporter: Hiten Java
>            Priority: Trivial
>         Attachments: COR.diff
[jira] [Created] (PIG-3668) COR built-in function when atleast one of the coefficient values is NaN
Hiten Java created PIG-3668:
-------------------------------

             Summary: COR built-in function when atleast one of the coefficient values is NaN
                 Key: PIG-3668
                 URL: https://issues.apache.org/jira/browse/PIG-3668
             Project: Pig
          Issue Type: Bug
          Components: internal-udfs
    Affects Versions: 0.11.1, 0.12.0, 0.11
            Reporter: Hiten Java
            Priority: Trivial

When multiple columns are passed to COR for correlation analysis and the coefficient for one of the column combinations is NaN, the values for all other combinations are not computed.

The Pearson coefficient is NaN if all values in a given column are the same.

Example:

    A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
    B = group A all;
    c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1, (bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4));

If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients for all combinations are NaN.

This happens because of the 'return null' statement in the catch block on lines 157 and 235 of org.apache.pig.builtin.COR.java. If the catch block is removed, the correlation analysis continues for the remaining columns. (Apache Pig 0.12.0)
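The behaviour the report asks for can be illustrated outside Pig. The following is only a minimal Python sketch, not the code in COR.java: `pearson` and `pairwise_cor` are hypothetical helper names, and the sketch assumes that a constant column should produce NaN for its own pairs while the remaining pairs are still computed.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson coefficient of two equal-length sequences.

    Returns NaN when either column has zero variance (all values equal),
    instead of aborting the whole computation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        return float("nan")
    return cov / denom


def pairwise_cor(columns):
    """Compute the coefficient for every pair of columns independently."""
    names = sorted(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(names, 2)}


# Illustrative data only: col_2 is constant, so its pairs are NaN.
columns = {
    "col_1": [1.0, 2.0, 3.0, 4.0],
    "col_2": [5.0, 5.0, 5.0, 5.0],
    "col_3": [2.0, 4.0, 6.0, 8.0],
}
result = pairwise_cor(columns)
```

In this sketch the pairs involving col_2 come back as NaN, while the pair (col_1, col_3) is still computed, which is the outcome the report expects once the early 'return null' is gone.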
[jira] [Commented] (PIG-3557) Implement optimizations for LIMIT
[ https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871484#comment-13871484 ]

Daniel Dai commented on PIG-3557:
---------------------------------

Yes, in the case of a root vertex (a vertex that contains a load), the parallelism is determined by the InputFormat, not by requestedParallelism, and it cannot be determined at compile time. We will need a second, limit-only vertex in this case. For a non-root vertex, however, we can use requestedParallelism as the criterion to decide whether we need a follow-up vertex for the limit.

> Implement optimizations for LIMIT
> ---------------------------------
>
>                 Key: PIG-3557
>                 URL: https://issues.apache.org/jira/browse/PIG-3557
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Alex Bain
>            Assignee: Alex Bain
>
> Implement optimizations for LIMIT when other parts of Pig-on-Tez are more mature. Some of the optimizations mentioned by Daniel include:
> 1. If the previous stage uses 1 reducer, there is no need to add one more vertex
> 2. If the limitplan is null (i.e., not the "limited order by" case), we might not need a shuffle edge; a pass-through edge should be enough if possible
> 3. Similar to PIG-1270, we can push limit to InputHandler
> 4. We also need to think through the "limited order by" case once "order by" is implemented
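The rule in the comment above can be written down as a small decision function. This is a hedged Python sketch, not Pig-on-Tez code: `Vertex` and `needs_followup_limit_vertex` are hypothetical names, and treating "requestedParallelism equal to 1" as the only case that avoids the extra vertex is an assumption drawn from optimization 1 in the issue description.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    """Hypothetical stand-in for a Tez operator in the compiled plan."""
    name: str
    is_root: bool               # True when the vertex contains a load
    requested_parallelism: int  # value known at compile time


def needs_followup_limit_vertex(prev: Vertex) -> bool:
    """Decide at compile time whether a separate limit-only vertex is needed."""
    if prev.is_root:
        # Runtime parallelism comes from the InputFormat and is unknown at
        # compile time, so requested_parallelism cannot be trusted here.
        return True
    # Non-root vertex: with a single task the limit can be applied in place;
    # anything else (including an unset value) gets a follow-up vertex.
    return prev.requested_parallelism != 1


load_vertex = Vertex("load", is_root=True, requested_parallelism=1)
single_reducer = Vertex("reduce", is_root=False, requested_parallelism=1)
many_reducers = Vertex("reduce", is_root=False, requested_parallelism=10)
```

The root-vertex branch is deliberately conservative: it always adds the second vertex, even when the requested parallelism is 1.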
[jira] [Updated] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory
[ https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suhas Satish updated PIG-3667:
------------------------------
    Attachment: PIG-3667.patch

> build.xml jar-all target does not include jython*.jar in lib/ directory
> ------------------------------------------------------------------------
>
>                 Key: PIG-3667
>                 URL: https://issues.apache.org/jira/browse/PIG-3667
>             Project: Pig
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.12.0
>            Reporter: Suhas Satish
>            Assignee: Suhas Satish
>              Labels: build
>         Attachments: PIG-3667.patch
[jira] [Updated] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory
[ https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suhas Satish updated PIG-3667:
------------------------------
               Labels: build  (was: )
    Affects Version/s:     (was: 0.11.1)
         Hadoop Flags: Reviewed
               Status: Patch Available  (was: Open)

> build.xml jar-all target does not include jython*.jar in lib/ directory
> ------------------------------------------------------------------------
>
>                 Key: PIG-3667
>                 URL: https://issues.apache.org/jira/browse/PIG-3667
>             Project: Pig
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.12.0
>            Reporter: Suhas Satish
>            Assignee: Suhas Satish
>              Labels: build
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (10 issues)

Subscriber: pigdaily

Key         Summary
PIG-3654    Add class cache to PigContext
            https://issues.apache.org/jira/browse/PIG-3654
PIG-3644    Implement skewed join in Tez
            https://issues.apache.org/jira/browse/PIG-3644
PIG-3642    Direct HDFS access for small jobs (fetch)
            https://issues.apache.org/jira/browse/PIG-3642
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3615    Update the way that JsonLoader/JsonStorage deal with BigDecimal
            https://issues.apache.org/jira/browse/PIG-3615
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3573    Provide StoreFunc and LoadFunc for Accumulo
            https://issues.apache.org/jira/browse/PIG-3573
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3347    Store invocation brings side effect
            https://issues.apache.org/jira/browse/PIG-3347

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
Pig User Group Meetup at LinkedIn on Fri Mar 14
Please join us for the Pig User Group Meetup this quarter at LinkedIn on Fri Mar 14. We have some interesting talks lined up on the recent developments in Pig. RSVP at http://www.meetup.com/PigUser/events/160604192/

Tentative lineup for this meetup:

  Pig on Tez
  Pig on Storm
  Intel Graph Builder
  Pig Pen (MR for Clojure)
  Accumulo Storage

Video recordings of the meetup talks will be posted after the meeting for those not able to attend.

Thanks to Mark Wagner and Alex Bain for hosting it at LinkedIn.

Regards,
Rohini
Re: Review Request 16533: Add StoreFunc and LoadFunc classes to Pig for Accumulo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16533/
-----------------------------------------------------------

(Updated Jan. 15, 2014, 12:44 a.m.)

Review request for pig.

Changes
-------

This should address the currently open issues. Reworked all of the column specification to match HBaseStorage more closely, with a few differences.

* Accumulo allows any number of colfams for a table, which allows for different table designs. As such, I introduced the notion of "*", which consumes all columns in a row as a map. If the user enters no columns (empty string), this is also the default behavior. "literal" or "literal:literal" creates a DataByteArray in the tuple, and "liter*", "literal:" and "literal:*" all create a map in the tuple.
* Removed string-ification serialization in AccumuloBinaryConverter.
* Even more unit tests.

Bugs: PIG-3573
    https://issues.apache.org/jira/browse/PIG-3573

Repository: pig-git

Description
-------

Provides basic StoreFunc and LoadFunc implementations. Based on code that was in an Accumulo contrib project.

Diffs (updated)
-----

  ivy.xml 180eb2c
  ivy/libraries.properties 14abdf8
  src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/accumulo/AccumuloBinaryConverter.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageOptions.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/accumulo/Column.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/accumulo/FixedByteArrayOutputStream.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION
  test/excluded-tests-23 aaf6bd1
  test/org/apache/pig/backend/hadoop/accumulo/TestAbstractAccumuloStorage.java PRE-CREATION
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloBinaryConverter.java PRE-CREATION
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloColumns.java PRE-CREATION
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloPigCluster.java PRE-CREATION
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloStorage.java PRE-CREATION
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloStorageConfiguration.java PRE-CREATION
  test/org/apache/pig/backend/hadoop/accumulo/TestAccumuloStorageOptions.java PRE-CREATION

Diff: https://reviews.apache.org/r/16533/diff/

Testing
-------

Local tests reading, writing and JOIN'ing Accumulo tables. Tested against Hadoop-1.0.4 and 2.2.0, with Accumulo 1.5.0.

Thanks,

Josh Elser
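The column-specification rules listed under "Changes" can be summarised as a small classifier. This is a hedged Python sketch rather than the parsing code in the patch: `classify_column_spec` is a hypothetical name, the return value "bytes" stands in for a DataByteArray, and the sketch assumes a single specification with no escaping and no comma-separated list.

```python
def classify_column_spec(spec: str) -> str:
    """Return 'map' or 'bytes' for one column specification."""
    if spec == "" or spec == "*":
        # No columns given, or "*": consume all columns in the row as a map.
        return "map"
    if spec.endswith("*") or spec.endswith(":"):
        # "liter*", "literal:" and "literal:*" each select several columns.
        return "map"
    # "literal" or "literal:literal" names one column: a single value.
    return "bytes"


# The specifications mentioned in the review request, with their result type.
examples = {spec: classify_column_spec(spec)
            for spec in ["", "*", "literal", "literal:literal",
                         "liter*", "literal:", "literal:*"]}
```

Only the trailing character matters in this sketch; how a real implementation distinguishes a family prefix from a qualifier prefix is not stated in the review request and is left out.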
[jira] [Created] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory
Suhas Satish created PIG-3667:
---------------------------------

             Summary: build.xml jar-all target does not include jython*.jar in lib/ directory
                 Key: PIG-3667
                 URL: https://issues.apache.org/jira/browse/PIG-3667
             Project: Pig
          Issue Type: Bug
          Components: build
    Affects Versions: 0.11.1, 0.12.0
            Reporter: Suhas Satish
            Assignee: Suhas Satish

The Pig package does not include the jython jar in the lib/ directory when built with the jar-all ant target, but does include it with the "ant package" target. It should be included by both targets, because the build/ directory, which is where ivy puts all the dependency jars (under build/ivy/lib/Pig) while building, is often excluded from packaging.

To reproduce:

    ant jar-all
    rm -rf build/
    bin/pig
    grunt> register '/tmp/test.py' using jython as myfunction;

If done prior to installing jython, here's the error one gets:

    2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/core/PyObject
    Details at logfile: pig_*.log

Within the pig_*.log:

    Pig Stack Trace
    ---------------
    ERROR 2998: Unhandled internal error. org/python/core/PyObject

    java.lang.NoClassDefFoundError: org/python/core/PyObject
            at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
            at org.apache.pig.PigServer.registerCode(PigServer.java:501)
            at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
            at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
            at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
            at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
            at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
            at org.apache.pig.Main.run(Main.java:538)
            at org.apache.pig.Main.main(Main.java:157)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
    Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
            at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
            ... 14 more

Fix: Including jython*.jar within the lib/ directory gets rid of this issue and the UDF can be loaded:

    grunt> register '/tmp/test.py' using jython as myfuncs;
    2013-12-27 18:37:02,402 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4887743829482443898
    2013-12-27 18:37:03,448 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
    2013-12-27 18:37:03,724 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myfuncs.helloworld
[jira] [Updated] (PIG-3664) Piggy Bank XPath UDF can't be called
[ https://issues.apache.org/jira/browse/PIG-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nezih Yigitbasi updated PIG-3664:
---------------------------------
    Attachment: PIG-3664.1.patch

Attached a new patch that implements the getArgToFuncMapping method.

> Piggy Bank XPath UDF can't be called
> ------------------------------------
>
>                 Key: PIG-3664
>                 URL: https://issues.apache.org/jira/browse/PIG-3664
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Nezih Yigitbasi
>            Assignee: Nezih Yigitbasi
>            Priority: Blocker
>         Attachments: PIG-3664.1.patch, PIG-3664.patch
>
> When I try to call the XPath UDF to process a very simple XML document with Pig 0.12, I get the following error:
>
>     2014-01-13 16:14:19,530 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045:
>     Could not infer the matching function for org.apache.pig.piggybank.evaluation.xml.XPath as multiple or none of them fit. Please use an explicit cast.
>
> I guess the XPath UDF overrides getArgToFuncMapping() in an incorrect way. A fix is attached.
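To make the error message above concrete: an argument-to-function mapping lists the argument schemas a UDF accepts, and the frontend has to find exactly one entry that fits the call. The following is a simplified Python sketch of that idea, not Pig's resolution logic: `infer_matching_function` and `XPATH_MAPPING` are hypothetical, the two-chararray signature is assumed for illustration, and only exact type matches are considered (no implicit casts).

```python
def infer_matching_function(arg_types, mapping):
    """Pick the single implementation whose declared argument types fit.

    mapping is a list of (implementation_name, expected_arg_types) pairs.
    Raises TypeError when zero or several entries match.
    """
    matches = [name for name, expected in mapping
               if tuple(expected) == tuple(arg_types)]
    if len(matches) != 1:
        raise TypeError(
            "Could not infer the matching function: "
            "multiple or none of them fit")
    return matches[0]


# Hypothetical mapping: one implementation taking (xml, xpath) as chararrays.
XPATH_MAPPING = [("XPath", ("chararray", "chararray"))]

chosen = infer_matching_function(("chararray", "chararray"), XPATH_MAPPING)
```

A mapping whose declared schema does not line up with what callers actually pass fails in the same way for every call, which is consistent with the symptom described in the report.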
[jira] [Commented] (PIG-3664) Piggy Bank XPath UDF can't be called
[ https://issues.apache.org/jira/browse/PIG-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871380#comment-13871380 ]

Daniel Dai commented on PIG-3664:
---------------------------------

getArgToFuncMapping is still better because we can catch schema mismatches in the frontend. I would prefer to fix it rather than get rid of it.

> Piggy Bank XPath UDF can't be called
> ------------------------------------
>
>                 Key: PIG-3664
>                 URL: https://issues.apache.org/jira/browse/PIG-3664
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Nezih Yigitbasi
>            Assignee: Nezih Yigitbasi
>            Priority: Blocker
>         Attachments: PIG-3664.patch
[jira] [Commented] (PIG-3557) Implement optimizations for LIMIT
[ https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871376#comment-13871376 ]

Alex Bain commented on PIG-3557:
--------------------------------

1. You can check requestedParallelism for the tezOperator. This should be doable.

This doesn't sound quite right to me. Let's say you are doing:

    a = LOAD '/data/myLargeDataSet';
    b = LIMIT a 100;
    ...

where myLargeDataSet contains lots of block-sized files. In that case, the Tez vertex for the POLoad has a requestedParallelism of 1, but the actual runtime parallelism will be equal to the number of files. The optimization (putting the limit only in the plan for the previous vertex, which here is the vertex for the load, and not having a second vertex) then fails. Basically, we can't depend on requestedParallelism = 1 actually being the parallelism at runtime.

[Just to note, the LimitOptimizer would actually push the limit up to the Input Handler, but to keep this example simple, let's ignore that for now]

> Implement optimizations for LIMIT
> ---------------------------------
>
>                 Key: PIG-3557
>                 URL: https://issues.apache.org/jira/browse/PIG-3557
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Alex Bain
>            Assignee: Alex Bain
[jira] [Updated] (PIG-3666) Fix store after load
[ https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3666:
----------------------------
    Hadoop Flags: Reviewed

> Fix store after load
> --------------------
>
>                 Key: PIG-3666
>                 URL: https://issues.apache.org/jira/browse/PIG-3666
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: tez-branch
>
>         Attachments: PIG-3666-1.patch
>
> Several e2e test failures share the following pattern:
>
>     .
>     store into 'afile';
>     a = load 'afile';
>     ..
>
> Stack:
>
>     Caused by: java.lang.NullPointerException
>             at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
>             at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
>             at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
>             at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
>             at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
>             at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
>             at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
>             at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
>             at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
>             ... 16 more
>
> The script needs to be broken into two DAGs, since the second DAG expects HDFS input produced by the first DAG.
> Examples of such e2e test failures are: Casts_[1-6]
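The "break into two DAGs" idea can be illustrated with a toy splitter that cuts a script wherever a load reads a path stored earlier in the same segment. This is only a Python sketch over statement strings, not how TezCompiler works on its operator plan: `split_into_segments`, the two regular expressions, and the sample script (including the alias in the store statement) are all assumptions made for the example.

```python
import re

# Simplified patterns for "store <alias> into '<path>'" and "load '<path>'".
STORE_RE = re.compile(r"store\s+\w+\s+into\s+'([^']+)'", re.IGNORECASE)
LOAD_RE = re.compile(r"load\s+'([^']+)'", re.IGNORECASE)


def split_into_segments(statements):
    """Group statements so that no segment loads a path it also stores."""
    segments, current, stored = [], [], set()
    for stmt in statements:
        load = LOAD_RE.search(stmt)
        if load and load.group(1) in stored and current:
            # The input is produced earlier in this segment, so the earlier
            # statements must finish first: close the segment here.
            segments.append(current)
            current, stored = [], set()
        current.append(stmt)
        store = STORE_RE.search(stmt)
        if store:
            stored.add(store.group(1))
    if current:
        segments.append(current)
    return segments


# Illustrative script following the store-then-load pattern.
statements = [
    "a = load 'input';",
    "store a into 'afile';",
    "b = load 'afile';",
    "store b into 'output';",
]
segments = split_into_segments(statements)
```

Each resulting segment would correspond to one DAG, run in order, so that 'afile' exists on HDFS before the second segment starts.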
[jira] [Resolved] (PIG-3666) Fix store after load
[ https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-3666.
-----------------------------

    Resolution: Fixed

Patch committed to Tez branch. Thanks Rohini for review!
[jira] [Resolved] (PIG-3665) TEZ-41 break pig-tez
[ https://issues.apache.org/jira/browse/PIG-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-3665.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed

Patch committed to tez-branch. FYI, without this patch, newer versions of Tez give an empty result.

> TEZ-41 break pig-tez
> --------------------
>
>                 Key: PIG-3665
>                 URL: https://issues.apache.org/jira/browse/PIG-3665
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: tez-branch
>
>         Attachments: PIG-3665-1.patch
>
> TEZ-41 introduces a backward-incompatible change, and Pig needs to change accordingly. Please update the Tez code once the change is checked into Pig.
[jira] [Commented] (PIG-3666) Fix store after load
[ https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871327#comment-13871327 ]

Rohini Palaniswamy commented on PIG-3666:
-----------------------------------------

+1
[jira] [Commented] (PIG-3665) TEZ-41 break pig-tez
[ https://issues.apache.org/jira/browse/PIG-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871316#comment-13871316 ]

Rohini Palaniswamy commented on PIG-3665:
-----------------------------------------

+1
[jira] [Created] (PIG-3666) Fix store after load
Daniel Dai created PIG-3666:
-------------------------------

             Summary: Fix store after load
                 Key: PIG-3666
                 URL: https://issues.apache.org/jira/browse/PIG-3666
             Project: Pig
          Issue Type: Sub-task
            Reporter: Daniel Dai
         Attachments: PIG-3666-1.patch

Several e2e test failures share the following pattern:
...
store into 'afile';
a = load 'afile';
...
Stack:
Caused by: java.lang.NullPointerException
    at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:435)
    at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:173)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:328)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:337)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezCompiler.compile(TezCompiler.java:215)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.compile(TezLauncher.java:152)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:72)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:344)
    ... 16 more
The script needs to be broken into two DAGs, since the second DAG expects HDFS input produced by the first DAG.
Examples of such e2e test failures are: Casts_[1-6]
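The "break into two DAGs" idea from the report above can be sketched as follows. This is a toy illustration only and makes no claim about how PIG-3666-1.patch is implemented: the statement representation and the function name are hypothetical, and a script is modeled as a flat list of (operation, path) pairs.

```python
def split_into_dags(statements):
    """Group statements into DAGs so that a load never shares a DAG
    with an earlier store to the same path.

    `statements` is a list of (op, path) tuples, where op is "load",
    "store", or any other string (a hypothetical representation).
    """
    dags, current, stored = [], [], set()
    for op, path in statements:
        if op == "load" and path in stored:
            # The input is produced by the DAG built so far, so that DAG
            # must finish before this load can run: start a new one.
            dags.append(current)
            current, stored = [], set()
        current.append((op, path))
        if op == "store":
            stored.add(path)
    if current:
        dags.append(current)
    return dags

script = [("load", "input"), ("store", "afile"),
          ("load", "afile"), ("store", "out")]
for i, dag in enumerate(split_into_dags(script), start=1):
    print(i, dag)
```

The second group starts at the load of 'afile', mirroring the requirement that the second DAG reads HDFS output written by the first.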
[jira] [Updated] (PIG-3666) Fix store after load
[ https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3666:
----------------------------

    Attachment: PIG-3666-1.patch
[jira] [Assigned] (PIG-3666) Fix store after load
[ https://issues.apache.org/jira/browse/PIG-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-3666:
-------------------------------

    Assignee: Daniel Dai
[jira] [Updated] (PIG-3665) TEZ-41 break pig-tez
[ https://issues.apache.org/jira/browse/PIG-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3665:
----------------------------

    Attachment: PIG-3665-1.patch
[jira] [Created] (PIG-3665) TEZ-41 break pig-tez
Daniel Dai created PIG-3665:
-------------------------------

             Summary: TEZ-41 break pig-tez
                 Key: PIG-3665
                 URL: https://issues.apache.org/jira/browse/PIG-3665
             Project: Pig
          Issue Type: Sub-task
          Components: tez
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: tez-branch

TEZ-41 introduces a backward-incompatible change, and Pig needs to change accordingly. Please update the Tez code once the change is checked into Pig.
[jira] [Commented] (PIG-3557) Implement optimizations for LIMIT
[ https://issues.apache.org/jira/browse/PIG-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871183#comment-13871183 ]

Daniel Dai commented on PIG-3557:
---------------------------------

1. You can check requestedParallelism for the tezOperator. This should be doable.
2. We can do a non-sorted scatter-gather, but this depends on TEZ-661, so we cannot proceed now.
4. We could use a combiner and duplicate POLimit in the combiner.

Otherwise, the plan looks good.
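Point 4 above (duplicating the limit in a combiner for the "limited order by" case) can be sketched generically. This is a minimal illustration of the general technique, not Pig's POLimit or its combiner API; the function names are hypothetical and each partition is modeled as a plain list of sortable values.

```python
import heapq

def combiner_limit(partition, n):
    """Combiner side: keep only the n smallest rows of one partition, sorted."""
    return heapq.nsmallest(n, partition)

def limited_order_by(partitions, n):
    """Reducer side: merge the pre-limited, sorted partials and limit again."""
    partials = [combiner_limit(p, n) for p in partitions]
    return heapq.nsmallest(n, heapq.merge(*partials))

partitions = [[5, 1, 9], [3, 8, 2], [7, 4, 6]]
print(limited_order_by(partitions, 3))
```

Because each partial already holds at most n rows, the final stage sees at most n rows per partition instead of the full input, while the result matches sorting everything and taking the first n.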
[jira] [Updated] (PIG-3644) Implement skewed join in Tez
[ https://issues.apache.org/jira/browse/PIG-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3644:
-------------------------------

    Status: Patch Available  (was: Open)

> Implement skewed join in Tez
> ----------------------------
>
>                 Key: PIG-3644
>                 URL: https://issues.apache.org/jira/browse/PIG-3644
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>         Attachments: PIG-3644-1.patch
>
> Skewed join in Tez can be implemented similarly to order-by (PIG-3634).
[jira] [Updated] (PIG-3644) Implement skewed join in Tez
[ https://issues.apache.org/jira/browse/PIG-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3644:
-------------------------------

    Attachment: PIG-3644-1.patch

Attaching the first patch. The RB link is https://reviews.apache.org/r/16860/
Review Request 16860: PIG-3644: Implement skewed join in Tez
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16860/
-----------------------------------------------------------

Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.


Bugs: PIG-3644
    https://issues.apache.org/jira/browse/PIG-3644


Repository: pig-git


Description
-------

Skewed join in Tez is implemented in 5 vertices:
Vertex 1) Sample/load skewed table => broadcast sampling input to vertex 2 and shuffle entire input to vertex 3.
Vertex 2) Sampling aggregation vertex => build distribution map and broadcast it to vertex 3 and 4.
Vertex 3) POLocalRearrangeTez for skewed table => partition skewed table using SkewedPartitioner and shuffle it to vertex 5.
Vertex 4) POPartitionRearrangeTez for streaming table => shuffle streaming table to vertex 5.
Vertex 5) Join inputs from vertex 3 and 4.

New classes for Tez:
- POPoissonSample) Sampling operator for skewed join.
- POPartitionRearrangeTez) Sub-class of POPartitionRearrange for Tez.
- SkewedPartitionerTez) Sub-class of SkewedPartitioner for Tez.

Note that there are a couple of places I can refactor. For example,
- POPoissonSample and PoissonSampleLoader
- POPartitionRearrangeTez and POLocalRearrangeTez
I will do that in follow-up JIRAs.

Diffs
-----

  src/org/apache/pig/PigConfiguration.java ccf3635
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/SkewedPartitioner.java 4790abe
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPoissonSample.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POReservoirSample.java bcb339c
  src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 585509d
  src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java e9d8e64
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java e22c319
  src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java d35e87d
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 83e5d2c
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 93e522f
  src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java 7bcc79e
  src/org/apache/pig/impl/builtin/PartitionSkewedKeys.java 7ce0e82
  src/org/apache/pig/impl/builtin/PoissonSampleLoader.java 5ce5b9e
  test/e2e/pig/tests/tez.conf ac254e5

Diff: https://reviews.apache.org/r/16860/diff/


Testing
-------

- Added e2e test cases for inner and outer skewed joins.
- Unit tests pass.
- e2e tests pass.


Thanks,

Cheolsoo Park
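The five-vertex layout described in this review request can be written down as a small graph to check its shape. This is only a sketch: the vertex labels are made up for illustration, the edge list is read directly off the description, and nothing here uses Pig or Tez APIs. It relies on the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# (source, destination, edge kind), taken from the description above.
# The vertex labels are hypothetical names, not identifiers from the patch.
EDGES = [
    ("v1_sample_load", "v2_sample_aggregate", "broadcast"),  # sampling input
    ("v1_sample_load", "v3_local_rearrange", "shuffle"),     # entire skewed table
    ("v2_sample_aggregate", "v3_local_rearrange", "broadcast"),      # distribution map
    ("v2_sample_aggregate", "v4_partition_rearrange", "broadcast"),  # distribution map
    ("v3_local_rearrange", "v5_join", "shuffle"),
    ("v4_partition_rearrange", "v5_join", "shuffle"),
]

def predecessors(edges):
    """Map each vertex to the set of vertices it depends on."""
    preds = {}
    for src, dst, _kind in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds

def execution_order(edges):
    """Return one valid topological order; raises CycleError if not a DAG."""
    return list(TopologicalSorter(predecessors(edges)).static_order())

print(execution_order(EDGES))
```

Any valid order starts with the sample/load vertex and ends with the join vertex; vertices 3 and 4 both wait on the distribution map from vertex 2 and may appear in either order.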