[jira] [Updated] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3419:
---

    Attachment: updated-8-23-2013-exec-engine.patch

I am uploading a new patch that includes the following changes:
* Fixes most test cases (issues with JobStats and Explain).
* Removes "src/META-INF/services/org.apache.pig.backend.executionengine.ExecType" because it's a duplicate. (It was probably added by mistake.)
* Renames TestJobStats.java to TestMRJobStats.java since it tests MRJobStats.
* Fixes a bunch of Java warnings.

The diff from Achal's last patch can be viewed [here|https://github.com/piaozhexiu/apache-pig/commit/2a0b8bd00ae8685cd13d9b5ea08cb4672c71f450].

I just kicked off the unit tests again and will let you know how it goes.

Thanks!

> Pluggable Execution Engine
> ---
>
>                 Key: PIG-3419
>                 URL: https://issues.apache.org/jira/browse/PIG-3419
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.12
>            Reporter: Achal Soni
>            Assignee: Achal Soni
>            Priority: Minor
>         Attachments: execengine.patch, mapreduce_execengine.patch,
> stats_scriptstate.patch, test_failures.txt, test_suite.patch,
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for
> a cleaner ExecutionEngine abstraction than existed before. The changes are
> not that major as Pig was already relatively abstracted out between the
> frontend and backend. The changes in the attached commit are essentially the
> barebones changes -- I tried to not change the structure of Pig's different
> components too much. I think it will be interesting to see in the future how
> we can refactor more areas of Pig to really honor this abstraction between
> the frontend and backend.
> Some of the changes were to reinstate an ExecutionEngine interface to tie
> together the frontend and backend, to make the changes in Pig to delegate
> to the EE when necessary, and to create an MRExecutionEngine that implements
> this interface. Other work included changing ExecType to cycle through the
> ExecutionEngines on the classpath and select the appropriate one (this is
> done using Java ServiceLoader, exactly how MapReduce does for choosing the
> framework to use between local and distributed mode). Also I tried to make
> ScriptState, JobStats, and PigStats as abstract as possible in their current
> state. I think in the future some work will need to be done here to perhaps
> re-evaluate the usage of ScriptState and the responsibilities of the
> different statistics classes. I haven't touched the PPNL, but I think more
> abstraction is needed here, perhaps in a separate patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
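
[Editor's illustration, not part of the original message] The PIG-3419 description above says ExecType cycles through the ExecutionEngines found on the classpath and selects the appropriate one. The selection loop can be sketched roughly as below. This is a Python stand-in, not Java's ServiceLoader and not the patch's code: the registry list, the `accepts` method, and `LocalExecutionEngine` are all hypothetical names chosen for illustration.

```python
class ExecutionEngine:
    """Hypothetical base class standing in for the ExecutionEngine interface."""

    name = None

    def accepts(self, exec_type):
        # Assumption: an engine is matched by a case-insensitive name.
        return exec_type.lower() == self.name


class MRExecutionEngine(ExecutionEngine):
    name = "mapreduce"


class LocalExecutionEngine(ExecutionEngine):
    # Hypothetical second engine, present only to make the loop meaningful.
    name = "local"


# Stand-in for the providers a service loader would discover on the classpath.
REGISTERED_ENGINES = [MRExecutionEngine, LocalExecutionEngine]


def select_engine(exec_type, providers=REGISTERED_ENGINES):
    """Return an instance of the first registered engine that accepts exec_type."""
    for provider in providers:
        engine = provider()
        if engine.accepts(exec_type):
            return engine
    raise ValueError(f"no execution engine found for {exec_type!r}")


engine = select_engine("mapreduce")
print(type(engine).__name__)  # prints "MRExecutionEngine"
```

The point of the pattern is that a new engine only has to register itself; the selection code does not change when an engine is added.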
[jira] [Updated] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Karn updated PIG-2417:
---

    Patch Info: Patch Available

> Streaming UDFs - allow users to easily write UDFs in scripting languages
> with no JVM implementation.
> ---
>
>                 Key: PIG-2417
>                 URL: https://issues.apache.org/jira/browse/PIG-2417
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.11
>            Reporter: Jeremy Karn
>            Assignee: Jeremy Karn
>         Attachments: PIG-2417-4.patch, PIG-2417-5.patch, streaming2.patch,
> streaming3.patch, streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in
> scripting languages with no JVM implementation or a limited JVM
> implementation. The initial proposal is outlined here:
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF
> from an embedded JVM UDF. I'd propose something like the following (although
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python')
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to
> the cluster and is responsible for reading the input stream, deserializing
> the input data, passing it to the user-written script, serializing that
> script's output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This
> class will likely share some of the existing code in POStream and
> ExecutableManager (where it makes sense to pull out shared code) to stream
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the
> POStream operator directly. This would involve inserting the POStream
> operator instead of the POUserFunc operator whenever we encountered a
> streaming UDF while building the physical plan. This approach seemed
> problematic because there would need to be a lot of changes in order to
> support POStream in all of the places we want to be able to use UDFs (for
> example, to operate on a single field inside of a foreach statement).
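
[Editor's illustration, not part of the original message] The controller script described in the PIG-2417 issue text above (read the input stream, deserialize, call the user-written function, serialize, write to the output stream) could look roughly like the following minimal sketch. The wire format is an assumption: the issue does not specify one, so this uses tab-delimited fields with one record per line. `concat_upper` is a hypothetical user UDF, and none of these names come from the attached patches.

```python
import io


def deserialize(line):
    """Split one input record into fields (assumed format: tab-delimited text)."""
    return line.rstrip("\n").split("\t")


def serialize(value):
    """Render one result as an output record (assumed format: one value per line)."""
    return f"{value}\n"


def run_controller(udf, in_stream, out_stream):
    """Read records, apply the user-written function, and write the results back."""
    for line in in_stream:
        fields = deserialize(line)
        out_stream.write(serialize(udf(*fields)))
    out_stream.flush()


def concat_upper(first, second):
    """Hypothetical user-written UDF: concatenate two fields and upper-case them."""
    return (first + second).upper()


# Demo with in-memory streams; a real controller would pass sys.stdin and sys.stdout.
demo_out = io.StringIO()
run_controller(concat_upper, io.StringIO("ab\tcd\nx\ty\n"), demo_out)
print(demo_out.getvalue(), end="")  # prints "ABCD" then "XY", one per line
```

Keeping serialization in the controller rather than in the user script means the user only writes plain functions, which matches the stated goal of making UDFs easy to write in a scripting language.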
[jira] [Work started] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-2417 started by Jeremy Karn.
[jira] [Work stopped] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-2417 stopped by Jeremy Karn.
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749226#comment-13749226 ]

Julien Le Dem commented on PIG-3419:
---

The advantage of having the ExecutionEngine abstraction in trunk is that it allows running experimental Pig execution engine implementations like Tez or Spark on an official release of Pig without having to build from a specific branch.
The execution engine implementations themselves are fairly independent of Pig and do not need to be maintained in a Pig branch.
If the ExecutionEngine abstraction evolves over time, that can be done in trunk and can be merged independently of the Tez implementation itself.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key         Summary
PIG-3436    Make pigmix run with Hadoop2
            https://issues.apache.org/jira/browse/PIG-3436
PIG-3431    Return more information for parsing related exceptions.
            https://issues.apache.org/jira/browse/PIG-3431
PIG-3430    Add xml format for explaining MapReduce Plan.
            https://issues.apache.org/jira/browse/PIG-3430
PIG-3426    Add support for removing s3 files
            https://issues.apache.org/jira/browse/PIG-3426
PIG-3419    Pluggable Execution Engine
            https://issues.apache.org/jira/browse/PIG-3419
PIG-3374    CASE and IN fail when expression includes dereferencing operator
            https://issues.apache.org/jira/browse/PIG-3374
PIG-3349    Document ToString(Datetime, String) UDF
            https://issues.apache.org/jira/browse/PIG-3349
PIG-3346    New property that controls the number of combined splits
            https://issues.apache.org/jira/browse/PIG-3346
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3117    A debug mode in which pig does not delete temporary files
            https://issues.apache.org/jira/browse/PIG-3117
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3048    Add mapreduce workflow information to job configuration
            https://issues.apache.org/jira/browse/PIG-3048
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Assigned] (PIG-2606) union is not accepting same alias as multiple inputs
[ https://issues.apache.org/jira/browse/PIG-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan reassigned PIG-2606:
---

    Assignee: Hari Sankar Sivarama Subramaniyan

> union is not accepting same alias as multiple inputs
> ---
>
>                 Key: PIG-2606
>                 URL: https://issues.apache.org/jira/browse/PIG-2606
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.10.0
>            Reporter: Thejas M Nair
>            Assignee: Hari Sankar Sivarama Subramaniyan
>
> grunt> l = load 'x';
> grunt> u = union l, l;
> 2012-03-16 18:48:45,687 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2998: Unhandled internal error. Union with Count(Operand) < 2
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749190#comment-13749190 ]

Dmitriy V. Ryaboy commented on PIG-3419:
---

Olga, first commit to the spork branch is from *2012*.
https://github.com/dvryaboy/pig (the default branch on my github is "spork").
Re: Upgrading antlr to 3.5
Hive had a lot of trouble upgrading antlr last time. See
https://issues.apache.org/jira/browse/HIVE-2439 &
https://issues.apache.org/jira/browse/HIVE-4547

That is not to say you will encounter these difficulties in the next upgrade
too, but given that antlr is not very particular about backward and forward
compatibility, and that Hive uses antlr in a pretty grueling way, I would be
pretty cautious with the upgrade.

My two cents.

Ashutosh

On Fri, Aug 23, 2013 at 4:17 PM, Daniel Dai wrote:

> If 3.5 can work without any code change, probably should be Ok. But we
> never tried that.
>
>
> On Fri, Aug 23, 2013 at 7:43 PM, Prashant Kommireddi wrote:
>
> > Hi Daniel,
> >
> > The reasons are more internal. Our app is having an issue with 3.4 and
> > it's easier for us to move forward to 3.5
> >
> > http://antlr.markmail.org/search/?q=%22void+%3D+null%3B%22#query:%22void%20%3D%20null%3B%22%20order%3Adate-backward+page:1+mid:7g3th2bg3onyoqhv+state:results
> >
> > Is it difficult to upgrade the version across the board (hive + pig)?
> >
> >
> > On Fri, Aug 23, 2013 at 2:02 PM, Daniel Dai wrote:
> >
> > > Any reason why you want to upgrade to 3.5? We'd like Hive/Pig to use
> > > the same version of antlr, which eases the integration work of
> > > Hive/Pig/HCat.
> > >
> > > Thanks,
> > > Daniel
> > >
> > >
> > > On Fri, Aug 23, 2013 at 5:50 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:
> > >
> > > > Hey guys,
> > > >
> > > > Anyone aware of any issues with upgrading antlr to v3.5 for Pig? I am
> > > > planning to try it out, and wanted to make sure it's not already been
> > > > tried.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> > > entity to which it is addressed and may contain information that is
> > > confidential, privileged and exempt from disclosure under applicable
> > > law. If the reader of this message is not the intended recipient, you
> > > are hereby notified that any printing, copying, dissemination,
> > > distribution, disclosure or forwarding of this communication is
> > > strictly prohibited. If you have received this communication in error,
> > > please contact the sender immediately and delete it from your system.
> > > Thank You.
> > >
> >
Re: Upgrading antlr to 3.5
If 3.5 can work without any code change, probably should be Ok. But we never
tried that.


On Fri, Aug 23, 2013 at 7:43 PM, Prashant Kommireddi wrote:

> Hi Daniel,
>
> The reasons are more internal. Our app is having an issue with 3.4 and it's
> easier for us to move forward to 3.5
>
> http://antlr.markmail.org/search/?q=%22void+%3D+null%3B%22#query:%22void%20%3D%20null%3B%22%20order%3Adate-backward+page:1+mid:7g3th2bg3onyoqhv+state:results
>
> Is it difficult to upgrade the version across the board (hive + pig)?
>
>
> On Fri, Aug 23, 2013 at 2:02 PM, Daniel Dai wrote:
>
> > Any reason why you want to upgrade to 3.5? We'd like Hive/Pig to use the
> > same version of antlr, which eases the integration work of Hive/Pig/HCat.
> >
> > Thanks,
> > Daniel
> >
> >
> > On Fri, Aug 23, 2013 at 5:50 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:
> >
> > > Hey guys,
> > >
> > > Anyone aware of any issues with upgrading antlr to v3.5 for Pig? I am
> > > planning to try it out, and wanted to make sure it's not already been
> > > tried.
> > >
> > > Thanks,
> > > Prashant
> > >
> >
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749106#comment-13749106 ]

Olga Natkovich commented on PIG-3419:
---

I think the reason we wanted it on the Tez branch is that it might evolve with the Tez implementation, and so we would merge the updated code back when Tez is ready. Since there are no plans for any additional backend, is there a need to apply this to trunk sooner rather than later?
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749014#comment-13749014 ]

Dmitriy V. Ryaboy commented on PIG-3419:
---

Rohini, I want to reiterate that this patch has NO tez dependencies (if it does, that's a bug). The intention is not to make Tez possible. It's to make pluggable execution engines possible; and I do not want that functionality to be tied to a tez branch that will be unstable and in heavy development for the foreseeable future. This work will be immediately useful for the Spork (pig on spark) branch, for example.

Also, it allows people to work with new runtimes *without modifying Pig*. So Tez-on-Pig doesn't even have to be done as a branch of this project; someone can go and experiment completely independently.

For these reasons, I would like it in trunk.

You make a great point about the danger of changing exceptions, public methods, etc. I believe that most of these are project-public, and annotated as such. Do you have specific methods you are concerned about? Ideally we would change as little as possible for the end user.

Dmitriy
[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
[ https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3379:
---

       Resolution: Fixed
    Fix Version/s: 0.12
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Xuefu!

> Alias reuse in nested foreach causes PIG script to fail
> ---
>
>                 Key: PIG-3379
>                 URL: https://issues.apache.org/jira/browse/PIG-3379
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.12
>
>         Attachments: PIG-3379-draft.patch, PIG-3379.patch
>
>
> The following script fails:
> {code:title=temp.pig}
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 6);
> EventsPerMinute = FOREACH EventsPerMinute {
>   DistinctDevices = DISTINCT Events.deviceId;
>   nbDevices = SIZE(DistinctDevices);
>   DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
>   nbDevicesWatching = SIZE(DistinctDevices);
>   GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> {code}
> With the error:
> {code}
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
> Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray.
> {code}
> Using a distinct alias name for the second "DistinctDevices" fixes the problem. As
> an observation, removing the last filter statement also fixes the problem.
[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748957#comment-13748957 ] Jeremy Karn commented on PIG-2417: -- Here's the review board: https://reviews.apache.org/r/13781/ > Streaming UDFs - allow users to easily write UDFs in scripting languages > with no JVM implementation. > - > > Key: PIG-2417 > URL: https://issues.apache.org/jira/browse/PIG-2417 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.11 >Reporter: Jeremy Karn >Assignee: Jeremy Karn > Attachments: PIG-2417-4.patch, PIG-2417-5.patch, streaming2.patch, > streaming3.patch, streaming.patch > > > The goal of Streaming UDFs is to allow users to easily write UDFs in > scripting languages with no JVM implementation or a limited JVM > implementation. The initial proposal is outlined here: > https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs. > In order to implement this we need new syntax to distinguish a streaming UDF > from an embedded JVM UDF. I'd propose something like the following (although > I'm not sure 'language' is the best term to be using): > {code}define my_streaming_udfs language('python') > ship('my_streaming_udfs.py'){code} > We'll also need a language-specific controller script that gets shipped to > the cluster which is responsible for reading the input stream, deserializing > the input data, passing it to the user written script, serializing that > script output, and writing that to the output stream. > Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This > class will likely share some of the existing code in POStream and > ExecutableManager (where it makes sense to pull out shared code) to stream > data to/from the controller script. > One alternative approach to creating the StreamingUDF EvalFunc is to use the > POStream operator directly. 
This would involve inserting the POStream > operator instead of the POUserFunc operator whenever we encountered a > streaming UDF while building the physical plan. This approach seemed > problematic because there would need to be a lot of changes in order to > support POStream in all of the places we want to be able to use UDFs (for > example, to operate on a single field inside of a foreach statement).
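The controller loop described in the issue (read the input stream, deserialize, call the user-written function, serialize the result, write it to the output stream) can be sketched roughly as follows. This is only an illustration, not the controller.py from the patch: the tab delimiter, the one-record-per-line framing, and the names run_controller, deserialize, serialize and concat_upper are all assumptions made up for the example.

```python
import io

# Assumed wire format: one record per line, fields separated by tabs.
# The real delimiters used by the patch are not shown in this thread.
FIELD_DELIM = "\t"


def deserialize(line):
    """Turn one input line into a list of string fields."""
    return line.rstrip("\n").split(FIELD_DELIM)


def serialize(value):
    """Turn one UDF result into a single output line."""
    return str(value) + "\n"


def run_controller(udf, instream, outstream):
    """Read records, apply the user function to each, write the results."""
    for line in instream:
        args = deserialize(line)
        result = udf(*args)
        outstream.write(serialize(result))
    outstream.flush()


def concat_upper(a, b):
    """Hypothetical user-written UDF: concatenate two fields, upper-cased."""
    return (a + b).upper()


# In-memory streams stand in for the pipes to and from the Pig process.
demo_in = io.StringIO("ab\tcd\nx\ty\n")
demo_out = io.StringIO()
run_controller(concat_upper, demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

On the cluster the two streams would presumably be the process's stdin and stdout rather than in-memory buffers; keeping them as parameters makes the loop easy to exercise without Pig.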
Review Request 13781: Changes to add support for streaming_python udfs.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13781/ --- Review request for pig. Repository: pig-git
Description --- Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
Diffs -
  build.xml b20eb3d
  src/org/apache/pig/PigToStream.java 7cc2950
  src/org/apache/pig/StreamToPig.java ff24b27
  src/org/apache/pig/builtin/PigStreaming.java 5467693
  src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
  src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
  src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
  src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
  src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
  src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
  src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
  src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
  src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
  src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
  src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
  src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
  src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
  src/org/apache/pig/impl/util/JarManager.java 5c4acb0
  src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
  src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
  src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
  src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
  src/python/streaming/controller.py PRE-CREATION
  src/python/streaming/pig_util.py PRE-CREATION
  test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
  test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
  test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
  test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
  test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
  test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
  test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
  test/org/apache/pig/test/TestStreaming.java 1eac5d2
  test/python/streaming/test_controller.py PRE-CREATION
  test/unit-tests d52ad9d
Diff: https://reviews.apache.org/r/13781/diff/
Testing ---
Thanks, Jeremy Karn
[jira] [Updated] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Karn updated PIG-2417: - Attachment: PIG-2417-5.patch Here's an updated patch that I think should be ready for review (review board coming soon). Aside from the streaming Python UDFs, this patch also contains some logic for capturing output from the Python process that doesn't do much. However, I'm hoping to get a patch up soon with Mortar's illustrate changes, and that will take advantage of the captured output. One thing that's still outstanding is documentation changes. Should I just add a section similar to http://pig.apache.org/docs/r0.11.1/udf.html#python-udfs for streaming Python? > Streaming UDFs - allow users to easily write UDFs in scripting languages > with no JVM implementation. > - > > Key: PIG-2417 > URL: https://issues.apache.org/jira/browse/PIG-2417 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.11 >Reporter: Jeremy Karn >Assignee: Jeremy Karn > Attachments: PIG-2417-4.patch, PIG-2417-5.patch, streaming2.patch, > streaming3.patch, streaming.patch > > > The goal of Streaming UDFs is to allow users to easily write UDFs in > scripting languages with no JVM implementation or a limited JVM > implementation. The initial proposal is outlined here: > https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs. > In order to implement this we need new syntax to distinguish a streaming UDF > from an embedded JVM UDF. I'd propose something like the following (although > I'm not sure 'language' is the best term to be using): > {code}define my_streaming_udfs language('python') > ship('my_streaming_udfs.py'){code} > We'll also need a language-specific controller script that gets shipped to > the cluster which is responsible for reading the input stream, deserializing > the input data, passing it to the user written script, serializing that > script output, and writing that to the output stream. 
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This > class will likely share some of the existing code in POStream and > ExecutableManager (where it makes sense to pull out shared code) to stream > data to/from the controller script. > One alternative approach to creating the StreamingUDF EvalFunc is to use the > POStream operator directly. This would involve inserting the POStream > operator instead of the POUserFunc operator whenever we encountered a > streaming UDF while building the physical plan. This approach seemed > problematic because there would need to be a lot of changes in order to > support POStream in all of the places we want to be able to use UDFs (for > example, to operate on a single field inside of a foreach statement).
Re: Upgrading antlr to 3.5
Hi Daniel, The reasons are more internal. Our app is having an issue with 3.4, and it's easier for us to move forward to 3.5: http://antlr.markmail.org/search/?q=%22void+%3D+null%3B%22#query:%22void%20%3D%20null%3B%22%20order%3Adate-backward+page:1+mid:7g3th2bg3onyoqhv+state:results Is it difficult to upgrade the version across the board (Hive + Pig)? On Fri, Aug 23, 2013 at 2:02 PM, Daniel Dai wrote: > Any reason why you want to upgrade to 3.5? We'd like Hive/Pig to use the same > version of antlr, which eases the integration work of Hive/Pig/HCat. > > Thanks, > Daniel > > > On Fri, Aug 23, 2013 at 5:50 PM, Prashant Kommireddi >wrote: > > > Hey guys, > > > > Anyone aware of any issues with upgrading antlr to v3.5 for Pig? I am > > planning to try it out, and wanted to make sure it's not already been > > tried. > > > > Thanks, > > Prashant
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748907#comment-13748907 ] Rohini Palaniswamy commented on PIG-3419: - In the Pig-on-Tez meeting at LinkedIn we decided to do the Tez work on a branch, and that Cheolsoo will initiate a conversation thread on the mailing list for it and take up the task of creating the branch. Tez is relatively new and unstable, so it will be wise not to start with code directly on trunk. Hive is also doing their Tez work on a branch. Cheolsoo had a question as to whether we should commit this to trunk and branch after that. I would prefer PIG-3419 to also be put in the branch and not checked into trunk. It makes a lot of changes to the exceptions thrown, removes public methods, etc., and that might cause backward incompatibility at runtime with code compiled against previous versions of Pig. All that needs to be figured out and fixed, so it might not be a good idea to get this patch directly into trunk. Thoughts? > Pluggable Execution Engine > --- > > Key: PIG-3419 > URL: https://issues.apache.org/jira/browse/PIG-3419 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.12 >Reporter: Achal Soni >Assignee: Achal Soni >Priority: Minor > Attachments: execengine.patch, mapreduce_execengine.patch, > stats_scriptstate.patch, test_failures.txt, test_suite.patch, > updated-8-22-2013-exec-engine.patch > > > In an effort to adapt Pig to work using Apache Tez > (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for > a cleaner ExecutionEngine abstraction than existed before. The changes are > not that major as Pig was already relatively abstracted out between the > frontend and backend. The changes in the attached commit are essentially the > barebones changes -- I tried to not change the structure of Pig's different > components too much. 
I think it will be interesting to see in the future how > we can refactor more areas of Pig to really honor this abstraction between > the frontend and backend. > Some of the changes were to reinstate an ExecutionEngine interface to tie > together the front end and backend, and making the changes in Pig to delegate > to the EE when necessary, and creating an MRExecutionEngine that implements > this interface. Other work included changing ExecType to cycle through the > ExecutionEngines on the classpath and select the appropriate one (this is > done using Java ServiceLoader, exactly how MapReduce does for choosing the > framework to use between local and distributed mode). Also I tried to make > ScriptState, JobStats, and PigStats as abstract as possible in its current > state. I think in the future some work will need to be done here to perhaps > re-evaluate the usage of ScriptState and the responsibilities of the > different statistics classes. I haven't touched the PPNL, but I think more > abstraction is needed here, perhaps in a separate patch.
[jira] [Commented] (PIG-3437) Error while running e2e test: "Can't open ./resource/hadoop23.res, No such file or directory " coming from test_harness.pl line #179
[ https://issues.apache.org/jira/browse/PIG-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748906#comment-13748906 ] Annie Lin commented on PIG-3437: Running the e2e tests from a Pig trunk workspace against hadoop23 on CI fails because hadoop23.res is not found. Can someone point me to where I can get this file? In test_harness.pl:
my $harnessRes = "";
if (defined($ENV{'HARNESS_RESOURCE'})) {
    $harnessRes = $ENV{'HARNESS_RESOURCE'};
} elsif ($^O =~ /mswin/i) {
    $harnessRes = "$ROOT/resource/windows.res";
} elsif ($globalCfg->{'hadoopversion'} == '23') {
    $harnessRes = "$ROOT/resource/hadoop23.res";  # <=
} else {
    $harnessRes = "$ROOT/resource/default.res";
}
Below is the error in the console log from Jenkins:
[exec] FATAL ERROR ./test_harness.pl at 179: Can't open ./resource/hadoop23.res, No such file or directory
Thanks, Annie > Error while running e2e test: "Can't open ./resource/hadoop23.res, No such > file or directory " coming from test_harness.pl line #179 > - > > Key: PIG-3437 > URL: https://issues.apache.org/jira/browse/PIG-3437 > Project: Pig > Issue Type: Bug > Components: e2e harness >Reporter: Annie Lin >
[jira] [Created] (PIG-3437) Error while running e2e test: "Can't open ./resource/hadoop23.res, No such file or directory " coming from test_harness.pl line #179
Annie Lin created PIG-3437: -- Summary: Error while running e2e test: "Can't open ./resource/hadoop23.res, No such file or directory " coming from test_harness.pl line #179 Key: PIG-3437 URL: https://issues.apache.org/jira/browse/PIG-3437 Project: Pig Issue Type: Bug Components: e2e harness Reporter: Annie Lin
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748873#comment-13748873 ] Achal Soni commented on PIG-3419: - [~cheolsoo] Thanks a lot for running the test suite! It's good to see where the patch is failing. I definitely agree that all of these need to be investigated before the patch gets anywhere. I have some ideas about a few of the test cases; they look like minor issues with JobStats and the way Explain works now, which I have to look into. For the rest I can't think of a cause off the top of my head, but I'll give it a shot. I'll report back with more findings as soon as possible. > Pluggable Execution Engine > --- > > Key: PIG-3419 > URL: https://issues.apache.org/jira/browse/PIG-3419 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.12 >Reporter: Achal Soni >Assignee: Achal Soni >Priority: Minor > Attachments: execengine.patch, mapreduce_execengine.patch, > stats_scriptstate.patch, test_failures.txt, test_suite.patch, > updated-8-22-2013-exec-engine.patch > > > In an effort to adapt Pig to work using Apache Tez > (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for > a cleaner ExecutionEngine abstraction than existed before. The changes are > not that major as Pig was already relatively abstracted out between the > frontend and backend. The changes in the attached commit are essentially the > barebones changes -- I tried to not change the structure of Pig's different > components too much. I think it will be interesting to see in the future how > we can refactor more areas of Pig to really honor this abstraction between > the frontend and backend. > Some of the changes were to reinstate an ExecutionEngine interface to tie > together the front end and backend, and making the changes in Pig to delegate > to the EE when necessary, and creating an MRExecutionEngine that implements > this interface. 
Other work included changing ExecType to cycle through the > ExecutionEngines on the classpath and select the appropriate one (this is > done using Java ServiceLoader, exactly how MapReduce does for choosing the > framework to use between local and distributed mode). Also I tried to make > ScriptState, JobStats, and PigStats as abstract as possible in its current > state. I think in the future some work will need to be done here to perhaps > re-evaluate the usage of ScriptState and the responsibilities of the > different statistics classes. I haven't touched the PPNL, but I think more > abstraction is needed here, perhaps in a separate patch.
Re: Upgrading antlr to 3.5
Any reason why you want to upgrade to 3.5? We'd like Hive/Pig to use the same version of antlr, which eases the integration work of Hive/Pig/HCat. Thanks, Daniel On Fri, Aug 23, 2013 at 5:50 PM, Prashant Kommireddi wrote: > Hey guys, > > Anyone aware of any issues with upgrading antlr to v3.5 for Pig? I am > planning to try it out, and wanted to make sure it's not already been > tried. > > Thanks, > Prashant >
[jira] [Updated] (PIG-3436) Make pigmix run with Hadoop2
[ https://issues.apache.org/jira/browse/PIG-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-3436: Status: Patch Available (was: Open) > Make pigmix run with Hadoop2 > > > Key: PIG-3436 > URL: https://issues.apache.org/jira/browse/PIG-3436 > Project: Pig > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.12 > > Attachments: PIG-3436-1.patch > >
[jira] [Updated] (PIG-3435) Custom Partitioner not working with MultiQueryOptimizer
[ https://issues.apache.org/jira/browse/PIG-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-3435: -- Attachment: pig-3435-v02_skipcustompatitioner_for_merge.patch While looking at the testcase, I found PIG-2627, which fixed one of the issues with custom-partitioner and multiquery optimization (but not all). The specific case mentioned on that ticket is handled there and it works, but my patch here simply skips multiquery optimization for ALL custom partitioner jobs. Since it's sort of a correctness issue, I want this fix to be back-ported to 0.11, and for that I kept the change simple. Can we create a separate jira for reviving custom-partitioner + multiquery optimization for later releases? > Custom Partitioner not working with MultiQueryOptimizer > --- > > Key: PIG-3435 > URL: https://issues.apache.org/jira/browse/PIG-3435 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-3435-v01.patch, > pig-3435-v02_skipcustompatitioner_for_merge.patch > > > When looking at PIG-3385, noticed some issues in handling of custom > partitioner with multi-query optimization. > {noformat} > C1 = group B1 by col1 PARTITION BY >org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2; > C2 = group B2 by col1 PARTITION BY >org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2; > {noformat} > This seems to be merged to one mapreduce job correctly but custom partitioner > information was lost. > {noformat} > C1 = group B1 by col1 PARTITION BY > org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2; > C2 = group B2 by col1 parallel 2; > {noformat} > This seems to be merged even though they should run on two different > partitioners.
Upgrading antlr to 3.5
Hey guys, Anyone aware of any issues with upgrading antlr to v3.5 for Pig? I am planning to try it out, and wanted to make sure it's not already been tried. Thanks, Prashant
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748657#comment-13748657 ] Cheolsoo Park commented on PIG-3419: All, so here is the list of failing tests:
{code}
org.apache.pig.test.TestGrunt.testScriptMissingLastNewLine
org.apache.pig.test.TestGrunt.testCheckScriptSyntaxWithSemiColonUDFErr
org.apache.pig.test.TestGrunt.testExplainDot
org.apache.pig.test.TestGrunt.testExplainOut
org.apache.pig.test.TestGrunt.testExplainBrief
org.apache.pig.test.TestGrunt.testExplainEmpty
org.apache.pig.test.TestGrunt.testExplainScript
org.apache.pig.test.TestInputOutputMiniClusterFileValidator.testValidationNeg
org.apache.pig.test.TestJobStats.testOneTaskReport
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage1
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage2
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage3
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage4
org.apache.pig.test.TestJobStats.testMedianMapReduceTime
org.apache.pig.test.TestJobStats.testGetOuputSizeUsingFileBasedStorage
org.apache.pig.test.TestMRExecutionEngine.testJobConfGeneration
org.apache.pig.test.TestMRExecutionEngine.testJobConfGenerationWithUserConfigs
org.apache.pig.test.TestMacroExpansion.test20
org.apache.pig.test.TestMacroExpansion.test21
org.apache.pig.test.TestMacroExpansion.test22
org.apache.pig.test.TestMacroExpansion.test23
org.apache.pig.test.TestMacroExpansion.test32
org.apache.pig.test.TestMacroExpansion.test33
org.apache.pig.test.TestMacroExpansion.test34
org.apache.pig.test.TestMacroExpansion.test35
org.apache.pig.test.TestMacroExpansion.testCommentInMacro
org.apache.pig.test.TestMacroExpansion.testNegativeNumber
org.apache.pig.test.TestMacroExpansion.typecastTest
org.apache.pig.test.TestMacroExpansion.testFilter
org.apache.pig.test.TestMapSideCogroup.testFailure2
org.apache.pig.test.TestMergeJoinOuter.testFailure
org.apache.pig.test.TestPigRunner.testEmptyFile
org.apache.pig.test.TestScriptLanguage.testSysArguments
org.apache.pig.test.TestShortcuts.testExplainShortcutNoAlias
org.apache.pig.test.TestShortcuts.testExplainShortcutNoAliasDefined
{code}
I prefer fixing them before the commit rather than afterward. Although none of these failures is serious (I believe), can we have a couple more days before committing Achal's patch? I will make sure it gets committed into trunk because I definitely need it for a Tez branch. Thoughts? > Pluggable Execution Engine > --- > > Key: PIG-3419 > URL: https://issues.apache.org/jira/browse/PIG-3419 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.12 >Reporter: Achal Soni >Assignee: Achal Soni >Priority: Minor > Attachments: execengine.patch, mapreduce_execengine.patch, > stats_scriptstate.patch, test_failures.txt, test_suite.patch, > updated-8-22-2013-exec-engine.patch > > > In an effort to adapt Pig to work using Apache Tez > (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for > a cleaner ExecutionEngine abstraction than existed before. The changes are > not that major as Pig was already relatively abstracted out between the > frontend and backend. The changes in the attached commit are essentially the > barebones changes -- I tried to not change the structure of Pig's different > components too much. I think it will be interesting to see in the future how > we can refactor more areas of Pig to really honor this abstraction between > the frontend and backend. > Some of the changes were to reinstate an ExecutionEngine interface to tie > together the front end and backend, and making the changes in Pig to delegate > to the EE when necessary, and creating an MRExecutionEngine that implements > this interface. 
Other work included changing ExecType to cycle through the > ExecutionEngines on the classpath and select the appropriate one (this is > done using Java ServiceLoader, exactly how MapReduce does for choosing the > framework to use between local and distributed mode). Also I tried to make > ScriptState, JobStats, and PigStats as abstract as possible in its current > state. I think in the future some work will need to be done here to perhaps > re-evaluate the usage of ScriptState and the responsibilities of the > different statistics classes. I haven't touched the PPNL, but I think more > abstraction is needed here, perhaps in a separate patch.