[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773742#comment-13773742 ]

Koji Noguchi commented on PIG-2672:
-----------------------------------

bq. Note: HADOOP-9639 has improved mechanism for this.

I haven't read the patch, but I thought HADOOP-9639 introduces a security hole unless the NodeManager does SHA-1 level checksumming.

> Optimize the use of DistributedCache
> ------------------------------------
>
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Aniket Mokashi
>             Fix For: 0.12
>
>         Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds
> them to DistributedCache for each job launched. This is inefficient in terms
> of
>    * Space - The jars are distributed to task trackers for every job, taking
> up a lot of local temporary space in tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
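The checksumming concern above can be illustrated with a small sketch. It is not taken from PIG-2672.patch or from HADOOP-9639; the class name, the method names, and the cache root path are all hypothetical. The idea it shows is content addressing: if the cache path of a jar is derived from the SHA-1 of its bytes, identical jars map to the same location (so they need to be copied only once), and a consumer can recompute the digest to detect a tampered file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive a shared cache path from a jar's SHA-1 digest.
public class JarCachePath {

    /** Returns the hex-encoded SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Builds cacheRoot/<sha1>/<jarName>; identical content yields an identical path. */
    static String cachePath(String cacheRoot, String jarName, byte[] content) {
        return cacheRoot + "/" + sha1Hex(content) + "/" + jarName;
    }

    /** Re-checks a cached copy against the digest embedded in its path. */
    static boolean matches(String expectedSha1, byte[] cachedContent) {
        return expectedSha1.equals(sha1Hex(cachedContent));
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would read the jar file instead.
        byte[] jar = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(cachePath("/tmp/pig-jar-cache", "udfs.jar", jar));
    }
}
```

Whether the check runs on the client, on the node that localizes the file, or on both is exactly the question raised in the comment; the sketch only shows the digest and path derivation.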
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773665#comment-13773665 ]

Aniket Mokashi commented on PIG-2672:
-------------------------------------

Oh, actually I just noticed, the config names are pig.shared.cluster.cache.location and pig.shared.user.cache.location.
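As a rough illustration of why a common "pig." prefix is convenient, the sketch below sets the two property names mentioned in the comment and then selects every "pig."-prefixed entry from a mixed set of job properties. Only the two property names come from the thread; the paths assigned to them, the extra Hadoop property, and the helper class are assumptions made for the example, and the sketch says nothing about how Pig itself interprets the values.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: filter job properties by the "pig." prefix.
public class PigCacheProps {

    /** Returns all properties whose name starts with the prefix, sorted by name. */
    static Map<String, String> withPrefix(Properties props, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                out.put(name, props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Property names from the thread; the values are made-up example paths.
        props.setProperty("pig.shared.cluster.cache.location", "/shared/pig/jars");
        props.setProperty("pig.shared.user.cache.location", "/user/alice/.pig/jars");
        // An unrelated, non-Pig property to show the filtering effect.
        props.setProperty("mapreduce.job.name", "example");

        withPrefix(props, "pig.").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```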
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773663#comment-13773663 ]

Aniket Mokashi commented on PIG-2672:
-------------------------------------

Thanks Dmitriy! I will make those changes.
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773662#comment-13773662 ]

Aniket Mokashi commented on PIG-2672:
-------------------------------------

RB: https://reviews.apache.org/r/14274/
Review Request 14274: PIG-2672 Optimize the use of DistributedCache
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14274/
-----------------------------------------------------------

Review request for pig, Cheolsoo Park, DanielWX DanielWX, Dmitriy Ryaboy, Julien Le Dem, and Rohini Palaniswamy.

Bugs: PIG-2672
    https://issues.apache.org/jira/browse/PIG-2672

Repository: pig

Description
-------

added jar.cache.location option

Diffs
-----

  trunk/src/org/apache/pig/PigConstants.java 1525188
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1525188
  trunk/src/org/apache/pig/impl/PigContext.java 1525188
  trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1525188
  trunk/test/org/apache/pig/test/TestJobControlCompiler.java 1525188

Diff: https://reviews.apache.org/r/14274/diff/

Testing
-------

Thanks,

Aniket Mokashi
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (14 issues)

Subscriber: pigdaily

Key         Summary
PIG-3470    Print configuration variables in grunt
            https://issues.apache.org/jira/browse/PIG-3470
PIG-3461    Rewrite PartitionFilterOptimizer to make it work for all the cases
            https://issues.apache.org/jira/browse/PIG-3461
PIG-3451    EvalFunc ctor reflection to determine value of type param T is brittle
            https://issues.apache.org/jira/browse/PIG-3451
PIG-3449    Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
            https://issues.apache.org/jira/browse/PIG-3449
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3434    Null subexpression in bincond nullifies outer tuple (or bag)
            https://issues.apache.org/jira/browse/PIG-3434
PIG-3388    No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
            https://issues.apache.org/jira/browse/PIG-3388
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021
PIG-2672    Optimize the use of DistributedCache
            https://issues.apache.org/jira/browse/PIG-2672
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773636#comment-13773636 ]

Dmitriy V. Ryaboy commented on PIG-2672:
----------------------------------------

Aniket, can we prefix the properties with "pig."? That way we won't conflict with potential properties from Hadoop, and it's a little easier to analyze stuff when looking at the jobconf.
[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773632#comment-13773632 ]

Jeremy Karn commented on PIG-2417:
----------------------------------

I agree. If we get this committed and open a new jira for the hadoop2 problems, that'll give me a bit of time to set up a hadoop2 cluster and work out any kinks.

> Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2417
>                 URL: https://issues.apache.org/jira/browse/PIG-2417
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.12
>            Reporter: Jeremy Karn
>             Fix For: 0.12
>
>         Attachments: PIG-2417-4.patch, PIG-2417-5.patch, PIG-2417-6.patch,
> PIG-2417-7.patch, PIG-2417-8.patch, PIG-2417-9-1.patch, PIG-2417-9-2.patch,
> PIG-2417-9.patch, PIG-2417-e2e.patch, streaming2.patch, streaming3.patch,
> streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in
> scripting languages with no JVM implementation or a limited JVM
> implementation. The initial proposal is outlined here:
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF
> from an embedded JVM UDF. I'd propose something like the following (although
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python')
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to
> the cluster which is responsible for reading the input stream, deserializing
> the input data, passing it to the user written script, serializing that
> script output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This
> class will likely share some of the existing code in POStream and
> ExecutableManager (where it makes sense to pull out shared code) to stream
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the
> POStream operator directly. This would involve inserting the POStream
> operator instead of the POUserFunc operator whenever we encountered a
> streaming UDF while building the physical plan. This approach seemed
> problematic because there would need to be a lot of changes in order to
> support POStream in all of the places we want to be able to use UDFs (for
> example, to operate on a single field inside of a foreach statement).
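The controller responsibilities listed in the issue description (read the input stream, deserialize, call the user's function, serialize, write the output stream) can be sketched as a simple loop. This is only an illustration of the data flow: the real controller would be written in the scripting language being targeted, and the tab-delimited wire format, the class name, and the method signature below are assumptions rather than anything taken from the PIG-2417 patches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Function;

// Hypothetical sketch of a streaming-UDF controller loop.
public class StreamingControllerSketch {

    /**
     * Reads one record per line, splits it into fields (assumed tab-delimited),
     * applies the user-supplied function, and writes the result back out.
     */
    static void run(BufferedReader in, Writer out, Function<String[], String[]> udf) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\t", -1);  // deserialize; keep empty trailing fields
                String[] result = udf.apply(fields);      // hand the record to the user code
                out.write(String.join("\t", result));     // serialize
                out.write("\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory streams stand in for the process's stdin/stdout.
        BufferedReader in = new BufferedReader(new StringReader("a\tb\nc\td\n"));
        StringWriter out = new StringWriter();
        run(in, out, fields -> {
            String[] upper = new String[fields.length];
            for (int i = 0; i < fields.length; i++) {
                upper[i] = fields[i].toUpperCase();
            }
            return upper;
        });
        System.out.print(out);
    }
}
```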
[jira] [Updated] (PIG-3473) org.apache.pig.Expression should support "is null" and "not" operations
[ https://issues.apache.org/jira/browse/PIG-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3473:
--------------------------------

    Description:
Currently Expression only supports BinaryExpressions and Constants. Most of the other logical expressions (cast, udf) need not be pushed down. But it would make sense to be able to push down "is null" and "not" operations (possibly negativeexpression).
This change would have an impact on LoadFuncs (hcatloader), so we need to be careful and make sure we do this in a backwards compatible way.

> org.apache.pig.Expression should support "is null" and "not" operations
> -----------------------------------------------------------------------
>
>                 Key: PIG-3473
>                 URL: https://issues.apache.org/jira/browse/PIG-3473
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Aniket Mokashi
>             Fix For: 0.12.1
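To make the proposal concrete, here is a toy expression tree with binary and constant nodes plus two unary nodes for "is null" and "not". None of these classes are the real org.apache.pig.Expression types; every name is hypothetical. The binaryOnly check hints at one possible way to stay backwards compatible: a filter that contains the new node kinds would simply not be offered to a loader that only understands the older ones.

```java
// Hypothetical sketch: a filter expression tree extended with unary nodes.
public class PushdownSketch {

    interface Expr { String render(); }

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
        public String render() { return name; }
    }

    static final class Const implements Expr {
        final Object value;
        Const(Object value) { this.value = value; }
        public String render() { return String.valueOf(value); }
    }

    static final class Binary implements Expr {
        final String op; final Expr lhs; final Expr rhs;
        Binary(String op, Expr lhs, Expr rhs) { this.op = op; this.lhs = lhs; this.rhs = rhs; }
        public String render() { return "(" + lhs.render() + " " + op + " " + rhs.render() + ")"; }
    }

    // The two node kinds the issue proposes to add.
    static final class IsNull implements Expr {
        final Expr arg;
        IsNull(Expr arg) { this.arg = arg; }
        public String render() { return "(" + arg.render() + " IS NULL)"; }
    }

    static final class Not implements Expr {
        final Expr arg;
        Not(Expr arg) { this.arg = arg; }
        public String render() { return "(NOT " + arg.render() + ")"; }
    }

    /** True if the tree uses only the node kinds an older loader already handles. */
    static boolean binaryOnly(Expr e) {
        if (e instanceof Binary) {
            Binary b = (Binary) e;
            return binaryOnly(b.lhs) && binaryOnly(b.rhs);
        }
        return e instanceof Column || e instanceof Const;
    }

    public static void main(String[] args) {
        Expr filter = new Binary("and",
                new Binary("==", new Column("year"), new Const(2013)),
                new Not(new IsNull(new Column("dt"))));
        System.out.println(filter.render());
        System.out.println("safe for legacy loader: " + binaryOnly(filter));
    }
}
```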
[jira] [Created] (PIG-3473) org.apache.pig.Expression should support "is null" and "not" operations
Aniket Mokashi created PIG-3473:
-----------------------------------

             Summary: org.apache.pig.Expression should support "is null" and "not" operations
                 Key: PIG-3473
                 URL: https://issues.apache.org/jira/browse/PIG-3473
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.11.1
            Reporter: Aniket Mokashi
             Fix For: 0.12.1
Re: Are we ready for Pig 0.12.0 release?
wooo :D

On 2013-09-20, at 19:32, Daniel Dai wrote:

> PIG-3454 is already committed several days back.
>
> On Fri, Sep 20, 2013 at 4:29 PM, Erik Selin wrote:
>
>> Could we get PIG-3454 as well?
>>
>> Thanks :)
>>
>> Erik
>>
>> On 2013-09-20, at 18:46, Julien Le Dem wrote:
>>
>>> I'd like to get PIG-3445 in too.
>>> Julien
>>>
>>> On Sep 20, 2013, at 3:03 PM, Daniel Dai wrote:
>>>
>>>> With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
>>>> will probably commit PIG-3471. After that I will branch 0.12, hopefully
>>>> over the weekend. Anything I miss?
>>>>
>>>> Thanks,
>>>> Daniel
>>>>
>>>> On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney wrote:
>>>>
>>>>> +1, I need PIG-2417 too.
>>>>>
>>>>> On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn wrote:
>>>>>
>>>>>> I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I
>>>>>> would like to get into 0.12 because we've had a number of people ask us
>>>>>> about getting it committed back to Apache. However, if it looks like too
>>>>>> much to review and get committed in the next week or two it could
>>>>>> probably be pushed off.
>>>>>>
>>>>>> I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
>>>>>> I'm going to double check the submitted patches today because I think
>>>>>> https://issues.apache.org/jira/browse/PIG-3419 might have broken the
>>>>>> currently submitted patches.
>>>>>>
>>>>>> On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi
>>>>>> <prash1...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 for a 0.12 release.
>>>>>>>
>>>>>>> I have one outstanding JIRA
>>>>>>> https://issues.apache.org/jira/browse/PIG-3199.
>>>>>>> Cheolsoo was fine with the patch (except for a typo which I will
>>>>>>> correct) but wanted a second opinion. Can someone please take a look?
>>>>>>>
>>>>>>> On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho
>>>>>>> <jar...@apache.org> wrote:
>>>>>>>
>>>>>>>> I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this
>>>>>>>> week, to see if it can be included.
>>>>>>>>
>>>>>>>> Jarcec
>>>>>>>>
>>>>>>>> On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
>>>>>>>>> +1. I will go through my jiras this week.
>>>>>>>>>
>>>>>>>>> On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai wrote:
>>>>>>>>>
>>>>>>>>>> Hi, All,
>>>>>>>>>> It has been more than half a year since the initial Pig 0.11
>>>>>>>>>> release. I'd like to roll a Pig 0.12 release around the end of
>>>>>>>>>> September or the beginning of October. Let me know if it is
>>>>>>>>>> possible.
>>>>>>>>>>
>>>>>>>>>> Proposed schedule:
>>>>>>>>>> 1. Commit all major features (1-2 weeks)
>>>>>>>>>> 2. Branching Pig 0.12
>>>>>>>>>> 3. Commit remaining patches (1-2 weeks)
>>>>>>>>>> 4. Wrapping up, document (1 week)
>>>>>>>>>>
>>>>>>>>>> If you have patches you want to get in, please make sure the Jira
>>>>>>>>>> ticket has fix version set to 0.12. If the patches were originally
>>>>>>>>>> set to 0.12 and you think you can delay, please mark the fix
>>>>>>>>>> version as either 0.13.0 or 0.12.1.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Daniel
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>>> confidential, privileged and exempt from disclosure under
>>>>>>>>>> applicable law. If the reader of this message is not the intended
>>>>>>>>>> recipient, you are hereby notified that any printing, copying,
>>>>>>>>>> dissemination, distribution, disclosure or forwarding of this
>>>>>>>>>> communication is strictly prohibited. If you have received this
>>>>>>>>>> communication in error, please contact the sender immediately and
>>>>>>>>>> delete it from your system. Thank You.
>>>>>>
>>>>>> --
>>>>>> Jeremy Karn / Lead Developer
>>>>>> MORTAR DATA / 519 277 4391 / www.mortardata.com
>>>>>
>>>>> --
>>>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>>>>> datasyndrome.com
Re: Are we ready for Pig 0.12.0 release?
PIG-3454 is already committed several days back.

On Fri, Sep 20, 2013 at 4:29 PM, Erik Selin wrote:

> Could we get PIG-3454 as well?
>
> Thanks :)
>
> Erik
[jira] [Updated] (PIG-3448) Tez backend layout
[ https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3448:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to tez branch.

> Tez backend layout
> ------------------
>
>                 Key: PIG-3448
>                 URL: https://issues.apache.org/jira/browse/PIG-3448
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>         Attachments: PIG-3448-1.patch, PIG-3448-2.patch, PIG-3448-3.patch
>
>
> Design the high-level layout of Tez backend.
Re: Are we ready for Pig 0.12.0 release?
Could we get PIG-3454 as well?

Thanks :)

Erik

On 2013-09-20, at 18:46, Julien Le Dem wrote:

> I'd like to get PIG-3445 in too.
> Julien
[jira] [Commented] (PIG-3448) Tez backend layout
[ https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773571#comment-13773571 ]

Cheolsoo Park commented on PIG-3448:
------------------------------------

[~aniket486], thank you for taking a look.

Regarding your comment on the JVM heap size for unit tests: I found that some unit test cases (e.g. TestPigServer) fail with OOM when running with Hadoop-2.1.0-beta, so I increased the heap size to keep my jenkins build happy for now.
[jira] [Commented] (PIG-3446) Umbrella jira for Pig on Tez
[ https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773569#comment-13773569 ]

Julien Le Dem commented on PIG-3446:
------------------------------------

Here is the work that Achal did for Pig-on-Tez:
https://github.com/achalsoni81/pigeon

> Umbrella jira for Pig on Tez
> ----------------------------
>
>                 Key: PIG-3446
>                 URL: https://issues.apache.org/jira/browse/PIG-3446
>             Project: Pig
>          Issue Type: New Feature
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>
> This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.
> More information can be found on the following wiki page:
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez
[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3471:
-------------------------------

    Affects Version/s: 0.12
        Fix Version/s: 0.12

> Add a base abstract class for ExecutionEngine
> ---------------------------------------------
>
>                 Key: PIG-3471
>                 URL: https://issues.apache.org/jira/browse/PIG-3471
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: 0.12, tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12, tez-branch
>
>         Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
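The refactoring described in the issue follows a familiar pattern: state and behaviour common to both backends move into an abstract base class, and each engine supplies only what differs. The sketch below borrows the three class names from the description, but its fields and methods are invented for illustration and are not the contents of PIG-3471-1.patch; plain java.util.Properties stands in for the Hadoop configuration.

```java
import java.util.Properties;

// Hypothetical sketch of the base-class refactoring; members are illustrative only.
public class EngineSketch {

    /** Shared Hadoop-side state and helpers live in the abstract base. */
    abstract static class HExecutionEngine {
        protected final Properties conf;

        HExecutionEngine(Properties conf) { this.conf = conf; }

        /** Common to both backends: resolve the default file system. */
        String defaultFs() { return conf.getProperty("fs.defaultFS", "file:///"); }

        /** The only backend-specific piece in this sketch. */
        abstract String name();

        String describe() { return name() + " on " + defaultFs(); }
    }

    static final class MRExecutionEngine extends HExecutionEngine {
        MRExecutionEngine(Properties conf) { super(conf); }
        @Override String name() { return "mapreduce"; }
    }

    static final class TezExecutionEngine extends HExecutionEngine {
        TezExecutionEngine(Properties conf) { super(conf); }
        @Override String name() { return "tez"; }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("fs.defaultFS", "hdfs://namenode:8020"); // made-up address
        System.out.println(new MRExecutionEngine(conf).describe());
        System.out.println(new TezExecutionEngine(conf).describe());
    }
}
```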
[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3471:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk.
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773560#comment-13773560 ]

Rohini Palaniswamy commented on PIG-2672:
-----------------------------------------

I can take a look at this one. Can you put this up in review board please?
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773513#comment-13773513 ]

Aniket Mokashi commented on PIG-2672:
-------------------------------------

Note: HADOOP-9639 has improved mechanism for this. However, this is still somewhat useful for users that are on old versions of hadoop.

> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
> Pig currently copies jar files to a temporary location in hdfs and then adds
> them to DistributedCache for each job launched. This is inefficient in terms of
>    * Space - The jars are distributed to task trackers for every job taking
>      up lot of local temporary space in tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.
[jira] [Assigned] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi reassigned PIG-2672:
----------------------------------
Assignee: Aniket Mokashi (was: Rohini Palaniswamy)

> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
> Pig currently copies jar files to a temporary location in hdfs and then adds
> them to DistributedCache for each job launched. This is inefficient in terms of
>    * Space - The jars are distributed to task trackers for every job taking
>      up lot of local temporary space in tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.
Re: Are we ready for Pig 0.12.0 release?
I'd like to get PIG-3445 in too.
Julien

On Sep 20, 2013, at 3:03 PM, Daniel Dai wrote:

> With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
> will probably commit PIG-3471. After that I will branch 0.12, hopefully
> over the weekend. Anything I miss?
>
> Thanks,
> Daniel
>
> On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney wrote:
>
>> +1, I need PIG-2417 too.
>>
>> On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn wrote:
>>
>>> I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I would
>>> like to get into 0.12 because we've had a number of people ask us about
>>> getting it committed back to Apache. However, if it looks like too much to
>>> review and get committed in the next week or two it could probably be
>>> pushed off.
>>>
>>> I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
>>> I'm going to double check the submitted patches today because I think
>>> https://issues.apache.org/jira/browse/PIG-3419 might have broken the
>>> currently submitted patches.
>>>
>>> On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi wrote:
>>>
>>>> +1 for a 0.12 release.
>>>>
>>>> I have one outstanding JIRA https://issues.apache.org/jira/browse/PIG-3199 .
>>>> Cheolsoo was fine with the patch (except for a typo which I will correct)
>>>> but wanted a second opinion. Can someone please take a look?
>>>>
>>>> On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho wrote:
>>>>
>>>>> I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this week,
>>>>> to see if it can be included.
>>>>>
>>>>> Jarcec
>>>>>
>>>>> On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
>>>>>> +1. I will go through my jiras this week.
>>>>>>
>>>>>> On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai wrote:
>>>>>>
>>>>>>> Hi, All,
>>>>>>> It has been more than half a year since the initial Pig 0.11 release. I'd
>>>>>>> like to roll a Pig 0.12 release around the end of September or the
>>>>>>> beginning of October. Let me know if it is possible.
>>>>>>>
>>>>>>> Proposed schedule:
>>>>>>> 1. Commit all major features (1-2 weeks)
>>>>>>> 2. Branching Pig 0.12
>>>>>>> 3. Commit remaining patches (1-2 weeks)
>>>>>>> 4. Wrapping up, document (1 week)
>>>>>>>
>>>>>>> If you have patches you want to get in, please make sure the Jira ticket
>>>>>>> has fix version set to 0.12. If the patches were originally set to 0.12
>>>>>>> and you think you can delay, please mark the fix version to either
>>>>>>> 0.13.0 or 0.12.1.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Daniel
>>>>>>>
>>>>>>> --
>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>> NOTICE: This message is intended for the use of the individual or entity to
>>>>>>> which it is addressed and may contain information that is confidential,
>>>>>>> privileged and exempt from disclosure under applicable law. If the reader
>>>>>>> of this message is not the intended recipient, you are hereby notified that
>>>>>>> any printing, copying, dissemination, distribution, disclosure or
>>>>>>> forwarding of this communication is strictly prohibited. If you have
>>>>>>> received this communication in error, please contact the sender immediately
>>>>>>> and delete it from your system. Thank You.
>>>
>>> --
>>> Jeremy Karn / Lead Developer
>>> MORTAR DATA / 519 277 4391 / www.mortardata.com
>>
>> --
>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>> datasyndrome.com
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773507#comment-13773507 ]

Aniket Mokashi commented on PIG-2672:
-------------------------------------

I have attached a patch that adds 2 configuration parameters: cluster.cache.location and user.cache.location. Jars are copied to /a/b/c/checksum-jarname.jar where a, b, c are the first 3 characters of the checksum. When a new jar is registered, its checksum is calculated and we check whether a jar with the same name/checksum exists in the cache. If yes, the copy to hdfs is avoided. Permission to write to the cache is managed by HDFS permissions. Also, it's not possible to overwrite a jar using this mechanism. If a jar changes, its checksum will also change and it will be a new jar in the cache.

Removal of old jars is a manual step: admins/users can list jars under the cache location and remove the ones that are very old. Alternatively, you can delete all the jars in the cache or change the jar cache location, and the cache will be repopulated by running jobs.

If this approach looks reasonable, I can add a few more tests. Comments welcome!

> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
> Pig currently copies jar files to a temporary location in hdfs and then adds
> them to DistributedCache for each job launched. This is inefficient in terms of
>    * Space - The jars are distributed to task trackers for every job taking
>      up lot of local temporary space in tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.
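The cache layout described in the PIG-2672 comment above can be illustrated with a small Python sketch. This is not the code in PIG-2672.patch. It uses a local directory in place of HDFS and assumes SHA-1 as the checksum, which the comment does not specify. The names `cached_jar_path` and `register_jar` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def cached_jar_path(cache_root: Path, jar: Path) -> Path:
    """Return <cache_root>/a/b/c/<checksum>-<jarname>, where a, b, c are the
    first three characters of the jar's checksum."""
    # Assumption: SHA-1 hex digest; the comment does not name the algorithm.
    checksum = hashlib.sha1(jar.read_bytes()).hexdigest()
    a, b, c = checksum[:3]
    return cache_root / a / b / c / f"{checksum}-{jar.name}"


def register_jar(cache_root: Path, jar: Path) -> Tuple[Path, bool]:
    """Copy the jar into the cache unless a jar with the same name and
    checksum is already there. Returns (cached path, whether a copy was made)."""
    target = cached_jar_path(cache_root, jar)
    if target.exists():
        return target, False  # cache hit: the copy is skipped
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(jar, target)
    return target, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"       # stands in for the cache location
        jar = Path(tmp) / "udfs.jar"      # hypothetical jar being registered
        jar.write_bytes(b"first build")
        print(register_jar(cache, jar))   # first registration copies the jar
        print(register_jar(cache, jar))   # second registration reuses it
        jar.write_bytes(b"second build")
        print(register_jar(cache, jar))   # changed content gets a new entry
```

Because the checksum is part of the file name, a changed jar never overwrites an existing entry. It is stored under a new path, which is why old entries have to be cleaned up by hand as the comment notes.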
Re: Are we ready for Pig 0.12.0 release?
It would be nice if we can get these in too- pig-2672 and pig-3461.

On Fri, Sep 20, 2013 at 3:42 PM, Prashant Kommireddi wrote:

> Thanks Daniel.
Re: Are we ready for Pig 0.12.0 release?
Thanks Daniel.

On Fri, Sep 20, 2013 at 3:37 PM, Daniel Dai wrote:

> I just committed PIG-3199.
[jira] [Commented] (PIG-3448) Tez backend layout
[ https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773553#comment-13773553 ]

Aniket Mokashi commented on PIG-3448:
-------------------------------------

+1. LGTM. Thanks for doing this!

Minor: this is just the skeleton code to enable the Tez implementation and no tests are added, so why do we allocate more memory for unit tests now? Is that because of Tez/YARN?

> Tez backend layout
> ------------------
>
> Key: PIG-3448
> URL: https://issues.apache.org/jira/browse/PIG-3448
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3448-1.patch, PIG-3448-2.patch, PIG-3448-3.patch
>
> Design the high-level layout of Tez backend.
[jira] [Updated] (PIG-3199) Provide a method to retriever name of loader/storer in PigServer
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3199:
----------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Patch committed to trunk.

> Provide a method to retriever name of loader/storer in PigServer
> ----------------------------------------------------------------
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199_2.patch, PIG-3199.patch
>
> LogicalPlan could be exposed to user in order for one to make validations
> based on it. For eg, one could get Load/Store paths or other operators and be
> able to perform checks such as whether I/O paths are valid etc.
[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773529#comment-13773529 ]

Daniel Dai commented on PIG-3199:
---------------------------------

Looks fine to me. We are not exposing LogicalPlan, but only the loader/storer name in the new patch. I will commit the patch with Cheolsoo's suggested change shortly.

> Expose LogicalPlan via PigServer API
> ------------------------------------
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199_2.patch, PIG-3199.patch
>
> LogicalPlan could be exposed to user in order for one to make validations
> based on it. For eg, one could get Load/Store paths or other operators and be
> able to perform checks such as whether I/O paths are valid etc.
Re: Are we ready for Pig 0.12.0 release?
I just committed PIG-3199.

On Fri, Sep 20, 2013 at 3:06 PM, Prashant Kommireddi wrote:

> Can we get PIG-3199 in? It only exposes a few properties of LP (load/store
> paths and funcs) via a wrapper
[jira] [Updated] (PIG-3199) Provide a method to retriever name of loader/storer in PigServer
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3199:
----------------------------
Summary: Provide a method to retriever name of loader/storer in PigServer (was: Expose LogicalPlan via PigServer API)

> Provide a method to retriever name of loader/storer in PigServer
> ----------------------------------------------------------------
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199_2.patch, PIG-3199.patch
>
> LogicalPlan could be exposed to user in order for one to make validations
> based on it. For eg, one could get Load/Store paths or other operators and be
> able to perform checks such as whether I/O paths are valid etc.
[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-2672:
--------------------------------
Attachment: PIG-2672.patch

> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
> Pig currently copies jar files to a temporary location in hdfs and then adds
> them to DistributedCache for each job launched. This is inefficient in terms of
>    * Space - The jars are distributed to task trackers for every job taking
>      up lot of local temporary space in tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.
[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-2672:
--------------------------------
Status: Patch Available (was: Open)

> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
> Pig currently copies jar files to a temporary location in hdfs and then adds
> them to DistributedCache for each job launched. This is inefficient in terms of
>    * Space - The jars are distributed to task trackers for every job taking
>      up lot of local temporary space in tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.
Re: Are we ready for Pig 0.12.0 release?
Can we get PIG-3199 in? It only exposes a few properties of LP (load/store paths and funcs) via a wrapper.

On Fri, Sep 20, 2013 at 3:03 PM, Daniel Dai wrote:
> With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
> will probably commit PIG-3471. After that I will branch 0.12, hopefully
> over the weekend. Anything I miss?
>
> Thanks,
> Daniel
>
> On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney wrote:
> > +1, I need PIG-2417 too.
> >
> > On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn wrote:
> > > I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I
> > > would like to get into 0.12 because we've had a number of people ask us
> > > about getting it committed back to Apache. However, if it looks like too
> > > much to review and get committed in the next week or two it could
> > > probably be pushed off.
> > >
> > > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
> > > I'm going to double check the submitted patches today because I think
> > > https://issues.apache.org/jira/browse/PIG-3419 might have broken the
> > > currently submitted patches.
> > >
> > > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:
> > > > +1 for a 0.12 release.
> > > >
> > > > I have one outstanding JIRA https://issues.apache.org/jira/browse/PIG-3199.
> > > > Cheolsoo was fine with the patch (except for a typo which I will correct)
> > > > but wanted a second opinion. Can someone please take a look?
> > > >
> > > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho <jar...@apache.org> wrote:
> > > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this week,
> > > > > to see if it can be included.
> > > > >
> > > > > Jarcec
> > > > >
> > > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
> > > > > > +1. I will go through my jiras this week.
> > > > > >
> > > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai <da...@hortonworks.com> wrote:
> > > > > > > Hi, All,
> > > > > > > It has been more than half a year since initial Pig 0.11 release. I'd like
> > > > > > > roll a Pig 0.12 release around the end of September or the beginning of
> > > > > > > October. Let me know if it is possible.
> > > > > > >
> > > > > > > Proposed schedule:
> > > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > > 2. Branching Pig 0.12
> > > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > > 4. Wrapping up, document (1 week)
> > > > > > >
> > > > > > > If you have patches want to get in, please make sure the Jira ticket has
> > > > > > > fix version set to 0.12. If the patches originally set to 0.12 and you
> > > > > > > think you can delay, please mark the fix version to either 0.13.0 or
> > > > > > > 0.12.1.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Daniel
> > > > > > >
> > > > > > > --
> > > > > > > CONFIDENTIALITY NOTICE
> > > > > > > NOTICE: This message is intended for the use of the individual or entity to
> > > > > > > which it is addressed and may contain information that is confidential,
> > > > > > > privileged and exempt from disclosure under applicable law. If the reader
> > > > > > > of this message is not the intended recipient, you are hereby notified that
> > > > > > > any printing, copying, dissemination, distribution, disclosure or
> > > > > > > forwarding of this communication is strictly prohibited. If you have
> > > > > > > received this communication in error, please contact the sender immediately
> > > > > > > and delete it from your system. Thank You.
> > >
> > > --
> > > Jeremy Karn / Lead Developer
> > > MORTAR DATA / 519 277 4391 / www.mortardata.com
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > datasyndrome.com
Re: Are we ready for Pig 0.12.0 release?
With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo will probably commit PIG-3471. After that I will branch 0.12, hopefully over the weekend. Anything I miss?

Thanks,
Daniel

On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney wrote:
> +1, I need PIG-2417 too.
>
> On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn wrote:
> > I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I
> > would like to get into 0.12 because we've had a number of people ask us
> > about getting it committed back to Apache. However, if it looks like too
> > much to review and get committed in the next week or two it could
> > probably be pushed off.
> >
> > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
> > I'm going to double check the submitted patches today because I think
> > https://issues.apache.org/jira/browse/PIG-3419 might have broken the
> > currently submitted patches.
> >
> > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi wrote:
> > > +1 for a 0.12 release.
> > >
> > > I have one outstanding JIRA https://issues.apache.org/jira/browse/PIG-3199.
> > > Cheolsoo was fine with the patch (except for a typo which I will correct)
> > > but wanted a second opinion. Can someone please take a look?
> > >
> > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho wrote:
> > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this week,
> > > > to see if it can be included.
> > > >
> > > > Jarcec
> > > >
> > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
> > > > > +1. I will go through my jiras this week.
> > > > >
> > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai wrote:
> > > > > > Hi, All,
> > > > > > It has been more than half a year since initial Pig 0.11 release. I'd like
> > > > > > roll a Pig 0.12 release around the end of September or the beginning of
> > > > > > October. Let me know if it is possible.
> > > > > >
> > > > > > Proposed schedule:
> > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > 2. Branching Pig 0.12
> > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > 4. Wrapping up, document (1 week)
> > > > > >
> > > > > > If you have patches want to get in, please make sure the Jira ticket has
> > > > > > fix version set to 0.12. If the patches originally set to 0.12 and you
> > > > > > think you can delay, please mark the fix version to either 0.13.0 or
> > > > > > 0.12.1.
> > > > > >
> > > > > > Thanks,
> > > > > > Daniel
> >
> > --
> > Jeremy Karn / Lead Developer
> > MORTAR DATA / 519 277 4391 / www.mortardata.com
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> datasyndrome.com
[jira] [Created] (PIG-3472) Pig should avoid replicated join if size is greater than configured limit
Aniket Mokashi created PIG-3472:
---

Summary: Pig should avoid replicated join if size is greater than configured limit
Key: PIG-3472
URL: https://issues.apache.org/jira/browse/PIG-3472
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.11.1
Reporter: Aniket Mokashi
Fix For: 0.12

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773228#comment-13773228 ]

Daniel Dai commented on PIG-2417:
---

There is one more issue on Hadoop 2: job.jar does not get unjarred before launching map/reduce, so controller.py cannot find the UDF script. It seems we need one more step to unjar the script files before invoking controller.py.

I'd like to commit this patch before we branch 0.12. There are still several holes to fill before streaming UDFs work under Hadoop 2, so I would suggest committing the patch first, marking the e2e tests as not valid on Hadoop 2, and then fixing them after the branch. Thoughts?

> Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
> ---
>
> Key: PIG-2417
> URL: https://issues.apache.org/jira/browse/PIG-2417
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.12
> Reporter: Jeremy Karn
> Fix For: 0.12
>
> Attachments: PIG-2417-4.patch, PIG-2417-5.patch, PIG-2417-6.patch,
> PIG-2417-7.patch, PIG-2417-8.patch, PIG-2417-9-1.patch, PIG-2417-9-2.patch,
> PIG-2417-9.patch, PIG-2417-e2e.patch, streaming2.patch, streaming3.patch,
> streaming.patch
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in
> scripting languages with no JVM implementation or a limited JVM
> implementation. The initial proposal is outlined here:
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF
> from an embedded JVM UDF. I'd propose something like the following (although
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python')
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to
> the cluster and is responsible for reading the input stream, deserializing
> the input data, passing it to the user-written script, serializing that
> script's output, and writing it to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This
> class will likely share some of the existing code in POStream and
> ExecutableManager (where it makes sense to pull out shared code) to stream
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the
> POStream operator directly. This would involve inserting the POStream
> operator instead of the POUserFunc operator whenever we encountered a
> streaming UDF while building the physical plan. This approach seemed
> problematic because there would need to be a lot of changes in order to
> support POStream in all of the places we want to be able to use UDFs (for
> example, to operate on a single field inside of a foreach statement).
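The controller loop described in the issue (read the input stream, deserialize, call the user-written function, serialize the result, write it out) can be sketched roughly as below. This is only an illustration, not the controller.py from the patch: the line-delimited JSON encoding and the names `run_controller` and `concat_upper` are assumptions made for the example, and the wire format Pig actually uses is not described in this thread.

```python
import io
import json


def run_controller(udf, in_stream, out_stream):
    """Apply `udf` to each serialized input record and emit serialized results.

    Assumption: one JSON array of arguments per input line, one JSON value
    per output line. The real streaming format is not specified here.
    """
    for line in in_stream:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines between records
        args = json.loads(line)        # deserialize the input fields
        result = udf(*args)            # hand them to the user-written function
        out_stream.write(json.dumps(result) + "\n")  # serialize and write out
    out_stream.flush()


def concat_upper(a, b):
    """Hypothetical stand-in for a user-written streaming UDF."""
    return (a + b).upper()


if __name__ == "__main__":
    # In-memory streams stand in for the process's stdin/stdout pipes.
    out = io.StringIO()
    run_controller(concat_upper, io.StringIO('["a", "b"]\n'), out)
    print(out.getvalue(), end="")
```

Keeping the loop independent of the concrete streams makes it easy to exercise without a cluster; a real controller would pass `sys.stdin` and `sys.stdout` instead.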
[jira] [Comment Edited] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773168#comment-13773168 ]

Cheolsoo Park edited comment on PIG-3471 at 9/20/13 4:51 PM:
---

[~daijy], I was thinking of committing it to trunk too. I will do it. Thank you!

was (Author: cheolsoo):
[~daniel dai], yes I was thinking of committing to trunk too. I will do it. Thank you!

> Add a base abstract class for ExecutionEngine
> ---
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
[jira] [Commented] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773168#comment-13773168 ]

Cheolsoo Park commented on PIG-3471:
---

[~daniel dai], yes I was thinking of committing to trunk too. I will do it. Thank you!

> Add a base abstract class for ExecutionEngine
> ---
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
[jira] [Commented] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773151#comment-13773151 ]

Daniel Dai commented on PIG-3471:
---

Does it only go to the Tez branch? It sounds like a general restructuring that follows up on PIG-3419.

> Add a base abstract class for ExecutionEngine
> ---
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
[jira] [Updated] (PIG-3448) Tez backend layout
[ https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3448:
---

Attachment: PIG-3448-3.patch

Minor clean ups. Note the latest patch depends on PIG-3471.

> Tez backend layout
> ---
>
> Key: PIG-3448
> URL: https://issues.apache.org/jira/browse/PIG-3448
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3448-1.patch, PIG-3448-2.patch, PIG-3448-3.patch
>
> Design the high-level layout of Tez backend.
[jira] [Updated] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3367:
---

Resolution: Fixed
Status: Resolved (was: Patch Available)

> Add assert keyword (operator) in pig
> ---
>
> Key: PIG-3367
> URL: https://issues.apache.org/jira/browse/PIG-3367
> Project: Pig
> Issue Type: New Feature
> Components: parser
> Reporter: Aniket Mokashi
> Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-3367-2.patch, PIG-3367.patch
>
> The assert operator can be used for data validation. With assert you can
> write a script such as the following:
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if the assertion is violated.
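The behaviour described for assert (every record must satisfy the condition, otherwise the script fails with the given message) can be modelled with a small Python sketch. This mirrors only the semantics stated in the issue; how Pig implements the operator internally is not covered in this thread, and `assert_by` and `AssertionViolated` are hypothetical names introduced for the example.

```python
class AssertionViolated(Exception):
    """Raised when a record fails the asserted condition (hypothetical name)."""


def assert_by(rows, predicate, message):
    """Pass rows through unchanged, failing on the first one that violates `predicate`."""
    for row in rows:
        if not predicate(row):
            raise AssertionViolated(message)
        yield row


if __name__ == "__main__":
    # Rough analogue of: assert a by a0 > 0, 'a cant be negative for reasons';
    a = [(1, 10), (2, 20), (-3, 30)]
    try:
        list(assert_by(a, lambda r: r[0] > 0, "a cant be negative for reasons"))
    except AssertionViolated as err:
        print("script failed:", err)
```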
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773118#comment-13773118 ]

Aniket Mokashi commented on PIG-3367:
---

Committed to trunk. Thanks Julien for the review!

> Add assert keyword (operator) in pig
> ---
>
> Key: PIG-3367
> URL: https://issues.apache.org/jira/browse/PIG-3367
> Project: Pig
> Issue Type: New Feature
> Components: parser
> Reporter: Aniket Mokashi
> Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-3367-2.patch, PIG-3367.patch
>
> The assert operator can be used for data validation. With assert you can
> write a script such as the following:
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if the assertion is violated.
[jira] [Commented] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773148#comment-13773148 ]

Daniel Dai commented on PIG-3471:
---

+1

> Add a base abstract class for ExecutionEngine
> ---
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3471:
---

Attachment: PIG-3471-1.patch

Attached includes the following changes:
# Adds HExecutionEngine abstract class to o.a.p.backend.hadoop.executionengine.
# Moves MRExecutionEngine from o.a.p.backend.hadoop.executionengine to o.a.p.backend.hadoop.executionengine.mapReduceLayer.
# Converts some utility functions to public static functions in Utils.java.

> Add a base abstract class for ExecutionEngine
> ---
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine
[ https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3471:
---

Status: Patch Available (was: Open)

> Add a base abstract class for ExecutionEngine
> ---
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
> While implementing TezExecutionEngine, I realized that a lot of code can be
> shared between MRExecutionEngine and TezExecutionEngine because both use the
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense
> to create a base abstract class for them (called HExecutionEngine) and have
> them inherit common methods and fields from it.
[jira] [Created] (PIG-3471) Add a base abstract class for ExecutionEngine
Cheolsoo Park created PIG-3471:
---

Summary: Add a base abstract class for ExecutionEngine
Key: PIG-3471
URL: https://issues.apache.org/jira/browse/PIG-3471
Project: Pig
Issue Type: Sub-task
Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: tez-branch

While implementing TezExecutionEngine, I realized that a lot of code can be shared between MRExecutionEngine and TezExecutionEngine because both use the common Hadoop framework (hdfs, resource manager, etc). So it would make sense to create a base abstract class for them (called HExecutionEngine) and have them inherit common methods and fields from it.
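The refactoring proposed here is the usual abstract-base-class pattern: the shared Hadoop-facing logic lives in one parent, and each engine supplies only its own launch path. The sketch below shows the shape of that design under stated assumptions. The actual patch is Java; Python is used only to keep the illustration short. The class names come from the issue, but the methods `filesystem_uri` and `launch` and their bodies are hypothetical and are not taken from PIG-3471-1.patch.

```python
from abc import ABC, abstractmethod


class HExecutionEngine(ABC):
    """Base class holding what both engines share (illustrative members only)."""

    def __init__(self, properties):
        self.properties = dict(properties)

    def filesystem_uri(self):
        # Shared behaviour: both engines run against the same Hadoop filesystem.
        return self.properties.get("fs.defaultFS", "file:///")

    @abstractmethod
    def launch(self, plan):
        """Engine-specific submission; each subclass must provide it."""


class MRExecutionEngine(HExecutionEngine):
    def launch(self, plan):
        return f"MapReduce job for {plan} on {self.filesystem_uri()}"


class TezExecutionEngine(HExecutionEngine):
    def launch(self, plan):
        return f"Tez DAG for {plan} on {self.filesystem_uri()}"


if __name__ == "__main__":
    props = {"fs.defaultFS": "hdfs://namenode:8020"}
    for engine in (MRExecutionEngine(props), TezExecutionEngine(props)):
        print(engine.launch("plan-1"))
```

Because `launch` is abstract, the base class cannot be instantiated on its own, which matches the intent of having MRExecutionEngine and TezExecutionEngine inherit the common methods and fields rather than duplicate them.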