[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773742#comment-13773742
 ] 

Koji Noguchi commented on PIG-2672:
---

bq. Note: HADOOP-9639 has improved mechanism for this. 

I haven't read the patch, but I thought HADOOP-9639 introduces a security hole 
unless the NodeManager does SHA-1-level checksumming.
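
To make the concern concrete, here is a minimal sketch of the kind of check meant above: hash the localized bytes with SHA-1 and accept them only if the digest matches the one recorded for the shared entry. The class and method names are hypothetical, and nothing here is taken from the HADOOP-9639 patch, which the comment above says has not been read.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class JarChecksumSketch {
    /** Returns the SHA-1 digest of the given bytes as lowercase hex. */
    static String sha1Hex(byte[] content) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] digest = md.digest(content);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** True only if the localized content hashes to the expected checksum. */
    static boolean matches(byte[] localized, String expectedSha1Hex)
            throws NoSuchAlgorithmException {
        byte[] actual = sha1Hex(localized).getBytes(StandardCharsets.US_ASCII);
        byte[] expected = expectedSha1Hex.toLowerCase().getBytes(StandardCharsets.US_ASCII);
        // MessageDigest.isEqual compares the two byte arrays for equality.
        return MessageDigest.isEqual(actual, expected);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in bytes; a real check would read the jar file contents.
        byte[] original = "jar bytes".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "jar bytez".getBytes(StandardCharsets.UTF_8);
        String expected = sha1Hex(original);
        System.out.println("original accepted: " + matches(original, expected));
        System.out.println("tampered accepted: " + matches(tampered, expected));
    }
}
```

Without a check of this kind on the node side, a shared cache entry whose content was replaced after upload would be picked up by every job that refers to it.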


> Optimize the use of DistributedCache
>
>          Key: PIG-2672
>          URL: https://issues.apache.org/jira/browse/PIG-2672
>      Project: Pig
>   Issue Type: Improvement
>     Reporter: Rohini Palaniswamy
>     Assignee: Aniket Mokashi
>      Fix For: 0.12
>
>  Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of:
> * Space - The jars are distributed to the task trackers for every job, taking 
>   up a lot of local temporary space on the task trackers.
> * Performance - The jar distribution impacts the job launch time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773665#comment-13773665
 ] 

Aniket Mokashi commented on PIG-2672:
-

Oh, actually, I just noticed that the config names are 
pig.shared.cluster.cache.location and pig.shared.user.cache.location.
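
For reference, a minimal sketch of how the two property names from this comment could be supplied as plain Java properties. The paths are placeholders rather than defaults from the patch, the class name is hypothetical, and the names in the committed version may differ from the ones quoted here.

```java
import java.util.Properties;

public class SharedCacheConfigSketch {
    // Property names exactly as given in the comment above.
    static final String CLUSTER_CACHE = "pig.shared.cluster.cache.location";
    static final String USER_CACHE = "pig.shared.user.cache.location";

    /** Builds a Properties object with illustrative placeholder paths. */
    static Properties exampleProperties() {
        Properties props = new Properties();
        // Placeholder values for illustration only.
        props.setProperty(CLUSTER_CACHE, "/tmp/example/cluster-cache");
        props.setProperty(USER_CACHE, "/tmp/example/user-cache");
        return props;
    }

    public static void main(String[] args) {
        Properties props = exampleProperties();
        System.out.println(CLUSTER_CACHE + "=" + props.getProperty(CLUSTER_CACHE));
        System.out.println(USER_CACHE + "=" + props.getProperty(USER_CACHE));
    }
}
```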



[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773663#comment-13773663
 ] 

Aniket Mokashi commented on PIG-2672:
-

Thanks Dmitriy! I will make those changes.



[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773662#comment-13773662
 ] 

Aniket Mokashi commented on PIG-2672:
-

RB: https://reviews.apache.org/r/14274/



Review Request 14274: PIG-2672 Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14274/
---

Review request for pig, Cheolsoo Park, DanielWX DanielWX, Dmitriy Ryaboy, 
Julien Le Dem, and Rohini Palaniswamy.


Bugs: PIG-2672
https://issues.apache.org/jira/browse/PIG-2672


Repository: pig


Description
---

added jar.cache.location option


Diffs
-

  trunk/src/org/apache/pig/PigConstants.java 1525188 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1525188 
  trunk/src/org/apache/pig/impl/PigContext.java 1525188 
  trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1525188 
  trunk/test/org/apache/pig/test/TestJobControlCompiler.java 1525188 

Diff: https://reviews.apache.org/r/14274/diff/


Testing
---


Thanks,

Aniket Mokashi



[jira] Subscription: PIG patch available

2013-09-20 Thread jira
Issue Subscription
Filter: PIG patch available (14 issues)

Subscriber: pigdaily

Key         Summary
PIG-3470    Print configuration variables in grunt
            https://issues.apache.org/jira/browse/PIG-3470
PIG-3461    Rewrite PartitionFilterOptimizer to make it work for all the cases
            https://issues.apache.org/jira/browse/PIG-3461
PIG-3451    EvalFunc ctor reflection to determine value of type param T is brittle
            https://issues.apache.org/jira/browse/PIG-3451
PIG-3449    Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
            https://issues.apache.org/jira/browse/PIG-3449
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3434    Null subexpression in bincond nullifies outer tuple (or bag)
            https://issues.apache.org/jira/browse/PIG-3434
PIG-3388    No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
            https://issues.apache.org/jira/browse/PIG-3388
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021
PIG-2672    Optimize the use of DistributedCache
            https://issues.apache.org/jira/browse/PIG-2672
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773636#comment-13773636
 ] 

Dmitriy V. Ryaboy commented on PIG-2672:


Aniket, can we prefix the properties with "pig."? That way we won't conflict 
with potential properties from Hadoop, and it's a little easier to analyze 
stuff when looking at the jobconf.
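
A small sketch of why the prefix helps when reading a jobconf: with a common "pig." prefix, the Pig-owned settings can be pulled out of a mixed property set with a single filter. The key "pig.example.option" and the class name are made up for illustration; "mapred.job.name" stands in for a Hadoop-owned property.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class PigPrefixSketch {
    /** Returns the properties whose names start with the prefix, sorted by name. */
    static Map<String, String> withPrefix(Properties jobConf, String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (String name : jobConf.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                result.put(name, jobConf.getProperty(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty("mapred.job.name", "example-job");
        jobConf.setProperty("pig.example.option", "value"); // hypothetical key
        // Only the "pig."-prefixed entry is printed.
        System.out.println(withPrefix(jobConf, "pig."));
    }
}
```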



[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.

2013-09-20 Thread Jeremy Karn (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773632#comment-13773632
 ] 

Jeremy Karn commented on PIG-2417:
--

I agree.  

If we get this committed and open a new jira for the hadoop2 problems, that'll 
give me a bit of time to set up a hadoop2 cluster and work out any kinks.

> Streaming UDFs - allow users to easily write UDFs in scripting languages 
> with no JVM implementation.
>
>              Key: PIG-2417
>              URL: https://issues.apache.org/jira/browse/PIG-2417
>          Project: Pig
>       Issue Type: Improvement
> Affects Versions: 0.12
>         Reporter: Jeremy Karn
>          Fix For: 0.12
>
> Attachments: PIG-2417-4.patch, PIG-2417-5.patch, PIG-2417-6.patch, 
> PIG-2417-7.patch, PIG-2417-8.patch, PIG-2417-9-1.patch, PIG-2417-9-2.patch, 
> PIG-2417-9.patch, PIG-2417-e2e.patch, streaming2.patch, streaming3.patch, 
> streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in 
> scripting languages with no JVM implementation or a limited JVM 
> implementation.  The initial proposal is outlined here: 
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF 
> from an embedded JVM UDF.  I'd propose something like the following (although 
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python') 
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to 
> the cluster which is responsible for reading the input stream, deserializing 
> the input data, passing it to the user written script, serializing that 
> script output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc.  This 
> class will likely share some of the existing code in POStream and 
> ExecutableManager (where it makes sense to pull out shared code) to stream 
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the 
> POStream operator directly.  This would involve inserting the POStream 
> operator instead of the POUserFunc operator whenever we encountered a 
> streaming UDF while building the physical plan.  This approach seemed 
> problematic because there would need to be a lot of changes in order to 
> support POStream in all of the places we want to be able to use UDFs (for 
> example, to operate on a single field inside of a foreach statement).
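
The controller's responsibilities described above (read the input stream, deserialize, call the user code, serialize, write the output stream) can be sketched as a simple loop. This is only an illustration in Java: the real controller would be written in the scripting language itself, and the tab-delimited, one-record-per-line wire format, the class name, and the sumFields stand-in UDF are all assumptions made for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.Function;

public class ControllerLoopSketch {
    /** Reads one record per line, applies the user function, writes one result per line. */
    static void run(Reader in, Writer out, Function<String[], String> udf) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Deserialize: assumed tab-delimited fields, empty trailing fields kept.
            String[] fields = line.split("\t", -1);
            // Pass the record to the user-written function.
            String result = udf.apply(fields);
            // Serialize: one result per output line.
            out.write(result);
            out.write('\n');
        }
        out.flush();
    }

    /** Stand-in for a user-written UDF: sums integer fields. */
    static String sumFields(String[] fields) {
        int total = 0;
        for (String field : fields) {
            total += Integer.parseInt(field);
        }
        return Integer.toString(total);
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        run(new StringReader("1\t2\n3\t4\n"), out, ControllerLoopSketch::sumFields);
        System.out.print(out);
    }
}
```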



[jira] [Updated] (PIG-3473) org.apache.pig.Expression should support "is null" and "not" operations

2013-09-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3473:


Description: 
Currently, Expression only supports BinaryExpressions and Constants. Most of 
the other logical expressions (cast, udf) need not be pushed down. But it would 
make sense to be able to push down "is null" and "not" operations (and possibly 
NegativeExpression).
This change would have an impact on LoadFuncs (e.g. HCatLoader), so we need to 
be careful and make sure we do this in a backwards-compatible way.
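
To illustrate the backwards-compatibility point, here is a minimal sketch of an expression tree extended with a unary node, together with a check that a loader written against the older binary-only model could use to decline a pushdown it does not understand. All class and method names are hypothetical; this is not the actual org.apache.pig.Expression API.

```java
public class UnaryPushdownSketch {
    /** Hypothetical node types; not the real org.apache.pig.Expression classes. */
    interface Expr {}

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
    }

    static final class Const implements Expr {
        final Object value;
        Const(Object value) { this.value = value; }
    }

    static final class Binary implements Expr {
        final String op;
        final Expr lhs;
        final Expr rhs;
        Binary(String op, Expr lhs, Expr rhs) { this.op = op; this.lhs = lhs; this.rhs = rhs; }
    }

    /** The proposed addition: a one-operand node for "is null" or "not". */
    static final class Unary implements Expr {
        final String op;
        final Expr operand;
        Unary(String op, Expr operand) { this.op = op; this.operand = operand; }
    }

    /** True if the tree uses only the node types an older loader already handles. */
    static boolean legacyCompatible(Expr e) {
        if (e instanceof Column || e instanceof Const) {
            return true;
        }
        if (e instanceof Binary) {
            Binary b = (Binary) e;
            return legacyCompatible(b.lhs) && legacyCompatible(b.rhs);
        }
        return false; // unary nodes are new, so an older loader should skip them
    }

    /** Renders the tree as text, prefix form for "not" and postfix for "is null". */
    static String render(Expr e) {
        if (e instanceof Column) return ((Column) e).name;
        if (e instanceof Const) return String.valueOf(((Const) e).value);
        if (e instanceof Binary) {
            Binary b = (Binary) e;
            return "(" + render(b.lhs) + " " + b.op + " " + render(b.rhs) + ")";
        }
        Unary u = (Unary) e;
        return "not".equals(u.op)
                ? "(not " + render(u.operand) + ")"
                : "(" + render(u.operand) + " " + u.op + ")";
    }

    public static void main(String[] args) {
        Expr notNull = new Unary("not", new Unary("is null", new Column("a")));
        System.out.println(render(notNull) + " legacy=" + legacyCompatible(notNull));
    }
}
```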

> org.apache.pig.Expression should support "is null" and "not" operations
>
>              Key: PIG-3473
>              URL: https://issues.apache.org/jira/browse/PIG-3473
>          Project: Pig
>       Issue Type: Improvement
>       Components: impl
> Affects Versions: 0.11.1
>         Reporter: Aniket Mokashi
>          Fix For: 0.12.1
>
>
> Currently, Expression only supports BinaryExpressions and Constants. Most of 
> the other logical expressions (cast, udf) need not be pushed down. But it 
> would make sense to be able to push down "is null" and "not" operations (and 
> possibly NegativeExpression).
> This change would have an impact on LoadFuncs (e.g. HCatLoader), so we need 
> to be careful and make sure we do this in a backwards-compatible way.



[jira] [Created] (PIG-3473) org.apache.pig.Expression should support "is null" and "not" operations

2013-09-20 Thread Aniket Mokashi (JIRA)
Aniket Mokashi created PIG-3473:
---

         Summary: org.apache.pig.Expression should support "is null" and "not" operations
             Key: PIG-3473
             URL: https://issues.apache.org/jira/browse/PIG-3473
         Project: Pig
      Issue Type: Improvement
      Components: impl
Affects Versions: 0.11.1
        Reporter: Aniket Mokashi
         Fix For: 0.12.1






Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Erik Selin
wooo :D 

On 2013-09-20, at 19:32 , Daniel Dai  wrote:

> PIG-3454 is already committed several days back.

Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Daniel Dai
PIG-3454 is already committed several days back.


On Fri, Sep 20, 2013 at 4:29 PM, Erik Selin  wrote:

> Could we get PIG-3454 as well?
>
> Thanks :)
>
> Erik

[jira] [Updated] (PIG-3448) Tez backend layout

2013-09-20 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3448:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to tez branch.

> Tez backend layout
>
>              Key: PIG-3448
>              URL: https://issues.apache.org/jira/browse/PIG-3448
>          Project: Pig
>       Issue Type: Sub-task
>       Components: tez
> Affects Versions: tez-branch
>         Reporter: Cheolsoo Park
>         Assignee: Cheolsoo Park
>          Fix For: tez-branch
>
>      Attachments: PIG-3448-1.patch, PIG-3448-2.patch, PIG-3448-3.patch
>
>
> Design the high-level layout of Tez backend.



Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Erik Selin
Could we get PIG-3454 as well?

Thanks :)

Erik

On 2013-09-20, at 18:46 , Julien Le Dem  wrote:

> I'd like to get PIG-3445 in too.
> Julien
> 
> On Sep 20, 2013, at 3:03 PM, Daniel Dai wrote:
> 
>> With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
>> will probably commit PIG-3471. After that I will branch 0.12, hopefully
>> over the weekend. Anything I miss?
>> 
>> Thanks,
>> Daniel
>> 
>> 
>> On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
>> wrote:
>> 
>>> +1, I need PIG-2417 too.
>>> 
>>> 
>>> On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn  wrote:
>>> 
>>>> I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I would
>>>> like to get into 0.12 because we've had a number of people ask us about
>>>> getting it committed back to Apache.  However, if it looks like too much to
>>>> review and get committed in the next week or two it could probably be
>>>> pushed off.
>>>>
>>>> I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
>>>> I'm going to double check the submitted patches today because I think
>>>> https://issues.apache.org/jira/browse/PIG-3419 might have broken the
>>>> currently submitted patches.
>>>>
>>>>
>>>> On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi  wrote:
>>>>
>>>>> +1 for a 0.12 release.
>>>>>
>>>>> I have one outstanding JIRA https://issues.apache.org/jira/browse/PIG-3199 .
>>>>> Cheolsoo was fine with the patch (except for a typo which I will correct)
>>>>> but wanted a second opinion. Can someone please take a look?
>>>>>
>>>>>
>>>>> On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho wrote:
>>>>>
>>>>>> I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this week,
>>>>>> to see if it can be included.
>>>>>>
>>>>>> Jarcec
>>>>>>
>>>>>> On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
>>>>>>> +1. I will go through my jiras this week.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai wrote:
>>>>>>>
>>>>>>>> Hi, All,
>>>>>>>> It has been more than half a year since initial Pig 0.11 release. I'd like
>>>>>>>> roll a Pig 0.12 release around the end of September or the beginning of
>>>>>>>> October. Let me know if it is possible.
>>>>>>>>
>>>>>>>> Proposed schedule:
>>>>>>>> 1. Commit all major features (1-2 weeks)
>>>>>>>> 2. Branching Pig 0.12
>>>>>>>> 3. Commit remaining patches (1-2 weeks)
>>>>>>>> 4. Wrapping up, document (1 week)
>>>>>>>>
>>>>>>>> If you have patches want to get in, please make sure the Jira ticket has
>>>>>>>> fix version set to 0.12. If the patches originally set to 0.12 and you
>>>>>>>> think you can delay, please mark the fix version to either 0.13.0 or
>>>>>>>> 0.12.1.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> --
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or entity to
>>>>>>>> which it is addressed and may contain information that is confidential,
>>>>>>>> privileged and exempt from disclosure under applicable law. If the reader
>>>>>>>> of this message is not the intended recipient, you are hereby notified that
>>>>>>>> any printing, copying, dissemination, distribution, disclosure or
>>>>>>>> forwarding of this communication is strictly prohibited. If you have
>>>>>>>> received this communication in error, please contact the sender immediately
>>>>>>>> and delete it from your system. Thank You.
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Jeremy Karn / Lead Developer
>>>> MORTAR DATA / 519 277 4391 / www.mortardata.com
>>>>
>>>
>>>
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>>> datasyndrome.com
>>>
>> 
>> -- 
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to 
>> which it is addressed and may contain information that is confidential, 
>> privileged and exempt from disclosure under applicable law. If the reader 
>> of this message is not the intended recipient, you are hereby notified that 
>> any printing, copying, dissemination, distribution, disclosure or 
>> forwarding of this communication is strictly prohibited. If you have 
>> received this communication in error, please contact the sender immediately 
>> and delete it from your system. Thank You.
> 



[jira] [Commented] (PIG-3448) Tez backend layout

2013-09-20 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773571#comment-13773571
 ] 

Cheolsoo Park commented on PIG-3448:


[~aniket486], thank you for taking a look.

Regarding your comment on the JVM heap size for unit tests: I found that some 
unit test cases (e.g. TestPigServer) fail with OOM when running with 
Hadoop-2.1.0-beta, so I increased the heap size to keep my Jenkins build happy 
for now.



[jira] [Commented] (PIG-3446) Umbrella jira for Pig on Tez

2013-09-20 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773569#comment-13773569
 ] 

Julien Le Dem commented on PIG-3446:


Here is the work that Achal did for Pig-on-Tez
https://github.com/achalsoni81/pigeon

> Umbrella jira for Pig on Tez
>
>              Key: PIG-3446
>              URL: https://issues.apache.org/jira/browse/PIG-3446
>          Project: Pig
>       Issue Type: New Feature
>       Components: tez
> Affects Versions: tez-branch
>         Reporter: Cheolsoo Park
>         Assignee: Cheolsoo Park
>          Fix For: tez-branch
>
>
> This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.
> More information can be found on the following wiki page:
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez



[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3471:
---

Affects Version/s: 0.12
Fix Version/s: 0.12

> Add a base abstract class for ExecutionEngine
>
>              Key: PIG-3471
>              URL: https://issues.apache.org/jira/browse/PIG-3471
>          Project: Pig
>       Issue Type: Sub-task
>       Components: tez
> Affects Versions: 0.12, tez-branch
>         Reporter: Cheolsoo Park
>         Assignee: Cheolsoo Park
>          Fix For: 0.12, tez-branch
>
>      Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (HDFS, resource manager, etc.). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.
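
The shape of the refactoring described above can be sketched as follows. The three engine class names come from the issue text; the field, the methods, and the outer class are hypothetical and only illustrate which parts would be shared and which would stay engine-specific. They are not taken from PIG-3471-1.patch.

```java
import java.util.Properties;

public class ExecutionEngineSketch {
    /** Shared base: state and behavior common to every Hadoop-based engine. */
    static abstract class HExecutionEngine {
        protected final Properties conf;

        HExecutionEngine(Properties conf) {
            this.conf = conf;
        }

        /** Shared: both engines read the same filesystem setting. */
        String defaultFileSystem() {
            return conf.getProperty("fs.defaultFS", "file:///");
        }

        /** Engine-specific: each subclass names its own runtime. */
        abstract String name();

        /** Shared logic built on top of the engine-specific part. */
        String describe() {
            return name() + " on " + defaultFileSystem();
        }
    }

    static final class MRExecutionEngine extends HExecutionEngine {
        MRExecutionEngine(Properties conf) { super(conf); }
        @Override String name() { return "mapreduce"; }
    }

    static final class TezExecutionEngine extends HExecutionEngine {
        TezExecutionEngine(Properties conf) { super(conf); }
        @Override String name() { return "tez"; }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("fs.defaultFS", "hdfs://namenode.example:8020"); // placeholder host
        System.out.println(new MRExecutionEngine(conf).describe());
        System.out.println(new TezExecutionEngine(conf).describe());
    }
}
```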



[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3471:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773560#comment-13773560
 ] 

Rohini Palaniswamy commented on PIG-2672:
-

I can take a look at this one. Can you put this up in review board please?

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job taking 
> up lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  



[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773513#comment-13773513
 ] 

Aniket Mokashi commented on PIG-2672:
-

Note: HADOOP-9639 has an improved mechanism for this. However, this is still 
somewhat useful for users who are on old versions of Hadoop.

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job taking 
> up lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  



[jira] [Assigned] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi reassigned PIG-2672:
---

Assignee: Aniket Mokashi  (was: Rohini Palaniswamy)

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job taking 
> up lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  



Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Julien Le Dem
I'd like to get PIG-3445 in too.
Julien

On Sep 20, 2013, at 3:03 PM, Daniel Dai wrote:

> With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
> will probably commit PIG-3471. After that I will branch 0.12, hopefully
> over the weekend. Anything I miss?
> 
> Thanks,
> Daniel
> 
> 
> On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
> wrote:
> 
>> +1, I need PIG-2417 too.
>> 
>> 
>> On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn  wrote:
>> 
>>> I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I
>>> would
>>> like to get into 0.12 because we've had a number of people ask us about
>>> getting it committed back to Apache.  However, if it looks like too much
>> to
>>> review and get committed in the next week or two it could probably be
>>> pushed off.
>>> 
>>> I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
>>> I'm going to double check the submitted patches today because I think
>>> https://issues.apache.org/jira/browse/PIG-3419 might have broken the
>>> currently submitted patches.
>>> 
>>> 
>>> On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi >>> wrote:
>>> 
 +1 for a 0.12 release.
 
 I have one outstanding JIRA
>>> https://issues.apache.org/jira/browse/PIG-3199
 .
 Cheolsoo was fine with the patch (except for a typo which I will
>> correct)
 but wanted a second opinion. Can someone please take a look?
 
 
 On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho  wrote:
 
> I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this
>>> week,
> to see if it can be included.
> 
> Jarcec
> 
> On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
>> +1. I will go through my jiras this week.
>> 
>> 
>> On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai >> 
> wrote:
>> 
>>> Hi, All,
>>> It has been more than half a year since initial Pig 0.11 release.
>>> I'd
> like
>>> roll a Pig 0.12 release around the end of September or the
>>> beginning
 of
>>> October. Let me know if it is possible.
>>> 
>>> Proposed schedule:
>>> 1. Commit all major features (1-2 weeks)
>>> 2. Branching Pig 0.12
>>> 3. Commit remaining patches (1-2 weeks)
>>> 4. Wrapping up, document (1 week)
>>> 
>>> If you have patches want to get in, please make sure the Jira
>>> ticket
> has
>>> fix version set to 0.12. If the patches originally set to 0.12
>> and
 you
>>> think you can delay, please mark the fix version to either 0.13.0
>>> or
>>> 0.12.1.
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> --
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or
> entity to
>>> which it is addressed and may contain information that is
 confidential,
>>> privileged and exempt from disclosure under applicable law. If
>> the
> reader
>>> of this message is not the intended recipient, you are hereby
 notified
> that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you
>>> have
>>> received this communication in error, please contact the sender
> immediately
>>> and delete it from your system. Thank You.
>>> 
> 
 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Jeremy Karn / Lead Developer
>>> MORTAR DATA / 519 277 4391 / www.mortardata.com
>>> 
>> 
>> 
>> 
>> --
>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>> datasyndrome.com
>> 
> 
> -- 
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to 
> which it is addressed and may contain information that is confidential, 
> privileged and exempt from disclosure under applicable law. If the reader 
> of this message is not the intended recipient, you are hereby notified that 
> any printing, copying, dissemination, distribution, disclosure or 
> forwarding of this communication is strictly prohibited. If you have 
> received this communication in error, please contact the sender immediately 
> and delete it from your system. Thank You.



[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773507#comment-13773507
 ] 

Aniket Mokashi commented on PIG-2672:
-

I have attached a patch that adds 2 configuration parameters: 
cluster.cache.location and user.cache.location.

Jars are copied to /a/b/c/checksum-jarname.jar where a, b, c 
are the first 3 characters of the checksum. When a new jar is registered, its 
checksum is calculated and we check whether a jar with the same name/checksum 
exists in the cache. If yes, the copy to hdfs is avoided.

Permission to write to the cache is managed by HDFS permissions. Also, it's not 
possible to overwrite a jar using this mechanism. If a jar changes, its checksum 
will also change and it will be a new jar in the cache. Removal of old jars is 
a manual step: admins/users can list jars under the cache location and remove 
the ones that are very old. Alternatively, you can delete all the jars in the 
cache or change the jar cache location, and the cache will be repopulated by 
running jobs.

If this approach looks reasonable, I can add a few more tests. Comments welcome!
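
For readers following along, the cache layout described in this comment can be 
sketched roughly as below. This is not the patch itself: it is a Python 
illustration that uses the local filesystem in place of HDFS, assumes SHA-1 as 
the checksum (the comment does not name the algorithm), and uses hypothetical 
function names.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def jar_checksum(jar: Path) -> str:
    """Hex digest of the jar's bytes (SHA-1 is an assumption for this sketch)."""
    digest = hashlib.sha1()
    with jar.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_root: Path, jar: Path) -> Path:
    """Build <cache_root>/a/b/c/<checksum>-<jarname>.

    a, b and c are the first three characters of the checksum.
    """
    checksum = jar_checksum(jar)
    a, b, c = checksum[:3]
    return cache_root / a / b / c / f"{checksum}-{jar.name}"


def ensure_cached(cache_root: Path, jar: Path) -> Tuple[Path, bool]:
    """Return (cached path, whether a copy happened).

    The copy is skipped when an entry with the same name and checksum exists.
    """
    target = cache_path(cache_root, jar)
    if target.exists():
        return target, False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(jar, target)
    return target, True
```

Because the checksum is part of the file name, a changed jar lands at a new 
path instead of overwriting the old entry, which is why old entries have to be 
cleaned up by hand.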

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job taking 
> up lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  



Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Aniket Mokashi
It would be nice if we could get these in too: PIG-2672 and PIG-3461.


On Fri, Sep 20, 2013 at 3:42 PM, Prashant Kommireddi wrote:

> Thanks Daniel.
>
>
> On Fri, Sep 20, 2013 at 3:37 PM, Daniel Dai  wrote:
>
> > I just committed PIG-3199.
> >
> >
> > On Fri, Sep 20, 2013 at 3:06 PM, Prashant Kommireddi <
> prash1...@gmail.com
> > >wrote:
> >
> > > Can we get PIG-3199 in? It only exposes a few properties of LP
> > (load/store
> > > paths and funcs) via a wrapper
> > >
> > >
> > > On Fri, Sep 20, 2013 at 3:03 PM, Daniel Dai 
> > wrote:
> > >
> > > > With regard to branching 0.12, I will try to commit PIG-2417 and
> > Cheolsoo
> > > > will probably commit PIG-3471. After that I will branch 0.12,
> hopefully
> > > > over the weekend. Anything I miss?
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > >
> > > > On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
> > > > wrote:
> > > >
> > > > > +1, I need PIG-2417 too.
> > > > >
> > > > >
> > > > > On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn 
> > > > wrote:
> > > > >
> > > > > > I have one JIRA
> https://issues.apache.org/jira/browse/PIG-2417 that
> > > I
> > > > > > would
> > > > > > like to get into 0.12 because we've had a number of people ask us
> > > about
> > > > > > getting it committed back to Apache.  However, if it looks like
> too
> > > > much
> > > > > to
> > > > > > review and get committed in the next week or two it could
> probably
> > be
> > > > > > pushed off.
> > > > > >
> > > > > > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into
> > > 0.12.
> > > > > >  I'm going to double check the submitted patches today because I
> > > think
> > > > > > https://issues.apache.org/jira/browse/PIG-3419 might have broken
> > the
> > > > > > currently submitted patches.
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi <
> > > > prash1...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > +1 for a 0.12 release.
> > > > > > >
> > > > > > > I have one outstanding JIRA
> > > > > > https://issues.apache.org/jira/browse/PIG-3199
> > > > > > > .
> > > > > > > Cheolsoo was fine with the patch (except for a typo which I
> will
> > > > > correct)
> > > > > > > but wanted a second opinion. Can someone please take a look?
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho <
> > > > jar...@apache.org
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support)
> > > this
> > > > > > week,
> > > > > > > > to see if it can be included.
> > > > > > > >
> > > > > > > > Jarcec
> > > > > > > >
> > > > > > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park
> wrote:
> > > > > > > > > +1. I will go through my jiras this week.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai <
> > > > da...@hortonworks.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, All,
> > > > > > > > > > It has been more than half a year since initial Pig 0.11
> > > > release.
> > > > > > I'd
> > > > > > > > like
> > > > > > > > > > roll a Pig 0.12 release around the end of September or
> the
> > > > > > beginning
> > > > > > > of
> > > > > > > > > > October. Let me know if it is possible.
> > > > > > > > > >
> > > > > > > > > > Proposed schedule:
> > > > > > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > > > > > 2. Branching Pig 0.12
> > > > > > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > > > > > 4. Wrapping up, document (1 week)
> > > > > > > > > >
> > > > > > > > > > If you have patches want to get in, please make sure the
> > Jira
> > > > > > ticket
> > > > > > > > has
> > > > > > > > > > fix version set to 0.12. If the patches originally set to
> > > 0.12
> > > > > and
> > > > > > > you
> > > > > > > > > > think you can delay, please mark the fix version to
> either
> > > > 0.13.0
> > > > > > or
> > > > > > > > > > 0.12.1.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Daniel
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > CONFIDENTIALITY NOTICE
> > > > > > > > > > NOTICE: This message is intended for the use of the
> > > individual
> > > > or
> > > > > > > > entity to
> > > > > > > > > > which it is addressed and may contain information that is
> > > > > > > confidential,
> > > > > > > > > > privileged and exempt from disclosure under applicable
> law.
> > > If
> > > > > the
> > > > > > > > reader
> > > > > > > > > > of this message is not the intended recipient, you are
> > hereby
> > > > > > > notified
> > > > > > > > that
> > > > > > > > > > any printing, copying, dissemination, distribution,
> > > disclosure
> > > > or
> > > > > > > > > > forwarding of this communication is strictly prohibited.
> If
> > > you
> > > > > > have
> > > > > > > > > > received this communication in error, please contact the
> > > sender
> > > > > > > > immediately
> > > > > > > > > > and delete it from your system. Than

Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Prashant Kommireddi
Thanks Daniel.


On Fri, Sep 20, 2013 at 3:37 PM, Daniel Dai  wrote:

> I just committed PIG-3199.
>
>
> On Fri, Sep 20, 2013 at 3:06 PM, Prashant Kommireddi  >wrote:
>
> > Can we get PIG-3199 in? It only exposes a few properties of LP
> (load/store
> > paths and funcs) via a wrapper
> >
> >
> > On Fri, Sep 20, 2013 at 3:03 PM, Daniel Dai 
> wrote:
> >
> > > With regard to branching 0.12, I will try to commit PIG-2417 and
> Cheolsoo
> > > will probably commit PIG-3471. After that I will branch 0.12, hopefully
> > > over the weekend. Anything I miss?
> > >
> > > Thanks,
> > > Daniel
> > >
> > >
> > > On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
> > > wrote:
> > >
> > > > +1, I need PIG-2417 too.
> > > >
> > > >
> > > > On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn 
> > > wrote:
> > > >
> > > > > > I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that
> > I
> > > > > would
> > > > > like to get into 0.12 because we've had a number of people ask us
> > about
> > > > > getting it committed back to Apache.  However, if it looks like too
> > > much
> > > > to
> > > > > review and get committed in the next week or two it could probably
> be
> > > > > pushed off.
> > > > >
> > > > > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into
> > 0.12.
> > > > >  I'm going to double check the submitted patches today because I
> > think
> > > > > https://issues.apache.org/jira/browse/PIG-3419 might have broken
> the
> > > > > currently submitted patches.
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi <
> > > prash1...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > +1 for a 0.12 release.
> > > > > >
> > > > > > I have one outstanding JIRA
> > > > > https://issues.apache.org/jira/browse/PIG-3199
> > > > > > .
> > > > > > Cheolsoo was fine with the patch (except for a typo which I will
> > > > correct)
> > > > > > but wanted a second opinion. Can someone please take a look?
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho <
> > > jar...@apache.org
> > > > > > >wrote:
> > > > > >
> > > > > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support)
> > this
> > > > > week,
> > > > > > > to see if it can be included.
> > > > > > >
> > > > > > > Jarcec
> > > > > > >
> > > > > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
> > > > > > > > +1. I will go through my jiras this week.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai <
> > > da...@hortonworks.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi, All,
> > > > > > > > > It has been more than half a year since initial Pig 0.11
> > > release.
> > > > > I'd
> > > > > > > like
> > > > > > > > > roll a Pig 0.12 release around the end of September or the
> > > > > beginning
> > > > > > of
> > > > > > > > > October. Let me know if it is possible.
> > > > > > > > >
> > > > > > > > > Proposed schedule:
> > > > > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > > > > 2. Branching Pig 0.12
> > > > > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > > > > 4. Wrapping up, document (1 week)
> > > > > > > > >
> > > > > > > > > If you have patches want to get in, please make sure the
> Jira
> > > > > ticket
> > > > > > > has
> > > > > > > > > fix version set to 0.12. If the patches originally set to
> > 0.12
> > > > and
> > > > > > you
> > > > > > > > > think you can delay, please mark the fix version to either
> > > 0.13.0
> > > > > or
> > > > > > > > > 0.12.1.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Daniel
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > CONFIDENTIALITY NOTICE
> > > > > > > > > NOTICE: This message is intended for the use of the
> > individual
> > > or
> > > > > > > entity to
> > > > > > > > > which it is addressed and may contain information that is
> > > > > > confidential,
> > > > > > > > > privileged and exempt from disclosure under applicable law.
> > If
> > > > the
> > > > > > > reader
> > > > > > > > > of this message is not the intended recipient, you are
> hereby
> > > > > > notified
> > > > > > > that
> > > > > > > > > any printing, copying, dissemination, distribution,
> > disclosure
> > > or
> > > > > > > > > forwarding of this communication is strictly prohibited. If
> > you
> > > > > have
> > > > > > > > > received this communication in error, please contact the
> > sender
> > > > > > > immediately
> > > > > > > > > and delete it from your system. Thank You.
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jeremy Karn / Lead Developer
> > > > > MORTAR DATA / 519 277 4391 / www.mortardata.com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > > > datasyndrome.com
> > > >
> > >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> entity
>

[jira] [Commented] (PIG-3448) Tez backend layout

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773553#comment-13773553
 ] 

Aniket Mokashi commented on PIG-3448:
-

+1. LGTM. Thanks for doing this!

Minor: This is just the skeleton code to enable the Tez implementation, and no 
tests are added. Why do we allocate more memory for unit tests now? Is that 
because of Tez/Yarn?

> Tez backend layout
> --
>
> Key: PIG-3448
> URL: https://issues.apache.org/jira/browse/PIG-3448
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3448-1.patch, PIG-3448-2.patch, PIG-3448-3.patch
>
>
> Design the high-level layout of Tez backend.



[jira] [Updated] (PIG-3199) Provide a method to retriever name of loader/storer in PigServer

2013-09-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3199:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to trunk.

> Provide a method to retriever name of loader/storer in PigServer
> 
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199_2.patch, PIG-3199.patch
>
>
> LogicalPlan could be exposed to user in order for one to make validations 
> based on it. For eg, one could get Load/Store paths or other operators and be 
> able to perform checks such as whether I/O paths are valid etc.  



[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-09-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773529#comment-13773529
 ] 

Daniel Dai commented on PIG-3199:
-

Looks fine to me. We are not exposing LogicalPlan, but only the loader/storer 
name in the new patch. I will commit the patch with Cheolsoo's suggested change 
shortly.

> Expose LogicalPlan via PigServer API
> 
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199_2.patch, PIG-3199.patch
>
>
> LogicalPlan could be exposed to user in order for one to make validations 
> based on it. For eg, one could get Load/Store paths or other operators and be 
> able to perform checks such as whether I/O paths are valid etc.  



Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Daniel Dai
I just committed PIG-3199.


On Fri, Sep 20, 2013 at 3:06 PM, Prashant Kommireddi wrote:

> Can we get PIG-3199 in? It only exposes a few properties of LP (load/store
> paths and funcs) via a wrapper
>
>
> On Fri, Sep 20, 2013 at 3:03 PM, Daniel Dai  wrote:
>
> > With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
> > will probably commit PIG-3471. After that I will branch 0.12, hopefully
> > over the weekend. Anything I miss?
> >
> > Thanks,
> > Daniel
> >
> >
> > On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
> > wrote:
> >
> > > +1, I need PIG-2417 too.
> > >
> > >
> > > On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn 
> > wrote:
> > >
> > > > I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that
> I
> > > > would
> > > > like to get into 0.12 because we've had a number of people ask us
> about
> > > > getting it committed back to Apache.  However, if it looks like too
> > much
> > > to
> > > > review and get committed in the next week or two it could probably be
> > > > pushed off.
> > > >
> > > > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into
> 0.12.
> > > >  I'm going to double check the submitted patches today because I
> think
> > > > https://issues.apache.org/jira/browse/PIG-3419 might have broken the
> > > > currently submitted patches.
> > > >
> > > >
> > > > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi <
> > prash1...@gmail.com
> > > > >wrote:
> > > >
> > > > > +1 for a 0.12 release.
> > > > >
> > > > > I have one outstanding JIRA
> > > > https://issues.apache.org/jira/browse/PIG-3199
> > > > > .
> > > > > Cheolsoo was fine with the patch (except for a typo which I will
> > > correct)
> > > > > but wanted a second opinion. Can someone please take a look?
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho <
> > jar...@apache.org
> > > > > >wrote:
> > > > >
> > > > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support)
> this
> > > > week,
> > > > > > to see if it can be included.
> > > > > >
> > > > > > Jarcec
> > > > > >
> > > > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
> > > > > > > +1. I will go through my jiras this week.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai <
> > da...@hortonworks.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, All,
> > > > > > > > It has been more than half a year since initial Pig 0.11
> > release.
> > > > I'd
> > > > > > like
> > > > > > > > roll a Pig 0.12 release around the end of September or the
> > > > beginning
> > > > > of
> > > > > > > > October. Let me know if it is possible.
> > > > > > > >
> > > > > > > > Proposed schedule:
> > > > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > > > 2. Branching Pig 0.12
> > > > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > > > 4. Wrapping up, document (1 week)
> > > > > > > >
> > > > > > > > If you have patches want to get in, please make sure the Jira
> > > > ticket
> > > > > > has
> > > > > > > > fix version set to 0.12. If the patches originally set to
> 0.12
> > > and
> > > > > you
> > > > > > > > think you can delay, please mark the fix version to either
> > 0.13.0
> > > > or
> > > > > > > > 0.12.1.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Daniel
> > > > > > > >
> > > > > > > > --
> > > > > > > > CONFIDENTIALITY NOTICE
> > > > > > > > NOTICE: This message is intended for the use of the
> individual
> > or
> > > > > > entity to
> > > > > > > > which it is addressed and may contain information that is
> > > > > confidential,
> > > > > > > > privileged and exempt from disclosure under applicable law.
> If
> > > the
> > > > > > reader
> > > > > > > > of this message is not the intended recipient, you are hereby
> > > > > notified
> > > > > > that
> > > > > > > > any printing, copying, dissemination, distribution,
> disclosure
> > or
> > > > > > > > forwarding of this communication is strictly prohibited. If
> you
> > > > have
> > > > > > > > received this communication in error, please contact the
> sender
> > > > > > immediately
> > > > > > > > and delete it from your system. Thank You.
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jeremy Karn / Lead Developer
> > > > MORTAR DATA / 519 277 4391 / www.mortardata.com
> > > >
> > >
> > >
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > > datasyndrome.com
> > >
> >
> > --
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or entity
> to
> > which it is addressed and may contain information that is confidential,
> > privileged and exempt from disclosure under applicable law. If the reader
> > of this message is not the intended recipient, you are hereby notified
> that
> > any printing, copying, dissemination, distribution, disclosure or
> > forwarding of this communication is strictly prohibited. If you have
> > received this c

[jira] [Updated] (PIG-3199) Provide a method to retriever name of loader/storer in PigServer

2013-09-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3199:


Summary: Provide a method to retriever name of loader/storer in PigServer  
(was: Expose LogicalPlan via PigServer API)

> Provide a method to retriever name of loader/storer in PigServer
> 
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199_2.patch, PIG-3199.patch
>
>
> LogicalPlan could be exposed to user in order for one to make validations 
> based on it. For eg, one could get Load/Store paths or other operators and be 
> able to perform checks such as whether I/O paths are valid etc.  



[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2672:


Attachment: PIG-2672.patch

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job taking 
> up lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  



[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache

2013-09-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2672:


Status: Patch Available  (was: Open)

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds 
> them to the DistributedCache for each job launched. This is inefficient in 
> terms of:
>* Space - The jars are distributed to the task trackers for every job, taking 
> up a lot of local temporary space on the task trackers.
>* Performance - The jar distribution increases the job launch time.



Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Prashant Kommireddi
Can we get PIG-3199 in? It only exposes a few properties of the LogicalPlan
(load/store paths and funcs) via a wrapper.


On Fri, Sep 20, 2013 at 3:03 PM, Daniel Dai  wrote:

> With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
> will probably commit PIG-3471. After that I will branch 0.12, hopefully
> over the weekend. Did I miss anything?
>
> Thanks,
> Daniel
>
>
> On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
> wrote:
>
> > +1, I need PIG-2417 too.
> >
> >
> > On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn 
> wrote:
> >
> > > I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I
> > > would
> > > like to get into 0.12 because we've had a number of people ask us about
> > > getting it committed back to Apache.  However, if it looks like too
> much
> > to
> > > review and get committed in the next week or two it could probably be
> > > pushed off.
> > >
> > > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
> > >  I'm going to double check the submitted patches today because I think
> > > https://issues.apache.org/jira/browse/PIG-3419 might have broken the
> > > currently submitted patches.
> > >
> > >
> > > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi <
> prash1...@gmail.com
> > > >wrote:
> > >
> > > > +1 for a 0.12 release.
> > > >
> > > > I have one outstanding JIRA
> > > https://issues.apache.org/jira/browse/PIG-3199
> > > > .
> > > > Cheolsoo was fine with the patch (except for a typo which I will
> > correct)
> > > > but wanted a second opinion. Can someone please take a look?
> > > >
> > > >
> > > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho <
> jar...@apache.org
> > > > >wrote:
> > > >
> > > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this
> > > week,
> > > > > to see if it can be included.
> > > > >
> > > > > Jarcec
> > > > >
> > > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
> > > > > > +1. I will go through my jiras this week.
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai <
> da...@hortonworks.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi, All,
> > > > > > > It has been more than half a year since initial Pig 0.11
> release.
> > > I'd
> > > > > like
> > > > > > > roll a Pig 0.12 release around the end of September or the
> > > beginning
> > > > of
> > > > > > > October. Let me know if it is possible.
> > > > > > >
> > > > > > > Proposed schedule:
> > > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > > 2. Branching Pig 0.12
> > > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > > 4. Wrapping up, document (1 week)
> > > > > > >
> > > > > > > If you have patches want to get in, please make sure the Jira
> > > ticket
> > > > > has
> > > > > > > fix version set to 0.12. If the patches originally set to 0.12
> > and
> > > > you
> > > > > > > think you can delay, please mark the fix version to either
> 0.13.0
> > > or
> > > > > > > 0.12.1.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Daniel
> > > > > > >
> > > > > > > --
> > > > > > > CONFIDENTIALITY NOTICE
> > > > > > > NOTICE: This message is intended for the use of the individual
> or
> > > > > entity to
> > > > > > > which it is addressed and may contain information that is
> > > > confidential,
> > > > > > > privileged and exempt from disclosure under applicable law. If
> > the
> > > > > reader
> > > > > > > of this message is not the intended recipient, you are hereby
> > > > notified
> > > > > that
> > > > > > > any printing, copying, dissemination, distribution, disclosure
> or
> > > > > > > forwarding of this communication is strictly prohibited. If you
> > > have
> > > > > > > received this communication in error, please contact the sender
> > > > > immediately
> > > > > > > and delete it from your system. Thank You.
> > > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Jeremy Karn / Lead Developer
> > > MORTAR DATA / 519 277 4391 / www.mortardata.com
> > >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > datasyndrome.com
> >
>
>


Re: Are we ready for Pig 0.12.0 release?

2013-09-20 Thread Daniel Dai
With regard to branching 0.12, I will try to commit PIG-2417 and Cheolsoo
will probably commit PIG-3471. After that I will branch 0.12, hopefully
over the weekend. Did I miss anything?

Thanks,
Daniel


On Tue, Sep 10, 2013 at 11:30 AM, Russell Jurney
wrote:

> +1, I need PIG-2417 too.
>
>
> On Wed, Sep 4, 2013 at 5:17 AM, Jeremy Karn  wrote:
>
> > I have one JIRA https://issues.apache.org/jira/browse/PIG-2417 that I
> > would
> > like to get into 0.12 because we've had a number of people ask us about
> > getting it committed back to Apache.  However, if it looks like too much
> to
> > review and get committed in the next week or two it could probably be
> > pushed off.
> >
> > I also have 3 small jiras (3426, 3430, 3431) I'd like to get into 0.12.
> >  I'm going to double check the submitted patches today because I think
> > https://issues.apache.org/jira/browse/PIG-3419 might have broken the
> > currently submitted patches.
> >
> >
> > > On Tue, Sep 3, 2013 at 2:36 PM, Prashant Kommireddi wrote:
> >
> > > +1 for a 0.12 release.
> > >
> > > I have one outstanding JIRA
> > https://issues.apache.org/jira/browse/PIG-3199
> > > .
> > > Cheolsoo was fine with the patch (except for a typo which I will
> correct)
> > > but wanted a second opinion. Can someone please take a look?
> > >
> > >
> > > > On Tue, Sep 3, 2013 at 11:08 AM, Jarek Jarcec Cecho wrote:
> > >
> > > > I'll try to clean up and finish PIG-3390 (HBase 0.95 support) this
> > week,
> > > > to see if it can be included.
> > > >
> > > > Jarcec
> > > >
> > > > On Tue, Sep 03, 2013 at 10:56:42AM -0700, Cheolsoo Park wrote:
> > > > > +1. I will go through my jiras this week.
> > > > >
> > > > >
> > > > > > On Tue, Sep 3, 2013 at 10:34 AM, Daniel Dai wrote:
> > > > >
> > > > > > Hi, All,
> > > > > > It has been more than half a year since initial Pig 0.11 release.
> > I'd
> > > > like
> > > > > > roll a Pig 0.12 release around the end of September or the
> > beginning
> > > of
> > > > > > October. Let me know if it is possible.
> > > > > >
> > > > > > Proposed schedule:
> > > > > > 1. Commit all major features (1-2 weeks)
> > > > > > 2. Branching Pig 0.12
> > > > > > 3. Commit remaining patches (1-2 weeks)
> > > > > > 4. Wrapping up, document (1 week)
> > > > > >
> > > > > > If you have patches want to get in, please make sure the Jira
> > ticket
> > > > has
> > > > > > fix version set to 0.12. If the patches originally set to 0.12
> and
> > > you
> > > > > > think you can delay, please mark the fix version to either 0.13.0
> > or
> > > > > > 0.12.1.
> > > > > >
> > > > > > Thanks,
> > > > > > Daniel
> > > > > >
> > > > > > --
> > > > > > CONFIDENTIALITY NOTICE
> > > > > > NOTICE: This message is intended for the use of the individual or
> > > > entity to
> > > > > > which it is addressed and may contain information that is
> > > confidential,
> > > > > > privileged and exempt from disclosure under applicable law. If
> the
> > > > reader
> > > > > > of this message is not the intended recipient, you are hereby
> > > notified
> > > > that
> > > > > > any printing, copying, dissemination, distribution, disclosure or
> > > > > > forwarding of this communication is strictly prohibited. If you
> > have
> > > > > > received this communication in error, please contact the sender
> > > > immediately
> > > > > > and delete it from your system. Thank You.
> > > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Jeremy Karn / Lead Developer
> > MORTAR DATA / 519 277 4391 / www.mortardata.com
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> datasyndrome.com
>



[jira] [Created] (PIG-3472) Pig should avoid replicated join if size is greater than configured limit

2013-09-20 Thread Aniket Mokashi (JIRA)
Aniket Mokashi created PIG-3472:
---

 Summary: Pig should avoid replicated join if size is greater than 
configured limit
 Key: PIG-3472
 URL: https://issues.apache.org/jira/browse/PIG-3472
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.11.1
Reporter: Aniket Mokashi
 Fix For: 0.12






[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.

2013-09-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773228#comment-13773228
 ] 

Daniel Dai commented on PIG-2417:
-

There is one more issue on Hadoop 2: job.jar is not unjarred before launching 
map/reduce, so controller.py cannot find the UDF script. It seems we need one 
more step to unjar the script files before invoking controller.py.

I'd like to commit this patch before we branch 0.12. There are still several 
holes to fix before streaming UDFs work under Hadoop 2. I would suggest 
committing the patch first, marking the e2e tests as not valid on Hadoop 2, 
and then fixing them after the branch. Thoughts?

> Streaming UDFs -  allow users to easily write UDFs in scripting languages 
> with no JVM implementation.
> -
>
> Key: PIG-2417
> URL: https://issues.apache.org/jira/browse/PIG-2417
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.12
>Reporter: Jeremy Karn
> Fix For: 0.12
>
> Attachments: PIG-2417-4.patch, PIG-2417-5.patch, PIG-2417-6.patch, 
> PIG-2417-7.patch, PIG-2417-8.patch, PIG-2417-9-1.patch, PIG-2417-9-2.patch, 
> PIG-2417-9.patch, PIG-2417-e2e.patch, streaming2.patch, streaming3.patch, 
> streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in 
> scripting languages with no JVM implementation or a limited JVM 
> implementation.  The initial proposal is outlined here: 
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF 
> from an embedded JVM UDF.  I'd propose something like the following (although 
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python') 
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to 
> the cluster which is responsible for reading the input stream, deserializing 
> the input data, passing it to the user written script, serializing that 
> script output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc.  This 
> class will likely share some of the existing code in POStream and 
> ExecutableManager (where it makes sense to pull out shared code) to stream 
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the 
> POStream operator directly.  This would involve inserting the POStream 
> operator instead of the POUserFunc operator whenever we encountered a 
> streaming UDF while building the physical plan.  This approach seemed 
> problematic because there would need to be a lot of changes in order to 
> support POStream in all of the places we want to be able to use UDFs (for 
> example, to operate on a single field inside of a foreach statement).
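
The controller loop described in the issue above (read the input stream, deserialize, call the user's function, serialize, write the output stream) can be sketched as follows. This is only a Java stand-in under stated assumptions: the real controller is a language-specific script such as controller.py, and the tab-delimited, line-per-record wire format, the class name StreamingControllerSketch, and the run method are all invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical sketch; the serialization format here is an assumption.
final class StreamingControllerSketch {
    private StreamingControllerSketch() {}

    /** Reads one record per line, applies the user function, writes one result per line. */
    static void run(BufferedReader in, PrintWriter out, Function<String[], String> udf) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Deserialize: split the record into tab-separated fields.
                String[] fields = line.split("\t", -1);
                // Call the user-written function and serialize its result.
                out.print(udf.apply(fields));
                out.print('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        BufferedReader input = new BufferedReader(new StringReader("a\tb\nc\td\n"));
        // The "UDF" here simply joins the fields and upper-cases them.
        run(input, new PrintWriter(sink), fields -> String.join("-", fields).toUpperCase());
        System.out.print(sink);
    }
}
```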



[jira] [Comment Edited] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773168#comment-13773168
 ] 

Cheolsoo Park edited comment on PIG-3471 at 9/20/13 4:51 PM:
-

[~daijy], I was thinking of committing it to trunk too. I will do it. Thank you!

  was (Author: cheolsoo):
[~daniel dai], yes I was thinking of committing to trunk too. I will do it. 
Thank you!
  
> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Commented] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773168#comment-13773168
 ] 

Cheolsoo Park commented on PIG-3471:


[~daniel dai], yes I was thinking of committing to trunk too. I will do it. 
Thank you!

> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Commented] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773151#comment-13773151
 ] 

Daniel Dai commented on PIG-3471:
-

Does it only go to the Tez branch? It sounds like a general restructuring that 
follows up on PIG-3419.

> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Updated] (PIG-3448) Tez backend layout

2013-09-20 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3448:
---

Attachment: PIG-3448-3.patch

Minor cleanups. Note that the latest patch depends on PIG-3471.

> Tez backend layout
> --
>
> Key: PIG-3448
> URL: https://issues.apache.org/jira/browse/PIG-3448
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3448-1.patch, PIG-3448-2.patch, PIG-3448-3.patch
>
>
> Design the high-level layout of Tez backend.



[jira] [Updated] (PIG-3367) Add assert keyword (operator) in pig

2013-09-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3367:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add assert keyword (operator) in pig
> 
>
> Key: PIG-3367
> URL: https://issues.apache.org/jira/browse/PIG-3367
> Project: Pig
>  Issue Type: New Feature
>  Components: parser
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-3367-2.patch, PIG-3367.patch
>
>
> The assert operator can be used for data validation. With assert, you can 
> write a script such as the following:
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if the assertion is violated.
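
To make the intended semantics concrete, here is a small Java model of the behaviour described above: every row is checked against the condition, and the whole run fails with the supplied message on the first violation. It is a sketch under assumptions, not Pig's implementation; the AssertSketch class and its assertAll method are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the assert semantics; Pig's operator is implemented differently.
final class AssertSketch {
    private AssertSketch() {}

    /** Returns the rows unchanged if all satisfy the condition, otherwise fails. */
    static <T> List<T> assertAll(List<T> rows, Predicate<T> condition, String message) {
        for (T row : rows) {
            if (!condition.test(row)) {
                throw new IllegalStateException("Assertion violated: " + message);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Mirrors "assert a by a0 > 0" on a column of ints.
        List<Integer> a0 = Arrays.asList(3, 1, 7);
        System.out.println(assertAll(a0, v -> v > 0, "a cant be negative for reasons"));
    }
}
```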



[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-09-20 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773118#comment-13773118
 ] 

Aniket Mokashi commented on PIG-3367:
-

Committed to trunk. Thanks Julien for the review!

> Add assert keyword (operator) in pig
> 
>
> Key: PIG-3367
> URL: https://issues.apache.org/jira/browse/PIG-3367
> Project: Pig
>  Issue Type: New Feature
>  Components: parser
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-3367-2.patch, PIG-3367.patch
>
>
> The assert operator can be used for data validation. With assert, you can 
> write a script such as the following:
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if the assertion is violated.



[jira] [Commented] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773148#comment-13773148
 ] 

Daniel Dai commented on PIG-3471:
-

+1

> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3471:
---

Attachment: PIG-3471-1.patch

The attached patch includes the following changes:
# Adds HExecutionEngine abstract class to o.a.p.backend.hadoop.executionengine.
# Moves MRExecutionEngine from o.a.p.backend.hadoop.executionengine to 
o.a.p.backend.hadoop.executionengine.mapReduceLayer.
# Converts some utility functions to public static functions in Utils.java.
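
A minimal Java sketch of the shape such a refactoring could take, assuming a template-method style base class. Only the names HExecutionEngine, MRExecutionEngine, and TezExecutionEngine come from this issue; the Sketch suffixes, the methods, and the fields below are invented for illustration and are not Pig's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: members are assumptions, not the contents of PIG-3471-1.patch.
abstract class HExecutionEngineSketch {
    // State shared by every Hadoop-based engine.
    protected final List<String> steps = new ArrayList<>();

    /** Common setup, e.g. connecting to HDFS and the resource manager. */
    protected void connectToHadoop() {
        steps.add("connect to hdfs and resource manager");
    }

    /** Engine-specific part: how a plan is turned into running work. */
    protected abstract String launch(String plan);

    /** Shared driver that subclasses inherit unchanged. */
    public final String execute(String plan) {
        connectToHadoop();
        String result = launch(plan);
        steps.add(result);
        return result;
    }
}

final class MRExecutionEngineSketch extends HExecutionEngineSketch {
    @Override
    protected String launch(String plan) {
        return "MR job for " + plan;
    }
}

final class TezExecutionEngineSketch extends HExecutionEngineSketch {
    @Override
    protected String launch(String plan) {
        return "Tez DAG for " + plan;
    }
}

final class HExecutionEngineDemo {
    public static void main(String[] args) {
        System.out.println(new MRExecutionEngineSketch().execute("plan-1"));
        System.out.println(new TezExecutionEngineSketch().execute("plan-1"));
    }
}
```

Keeping the shared driver final in the base class means each engine only supplies what differs, which is the code-sharing motivation given in the issue description.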

> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Updated] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3471:
---

Status: Patch Available  (was: Open)

> Add a base abstract class for ExecutionEngine
> -
>
> Key: PIG-3471
> URL: https://issues.apache.org/jira/browse/PIG-3471
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3471-1.patch
>
>
> While implementing TezExecutionEngine, I realized that a lot of code can be 
> shared between MRExecutionEngine and TezExecutionEngine because both use the 
> common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
> to create a base abstract class for them (called HExecutionEngine) and have 
> them inherit common methods and fields from it.



[jira] [Created] (PIG-3471) Add a base abstract class for ExecutionEngine

2013-09-20 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3471:
--

 Summary: Add a base abstract class for ExecutionEngine
 Key: PIG-3471
 URL: https://issues.apache.org/jira/browse/PIG-3471
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch


While implementing TezExecutionEngine, I realized that a lot of code can be 
shared between MRExecutionEngine and TezExecutionEngine because both use the 
common Hadoop framework (hdfs, resource manager, etc). So it would make sense 
to create a base abstract class for them (called HExecutionEngine) and have 
them inherit common methods and fields from it.
