[jira] Subscription: PIG patch available

2013-08-30 Thread jira
Issue Subscription
Filter: PIG patch available (16 issues)

Subscriber: pigdaily

Key       Summary
PIG-3441  Allow Pig to use default resources from Configuration objects
          https://issues.apache.org/jira/browse/PIG-3441
PIG-3434  Null subexpression in bincond nullifies outer tuple (or bag)
          https://issues.apache.org/jira/browse/PIG-3434
PIG-3431  Return more information for parsing related exceptions.
          https://issues.apache.org/jira/browse/PIG-3431
PIG-3430  Add xml format for explaining MapReduce Plan.
          https://issues.apache.org/jira/browse/PIG-3430
PIG-3426  Add support for removing s3 files
          https://issues.apache.org/jira/browse/PIG-3426
PIG-3346  New property that controls the number of combined splits
          https://issues.apache.org/jira/browse/PIG-3346
PIG-      Fix remaining Windows core unit test failures
          https://issues.apache.org/jira/browse/PIG-
PIG-3325  Adding a tuple to a bag is slow
          https://issues.apache.org/jira/browse/PIG-3325
PIG-3295  Casting from bytearray failing after Union (even when each field is from a single Loader)
          https://issues.apache.org/jira/browse/PIG-3295
PIG-3292  Logical plan invalid state: duplicate uid in schema during self-join to get cross product
          https://issues.apache.org/jira/browse/PIG-3292
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3255  Avoid extra byte array copy in streaming deserialize
          https://issues.apache.org/jira/browse/PIG-3255
PIG-3199  Expose LogicalPlan via PigServer API
          https://issues.apache.org/jira/browse/PIG-3199
PIG-3117  A debug mode in which pig does not delete temporary files
          https://issues.apache.org/jira/browse/PIG-3117
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3021  Split results missing records when there is null values in the column comparison
          https://issues.apache.org/jira/browse/PIG-3021

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3349) Document ToString(Datetime, String) UDF

2013-08-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3349:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you Daniel for the review!

> Document ToString(Datetime, String) UDF
> ---
>
> Key: PIG-3349
> URL: https://issues.apache.org/jira/browse/PIG-3349
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.11.1
> Reporter: pat chan
> Assignee: Cheolsoo Park
> Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3349.patch
>
>
> Currently you can't cast a datetime object into a chararray:
> grunt> B = foreach A generate (chararray)a; dump B;
> 2013-06-05 15:29:01,372 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052: Cannot cast datetime to chararray
> Details at logfile: /Users/patc/projects/pig-0.11.1/pig_1370471270879.log
> Was this an oversight? The documented casting matrix does not show the 
> datetime object so I'm not sure if the current behavior is correct or not.
> My recommendation would be to support casting to and from strings. Casting 
> from a string would behave exactly like loading a datetime. Casting to a 
> string would be exactly the format you get when you dump a datetime.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3374) CASE and IN fail when expression includes dereferencing operator

2013-08-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3374:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Got +1 from Daniel in RB. Committed to trunk.

Thank you Daniel for the review!

> CASE and IN fail when expression includes dereferencing operator
> 
>
> Key: PIG-3374
> URL: https://issues.apache.org/jira/browse/PIG-3374
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3374-2.patch, PIG-3374-3.patch, PIG-3374-4.patch, 
> PIG-3374.patch
>
>
> This is another bug that I discovered after deploying CASE/IN expressions 
> internally.
> The current implementation of CASE/IN expression assumes that the 1st operand 
> is a single expression. But this is not true, for example, if it contains a 
> dereferencing operator. The following example demonstrates the problem:
> {code}
> A = LOAD 'foo' AS (k1:chararray, k2:chararray, v:int);
> B = GROUP A BY (k1, k2);
> C = FILTER B BY group.k1 IN ('a', 'b');
> DUMP C;
> {code}
> This fails with the following error:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at org.apache.pig.parser.LogicalPlanGenerator.in_eval(LogicalPlanGenerator.java:8624)
> at org.apache.pig.parser.LogicalPlanGenerator.cond(LogicalPlanGenerator.java:8405)
> at org.apache.pig.parser.LogicalPlanGenerator.filter_clause(LogicalPlanGenerator.java:7564)
> at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1403)
> at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:821)
> at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:539)
> at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:414)
> at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
> {code}
> Here is the relevant code that causes trouble:
> {code:title=QueryParser.g}
> if(tree.getType() == IN) {
>   Tree lhs = tree.getChild(0); // lhs is not a single node!
>   for(int i = 2; i < tree.getChildCount(); i = i + 2) {
>     tree.insertChild(i, deepCopy(lhs));
>   }
> }
> {code}
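
To illustrate the failure mode described above, here is a minimal Java sketch that replays the quoted loop on a plain list of tokens. It is not Pig's parser code: the class name InRewriteSketch, the expand helper, and the idea that a dereference such as group.k1 appears as two sibling tokens are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InRewriteSketch {
    // Replays the loop from the quoted snippet on a flat token list.
    // Assumption: children are laid out as [lhs, rhs1, rhs2, ...].
    static List<String> expand(List<String> children) {
        List<String> out = new ArrayList<>(children);
        String lhs = out.get(0); // only the first child is treated as the lhs
        for (int i = 2; i < out.size(); i += 2) {
            out.add(i, lhs);     // insert a copy of the lhs before each later rhs
        }
        return out;
    }

    public static void main(String[] args) {
        // Single-token lhs: copies pair up with the rhs values.
        System.out.println(expand(Arrays.asList("x", "'a'", "'b'")));
        // prints [x, 'a', x, 'b']

        // Hypothetical two-token lhs (standing in for group.k1): pairs are misaligned.
        System.out.println(expand(Arrays.asList("group", "k1", "'a'", "'b'")));
        // prints [group, k1, group, 'a', group, 'b']
    }
}
```

With a single-token left-hand side, each copy lands next to the right-hand value it should be compared with. With the hypothetical two-token left-hand side, the fixed stride of two no longer matches the layout, so the operands stop pairing up, which is consistent with the kind of index error shown in the trace.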



Re: Parquet support built in Pig

2013-08-30 Thread Daniel Dai
Is that a copy of the github code? What about future development: will it
happen on github and then be merged into Apache?


On Thu, Aug 29, 2013 at 4:47 PM, Russell Jurney wrote:

> I think this is awesome. Best thing since diet sliced bread (they cut the
> slices thin).
>
>
> On Thu, Aug 29, 2013 at 4:36 PM, Julien Le Dem wrote:
>
> > Hello fellow Pig developers
> > I have opened a JIRA to add Parquet as a built-in format in Pig:
> > https://issues.apache.org/jira/browse/PIG-3445
> > Please let me know what you think.
> > Julien
>
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> datasyndrome.com
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Parquet support built in Pig

2013-08-30 Thread Julien Le Dem
The parquet code would stay on github. This would be a packaging integration.

Julien

On Aug 30, 2013, at 13:42, Daniel Dai wrote:

> Is that a copy of the github code? What about future development: will it
> happen on github and then be merged into Apache?
> 
> 
> On Thu, Aug 29, 2013 at 4:47 PM, Russell Jurney wrote:
> 
>> I think this is awesome. Best thing since diet sliced bread (they cut the
>> slices thin).
>> 
>> 
>> On Thu, Aug 29, 2013 at 4:36 PM, Julien Le Dem wrote:
>> 
>>> Hello fellow Pig developers
>>> I have opened a JIRA to add Parquet as a built-in format in Pig:
>>> https://issues.apache.org/jira/browse/PIG-3445
>>> Please let me know what you think.
>>> Julien
>> 
>> 
>> 
>> 
>> --
>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>> datasyndrome.com


[jira] [Commented] (PIG-3293) Casting fails after Union from two data sources&loaders

2013-08-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755129#comment-13755129
 ] 

Daniel Dai commented on PIG-3293:
-

Also, improving the error message to indicate possible causes would help.

> Casting fails after Union from two data sources&loaders
> ---
>
> Key: PIG-3293
> URL: https://issues.apache.org/jira/browse/PIG-3293
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Priority: Minor
> Attachments: pig-3293-test-only-v01.patch
>
>
> Script similar to 
> {noformat}
> A = load 'data1' using MyLoader() as (a:bytearray);
> B = load 'data2' as (a:bytearray);
> C = union onschema A,B;
> D = foreach C generate (chararray)a;
> Store D into './out';
> {noformat}
> fails with 
> java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
> Both MyLoader and PigStorage use the default Utf8StorageConverter.



Re: Review Request 13210: PIG-3374 CASE and IN fail when expression includes dereferencing operator

2013-08-30 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13210/#review25805
---

Ship it!


- Daniel Dai


On Aug. 2, 2013, 1:50 a.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13210/
> ---
> 
> (Updated Aug. 2, 2013, 1:50 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3374
> https://issues.apache.org/jira/browse/PIG-3374
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> CASE/IN fail when the lhs expression contains a dereference operator. Please 
> see PIG-3374 for details:
> https://issues.apache.org/jira/browse/PIG-3374
> 
> 
> Diffs
> -
> 
>   src/org/apache/pig/parser/AliasMasker.g 98d94f7 
>   src/org/apache/pig/parser/AstPrinter.g d87 
>   src/org/apache/pig/parser/AstValidator.g d0ed0e8 
>   src/org/apache/pig/parser/LogicalPlanGenerator.g cc1f47e 
>   src/org/apache/pig/parser/QueryParser.g d4d9700 
>   test/org/apache/pig/test/TestCase.java 5d8f7f3 
>   test/org/apache/pig/test/TestIn.java c3a55de 
> 
> Diff: https://reviews.apache.org/r/13210/diff/
> 
> 
> Testing
> ---
> 
> Added new test cases. All the unit tests pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>



[jira] [Commented] (PIG-3293) Casting fails after Union from two data sources&loaders

2013-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755073#comment-13755073
 ] 

Olga Natkovich commented on PIG-3293:
-

Would it help to document that typecasting needs to happen before any Union 
operation?



[jira] [Updated] (PIG-3419) Pluggable Execution Engine

2013-08-30 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3419:
---

Resolution: Fixed
Fix Version/s: 0.12
Status: Resolved  (was: Patch Available)

Committed to trunk:
http://svn.apache.org/viewvc?view=revision&revision=1519062

Thank you Achal!

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.12
> Reporter: Achal Soni
> Assignee: Achal Soni
> Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the frontend and backend, to make the changes in Pig to delegate 
> to the EE when necessary, and to create an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly how MapReduce does for choosing the 
> framework to use between local and distributed mode). Also I tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in its current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 
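
The ServiceLoader-based selection described above can be sketched roughly as follows. This is a hedged illustration, not the code in the patch: EngineLookupSketch, EngineProvider, DefaultProvider, accepts, and select are hypothetical names, and the fallback behaviour is an assumption. Only java.util.ServiceLoader itself is a real JDK API.

```java
import java.util.ServiceLoader;

public class EngineLookupSketch {
    /** Hypothetical provider interface; not Pig's actual ExecutionEngine API. */
    public interface EngineProvider {
        boolean accepts(String execType);
        String name();
    }

    /** Hypothetical fallback used when no provider on the classpath matches. */
    static final class DefaultProvider implements EngineProvider {
        public boolean accepts(String execType) { return true; }
        public String name() { return "default"; }
    }

    /** Cycle through the providers found on the classpath and pick the first match. */
    static EngineProvider select(String execType) {
        // ServiceLoader discovers implementations listed under META-INF/services.
        for (EngineProvider p : ServiceLoader.load(EngineProvider.class)) {
            if (p.accepts(execType)) {
                return p;
            }
        }
        return new DefaultProvider();
    }

    public static void main(String[] args) {
        // Prints "default" when no provider is registered on the classpath.
        System.out.println(select("mapreduce").name());
    }
}
```

In this sketch, a new engine would be added by shipping a jar with an implementation class and a matching META-INF/services entry, without changing the lookup code.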



[jira] [Commented] (PIG-3293) Casting fails after Union from two data sources&loaders

2013-08-30 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755016#comment-13755016
 ] 

Koji Noguchi commented on PIG-3293:
---

I hit a worse case today.

(1) The case I mentioned originally was a union between loaderA and loaderB, 
both of which return the same loadCaster, Utf8StorageConverter. The typecast 
fails after the union.

(2) The one I saw today: a single loader, but with different arguments, 
resulting in a typecast error.
{noformat}
A = load 'data1' using LoaderA('col1') as (a:bytearray);
B = load 'data1' using LoaderA('col2') as (a:bytearray);
C = union ...; D = foreach C generate (chararray)a; store D ...
{noformat}


I wish I could simply check the class name of the loaders to decide whether 
the loadCaster is unique.
But then I saw HBaseStorage returning a different loadCaster depending on its 
input parameters.

Another approach I'm considering: is it possible to push the typecast above 
the union so that we can perform loader.getLoadCaster().bytesToCharArray for 
each input to the union?
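
The idea of pushing the typecast above the union can be modelled in a few lines of Java. This is only a sketch under stated assumptions: CastBeforeUnionSketch, Input, and castThenUnion are hypothetical names, a java.util.function.Function stands in for a loader's caster, and nothing here is Pig's planner code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class CastBeforeUnionSketch {
    /** One union input: raw byte fields plus the caster its loader supplies (hypothetical model). */
    static final class Input {
        final List<byte[]> rows;
        final Function<byte[], String> caster;

        Input(List<byte[]> rows, Function<byte[], String> caster) {
            this.rows = rows;
            this.caster = caster;
        }
    }

    /** Cast each input with its own caster first, then concatenate the typed rows. */
    static List<String> castThenUnion(List<Input> inputs) {
        List<String> out = new ArrayList<>();
        for (Input in : inputs) {
            for (byte[] row : in.rows) {
                out.add(in.caster.apply(row)); // the caster is still known at this point
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two stand-in casters that interpret the same kind of bytes differently.
        Function<byte[], String> utf8 = b -> new String(b, StandardCharsets.UTF_8);
        Function<byte[], String> upper =
            b -> new String(b, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);

        Input a = new Input(Collections.singletonList("x".getBytes(StandardCharsets.UTF_8)), utf8);
        Input b = new Input(Collections.singletonList("y".getBytes(StandardCharsets.UTF_8)), upper);

        System.out.println(castThenUnion(Arrays.asList(a, b))); // prints [x, Y]
    }
}
```

Because every row is converted while its originating input is still known, the merged result never has to guess which caster applies, which is the ambiguity behind ERROR 1075.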



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754975#comment-13754975
 ] 

Julien Le Dem commented on PIG-3419:


+1
[~cheolsoo] LGTM!
