[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (16 issues)
Subscriber: pigdaily

Key         Summary
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3434    Null subexpression in bincond nullifies outer tuple (or bag)
            https://issues.apache.org/jira/browse/PIG-3434
PIG-3431    Return more information for parsing related exceptions.
            https://issues.apache.org/jira/browse/PIG-3431
PIG-3430    Add xml format for explaining MapReduce Plan.
            https://issues.apache.org/jira/browse/PIG-3430
PIG-3426    Add support for removing s3 files
            https://issues.apache.org/jira/browse/PIG-3426
PIG-3346    New property that controls the number of combined splits
            https://issues.apache.org/jira/browse/PIG-3346
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3255    Avoid extra byte array copy in streaming deserialize
            https://issues.apache.org/jira/browse/PIG-3255
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3117    A debug mode in which pig does not delete temporary files
            https://issues.apache.org/jira/browse/PIG-3117
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Updated] (PIG-3349) Document ToString(Datetime, String) UDF
[ https://issues.apache.org/jira/browse/PIG-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3349:
---
    Resolution: Fixed
        Status: Resolved (was: Patch Available)

Committed to trunk. Thank you Daniel for the review!

> Document ToString(Datetime, String) UDF
> ---
>
> Key: PIG-3349
> URL: https://issues.apache.org/jira/browse/PIG-3349
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.11.1
> Reporter: pat chan
> Assignee: Cheolsoo Park
> Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3349.patch
>
>
> Currently you can't cast a datetime object into a chararray:
> grunt> B = foreach A generate (chararray)a; dump B;
> 2013-06-05 15:29:01,372 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052:
> Cannot cast datetime to chararray
> Details at logfile: /Users/patc/projects/pig-0.11.1/pig_1370471270879.log
> Was this an oversight? The documented casting matrix does not show the
> datetime object so I'm not sure if the current behavior is correct or not.
> My recommendation would be to support casting to and from strings. Casting
> from a string would behave exactly like loading a datetime. Casting to a
> string would be exactly the format you get when you dump a datetime.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3374) CASE and IN fail when expression includes dereferencing operator
[ https://issues.apache.org/jira/browse/PIG-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3374:
---
    Resolution: Fixed
        Status: Resolved (was: Patch Available)

Got +1 from Daniel in RB. Committed to trunk. Thank you Daniel for the review!

> CASE and IN fail when expression includes dereferencing operator
> ---
>
> Key: PIG-3374
> URL: https://issues.apache.org/jira/browse/PIG-3374
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3374-2.patch, PIG-3374-3.patch, PIG-3374-4.patch, PIG-3374.patch
>
>
> This is another bug that I discovered after deploying CASE/IN expressions
> internally.
> The current implementation of CASE/IN expression assumes that the 1st operand
> is a single expression. But this is not true, for example, if it contains a
> dereferencing operator. The following example demonstrates the problem:
> {code}
> A = LOAD 'foo' AS (k1:chararray, k2:chararray, v:int);
> B = GROUP A BY (k1, k2);
> C = FILTER B BY group.k1 IN ('a', 'b');
> DUMP C;
> {code}
> This fails with the following error:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.parser.LogicalPlanGenerator.in_eval(LogicalPlanGenerator.java:8624)
>     at org.apache.pig.parser.LogicalPlanGenerator.cond(LogicalPlanGenerator.java:8405)
>     at org.apache.pig.parser.LogicalPlanGenerator.filter_clause(LogicalPlanGenerator.java:7564)
>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1403)
>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:821)
>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:539)
>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:414)
>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
> {code}
> Here is the relevant code that causes trouble:
> {code:title=QueryParser.g}
> if(tree.getType() == IN) {
>     Tree lhs = tree.getChild(0); // lhs is not a single node!
>     for(int i = 2; i < tree.getChildCount(); i = i + 2) {
>         tree.insertChild(i, deepCopy(lhs));
>     }
> }
> {code}
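The loop quoted in the issue can be replayed on a flat list of child labels to see why a multi-node left-hand side is a problem. The following is only a sketch: the class name `InExpansionSketch`, the `expand` helper, and the string labels (`group`, `.k1`) are assumptions made for illustration, and the real ANTLR tree for `group.k1 IN ('a', 'b')` may be shaped differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models the children of an IN node as plain strings.
final class InExpansionSketch {

    // Same stepping as the quoted loop: copy child 0 in front of every
    // right-hand operand after the first, so that operands form pairs.
    static List<String> expand(List<String> children) {
        List<String> out = new ArrayList<>(children);
        String lhs = out.get(0);
        for (int i = 2; i < out.size(); i += 2) {
            out.add(i, lhs);
        }
        return out;
    }

    public static void main(String[] args) {
        // Left-hand side is one node: the result pairs up as (k1, 'a'), (k1, 'b').
        System.out.println(expand(Arrays.asList("k1", "'a'", "'b'")));
        // Hypothetical two-node left-hand side: the pairs no longer line up.
        System.out.println(expand(Arrays.asList("group", ".k1", "'a'", "'b'")));
    }
}
```

With a single-node left-hand side the list becomes `[k1, 'a', k1, 'b']`. With the assumed two-node form it becomes `[group, .k1, group, 'a', group, 'b']`, where only `group` is copied and the first "pair" is the left-hand side itself, which is the kind of misalignment the comment `lhs is not a single node!` points at.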
Re: Parquet support built in Pig
Is that a copy of the GitHub code? What about future development: will it happen on GitHub and then be merged into Apache?

On Thu, Aug 29, 2013 at 4:47 PM, Russell Jurney wrote:

> I think this is awesome. Best thing since diet sliced bread (they cut the
> slices thin).
>
>
> On Thu, Aug 29, 2013 at 4:36 PM, Julien Le Dem wrote:
>
> > Hello fellow Pig developers
> > I have opened a JIRA to add Parquet as a built-in format in Pig:
> > https://issues.apache.org/jira/browse/PIG-3445
> > Please let me know what you think.
> > Julien
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> datasyndrome.com
Re: Parquet support built in Pig
The Parquet code would stay on GitHub. This would be a packaging integration.
Julien

On Aug 30, 2013, at 13:42, Daniel Dai wrote:

> Is that a copy of the github code? How about the future development, will
> it be on github and then merge into Apache?
>
>
> On Thu, Aug 29, 2013 at 4:47 PM, Russell Jurney wrote:
>
>> I think this is awesome. Best thing since diet sliced bread (they cut the
>> slices thin).
>>
>>
>> On Thu, Aug 29, 2013 at 4:36 PM, Julien Le Dem wrote:
>>
>>> Hello fellow Pig developers
>>> I have opened a JIRA to add Parquet as a built-in format in Pig:
>>> https://issues.apache.org/jira/browse/PIG-3445
>>> Please let me know what you think.
>>> Julien
>>
>>
>> --
>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>> datasyndrome.com
[jira] [Commented] (PIG-3293) Casting fails after Union from two data sources&loaders
[ https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755129#comment-13755129 ]

Daniel Dai commented on PIG-3293:
---

Improving the error message to indicate possible causes would also help.

> Casting fails after Union from two data sources&loaders
> ---
>
> Key: PIG-3293
> URL: https://issues.apache.org/jira/browse/PIG-3293
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Priority: Minor
> Attachments: pig-3293-test-only-v01.patch
>
>
> Script similar to
> {noformat}
> A = load 'data1' using MyLoader() as (a:bytearray);
> B = load 'data2' as (a:bytearray);
> C = union onschema A,B;
> D = foreach C generate (chararray)a;
> Store D into './out';
> {noformat}
> fails with
> java.lang.Exception: org.apache.pig.backend.executionengine.ExecException:
> ERROR 1075: Received a bytearray from the UDF. Cannot determine how to
> convert the bytearray to string.
> Both MyLoader and PigStorage use the default Utf8StorageConverter.
Re: Review Request 13210: PIG-3374 CASE and IN fail when expression includes dereferencing operator
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13210/#review25805
---

Ship it!

- Daniel Dai


On Aug. 2, 2013, 1:50 a.m., Cheolsoo Park wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13210/
> ---
>
> (Updated Aug. 2, 2013, 1:50 a.m.)
>
>
> Review request for pig.
>
>
> Bugs: PIG-3374
> https://issues.apache.org/jira/browse/PIG-3374
>
>
> Repository: pig-git
>
>
> Description
> ---
>
> CASE/IN fail when the lhs expression contains a dereference operator. Please
> see PIG-3374 for details:
> https://issues.apache.org/jira/browse/PIG-3374
>
>
> Diffs
> ---
>
> src/org/apache/pig/parser/AliasMasker.g 98d94f7
> src/org/apache/pig/parser/AstPrinter.g d87
> src/org/apache/pig/parser/AstValidator.g d0ed0e8
> src/org/apache/pig/parser/LogicalPlanGenerator.g cc1f47e
> src/org/apache/pig/parser/QueryParser.g d4d9700
> test/org/apache/pig/test/TestCase.java 5d8f7f3
> test/org/apache/pig/test/TestIn.java c3a55de
>
> Diff: https://reviews.apache.org/r/13210/diff/
>
>
> Testing
> ---
>
> Added new test cases. All the unit tests pass.
>
>
> Thanks,
>
> Cheolsoo Park
[jira] [Commented] (PIG-3293) Casting fails after Union from two data sources&loaders
[ https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755073#comment-13755073 ]

Olga Natkovich commented on PIG-3293:
---

Would it help to document that typecasting needs to happen before any Union operation?

> Casting fails after Union from two data sources&loaders
> ---
>
> Key: PIG-3293
> URL: https://issues.apache.org/jira/browse/PIG-3293
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Priority: Minor
> Attachments: pig-3293-test-only-v01.patch
[jira] [Updated] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3419:
---
    Resolution: Fixed
    Fix Version/s: 0.12
    Status: Resolved (was: Patch Available)

Committed to trunk:
http://svn.apache.org/viewvc?view=revision&revision=1519062

Thank you Achal!

> Pluggable Execution Engine
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.12
> Reporter: Achal Soni
> Assignee: Achal Soni
> Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch,
> stats_scriptstate.patch, test_failures.txt, test_suite.patch,
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch,
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch,
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for
> a cleaner ExecutionEngine abstraction than existed before. The changes are
> not that major as Pig was already relatively abstracted out between the
> frontend and backend. The changes in the attached commit are essentially the
> barebones changes -- I tried to not change the structure of Pig's different
> components too much. I think it will be interesting to see in the future how
> we can refactor more areas of Pig to really honor this abstraction between
> the frontend and backend.
> Some of the changes were to reinstate an ExecutionEngine interface to tie
> together the frontend and backend, to make Pig delegate to the EE when
> necessary, and to create an MRExecutionEngine that implements this
> interface. Other work included changing ExecType to cycle through the
> ExecutionEngines on the classpath and select the appropriate one (this is
> done using Java ServiceLoader, exactly how MapReduce does for choosing the
> framework to use between local and distributed mode). Also I tried to make
> ScriptState, JobStats, and PigStats as abstract as possible in their current
> state. I think in the future some work will need to be done here to perhaps
> re-evaluate the usage of ScriptState and the responsibilities of the
> different statistics classes. I haven't touched the PPNL, but I think more
> abstraction is needed here, perhaps in a separate patch.
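The description mentions cycling through the engines on the classpath with Java's ServiceLoader and picking the appropriate one. A minimal sketch of that selection pattern follows. The `EngineProvider` interface, its `accepts`/`name` methods, and the `EngineSelector` class are hypothetical names invented here; they are not the interfaces added by the patch.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical provider contract for this sketch; not Pig's ExecutionEngine API.
interface EngineProvider {
    boolean accepts(String execType);
    String name();
}

final class EngineSelector {

    // Returns the first candidate that claims the requested exec type, if any.
    static Optional<EngineProvider> select(String execType, Iterable<EngineProvider> candidates) {
        for (EngineProvider candidate : candidates) {
            if (candidate.accepts(execType)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // ServiceLoader discovers implementations registered in
    // META-INF/services/<fully qualified interface name> on the classpath.
    static Optional<EngineProvider> selectFromClasspath(String execType) {
        return select(execType, ServiceLoader.load(EngineProvider.class));
    }

    public static void main(String[] args) {
        System.out.println(selectFromClasspath("mapreduce")
                .map(EngineProvider::name)
                .orElse("no provider registered"));
    }
}
```

Keeping `select` separate from the classpath lookup means the matching rule can be exercised with an in-memory list, while `selectFromClasspath` only adds discovery. A new engine would then be added by shipping a jar with a provider registration file rather than by editing the selection code.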
[jira] [Commented] (PIG-3293) Casting fails after Union from two data sources&loaders
[ https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755016#comment-13755016 ]

Koji Noguchi commented on PIG-3293:
---

I hit a worse case today.

(1) The case I mentioned originally was a union between loaderA and loaderB in which both return the same loadCaster, Utf8StorageConverter, with the typecast failing after the union.

The one I saw today:

(2) A single loader, but with different arguments, resulting in a typecast error.
{noformat}
A = load 'data1' using LoaderA('col1') as (a:bytearray);
B = load 'data1' using LoaderA('col2') as (a:bytearray);
C = union ...;
D = foreach C generate (chararray)a;
store D ...
{noformat}
I wish I could simply check the class name of the loaders to decide whether the loadCaster is unique. But then I saw HBaseStorage returning a different loadCaster depending on its input parameters.

One other approach I'm considering: is it possible to push the typecast above the union so that we can perform loader.getLoadCaster().bytesToCharArray for each input to the union?

> Casting fails after Union from two data sources&loaders
> ---
>
> Key: PIG-3293
> URL: https://issues.apache.org/jira/browse/PIG-3293
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Priority: Minor
> Attachments: pig-3293-test-only-v01.patch
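The idea of pushing the typecast above the union can be pictured as casting each input with its own caster before the rows are merged, so no caster has to be guessed afterwards. The sketch below is an illustration under assumptions: `CastBeforeUnionSketch`, `Input`, and the `Function<byte[], String>` caster are stand-ins invented here, not Pig's LoadCaster interface or its planner.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class CastBeforeUnionSketch {

    // One input to the union: raw field bytes plus the caster supplied by
    // the loader that produced them (hypothetical type for this sketch).
    static final class Input {
        final List<byte[]> fields;
        final Function<byte[], String> caster;

        Input(List<byte[]> fields, Function<byte[], String> caster) {
            this.fields = fields;
            this.caster = caster;
        }
    }

    // Casts every field with the caster of the input it came from,
    // then concatenates the already-typed values.
    static List<String> castThenUnion(List<Input> inputs) {
        List<String> out = new ArrayList<>();
        for (Input input : inputs) {
            for (byte[] field : input.fields) {
                out.add(input.caster.apply(field));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Input utf8 = new Input(
                Arrays.asList("abc".getBytes(StandardCharsets.UTF_8)),
                bytes -> new String(bytes, StandardCharsets.UTF_8));
        Input latin1 = new Input(
                Arrays.asList(new byte[] {(byte) 0xE9}),
                bytes -> new String(bytes, StandardCharsets.ISO_8859_1));
        System.out.println(castThenUnion(Arrays.asList(utf8, latin1)));
    }
}
```

Because each value is converted while its originating input is still known, two inputs with different casters (here UTF-8 and ISO-8859-1 decoders) can be merged without ambiguity. Whether Pig's optimizer can perform such a rewrite is exactly the open question in the comment.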
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754975#comment-13754975 ]

Julien Le Dem commented on PIG-3419:
---

+1 [~cheolsoo] LGTM!

> Pluggable Execution Engine
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.12
> Reporter: Achal Soni
> Assignee: Achal Soni
> Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch,
> stats_scriptstate.patch, test_failures.txt, test_suite.patch,
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch,
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch,
> updated-8-29-2013-exec-engine.patch