[jira] [Updated] (PIG-3317) disable optimizations via pig properties
[ https://issues.apache.org/jira/browse/PIG-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-3317: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed, thanks Travis! > disable optimizations via pig properties > > > Key: PIG-3317 > URL: https://issues.apache.org/jira/browse/PIG-3317 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.12 >Reporter: Travis Crawford >Assignee: Travis Crawford > Attachments: PIG-3317_disable_opts.1.patch, > PIG-3317_disable_opts.2.patch, PIG-3317_disable_opts.3.patch, > PIG-3317_disable_opts.4.patch > > > Pig provides a number of optimizations which are described at > [http://pig.apache.org/docs/r0.11.1/perf.html#optimization-rules]. As is > described in the docs, all or specific optimizations can be disabled via the > command-line. > Currently the caller of a pig script must know which optimizations to disable > when running because that information cannot be set in the script itself. Nor > can optimizations be disabled site-wide through pig.properties. > Pig should allow disabling optimizations via properties so that pig scripts > themselves can disable optimizations as needed, rather than the caller > needing to know what optimizations to disable on the command-line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
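As a rough illustration of the request above, the sketch below reads a comma-separated list of disabled rule names from a properties object, so the same setting could come from pig.properties or be set by a script. The property key `pig.optimizer.rules.disabled`, the class `DisabledRules`, and the treatment of `all` as a wildcard are assumptions made for this example; they are not taken from the attached patches.

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class DisabledRules {
    // Hypothetical property key, used here only for illustration.
    static final String KEY = "pig.optimizer.rules.disabled";

    /** Parses a comma-separated rule list, ignoring blanks and surrounding whitespace. */
    static Set<String> parse(Properties props) {
        String raw = props.getProperty(KEY, "");
        Set<String> rules = new TreeSet<>();
        for (String name : raw.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                rules.add(trimmed);
            }
        }
        return Collections.unmodifiableSet(rules);
    }

    /** A rule counts as disabled if it is listed by name or the list contains "all". */
    static boolean isDisabled(Set<String> disabled, String rule) {
        return disabled.contains("all") || disabled.contains(rule);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "SplitFilter, MergeFilter");
        Set<String> disabled = parse(props);
        System.out.println(isDisabled(disabled, "SplitFilter"));
        System.out.println(isDisabled(disabled, "PushUpFilter"));
    }
}
```

Keeping the list in a single property means a site-wide default and a per-script override can share one parsing path; how the two are merged is left open here.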
[jira] [Updated] (PIG-2378) macros don't accept references to items within tuples as arguments
[ https://issues.apache.org/jira/browse/PIG-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-2378:
------------------------------
    Attachment: PIG-2378.patch.txt

The latest patch addresses a minor issue (comments in the code).

> macros don't accept references to items within tuples as arguments
> ------------------------------------------------------------------
>
>                 Key: PIG-2378
>                 URL: https://issues.apache.org/jira/browse/PIG-2378
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.1
>            Reporter: Joseph Adler
>            Assignee: Johnny Zhang
>         Attachments: PIG-2378.patch.txt, PIG-2378.patch.txt
>
> I'd like to be able to pass a reference to an item within a parameter to a
> Pig Macro.
> For example, suppose that I had a relation A with the schema A:{id:long,
> header:(time:long, type:chararray)}. I'd like to call a macro by typing:
>    B = MY_MACRO(A, header.time);
> but this does not currently work. Obviously, I could define a new relation as
> a workaround, for example I could use some pig code like
>    AA = FOREACH A GENERATE *, header.time as time;
>    B = MY_MACRO(AA, time);
> But that's ugly and clunky
[jira] [Updated] (PIG-2378) macros don't accept references to items within tuples as arguments
[ https://issues.apache.org/jira/browse/PIG-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-2378:
------------------------------
    Status: Patch Available  (was: Open)

> macros don't accept references to items within tuples as arguments
> ------------------------------------------------------------------
>
>                 Key: PIG-2378
>                 URL: https://issues.apache.org/jira/browse/PIG-2378
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.1
>            Reporter: Joseph Adler
>            Assignee: Johnny Zhang
>         Attachments: PIG-2378.patch.txt
>
> I'd like to be able to pass a reference to an item within a parameter to a
> Pig Macro.
> For example, suppose that I had a relation A with the schema A:{id:long,
> header:(time:long, type:chararray)}. I'd like to call a macro by typing:
>    B = MY_MACRO(A, header.time);
> but this does not currently work. Obviously, I could define a new relation as
> a workaround, for example I could use some pig code like
>    AA = FOREACH A GENERATE *, header.time as time;
>    B = MY_MACRO(AA, time);
> But that's ugly and clunky
[jira] [Created] (PIG-3329) RANK operator failed when working with SPLIT
Redis Liu created PIG-3329:
------------------------------

             Summary: RANK operator failed when working with SPLIT
                 Key: PIG-3329
                 URL: https://issues.apache.org/jira/browse/PIG-3329
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.11.1
            Reporter: Redis Liu
            Priority: Critical

input.txt:
1 2 3
4 5 6
7 8 9

script:
a = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
SPLIT a into b if a > 0, c if a > 5;
d = RANK b;
dump d;

job will fail with error message:
java.lang.RuntimeException: Unable to read counter pig.counters.counter_4929375455335572575_-1
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:161)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNext(PORank.java:134)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:275)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1340)
    at org.apache.hadoop.mapred.Child.main(Child.java:269)
[jira] [Reopened] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat reopened PIG-3322:
-----------------------------

Hi Egil,

The issue here is that the field "t" in the original data set "studentcomplextab10k" contains nulls.

(fred hernandez,73,1.87)
(fred hernandez,20,2.11)
(calvin allen,60,2.49)
(yuri zipper,76,2.05)

So when this is stored via AvroStorage, nulls are stored for the record. When you read back the Avro file written by the previous store, it fails with a null pointer exception.

The following snippet, which filters out the nulls before storing, works without any problems.

{code}
a = load 'studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
b = foreach a generate t;
c = filter b by t is not null;
store c into 'singltupleavronotnull' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
exec;
b = load 'singltupleavronotnull' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
describe b;
dump b;
{code}

Kindly note: This issue is different from PIG-2330

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -------------------------------------------------------------------------
>
>                 Key: PIG-3322
>                 URL: https://issues.apache.org/jira/browse/PIG-3322
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Egil Sorensen
>            Assignee: Viraj Bhat
>              Labels: patch
>
> I am getting NPE when loading a file with AvroStorage a file that has schema
> like:
> {code}
> ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
> {
>     'num' => 4,
>     # storing file with Pig type tuple relying on conversion to record
>     # loading using stored schemas
>     'notmq' => 1,
>     'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
>     'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
> },
> {code}
[jira] [Resolved] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Egil Sorensen resolved PIG-3322. Resolution: Duplicate Fix Version/s: (was: 0.11.2) (was: 0.12) The test was only storing one field, and as such seems to duplicate PIG-2330. > AVRO: AvroStorage give NPE on reading file with union as top level schema > - > > Key: PIG-3322 > URL: https://issues.apache.org/jira/browse/PIG-3322 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Egil Sorensen >Assignee: Viraj Bhat > Labels: patch > > I am getting NPE when loading a file with AvroStorage a file that has schema > like: > {code} > ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated > from Pig Field > Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig > Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated > from Pig Field Schema"}]}] > {code} > E.g. 
see the e2e style test, which fails on this: > {code} > { > 'num' => 4, > # storing file with Pig type tuple relying on > conversion to record > # loading using stored schemas > 'notmq' => 1, > 'pig' => q\ > a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as > (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, > age:int, gpa:double)}); > b = foreach a generate t; > describe b; > store b into ':OUTPATH:.intermediate' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(); > exec; > -- Read back what was stored with Avro > u = load ':OUTPATH:.intermediate' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(); > describe u; > store u into ':OUTPATH:'; > \, > 'verify_pig_script' => q\ > a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as > (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, > age:int, gpa:double)}); > b = foreach a generate t; > describe b; > store b into ':OUTPATH:'; > \, > }, > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key         Summary
PIG-3328    DataBags created with an initial list of tuples don't get registered as spillable
            https://issues.apache.org/jira/browse/PIG-3328
PIG-3318    AVRO: 'default value' not honored when merging schemas on load with AvroStorage
            https://issues.apache.org/jira/browse/PIG-3318
PIG-3317    disable optimizations via pig properties
            https://issues.apache.org/jira/browse/PIG-3317
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3285    Jobs using HBaseStorage fail to ship dependency jars
            https://issues.apache.org/jira/browse/PIG-3285
PIG-3258    Patch to allow MultiStorage to use more than one index to generate output tree
            https://issues.apache.org/jira/browse/PIG-3258
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-2244    Macros cannot be passed relation names
            https://issues.apache.org/jira/browse/PIG-2244
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660033#comment-13660033 ] Julien Le Dem commented on PIG-3307: https://reviews.apache.org/r/11203/diff/#index_header thanks [~cheolsoo] and [~daijy]! > Refactor physical operators to remove methods parameters that are always null > - > > Key: PIG-3307 > URL: https://issues.apache.org/jira/browse/PIG-3307 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch > > > The physical operators are sometimes overly complex. I'm trying to cleanup > some unnecessary code. > in particular there is an array of getNext(*T* v) where the value v does not > seem to have any importance and is just used to pick the correct method. > I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
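To make the refactoring described above concrete, here is a minimal sketch contrasting the two styles. The classes `OverloadBefore` and `NamedAfter` are hypothetical stand-ins, not Pig's actual operators, and the return types are simplified to plain values rather than Pig's result wrapper.

```java
/** Old style: the parameter is never read and only selects an overload. */
class OverloadBefore {
    private final Object value;

    OverloadBefore(Object value) {
        this.value = value;
    }

    Integer getNext(Integer unused) {
        return (Integer) value;
    }

    String getNext(String unused) {
        return String.valueOf(value);
    }
}

/** New style: the type is part of the method name, so no dummy argument is needed. */
class NamedAfter {
    private final Object value;

    NamedAfter(Object value) {
        this.value = value;
    }

    Integer getNextInteger() {
        return (Integer) value;
    }

    String getNextString() {
        return String.valueOf(value);
    }
}

class GetNextDemo {
    public static void main(String[] args) {
        OverloadBefore before = new OverloadBefore(42);
        // A typed null is required purely to disambiguate the overloads.
        Integer a = before.getNext((Integer) null);
        String b = before.getNext((String) null);

        NamedAfter after = new NamedAfter(42);
        Integer c = after.getNextInteger();
        String d = after.getNextString();
        System.out.println(a + " " + b + " " + c + " " + d);
    }
}
```

The call sites show the readability gain: `getNext((Integer) null)` needs a cast just to compile, while `getNextInteger()` states its intent directly.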
Review Request: Refactor physical operators to remove methods parameters that are always null
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11203/ --- Review request for pig, Daniel Dai, Dmitriy Ryaboy, Cheolsoo Park, and Bill Graham. Description --- Refactor physical operators to remove methods parameters that are always null This addresses bug PIG-3307. https://issues.apache.org/jira/browse/PIG-3307 Diffs - src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MergeJoinIndexer.java d5aff3d src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigCombiner.java 6cfc8c0 src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 7c499f6 src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 6145214 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java fc0112a src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java 5bceca6 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/BinaryComparisonOperator.java 3e434f3 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ComparisonOperator.java 51d9f34 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java 7e4cffa src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java bdcc72b src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java a767c36 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 9cca2c3 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java b5e3c83 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java f3b5d44 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 35786c0 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java c9b3157 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Mod.java 1108846 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Multiply.java 2795b78 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 294f84a src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POAnd.java f24c2ac src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java 312f3ac src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java 987cc21 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java 9ea89f7 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POMapLookUp.java fd5573f src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONegative.java 8d3fcb1 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONot.java 973dfc5 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POOr.java 498eb12 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 8886df7 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORegexp.java 6634915 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java e400a95 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserComparisonFunc.java 1aa1671 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 167cf06 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Subtract.java 495 
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCollectedGroup.java a5adaf7 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java 4a58a7e src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java 30dcea2 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCross.java b90b0a2 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODemux.java e26c611 src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODistinct.java ed2d39e src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java a4abdd8 sr
[jira] [Updated] (PIG-2378) macros don't accept references to items within tuples as arguments
[ https://issues.apache.org/jira/browse/PIG-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-2378:
------------------------------
    Attachment: PIG-2378.patch.txt

This is a working patch that resolves the issue. You no longer need the hacky quoting workaround; you can pass relation.field directly as a macro argument. For the test data I described above, it generates these results:
{noformat}
((2,3),{(1,(2,3))})
((2,4),{(4,(2,4))})
((7,8),{(6,(7,8))})
{noformat}

I will run the full unit tests to see whether it introduces any regressions, since it touches LogicalSchema.java, which is used in many other places. Meanwhile, I will improve the code's efficiency as much as possible.

> macros don't accept references to items within tuples as arguments
> ------------------------------------------------------------------
>
>                 Key: PIG-2378
>                 URL: https://issues.apache.org/jira/browse/PIG-2378
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.1
>            Reporter: Joseph Adler
>            Assignee: Johnny Zhang
>         Attachments: PIG-2378.patch.txt
>
> I'd like to be able to pass a reference to an item within a parameter to a
> Pig Macro.
> For example, suppose that I had a relation A with the schema A:{id:long,
> header:(time:long, type:chararray)}. I'd like to call a macro by typing:
>    B = MY_MACRO(A, header.time);
> but this does not currently work. Obviously, I could define a new relation as
> a workaround, for example I could use some pig code like
>    AA = FOREACH A GENERATE *, header.time as time;
>    B = MY_MACRO(AA, time);
> But that's ugly and clunky
[jira] [Resolved] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat resolved PIG-3320. - Resolution: Invalid > AVRO: no empty field expressed when loading with AvroStorage using reader > schema with extra field that has no default > - > > Key: PIG-3320 > URL: https://issues.apache.org/jira/browse/PIG-3320 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Egil Sorensen >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > > Somewhat different use case than PIG-3318: > Loading with AvroStorage giving a loader schema that relative to the schema > in the Avro file had an extra filed w/o default and expected to see an extra > empty column, but the schema is as in the avro file w/o the extra column. > E.g. see the e2e style test, which fails on this: > {code} > { > 'num' => 2, > # storing using writer schema > # loading using reader schema with extra field that > has no default > 'notmq' => 1, > 'pig' => q\ > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > -- Store Avro file w. 
schema > b1 = foreach a generate id, intnum5; > c1 = filter b1 by 10 <= id and id < 20; > describe c1; > dump c1; > store c1 into ':OUTPATH:.intermediate_1' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"schema" : { > "name" : "schema_writing", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"int" > ] > } > ] >} > } > '); > exec; > -- Read back what was stored with Avro adding extra field to reader schema > u = load ':OUTPATH:.intermediate_1' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"debug" : 5, >"schema" : { > "name" : "schema_reading", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"string" > ] > }, > { > "name" : "intnum100", > "type" : [ >"null", >"int" > ] > } > ] >} > } > '); > describe u; > dump u; > store u into ':OUTPATH:'; > \, > 'verify_pig_script' => q\ > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > b = filter a by (10 <= id and id < 20); > c = foreach b generate id, intnum5, ''; > store c into ':OUTPATH:'; > \, > }, > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659803#comment-13659803 ] Viraj Bhat commented on PIG-3320: - With PIG-3321 committed, the above script throws an error which is listed in Comment 2 of this Jira. Suppose we want AvroStorage() to return an extra field "intnum100" with null instead of throwing an error in Comment 2; you have to do the following: 1) Pass with a null reader schema PigAvroDatumReader 2) Construct an mProtoTuple with field size equal to readerSchema 3) Reconcile the schemas manually by using the logic in getSchemaToMergedSchemaMap() 4) Populate mProtoTuple using the map keeping track of new to old position By doing all the above we are undoing the changes done in PIG-3321, where the readerSchema is not passed to PigAvroDatumReader(). We want Avro to handle the schema merges in this case and it does it correctly by throwing an error. Currently closing this Jira as invalid. > AVRO: no empty field expressed when loading with AvroStorage using reader > schema with extra field that has no default > - > > Key: PIG-3320 > URL: https://issues.apache.org/jira/browse/PIG-3320 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Egil Sorensen >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > > Somewhat different use case than PIG-3318: > Loading with AvroStorage giving a loader schema that relative to the schema > in the Avro file had an extra filed w/o default and expected to see an extra > empty column, but the schema is as in the avro file w/o the extra column. > E.g. 
see the e2e style test, which fails on this: > {code} > { > 'num' => 2, > # storing using writer schema > # loading using reader schema with extra field that > has no default > 'notmq' => 1, > 'pig' => q\ > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > -- Store Avro file w. schema > b1 = foreach a generate id, intnum5; > c1 = filter b1 by 10 <= id and id < 20; > describe c1; > dump c1; > store c1 into ':OUTPATH:.intermediate_1' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"schema" : { > "name" : "schema_writing", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"int" > ] > } > ] >} > } > '); > exec; > -- Read back what was stored with Avro adding extra field to reader schema > u = load ':OUTPATH:.intermediate_1' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"debug" : 5, >"schema" : { > "name" : "schema_reading", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"string" > ] > }, > { > "name" : "intnum100", > "type" : [ >"null", >"int" > ] > } > ] >} > } > '); > describe u; > dump u; > store u into ':OUTPATH:'; > \, > 'verify_pig_script' => q\ > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > b = filter a by (10 <= id and id < 20); > c = foreach b generate id, intnum5, ''; > store c into ':OUTPATH:'; > \, > }, > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
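For readers wondering what the manual reconciliation in steps 2 to 4 of the comment would involve, here is a minimal sketch. It only illustrates the approach the comment argues against; the class and method names are hypothetical, field matching is by name only, and type differences between reader and writer fields (such as intnum5 being int in one schema and string in the other) are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaReconcileSketch {
    /** For each reader field, the index of the same-named writer field, or -1 if absent. */
    static int[] readerToWriterPositions(List<String> readerFields, List<String> writerFields) {
        int[] map = new int[readerFields.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = writerFields.indexOf(readerFields.get(i));
        }
        return map;
    }

    /** Builds a reader-sized tuple, copying mapped values and leaving extra fields null. */
    static List<Object> project(List<Object> writerTuple, int[] map) {
        List<Object> out = new ArrayList<>(map.length);
        for (int pos : map) {
            out.add(pos >= 0 ? writerTuple.get(pos) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> writer = Arrays.asList("id", "intnum5");
        List<String> reader = Arrays.asList("id", "intnum5", "intnum100");
        int[] map = readerToWriterPositions(reader, writer);
        List<Object> row = project(Arrays.<Object>asList(10, 3), map);
        System.out.println(row); // prints [10, 3, null]
    }
}
```

Even in this reduced form the loader has to own the position bookkeeping, which is the duplication of Avro's schema resolution that the comment wants to avoid.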
[jira] [Updated] (PIG-3328) DataBags created with an initial list of tuples don't get registered as spillable
[ https://issues.apache.org/jira/browse/PIG-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated PIG-3328: - Fix Version/s: 0.11.2 0.12 Affects Version/s: 0.12 Status: Patch Available (was: Open) > DataBags created with an initial list of tuples don't get registered as > spillable > - > > Key: PIG-3328 > URL: https://issues.apache.org/jira/browse/PIG-3328 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11.1, 0.12, 0.11.2 >Reporter: Mark Wagner >Assignee: Mark Wagner > Fix For: 0.12, 0.11.2 > > Attachments: PIG-3328.1.patch > > > DefaultDataBag has a constructor to take ownership of an existing list of > tuples as its own contents, but registration for spilling only occurs when > adding elements. If a bag starts out big enough to consider spilling, but no > new tuples are added to it, it will never be spilled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3328) DataBags created with an initial list of tuples don't get registered as spillable
[ https://issues.apache.org/jira/browse/PIG-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated PIG-3328: - Affects Version/s: 0.11.2 0.11.1 > DataBags created with an initial list of tuples don't get registered as > spillable > - > > Key: PIG-3328 > URL: https://issues.apache.org/jira/browse/PIG-3328 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11.1, 0.11.2 >Reporter: Mark Wagner >Assignee: Mark Wagner > Attachments: PIG-3328.1.patch > > > DefaultDataBag has a constructor to take ownership of an existing list of > tuples as its own contents, but registration for spilling only occurs when > adding elements. If a bag starts out big enough to consider spilling, but no > new tuples are added to it, it will never be spilled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
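The behaviour described in the issue can be reproduced with a small model. `SpillRegistry` and `SketchBag` below are hypothetical stand-ins for Pig's memory manager and DefaultDataBag, and registering inside the constructor is only one plausible fix, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in for a memory manager that tracks bags eligible for spilling. */
class SpillRegistry {
    final List<Object> registered = new ArrayList<>();

    void register(Object bag) {
        if (!registered.contains(bag)) {
            registered.add(bag);
        }
    }
}

class SketchBag {
    private final SpillRegistry registry;
    private final List<String> tuples;

    /**
     * Takes ownership of an existing list. With registerNow=false this mimics
     * the reported behaviour: nothing is registered until add() is called.
     */
    SketchBag(SpillRegistry registry, List<String> existing, boolean registerNow) {
        this.registry = registry;
        this.tuples = existing;
        if (registerNow && !existing.isEmpty()) {
            registry.register(this);
        }
    }

    /** Adding a tuple is the only path that registers the bag in the old behaviour. */
    void add(String tuple) {
        tuples.add(tuple);
        registry.register(this);
    }
}

class SpillDemo {
    public static void main(String[] args) {
        SpillRegistry registry = new SpillRegistry();
        List<String> big = new ArrayList<>(Arrays.asList("t1", "t2", "t3"));

        new SketchBag(registry, big, false);
        System.out.println(registry.registered.size()); // prints 0

        new SketchBag(registry, big, true);
        System.out.println(registry.registered.size()); // prints 1
    }
}
```

The first bag holds three tuples yet is invisible to the registry, which matches the report: a bag that starts out large and never grows is never considered for spilling.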
[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail
[ https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659587#comment-13659587 ] Paul Mazak commented on PIG-2684: - Better formatted solution: https://issues.apache.org/jira/browse/PIG-3015?focusedCommentId=13659573&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13659573 > :: in field name causes AvroStorage to fail > --- > > Key: PIG-2684 > URL: https://issues.apache.org/jira/browse/PIG-2684 > Project: Pig > Issue Type: Bug > Components: piggybank >Reporter: Fabian Alenius > > There appears to be a bug in AvroStorage which causes it to fail when there > are field names that contain :: > For example, the following will fail: > data = load 'test.txt' as (one, two); > grp = GROUP data by (one, two); > result = foreach grp generate FLATTEN(group); > > > store result into 'test.avro' using > org.apache.pig.piggybank.storage.avro.AvroStorage(); > ERROR 2999: Unexpected internal error. Illegal character in: group::one > While the following will succeed: > data = load 'test.txt' as (one, two); > grp = GROUP data by (one, two); > result = foreach grp generate FLATTEN(group) as (one,two); > > store result into 'test.avro' using > org.apache.pig.piggybank.storage.avro.AvroStorage(); > Here is a minimal test case: > data = load 'test.txt' as (one::two, three); > > > store data into 'test.avro' using > org.apache.pig.piggybank.storage.avro.AvroStorage(); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
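The failure above comes from Pig's `relation::field` names, which contain characters Avro does not accept in field names. A minimal, Pig-independent sketch of the renaming step is shown below; `FieldNameSketch` and its methods are hypothetical names, and the duplicate check is an added precaution that is not part of the linked solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FieldNameSketch {
    /** Returns the part after the last "::", or the name unchanged if there is none. */
    static String stripPrefix(String name) {
        int idx = name.lastIndexOf("::");
        return idx < 0 ? name : name.substring(idx + 2);
    }

    /** Strips prefixes from all names and fails if two fields would collide. */
    static List<String> stripAll(List<String> names) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            String shortName = stripPrefix(name);
            if (!seen.add(shortName)) {
                throw new IllegalArgumentException(
                        "duplicate field after stripping: " + shortName);
            }
            out.add(shortName);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stripAll(Arrays.asList("group::one", "group::two")));
        // prints [one, two]
    }
}
```

The collision check matters after a join, where `a::id` and `b::id` would otherwise both become `id`.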
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659573#comment-13659573 ]

Paul Mazak commented on PIG-3015:
---------------------------------

One simple workaround for us was to override AvroStorage's checkSchema this way.
{code}
/**
 * In Pig script do:
 * REGISTER 'lib/this.jar'
 * DEFINE AvroStorage com.this.JoinableAvroStorage;
 */
public class JoinableAvroStorage extends AvroStorage {
    @Override
    public void checkSchema(ResourceSchema s) throws IOException {
        try {
            super.checkSchema(s);
        } catch (SchemaParseException spe) {
            ResourceFieldSchema[] pigFields = s.getFields();
            for (int i = 0; i < pigFields.length; i++) {
                String outname = pigFields[i].getName();
                if (outname.contains("::")) {
                    String newOutname = outname.split("::")[1];
                    pigFields[i].setName(newOutname);
                }
            }
            super.checkSchema(s);
        }
    }
}
{code}

> Rewrite of AvroStorage
> ----------------------
>
>              Key: PIG-3015
>              URL: https://issues.apache.org/jira/browse/PIG-3015
>          Project: Pig
>       Issue Type: Improvement
>       Components: piggybank
>         Reporter: Joseph Adler
>         Assignee: Joseph Adler
>      Attachments: bad.avro, good.avro, PIG-3015-10.patch,
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch,
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch,
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java,
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old
> versions of Avro, it copies data much more than needed, and it's verbose and
> complicated. (One pet peeve of mine is that old versions of Avro don't
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the
> new implementation is significantly faster, and the code is a lot simpler.
> Rewriting AvroStorage also enabled me to implement support for Trevni (as
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best
> way to contribute the changes back to Apache.
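The checkSchema workaround in the comment above keeps the second `::`-separated segment (`split("::")[1]`), which fits the single-prefix case `group::one`. As a sketch of a variation, assuming that a name may carry more than one prefix (for example `a::b::one`) and that stripping prefixes could make two fields collide, the helper below keeps the text after the last `::` and rejects duplicates. `FieldNameSketch` and its methods are hypothetical names for illustration only and are not part of Pig or AvroStorage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, independent of Pig: operates on plain field-name strings.
class FieldNameSketch {

    // Returns the part after the last "::", or the name unchanged if there is none.
    static String stripPrefix(String name) {
        int idx = name.lastIndexOf("::");
        return idx < 0 ? name : name.substring(idx + 2);
    }

    // Strips every name and fails fast if two names become identical.
    static List<String> stripAll(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String stripped = stripPrefix(name);
            if (!seen.add(stripped)) {
                throw new IllegalArgumentException(
                        "Duplicate field name after stripping prefix: " + stripped);
            }
            out.add(stripped);
        }
        return out;
    }
}
```

Failing on duplicates is a deliberate choice in this sketch: silently writing two columns under the same name would be harder to diagnose than an early error.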
[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail
[ https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659566#comment-13659566 ]

Paul Mazak commented on PIG-2684:
---------------------------------

One simple workaround for us was to override AvroStorage's checkSchema this way.

/**
 * In Pig script do:
 * REGISTER 'lib/this.jar'
 * DEFINE AvroStorage com.this.JoinableAvroStorage;
 */
public class JoinableAvroStorage extends AvroStorage {
    @Override
    public void checkSchema(ResourceSchema s) throws IOException {
        try {
            super.checkSchema(s);
        } catch (SchemaParseException spe) {
            ResourceFieldSchema[] pigFields = s.getFields();
            for (int i = 0; i < pigFields.length; i++) {
                String outname = pigFields[i].getName();
                if (outname.contains("::")) {
                    String newOutname = outname.split("::")[1];
                    pigFields[i].setName(newOutname);
                }
            }
            super.checkSchema(s);
        }
    }
}

> :: in field name causes AvroStorage to fail
> -------------------------------------------
>
>              Key: PIG-2684
>              URL: https://issues.apache.org/jira/browse/PIG-2684
>          Project: Pig
>       Issue Type: Bug
>       Components: piggybank
>         Reporter: Fabian Alenius
>
> There appears to be a bug in AvroStorage which causes it to fail when there
> are field names that contain ::
> For example, the following will fail:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group);
>
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> ERROR 2999: Unexpected internal error. Illegal character in: group::one
> While the following will succeed:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group) as (one,two);
>
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> Here is a minimal test case:
> data = load 'test.txt' as (one::two, three);
>
> store data into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();