[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Egil Sorensen updated PIG-3322:
-------------------------------

    Description:
I am getting an NPE when using AvroStorage to load a file that has a schema like:
{code}
["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
{code}

E.g. see the e2e style test, which fails on this:
{code}
{
'num' => 4,
# storing file with Pig type tuple relying on conversion to record
# loading using stored schemas
'notmq' => 1,
'pig' => q\
a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
b = foreach a generate t;
describe b;
store b into ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
exec;
-- Read back what was stored with Avro
u = load ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
describe u;
store u into ':OUTPATH:';
\,
'verify_pig_script' => q\
a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
b = foreach a generate t;
describe b;
store b into ':OUTPATH:';
\,
},
{code}

  was:
Somewhat different use case than PIG-3318:

Loading with AvroStorage, I gave a loader schema that, relative to the schema in the Avro file, had an extra field without a default. I expected to see an extra empty column, but the resulting schema is the one in the Avro file, without the extra column.

E.g. see the e2e style test, which fails on this:
{code}
{
'num' => 2,
# storing using writer schema
# loading using reader schema with extra field that has no default
'notmq' => 1,
'pig' => q\
a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
-- Store Avro file w. schema
b1 = foreach a generate id, intnum5;
c1 = filter b1 by 10 <= id and id < 20;
describe c1;
dump c1;
store c1 into ':OUTPATH:.intermediate_1' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
{
  "schema" : {
    "name" : "schema_writing",
    "type" : "record",
    "fields" : [
      { "name" : "id", "type" : [ "null", "int" ] },
      { "name" : "intnum5", "type" : [ "null", "int" ] }
    ]
  }
}
');
exec;
-- Read back what was stored with Avro adding extra field to reader schema
u = load ':OUTPATH:.intermediate_1' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
{
  "debug" : 5,
  "schema" : {
    "name" : "schema_reading",
    "type" : "record",
    "fields" : [
      { "name" : "id", "type" : [ "null", "int" ] },
      { "name" : "intnum5", "type" : [ "null", "string" ] },
      { "name" : "intnum100", "type" : [ "null", "int" ] }
    ]
  }
}
');
describe u;
dump u;
store u into ':OUTPATH:';
\,
'verify_pig_script' => q\
a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
b = filter a by (10 <= id and id < 20);
c = foreach b generate id, intnum5, '';
store c into ':OUTPATH:';
\,
},
{code}


> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -------------------------------------------------------------------------
>
>                 Key: PIG-3322
>                 URL: https://issues.apache.org/jira/browse/PIG-3322
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Egil Sorensen
>            Assignee: Viraj Bhat
>              Labels: patch
>             Fix For: 0.12, 0.11.2
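
For readers unfamiliar with the shape of the schema in the description: it is a top-level union of "null" and a record, rather than a bare record. The sketch below is only an illustration in plain Python (standard-library json, no Avro library) of how a reader could unwrap such a nullable union before looking up the record fields. It is not AvroStorage code and not the fix attached to this issue; the helper names unwrap_nullable and top_level_fields are hypothetical.

```python
import json

# Schema text copied from the report: a top-level union of "null" and a record.
SCHEMA_JSON = """
["null",
 {"type": "record", "name": "TUPLE_0", "fields": [
   {"name": "name", "type": ["null", "string"], "doc": "autogenerated from Pig Field Schema"},
   {"name": "age",  "type": ["null", "int"],    "doc": "autogenerated from Pig Field Schema"},
   {"name": "gpa",  "type": ["null", "double"], "doc": "autogenerated from Pig Field Schema"}
 ]}]
"""


def unwrap_nullable(schema):
    """Hypothetical helper: if `schema` is a union of "null" and exactly one
    other branch, return (other_branch, True); otherwise (schema, False)."""
    if isinstance(schema, list):
        non_null = [branch for branch in schema if branch != "null"]
        if len(non_null) == 1 and len(non_null) < len(schema):
            return non_null[0], True
    return schema, False


def top_level_fields(schema):
    """Hypothetical helper: list (field name, unwrapped field type) pairs for
    a record schema, accepting either a bare record or a nullable record."""
    inner, _nullable = unwrap_nullable(schema)
    if not (isinstance(inner, dict) and inner.get("type") == "record"):
        raise ValueError("top-level schema is neither a record nor a nullable record")
    return [(f["name"], unwrap_nullable(f["type"])[0]) for f in inner["fields"]]


schema = json.loads(SCHEMA_JSON)
fields = top_level_fields(schema)
print(fields)  # [('name', 'string'), ('age', 'int'), ('gpa', 'double')]
```

A reader that assumes the top-level schema is always a record would see a JSON array here instead of an object, which is the kind of mismatch this report is about. The actual cause of the NPE inside AvroStorage is not stated in the description.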