[ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat reopened PIG-3322:
-----------------------------


Hi Egil,
 The issue here is that the field "t" in the original data set 
"studentcomplextab10k" contains nulls. 
(fred hernandez,73,1.87)
(fred hernandez,20,2.11)

(calvin allen,60,2.49)
(yuri zipper,76,2.05)


So when this is stored via AvroStorage, nulls are stored for those records.

When you read back the Avro file written by the previous store, it fails with a 
null pointer exception.
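
To check whether a stored file falls into this case, one could inspect its Avro schema and see whether the top level is a union that includes "null". The sketch below is only an illustration in Python, not part of AvroStorage: it parses the schema text reported in this issue (with the "doc" attributes omitted for brevity), and the helper names is_nullable_top_level_union and record_branches are hypothetical. It relies only on the fact that an Avro union is written as a JSON array.

```python
import json

# Schema text as reported in this issue ("doc" attributes omitted for brevity).
SCHEMA_JSON = """
["null", {"type": "record", "name": "TUPLE_0", "fields": [
    {"name": "name", "type": ["null", "string"]},
    {"name": "age", "type": ["null", "int"]},
    {"name": "gpa", "type": ["null", "double"]}
]}]
"""


def is_nullable_top_level_union(schema):
    """Return True if the parsed schema is a union (JSON array) that includes "null"."""
    return isinstance(schema, list) and "null" in schema


def record_branches(schema):
    """Return the names of the record branches of a top-level union (hypothetical helper)."""
    if not isinstance(schema, list):
        return []
    return [
        branch["name"]
        for branch in schema
        if isinstance(branch, dict) and branch.get("type") == "record"
    ]


schema = json.loads(SCHEMA_JSON)
print(is_nullable_top_level_union(schema))
print(record_branches(schema))
```

A schema of this shape means the whole tuple can be null, which is the situation produced by the null values of "t" described above.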

The following snippet, which filters out the null tuples before storing, works 
without any problems.
{code}
a = load 'studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, 
age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
b = foreach a generate t;
c = filter b by t is not null;
store c into 'singltupleavronotnull' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();
exec;
b = load 'singltupleavronotnull' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();
describe b;
dump b;
{code}

Kindly note: This issue is different from PIG-2330.

                
> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -------------------------------------------------------------------------
>
>                 Key: PIG-3322
>                 URL: https://issues.apache.org/jira/browse/PIG-3322
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Egil Sorensen
>            Assignee: Viraj Bhat
>              Labels: patch
>
> I am getting an NPE when using AvroStorage to load a file that has a schema 
> like:
> {code}
> ["null",{"type":"record","name":"TUPLE_0","fields":[
>   {"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},
>   {"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},
>   {"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
>                         {
>                         'num' => 4,
>                         # storing file with Pig type tuple relying on conversion to record
>                         # loading using stored schemas 
>                         'notmq' => 1,
>                         'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
> (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, 
> age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
>                         'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
> (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, 
> age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
>                         },
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
