[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659803#comment-13659803 ]

Viraj Bhat commented on PIG-3320:
---------------------------------

With PIG-3321 committed, the script above throws the error listed in Comment 2 
of this JIRA.

If we want AvroStorage() to return the extra field "intnum100" as null instead 
of throwing the error in Comment 2, we would have to do the following:
1) Pass a null reader schema to PigAvroDatumReader
2) Construct an mProtoTuple whose field count equals the number of fields in 
readerSchema
3) Reconcile the schemas manually, using the logic in 
getSchemaToMergedSchemaMap()
4) Populate mProtoTuple using that map, which keeps track of new-to-old field 
positions
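
A minimal Java sketch of steps 2 to 4, assuming each schema is reduced to an ordered list of field names. `buildReaderToWriterMap` and `project` are hypothetical helpers, not the real `getSchemaToMergedSchemaMap()` signature or AvroStorage code, and an `Object[]` stands in for mProtoTuple. Field types are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReaderSchemaProjection {

    // Hypothetical helper: for each reader field, record the position of the
    // field with the same name in the writer schema, or -1 if it is absent.
    static int[] buildReaderToWriterMap(List<String> readerFields, List<String> writerFields) {
        Map<String, Integer> writerPos = new HashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            writerPos.put(writerFields.get(i), i);
        }
        int[] map = new int[readerFields.size()];
        for (int i = 0; i < readerFields.size(); i++) {
            map[i] = writerPos.getOrDefault(readerFields.get(i), -1);
        }
        return map;
    }

    // Hypothetical stand-in for populating mProtoTuple: one slot per reader
    // field, copied from the writer record; fields missing there stay null.
    static Object[] project(Object[] writerRecord, int[] readerToWriter) {
        Object[] tuple = new Object[readerToWriter.length];
        for (int i = 0; i < readerToWriter.length; i++) {
            int src = readerToWriter[i];
            tuple[i] = (src >= 0) ? writerRecord[src] : null;
        }
        return tuple;
    }

    public static void main(String[] args) {
        List<String> writer = List.of("id", "intnum5");
        List<String> reader = List.of("id", "intnum5", "intnum100");
        int[] map = buildReaderToWriterMap(reader, writer);
        Object[] row = project(new Object[] {12, 3}, map);
        System.out.println(Arrays.toString(map)); // [0, 1, -1]
        System.out.println(Arrays.toString(row)); // [12, 3, null]
    }
}
```

Note that this silently fills intnum100 with null, which bypasses Avro's own schema resolution rather than relying on it.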

Doing all of the above would undo the changes made in PIG-3321 and bring back 
the behaviour where the readerSchema is not passed to PigAvroDatumReader(). We 
want Avro to handle the schema merge in this case, and it does so correctly by 
throwing an error.
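
For context, the Avro specification's record-resolution rule says that when the reader's schema has a field the writer's schema lacks, the reader uses that field's default, and if there is no default an error is signalled. The sketch below is a simplified Java model of that rule only; the class and method names are hypothetical, it is not Avro's actual decoder, and it ignores types, unions and nested records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AvroFieldResolutionSketch {

    // Hypothetical, simplified model of one reader-schema field.
    // hasDefault distinguishes "default": null from having no default at all.
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // Use the writer's value when the writer has a field with the same name,
    // otherwise the reader's default, otherwise fail.
    static List<Object> resolve(List<ReaderField> readerFields, Map<String, Object> writerRecord) {
        List<Object> out = new ArrayList<>();
        for (ReaderField f : readerFields) {
            if (writerRecord.containsKey(f.name)) {
                out.add(writerRecord.get(f.name));
            } else if (f.hasDefault) {
                out.add(f.defaultValue);
            } else {
                throw new IllegalStateException("missing required field " + f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> written = Map.of("id", 12, "intnum5", 3);

        List<ReaderField> withDefault = List.of(
                new ReaderField("id", false, null),
                new ReaderField("intnum5", false, null),
                new ReaderField("intnum100", true, null));
        System.out.println(resolve(withDefault, written)); // [12, 3, null]

        List<ReaderField> noDefault = List.of(
                new ReaderField("id", false, null),
                new ReaderField("intnum5", false, null),
                new ReaderField("intnum100", false, null));
        try {
            resolve(noDefault, written);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // missing required field intnum100
        }
    }
}
```

Under this rule, declaring intnum100 with `"default": null` in the reader schema (valid because "null" is the first branch of its union) should let it resolve to null instead of failing.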

Closing this JIRA as invalid for now.
                
> AVRO: no empty field expressed when loading with AvroStorage using reader 
> schema with extra field that has no default
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3320
>                 URL: https://issues.apache.org/jira/browse/PIG-3320
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Egil Sorensen
>            Assignee: Viraj Bhat
>              Labels: patch
>             Fix For: 0.12, 0.11.2
>
>
> A somewhat different use case than PIG-3318:
> I loaded with AvroStorage, giving a loader schema that, relative to the 
> schema in the Avro file, had an extra field without a default. I expected to 
> see an extra empty column, but the resulting schema matches the one in the 
> Avro file, without the extra column.
> For example, see the following e2e-style test, which fails on this:
> {code}
>                         {
>                         'num' => 2,
>                         # storing using writer schema
>                         # loading using reader schema with extra field that has no default
>                         'notmq' => 1,
>                         'pig' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,
> id: int, intnum5: int, intnum100: int, intnum: int, longnum: long,
> floatnum: float, doublenum: double);
> -- Store Avro file w. schema
> b1 = foreach a generate id, intnum5;
> c1 = filter b1 by 10 <= id and id < 20;
> describe c1;
> dump c1;
> store c1 into ':OUTPATH:.intermediate_1' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>    "schema" : {  
>       "name" : "schema_writing",
>       "type" : "record",
>       "fields" : [
>          {  
>             "name" : "id",
>             "type" : [
>                "null",
>                "int"
>             ]
>          },
>          {  
>             "name" : "intnum5",
>             "type" : [
>                "null",
>                "int"
>             ]
>          }
>       ]
>    }
> }
> ');
> exec;
> -- Read back what was stored with Avro adding extra field to reader schema
> u = load ':OUTPATH:.intermediate_1' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>    "debug" : 5,
>    "schema" : {  
>       "name" : "schema_reading",
>       "type" : "record",
>       "fields" : [
>          {  
>             "name" : "id",
>             "type" : [
>                "null",
>                "int"
>             ]
>          },
>          {  
>             "name" : "intnum5",
>             "type" : [
>                "null",
>                "string"
>             ]
>          },
>          {
>             "name" : "intnum100",
>             "type" : [
>                "null",
>                "int"
>             ]
>          }
>       ]
>    }
> }
> ');
> describe u;
> dump u;
> store u into ':OUTPATH:';
> \,
>                         'verify_pig_script' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,
> id: int, intnum5: int, intnum100: int, intnum: int, longnum: long,
> floatnum: float, doublenum: double);
> b = filter a by (10 <= id and id < 20);
> c = foreach b generate id, intnum5, '';
> store c into ':OUTPATH:';
> \,
>                         },
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
