[ 
https://issues.apache.org/jira/browse/PIG-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Talas updated PIG-4423:
------------------------------
    Description: 
Pig does not validate the Avro schema when using AvroStorage(). I tried to 
validate the schema both by adding the schema_file input parameter and by 
providing the schema explicitly. In both cases the Avro file received the 
schema of the Pig data set instead of being validated against the supplied 
Avro schema. When I used the same Avro schema with Hive, it validated the data 
successfully (if the data had a different schema from the Avro schema, it 
threw an error).

{code}
store data into '$TARGET'
USING AvroStorage(
'schema', '{
"type": "record",
"name": "test",
"fields": [
{"name": "partner_name", "type": "string"},
{"name": "partner_id", "type": "int"},
{"name": "name", "type": "string"} ,
{"name": "id", "type": "int"}
]
}');
{code}

or

{code}
STORE data INTO '$TARGET' 
USING AvroStorage('schema_file','$AVRO_SCHEMA');
{code}

I have registered the following jars (downloaded from Maven repo)

{code}
REGISTER piggybank-0.12.0.jar;
REGISTER avro-1.7.7.jar;
REGISTER avro-mapred-1.7.7.jar;
REGISTER json-simple-1.1.1.jar;
{code}
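
A minimal reproduction along these lines might look as follows. This is only a sketch: the LOAD statement, the $SOURCE parameter, and the chararray column types are assumptions added for illustration and are not part of the original report. The idea is to load partner_id and id as chararray so that the Pig schema disagrees with the "int" fields declared in the Avro schema, and then check whether the STORE is rejected.

{code}
-- Assumed input: a comma-separated file; $SOURCE is a hypothetical parameter.
-- partner_id and id are deliberately typed as chararray, not int.
data = LOAD '$SOURCE' USING PigStorage(',')
       AS (partner_name:chararray, partner_id:chararray,
           name:chararray, id:chararray);

-- Print the Pig schema so it can be compared with the Avro schema below.
DESCRIBE data;

-- Same STORE call as in the report; the Avro schema declares the ids as int.
STORE data INTO '$TARGET'
USING AvroStorage(
'schema', '{
"type": "record",
"name": "test",
"fields": [
{"name": "partner_name", "type": "string"},
{"name": "partner_id", "type": "int"},
{"name": "name", "type": "string"},
{"name": "id", "type": "int"}
]
}');
{code}

If validation were applied, this STORE would be expected to fail because of the type mismatch; the behaviour described above is that the output file instead carries the schema of the Pig data set.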


  was:
Pig does not validate the Avro schema when using AvroStorage(). I tried to 
validate the schema both by adding the schema_file input parameter and by 
providing the schema explicitly. In both cases the Avro file received the 
schema of the Pig data set instead of being validated against the supplied 
Avro schema. When I used the same Avro schema with Hive, it validated the data 
successfully (if the data had a different schema from the Avro schema, it 
threw an error).

{code}
store data into '$TARGET'
USING AvroStorage(
'schema', '{
"type": "record",
"name": "test",
"fields": [
{"name": "partner_name", "type": "string"},
{"name": "partner_id", "type": "int"},
{"name": "name", "type": "string"} ,
{"name": "id", "type": "int"}
]
}');
{code}

or

{code}
STORE data INTO '$TARGET' 
USING AvroStorage('schema_file','$AVRO_SCHEMA');
{code}

I have registered the following jars (downloaded from Maven repo)

REGISTER piggybank-0.12.0.jar;
REGISTER avro-1.7.7.jar;
REGISTER avro-mapred-1.7.7.jar;
REGISTER json-simple-1.1.1.jar;




> AvroStorage() does not validate schema at storing.
> --------------------------------------------------
>
>                 Key: PIG-4423
>                 URL: https://issues.apache.org/jira/browse/PIG-4423
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.12.0
>         Environment: EMR AMI 3.3.2
>            Reporter: Zoltan Talas
>
> Pig does not validate the Avro schema when using AvroStorage(). I tried to 
> validate the schema both by adding the schema_file input parameter and by 
> providing the schema explicitly. In both cases the Avro file received the 
> schema of the Pig data set instead of being validated against the supplied 
> Avro schema. When I used the same Avro schema with Hive, it validated the 
> data successfully (if the data had a different schema from the Avro schema, 
> it threw an error).
> {code}
> store data into '$TARGET'
> USING AvroStorage(
> 'schema', '{
> "type": "record",
> "name": "test",
> "fields": [
> {"name": "partner_name", "type": "string"},
> {"name": "partner_id", "type": "int"},
> {"name": "name", "type": "string"} ,
> {"name": "id", "type": "int"}
> ]
> }');
> {code}
> or
> {code}
> STORE data INTO '$TARGET' 
> USING AvroStorage('schema_file','$AVRO_SCHEMA');
> {code}
> I have registered the following jars (downloaded from Maven repo)
> {code}
> REGISTER piggybank-0.12.0.jar;
> REGISTER avro-1.7.7.jar;
> REGISTER avro-mapred-1.7.7.jar;
> REGISTER json-simple-1.1.1.jar;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)