[jira] [Commented] (HIVE-6835) Reading of partitioned Avro data fails if partition schema does not match table schema

Anthony Hsu (JIRA) Thu, 17 Apr 2014 17:22:51 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973615#comment-13973615
 ]


Anthony Hsu commented on HIVE-6835:
-----------------------------------

Hive tries to build ObjectInspectorConverters that convert from the 
partition schema to the table schema.  If the two schemas differ, you may 
get a ClassCastException like the one in the issue description.

Adding new columns at the end is not a problem, because the converter only 
covers the fields that both schemas have, so the extra trailing columns are 
dropped.  See ObjectInspectorConverters.StructConverter:
{code}
int minFields = Math.min(inputFields.size(), outputFields.size());
fieldConverters = new ArrayList<Converter>(minFields);
{code}
It's only when you insert new columns at the beginning or in the middle that 
you might run into ClassCastExceptions, because a field can then be paired by 
position with a field of a different type.
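
To make the positional pairing concrete, here is a minimal sketch that models 
field types as plain strings.  It is not Hive code: the class and method names 
are hypothetical, and only the Math.min(...) bound is taken from the snippet 
above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the converter setup; real Hive works on ObjectInspectors.
public class PositionalPairingSketch {
    // Pairs input and output field types by index, up to the shorter list,
    // and reports every position where the paired types differ.
    static List<String> mismatches(List<String> inputTypes, List<String> outputTypes) {
        int minFields = Math.min(inputTypes.size(), outputTypes.size());
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < minFields; i++) {
            if (!inputTypes.get(i).equals(outputTypes.get(i))) {
                problems.add("field " + i + ": " + inputTypes.get(i) + " -> " + outputTypes.get(i));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("array<string>");
        // New column appended: the extra trailing field is never paired.
        System.out.println(mismatches(partition, Arrays.asList("array<string>", "int")));
        // New column prepended: field 0 is now paired with a different type.
        System.out.println(mismatches(partition, Arrays.asList("int", "array<string>")));
    }
}
```

With these inputs the appended case reports no mismatches, while the prepended 
case pairs the array field with the int field, which matches the list-to-primitive 
cast failure in the issue description.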

For the AvroSerDe, if it always uses the latest schema (which should be the 
table-level schema), Hive will not get confused when constructing its 
ObjectInspectorConverters.  Then, later, when the AvroSerDe actually goes to 
read the Avro files, it can compare the latest schema with the (possibly old) 
schemas stored in the Avro data files themselves, and do the proper schema 
resolution, omitting fields or substituting default values, following the 
[schema resolution 
rules|http://avro.apache.org/docs/current/spec.html#Schema+Resolution].
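
As a rough illustration of what resolving by field name buys us, here is a 
simplified model of the record rules (match fields by name, use the reader's 
default when the writer lacks a field, ignore writer fields the reader does not 
have).  It does not use the Avro library; the class name, the map-based record 
representation, and the NO_DEFAULT sentinel are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of record resolution; real Avro resolves full schemas, not maps.
public class NameBasedResolutionSketch {
    // Sentinel meaning "this reader field declares no default".
    static final Object NO_DEFAULT = new Object();

    // readerFields maps each reader field name to its default (or NO_DEFAULT).
    // writerRecord maps each field name written in the old file to its value.
    static Map<String, Object> resolve(Map<String, Object> writerRecord,
                                       Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (writerRecord.containsKey(name)) {
                result.put(name, writerRecord.get(name));   // matched by name
            } else if (field.getValue() != NO_DEFAULT) {
                result.put(name, field.getValue());         // substitute the default
            } else {
                throw new IllegalStateException("no value or default for field " + name);
            }
        }
        return result; // writer fields unknown to the reader are omitted
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("a", Arrays.asList("x", "y"));

        Map<String, Object> latestSchema = new LinkedHashMap<>();
        latestSchema.put("intfield", 0);      // new column with default 0
        latestSchema.put("a", NO_DEFAULT);

        System.out.println(resolve(oldRecord, latestSchema)); // {intfield=0, a=[x, y]}
    }
}
```

Because the lookup is by name rather than by position, it does not matter 
whether the new column was added at the beginning, in the middle, or at the end.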

> Reading of partitioned Avro data fails if partition schema does not match 
> table schema
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-6835
>                 URL: https://issues.apache.org/jira/browse/HIVE-6835
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch
>
>
> To reproduce:
> {code}
> create table testarray (a array<string>);
> load data local inpath '/home/ahsu/test/array.txt' into table testarray;
> -- create partitioned Avro table with one array column
> create table avroarray partitioned by (y string)
>   row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"} } ] }')
>   STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
> insert into table avroarray partition(y=1) select * from testarray;
> -- add an int column with a default value of 0
> alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type": "record", "fields": [ {"name":"intfield","type":"int","default":0},{ "name":"a", "type":{"type":"array","items":"string"} } ] }');
> -- fails with ClassCastException
> select * from avroarray;
> {code}
> The select * fails with:
> {code}
> Failed with exception java.io.IOException:java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
