[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974192#comment-13974192 ]
Anthony Hsu commented on HIVE-6835:
-----------------------------------
I'm guessing the schema was specified in the SERDEPROPERTIES to work around
HIVE-3953. However, one issue with storing the schema in TBLPROPERTIES instead
is that, for partitioned tables, running {{describe \[extended]
<table_name> partition(...);}} returns
{code}
error_error_error_error_error_error_error    string    from deserializer
cannot_determine_schema                      string    from deserializer
check                                        string    from deserializer
schema                                       string    from deserializer
url                                          string    from deserializer
and                                          string    from deserializer
literal                                      string    from deserializer
{code}
because the AvroSerDe cannot find "avro.schema.literal" or "avro.schema.url"
in the partition's properties. If you store the schema in SERDEPROPERTIES, you
avoid this issue, since the SERDEPROPERTIES are copied to the partition when it
is created.
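As a rough illustration, here is a toy model of that difference. It is not Hive
code: {{Table}}, {{add_partition}} and {{find_schema}} are hypothetical names, and
the only behavior modeled is the one described above (SERDEPROPERTIES are
copied to a partition at creation time, TBLPROPERTIES are not).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The two properties the AvroSerDe looks for, per the comment above.
SCHEMA_KEYS = ("avro.schema.literal", "avro.schema.url")


@dataclass
class Table:
    """Toy stand-in for a metastore table; not a Hive class."""
    tbl_props: Dict[str, str] = field(default_factory=dict)
    serde_props: Dict[str, str] = field(default_factory=dict)
    partitions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_partition(self, name: str) -> None:
        # Assumption taken from the comment: only the SERDEPROPERTIES are
        # copied to the partition when it is created.
        self.partitions[name] = dict(self.serde_props)


def find_schema(props: Dict[str, str]) -> Optional[str]:
    """Return the first schema property present, or None if neither is set."""
    for key in SCHEMA_KEYS:
        if key in props:
            return props[key]
    return None
```

In this model, a partition added to a table that keeps "avro.schema.literal"
only in {{tbl_props}} ends up with no schema property, so {{find_schema}} returns
None for it. A table that keeps the property in {{serde_props}} passes it on to
every new partition.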
I do think it is useful to make both the table-level properties and the
partition-level properties available separately to the SerDe when it's doing
its .initialize(). The SerDe should be able to decide which set of properties
it wants to use. From this point of view, I think my change is still useful and
valid.
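To make the idea concrete, here is a minimal sketch of one policy a SerDe could
apply if it received both property sets. The function name
{{resolve_schema_property}} and the "partition first, then table" order are
assumptions for illustration only. They are not taken from the patch or from
the AvroSerDe.

```python
from typing import Mapping, Optional, Tuple

# The two properties the AvroSerDe looks for, per the comment above.
SCHEMA_KEYS = ("avro.schema.literal", "avro.schema.url")


def resolve_schema_property(
    table_props: Mapping[str, str],
    partition_props: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Pick a schema property, preferring the partition level.

    Returns a (key, value) pair, or None when neither level defines one.
    This ordering is a hypothetical policy, not Hive's actual behavior.
    """
    for props in (partition_props or {}, table_props):
        for key in SCHEMA_KEYS:
            if key in props:
                return key, props[key]
    return None
```

With both sets passed in separately, a different SerDe could just as easily
pick the opposite order or compare the two schemas. That choice is only
possible if the two sets are not merged before the SerDe sees them.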
> Reading of partitioned Avro data fails if partition schema does not match
> table schema
> --------------------------------------------------------------------------------------
>
> Key: HIVE-6835
> URL: https://issues.apache.org/jira/browse/HIVE-6835
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Anthony Hsu
> Assignee: Anthony Hsu
> Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch
>
>
> To reproduce:
> {code}
> create table testarray (a array<string>);
> load data local inpath '/home/ahsu/test/array.txt' into table testarray;
> -- create partitioned Avro table with one array column
> create table avroarray partitioned by (y string)
> row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"a","type":{"type":"array","items":"string"}}]}')
> stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
> insert into table avroarray partition(y=1) select * from testarray;
> -- add an int column with a default value of 0
> alter table avroarray set serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> with serdeproperties ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":"record","fields":[{"name":"intfield","type":"int","default":0},{"name":"a","type":{"type":"array","items":"string"}}]}');
> -- fails with ClassCastException
> select * from avroarray;
> {code}
> The select * fails with:
> {code}
> Failed with exception java.io.IOException:java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
> cannot be cast to
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> {code}